Waradon Phokhinanan (วรดร โภคินอนันต์)
About Me
I am a third-year PhD student in Neuroscience at ENS-PSL, Paris.
I am fascinated by perception and want to help address sensory loss. Audition is my main domain, followed by vision.
My background is multidisciplinary, including computer science, experimental psychology, machine learning, and engineering. I am now working in neuroscience and neuroengineering.
I am exploring modeling techniques in both AI and neuroscience through the lens of statistical physics.
Research Interests
My primary domain is audition, with spatial hearing in both machines and humans as my strongest focus. Of course, my interests are not limited to this topic—I am also deeply interested in other aspects of audition, as well as vision, blindness, and other senses.
I view my work as broadly divided into two areas, one driven by scientific motivation and the other by engineering, both essential to achieving my goal of improving sensory aid technology. A third area outlines my future research plans in statistical physics modeling.
1. Scientific Perspective
Understanding Core Mechanisms: What are the fascinating mechanisms behind sound localization capabilities in both humans and animals? I am particularly interested in the detailed processes across all parts of the auditory system, including how they integrate with other cognitive functions such as attention and memory. This question was arguably my main driving force in pursuing neuroscience. Currently, I am working on the auditory cortex—an area filled with countless mysteries. However, I plan to explore other regions of the brain in the future as well.
Understanding Core Deficits: Another critical question is: what causes localization deficits in specific parts of the system? By understanding these underlying mechanisms, we may be able to develop solutions for individuals with hearing impairments. I have yet to explore this in depth, but it will be a key focus of my future research.
2. Engineering Perspective
Biomimicry Technology: Audio processing in unseen noise and reverberation remains a major challenge—from traditional signal processing to AI-driven applications. Yet, humans with spatial and binaural hearing effortlessly navigate cocktail party scenarios. How can we replicate this sharp localization and segregation ability? This question drives me to develop auditory-inspired neural networks, which could significantly improve these applications and greatly enhance current hearing aid technology.
Advanced Neural Prosthetics: Another key motivation is developing neural engineering technology that not only compensates for impairments but also directly manipulates signals in the brain—beyond cochlear implants. Future generations of hearing aids should enable individuals to focus and localize sound naturally. This will be my core research direction.
3. Bridging Neuroscience, AI, and Statistical Physics
I have always loved physics—it was my best subject in high school—but I never had the opportunity to be directly involved in the field. When I entered neuroscience, I discovered a new perspective where statistical physics serves not only as a powerful modeling tool but also as a theoretical framework for understanding neural systems. Beyond neuroscience, it offers insights into the fundamental principles of intelligence in AI. As a result, I have actively pursued learning and engagement in this field. While my current neuroscience research is not focused on modeling, and my work in AI is not centered on theoretical machine learning, I plan to shift in that direction in the future.
Furthermore, neuroscience is increasingly embracing modeling techniques that aim to align AI with brain activity and behavior. These approaches may not only provide deeper insights into cognitive processes but also contribute to the development of AI systems that more closely resemble biological intelligence. Investigating how AI models learn—particularly in relation to neural dynamics and computational principles derived from the brain—is likely to be a key focus of my future research.
In the near future, I am motivated to explore the possibility of modeling how interaural phase differences (IPD) and interaural level differences (ILD) are integrated in the auditory cortex using statistical physics and Neuro-AI techniques.
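For intuition, here is a minimal, illustrative Python sketch of how per-bin IPD and ILD can be extracted from a binaural signal with an STFT. The function names and parameters are illustrative assumptions, not an existing implementation.

import numpy as np
from scipy.signal import stft

def interaural_cues(left, right, fs=16000, n_fft=512):
    """Return per-bin IPD (radians) and ILD (dB) for a binaural pair."""
    _, _, L = stft(left, fs=fs, nperseg=n_fft)   # complex spectrogram, left ear
    _, _, R = stft(right, fs=fs, nperseg=n_fft)  # complex spectrogram, right ear
    eps = 1e-8
    ipd = np.angle(L * np.conj(R))                                # phase difference per bin
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # level difference in dB
    return ipd, ild

# Toy example: noise that reaches the right ear slightly later and quieter.
fs = 16000
x = np.random.randn(fs)
left, right = x, 0.7 * np.roll(x, 8)
ipd, ild = interaural_cues(left, right, fs=fs)
print(ipd.shape, ild.shape)   # (frequency bins, time frames)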
This bio-inspired AI architecture aims to leverage tonotopy, the topographic organization of frequency that extends from the cochlea, where different regions respond to specific frequencies (akin to a biological Fourier transform), to the auditory cortex, where frequency-selective neurons preserve this mapping. The goal is to tackle long-standing challenges in signal processing, machine hearing, and robotic audition, particularly those related to noise and reverberation.
What are the benefits of tonotopy in the auditory cortex? Imagine a neural map where each region is tuned to a specific range of frequencies—this is the essence of tonotopy. It serves as a fundamental organizational principle that enables efficient auditory processing and supports higher-order functions such as attention, object recognition, and speech perception. Ultimately, it enhances our ability to extract meaningful sounds from interference more effectively.
Now, imagine a neural network that embeds tonotopic organization as a core principle. Such an approach could significantly enhance sound processing in noisy environments and reverberant spaces, much like human hearing. Just as convolutional neural networks (CNNs) draw inspiration from the topographic organization of vision (known as retinotopy) and share structural similarities, such as localized receptive fields and spatial hierarchies, could a similar approach in auditory AI—integrating aspects of tonotopy—bring us closer to human-like sound perception?
Disclaimer: This model does not fully replicate or explain auditory mechanisms; it merely draws inspiration from them, much as airplanes take inspiration from wings without fully replicating how birds fly.
Implementation: The current approach utilizes a Transformer-based model, a powerful tool for sequential processing. However, instead of encoding raw waveforms into vectors like wav2vec, this model processes spectrograms or cochleagrams, transforming (encoding) specific time-frequency ranges into structured representation blocks—akin to the topographic organization of neurons in the auditory cortex. Each representation operates with its own attention mechanism, exhibiting distinct frequency selectivity to perform tasks. Overall, this approach mirrors the mapping of time-frequency representations from the cochlea to encoded cortical representations.
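To make the idea concrete, the sketch below is a minimal, hypothetical PyTorch version of this scheme, not the published implementation: the spectrogram is split into frequency bands, each band is linearly embedded into its own token sequence, and each band runs through its own Transformer encoder with its own attention. All class names, dimensions, and the omission of positional encoding are simplifying assumptions.

import torch
import torch.nn as nn

class TonotopicEncoder(nn.Module):
    """Hypothetical sketch: one token stream and one attention stack per frequency band."""
    def __init__(self, n_freq=256, n_bands=8, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        assert n_freq % n_bands == 0
        self.band_size = n_freq // n_bands
        self.n_bands = n_bands
        # One linear "patch" embedding per band (cochlea-like frequency range -> representation).
        self.embed = nn.ModuleList(nn.Linear(self.band_size, d_model) for _ in range(n_bands))
        # One Transformer encoder per band, so each band has its own attention.
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.bands = nn.ModuleList(
            nn.TransformerEncoder(make_layer(), num_layers=n_layers) for _ in range(n_bands))
        # Positional encoding is omitted here for brevity.

    def forward(self, spec):
        # spec: (batch, freq, time) magnitude spectrogram or cochleagram
        spec = spec.transpose(1, 2)                                # (batch, time, freq)
        outs = []
        for b in range(self.n_bands):
            band = spec[..., b * self.band_size:(b + 1) * self.band_size]
            tokens = self.embed[b](band)                           # (batch, time, d_model)
            outs.append(self.bands[b](tokens))                     # band-specific attention
        return torch.stack(outs, dim=1)                            # (batch, band, time, d_model)

model = TonotopicEncoder()
spec = torch.randn(2, 256, 100)          # dummy batch of spectrograms
print(model(spec).shape)                 # torch.Size([2, 8, 100, 128])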
Another key implementation choice is the absence of convolutional layers: localized convolutional feature extraction cannot fully capture relationships across the entire frequency map, where representations may interact in complex ways during sound processing. This is why the Vision Transformer, with its patch-embedding mechanism, is used instead.
The first implementation of this model was applied to binaural sound localization, as it provides an ideal test case for evaluating the benefits of frequency selectivity in sound localization (based on the Duplex Theory). Two conference papers presenting this work were accepted at INTERSPEECH 2023 and ICASSP 2024, demonstrating strong results compared to selected benchmark models in the field. The current project focuses on applying this model to speech enhancement, with plans to present the findings either this year or next.
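Purely as a schematic illustration of how such an encoder can be wired to a localization task (the azimuth grid, pooling, and head below are my assumptions, not the setup used in the papers), binaural localization can be cast as classification over discrete azimuths:

import torch
import torch.nn as nn

n_bands, d_model, n_azimuths = 8, 128, 72          # e.g. a 5-degree azimuth grid
head = nn.Sequential(
    nn.Linear(n_bands * d_model, 256), nn.ReLU(),
    nn.Linear(256, n_azimuths))

feats = torch.randn(4, n_bands, 100, d_model)      # (batch, band, time, d_model) from an encoder
pooled = feats.mean(dim=2).flatten(1)              # average over time, concatenate bands
logits = head(pooled)                              # (batch, n_azimuths)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, n_azimuths, (4,)))
print(logits.shape, loss.item())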
I have changed the name of this model multiple times to reflect its development progress. After many discussions, I found that Cochlea-to-Cortex (coch2cor) captures the concept well, though it may be difficult for audiences unfamiliar with auditory science. I am also considering spec2rep, which might be more intuitive.
The first name I proposed, Frequency-based Audio Vision Transformer (FAViT), directly reflected the implementation—it is a Vision Transformer adapted for audio processing using spectral patch embeddings. The next name, Attention Modulation Vision Transformer (AMViT), incorporated the idea of top-down attention modulation at the representation level, aiming to model real auditory attention mechanisms in the auditory cortex.
We are developing an Autoencoder version of Cochlea-to-Cortex designed to segregate speech from noise. We are calling it TransCoder (Transformer Autoencoder) since it eliminates convolutional layers in the decoder. The development is progressing well, and it should be presented at a conference soon.
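As a rough, hypothetical sketch of that idea (not the TransCoder code itself), a convolution-free Transformer autoencoder can encode spectrogram frames and decode a time-frequency mask that is applied to the noisy input; all names and sizes below are illustrative.

import torch
import torch.nn as nn

class TransAutoencoder(nn.Module):
    """Hypothetical sketch: encoder-decoder Transformer producing an enhancement mask."""
    def __init__(self, n_freq=256, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.in_proj = nn.Linear(n_freq, d_model)        # frame embedding (no convolution)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        self.decoder = nn.TransformerDecoder(dec, n_layers)
        self.out_proj = nn.Linear(d_model, n_freq)       # back to frequency bins

    def forward(self, noisy):                            # noisy: (batch, freq, time)
        frames = self.in_proj(noisy.transpose(1, 2))     # (batch, time, d_model)
        memory = self.encoder(frames)
        decoded = self.decoder(frames, memory)           # attend to the encoded memory
        mask = torch.sigmoid(self.out_proj(decoded))     # (batch, time, freq) in [0, 1]
        return noisy * mask.transpose(1, 2)              # masked (enhanced) spectrogram

model = TransAutoencoder()
noisy = torch.rand(2, 256, 100)
print(model(noisy).shape)                                # torch.Size([2, 256, 100])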
There are many more details I could elaborate on. The full thesis on this engineering work will be available here soon. If you have any further questions, please feel free to contact me at: waradon.phokhinanan@ens.psl.eu.
There are many aspects of Cochlea-to-Cortex that I would like to explore further. One major goal is to move beyond the Transformer model and develop an entirely new architecture while preserving the core concept. Additionally, I aim to use this model for Neuro-AI alignment, comparing its learning process with real brain function in specific contexts. I am actively looking for collaborations or a supervisor to support this research. If you are interested, please contact me!
My Research Posters