Speech processing using adaptive auditory receptive fields

Ashwin Bellur; Mounya Elhilali

Authors

Ashwin Bellur Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
Mounya Elhilali Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA http://orcid.org/0000-0003-2597-738X

Keywords:

receptive fields, auditory cortex, speech in noise

Abstract

The auditory system exhibits a remarkable ability to adapt to its listening environment, driven both by sensory-based cues and by goal-directed processes. Here, we focus on the role of attentional feedback in facilitating the processing of speech sounds in the presence of nonstationary noise. We examine a theoretical formulation for retuning cortical-like receptive fields to enable robust detection of speech sounds in the presence of interference. The framework employs modulation-tuned filters that emulate the tuning characteristics of neurons in auditory cortex. This filter bank is then adapted through goal-directed feedback to enhance the separability between the feature representations of speech and nonspeech sounds. We hypothesize that this retuning emphasizes the modulations unique to speech and to nonspeech in a high-dimensional space. We discuss the implications of this retuning for the fidelity of speech encoding in seen and novel noise conditions, and the role such plasticity may play in facilitating listening in challenging acoustic environments, opening the door to adaptive, intelligent audio technology that emulates the biological system.
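The abstract describes two components: a bank of modulation-tuned filters and a feedback step that retunes those filters so that speech and nonspeech features become more separable. The sketch below illustrates both ideas under simplifying assumptions and is not the authors' method. Each filter here is a Gaussian bump in the 2-D (scale in cycles/octave, rate in Hz) modulation domain, which is a simpler, non-direction-selective stand-in for a cortical receptive-field model. The "feedback" is replaced by a greedy search that shifts one filter's tuning to raise a Fisher-style ratio between the two classes. The inputs are synthetic drifting ripples in noise, not real speech. All names (`modulation_energy`, `fisher_ratio`, `separability`, `retune`, `toy_spectrogram`) and constants (`FRAME_RATE`, `CHAN_PER_OCT`) are hypothetical.

```python
import numpy as np

FRAME_RATE = 100.0   # spectrogram frames per second (assumed)
CHAN_PER_OCT = 24.0  # frequency channels per octave (assumed)


def modulation_energy(spec, center, width=(0.5, 2.0)):
    """Output energy of one Gaussian spectrotemporal modulation filter.

    spec   : (n_freq, n_time) log-spectrogram
    center : (scale in cyc/oct, rate in Hz) the filter is tuned to
    width  : Gaussian bandwidths along the scale and rate axes
    By Parseval's theorem, summing |S*H|^2 / N equals the energy of the
    spectrogram circularly convolved with the matching real, zero-phase kernel.
    """
    n_f, n_t = spec.shape
    S = np.fft.fft2(spec - spec.mean())
    scales = np.fft.fftfreq(n_f, d=1.0 / CHAN_PER_OCT)  # cyc/oct
    rates = np.fft.fftfreq(n_t, d=1.0 / FRAME_RATE)     # Hz
    W, R = np.meshgrid(scales, rates, indexing="ij")    # shape (n_f, n_t)
    H = np.exp(-0.5 * (((np.abs(W) - center[0]) / width[0]) ** 2
                       + ((np.abs(R) - center[1]) / width[1]) ** 2))
    return float(np.sum(np.abs(S * H) ** 2) / (n_f * n_t))


def fisher_ratio(a, b, eps=1e-12):
    """Scalar separability: squared mean difference over summed variances."""
    return float((np.mean(a) - np.mean(b)) ** 2 / (np.var(a) + np.var(b) + eps))


def separability(center, speech, nonspeech):
    """Fisher ratio of log filter energies for the two classes."""
    fs = [np.log(modulation_energy(x, center) + 1e-12) for x in speech]
    fn = [np.log(modulation_energy(x, center) + 1e-12) for x in nonspeech]
    return fisher_ratio(fs, fn)


def retune(center, speech, nonspeech, steps=(0.25, 1.0), n_iter=20):
    """Greedy coordinate search that shifts a filter's (scale, rate) tuning to
    raise speech/nonspeech separability (a stand-in for goal-directed feedback)."""
    best = np.asarray(center, dtype=float)
    best_score = separability(best, speech, nonspeech)
    for _ in range(n_iter):
        improved = False
        for axis in (0, 1):
            for sign in (-1.0, 1.0):
                cand = best.copy()
                cand[axis] = max(0.0, cand[axis] + sign * steps[axis])
                score = separability(cand, speech, nonspeech)
                if score > best_score:  # accept only improving moves
                    best, best_score, improved = cand, score, True
        if not improved:
            break
    return best, best_score


def toy_spectrogram(rate_hz, scale_cpo, rng, n_f=64, n_t=200):
    """Hypothetical stimulus: a drifting ripple in noise (not real speech)."""
    f = np.arange(n_f)[:, None] / CHAN_PER_OCT  # octaves
    t = np.arange(n_t)[None, :] / FRAME_RATE    # seconds
    phase = rng.uniform(0.0, 2.0 * np.pi)
    ripple = np.cos(2.0 * np.pi * (rate_hz * t + scale_cpo * f) + phase)
    return ripple + 0.5 * rng.standard_normal((n_f, n_t))


rng = np.random.default_rng(0)
speech = [toy_spectrogram(4.0, 1.0, rng) for _ in range(8)]      # slow "speech-like" ripples
nonspeech = [toy_spectrogram(16.0, 1.0, rng) for _ in range(8)]  # fast "noise-like" ripples
tuned, tuned_score = retune((1.0, 10.0), speech, nonspeech)
print("retuned (scale, rate):", tuned, "Fisher ratio:", round(tuned_score, 2))
```

Because the search accepts a move only when the ratio increases, the retuned filter never scores below its starting point on the training samples. Whether such a gain carries over to noises that were not used during retuning is the seen-versus-novel question raised in the abstract.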