Speech processing using adaptive auditory receptive fields
Keywords:
receptive fields, auditory cortex, speech in noise
Abstract
The auditory system exhibits a remarkable ability to adapt to its listening environment, driven both by sensory cues and by goal-directed processes. Here, we focus on the role of attentional feedback in facilitating the processing of speech sounds in the presence of nonstationary noise. We examine a theoretical formulation for retuning cortical-like receptive fields to enable robust detection of speech in the presence of interference. The framework employs a bank of modulation-tuned filters that emulate the tuning characteristics of neurons in the auditory cortex. These filters are then retuned based on goal-directed feedback to enhance the separability between the feature representations of speech and nonspeech sounds. We hypothesize that this retuning emphasizes the modulations unique to speech and to nonspeech sounds in a high-dimensional feature space. We discuss the implications of this retuning for the fidelity of speech encoding under both seen and novel noise conditions, as well as the role such plasticity may play in facilitating listening in challenging acoustic environments, opening the door to adaptive, intelligent audio technology that emulates the biological system.
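The retuning idea described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the source. Each filter in the bank is modeled as a quadrature pair of 2-D Gabor kernels over a spectrogram patch, parameterized by a temporal modulation rate (Hz) and a spectral modulation scale (cycles/octave); separability is measured with a Fisher discriminant ratio on the filter's modulation energy; and the goal-directed feedback is replaced by a simple greedy hill-climb over (rate, scale). The grid constants, the helper names (`gabor_pair`, `modulation_energy`, `fisher_ratio`, `retune`, `toy_patch`) and the toy data are all hypothetical. This is not the authors' model or optimizer.

```python
import math
import random

# Assumed (hypothetical) patch geometry: T frames at FRAME_RATE frames/s by
# F log-frequency channels at CHANNELS_PER_OCT channels per octave.
FRAME_RATE = 100.0
CHANNELS_PER_OCT = 8.0
T, F = 50, 16


def gabor_pair(rate_hz, scale_cpo, sigma_t=0.1, sigma_f=0.5):
    """Quadrature pair of 2-D Gabor kernels tuned to a temporal modulation
    rate (Hz) and a spectral modulation scale (cycles/octave)."""
    cos_k = [[0.0] * F for _ in range(T)]
    sin_k = [[0.0] * F for _ in range(T)]
    for i in range(T):
        t = (i - T / 2) / FRAME_RATE            # seconds, centred on the patch
        for j in range(F):
            f = (j - F / 2) / CHANNELS_PER_OCT  # octaves, centred on the patch
            env = math.exp(-t * t / (2 * sigma_t ** 2) - f * f / (2 * sigma_f ** 2))
            phase = 2 * math.pi * (rate_hz * t + scale_cpo * f)
            cos_k[i][j] = env * math.cos(phase)
            sin_k[i][j] = env * math.sin(phase)
    return cos_k, sin_k


def modulation_energy(patch, kernels):
    """Phase-invariant response: squared magnitude of the quadrature projection."""
    cos_k, sin_k = kernels
    c = sum(patch[i][j] * cos_k[i][j] for i in range(T) for j in range(F))
    s = sum(patch[i][j] * sin_k[i][j] for i in range(T) for j in range(F))
    return c * c + s * s


def fisher_ratio(a, b):
    """Between-class over within-class spread of a scalar feature."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((x - mb) ** 2 for x in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)


def score(rate, scale, speech, noise):
    """Separability of speech vs nonspeech patches for one filter setting."""
    k = gabor_pair(rate, scale)
    return fisher_ratio([modulation_energy(p, k) for p in speech],
                        [modulation_energy(p, k) for p in noise])


def retune(rate, scale, speech, noise, d_rate=1.0, d_scale=0.25, max_iter=6):
    """Greedy hill-climb on (rate, scale): a neighbouring setting is accepted
    only if it increases separability (a stand-in for goal-directed feedback)."""
    best = score(rate, scale, speech, noise)
    for _ in range(max_iter):
        moves = [(rate + dr, scale + ds)
                 for dr in (-d_rate, 0.0, d_rate)
                 for ds in (-d_scale, 0.0, d_scale)
                 if (dr, ds) != (0.0, 0.0)]
        cand = [(score(r, s, speech, noise), r, s)
                for r, s in moves if r > 0 and s >= 0]
        top = max(cand)
        if top[0] <= best:
            break  # no neighbour improves separability: stop
        best, rate, scale = top
    return rate, scale, best


def toy_patch(rng, rate_hz):
    """Toy 'spectrogram' patch: broadband envelope modulated at rate_hz with a
    random phase and amplitude, plus jitter. Not real speech or noise data."""
    phi = rng.uniform(0, 2 * math.pi)
    amp = rng.uniform(0.5, 1.5)
    return [[1.0 + amp * math.cos(2 * math.pi * rate_hz * i / FRAME_RATE + phi)
             + rng.gauss(0, 0.2) for _ in range(F)] for i in range(T)]


rng = random.Random(0)
speech = [toy_patch(rng, 4.0) for _ in range(10)]   # slow, syllable-like modulation
noise = [toy_patch(rng, 12.0) for _ in range(10)]   # faster modulation
r0, s0 = 8.0, 0.5
before = score(r0, s0, speech, noise)
r1, s1, after = retune(r0, s0, speech, noise)
print(f"before: rate={r0} Hz, scale={s0} c/o, J={before:.3f}")
print(f"after:  rate={r1} Hz, scale={s1} c/o, J={after:.3f}")
```

Because a move is accepted only when the Fisher ratio increases, the retuned filter is never less discriminative than its starting point on the data it was tuned on. Whether such gains carry over to novel noise conditions is the question the abstract raises, and this toy example does not answer it.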
License
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright* and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
*From the 2017 issue onward. The Danavox Jubilee Foundation owns the copyright of all articles published in the 1969-2015 issues. However, authors are still allowed to share the work with an acknowledgement of the work's authorship and initial publication in this journal.