Dynamic and task-dependent encoding of speech and voice in the auditory cortex


  • Milene Bonte Maastricht Brain Imaging Center; Dept. of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
  • Elia Formisano Maastricht Brain Imaging Center; Dept. of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands


Speech is at the core of verbal communication and social interaction. It conveys linguistic content and speaker-specific vocal information that listeners exploit for identification. Cortical processing of speech relies on the formation of abstract representations that are invariant to highly variable acoustic input signals and critically depends on behavioral demands. In a series of EEG and fMRI studies we have recently investigated temporal as well as spatial neural coding mechanisms for forming such abstract representations. We focused on categorical and task-dependent neuronal responses to natural speech sounds (vowels /a/, /i/, /u/) spoken by different speakers. Brain activity was measured during passive listening (fMRI, EEG) and during performance of behavioral tasks on vowel or speaker identity (EEG). Our EEG results show that dynamic changes of sound-evoked responses and phase patterns of cortical oscillations in the alpha band (8-12 Hz) closely reflect the abstraction and analysis of the sounds along the task-relevant dimension. Our fMRI results show that spatially distributed activation patterns in early and higher level auditory cortex encode vowel-invariant representations of speaker identity and speaker-invariant representations of vowel identity. Both the transient and task-dependent realignment of neuronal responses (EEG) and the spatially distributed cortical fingerprints (fMRI) provide robust cortical coding mechanisms for forming abstract representations of auditory (speech) signals.
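To make the notion of phase realignment in the alpha band more concrete, the following is a minimal illustrative sketch, not the analysis used in the studies summarized above. It computes a generic inter-trial phase coherence value at a single frequency: each trial's phase at 10 Hz is taken from a single-frequency Fourier projection, and the coherence is the length of the mean unit phase vector across trials. The synthetic trials, the 200 Hz sampling rate, the noise level, and the function names (`phase_at`, `inter_trial_coherence`, `make_trial`) are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def phase_at(trial, freq_hz, fs_hz):
    """Phase (radians) of one trial at freq_hz, from a single-frequency Fourier projection."""
    proj = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
               for n, x in enumerate(trial))
    return cmath.phase(proj)


def inter_trial_coherence(trials, freq_hz, fs_hz):
    """Length of the mean unit phase vector across trials.

    Values near 0 indicate phases scattered across trials;
    values near 1 indicate phases aligned across trials.
    """
    vectors = [cmath.exp(1j * phase_at(t, freq_hz, fs_hz)) for t in trials]
    return abs(sum(vectors) / len(vectors))


def make_trial(phase, rng, freq_hz=10.0, fs_hz=200.0, n=200, noise=0.5):
    """Hypothetical synthetic trial: a noisy cosine with a given starting phase."""
    return [math.cos(2 * math.pi * freq_hz * k / fs_hz + phase) + rng.gauss(0, noise)
            for k in range(n)]


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Trials whose 10 Hz phase is the same on every trial (phase-aligned).
aligned = [make_trial(0.3, rng) for _ in range(40)]
# Trials whose 10 Hz phase is drawn at random on every trial.
jittered = [make_trial(rng.uniform(-math.pi, math.pi), rng) for _ in range(40)]

itc_aligned = inter_trial_coherence(aligned, 10.0, 200.0)
itc_jittered = inter_trial_coherence(jittered, 10.0, 200.0)
```

In this toy setting, the aligned trials should yield a coherence close to 1 and the jittered trials a much lower value. A task-dependent analysis would compare such phase measures between conditions, for example between trials grouped by vowel and trials grouped by speaker.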



How to Cite

Bonte, M., & Formisano, E. (2011). Dynamic and task-dependent encoding of speech and voice in the auditory cortex. Proceedings of the International Symposium on Auditory and Audiological Research, 3, 263–274. Retrieved from http://proceedings.isaar.eu/index.php/isaarproc/article/view/2011-31



2011/2. Neural representation of complex sounds and speech in the auditory brain