Dynamic and task-dependent encoding of speech and voice in the auditory cortex
Abstract
Speech is at the core of verbal communication and social interaction. It conveys linguistic content and speaker-specific vocal information that listeners exploit for identification. Cortical processing of speech relies on the formation of abstract representations that are invariant to highly variable acoustic input signals and critically depends on behavioral demands. In a series of EEG and fMRI studies we have recently investigated temporal as well as spatial neural coding mechanisms for forming such abstract representations. We focused on categorical and task-dependent neuronal responses to natural speech sounds (vowels /a/, /i/, /u/) spoken by different speakers. Brain activity was measured during passive listening (fMRI, EEG) and during performance of behavioral tasks on vowel or speaker identity (EEG). Our EEG results show that dynamic changes of sound-evoked responses and phase patterns of cortical oscillations in the alpha band (8-12 Hz) closely reflect the abstraction and analysis of the sounds along the task-relevant dimension. Our fMRI results show that spatially distributed activation patterns in early and higher level auditory cortex encode vowel-invariant representations of speaker identity and speaker-invariant representations of vowel identity. Both the transient and task-dependent realignment of neuronal responses (EEG) and the spatially distributed cortical fingerprints (fMRI) provide robust cortical coding mechanisms for forming abstract representations of auditory (speech) signals.
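The notion of phase realignment in the alpha band can be illustrated with a small numerical sketch. The example below is not the analysis pipeline of the studies summarized above; it assumes synthetic single-channel trials, a hypothetical sampling rate of 250 Hz, and a 10 Hz test frequency chosen from within the 8-12 Hz band. It uses inter-trial coherence, one common measure of how consistently the phase of an oscillation aligns across trials, and the function names are illustrative only.

```python
import numpy as np

def alpha_phase(trials, fs, freq=10.0):
    """Phase (radians) of each trial at `freq` Hz, from one Fourier coefficient.

    trials: array of shape (n_trials, n_samples); fs: sampling rate in Hz.
    """
    n_samples = trials.shape[1]
    t = np.arange(n_samples) / fs
    # Project every trial onto a complex exponential at the target frequency.
    coeff = trials @ np.exp(-2j * np.pi * freq * t)
    return np.angle(coeff)

def inter_trial_coherence(phases):
    """Length of the mean unit phase vector: 0 = random phases, 1 = identical."""
    return np.abs(np.mean(np.exp(1j * phases)))

# Synthetic data (assumption): 100 trials of 1 s each, sampled at 250 Hz.
rng = np.random.default_rng(0)
fs, n_samples, n_trials = 250.0, 250, 100
t = np.arange(n_samples) / fs

# Condition A: the 10 Hz component has the same phase on every trial.
aligned = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal((n_trials, n_samples))

# Condition B: the 10 Hz component has a random phase on every trial.
offsets = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
jittered = np.sin(2 * np.pi * 10 * t + offsets) + 0.5 * rng.standard_normal((n_trials, n_samples))

itc_aligned = inter_trial_coherence(alpha_phase(aligned, fs))
itc_jittered = inter_trial_coherence(alpha_phase(jittered, fs))
print(f"ITC aligned:  {itc_aligned:.2f}")
print(f"ITC jittered: {itc_jittered:.2f}")
```

In this toy setting the phase-aligned condition yields a coherence close to 1, while the random-phase condition yields a value close to 0. A task-dependent realignment of alpha phase would appear as a change in such a measure between task conditions, although real EEG analyses typically involve band-pass filtering, time-resolved estimates, and statistical testing that are omitted here.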
License
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright* and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
*From the 2017 issue onward. The Danavox Jubilee Foundation owns the copyright of all articles published in the 1969-2015 issues. However, authors are still allowed to share the work with an acknowledgement of the work's authorship and initial publication in this journal.