Central auditory processing in the cocktail-party effect

Authors

  • Adelbert Bronkhorst, TNO Human Factors, POB 23, 3769 ZG Soesterberg, The Netherlands; Cognitive Psychology Department, Vrije Universiteit Amsterdam, Van der Boechorststraat 1, 1081 BT Amsterdam, The Netherlands

Abstract

When we try to understand one talker in a group of talkers, the capacities of our auditory system are stretched to the limit. Using the superposition of incoming sounds as input, it has to identify the target speech, trace it over time, fill in parts masked by other sounds, and finally convert it to a stream of meaningful information. Research into this “cocktail party” effect has proceeded along different lines that for a long time showed little or no overlap. Well known to most psychoacousticians are studies of peripheral effects such as (energetic) masking and binaural unmasking. In this presentation an overview is given of three other research lines that have addressed central processing of complex speech stimuli, and relationships between these lines are discussed. The oldest line looked at the role of attention in the selection of the target speech from all signals entering the ears. A more recent line has focused on the process of separating and piecing together acoustic information across time and space, which is referred to as grouping. In the third line, masking is studied, but effects of peripheral (un)masking are factored out so that only the excess masking, referred to as informational masking, remains.

Published

2009-12-15

How to Cite

Bronkhorst, A. (2009). Central auditory processing in the cocktail-party effect. Proceedings of the International Symposium on Auditory and Audiological Research, 2, 275–288. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2009-28

Section

2009/3. Speech processing and perception under adverse conditions