The advantage of spatial and vocal characteristics in the recognition of competing speech

Authors

  • Martin D. Vestergaard, Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom
  • D. Timothy Ives, Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom
  • Roy D. Patterson, Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom

Abstract

In multi-speaker environments, listeners improve speech recognition by exploiting a variety of cues that distinguish the target speaker from distracter speakers. Spatial cues, such as interaural time and intensity differences, provide binaural unmasking and a better-ear advantage. Vocal characteristics, such as pitch and resonance scale, help to disambiguate concurrent speech. Temporal misalignment of competing speech signals can also improve recognition by allowing ‘listening in the dips’. In this paper, we review a series of experiments on the advantage of spatial and vocal characteristics in the recognition of concurrent speech. Syllable pairs were synthesized to simulate different speakers, and recognition was measured for syllables that varied in spatial and vocal characteristics. The effect of temporal glimpsing was measured by aligning the temporal envelopes of the competing signals in a controlled way. The results show that spatial and vocal cues compete in providing selectivity for concurrent speech sounds. When the competing syllables are clearly separated in space, vocal characteristics improve performance only marginally. However, when the syllables are temporally and spatially aligned, a substantial advantage can be derived from the vocal characteristics. The paper discusses the interaction of spatial and vocal cues, and the patterns of syllable confusions that listeners make.

Published

2009-12-15

How to Cite

Vestergaard, M. D., Ives, D. T., & Patterson, R. D. (2009). The advantage of spatial and vocal characteristics in the recognition of competing speech. Proceedings of the International Symposium on Auditory and Audiological Research, 2, 535–544. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2009-55