The advantage of spatial and vocal characteristics in the recognition of competing speech
Abstract
In multi-speaker environments, listeners exploit a variety of cues that distinguish the target speaker from distracting speakers to improve speech recognition. Spatial cues such as interaural time and intensity differences provide binaural unmasking and a better-ear advantage. Vocal characteristics such as pitch and resonance scale help to disambiguate concurrent speech. Temporal misalignment of competing speech signals can also improve recognition by allowing ‘listening in the dips’. In this paper, we review a series of experiments on the advantage of spatial and vocal characteristics in the recognition of concurrent speech. Syllable pairs were synthesized to simulate different speakers, and the recognition of syllables that varied in spatial and vocal characteristics was measured. The effect of temporal glimpsing was measured by aligning the temporal envelopes of the competing signals in a controlled way. The results show that spatial and vocal cues compete to provide selectivity among concurrent speech sounds. When the competing sounds are clearly separated in space, vocal characteristics improve performance only marginally. However, when the sounds are temporally and spatially aligned, a substantial advantage can be derived from the vocal characteristics. The paper discusses the interaction of spatial and vocal cues, and the patterns of syllable confusions that listeners make.