Frequency importance functions for audiovisual speech and complex noise backgrounds
Abstract
Two studies investigated how the relative importance of different frequency regions for speech intelligibility depends on the listening condition. For consonant recognition, low-frequency speech information becomes more important under audiovisual (AV) than under audio-alone (AA) conditions. The first study investigated whether this effect holds for broadband sentence materials, using a correlation method that was designed to estimate frequency weighting functions for spectral profile analysis but is applied here to speech. Preliminary results indicate a shift in the frequency-band importance function (FBIF) toward lower frequencies for AV sentences, consistent with the idea that the visual (V) signal provides place-of-articulation information complementary to the voicing and manner cues provided by the low-frequency auditory (A) channels. FBIFs for AA and AV speech may also change in multitalker noise, where segregating the target from the masker is required for speech understanding. The second study tested the hypothesis that low frequencies should also be more important than high frequencies for avoiding informational masking (IM), because strong pitch cues are available there for segregation. Preliminary results support this hypothesis, showing a small but significant increase in IM with increasing frequency for bandpass-filtered speech. Overall, these results show that the frequency dependence of speech intelligibility depends on the type of background noise and on whether V information is available. Systematically characterizing these effects may guide dynamic hearing-aid systems that shift the amplification spectrum for different listening situations.
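As an illustration of the kind of correlation method mentioned in the abstract, the following minimal Python sketch estimates relative band weights from trial-by-trial data. It assumes that the signal-to-noise ratio in each frequency band is perturbed independently on every trial, and that the weight of a band is taken as the correlation between that band's perturbation and the correct/incorrect response, normalized so the weights sum to 1. The function name band_weights, the simulated listener, the number of bands, and all numerical values are hypothetical and are not taken from the studies described here.

```python
import numpy as np


def band_weights(band_snr_db, correct):
    """Estimate relative band weights with a correlational approach.

    band_snr_db : (n_trials, n_bands) per-band SNRs in dB, assumed to be
                  perturbed independently on each trial.
    correct     : (n_trials,) responses scored 1 (correct) or 0 (incorrect).

    Returns the per-band point-biserial correlations, normalized to sum to 1
    (an assumed normalization chosen for this sketch).
    """
    x = np.asarray(band_snr_db, dtype=float)
    y = np.asarray(correct, dtype=float)
    xc = x - x.mean(axis=0)          # center each band's SNR
    yc = y - y.mean()                # center the binary scores
    # Pearson correlation with a binary variable is the point-biserial r.
    r = (xc * yc[:, None]).sum(axis=0) / np.sqrt(
        (xc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    return r / r.sum()


# Hypothetical simulated listener, used only to exercise the function.
rng = np.random.default_rng(0)
n_trials = 20000
true_weights = np.array([0.4, 0.3, 0.2, 0.1])   # assumed low-frequency emphasis

# Independent per-band SNR perturbations in dB (illustrative range).
snr = rng.uniform(-10.0, 10.0, size=(n_trials, true_weights.size))

# Linear decision variable plus internal noise; respond correctly if positive.
decision = snr @ true_weights + rng.normal(0.0, 3.0, size=n_trials)
correct = (decision > 0).astype(int)

weights = band_weights(snr, correct)
print(np.round(weights, 3))
```

In this simulation the estimated weights should fall in the same order as the assumed listener weights, with the largest weight on the first band. With real data, one such function would be estimated per listening condition (for example AA and AV) and the resulting weight patterns compared.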