Frequency importance functions for audiovisual speech and complex noise backgrounds

Authors

  • Joshua G. W. Bernstein Army Audiology & Speech Center, Walter Reed Army Medical Center, Washington, DC, USA
  • Ken W. Grant Army Audiology & Speech Center, Walter Reed Army Medical Center, Washington, DC, USA

Abstract

Two studies investigated how the relative importance of different regions of the frequency spectrum for speech intelligibility depends on the listening condition. For consonant recognition, low-frequency speech information becomes more important under audiovisual (AV) than under audio-alone (AA) conditions. The first study investigated whether this effect holds for broadband sentence materials, using a correlation method originally designed to estimate frequency weighting functions for spectral profile analysis but applied here to speech. Preliminary results indicate a shift in the frequency-band importance function (FBIF) toward lower frequencies for AV sentences. This is consistent with the idea that the visual (V) signal provides place-of-articulation information complementary to the voicing and manner cues provided by the low-frequency auditory (A) channels. FBIFs for AA and AV speech may also change in multitalker noise, where target-masker segregation is a prerequisite for speech understanding. A second study tested the hypothesis that low frequencies should also be more important than high frequencies for avoiding informational masking (IM), because of the availability of strong pitch cues for segregation. Preliminary results support this hypothesis, showing a small but significant increase in IM with increasing frequency for bandpass-filtered speech. Overall, these results show that the frequency dependence of speech intelligibility depends on the type of background noise and on whether V information is available. Systematically characterizing these effects may guide dynamic hearing-aid systems that shift the amplification spectrum for different listening situations.
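The correlation method mentioned above can be illustrated with a minimal sketch. It does not reproduce the authors' procedure or data; the number of bands, the per-band SNR jitter, the linear observer and the "true" weights are all hypothetical. On each simulated trial every band's SNR is perturbed at random. Each band's weight is then estimated as the point-biserial correlation between that band's per-trial SNR and trial correctness (0/1), and the correlations are normalized to sum to 1 to give relative weights.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a binary y this is the point-biserial correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relative_weights(band_snrs, correct):
    """Correlate each band's per-trial SNR with correctness, then normalize to sum to 1."""
    n_bands = len(band_snrs[0])
    r = [pearson([trial[b] for trial in band_snrs], correct) for b in range(n_bands)]
    total = sum(r)
    return [ri / total for ri in r]


def simulate_listener(true_weights, n_trials=5000, snr_sd=3.0, internal_noise_sd=1.0, seed=1):
    """Hypothetical observer: a trial is 'correct' when the weighted sum of the
    per-band SNR perturbations (dB) plus Gaussian internal noise exceeds 0."""
    rng = random.Random(seed)
    band_snrs, correct = [], []
    for _ in range(n_trials):
        snrs = [rng.gauss(0.0, snr_sd) for _ in true_weights]
        decision = sum(w * s for w, s in zip(true_weights, snrs))
        decision += rng.gauss(0.0, internal_noise_sd)
        band_snrs.append(snrs)
        correct.append(1 if decision > 0 else 0)
    return band_snrs, correct


if __name__ == "__main__":
    # Hypothetical weights for four bands, lowest-frequency band weighted most.
    true_w = [0.4, 0.3, 0.2, 0.1]
    snrs, correct = simulate_listener(true_w)
    est = relative_weights(snrs, correct)
    print([round(w, 2) for w in est])
```

With independent, equal-variance perturbations and a linear observer, the correlations are proportional to the underlying weights. The normalized estimates should therefore approximate the assumed weights, with sampling error that shrinks as the number of trials grows.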

Published

2007-12-15

How to Cite

Bernstein, J. G. W., & Grant, K. W. (2007). Frequency importance functions for audiovisual speech and complex noise backgrounds. Proceedings of the International Symposium on Auditory and Audiological Research, 1, 365–374. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2007-34

Section

2007/4. Speech perception and processing