Active hearing, active speaking

Authors

  • Martin Cooke Speech and Hearing Research, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK
  • Yan-Chen Lu Speech and Hearing Research, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK
  • Youyi Lu Speech and Hearing Research, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK
  • Radu Horaud INRIA Rhône-Alpes, 655, Ave. de l’Europe, 38330 Montbonnot, France

Abstract

A static view of the world permeates most research in speech and hearing. In this idealised situation, sources don’t move and neither do listeners; the acoustic environment doesn’t change; and speakers speak without any effect of auditory input from their own voice or other speakers. Corpora for speech research and most behavioural tasks have grown to reflect the static viewpoint. Yet it is clear that speech and hearing take place in a world where none of the static assumptions hold, or at least not for long. The dynamic view complicates traditional signal processing approaches, and renders conventional evaluation processes unrepeatable since the observer’s dynamics influence the signals received at the ears. However, the dynamic viewpoint also provides many opportunities for active processes to exploit. Some of these, such as the use of head movements to resolve front-back confusions, are well-known, while others exist solely as hypotheses. This paper reviews known and potential benefits of active processes in both hearing and speech production, and goes on to describe two recent studies which demonstrate the value of such processes. The first shows how dynamic cues can be used to estimate distance in an acoustic environment. The second demonstrates that the changes in speech production which take place when other speakers are active result in increased glimpsing opportunities at the ear of the interlocutor.
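As an illustration of the kind of dynamic distance cue mentioned in the abstract, the short Python sketch below simulates acoustic motion parallax: a listener translating at a known speed v sees the azimuth θ of a stationary source change at a rate dθ/dt = v·sin(θ)/r, so the range r can be recovered as r = v·sin(θ)/(dθ/dt). This is only the idealised free-field geometry and is not the estimator used in the study referred to above; the function names (simulate_azimuth, range_from_parallax) and the scenario values are illustrative assumptions.

    import numpy as np

    def simulate_azimuth(source, v, t):
        """Azimuth of a fixed source (x, y) seen by a listener moving along
        the x-axis at speed v, measured relative to the direction of motion."""
        x, y = source
        return np.arctan2(y, x - v * t)

    def range_from_parallax(theta, dtheta_dt, v):
        """Invert the parallax relation dtheta/dt = v * sin(theta) / r."""
        return v * np.sin(theta) / dtheta_dt

    # Hypothetical scenario: source 3 m ahead and 2 m to the side,
    # listener walking at 1 m/s for half a second.
    source, v = (3.0, 2.0), 1.0
    t = np.linspace(0.0, 0.5, 6)
    theta = simulate_azimuth(source, v, t)   # in practice, from a binaural localiser
    dtheta_dt = np.gradient(theta, t)        # numerical derivative of the azimuth track
    r_est = range_from_parallax(theta, dtheta_dt, v)
    r_true = np.hypot(source[0] - v * t, source[1])
    print(np.round(r_est, 2))
    print(np.round(r_true, 2))

In practice the azimuth track delivered by a binaural localiser would be noisy, so a statistical tracker such as particle filtering (cf. Ward et al., 2003) would be preferred to a raw numerical derivative.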

References

Ashmead, D. H., Davis, D. L., and Northington, A. (1995). “Contribution of listeners’ approaching motion to auditory distance perception,” J. Exp. Psychol. Hum. Percept. Perform., 21, 239-256.

Berglund, E. and Sitte, J. (2005). “Sound source localisation through active audition,” Proc. IEEE Int. Conf. Intelligent Robots and Systems, 509-514.

Campbell, D. R., Palomäki, K. J. and Brown, G. (2005). “A MATLAB simulation of ‘shoebox’ room acoustics for use in research and teaching,” Computing and Information Systems J., 9, 48-51.

Chen, F. R. (1980). “Acoustic characteristics and intelligibility of clear and conversational speech at the segmental level,” Unpublished master’s thesis, Massachusetts Institute of Technology, Cambridge.

Cherry, E. C. (1953). “Some experiments on the recognition of speech with one and with two ears,” J. Acoust. Soc. Am., 25, 975-979.

Cooke, M. P. (2006). “A glimpsing model of speech perception in noise,” J. Acoust. Soc. Am. 119, 1562-1573.

Costeira, J. and Kanade, T. (1998). “A multibody factorization method for independently moving objects,” Int. J. Computer Vision, 29, 159-179.

Dreher, J. J. and O’Neill, J. (1957). “Effects of ambient noise on speaker intelligibility for words and phrases,” J. Acoust. Soc. Am., 29, 1320-1323.

Hanley, T. D. and Steer, M. D. (1949). “Effect of distracting noise upon speaking rate, duration, and intensity,” J. Speech Hear. Disord. 14, 363-368.

Harding, S., Cooke, M. P., and König, P. (2008). “Auditory gist perception: an alternative to attentional selection of auditory streams?” Lecture Notes in Artificial Intelligence 4840, 399-416.

Junqua, J. C. (1993). “The Lombard reflex and its role on human listeners and automatic speech recognizers,” J. Acoust. Soc. Am., 93, 510-524.

Kristjansson, T., Hershey, J., Olsen, P., Rennie, S. and Gopinath, R. (2006). “Super-human multi-talker speech recognition: the IBM 2006 Speech Separation Challenge system,” Proc. Interspeech, Pittsburgh, PA.

Local, J., Kelly, J., and Wells, W. (1986). “Towards a phonology of conversation: turn-taking in urban Tyneside speech,” J. Linguistics, 22, 411-437.

Lombard, E. (1911). “Le signe de l’élévation de la voix,” Annales des Maladies de l’Oreille, du Larynx, du Nez et du Pharynx, 37, 101-119.

Loomis, J. M., Hebert, C., and Cicinelli, J. G. (1990). “Active localization of virtual sounds,” J. Acoust. Soc. Am. 88, 1757-1764.

Lu, Y., and Cooke, M. P. (submitted). “Speech production modifications produced by competing talkers, babble and stationary noise,” submitted to J. Acoust. Soc. Am.

Lu, Y.-C., Cooke, M. P., and Christensen, H. (2007). “Active binaural distance estimation for dynamic sources,” Proc. Interspeech, Antwerp, Belgium.

Mackenson, P. (2004). “Auditive localization. Head movements, an additional cue in localization,” Ph.D. thesis, Technical University of Berlin.

Maybank, S. J. (1993). “Theory of Reconstruction from Image Motion,” in Springer Series in Information Sciences, vol 28. Springer-Verlag.

Moore, R. K. (2007). “Spoken language processing: piecing together the puzzle,” Speech Communication, 49, 418-435.

Okuno, H. G. and Nakadai, K. (2003). “Real-Time Sound Source Localization and Separation Based on Active Audio-Visual Integration,” in Computational Methods in Neural Modeling, Lecture Notes in Computer Science 2686. Springer.

Palmer, A. R., Hall, D. A., Sumner, C. J., Barrett, D. J. K., Jones, S., Nakamoto, K., and Moore, D. R. (2007). “Some investigations into non-passive listening,” Hearing Research 229, 148-157.

Picheny, M. A., Durlach, N. I., and Braida, L. D. (1986). “Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech,” J. Speech Lang. Hear. Res. 29, 434-446.

Shinn-Cunningham, B. G. (2000). “Learning reverberation: Considerations for spatial auditory displays,” Proc. International Conference on Auditory Display, Atlanta, GA, 126-134.

Speigle, J. M. and Loomis, J. M. (1993). “Auditory distance perception by translating observers,” Proc. IEEE Symposium on Research Frontiers in Virtual Reality, San Jose, CA, 92-99.

Summers, W. V., Pisoni, D. B., Bernacki, R. H., Pedlow, R. I., and Stokes, M. A. (1988). “Effects of noise on speech production: Acoustic and perceptual analysis,” J. Acoust. Soc. Am., 84, 917-928.

Thurlow, W. R., Mangels, J. W., and Runge, P. S. (1967). “Head movements during sound localization,” J. Acoust. Soc. Am., 42, 489-493.

Wallach, H. (1940). “The role of head movements and vestibular and visual cues on sound localization,” J. Exp. Psychol., 27, 339-368.

Ward, D. B., Lehmann, E. A., and Williamson, R. C. (2003). “Particle filtering algorithms for tracking an acoustic source in a reverberant environment,” IEEE Trans. Speech Audio Processing, 11, 826-836.

Zahorik, P., Brungart, D. S., and Bronkhorst, A. W. (2005). “Auditory Distance Perception in Humans: A Summary of Past and Present Research,” Acta Acustica united with Acustica, 91, 409-420.

Zelnik-Manor, L., Machline, M., and Irani, M. (2006). “Multi-body Factorization With Uncertainty: Revisiting Motion Consistency,” Int. J. Computer Vision, 68, 27-41.

Published

2007-12-15

How to Cite

Cooke, M., Lu, Y.-C., Lu, Y., & Horaud, R. (2007). Active hearing, active speaking. Proceedings of the International Symposium on Auditory and Audiological Research, 1, 33–46. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2007-04

Section

2007/1. Auditory signal processing and perception