Towards automatic speech recognition based on cochlear traveling wave delay trajectories

Authors

  • Tamás Harczos Fraunhofer Institute for Digital Media Technology (Fraunhofer IDMT), 98693 Ilmenau, Germany; Faculty of Information Technology, Péter Pázmány Catholic University, Budapest, Hungary
  • Gero Szepannek Department of Statistics, University Dortmund, 44227 Dortmund, Germany
  • Frank Klefenz Fraunhofer Institute for Digital Media Technology (Fraunhofer IDMT), 98693 Ilmenau, Germany

Abstract

The evolution of automatic speech recognition (ASR) suggests that employing principles with counterparts in the human auditory system can lead to better performance. Mel- or Bark-warping of the frequency axis, masking, compression, and adaptation are some of these techniques. Hearing has already been modeled up to the cochlear nucleus (CN) to some degree. However, few have asked whether one of the very first steps, namely the basilar membrane delay trajectories, has been modeled and exploited adequately. To find the answer, we use an extraordinarily precise auditory model and try to extract the excitation-dependent shapes of the delay trajectories. We use these features, without any other spectral information, to carry out speech recognition tasks under different noise conditions on the TIMIT database. We found that the shapes of the cochlear delay trajectories carry valuable information, which can be extracted even in the presence of noise. This finding may play an important role in next-generation cochlear implants.

Published

2007-12-15

How to Cite

Harczos, T., Szepannek, G., & Klefenz, F. (2007). Towards automatic speech recognition based on cochlear traveling wave delay trajectories. Proceedings of the International Symposium on Auditory and Audiological Research, 1, 83–92. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2007-08

Issue

Section

2007/1. Auditory signal processing and perception