Towards automatic speech recognition based on cochlear traveling wave delay trajectories
Abstract
The evolution of automatic speech recognition (ASR) suggests that employing principles with counterparts in the human auditory system can improve performance. Mel or Bark warping of the frequency axis, masking, compression, and adaptation are examples of such techniques. Hearing has already been modeled, to some degree, up to the cochlear nucleus (CN). However, few have asked whether one of the very first steps, the modeling of the basilar membrane delay trajectories, has been modeled and exploited adequately. To find the answer, we use a highly precise auditory model and try to extract the excitation-dependent shapes of the delay trajectories. We use these features, without any other spectral information, to carry out speech recognition tasks under different noise conditions on the TIMIT database. We found that the shapes of the cochlear delay trajectories carry valuable information that can be extracted even in the presence of noise. This finding may play an important role in next-generation cochlear implants.