Fishing for meaningful units in connected speech

Authors

  • Peter Juel Henrichsen Center for Computational Modelling of Language (CMOL), Copenhagen Business School, DK-2000 Frederiksberg, Denmark
  • Thomas Ulrich Chistiansen Centre for Applied Hearing Research (CAHR), Technical University of Denmark, DK-2800 Lyngby, Denmark

Abstract

In many branches of spoken language analysis, including Automatic Speech Recognition (ASR), the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine learning, with an alternative approach. Instead of stipulating a basic set of target units, the determination of the set is considered to be part of the learning task. Given 18 recordings of Danish talkers performing a simple lab task, our algorithm produced a set of acoustically well-defined units sufficient for identifying all the major semantic elements (be they parts of words, single words or several words) relevant to the task. As the sound encoding used was very simple, consisting only of fundamental frequency (F0), Harmonicity-to-Noise-Ratio (HNR), and Intensity samples, the computational complexity involved was far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially be a more effective way of achieving intelligibility transfer, which is the end goal for speech transducing technologies.
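To give a feel for how lightweight such an encoding is, the following is a minimal sketch of computing the three quantities named in the abstract (F0, HNR, and intensity) for a single frame of samples. It is not the authors' pipeline: the function name `frame_features`, the pitch search range, and the use of a plain, unwindowed autocorrelation are all assumptions made for illustration. HNR is derived from the normalized autocorrelation peak r as 10·log10(r / (1 − r)), and intensity is reported in dB relative to unit amplitude rather than as a calibrated sound pressure level.

```python
import math


def frame_features(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Return (f0_hz, hnr_db, intensity_db) for one frame of audio samples.

    Hypothetical helper for illustration only. F0 and HNR are None when no
    positive autocorrelation peak is found in the search range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:                        # digital silence
        return None, None, float("-inf")

    # Mean power in dB relative to unit amplitude (not calibrated dB SPL).
    intensity_db = 10.0 * math.log10(energy / n)

    # Candidate pitch periods, expressed as lags in samples.
    lag_lo = max(1, int(sample_rate / f0_max))
    lag_hi = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_r = None, 0.0
    for lag in range(lag_lo, lag_hi + 1):
        # Normalized (biased) autocorrelation at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    if best_lag is None:
        return None, None, intensity_db

    r = min(best_r, 0.999999)                # keep the log argument finite
    hnr_db = 10.0 * math.log10(r / (1.0 - r))
    return sample_rate / best_lag, hnr_db, intensity_db
```

Each frame is reduced to three numbers, so a recording becomes a short sequence of triples rather than a sequence of spectral vectors. A production implementation would normally add windowing and peak interpolation, which this sketch leaves out for brevity.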

Published

2009-12-15

Citation

Henrichsen, P. J., & Chistiansen, T. U. (2009). Fishing for meaningful units in connected speech. Proceedings of the International Symposium on Auditory and Audiological Research, 2, 327–334. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2009-33

Section

2009/3. Speech processing and perception under adverse conditions