The next generation of audio intelligence: A survey-based perspective on improving audio analysis


  • Björn Schuller, GLAM – Group on Language, Audio & Music, Imperial College London, SW7 2AZ London, UK; Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany; audEERING GmbH, 82205 Gilching, Germany
  • Shahin Amiriparian, Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany
  • Gil Keren, Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany
  • Alice Baird, Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany
  • Maximilian Schmitt, Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany
  • Nicholas Cummins, Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany


Keywords: Computer Audition, Audio Intelligence, Survey, Auditory Scene Analysis, Source Separation, Audio Ontologies, Audio Diarisation, Audio Understanding


Computer audition has made major progress over the past decades; however, it is still far from achieving human-level hearing abilities. Imagine, for example, the sounds associated with putting a water glass onto a table. As humans, we would be able to roughly "hear" the material of the glass and the table, and perhaps even how full the glass is. Current machine listening approaches, on the other hand, would mainly recognise the event "glass put onto a table". In this context, this contribution aims to provide key insight into the remarkable advances already made in computer audition. It also identifies the deficits that remain in reaching human-like hearing abilities, such as those illustrated by this example. We summarise the state of the art in traditional signal-processing-based audio pre-processing and feature representation, as well as in automated learning, for instance by deep neural networks. This concerns, in particular, audio diarisation, source separation, and understanding, but also ontologisation. Based on this, we conclude with avenues towards the ambitious goal of "holistic human-parity" machine listening abilities: the next generation of audio intelligence.


Schuller, B., Amiriparian, S., Keren, G., Baird, A., Schmitt, M., & Cummins, N. (2020). The next generation of audio intelligence: A survey-based perspective on improving audio analysis. Proceedings of the International Symposium on Auditory and Audiological Research, 7, 101–112.



