“Psychophysical” modulation transfer functions in a deep neural network trained for natural sound recognition

  • Takuya Koumura NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan http://orcid.org/0000-0002-8380-9598
  • Hiroki Terashima NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan
  • Shigeto Furukawa NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan

Abstract

Representation of amplitude modulation (AM) has been characterized by neurophysiological and psychophysical modulation transfer functions (MTFs). Our recent computational study demonstrated that a deep neural network (DNN) trained for natural sound recognition serves as a good model for explaining the functional significance of physiologically derived neuronal MTFs. The present study asks whether the DNN can also provide insights into AM-related human behaviours such as AM detectability. Specifically, we measured “psychophysical” MTFs in our previously developed DNN model. We presented sinusoidally amplitude-modulated white noise at various AM rates to the DNN and quantified AM detectability as d′ derived from the model’s internal representations of modulated and non-modulated stimuli. The overall d′ increased along the layer cascade, with human-level detectability observed in the higher layers. Within a given layer, d′ tended to decrease with increasing AM rate and with decreasing AM depth, which is reminiscent of a psychophysical MTF. The results suggest that a DNN trained for natural sound recognition can serve as a model for understanding psychophysical AM detectability. Since our approach is not specific to AM, the present paradigm opens the possibility of exploring a broad range of auditory functions that can be evaluated by psychophysical experiments.
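The stimulus and the detectability index described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sampling rate, duration, function names, and the univariate form of d′ (difference of means divided by the root of the averaged variances) are all assumptions made for illustration, and `toy_response` is a hypothetical stand-in for a model unit's response, not the DNN itself.

```python
import numpy as np


def sam_noise(am_rate_hz, depth, duration_s=1.0, fs=16000, rng=None):
    """Sinusoidally amplitude-modulated Gaussian white noise.

    depth = 0 gives non-modulated noise; depth = 1 gives full modulation.
    fs and duration_s are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(duration_s * fs)) / fs
    carrier = rng.standard_normal(t.size)
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * am_rate_hz * t)
    return envelope * carrier


def toy_response(x, am_rate_hz, fs=16000):
    """Hypothetical scalar 'unit response': envelope energy at the AM rate.

    This only stands in for an internal representation of a real model.
    """
    t = np.arange(x.size) / fs
    power = x ** 2
    return np.abs(np.mean(power * np.exp(-2j * np.pi * am_rate_hz * t)))


def d_prime(modulated, unmodulated):
    """Univariate d' between two sets of scalar responses (assumed form)."""
    modulated = np.asarray(modulated, dtype=float)
    unmodulated = np.asarray(unmodulated, dtype=float)
    pooled_sd = np.sqrt(0.5 * (modulated.var(ddof=1) + unmodulated.var(ddof=1)))
    return (modulated.mean() - unmodulated.mean()) / pooled_sd


def detectability(am_rate_hz, depth, n_trials=50, seed=0):
    """d' between responses to modulated and non-modulated noise."""
    rng = np.random.default_rng(seed)
    mod = [toy_response(sam_noise(am_rate_hz, depth, rng=rng), am_rate_hz)
           for _ in range(n_trials)]
    unmod = [toy_response(sam_noise(am_rate_hz, 0.0, rng=rng), am_rate_hz)
             for _ in range(n_trials)]
    return d_prime(mod, unmod)
```

Calling `detectability` over a grid of AM rates and depths would trace out an MTF-like surface for this toy readout. In a model-based study the same comparison would be made on the model's internal representations, possibly with a multivariate d′ rather than the scalar form assumed here.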

References

Averbeck, B.B., Lee, D. (2006). “Effects of Noise Correlations on Information Encoding and Decoding,” J. Neurophysiol., 95, 3633–3644. doi:10.1152/jn.00919.2005.

Dau, T., Kollmeier, B., Kohlrausch, A. (1997). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am., 102, 2892–2905. doi:10.1121/1.420344.

Gygi, B., Kidd, G.R., Watson, C.S. (2004). “Spectral-temporal factors in the identification of environmental sounds,” J. Acoust. Soc. Am., 115, 1252–1265. doi:10.1121/1.1635840.

He, K., Zhang, X., Ren, S., Sun, J. (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” Proceedings IEEE International Conference on Computer Vision (ICCV), 1026–1034. doi:10.1109/ICCV.2015.123.

Joris, P.X., Schreiner, C.E., Rees, A. (2004). “Neural processing of amplitude-modulated sounds,” Physiol. Rev., 84, 541–577. doi:10.1152/physrev.00029.2003.

Kandel, E.R., Schwartz, J.H., Jessell, T.M. (2000). “Principles of Neural Science,” Fourth Edition. New York, NY: McGraw-Hill.

Koumura, T., Terashima, H., Furukawa, S. (2019). “Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition,” J. Neurosci., 39, 5517–5533. doi:10.1523/JNEUROSCI.2914-18.2019.

Lorenzi, C., Soares, C., Vonner, T. (2001). “Second-order temporal modulation transfer functions,” J. Acoust. Soc. Am., 110, 1030–1038. doi:10.1121/1.1383295.

Piczak, K.J. (2015). “ESC: Dataset for Environmental Sound Classification,” In 23rd ACM International Conference on Multimedia, 1015–1018.

Shannon, R.V., Zeng, F-G., Kamath, V., Wygonski, J., Ekelid, M. (1995). “Speech Recognition with Primarily Temporal Cues,” Science, 270, 303–304. doi:10.1126/science.270.5234.303.

Viemeister, N.F. (1979). “Temporal modulation transfer functions based upon modulation thresholds,” J. Acoust. Soc. Am., 66, 1364–1380. doi:10.1121/1.383531.

Published
2020-04-20
How to Cite
Koumura, T., Terashima, H., & Furukawa, S. (2020). “Psychophysical” modulation transfer functions in a deep neural network trained for natural sound recognition. Proceedings of the International Symposium on Auditory and Audiological Research, 7, 157-164. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2019-19
Section
2019/3. Machine listening and intelligent auditory signal processing