“Psychophysical” modulation transfer functions in a deep neural network trained for natural sound recognition

Takuya Koumura; Hiroki Terashima; Shigeto Furukawa

Authors

Takuya Koumura NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan http://orcid.org/0000-0002-8380-9598
Hiroki Terashima NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan
Shigeto Furukawa NTT Communication Science Laboratories 3-1, Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198 Japan

Abstract

Representation of amplitude modulation (AM) has been characterized by neurophysiological and psychophysical modulation transfer functions (MTFs). Our recent computational study demonstrated that a deep neural network (DNN) trained for natural sound recognition serves as a good model for explaining the functional significance of physiologically derived neuronal MTFs. The present study asks whether the DNN can also provide insights into AM-related human behaviours such as AM detectability. Specifically, we measured “psychophysical” MTFs in our previously developed DNN model. We presented sinusoidally amplitude-modulated white noise at various AM rates to the DNN and quantified AM detectability as d′ derived from the model’s internal representations of modulated and non-modulated stimuli. The overall d′ increased along the layer cascade, with human-level detectability observed in the higher layers. Within a given layer, d′ tended to decrease with increasing AM rate and with decreasing AM depth, which is reminiscent of a psychophysical MTF. The results suggest that a DNN trained for natural sound recognition can serve as a model for understanding psychophysical AM detectability. Since our approach is not specific to AM, the present paradigm opens the possibility of exploring a broad range of auditory functions that can be evaluated by psychophysical experiments.
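As an illustration of the stimulus and the detectability index mentioned in the abstract, the following Python sketch generates sinusoidally amplitude-modulated white noise and computes a d′ between responses to modulated and non-modulated stimuli. It is only a sketch under stated assumptions: the abstract does not specify how d′ was derived from the DNN's internal representations, so the code assumes the textbook form (difference of means divided by the root of the averaged variances), and it uses a hand-built envelope feature as a stand-in for a unit's response rather than the DNN itself. All function names, the sampling rate, the stimulus duration, and the trial count are hypothetical choices made for the example.

```python
import numpy as np


def sam_noise(rate_hz, depth, dur_s=0.5, fs=16000, rng=None):
    """Sinusoidally amplitude-modulated Gaussian white noise.

    depth=0 returns the unmodulated carrier; depth=1 is full modulation.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(dur_s * fs)) / fs
    carrier = rng.standard_normal(t.size)
    return (1.0 + depth * np.sin(2.0 * np.pi * rate_hz * t)) * carrier


def envelope_feature(x, rate_hz, fs=16000):
    """Stand-in for a unit's response (an assumption, NOT the DNN):
    magnitude of the Fourier component of instantaneous power at the AM rate."""
    t = np.arange(x.size) / fs
    return np.abs(np.mean(x ** 2 * np.exp(-2j * np.pi * rate_hz * t)))


def d_prime(resp_mod, resp_unmod):
    """Assumed d' form: mean difference over the root of the averaged variances."""
    resp_mod = np.asarray(resp_mod, dtype=float)
    resp_unmod = np.asarray(resp_unmod, dtype=float)
    pooled_sd = np.sqrt(0.5 * (resp_mod.var(ddof=1) + resp_unmod.var(ddof=1)))
    return (resp_mod.mean() - resp_unmod.mean()) / pooled_sd


def detectability(rate_hz, depth, n_trials=40, seed=0):
    """d' between modulated and non-modulated trials for the stand-in feature."""
    rng = np.random.default_rng(seed)
    mod = [envelope_feature(sam_noise(rate_hz, depth, rng=rng), rate_hz)
           for _ in range(n_trials)]
    unmod = [envelope_feature(sam_noise(rate_hz, 0.0, rng=rng), rate_hz)
             for _ in range(n_trials)]
    return d_prime(mod, unmod)
```

With this stand-in feature, `detectability(8.0, 1.0)` comes out larger than `detectability(8.0, 0.1)`, which mirrors the depth dependence described in the abstract. To probe a trained network instead, `envelope_feature` would be replaced by the activation of a chosen unit or layer; how best to pool across units is left open here.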

References

Averbeck, B.B., Lee, D. (2006). “Effects of Noise Correlations on Information Encoding and Decoding,” J. Neurophysiol., 95, 3633–3644. doi:10.1152/jn.00919.2005.
Dau, T., Kollmeier, B., Kohlrausch, A. (1997). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am., 102(5), 2892–2905. doi:10.1121/1.420344.
Gygi, B., Kidd, G.R., Watson, C.S. (2004). “Spectral-temporal factors in the identification of environmental sounds,” J. Acoust. Soc. Am., 115, 1252–1265. doi:10.1121/1.1635840.
He, K., Zhang, X., Ren, S., Sun, J. (2015). “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” Proceedings IEEE International Conference on Computer Vision (ICCV), 1026–1034. doi:10.1109/ICCV.2015.123.
Joris, P.X., Schreiner, C.E., Rees, A. (2004). “Neural processing of amplitude-modulated sounds,” Physiol. Rev., 84, 541–577. doi:10.1152/physrev.00029.2003.
Kandel, E.R., Schwartz, J.H., Jessell, T.M. (2000). “Principles of Neural Science,” Fourth Edition. New York, NY: McGraw-Hill.
Koumura, T., Terashima, H., Furukawa, S. (2019). “Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition,” J. Neurosci., 39, 5517–5533. doi:10.1523/JNEUROSCI.2914-18.2019.
Lorenzi, C., Soares, C., Vonner, T. (2001). “Second-order temporal modulation transfer functions,” J. Acoust. Soc. Am., 110, 1030–1038. doi:10.1121/1.1383295.
Piczak, K.J. (2015). “ESC: Dataset for Environmental Sound Classification,” In 23rd ACM International Conference on Multimedia, 1015–1018.
Shannon, R.V., Zeng, F-G., Kamath, V., Wygonski, J., Ekelid, M. (1995). “Speech Recognition with Primarily Temporal Cues,” Science, 270, 303–304. doi:10.1126/science.270.5234.303.
Viemeister, N.F. (1979). “Temporal modulation transfer functions based upon modulation thresholds,” J. Acoust. Soc. Am., 66, 1364–1380. doi:10.1121/1.383531.

Published

2020-04-20

How to Cite

Koumura, T., Terashima, H., & Furukawa, S. (2020). “Psychophysical” modulation transfer functions in a deep neural network trained for natural sound recognition. Proceedings of the International Symposium on Auditory and Audiological Research, 7, 157–164. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2019-19

Issue

Vol. 7 (2019): Auditory Learning in Biological and Artificial Systems

Section

2019/3. Machine listening and intelligent auditory signal processing

License

Authors who publish with this journal agree to the following terms:

a. Authors retain copyright* and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

*From the 2017 issue onward. The Danavox Jubilee Foundation owns the copyright of all articles published in the 1969-2015 issues. However, authors are still allowed to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
