Instabilities in Convnets for Raw Audio
Abstract: What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In this article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, both of which are typical in audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
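The condition number discussed in the abstract can be estimated numerically. Below is a minimal sketch, not the authors' code: it assumes circular convolution, under which the energy response of a filterbank is the sum of squared magnitude frequency responses of its filters, and the condition number is the ratio of its maximum to its minimum over frequency. The function name, the `1/filter_length` variance (chosen so the expected energy response is flat), and the FFT size are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def filterbank_condition_number(num_filters, filter_length, n_fft=4096):
    """Condition number of a random Gaussian FIR filterbank
    under circular convolution (frame operator is diagonal in Fourier)."""
    # Random Gaussian filters; variance 1/filter_length makes the
    # expected energy response flat across frequency.
    w = rng.normal(scale=filter_length ** -0.5,
                   size=(num_filters, filter_length))
    # Energy response: sum over filters of |FFT|^2 at each frequency bin.
    spectra = np.abs(np.fft.rfft(w, n=n_fft, axis=1)) ** 2
    energy = spectra.sum(axis=0)
    return energy.max() / energy.min()

# More filters of the same length concentrate the energy response
# around its mean, so the condition number tends to shrink.
for num_filters in (4, 16, 64):
    print(num_filters, filterbank_condition_number(num_filters, 256))
```

Running the sketch for increasing numbers of filters illustrates the trend the abstract describes: large deviations of the energy response, and hence ill-conditioning, are most severe when few long filters are drawn at random.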