Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications

Published 28 Oct 2024 in cs.SD, cs.AI, cs.MM, and eess.AS | (2410.21478v1)

Abstract: This paper investigates the industrial setting of real-time classification of early media exchanged during the initialization phase of voice calls. We explore the application of state-of-the-art audio tagging models and highlight some of their limitations when applied to the classification of early media. While most existing approaches leverage convolutional neural networks, we propose a novel approach with low resource requirements based on gradient-boosted trees. Our approach not only demonstrates a substantial improvement in runtime performance but also achieves comparable accuracy. We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls. We provide a detailed analysis of the results, in terms of accuracy and runtime performance, on a proprietary dataset and a publicly available one. We additionally report a case study of the performance improvements achieved at a regional data center in India.
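
The combination of knowledge distillation and class aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class names, the grouping, and the helper functions below are hypothetical, and the sketch assumes the teacher outputs a probability distribution over mutually exclusive fine-grained classes (audio tagging models often emit independent per-class scores instead, which would need a different aggregation rule). The aggregated label would then serve as the training target for a smaller student model such as a gradient-boosted tree classifier; that training step is not shown.

```python
from collections import defaultdict

# Hypothetical mapping from fine-grained teacher classes to aggregated classes.
# These names are illustrative only and do not come from the paper.
CLASS_TO_GROUP = {
    "speech": "voice",
    "conversation": "voice",
    "music": "music",
    "singing": "music",
    "ringtone": "tone",
    "busy_signal": "tone",
}


def aggregate_probs(teacher_probs, class_to_group=CLASS_TO_GROUP):
    """Sum the teacher's probabilities over the fine classes in each group."""
    grouped = defaultdict(float)
    for cls, prob in teacher_probs.items():
        grouped[class_to_group[cls]] += prob
    return dict(grouped)


def distilled_label(teacher_probs, class_to_group=CLASS_TO_GROUP):
    """Return the aggregated class with the highest summed probability.

    This label is what a smaller student model would be trained to predict.
    """
    grouped = aggregate_probs(teacher_probs, class_to_group)
    return max(grouped, key=grouped.get)


# Hypothetical teacher output for one audio clip.
example = {
    "speech": 0.30,
    "conversation": 0.25,
    "music": 0.35,
    "singing": 0.05,
    "ringtone": 0.03,
    "busy_signal": 0.02,
}

if __name__ == "__main__":
    print(aggregate_probs(example))
    print(distilled_label(example))
```

In this made-up example the single most likely fine-grained class is "music", yet the aggregated label is "voice", because the two voice-related classes together carry more probability mass. Aggregating before taking the label lets the student learn a smaller, coarser label set.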
