
Siamese SIREN: Audio Compression with Implicit Neural Representations

Published 22 Jun 2023 in cs.SD, cs.AI, cs.LG, and eess.AS | (2306.12957v1)

Abstract: Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains largely unexplored. Motivated by this, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves superior audio reconstruction fidelity while utilizing fewer network parameters compared to previous INR architectures.
