Siamese SIREN: Audio Compression with Implicit Neural Representations
Abstract: Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains largely unexplored. Motivated by this, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves superior audio reconstruction fidelity while utilizing fewer network parameters compared to previous INR architectures.