
Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression

Published 24 Jan 2024 in eess.AS and eess.SP | (2401.13401v1)

Abstract: Scene-based spatial audio formats, such as Ambisonics, are agnostic to the playback system and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required to deliver high spatial resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications. This paper therefore proposes a compression codec based on the parametric higher-order Directional Audio Coding (HO-DirAC) model. The encoder downmixes the higher-order Ambisonic (HOA) input audio into a reduced number of signals, which are accompanied by perceptually motivated scene parameters. The downmixed audio is coded using a perceptual audio coder, whereas the parameters are grouped into perceptual bands, quantized, and downsampled. On the decoder side, low Ambisonic orders are fully recovered. HOA components that cannot be fully recovered are synthesized according to the parameters. The results of a listening test indicate that, when applied to fifth-order HOA signals, the proposed parametric spatial audio codec can improve on the adopted perceptual audio coder, especially at low to medium-high bitrates.
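To make the channel-count concern in the abstract concrete, the short sketch below uses the standard relation that a full 3D Ambisonic signal of order N carries (N + 1)^2 channels, so the fifth-order signals used in the listening test have 36. The function names and the 32 kbps per-channel figure are illustrative assumptions, not values taken from the paper.

```python
def hoa_channel_count(order: int) -> int:
    """Channels in a full 3D Ambisonic signal of the given order: (N + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def total_bitrate_kbps(order: int, per_channel_kbps: float) -> float:
    """Total bitrate if every channel is coded independently.

    per_channel_kbps is a hypothetical figure chosen by the caller;
    it is not a rate reported in the paper.
    """
    return hoa_channel_count(order) * per_channel_kbps


if __name__ == "__main__":
    # Assumed, illustrative per-channel rate for a perceptual coder.
    assumed_rate_kbps = 32.0
    for order in range(1, 6):
        channels = hoa_channel_count(order)
        total = total_bitrate_kbps(order, assumed_rate_kbps)
        print(f"order {order}: {channels} channels, {total:.0f} kbps")
```

The quadratic growth in channel count is why coding every HOA channel separately becomes costly, and why transmitting a downmix with a reduced number of signals plus compact scene parameters can lower the total rate.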
