Mono-to-stereo through parametric stereo generation

Published 26 Jun 2023 in cs.SD, cs.LG, and eess.AS (arXiv:2306.14647v1)

Abstract: Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by predicting parametric stereo (PS) parameters, using both nearest-neighbor and deep network approaches. In combination with PS, we also propose to model the task generatively, allowing multiple, equally plausible stereo renditions to be synthesized from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work positions both PS and generative modelling as strong and appealing methodologies for mono-to-stereo upmixing. A discussion of the limitations of these approaches is also provided.
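The core idea behind the abstract — reconstructing a stereo pair from a mono signal plus predicted parametric-stereo parameters — can be illustrated with a minimal decoder sketch. The code below is a hypothetical, simplified full-resolution variant written for this summary, not the paper's implementation: real PS codecs (e.g. MPEG-4 PS) operate on coarse frequency bands and additionally mix in a decorrelated signal, and the function and parameter names here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def ps_upmix(mono, iid_db, ipd, fs=44100, nperseg=1024):
    """Decode stereo from a mono signal plus per-bin parametric-stereo
    parameters (simplified sketch; real PS uses coarse bands and
    decorrelation).

    mono   : (n,) time-domain mono signal
    iid_db : (freq, time) inter-channel intensity difference in dB
    ipd    : (freq, time) inter-channel phase difference in radians
    """
    _, _, M = stft(mono, fs=fs, nperseg=nperseg)
    c = 10.0 ** (iid_db / 20.0)                  # linear intensity ratio L/R
    norm = np.sqrt(2.0 / (1.0 + c ** 2))         # power-preserving split:
                                                 # |L|^2 + |R|^2 = 2|M|^2
    L = M * c * norm * np.exp(+1j * ipd / 2.0)   # left, louder when iid_db > 0
    R = M * norm * np.exp(-1j * ipd / 2.0)       # right
    _, left = istft(L, fs=fs, nperseg=nperseg)
    _, right = istft(R, fs=fs, nperseg=nperseg)
    return np.stack([left, right])
```

With zero IID and IPD the decoder reduces to an identity (both channels reconstruct the mono input); the models discussed in the paper predict these parameter maps from the mono signal, which is what turns this decoder into an upmixer.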
