Reconstructing the Charlie Parker Omnibook: Exploring an Audio-to-Score Automatic Transcription Pipeline
In the pursuit of advancing music education and preservation, the paper "Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline" presents a pipeline for converting audio recordings into musical scores with minimal human intervention. It focuses on Charlie Parker's improvised jazz solos, which are both musically complex and culturally significant.
Research Context and Objectives
The research builds upon recent developments in music information retrieval (MIR), emphasizing the need for high-quality automated music transcriptions. The Charlie Parker Omnibook, a vital educational resource, serves as both the subject and testing ground for the proposed pipeline. The paper aims to create a benchmark for automatic transcription systems capable of handling intricate jazz solos, thereby addressing a significant gap in the transcription of jazz music from audio to score.
Methodology
The authors propose a comprehensive transcription pipeline composed of several key components: a newly developed source separation model tailored for saxophone, an updated MIDI transcription model for solo saxophone, and an adapted MIDI-to-score method for monophonic instruments. The pipeline is evaluated using an enhanced dataset of score-audio pairs from the Charlie Parker Omnibook, complete with accurate MIDI alignments and annotations.
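Conceptually, the modular design chains each stage's output into the next. The sketch below illustrates that composition with stub stages whose names and behavior are purely hypothetical stand-ins for the paper's actual components (Demucs-based separation, the saxophone MIDI transcription model, and qparse layout):

```python
def run_pipeline(audio, stages):
    """Chain transcription stages: each stage consumes the previous one's output."""
    result = audio
    for stage in stages:
        result = stage(result)
    return result

# Illustrative stub stages (NOT the real models):
stages = [
    lambda mix: mix["sax"],                        # source separation -> sax stem
    lambda sax: [("D4", 0.0, 0.4)],                # sax stem -> MIDI-like note list
    lambda notes: f"{len(notes)} note(s) engraved" # MIDI -> score layout
]

print(run_pipeline({"sax": "sax-stem", "rest": "accompaniment"}, stages))
```

The modularity is the point: any stage can be swapped for an improved model without touching the others.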
Dataset Production
The initial step involved creating a dataset derived from existing digital scores of the Omnibook, comprising 50 tracks digitized to MusicXML format. Alignments of the human-transcribed MIDI with the audio recordings were achieved through Dynamic Time Warping (DTW) and enhanced with finer alignment methods. This dataset, including scores, downbeats, and performance-aligned MIDI files, is made available to support further research.
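The DTW step can be illustrated with a minimal pure-Python sketch. The cost below compares onset times directly, which is a toy assumption; the paper's alignment operates on richer audio/MIDI features and is subsequently refined with finer methods:

```python
def dtw_align(ref, perf):
    """Align two onset-time sequences (e.g. score MIDI vs. performance, in
    seconds) with dynamic time warping. Returns the total alignment cost and
    the warping path as (ref_index, perf_index) pairs."""
    n, m = len(ref), len(perf)
    INF = float("inf")
    # D[i][j] = cost of aligning ref[:i] with perf[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - perf[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack the cheapest predecessor at each step to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path
```

For nearly matching sequences the path is close to the diagonal, and the residual cost measures how far the performance drifts from the score timing.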
Transcription Pipeline Details
- Beat Tracking: Used the madmom library for beat estimation, constraining tempo estimates with the MIDI transcriptions to improve accuracy.
- Source Separation: Trained a Demucs-based saxophone-separation model on the FiloSax dataset, achieving strong signal-to-distortion ratio (SDR) results.
- MIDI Transcription: Adapted a high-resolution transcription model from piano to saxophone, outperforming existing methods in F-measure evaluations.
- Score Layout: Used the qparse framework for MIDI-to-score conversion, with probabilistic grammar-based rhythmic quantization tailored to jazz.
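As a crude stand-in for qparse's grammar-based quantization (not the actual algorithm), one can snap the onsets within a beat to whichever subdivision grid best trades goodness of fit against rhythmic complexity; the candidate divisions and penalty weight below are illustrative assumptions:

```python
def quantize_beat(onsets, beat_start, beat_len,
                  divisions=(1, 2, 3, 4, 6), complexity_penalty=0.01):
    """Snap onset times inside one beat to the best grid subdivision.
    The per-slot penalty is a crude proxy for a probabilistic grammar
    that prefers simpler rhythms when several grids fit comparably."""
    best = None
    for div in divisions:
        step = beat_len / div
        snapped = [beat_start + round((t - beat_start) / step) * step
                   for t in onsets]
        err = sum(abs(s - t) for s, t in zip(snapped, onsets))
        score = err + complexity_penalty * div
        if best is None or score < best[0]:
            best = (score, div, snapped)
    return best[1], best[2]  # chosen division, quantized onsets
```

Given three roughly evenly spaced onsets in one beat, the triplet grid wins over coarser or finer grids, which matters for jazz, where triplet and swung subdivisions are pervasive.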
Results
The results demonstrate that the proposed pipeline outperforms pre-existing solutions in source separation and MIDI transcription accuracy. Notably, the source separation component improved SDR over baseline methods, and the transcription model achieved superior precision and recall on both training and test datasets, demonstrating its adaptation to the stylistic characteristics of jazz. Moreover, the final score output required fewer edit operations than basic quantization approaches, indicating closer agreement with the human transcriptions.
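The metrics named above can be sketched in simplified form. These are toy versions (plain SDR on sample arrays, greedy onset-only note matching) rather than the paper's exact evaluation code, which would also account for pitch and offset matching:

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    sig = sum(s * s for s in reference)
    err = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(sig / err)

def note_f_measure(ref_onsets, est_onsets, tol=0.05):
    """F-measure from greedy onset matching within a tolerance window."""
    unmatched = list(ref_onsets)
    tp = 0
    for t in est_onsets:
        hit = next((r for r in unmatched if abs(r - t) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    precision = tp / len(est_onsets) if est_onsets else 0.0
    recall = tp / len(ref_onsets) if ref_onsets else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Higher SDR means the separated saxophone stem is closer to the clean source, and edit-distance comparison of the final scores plays the analogous role at the notation level.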
Discussion and Implications
The research highlights several challenges inherent in automating jazz transcription, such as handling unconventional rhythmic structures and the swing feel fundamental to jazz. Despite these hurdles, the modular pipeline paves the way for further enhancements, enabling more reliable and scalable transcription. These improvements could substantially strengthen music education resources and contribute to musicological studies by making high-quality transcriptions widely accessible.
Conclusion
The study marks a solid advancement towards fully automated transcription systems that capture complex musical genres like jazz. By releasing both the dataset and pipeline components, the research extends an open invitation for continued exploration and refinement. Future research could further refine these methodologies, enhancing generalization across different musical instruments and styles, facilitating the routine transcription of music at scale, and bridging the gap between audio performances and practical educational tools.