Reconstructing the Charlie Parker Omnibook: Exploring an Audio-to-Score Automatic Transcription Pipeline
In the pursuit of advancing music education and preservation, the paper "Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline" presents a pipeline for converting audio recordings into musical scores with minimal human intervention. It focuses on Charlie Parker's improvised jazz solos, which are both musically complex and culturally significant.
Research Context and Objectives
The research builds upon recent developments in music information retrieval (MIR), emphasizing the need for high-quality automated music transcriptions. The Charlie Parker Omnibook, a vital educational resource, serves as both the subject and testing ground for the proposed pipeline. The paper aims to create a benchmark for automatic transcription systems capable of handling intricate jazz solos, thereby addressing a significant gap in the transcription of jazz music from audio to score.
Methodology
The authors propose a comprehensive transcription pipeline composed of several key components: a newly developed source separation model tailored for saxophone, an updated MIDI transcription model for solo saxophone, and an adapted MIDI-to-score method for monophonic instruments. The pipeline is evaluated using an enhanced dataset of score-audio pairs from the Charlie Parker Omnibook, complete with accurate MIDI alignments and annotations.
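Conceptually, the modular design chains each stage's output into the next. The sketch below illustrates that composition with stub stages whose names and behavior are purely hypothetical stand-ins for the paper's actual components (Demucs-based separation, the saxophone MIDI transcription model, and qparse layout):

```python
def run_pipeline(audio, stages):
    """Chain transcription stages: each stage consumes the previous one's output."""
    result = audio
    for stage in stages:
        result = stage(result)
    return result

# Illustrative stub stages (NOT the real models):
stages = [
    lambda mix: mix["sax"],                        # source separation -> sax stem
    lambda sax: [("D4", 0.0, 0.4)],                # sax stem -> MIDI-like note list
    lambda notes: f"{len(notes)} note(s) engraved" # MIDI -> score layout
]

print(run_pipeline({"sax": "sax-stem", "rest": "accompaniment"}, stages))
```

The modularity is the point: any stage can be swapped for an improved model without touching the others.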
Dataset Production
The initial step involved creating a dataset derived from existing digital scores of the Omnibook, comprising 50 tracks digitized to MusicXML format. Alignments of the human-transcribed MIDI with the audio recordings were achieved through Dynamic Time Warping (DTW) and enhanced with finer alignment methods. This dataset, including scores, downbeats, and performance-aligned MIDI files, is made available to support further research.
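The DTW step can be illustrated with a minimal pure-Python sketch. The cost below compares onset times directly, which is a toy assumption; the paper's alignment operates on richer audio/MIDI features and is subsequently refined with finer methods:

```python
def dtw_align(ref, perf):
    """Align two onset-time sequences (e.g. score MIDI vs. performance, in
    seconds) with dynamic time warping. Returns the total alignment cost and
    the warping path as (ref_index, perf_index) pairs."""
    n, m = len(ref), len(perf)
    INF = float("inf")
    # D[i][j] = cost of aligning ref[:i] with perf[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - perf[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack the cheapest predecessor at each step to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path
```

For nearly matching sequences the path is close to the diagonal, and the residual cost measures how far the performance drifts from the score timing.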
Transcription Pipeline Details
- Beat Tracking: Used the madmom library for beat estimation, constraining tempo estimates with the MIDI transcriptions to improve accuracy.
- Source Separation: Trained a Demucs-based saxophone-separation model on the FiloSax dataset, achieving strong signal-to-distortion ratio (SDR) results.
- MIDI Transcription: Adapted a high-resolution transcription model from piano to saxophone, outperforming existing methods in F-measure evaluations.
- Score Layout: Used the qparse framework for MIDI-to-score conversion, with probabilistic grammar-based rhythmic quantization tailored to jazz.
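As a crude stand-in for qparse's grammar-based quantization (not the actual algorithm), one can snap the onsets within a beat to whichever subdivision grid best trades goodness of fit against rhythmic complexity; the candidate divisions and penalty weight below are illustrative assumptions:

```python
def quantize_beat(onsets, beat_start, beat_len,
                  divisions=(1, 2, 3, 4, 6), complexity_penalty=0.01):
    """Snap onset times inside one beat to the best grid subdivision.
    The per-slot penalty is a crude proxy for a probabilistic grammar
    that prefers simpler rhythms when several grids fit comparably."""
    best = None
    for div in divisions:
        step = beat_len / div
        snapped = [beat_start + round((t - beat_start) / step) * step
                   for t in onsets]
        err = sum(abs(s - t) for s, t in zip(snapped, onsets))
        score = err + complexity_penalty * div
        if best is None or score < best[0]:
            best = (score, div, snapped)
    return best[1], best[2]  # chosen division, quantized onsets
```

Given three roughly evenly spaced onsets in one beat, the triplet grid wins over coarser or finer grids, which matters for jazz, where triplet and swung subdivisions are pervasive.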
Results
The results demonstrate that the proposed pipeline outperforms pre-existing solutions in source separation and MIDI transcription accuracy. Notably, the source separation component improved SDR over baseline methods, and the transcription model achieved superior precision and recall on both training and test datasets, demonstrating its adaptation to the stylistic characteristics of jazz. Moreover, the final score output required fewer edit operations than basic quantization approaches, indicating closer agreement with the human transcriptions.
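The metrics named above can be sketched in simplified form. These are toy versions (plain SDR on sample arrays, greedy onset-only note matching) rather than the paper's exact evaluation code, which would also account for pitch and offset matching:

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    sig = sum(s * s for s in reference)
    err = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(sig / err)

def note_f_measure(ref_onsets, est_onsets, tol=0.05):
    """F-measure from greedy onset matching within a tolerance window."""
    unmatched = list(ref_onsets)
    tp = 0
    for t in est_onsets:
        hit = next((r for r in unmatched if abs(r - t) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    precision = tp / len(est_onsets) if est_onsets else 0.0
    recall = tp / len(ref_onsets) if ref_onsets else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Higher SDR means the separated saxophone stem is closer to the clean source, and edit-distance comparison of the final scores plays the analogous role at the notation level.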
Discussion and Implications
The research highlights several challenges inherent in automating jazz transcription, such as handling unconventional rhythmic structures and the swing feel fundamental to jazz. Despite these hurdles, the modular pipeline paves the way for further enhancements, enabling more reliable and scalable transcription. These improvements could substantially strengthen music education resources and contribute to musicological studies by making high-quality transcriptions widely accessible.
Conclusion
The study marks a solid advancement towards fully automated transcription systems that capture complex musical genres like jazz. By releasing both the dataset and pipeline components, the research extends an open invitation for continued exploration and refinement. Future research could further refine these methodologies, enhancing generalization across different musical instruments and styles, facilitating the routine transcription of music at scale, and bridging the gap between audio performances and practical educational tools.