Steering Autoregressive Music Generation with Recursive Feature Machines

Published 21 Oct 2025 in cs.LG, cs.AI, cs.SD, and eess.AS | (2510.19127v1)

Abstract: Controllable music generation remains a significant challenge, with existing methods often requiring model retraining or introducing audible artifacts. We introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by directly steering their internal activations. RFMs analyze a model's internal gradients to produce interpretable "concept directions", or specific axes in the activation space that correspond to musical attributes like notes or chords. We first train lightweight RFM probes to discover these directions within MusicGen's hidden states; then, during inference, we inject them back into the model to guide the generation process in real-time without per-step optimization. We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties. Our method successfully navigates the trade-off between control and generation quality: we can increase the accuracy of generating a target musical note from 0.23 to 0.82, while text prompt adherence remains within approximately 0.02 of the unsteered baseline, demonstrating effective control with minimal impact on prompt fidelity. We release code to encourage further exploration on RFMs in the music domain.

Summary

The paper introduces MusicRFM, a framework that uses Recursive Feature Machines to steer music generation by adjusting internal activations with music-theoretic concepts.
It integrates lightweight RFM probes with MusicGen, applying time-varying injection schedules to enable both single and multi-directional control of musical attributes.
Experimental results indicate improved accuracy in note and chord generation, though balancing control strengths is essential to prevent distributional drift.

Steering Autoregressive Music Generation with Recursive Feature Machines: A Technical Essay

Introduction

The paper "Steering Autoregressive Music Generation with Recursive Feature Machines" (2510.19127) introduces MusicRFM, a framework for controlling music generation models. It uses Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by adjusting their internal activations. Unlike methods that require retraining the model or that introduce audible artifacts, MusicRFM steers generation at inference time, allowing real-time adjustment of musical attributes such as pitch, chords, and tempo.

Methodological Framework

Recursive Feature Machines and Concept Directions

RFMs provide a mechanism for identifying interpretable axes within a model's activation space. These axes, termed "concept directions," correlate strongly with music-theoretic attributes like notes and chords. RFMs find them by analyzing gradients: they build the Average Gradient Outer Product (AGOP), a matrix that averages the outer product of each gradient with itself, and extract orthogonal directions from it. These directions are the principal axes of sensitivity, which makes them natural targets for intervention in the activation space.
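The AGOP step can be illustrated with a minimal sketch. It assumes gradients of some predictor with respect to its inputs are already available. The toy function $f(x) = (w \cdot x)^2$ and the helper names `agop` and `top_directions` are hypothetical, chosen only for illustration, and are not taken from the paper's code. A full RFM would alternate this step with refitting the predictor, which is omitted here.

```python
import numpy as np

def agop(grads: np.ndarray) -> np.ndarray:
    """Average Gradient Outer Product: mean of g g^T over samples.

    grads has shape (n_samples, dim); the result has shape (dim, dim).
    """
    return grads.T @ grads / grads.shape[0]

def top_directions(M: np.ndarray, k: int = 1) -> np.ndarray:
    """Return the k eigenvectors of symmetric M with the largest eigenvalues.

    Output shape is (k, dim); the rows are mutually orthogonal unit vectors.
    """
    eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvecs[:, order].T

# Toy predictor f(x) = (w . x)^2, whose input gradient is 2 (w . x) w.
rng = np.random.default_rng(0)
dim = 8
w = rng.normal(size=dim)
w /= np.linalg.norm(w)                          # unit-norm "true" direction
X = rng.normal(size=(256, dim))
grads = 2.0 * (X @ w)[:, None] * w[None, :]     # shape (256, dim)

M = agop(grads)
direction = top_directions(M, k=1)[0]
alignment = abs(direction @ w)                  # |cosine| with the true direction
```

Because every gradient of the toy function is a multiple of `w`, the AGOP has rank one and its leading eigenvector recovers `w` up to sign, which is the sense in which the AGOP exposes the axis the predictor is most sensitive to.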

Integration with MusicGen

MusicGen serves as the backbone for MusicRFM. The integration involves training lightweight RFM probes across various layers of MusicGen using the SynTheory dataset, which provides detailed supervision on music-theoretic concepts. Once these concept-aligned directions are established, they are injected into MusicGen during inference to guide the generation process without per-step optimization.
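As a rough illustration of what injecting a direction can look like, the sketch below adds a scaled unit vector to every position of a hidden-state array. The additive form, the normalization, and the name `steer_hidden_states` are assumptions made for this example; the summary does not specify the exact update MusicRFM applies, and the random array merely stands in for a MusicGen layer output.

```python
import numpy as np

def steer_hidden_states(hidden: np.ndarray, direction: np.ndarray, eta: float) -> np.ndarray:
    """Shift each position's hidden state by eta along a unit concept direction.

    hidden: (seq_len, dim) activations from one layer (stand-in data here).
    direction: (dim,) concept direction, normalized inside the function.
    eta: control coefficient; 0 leaves the activations unchanged.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + eta * unit                  # broadcasts over seq_len

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))               # hypothetical layer output
direction = rng.normal(size=16)                 # hypothetical concept direction

steered = steer_hidden_states(hidden, direction, eta=0.5)

# Projection of the change onto the unit direction equals eta at every position.
unit = direction / np.linalg.norm(direction)
shift = (steered - hidden) @ unit
```

In a real model this kind of update would typically be attached to a chosen layer, for example through a PyTorch forward hook, so that the frozen weights stay untouched and no optimization runs at generation time.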

Dynamic Control Mechanisms

MusicRFM introduces time-varying schedules for controlling these injections. These include deterministic schedules (linear, exponential, logistic) and stochastic gating, allowing for nuanced control over the trajectory of musical attributes during generation. Furthermore, it supports multi-direction steering, permitting simultaneous or staggered enforcement of multiple musical attributes.
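The schedule families named above can be sketched as simple functions of the generation step. The function names, the `rate` and `steepness` parameters, and the Bernoulli form of the gate are illustrative assumptions rather than the paper's definitions; each schedule returns a multiplier in [0, 1] that scales a base coefficient `eta0`.

```python
import math
import random

def linear_rise(t: int, T: int) -> float:
    """Ramp from 0 at the first step to 1 at the last step."""
    return t / max(T - 1, 1)

def exponential_decay(t: int, T: int, rate: float = 5.0) -> float:
    """Start at 1 and decay toward exp(-rate) by the last step."""
    return math.exp(-rate * t / max(T - 1, 1))

def logistic_rise(t: int, T: int, steepness: float = 10.0) -> float:
    """S-shaped increase centered on the midpoint of the sequence."""
    x = t / max(T - 1, 1)
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

def coefficient(t, T, eta0, schedule, gate_p=1.0, rng=random):
    """Control strength at step t: base coefficient x schedule x random gate.

    gate_p is the probability that steering is switched on at this step;
    gate_p=1.0 disables the stochastic gating.
    """
    gate = 1.0 if rng.random() < gate_p else 0.0
    return eta0 * schedule(t, T) * gate

# Deterministic linear ramp over 10 steps with base coefficient 2.0.
trace = [coefficient(t, 10, 2.0, linear_rise) for t in range(10)]
```

Passing a seeded `random.Random` instance as `rng` makes the stochastic gate reproducible, which is convenient when comparing schedules on the same prompt.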

Figure 1: Temporal softmax traces (notes). Curves show the probe probability of the ground-truth note across timesteps for different schedules (linear/exp rise/decay, log. increase, sine).

Experimental Results

Single-Direction Steering

The framework improves the accuracy of generating specific musical notes and chords; for a target note, accuracy rises from 0.23 to 0.82. As the control coefficient ($\eta_0$) increases, the Fréchet Distance (FD) and Maximum Mean Discrepancy (MMD) metrics also rise, indicating growing deviation from the reference distribution. Alignment with the text prompt nevertheless remains fairly stable, staying within approximately 0.02 of the unsteered baseline (Figure 2).

Figure 2: Single-direction steering metrics as a function of control coefficient $\eta_0$. FD increases, MMD follows suit, CLAP alignment remains stable, and probe accuracy improves with stronger control.
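To make the drift metrics concrete, here is a small sketch of a squared MMD estimate between two sets of embeddings. The RBF kernel, the fixed `bandwidth`, the biased estimator, and the synthetic Gaussian "embeddings" are all assumptions for illustration; the summary does not say which kernel or embedding model the paper uses.

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel matrix between rows of A (n, d) and rows of B (m, d)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2(X: np.ndarray, Y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples X and Y."""
    return float(
        rbf_kernel(X, X, bandwidth).mean()
        + rbf_kernel(Y, Y, bandwidth).mean()
        - 2.0 * rbf_kernel(X, Y, bandwidth).mean()
    )

rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))           # stand-in reference embeddings
unsteered = rng.normal(size=(200, 4))           # same distribution as reference
drifted = rng.normal(loc=1.0, size=(200, 4))    # shifted distribution

mmd_same = mmd2(reference, unsteered)
mmd_shift = mmd2(reference, drifted)
```

The estimate stays close to zero when both samples come from the same distribution and grows when one is shifted, which mirrors the reading of rising MMD as increasing distributional drift under stronger steering.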

Multi-Direction Control

MusicRFM can steer multiple musical attributes simultaneously, though doing so often increases distributional drift and reduces prompt adherence. Control strengths therefore need to be balanced carefully to limit artifacts while still achieving the desired outcomes.
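One simple way to picture multi-direction steering is as a weighted sum of unit concept directions, with each weight switched on at its own start step for staggered control. This is a hedged sketch: `combined_offset`, `staggered_coefficients`, and the hard on/off staggering are hypothetical choices for this example, not the combination rule reported in the paper.

```python
import numpy as np

def combined_offset(directions: np.ndarray, coefficients: np.ndarray) -> np.ndarray:
    """Weighted sum of unit-normalized directions.

    directions: (k, dim), one row per attribute; coefficients: (k,).
    Returns a single (dim,) offset to add to the hidden state.
    """
    units = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    return coefficients @ units

def staggered_coefficients(t: int, eta0s, start_steps) -> np.ndarray:
    """Per-attribute strength at step t: eta0 once its start step is reached, else 0."""
    return np.array([eta if t >= s else 0.0 for eta, s in zip(eta0s, start_steps)])

# Two hypothetical attribute directions (for instance a note and a chord).
directions = np.array([[2.0, 0.0, 0.0, 0.0],
                       [0.0, 4.0, 0.0, 0.0]])
eta0s = [1.0, 0.5]        # per-attribute control strengths
start_steps = [0, 5]      # second attribute enters at step 5

offset_early = combined_offset(directions, staggered_coefficients(2, eta0s, start_steps))
offset_late = combined_offset(directions, staggered_coefficients(7, eta0s, start_steps))
```

Keeping a separate coefficient per attribute is what makes the balancing act explicit: lowering one strength reduces that attribute's pull on the activations without touching the others.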

Temporal Dynamics

Time-based schedules modulate musical attributes effectively: the probe probability of the target attribute follows the prescribed schedule across timesteps (Figure 1). This dynamic control is what enables transitions and gradual variations within the generated music.

Implications and Future Directions

MusicRFM advances controllable music generation by steering a frozen model rather than retraining it. This opens practical applications in music production and interactive generative systems, where musical attributes can be modulated in real time. Future directions include extending the framework to more complex real-world music attributes beyond symbolic datasets, using temporally aware feature extraction, and exploring interactive real-time steering in performance contexts.

The methodology could also be extended to other autoregressive generative models, such as OpenAI’s Jukebox, which suggests potential applications across other audio domains.

Conclusion

MusicRFM provides a framework for controlled autoregressive music generation. By using Recursive Feature Machines to steer pre-trained models in activation space, it balances fine-grained controllability with audio quality and prompt fidelity. The work also points toward better interpretability in generative models and broader use of them in creative fields.