Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws

Published 10 May 2025 in cs.LG, stat.ML, cs.AI, and cs.CV (arXiv:2505.06699v3)

Abstract: This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting, named $\textbf{model steering}$. While ad-hoc methods have been used in various contexts, including the training of large foundation models, its underlying principles remain insufficiently understood, leading to sub-optimal performance. In this work, we propose a theory-driven framework for model steering called $\textbf{DRRho risk minimization}$, which is rooted in Distributionally Robust Optimization (DRO). Through a generalization analysis, we provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. To the best of our knowledge, this is the first time such theoretical insights are provided for the new learning paradigm, which significantly enhance our understanding and practice of model steering. Building on these insights and the connection between contrastive learning and DRO, we introduce a novel method for Contrastive Language-Image Pretraining (CLIP) with a reference model, termed DRRho-CLIP. Extensive experiments validate the theoretical insights, reveal a superior scaling law compared to CLIP without a reference model, and demonstrate its strength over existing heuristic approaches.

Summary

  • The paper introduces model steering, leveraging a reference model to reduce loss variance and establish tighter generalization bounds.
  • It presents DRRho risk minimization, a DRO-based framework that selects and weights samples using RHO loss to boost data efficiency.
  • Empirical evaluations on CLIP models show DRRho-CLIP achieves comparable or superior performance with significantly reduced data and improved scaling laws.

This paper, "Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws" (2505.06699), introduces and formalizes a novel learning paradigm called model steering. This paradigm leverages a pre-trained model (the reference model) to guide and enhance the training of a target model, primarily through strategic data selection or weighting. The core idea is distinct from knowledge distillation, as the target model can potentially surpass the reference model's performance.

The paper addresses the lack of theoretical understanding behind existing ad-hoc methods for learning with a reference model. It proposes a theory-driven framework called DRRho risk minimization, which is based on Distributionally Robust Optimization (DRO). Instead of minimizing the standard expected risk $\mathbb{E}[\ell(\theta, z)]$, DRRho risk minimization minimizes the worst-case expected RHO loss, defined as $\hat{\ell}(\theta, z) = \ell(\theta, z) - \ell(\theta_{\text{ref}}, z)$, over a set of perturbed data distributions. The DRRho risk is formalized as $F(\theta) = \sup_{\mathbf{p} \in \Delta,\, D_\phi(\mathbf{p}, \mathbf{1}/n) \le \rho/n} \sum_{i=1}^n p_i \big(\ell(\theta, z_i) - \ell(\theta_{\text{ref}}, z_i)\big)$.
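As a concrete illustration, with the KL divergence the inner supremum admits a log-mean-exp closed form. The sketch below is not the paper's implementation: it fixes the dual temperature `tau` rather than optimizing it against the radius $\rho/n$, and operates on precomputed per-sample losses.

```python
import numpy as np

def drrho_risk_kl(losses, ref_losses, tau=1.0):
    """KL-regularized dual form of the empirical DRRho risk (a sketch).

    With the KL divergence and a fixed dual temperature tau, the
    worst-case weighted average of RHO losses collapses to
        tau * log( (1/n) * sum_i exp(rho_i / tau) ),
    where rho_i = loss_i - ref_loss_i is the RHO loss.
    """
    rho = np.asarray(losses, dtype=float) - np.asarray(ref_losses, dtype=float)
    z = rho / tau
    m = z.max()  # shift for numerical stability (log-sum-exp trick)
    return tau * (m + np.log(np.mean(np.exp(z - m))))
```

By Jensen's inequality this risk upper-bounds the plain average RHO loss, with the gap controlled by `tau`: a large `tau` recovers uniform weighting, while a small `tau` concentrates on the hardest samples relative to the reference model.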

A key theoretical contribution is a generalization analysis showing why DRRho risk minimization improves generalization and data efficiency. The analysis demonstrates that the excess risk bound of DRRho is related to the variance of the RHO loss, i.e., $\mathrm{Var}(\ell(\theta_*, \cdot) - \ell(\theta_{\text{ref}}, \cdot))$, which is expected to be smaller than the variance of the standard loss $\mathrm{Var}(\ell(\theta_*, \cdot))$ if the reference model is well-trained. This suggests improved generalization compared to standard DRO or Empirical Risk Minimization (ERM). Furthermore, the theory indicates that DRRho can achieve the same level of generalization as a reference model from the same function family with significantly fewer samples ($\mathcal{O}(\sqrt{m})$ vs. $\mathcal{O}(m)$, where $m$ is the size of the reference model's training set), highlighting its data efficiency potential.
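The variance-reduction intuition can be checked numerically: if the reference model's per-sample losses track the same underlying sample difficulty as the target's, subtracting them cancels the shared component. The correlated-loss model below is a synthetic assumption for illustration, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-sample losses: a shared "difficulty" component that a
# well-trained reference model also captures, plus independent noise.
difficulty = rng.exponential(1.0, size=10_000)
target_loss = difficulty + 0.1 * rng.standard_normal(10_000)
ref_loss = difficulty + 0.1 * rng.standard_normal(10_000)

# The RHO loss cancels the shared difficulty term, so its variance is
# dominated by the small noise terms only.
rho_loss = target_loss - ref_loss
variance_reduced = np.var(rho_loss) < np.var(target_loss)
```

Here `np.var(target_loss)` is close to 1 (the difficulty variance) while `np.var(rho_loss)` is close to 0.02, mirroring the paper's empirical observation that the RHO loss has lower variance than the original loss.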

The paper also establishes connections between the DRRho framework and existing heuristic methods for data selection and weighting. For instance, with the CVaR divergence, the DRRho objective relates to selecting samples with the top-k RHO losses. With the KL-divergence, it corresponds to weighting samples based on their RHO loss values, providing a theoretical basis for methods like ActiveCLIP and JEST.
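These two special cases can be sketched in a few lines; `topk_rho_selection` and `kl_rho_weights` are hypothetical helper names for illustration, not APIs from the paper or the cited methods.

```python
import numpy as np

def topk_rho_selection(losses, ref_losses, k):
    """CVaR view: keep the k samples with the largest RHO losses."""
    rho = np.asarray(losses, dtype=float) - np.asarray(ref_losses, dtype=float)
    return np.argsort(rho)[-k:]  # indices of the top-k RHO losses

def kl_rho_weights(losses, ref_losses, tau=1.0):
    """KL view: softmax weights proportional to exp(rho / tau)."""
    rho = np.asarray(losses, dtype=float) - np.asarray(ref_losses, dtype=float)
    w = np.exp((rho - rho.max()) / tau)  # shift for numerical stability
    return w / w.sum()
```

Both rank samples by how much worse the target model does than the reference model on them; the CVaR divergence yields a hard cutoff, while the KL divergence yields soft importance weights.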

To demonstrate the practical utility of the framework, the authors apply it to Contrastive Language-Image Pretraining (CLIP), introducing DRRho-CLIP. Leveraging the connection between contrastive loss and DRO, they define DRRho contrastive losses for image and text anchors based on the RHO loss concept. For an image anchor $x_i$ and text $y_j$, the pairwise RHO loss is based on the similarity difference $s(x_i, y_j) - s(x_i, y_i)$, extended using reference model similarities. The objective becomes minimizing an average of DRRho contrastive losses over anchors. Optimization is performed using an efficient stochastic algorithm based on the SogCLR framework, which is suitable for the global nature of the contrastive loss.
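A rough sketch of such a per-anchor loss for image anchors, assuming precomputed similarity matrices from both models and a fixed temperature. This simplifies the paper's SogCLR-based algorithm, which maintains running estimates of the normalization term across batches rather than computing it in-batch.

```python
import numpy as np

def drrho_image_contrastive_loss(sim, ref_sim, tau=0.07):
    """Sketch of a DRRho contrastive loss for image anchors.

    sim[i, j]     = s(x_i, y_j) from the target model,
    ref_sim[i, j] = the same similarity from the frozen reference model.
    The pairwise RHO term subtracts the reference model's margin
    (s_ref(x_i, y_j) - s_ref(x_i, y_i)) from the target model's.
    """
    n = sim.shape[0]
    diag = np.diag(sim)[:, None]
    ref_diag = np.diag(ref_sim)[:, None]
    rho = (sim - diag) - (ref_sim - ref_diag)  # zero on the diagonal
    mask = ~np.eye(n, dtype=bool)               # negatives only
    # KL-style log-mean-exp aggregation over negatives, per anchor
    per_anchor = tau * np.log(np.mean(np.exp(rho / tau), axis=1, where=mask))
    return per_anchor.mean()
```

When the target model's similarities match the reference model's, every RHO term vanishes and the loss is zero, so gradients concentrate on pairs where the target underperforms the reference.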

For practical implementation, the paper suggests pre-computing reference model embeddings offline to avoid costly online computation of reference model losses during training, similar to other model steering or distillation approaches.
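A minimal sketch of this offline caching step; `encode_fn` is a hypothetical handle to the frozen reference encoder, assumed to return L2-normalized embeddings so that reference similarities reduce to dot products of cached vectors during training.

```python
import numpy as np

def precompute_reference_embeddings(encode_fn, data, batch_size=256):
    """One-off offline pass with the frozen reference model (a sketch).

    The cached embedding matrix is reused every epoch, so the reference
    model itself is never run inside the training loop.
    """
    chunks = [encode_fn(data[i:i + batch_size])
              for i in range(0, len(data), batch_size)]
    return np.concatenate(chunks, axis=0)
```

At training time, a batch's reference similarity matrix is then just `cached[img_idx] @ cached_text[txt_idx].T`, trading a single offline pass and some storage for the per-step cost of a second forward pass.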

Extensive experiments are conducted on large-scale datasets (CC12M, DFN-192M, DFN-12M) and various ViT models. The experimental results empirically validate the theoretical claims:

  • Data Efficiency: DRRho-CLIP is shown to achieve performance comparable to standard CLIP training (FastCLIP) using significantly less data (e.g., 50% reduction), and outperforms standard CLIP training when using the full dataset. Benefits are observed even with a weaker reference model.
  • Variance Reduction: The variance of the RHO loss is empirically shown to be lower than that of the original loss, supporting the theoretical insight.
  • Comparison with Baselines: DRRho-CLIP significantly outperforms heuristic RHO-loss based data sampling methods like JEST and JEST (Top-k).
  • Integration with Distillation: DRRho-CLIP can be seamlessly integrated with knowledge distillation. When the reference model is not significantly stronger, DRRho-CLIP alone performs better than distillation-based methods like MobileCLIP and FastCLIP (w/ Distillation). When the reference model is much stronger, DRRho-CLIP combined with distillation achieves state-of-the-art performance.
  • Scaling Law: Empirical analysis reveals that DRRho-CLIP exhibits a better scaling law (smaller error exponent $\beta$) compared to OpenCLIP, indicating that it utilizes increased computational resources more effectively for performance gains.
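Such an exponent can be estimated by fitting $\mathrm{error} \approx a \cdot C^{b}$ in log-log space. The sketch below is a generic least-squares fit, not the paper's fitting procedure:

```python
import numpy as np

def fit_scaling_exponent(compute, error):
    """Fit error ~ a * compute**b by least squares in log-log space.

    A smaller (more negative) exponent b means error falls faster as
    compute C grows, i.e. a better scaling law.
    """
    b, log_a = np.polyfit(np.log(compute), np.log(error), 1)
    return b
```

On synthetic data following an exact power law `error = 2 * C**(-0.5)`, the fit recovers the exponent -0.5; on real training runs, the fitted exponents of two methods can then be compared directly.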

In summary, the paper provides a solid theoretical foundation for model steering based on DRO and RHO loss, derives generalization bounds highlighting improved data efficiency and generalization, and translates these concepts into a practical and effective method, DRRho-CLIP, for training CLIP models, which is empirically shown to outperform existing methods and exhibit superior scaling behavior.
