Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control

Published 17 Feb 2026 in cs.IR | (2602.15659v1)

Abstract: The scarcity of high-quality training data presents a fundamental bottleneck to scaling machine learning models. This challenge is particularly acute in recommendation systems, where extreme sparsity in user interactions leads to rugged optimization landscapes and poor generalization. We propose the Recursive Self-Improving Recommendation (RSIR) framework, a paradigm in which a model bootstraps its own performance without reliance on external data or teacher models. RSIR operates in a closed loop: the current model generates plausible user interaction sequences, a fidelity-based quality control mechanism filters them for consistency with the user's approximate preference manifold, and a successor model is trained on the enriched dataset. Our theoretical analysis shows that RSIR acts as a data-driven implicit regularizer, smoothing the optimization landscape and guiding models toward more robust solutions. Empirically, RSIR yields consistent, cumulative gains across multiple benchmarks and architectures. Notably, even smaller models benefit, and weak models can generate effective training curricula for stronger ones. These results demonstrate that recursive self-improvement is a general, model-agnostic approach to overcoming data sparsity, suggesting a scalable path forward for recommender systems and beyond. Our anonymized code is available at https://anonymous.4open.science/r/RSIR-7C5B .


Authors (11)

Summary

The paper introduces RSIR, a recursive framework that bootstraps recommendation quality by generating and filtering synthetic user interactions.
It employs fidelity control to ensure generated data remains on the user preference manifold, mitigating off-distribution drift and model bias.
Empirical results show up to 14% recall improvements across benchmarks, demonstrating RSIR's robustness, scalability, and weak-to-strong transfer capability.

Recursive Self-Improvement with Fidelity Control in Recommender Systems

Introduction and Motivation

Conventional recommender systems suffer from chronic data sparsity: each user interacts with only a tiny fraction of the available items, which yields brittle optimization and poor generalization. Attempts to alleviate this with side information (metadata, LLM-generated labels) or naïve augmentation (reordering, insertion) bring their own problems: they can be costly, may not respect user preference manifolds, or may amplify model biases. The paper "Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control" (2602.15659) introduces the Recursive Self-Improving Recommendation (RSIR) framework, an architecture-agnostic, data-centric paradigm in which a model recursively generates, filters, and trains on synthetic user interactions, bootstrapping its own generalization without external teachers.

The central thesis is that even weak recommenders encode enough latent knowledge to generate high-fidelity data, given principled regularization, and that this data can drive cumulative self-improvement in the same model or in more expressive architectures.

The RSIR Framework

The RSIR process is defined as a model-centric closed loop comprising four main stages at each recursive iteration: (1) model training on the current dataset, (2) synthetic interaction generation, (3) fidelity-based quality control, and (4) data augmentation and successor training.
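The four stages can be summarized as a short control loop. The sketch below is a hypothetical skeleton, not the authors' code: `train`, `generate`, and `passes_fidelity` are placeholder callables, fidelity control is collapsed into one predicate per candidate sequence (the paper applies it step by step), and accepted synthetic sequences are assumed to accumulate across rounds.

```python
from typing import Any, Callable, List, Tuple

Seq = List[int]


def rsir_loop(
    real_data: List[Seq],
    train: Callable[[List[Seq]], Any],                 # hypothetical: fits a model on a dataset
    generate: Callable[[Any, Seq], List[Seq]],         # hypothetical: proposes synthetic sequences
    passes_fidelity: Callable[[Any, Seq, Seq], bool],  # hypothetical: quality-control predicate
    n_iterations: int = 3,
) -> Tuple[Any, List[Seq]]:
    """Sketch of a generate-filter-augment-train loop (assumption, not RSIR's code)."""
    dataset = list(real_data)
    model = train(dataset)                             # (1) train on the current dataset
    for _ in range(n_iterations):
        accepted = []
        for seq in real_data:
            for cand in generate(model, seq):          # (2) synthetic interaction generation
                if passes_fidelity(model, seq, cand):  # (3) fidelity-based quality control
                    accepted.append(cand)
        dataset.extend(accepted)                       # (4) augment (assumed cumulative) ...
        model = train(dataset)                         # ... and train the successor
    return model, dataset
```

Whether the successor is fine-tuned from its predecessor or trained from scratch is a choice hidden inside `train`; the loop structure is the same either way.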

Figure 1: Overview of the RSIR framework, detailing the recursive generate-filter-augment-train loop with fidelity control.

Synthetic Sequence Generation. For each user's existing interaction sequence $s_u$, the model autoregressively generates multiple synthetic continuations, drawing each next item from a hybrid candidate pool. With probability $p$, candidates come from the user's own history (exploitation); with probability $1-p$, they come from the global item corpus (exploration). Top-$k$ sampling then restricts each choice to the $k$ most plausible candidates while retaining some diversity.
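A minimal sketch of one generation step, assuming the model exposes a score for every item in the vocabulary. The function name, the softmax over the top-$k$ scores, and the use of NumPy are illustrative assumptions rather than details confirmed by the paper.

```python
import numpy as np


def sample_next_item(scores, history, p, k, rng):
    """Hypothetical sketch: draw one next item from a hybrid pool with top-k sampling.

    scores:  1-D array of model scores over the whole item vocabulary.
    history: item ids the user has already interacted with.
    p:       probability of drawing candidates from the user's own history.
    """
    scores = np.asarray(scores, dtype=float)
    if len(history) > 0 and rng.random() < p:
        pool = np.unique(np.asarray(history))        # exploitation: user's own items
    else:
        pool = np.arange(len(scores))                # exploration: global item corpus
    k = min(k, len(pool))
    top = pool[np.argsort(-scores[pool])[:k]]        # k highest-scoring candidates
    logits = scores[top] - scores[top].max()         # numerically stable softmax (assumed)
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(top, p=probs))
```

For example, `sample_next_item(scores, history, p=0.5, k=10, rng=np.random.default_rng(0))` explores the full corpus roughly half the time and otherwise re-samples from the user's own items.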

Fidelity-Based Quality Control. A generated step is accepted only if, after the item is appended, at least one of the user's real withheld items still ranks among the model's top-$\tau$ predictions; at the first violation, generation for that sequence terminates. This keeps all synthetic data close to the user's preference manifold and aggressively filters out off-distribution or bias-amplifying trajectories.
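The acceptance test and the early-termination rule can be sketched as follows. `score_fn` (model scores over all items given a sequence) and `next_item` (any next-item sampler) are hypothetical stand-ins; `np.argpartition` finds the top-$\tau$ set in linear time without a full sort.

```python
import numpy as np


def passes_fidelity(scores, held_out, tau):
    """True if at least one withheld real item ranks in the model's top-tau."""
    top_tau = np.argpartition(-scores, tau - 1)[:tau]   # unordered top-tau item ids
    return bool(np.isin(held_out, top_tau).any())


def generate_with_fidelity(score_fn, prefix, held_out, tau, max_len, next_item):
    """Extend `prefix` item by item, stopping at the first fidelity violation.

    score_fn and next_item are hypothetical callables supplied by the caller.
    """
    seq = list(prefix)
    for _ in range(max_len):
        candidate = seq + [next_item(score_fn(seq))]
        if not passes_fidelity(score_fn(candidate), held_out, tau):
            break                                        # violation: terminate generation
        seq = candidate                                  # accept the step
    return seq[len(prefix):]                             # synthetic continuation only
```

Because the loop breaks at the first violation, a trajectory that starts drifting away from the withheld items is cut short rather than extended further.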

Figure 2: RSIR performance as a function of the fidelity threshold $\tau$; performance peaks at an intermediate $\tau$.

Theoretical Analysis: Implicit Regularization, Error Contraction, and Manifold Geometry

RSIR does more than impose classical smoothness: it enforces smoothness along the empirical preference manifold. Because the fidelity check admits only perturbations that lie (approximately) in the manifold's tangent space $T_s\mathcal{M}$, the composite loss minimized at each iteration carries an implicit penalty:

$\Omega(\theta) \propto \|\mathcal{P}_{\mathcal{M}} \nabla_s f_\theta\|^2$

where $\mathcal{P}_{\mathcal{M}}$ is the orthogonal projector onto $T_s\mathcal{M}$ and $f_\theta$ is the recommender with parameters $\theta$.

This makes RSIR a data-driven, manifold-aware regularizer, in contrast to generic L2 or Laplacian penalties, which act in every direction and therefore overly restrict expressiveness off the manifold.
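To make the penalty concrete, the sketch below evaluates $\|\mathcal{P}_{\mathcal{M}} g\|^2$ for a given gradient $g = \nabla_s f_\theta$. Estimating $T_s\mathcal{M}$ by local PCA over neighboring sequence embeddings is an assumption made here for illustration, not the paper's construction; in RSIR itself the penalty is implicit, and nothing like this is added to the loss.

```python
import numpy as np


def local_tangent_basis(neighbors, dim):
    """Estimate a tangent basis from nearby points by local PCA (an assumption)."""
    centered = neighbors - neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T                      # columns: top-`dim` principal directions


def manifold_penalty(grad, basis):
    """Compute ||P_M grad||^2, keeping only the gradient's tangential component."""
    q, _ = np.linalg.qr(basis)             # orthonormalize the basis columns
    projected = q @ (q.T @ grad)           # P_M grad with P_M = Q Q^T
    return float(projected @ projected)
```

For points lying in a plane, a gradient normal to that plane incurs zero penalty, while its in-plane component is penalized in full; a generic gradient penalty would charge both.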

Furthermore, the recursive dynamics yield a bound on the generalization error at iteration $k+1$:

$\mathcal{E}(\theta_{k+1}) \leq (1-\lambda)\mathcal{E}_0 + \lambda\big[(1-\tilde p_k)\rho\,\mathcal{E}(\theta_k) + \tilde p_k \mathcal{E}_{\max}\big]$

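Read term by term, the bound mixes the error $\mathcal{E}_0$ attainable from real data (weight $1-\lambda$) with the error contributed by self-generated data (weight $\lambda$): a fraction $1-\tilde p_k$ of that data satisfies fidelity and contracts the previous error by a factor $\rho$, while a fraction $\tilde p_k$ violates it and can incur up to $\mathcal{E}_{\max}$. With strict fidelity control (small $\tilde p_k$) and $\rho < 1$, the error contracts geometrically, by a factor of $\lambda(1-\tilde p_k)\rho$ per round, until it settles at a low floor set by $\mathcal{E}_0$ and the occasional fidelity violations.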
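Treating the bound as an equality and holding $\tilde p_k = \tilde p$ fixed (a simplifying assumption), the recursion has the fixed point $\mathcal{E}^\ast = \frac{(1-\lambda)\mathcal{E}_0 + \lambda \tilde p\,\mathcal{E}_{\max}}{1-\lambda(1-\tilde p)\rho}$, and the gap to it shrinks by a factor $\lambda(1-\tilde p)\rho$ each round. The sketch below checks this numerically; the function names and parameter values are illustrative, not taken from the paper.

```python
def bound_trajectory(e0, e_max, lam, rho, p_tilde, e_start, rounds):
    """Iterate E_{k+1} = (1-lam)*E0 + lam*((1-p)*rho*E_k + p*E_max) as an equality."""
    errors = [e_start]
    for _ in range(rounds):
        e_k = errors[-1]
        errors.append((1 - lam) * e0 + lam * ((1 - p_tilde) * rho * e_k + p_tilde * e_max))
    return errors


def bound_fixed_point(e0, e_max, lam, rho, p_tilde):
    """Closed-form limit of the recursion; requires lam * (1 - p_tilde) * rho < 1."""
    contraction = lam * (1 - p_tilde) * rho
    return ((1 - lam) * e0 + lam * p_tilde * e_max) / (1 - contraction)


# Illustrative values only: start from the real-data error E0 = 0.30.
traj = bound_trajectory(e0=0.30, e_max=1.0, lam=0.5, rho=0.8, p_tilde=0.01, e_start=0.30, rounds=50)
floor = bound_fixed_point(e0=0.30, e_max=1.0, lam=0.5, rho=0.8, p_tilde=0.01)
print(f"after 3 rounds: {traj[3]:.4f}, floor: {floor:.4f}")
```

Subtracting the fixed point from $\mathcal{E}_0$ shows the floor lies below the real-data error exactly when $\tilde p\,\mathcal{E}_{\max} < \mathcal{E}_0\,(1-(1-\tilde p)\rho)$; as $\tilde p$ grows, the $\mathcal{E}_{\max}$ term can push the floor above $\mathcal{E}_0$, which is why a small violation rate matters.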

Empirical Evaluation

Single- and Multi-Iteration Performance

Across the Amazon-Toy, Amazon-Beauty, Amazon-Sport, and Yelp benchmarks, and across SASRec, CL4SRec, and HSTU backbones, RSIR delivers consistent, architecture-agnostic, statistically significant gains over both classical data augmentation (reordering, insertion) and recent learnable generative augmentation methods (ASReP, DiffuASR, DR4SR). Cumulative improvements reach a recall lift of up to ~14% after three iterations.

Figure 3: Performance improvement with RSIR on the Amazon-Sport dataset.

The gains hold whether the successor model is fine-tuned from its predecessor or trained from scratch, indicating that data densification, not mere memorization, underlies the improvement.

Ablation: Quality Control and Exploration

Without fidelity filtering, unconstrained recursive generation collapses catastrophically within a few rounds, as errors amplify and the generated data drifts off the manifold. The candidate-mixing probability $p$ also matters: performance peaks at $p \approx 0.5$, balancing exploitation of established interests against careful exploration. Fidelity thresholds that are either overly strict (choking diversity) or overly permissive (admitting noise) degrade performance.

Weak-to-Strong Transfer: Data Curricula

A central empirical finding is that synthetic data generated by weak models can still measurably improve strong successors (e.g., data generated by a small Transformer boosts a state-of-the-art generative recommender). Stronger teachers yield larger gains, but the benefit is not restricted to high-capacity models: because the densification is implicitly regularized and model-agnostic, it lowers the student's error floor even when the teacher is weak.

Figure 4: Improvement rate heatmap for weak-to-strong transfer; weaker teacher models still yield substantial student improvement, validating RSIR's utility beyond large models.

Quality Analysis of Generated Data

RSIR trajectories not only increase data density (+342% over eight iterations) but also substantially raise the Approximate Entropy of the generated data, indicating richer information and genuine diversity expansion rather than the degenerate modes that insertion-type augmentation introduces.
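The two statistics can be sketched as follows. Defining density as the share of observed (user, item) pairs and applying Pincus' Approximate Entropy to item-id sequences with tolerance $r = 0$ (so only identical windows match) are assumptions; the paper's exact definitions and aggregation may differ.

```python
import numpy as np


def interaction_density(user_items, n_items):
    """Share of the user-item matrix covered by distinct observed pairs (assumed definition)."""
    n_pairs = sum(len(set(items)) for items in user_items.values())
    return n_pairs / (len(user_items) * n_items)


def approximate_entropy(seq, m=2, r=0.0):
    """Pincus' ApEn(m, r) with Chebyshev distance between length-m windows."""
    x = np.asarray(seq, dtype=float)
    n = len(x)

    def phi(w):
        windows = np.array([x[i:i + w] for i in range(n - w + 1)])
        dist = np.abs(windows[:, None, :] - windows[None, :, :]).max(axis=2)
        match_frac = (dist <= r).mean(axis=1)   # self-matches included, so never 0
        return np.log(match_frac).mean()

    return phi(m) - phi(m + 1)
```

Higher ApEn means the next item is harder to predict from the preceding window, so a constant or strictly periodic sequence scores near zero while a varied one scores higher.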

Figure 5: Data density increases sharply with each RSIR iteration, validating the framework's effectiveness at mitigating extreme sparsity.

Computational Complexity and Scalability

Per-iteration compute is dominated by model training. Data generation is comparatively cheap: synthetic sequences are produced with cached autoregressive decoding and are often truncated early by the fidelity "break" condition. The cost of the fidelity checks (potentially a maximum inner product search, MIPS, over the full item vocabulary) can be reduced with clustering-based or approximate-nearest-neighbor (ANN) acceleration. RSIR's end-to-end wall-clock time is competitive with or better than that of other advanced generative augmentation methods, and it remains practical for industrial use even when each successor is retrained from scratch.
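One way to realize clustering-based acceleration is an inverted-file style search: score the cluster centroids first, then compute exact scores only for items in the most promising clusters. This is a generic sketch under assumed inputs (item embeddings, precomputed cluster assignments and centroids), not the paper's implementation; ANN libraries such as FAISS provide production versions of the same idea.

```python
import numpy as np


def approx_top_tau(query, item_emb, assignments, centroids, tau, n_probe):
    """Approximate top-tau maximum-inner-product search over clustered items."""
    probe = np.argsort(-(centroids @ query))[:n_probe]       # most promising clusters
    cand = np.flatnonzero(np.isin(assignments, probe))       # items in those clusters
    scores = item_emb[cand] @ query                          # exact scores, candidates only
    return cand[np.argsort(-scores)[:min(tau, len(cand))]]   # best tau among candidates
```

With `n_probe` equal to the number of clusters the search is exact; smaller values score far fewer items at the cost of sometimes missing a true top-$\tau$ item.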

Robustness to Data Noise

RSIR is also more robust to noise: its gains over the baseline grow as a larger share of the original interaction log is corrupted, indicating that fidelity control doubles as a denoising filter and offers a practical safeguard against noisy logs.
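Robustness experiments of this kind need a way to corrupt the log. The sketch below uses one common protocol, replacing each interaction with a uniformly random item at rate `noise_rate`; this protocol is an assumption, since the section does not specify how the noise was injected.

```python
import numpy as np


def inject_noise(sequences, n_items, noise_rate, rng):
    """Hypothetical protocol: replace each interaction with a random item w.p. noise_rate."""
    noisy = []
    for seq in sequences:
        seq = np.asarray(seq)
        corrupt = rng.random(len(seq)) < noise_rate            # positions to corrupt
        random_items = rng.integers(0, n_items, size=len(seq))
        noisy.append(np.where(corrupt, random_items, seq).tolist())
    return noisy
```

Training both RSIR and the baseline on logs corrupted at several `noise_rate` values, then comparing their gap, gives the kind of curve described above.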

Implications and Future Directions

The RSIR framework reframes self-improvement: rather than an emergent behavior of large, complex models, it becomes a model- and scale-agnostic regularization property delivered by recursive, fidelity-constrained, closed-loop data generation and retraining. This opens several directions:

Efficient curriculum generation using weak or small-scale recommenders, which is crucial for industrial scenarios with limited deployment resources.
Stackable improvement alongside external content-based, metadata, or LLM-injected augmentation; RSIR complements rather than duplicates external knowledge.
Scalable application in extremely sparse domains, as appropriate fidelity control and manifold targeting naturally mitigate distributional drift and prevent model collapse.
Extensions to other generative and sequential modeling domains (e.g., vision, language) where user- or task-specific preference manifolds admit reliable fidelity-based filtering.

Conclusion

RSIR operationalizes recursive, self-regularizing data densification in recommender systems, delivering statistically significant, cumulative performance gains by enforcing both diversity and high-fidelity adherence to user preference structure. The empirical gains hold across datasets, architectures, and augmentation baselines, and together with the theoretical analysis they chart a principled path toward robust self-improving systems that do not depend on external data or teacher models.