Repository-Centric Learning (RCL)
- Repository-Centric Learning (RCL) is a paradigm that internalizes in-depth repository-specific knowledge to enhance codebase mastery in resource-constrained settings.
- The methodology leverages four distinct experience units—design, contextual implementation, evolutionary replay, and semantic alignment—to refine performance on software engineering tasks.
- Empirical results with SWE-Spot-4B demonstrate that RCL-trained models outperform larger task-centric baselines, achieving higher efficiency and reduced inference cost.
SWE-Spot-4B is a family of small, repository-specialized LLMs designed according to the Repository-Centric Learning (RCL) paradigm. This paradigm constitutes a shift away from Task-Centric Learning (TCL), prioritizing vertical depth within a target repository to enable strong multi-task, codebase-specific mastery in highly parameter- and resource-constrained settings. Developed atop Qwen3-4B-Instruct-2507, SWE-Spot-4B demonstrates that vertically concentrated parametric expertise can allow small LLMs to surpass TCL-trained baselines—including those with 8x greater parameter counts—across a spectrum of software engineering (SWE) tasks when evaluated within their respective target repositories (Peng et al., 29 Jan 2026).
1. Paradigm Shift: Repository-Centric Learning versus Task-Centric Learning
TCL trains models horizontally, exposing them to a spread of repositories on a fixed task (e.g., issue fixing). The model thus learns broad patterns over many codebases, but is forced to rely on inference-time retrieval or search to compensate for missing detailed knowledge of any individual repository. This paradigm is brittle when small models are deployed in novel or complex codebases, especially in privacy-constrained or on-premise environments.
RCL inverts this axis: it maximizes the density and interaction depth of experience within a single target repository $\mathcal{R}$. The model is trained to internalize the architectural, dependency, and semantic invariants of $\mathcal{R}$ through parametric learning. More formally, the objective is

$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{\tau \sim \mathcal{T}_{\mathcal{R}}}\left[\mathcal{L}(\tau; \theta)\right],$$

where $\mathcal{T}_{\mathcal{R}}$ is the set of repository-specific agentic trajectories. This approach reduces the need for retrieval-based recovery during inference and centers "repository mastery" as an essential capability (Peng et al., 29 Jan 2026).
2. Theoretical Foundations and Learning Objectives
The model $\pi_\theta$ is supervised on multi-step, state–action trajectories collected from interactive experiences within $\mathcal{R}$. The cross-entropy loss minimized over all such trajectories is

$$\mathcal{L}(\theta) = -\sum_{\tau \in \mathcal{T}_{\mathcal{R}}} \sum_{t=1}^{|\tau|} \log \pi_\theta(a_t \mid s_t, \tau_{<t}).$$

Multiple types ("units") of interactive signals form a multi-task curriculum, with the loss decomposed as

$$\mathcal{L}(\theta) = \sum_{u} \lambda_u \, \mathcal{L}_u(\theta).$$

Each $\lambda_u$ weights a repository-centric experience unit, allowing independent or balanced emphasis during fine-tuning. Through these dense interactions, $\pi_\theta$ internalizes not only token-level distributions but also the latent structure (AST, dependency graph) and functional semantics of the codebase (Peng et al., 29 Jan 2026).
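The weighted multi-unit loss above can be sketched in plain Python. This is an illustrative toy, not the paper's implementation: the hard-coded token probabilities stand in for a real model's next-token distribution, and the unit names and weights are hypothetical.

```python
import math

def trajectory_nll(token_probs):
    """Negative log-likelihood of one state-action trajectory,
    given the model's probability for each emitted token."""
    return -sum(math.log(p) for p in token_probs)

def rcl_loss(trajectories_by_unit, unit_weights):
    """Multi-task RCL loss: L = sum_u lambda_u * L_u, where each L_u is
    the mean trajectory NLL for experience unit u."""
    total = 0.0
    for unit, trajectories in trajectories_by_unit.items():
        unit_loss = sum(trajectory_nll(t) for t in trajectories) / len(trajectories)
        total += unit_weights[unit] * unit_loss
    return total

# Toy example: two units with equal emphasis.
data = {
    "evolutionary_replay": [[0.9, 0.8], [0.7]],
    "contextual_implementation": [[0.6, 0.9]],
}
weights = {"evolutionary_replay": 0.5, "contextual_implementation": 0.5}
loss = rcl_loss(data, weights)
```

Tuning the `unit_weights` is what allows "independent or balanced emphasis" across units during fine-tuning, as described above.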
3. The Four Repository-Centric Experience Units
RCX (Repository-Centric Experience) is divided into four units, each imparting a distinct aspect of repository knowledge:
| Unit | Description | Modelled Competency |
|---|---|---|
| Software Design | Structured code walkthroughs, rationale reports | Architectural intent, module roles |
| Contextual Implementation | Agentic fill-in-the-middle with enforced cross-file resolution | Global convention adherence |
| Evolutionary Replay | Undoing/fixing real PRs (introducing and then rectifying bugs) | Debugging, evolutionary constraints |
| Semantic-Runtime Alignment | Writing reproduction tests for historical bugs | Spec formalization, runtime alignment |
These units yield a dense, multi-modal interactive dataset of trajectories per repository, typically synthesized from static code, historical PRs, and test frameworks using a strong teacher LM. Each unit provides orthogonal supervision: ablation experiments show that removing any unit degrades one or more task metrics, confirming the necessity of multifaceted experience (Peng et al., 29 Jan 2026).
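To make the Evolutionary Replay unit concrete, the sketch below shows one plausible way a historical bug-fix PR could be turned into a supervised example: revert the fix to reintroduce the bug, then pose the re-fix as the target. All field names and the schema are illustrative assumptions, not the paper's actual pipeline.

```python
def make_replay_example(pr):
    """Turn a merged bug-fix PR into a (buggy state, target fix) pair
    by substituting the buggy snippet back into the fixed file."""
    buggy_file = pr["file_after"].replace(pr["fixed_snippet"], pr["buggy_snippet"])
    return {
        "prompt": f"Issue: {pr['issue_title']}\nBuggy file:\n{buggy_file}",
        "target": pr["fixed_snippet"],  # supervision: the original human fix
        "unit": "evolutionary_replay",
    }

# Hypothetical PR record (not from any real repository).
pr = {
    "issue_title": "Division by zero in mean()",
    "file_after": "def mean(xs):\n    return sum(xs) / max(len(xs), 1)\n",
    "fixed_snippet": "sum(xs) / max(len(xs), 1)",
    "buggy_snippet": "sum(xs) / len(xs)",
}
example = make_replay_example(pr)
```

The same revert-then-refix pattern generalizes to whole diffs rather than single snippets; the key property is that the supervision target is grounded in a real, historically accepted fix.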
4. SWE-Spot-4B Model Family: Architecture and Training
SWE-Spot-4B utilizes Qwen3-4B-Instruct-2507 as its base—an autoregressive Transformer with 32 layers, model dimension 4096, and long (up to 48k token) context. Fine-tuning proceeds over two epochs of RCX data using AdamW, with full-weight updates and a batch size of 16 (max sequence length: 32,768 tokens). Training and validation data exclude all commits and pull requests after 2020-12-31, preventing evaluation-period code changes from leaking into training (Peng et al., 29 Jan 2026).
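For reference, the fine-tuning setup stated above can be collected into a single config. Every value here comes from the text; no additional hyperparameters (warmup, learning rate, betas) are assumed, since the source does not state them.

```python
# Fine-tuning configuration as reported for SWE-Spot-4B.
RCL_FINETUNE_CONFIG = {
    "base_model": "Qwen3-4B-Instruct-2507",
    "optimizer": "AdamW",
    "epochs": 2,
    "batch_size": 16,
    "max_seq_len": 32_768,          # tokens per training sequence
    "full_weight_updates": True,    # no adapters; cf. the LoRA ablation below
    "data_cutoff": "2020-12-31",    # commits/PRs after this date are excluded
}
```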
Key empirical benchmarks include:
- SWE-Bench-Verified: Issue resolution (masked issue + repo context code fix)
- TDD-Bench-Verified: Test case generation for bug detection and repair
- FEA-Bench: Feature implementation
- SWE-QA: Repository-scale query answering, judged by LLMs
The model is supervised against Gemini-2.5-Pro teacher outputs and evaluated both on task pass rates (issues, tests, features) and QA scores.
5. Comparative Performance and Efficiency
SWE-Spot-4B achieves the following:
| Model | Size | Issue % | Test % | Feat % | Exec Avg % | QA Score |
|---|---|---|---|---|---|---|
| GPT-4.1-mini | – | 21.79 | 22.27 | 5.70 | 17.85 | 80.28 |
| Qwen3-Coder-30B | 30 B | 16.74 | 11.85 | 3.29 | 11.56 | 65.48 |
| CWM (TCL, Meta) | 32 B | 22.22 | 17.38 | 4.17 | 15.88 | 73.09 |
| Mini-Coder-4B (TCL) | 4 B | 18.76 | 0.63 | 4.61 | 8.70 | 57.30 |
| SWE-Spot-4B (RCL) | 4 B | 19.34 | 22.75 | 5.92 | 17.12 | 78.05 |
Despite a 4B parameter budget, SWE-Spot-4B matches or exceeds 32B open-weight models and efficiency-optimized commercial models (GPT-4.1-mini) across all core SWE tasks. On Django, RCL-trained models surpass TCL's peak with roughly half the data. At equal data budgets, RCL shows higher sample efficiency, faster negative log-likelihood convergence, and reduced inference cost (fewer turns and tokens per solution) (Peng et al., 29 Jan 2026).
6. Ablation, Cross-Repo Generalization, and Practical Deployment
Ablation studies confirm that the synergy between all four RCX units is essential: removing, for example, Evolutionary Replay slashes test task accuracy, while dropping Contextual Implementation severely impairs feature task performance.
RCL-trained models are characteristically insensitive to "oracle context" at inference: providing perfect file/function hints yields negligible gains, in contrast to TCL baselines. Fine-tuning with LoRA alone does not recover full performance, indicating that deep, parametric adaptation is needed for true repository-specific mastery.
Multi-repository RCL ("joint RCL") can in some cases increase pass rates (e.g., Django, Sympy), but may cause interference in others, suggesting that the optimal scope for parametric specialization remains an open area for investigation (Peng et al., 29 Jan 2026).
Practitioners are advised to:
- Prefer full-weight, RCX-based fine-tuning for on-premise models.
- Prioritize Evolutionary Replay and Contextual Implementation when resources are constrained.
- Periodically re-sample RCX units on new repository commits to maintain up-to-date expertise.
- Explore hybrid deployment: use RCL-specialists for efficiency and coverage in the target repo, with TCL generalists for novel, out-of-distribution tasks.
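The third recommendation above—re-sampling RCX units as the repository evolves—can be sketched as a simple staleness check: compare the commit the model was last trained on against the repo's current HEAD and decide which units to regenerate. The hashes, unit names, and prioritization logic are illustrative assumptions, not a prescribed mechanism.

```python
def units_to_resample(trained_head: str, current_head: str) -> list:
    """Return the RCX units to regenerate if the repository has moved
    past the commit the model was trained on (empty list if up to date)."""
    if trained_head == current_head:
        return []
    # New commits can invalidate all four units; Evolutionary Replay and
    # Contextual Implementation come first under tight budgets (see above).
    return [
        "evolutionary_replay",
        "contextual_implementation",
        "software_design",
        "semantic_runtime_alignment",
    ]

stale = units_to_resample("a1b2c3", "d4e5f6")   # hypothetical hashes
fresh = units_to_resample("a1b2c3", "a1b2c3")
```

In practice the current HEAD would come from `git rev-parse HEAD`, and regeneration cost could be bounded by re-sampling only units whose source material (PRs, tests, touched files) changed.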
7. Limitations and Directions for Future Work
SWE-Spot-4B and the RCL approach as presented rely on static, teacher-generated data and supervised fine-tuning. Real-time on-policy RL or reward modeling has not yet been leveraged but may offer additional gains. Inter-repository transfer—both positive and negative—remains to be systematically characterized. There is a recognized demand for continual learning strategies that minimize the cost of repeated adaptation while maintaining repository-specific priors as codebases evolve.
Repository-centric proficiency is not a replacement for general code understanding, but serves as an empirically and theoretically necessary axis for building small, efficient, and highly capable coding agents in diverse deployment contexts (Peng et al., 29 Jan 2026).