Repository-Centric Learning (RCL)
- Repository-Centric Learning (RCL) is a paradigm that internalizes in-depth repository-specific knowledge to enhance codebase mastery in resource-constrained settings.
- The methodology leverages four distinct experience units—design, contextual implementation, evolutionary replay, and semantic alignment—to refine performance on software engineering tasks.
- Empirical results with SWE-Spot-4B demonstrate that RCL-trained models outperform larger task-centric baselines, achieving higher efficiency and reduced inference cost.
SWE-Spot-4B is a family of small, repository-specialized LLMs designed according to the Repository-Centric Learning (RCL) paradigm. This paradigm constitutes a shift away from Task-Centric Learning (TCL), prioritizing vertical depth within a target repository to enable strong multi-task, codebase-specific mastery in highly parameter- and resource-constrained settings. Developed atop Qwen3-4B-Instruct-2507, SWE-Spot-4B demonstrates that vertically concentrated parametric expertise can allow small LLMs to surpass TCL-trained baselines—including those with 8x greater parameter counts—across a spectrum of software engineering (SWE) tasks when evaluated within their respective target repositories (Peng et al., 29 Jan 2026).
1. Paradigm Shift: Repository-Centric Learning versus Task-Centric Learning
TCL trains models horizontally, exposing them to a spread of repositories on a fixed task (e.g., issue fixing). The model thus learns broad patterns over many codebases, but is forced to rely on inference-time retrieval or search to compensate for missing detailed knowledge of any individual repository. This paradigm is brittle when small models are deployed in novel or complex codebases, especially in privacy-constrained or on-premise environments.
RCL inverts this axis: it maximizes the density and interaction depth of experience within a single target repository $\mathcal{R}$. The model is trained to internalize the architectural, dependency, and semantic invariants of $\mathcal{R}$ through parametric learning. More formally, the objective is

$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{\tau \sim \mathcal{T}_{\mathcal{R}}}\left[\mathcal{L}(\tau; \theta)\right],$$

where $\mathcal{T}_{\mathcal{R}}$ is the set of repository-specific agentic trajectories. This approach reduces the need for retrieval-based recovery during inference and centers "repository mastery" as an essential capability (Peng et al., 29 Jan 2026).
2. Theoretical Foundations and Learning Objectives
The model $\pi_\theta$ is supervised on multi-step, state–action trajectories collected from interactive experiences within $\mathcal{R}$. The cross-entropy loss minimized over all such trajectories is

$$\mathcal{L}(\theta) = -\sum_{\tau \in \mathcal{T}_{\mathcal{R}}} \sum_{t=1}^{|\tau|} \log \pi_\theta(a_t \mid s_t, \tau_{<t}).$$

Multiple types ("units") of interactive signals form a multi-task curriculum, with the loss decomposed as

$$\mathcal{L}(\theta) = \sum_{u} \lambda_u \, \mathcal{L}_u(\theta).$$

Each $\lambda_u$ weights a repository-centric experience unit, allowing independent or balanced emphasis during fine-tuning. Through these dense interactions, $\pi_\theta$ internalizes not only token-level distributions but also the latent structure (AST, dependency graph) and functional semantics of the codebase (Peng et al., 29 Jan 2026).
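The weighted multi-unit loss above can be sketched in plain Python. This is an illustrative toy, not the paper's implementation: the hard-coded token probabilities stand in for a real model's next-token distribution, and the unit names and weights are hypothetical.

```python
import math

def trajectory_nll(token_probs):
    """Negative log-likelihood of one state-action trajectory,
    given the model's probability for each emitted token."""
    return -sum(math.log(p) for p in token_probs)

def rcl_loss(trajectories_by_unit, unit_weights):
    """Multi-task RCL loss: L = sum_u lambda_u * L_u, where each L_u is
    the mean trajectory NLL for experience unit u."""
    total = 0.0
    for unit, trajectories in trajectories_by_unit.items():
        unit_loss = sum(trajectory_nll(t) for t in trajectories) / len(trajectories)
        total += unit_weights[unit] * unit_loss
    return total

# Toy example: two units with equal emphasis.
data = {
    "evolutionary_replay": [[0.9, 0.8], [0.7]],
    "contextual_implementation": [[0.6, 0.9]],
}
weights = {"evolutionary_replay": 0.5, "contextual_implementation": 0.5}
loss = rcl_loss(data, weights)
```

Tuning the `unit_weights` is what allows "independent or balanced emphasis" across units during fine-tuning, as described above.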
3. The Four Repository-Centric Experience Units
RCX (Repository-Centric Experience) is divided into four units, each imparting a distinct aspect of repository knowledge:
| Unit | Description | Modelled Competency |
|---|---|---|
| Software Design | Structured code walkthroughs, rationale reports | Architectural intent, module roles |
| Contextual Implementation | Agentic fill-in-the-middle with enforced cross-file resolution | Global convention adherence |
| Evolutionary Replay | Undoing/fixing real PRs (introducing and then rectifying bugs) | Debugging, evolutionary constraints |
| Semantic-Runtime Alignment | Writing reproduction tests for historical bugs | Spec formalization, runtime alignment |
These units yield a dense, multi-modal interactive dataset of trajectories per repository, typically synthesized from static code, historical PRs, and test frameworks using a strong teacher LM. Each unit provides orthogonal supervision: ablation experiments show that removing any unit degrades one or more task metrics, confirming the necessity of multifaceted experience (Peng et al., 29 Jan 2026).
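To make the Evolutionary Replay unit concrete, the sketch below shows one plausible way a historical bug-fix PR could be turned into a supervised example: revert the fix to reintroduce the bug, then pose the re-fix as the target. All field names and the schema are illustrative assumptions, not the paper's actual pipeline.

```python
def make_replay_example(pr):
    """Turn a merged bug-fix PR into a (buggy state, target fix) pair
    by substituting the buggy snippet back into the fixed file."""
    buggy_file = pr["file_after"].replace(pr["fixed_snippet"], pr["buggy_snippet"])
    return {
        "prompt": f"Issue: {pr['issue_title']}\nBuggy file:\n{buggy_file}",
        "target": pr["fixed_snippet"],  # supervision: the original human fix
        "unit": "evolutionary_replay",
    }

# Hypothetical PR record (not from any real repository).
pr = {
    "issue_title": "Division by zero in mean()",
    "file_after": "def mean(xs):\n    return sum(xs) / max(len(xs), 1)\n",
    "fixed_snippet": "sum(xs) / max(len(xs), 1)",
    "buggy_snippet": "sum(xs) / len(xs)",
}
example = make_replay_example(pr)
```

The same revert-then-refix pattern generalizes to whole diffs rather than single snippets; the key property is that the supervision target is grounded in a real, historically accepted fix.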
4. SWE-Spot-4B Model Family: Architecture and Training
SWE-Spot-4B utilizes Qwen3-4B-Instruct-2507 as its base—an autoregressive Transformer with 32 layers, model dimension 4096, and long (up to 48k token) context. Fine-tuning proceeds over two epochs of RCX data using AdamW, with full-weight updates and a batch size of 16 (max sequence length: 32,768 tokens). Training and validation data exclude all commits and pull requests after 2020-12-31, preventing evaluation-period code changes from leaking into training (Peng et al., 29 Jan 2026).
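For reference, the fine-tuning setup stated above can be collected into a single config. Every value here comes from the text; no additional hyperparameters (warmup, learning rate, betas) are assumed, since the source does not state them.

```python
# Fine-tuning configuration as reported for SWE-Spot-4B.
RCL_FINETUNE_CONFIG = {
    "base_model": "Qwen3-4B-Instruct-2507",
    "optimizer": "AdamW",
    "epochs": 2,
    "batch_size": 16,
    "max_seq_len": 32_768,          # tokens per training sequence
    "full_weight_updates": True,    # no adapters; cf. the LoRA ablation below
    "data_cutoff": "2020-12-31",    # commits/PRs after this date are excluded
}
```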
Key empirical benchmarks include:
- SWE-Bench-Verified: Issue resolution (masked issue + repo context code fix)
- TDD-Bench-Verified: Test case generation for bug detection and repair
- FEA-Bench: Feature implementation
- SWE-QA: Repository-scale query answering, judged by LLMs
The model is supervised against Gemini-2.5-Pro teacher outputs and evaluated both on task pass rates (issues, tests, features) and QA scores.
5. Comparative Performance and Efficiency
SWE-Spot-4B achieves the following:
| Model | Size | Issue % | Test % | Feat % | Exec Avg % | QA Score |
|---|---|---|---|---|---|---|
| GPT-4.1-mini | – | 21.79 | 22.27 | 5.70 | 17.85 | 80.28 |
| Qwen3-Coder-30B | 30 B | 16.74 | 11.85 | 3.29 | 11.56 | 65.48 |
| CWM (TCL, Meta) | 32 B | 22.22 | 17.38 | 4.17 | 15.88 | 73.09 |
| Mini-Coder-4B (TCL) | 4 B | 18.76 | 0.63 | 4.61 | 8.70 | 57.30 |
| SWE-Spot-4B (RCL) | 4 B | 19.34 | 22.75 | 5.92 | 17.12 | 78.05 |
Despite a 4B parameter budget, SWE-Spot-4B matches or exceeds 32B open-weight models and efficiency-optimized commercial models (GPT-4.1-mini) across all core SWE tasks. On Django, RCL-trained models surpass TCL's peak with roughly half the data. At equal data budgets, RCL shows higher sample efficiency, faster negative log-likelihood convergence, and reduced inference cost (fewer turns and tokens per solution) (Peng et al., 29 Jan 2026).
6. Ablation, Cross-Repo Generalization, and Practical Deployment
Ablation studies confirm that the synergy between all four RCX units is essential: removing, for example, Evolutionary Replay slashes test task accuracy, while dropping Contextual Implementation severely impairs feature task performance.
RCL-trained models are characteristically insensitive to "oracle context" at inference: providing perfect file/function hints yields negligible gains, in contrast to TCL baselines. Fine-tuning with LoRA alone does not recover full performance, indicating that deep, parametric adaptation is needed for true repository-specific mastery.
Multi-repository RCL ("joint RCL") can in some cases increase pass rates (e.g., Django, Sympy), but may cause interference in others, suggesting that the optimal scope for parametric specialization remains an open area for investigation (Peng et al., 29 Jan 2026).
Practitioners are advised to:
- Prefer full-weight, RCX-based fine-tuning for on-premise models.
- Prioritize Evolutionary Replay and Contextual Implementation when resources are constrained.
- Periodically re-sample RCX units on new repository commits to maintain up-to-date expertise.
- Explore hybrid deployment: use RCL-specialists for efficiency and coverage in the target repo, with TCL generalists for novel, out-of-distribution tasks.
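The third recommendation above—re-sampling RCX units as the repository evolves—can be sketched as a simple staleness check: compare the commit the model was last trained on against the repo's current HEAD and decide which units to regenerate. The hashes, unit names, and prioritization logic are illustrative assumptions, not a prescribed mechanism.

```python
def units_to_resample(trained_head: str, current_head: str) -> list:
    """Return the RCX units to regenerate if the repository has moved
    past the commit the model was trained on (empty list if up to date)."""
    if trained_head == current_head:
        return []
    # New commits can invalidate all four units; Evolutionary Replay and
    # Contextual Implementation come first under tight budgets (see above).
    return [
        "evolutionary_replay",
        "contextual_implementation",
        "software_design",
        "semantic_runtime_alignment",
    ]

stale = units_to_resample("a1b2c3", "d4e5f6")   # hypothetical hashes
fresh = units_to_resample("a1b2c3", "a1b2c3")
```

In practice the current HEAD would come from `git rev-parse HEAD`, and regeneration cost could be bounded by re-sampling only units whose source material (PRs, tests, touched files) changed.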
7. Limitations and Directions for Future Work
SWE-Spot-4B and the RCL approach as presented rely on static, teacher-generated data and supervised fine-tuning. Real-time on-policy RL or reward modeling has not yet been leveraged but may offer additional gains. Inter-repository transfer—both positive and negative—remains to be systematically characterized. There is a recognized demand for continual learning strategies that minimize the cost of repeated adaptation while maintaining repository-specific priors as codebases evolve.
Repository-centric proficiency is not a replacement for general code understanding, but serves as an empirically and theoretically necessary axis for building small, efficient, and highly capable coding agents in diverse deployment contexts (Peng et al., 29 Jan 2026).