Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

Published 23 Jun 2025 in cs.AI, cs.CL, and cs.LG | arXiv:2506.18777v1

Abstract: Training LLMs on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O examples. To explore this idea, we finetune LLMs on two sets of programs representing simple maths problems and algorithms: one with source code and I/O examples (w/ IO), the other with source code only (w/o IO). We find evidence that LLMs have some ability to evaluate w/o IO programs for inputs in a range of experimental settings, and make several observations. Firstly, PBB works significantly better when programs are provided as code rather than semantically equivalent language descriptions. Secondly, LLMs can produce outputs for w/o IO programs directly, by implicitly evaluating the program within the forward pass, and more reliably when stepping through the program in-context via chain-of-thought. We further show that PBB leads to more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution that mirrors naturally occurring data. Our findings suggest a mechanism for enhanced reasoning through code training: it allows LLMs to internalise reusable algorithmic abstractions. Significant scope remains for future work to enable LLMs to more effectively learn from symbolic procedures, and progress in this direction opens other avenues like model alignment by training on formal constitutional principles.

Summary

  • The paper demonstrates that LLMs trained solely on source code can internalize algorithmic procedures without explicit I/O examples.
  • It uses a two-stage finetuning paradigm with diverse datasets to reveal that code's structure enhances abstraction learning.
  • Results indicate that reinforcement learning in the second training stage improves retroactive generalization, and that the learned abstractions transfer across structurally distinct domains.

Programming by Backprop: Internalization of Algorithmic Abstractions in LLMs via Code Training

The paper "Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training" (arXiv:2506.18777) presents a systematic investigation into the mechanisms by which LLMs internalize algorithmic procedures through exposure to source code, even in the absence of explicit input-output (I/O) examples. The authors introduce the concept of Programming by Backprop (PBB), defined as the ability of an LLM to evaluate programs for arbitrary inputs after being trained solely on their source code, without ever observing corresponding I/O pairs.

Experimental Framework

The study employs a controlled two-stage finetuning paradigm, distinguishing between two groups of programs:

  • w/ IO group: Programs for which both source code and I/O examples are provided during training.
  • w/o IO group: Programs for which only the source code is seen during training; I/O pairs are withheld until evaluation.

The authors evaluate whether LLMs can, after such training, correctly compute outputs for new inputs to w/o IO programs, thus demonstrating the internalization of the underlying algorithmic abstraction.
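
To make the two conditions concrete, the following sketch shows what the two kinds of training documents might look like for a toy arithmetic program. The helper names and document format here are illustrative assumptions, not the paper's exact data layout.

```python
# Illustrative sketch of the two training conditions (assumed format,
# not the paper's exact data layout).

def make_program(name: str, a: int, b: int) -> str:
    """Source code of a toy arithmetic program."""
    return f"def {name}(x):\n    return x * {a} + {b}\n"

def with_io_document(name: str, a: int, b: int, inputs: list[int]) -> str:
    """w/ IO condition: source code followed by worked I/O examples."""
    examples = "\n".join(f"{name}({x}) = {x * a + b}" for x in inputs)
    return make_program(name, a, b) + "\n" + examples

def without_io_document(name: str, a: int, b: int) -> str:
    """w/o IO condition: the source code alone; I/O pairs are withheld
    until evaluation."""
    return make_program(name, a, b)
```

The key experimental question is then whether a model finetuned on `without_io_document`-style data can nonetheless answer queries about the program's outputs.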

Three datasets are constructed for empirical analysis:

  1. Random Arithmetic: Synthetic Python programs with varying complexity.
  2. Leetcode: Real-world algorithmic problems and solutions.
  3. Ciphers: Custom and standard encryption algorithms, including novel variants to minimize pretraining overlap.
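
As a flavour of the third dataset, a novel cipher variant might combine familiar primitives in an unfamiliar way so that the resulting algorithm cannot have been memorized during pretraining. The cipher below is an invented example of this kind, not one taken from the paper.

```python
# Invented example of a novel cipher variant of the kind the Ciphers
# dataset uses to minimise pretraining overlap; not from the paper.

def shift_reverse_cipher(text: str, shift: int) -> str:
    """Shift each lowercase letter by `shift` positions (mod 26),
    leave other characters unchanged, then reverse the string."""
    shifted = "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )
    return shifted[::-1]
```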

Both direct answer generation and chain-of-thought (CoT) prompting are used to probe the models' reasoning capabilities.
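
The two probing modes can be sketched as follows; the prompt wording is an assumption for illustration, not the paper's exact templates.

```python
# Hedged sketch of the two evaluation modes; prompt wording is assumed.

def direct_prompt(name: str, x: int) -> str:
    """Direct answer generation: the model must evaluate the program
    implicitly, within the forward pass, with no reasoning tokens."""
    return f"What is {name}({x})? Respond with the number only."

def cot_prompt(name: str, x: int) -> str:
    """Chain-of-thought: the model steps through the program
    in-context before committing to an answer."""
    return (
        f"What is {name}({x})? Recall the definition of {name} and "
        f"evaluate it step by step, then state the final answer."
    )
```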

Key Findings

The results provide several notable insights:

  • Code as a Superior Medium for Abstraction: LLMs trained on code, rather than semantically equivalent natural language descriptions, exhibit significantly higher accuracy in evaluating w/o IO programs. This suggests that the syntactic and structural properties of code facilitate the internalization of reusable algorithmic abstractions.
  • Implicit and Explicit Program Execution: Models can perform program evaluation both implicitly (direct answer generation) and explicitly (stepwise CoT reasoning). Larger models (e.g., Llama-3.1-8B-Instruct, GPT-4o) demonstrate improved capacity for implicit execution, though CoT remains more reliable, especially for longer or composite programs.
  • Generalization Across Domains and Compositions: Training on w/ IO programs from one domain (e.g., Leetcode) enables transfer to structurally distinct w/o IO programs (e.g., custom ciphers). Furthermore, GPT-4o is able to evaluate compositions of independently learned programs without explicit CoT, a capability not observed in smaller models.
  • Reinforcement Learning Enhances Retroactive Generalization: When the second stage of training employs reinforcement learning (RL) rather than supervised finetuning (SFT), models exhibit improved retroactive generalization—i.e., the ability to apply previously learned procedural knowledge to new inputs for earlier-seen programs. This aligns with the hypothesis that RL encourages more generalizable strategies than SFT.
  • Mitigation of Data Distribution Biases: PBB-trained models display more uniform performance across input parameter variations compared to models trained solely on I/O pairs sampled from naturally imbalanced distributions. This is particularly evident in the cipher experiments, where PBB reduces the "embers of autoregression" effect, leading to more robust generalization.
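
The contrast in the last finding can be sketched as follows. The Zipf-like sampler below is an assumption about what "naturally imbalanced" input distributions look like, not the paper's exact procedure.

```python
import random

# Assumed illustration of naturally occurring imbalance: cipher shift
# parameters sampled with a Zipf-like bias toward small values. The
# paper's actual distribution may differ.

def skewed_shifts(n: int, max_shift: int = 26, seed: int = 0) -> list[int]:
    """Sample shift parameters with a heavy bias toward small values."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) for k in range(max_shift)]
    return rng.choices(range(max_shift), weights=weights, k=n)
```

A model trained only on I/O pairs sampled this way sees rare parameter values almost never and tends to degrade on them; training on the cipher's source code (PBB) is, by construction, uniform over the parameter, which is the mechanism behind the more even performance reported above.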

Numerical Results

  • On random arithmetic tasks, accuracy on w/o IO programs increases with model scale and is consistently higher when using CoT prompting.
  • In the cipher domain, GPT-4o trained via PBB achieves nontrivial accuracy on custom ciphers with zero pretraining overlap, and its performance is less sensitive to parameter frequency than models trained on biased I/O data.
  • For composite functions, GPT-4o demonstrates the ability to retrieve and execute multiple program definitions in sequence, a capability not present in smaller Llama models.
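
To make the composite-function result concrete: suppose the model saw the two invented toy functions below only as separate source files during finetuning, with no I/O examples for either. Evaluating their composition then requires retrieving both definitions and applying them in sequence.

```python
# Invented toy functions illustrating "compositions of independently
# learned programs"; the model saw f and g only as separate source code.

def f(x: int) -> int:
    return 3 * x + 1

def g(x: int) -> int:
    return x * x - 2

# Answering g(f(2)) requires chaining the two learned procedures:
# f(2) = 7, then g(7) = 47.
composite = g(f(2))
```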

Implications

Practical Implications:

  • Efficient Algorithm Injection: PBB enables the injection of new algorithmic capabilities into LLMs without the need for extensive demonstration data, which is often costly or infeasible to obtain for novel tasks.
  • Model Alignment and Safety: The ability to internalize symbolic procedures from code suggests a pathway for aligning LLM behavior with formal principles or constitutional rules, potentially improving interpretability and controllability.
  • Robustness to Data Biases: Training on code abstracts away from the idiosyncrasies of naturally occurring I/O distributions, leading to more reliable generalization across the input space.

Theoretical Implications:

  • The findings support the view that LLMs can internalize reusable, input-general algorithmic abstractions, rather than merely memorizing surface patterns or relying on heuristics.
  • The superior efficacy of code over natural language for abstraction learning raises questions about the inductive biases of current architectures and pretraining regimes.
  • The observed benefits of RL for retroactive generalization suggest that on-policy data and negative sampling play a critical role in enabling models to apply learned procedures flexibly.

Limitations and Future Directions

The study is conducted in a controlled finetuning regime with synthetic and moderately complex real-world tasks. The extent to which these findings scale to pretraining on massive, heterogeneous corpora remains an open question. Additionally, while PBB is shown to be effective for relatively simple algorithms, its efficacy for more complex, multi-step procedures warrants further investigation.

Future research directions include:

  • Scaling PBB to pretraining and evaluating its impact on emergent reasoning abilities.
  • Systematic exploration of synthetic code generation to facilitate abstraction learning.
  • Investigating the alignment of LLMs to formal principles via symbolic code training.
  • Mechanistic interpretability studies to elucidate how algorithmic abstractions are represented within model weights.

Conclusion

This work provides compelling evidence that LLMs, when trained on code, acquire internal representations of algorithmic procedures that are reusable across tasks and domains. The Programming by Backprop paradigm offers a principled approach for endowing LLMs with new capabilities and improving their generalization, robustness, and alignment. The results motivate further exploration of code-centric training regimes and their implications for the development of more capable and controllable LLMs.
