
Per-Instance Program Synthesis (PIPS)

Updated 28 October 2025
  • Per-Instance Program Synthesis (PIPS) is a paradigm that generates programs tailored to satisfy specific input-output examples rather than generalizing across a domain.
  • It leverages diverse methodologies—from neural aggregation and human-LLM collaboration to probabilistic search and RL-guided techniques—to optimize synthesis efficiency and correctness.
  • By addressing intrinsic complexity challenges, PIPS enhances practical tractability and error feedback, paving the way for scalable, instance-specific automation.

Per-Instance Program Synthesis (PIPS) refers to the paradigm in which a program is synthesized to satisfy the specification for a single instance or a specific collection of task-specific input-output examples rather than aspiring to generate a general-purpose program for broad classes of instances. This task has gained prominence due to applications spanning data manipulation, code automation, software analysis, algorithmic reasoning for LLMs, and interaction-rich synthesis tools. PIPS unifies and challenges methodologies from programming languages, machine learning, and human-computer interaction by centering synthesis efficiency, correctness, generalization, and practical tractability on a per-instance basis.

1. Problem Definition and Conceptual Framework

Per-Instance Program Synthesis formalizes the objective: Given a task specification (such as a set of input-output pairs, a sketch with holes, or a behavioral trace), discover a program that meets the specification for that particular instance.

The formalization is typically as follows:

  • Classical PIPS: For a fixed set of IO pairs $X = \{(x_i, y_i)\}_{i=1}^N$, synthesize a program $p^*$ such that $\forall (x_i, y_i) \in X,\ p^*(x_i) = y_i$.
  • Logic-based PIPS: For a formula $\phi(\cdot)$ over functions/programs and an instance, find a (possibly finite-state) program $f$ such that $\forall x \in D,\ \phi(f(x), x)$.
  • Meta-complexity PIPS: Decide for each instance whether a synthesizing program even exists, as studied in the context of the arithmetical hierarchy (Kim, 2024).

This per-instance focus distinguishes PIPS from more traditional inductive synthesis, where generalization to an entire input domain is primary.
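The classical formulation can be made concrete with a toy enumerative synthesizer; the DSL and primitive names here are illustrative, not drawn from any cited system:

```python
from itertools import product

# Toy per-instance synthesizer: enumerate compositions of primitives
# until one satisfies every input-output pair in X.
PRIMITIVES = {
    "inc": lambda v: v + 1,
    "dbl": lambda v: v * 2,
    "neg": lambda v: -v,
}

def synthesize(io_pairs, max_depth=3):
    """Return the first primitive sequence p* with p*(x_i) == y_i for all pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for name in names:
                    x = PRIMITIVES[name](x)
                return x
            if all(run(x) == y for x, y in io_pairs):
                return names
    return None  # no program within the depth bound

# Example: y = 2 * (x + 1) is solved by the sequence ("inc", "dbl").
print(synthesize([(1, 4), (3, 8), (0, 2)]))
```

Real systems replace brute-force enumeration with learned guidance or aggressive pruning, but the per-instance acceptance test is the same: satisfy exactly the given pairs.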

2. Principal Methodologies

The diversity of PIPS research is reflected in the array of methodologies developed:

2.1 Two-stage Neural Aggregation Approaches

Neural Per-Example Program Synthesis (N-PEPS): A two-stage paradigm comprising per-example solution discovery and global aggregation (Shrivastava et al., 2021):

  • Stage 1 (Per-Example): Independently synthesize programs $p_i$ that solve single IO pairs, leveraging the tractability of partial solutions.
  • Stage 2 (Aggregation): Employ a Cross Aggregator (CA) module—a transformer-style, multi-head attention network—to fuse execution cues from per-example solutions, producing programs that may exhibit behaviors requiring statement invention or cross-example generalization. The CA mechanism uses embeddings of per-example execution states and modulates cross-attention with per-example solution scores.

This decoupling exploits the empirical observation that per-example synthesis is much easier than joint synthesis and shows dramatic empirical improvements over previous state-of-the-art, especially under strict resource limits.
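The two-stage decoupling can be illustrated with a drastically simplified sketch; the real Cross Aggregator is a learned attention module, whereas here "aggregation" is just a cross-check that promotes any per-example solution satisfying all examples:

```python
from itertools import product

# Illustrative toy DSL; not the DeepCoder-style DSL used by N-PEPS.
PRIMS = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2, "sq": lambda v: v * v}

def run(names, x):
    for n in names:
        x = PRIMS[n](x)
    return x

def per_example_solutions(io_pairs, depth=2):
    """Stage 1: independently find short programs for each single IO pair."""
    sols = []
    for x, y in io_pairs:
        sols.append([p for d in range(1, depth + 1)
                     for p in product(PRIMS, repeat=d) if run(p, x) == y])
    return sols

def aggregate(io_pairs, sols):
    """Stage 2 (stand-in for the Cross Aggregator): reuse per-example
    candidates as global proposals and keep one that fits all pairs."""
    for cand in (p for per_ex in sols for p in per_ex):
        if all(run(cand, x) == y for x, y in io_pairs):
            return cand
    return None

pairs = [(1, 4), (2, 6), (0, 2)]   # y = 2*(x+1)
print(aggregate(pairs, per_example_solutions(pairs)))
```

Even this crude cross-check shows why the decomposition helps: each per-example search is over a much smaller effective space than a joint search over all pairs at once.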

2.2 Human-LLM Collaborative Structuring

Structured Inductive Programming with LLMs: Interactive synthesis for rich tasks, such as image-to-image transformations, leverages LLMs and declarative structuring via Data Flow Diagrams (DFD) (Surana et al., 15 Jun 2025).

  • Task Decomposition: LLMs propose and refine decompositions of the global task into sub-tasks with a DFD, with human adversarial/protocol-driven feedback (RATIFY/REFUTE/REVISE/REJECT).
  • Module Synthesis: LLM-human interaction iterates per DFD node to ensure correctness; correct code is frozen and reused.
  • Program Assembly: Human-in-the-loop ensures ratification, pruning, and compositional reuse.

Empirical results in the IPARC benchmark demonstrate reliable per-instance program synthesis across all categories, with studies emphasizing the necessity of collaborative structuring and error protocols.

2.3 Probabilistic Search Structures

P-Tree Programming (PTP): Eschews explicit populations for a compact probabilistic prototype tree that encodes the entire search space (Oesch, 2017). Each PIPS instance proceeds by:

  • Generation: Traversing the tree by sampling choices per node, forming an instance program.
  • Evaluation and Learning: Each instance’s error is propagated up the tree; probability distributions over choices are updated via rank-based power-law, enabling efficient, memory-minimal, per-instance adaptability.

PTP’s local error propagation and non-discarding of sub-expression information sharply contrast with standard genetic programming, and this approach demonstrates strong per-instance responsiveness.
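A one-node sketch of the P-Tree idea (illustrative, not the published implementation): a probability distribution over choices is sampled to form an instance program, and error feedback shifts probability mass toward better-ranked choices, a crude stand-in for the rank-based power-law update:

```python
import random

random.seed(0)

choices = ["inc", "dbl", "sq"]
probs = {c: 1.0 / len(choices) for c in choices}
OPS = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2, "sq": lambda v: v * v}

def error(op, io_pairs):
    """Total absolute error of a single-op program on the instance's IO pairs."""
    return sum(abs(OPS[op](x) - y) for x, y in io_pairs)

def step(io_pairs, lr=0.5):
    """Sample a choice from the node, evaluate all choices, and nudge the
    distribution toward the best-ranked one."""
    sampled = random.choices(choices, weights=[probs[c] for c in choices])[0]
    best = min(choices, key=lambda c: error(c, io_pairs))
    for c in choices:
        target = 1.0 if c == best else 0.0
        probs[c] += lr * (target - probs[c])
    return sampled, best

pairs = [(2, 4), (3, 6)]   # doubling
for _ in range(5):
    step(pairs)
print(max(probs, key=probs.get))
```

The full P-Tree extends this to a prototype tree of such nodes, so sub-expression statistics persist across iterations rather than being discarded as in population-based genetic programming.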

2.4 Probabilistic Synthesis for Black-box Components

Presyn: Targets PIPS where the specification is provided by black-box library behavior (Collie et al., 2020).

  • Modeling: Two probabilistic models are combined: an IID model predicts which fragments (program blocks) appear, and a Markov model predicts plausible fragment compositions.
  • Synthesis Procedure: Candidate sketches are concretized and checked against black-box I/O equivalency. This hybrid method enables instance-specific synthesis in environments with rich control flow.

Presyn achieves high coverage and minimal user input across benchmark suites and real-world libraries.
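The two-model scoring can be sketched with hand-set probabilities standing in for Presyn's learned models; the fragment names and numbers here are purely illustrative:

```python
from itertools import permutations

# IID model: P(fragment appears in the sketch). Markov model: P(b follows a).
iid = {"loop": 0.5, "load": 0.3, "store": 0.2}
markov = {("load", "loop"): 0.6, ("loop", "store"): 0.7,
          ("load", "store"): 0.1, ("store", "loop"): 0.1,
          ("loop", "load"): 0.2, ("store", "load"): 0.3}

def sketch_score(order):
    """Score a fragment ordering: IID factors for presence, Markov factors
    for adjacency (unseen transitions get a small smoothing probability)."""
    score = 1.0
    for frag in order:
        score *= iid[frag]
    for a, b in zip(order, order[1:]):
        score *= markov.get((a, b), 0.01)
    return score

# The highest-scoring sketch would then be concretized and checked against
# the black-box component's I/O behavior.
best = max(permutations(["load", "loop", "store"]), key=sketch_score)
print(best)
```

Ranking sketches before concretization is what keeps the search instance-specific: only compositions the models consider plausible for this library behavior are ever checked.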

2.5 RL-Guided Search in Program Space

Reinforcement Learning Guided Tree Search (RLGTS): Treats PIPS as a Markov Decision Process (Simmons-Edler et al., 2018).

  • State: Encodes the current state of the partial program over all IO examples.
  • Action: Proposal of the next line of code.
  • Reward: Blends correctness (output matching) and efficiency (program length).
  • Search: A dueling Q-network evaluates/prioritizes candidate construction, while a search tree records partially built programs.

RLGTS provides strong empirical success, particularly on complex, non-convex synthesis benchmarks.
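The MDP framing can be sketched as a best-first tree search in which a hand-coded reward stands in for the dueling Q-network; the DSL is illustrative:

```python
from itertools import count
import heapq

OPS = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2}

def run(prog, x):
    for op in prog:
        x = OPS[op](x)
    return x

def reward(prog, io_pairs):
    """Blend correctness (matched outputs) with efficiency (length penalty)."""
    matches = sum(run(prog, x) == y for x, y in io_pairs)
    return matches - 0.1 * len(prog)

def tree_search(io_pairs, max_len=4):
    """State: a partial program evaluated on all IO examples.
    Action: append one line of code. Expansion order follows the reward,
    standing in for Q-value prioritization."""
    tie = count()  # tie-breaker so the heap never compares programs
    frontier = [(-reward((), io_pairs), next(tie), ())]
    while frontier:
        _, _, prog = heapq.heappop(frontier)
        if all(run(prog, x) == y for x, y in io_pairs):
            return prog
        if len(prog) < max_len:
            for op in OPS:
                child = prog + (op,)
                heapq.heappush(frontier,
                               (-reward(child, io_pairs), next(tie), child))
    return None

print(tree_search([(1, 4), (0, 2)]))   # y = 2*(x+1)
```

The tree records partially built programs exactly as in RLGTS; the difference is that a learned Q-network, rather than this fixed reward, decides which partial program to extend next.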

2.6 Iterative LLM-driven PIPS with Structural Feedback

Instance-level LLM Program Synthesis: Integrates symbolic input abstraction, a confidence-based switch between CoT and PoT, and a structural feedback loop (Stein et al., 26 Oct 2025).

  • Confidence Switch: For each instance, the system predicts whether to attempt direct code synthesis or natural language reasoning, based on LLM-estimated criteria.
  • Structural Evaluation: Programs are iteratively refined to eliminate trivial, syntactic, or semantic defects, guided by static and dynamic code analysis.
  • Symbolic Input Extraction: Input is explicitly mapped to a symbolic schema (with possible iterative repair).

This approach provides systematic accuracy improvements, particularly on algorithmic and structured reasoning tasks.
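A minimal sketch of the per-instance confidence switch, with stubs in place of the LLM calls (the keyword heuristic and thresholds below are placeholders for model-estimated confidence):

```python
def estimate_code_confidence(task):
    """Stub: the real system asks the LLM how amenable the task is to code."""
    return 0.9 if "sum" in task or "sort" in task else 0.3

def solve_with_code(task):
    """Stub for PoT: synthesize a program, execute it, return its output."""
    return f"[PoT] generated and executed a program for: {task}"

def solve_with_reasoning(task):
    """Stub for CoT: answer via natural-language reasoning."""
    return f"[CoT] reasoned in natural language about: {task}"

def solve(task, threshold=0.5):
    # Per-instance routing: code synthesis only when confidence clears
    # the threshold; otherwise fall back to natural-language reasoning.
    if estimate_code_confidence(task) >= threshold:
        return solve_with_code(task)
    return solve_with_reasoning(task)

print(solve("sum the even numbers in a list"))
print(solve("explain why the character changes her mind"))
```

The structural feedback loop would wrap `solve_with_code` in an iterative repair cycle driven by static and dynamic analysis of the generated program.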

3. Theoretical Foundations and Complexity

Research on PIPS reveals fundamental complexity-theoretic results:

  • General PIPS Complexity: The synthesis decision problem for Turing-complete languages is $\Sigma^0_3$-complete in the arithmetical hierarchy; i.e., determining whether a program exists that meets a given (possibly infinite) specification is as hard as deciding membership in $\Sigma^0_3$ (Kim, 2024). This is formalized as:

$\exists p.~ \forall \sigma.~ \exists v.~ [\![p]\!](\sigma)(v) \wedge \phi(\sigma, p, v)$

  • Finite Example Domains: When the specification is a finite set of examples (as in classic PBE), synthesis complexity drops to $\Sigma^0_1$ (recursively enumerable).
  • Loop-Free/Decidable Semantics: For purely loop-free programs and decidable specifications, synthesis is $\Sigma^0_2$-complete.
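For the finite-example case, one way to see the drop to $\Sigma^0_1$ is that the claim needs only a single unbounded existential over programs, and each conjunct is semi-decidable (run $p$ on $x_i$ and compare the output):

```latex
\exists p.\ \bigwedge_{i=1}^{N} [\![p]\!](x_i) = y_i
```

Since the conjunction is finite, verification of a candidate $p$ requires only finitely many (possibly non-terminating) runs, which keeps the whole statement recursively enumerable.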

These results explain the practical necessity of heuristic, search-based, or interactive methods for PIPS: a complete, always-terminating synthesizer is unattainable without restricting the problem in some way.

4. Human-Interaction and Feedback in PIPS

An important direction—particularly for scenarios with incomplete specifications or ambiguous intent—is to leverage user interaction (Le et al., 2017):

  • Incremental algorithms speed up synthesis by reducing the DSL to a subset that is already known to satisfy all accumulated constraints.
  • Step-based problem formulation (decomposing compound tasks into sub-tasks with named sub-expressions) localizes constraint application and reduces cognitive load.
  • Feedback-based intent refinement uses a hypothesizer module to generate clarifying queries, minimizing the number of rounds to convergence while reducing ambiguity.
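The incremental pruning idea can be sketched as follows; the string DSL here is illustrative and far simpler than FlashFill's:

```python
# Each new user-provided example prunes the candidate set, so later rounds
# search only a DSL subset already consistent with all accumulated constraints.
DSL = {
    "upper": str.upper,
    "lower": str.lower,
    "first3": lambda s: s[:3],
    "title": str.title,
}

def refine(candidates, example):
    """Keep only candidates consistent with the new (input, output) example."""
    x, y = example
    return {name: fn for name, fn in candidates.items() if fn(x) == y}

candidates = dict(DSL)
candidates = refine(candidates, ("hello world", "Hello World"))  # round 1
candidates = refine(candidates, ("ab", "Ab"))                    # round 2
print(sorted(candidates))
```

Production systems apply the same monotone-narrowing principle to version space algebras rather than to explicit candidate sets, but the interaction pattern is identical: each round of feedback can only shrink the space.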

Empirical evaluation shows improvements in interaction efficiency and correctness, especially in fielded systems like FlashFill and FlashExtract.

5. Synthesis from Partial and Ambiguous Traces

For tasks where the program’s visible effects are only partially observed (e.g., API call logs with missing pure functions), new techniques decompose synthesis into rewrite and local function synthesis phases (Ferreira et al., 20 Apr 2025):

  • Initial Solution: Constructs an overfitted multi-branch replay program.
  • Rewrite System: Applies semantics-preserving rewrites to generalize and compress the program, introducing hidden pure functions when needed.
  • Cost Optimization: The search is guided by parameterizable syntactic or reuse-based cost metrics to avoid overfitting and promote code generality.

Correctness is defined via trace subsumption, and local PBE is used to fill in missing pure computations, demonstrating practical feasibility in automation and workflow domains.
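A minimal sketch of the rewrite phase, assuming a single multiplicative relation as the hidden pure function (the actual rewrite system is far richer and applies general semantics-preserving rules):

```python
# Observed traces: (api_call, argument, observed_result).
traces = [("resize", 2, 4), ("resize", 3, 6), ("resize", 5, 10)]

def replay(call, arg):
    """Overfitted initial solution: one branch per observed trace."""
    for name, a, out in traces:
        if name == call and a == arg:
            return out
    raise KeyError((call, arg))

def rewrite(traces):
    """Generalizing rewrite: if out = k * arg for one k across all traces,
    replace the per-trace branches with that hidden pure function."""
    ks = {out / a for _, a, out in traces}
    if len(ks) == 1:
        k = ks.pop()
        return lambda call, arg: k * arg
    return replay  # rewrite not applicable; keep the replay program

generalized = rewrite(traces)
print(generalized("resize", 7))   # generalizes beyond the observed traces
```

The cost metric mentioned above is what arbitrates between keeping the overfitted branches and committing to an inferred pure function.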

6. Applications, Empirical Results, and Open Challenges

PIPS methodologies have been deployed in neural program induction, static analysis, software library migration, human-in-the-loop toolchains, and general LLM-driven reasoning. Key empirical highlights:

  • Significant empirical gains in success ratio for N-PEPS with cross-aggregation (from ~77% to ~87% in strict settings) (Shrivastava et al., 2021).
  • LLM-driven PIPS achieves up to 8.6% harmonic mean accuracy improvement over strong baselines on reasoning tasks (Stein et al., 26 Oct 2025).
  • Interactive and collaborative approaches enable complete coverage on controlled benchmarks such as IPARC (Surana et al., 15 Jun 2025).
  • PTP and RL-guided approaches outperform traditional evolutionary and stochastic search on compositional and complex tasks (Oesch, 2017, Simmons-Edler et al., 2018).
  • Theoretical limitations guarantee that general per-instance synthesis will, in the worst case, be uncomputable in finite time absent further constraints (Kim, 2024).

Open research questions remain on the trade-off between expressiveness and tractability, the most effective modality for guidance (human, probabilistic, neural), and scalable methods for structured, interactive PIPS across a wide range of domains and input types.

Summary Table: Representative PIPS Approaches and Their Properties

| Approach | Synthesis Methodology | Key Features |
|---|---|---|
| N-PEPS (Shrivastava et al., 2021) | Neural per-example + aggregation | Two-stage, attention aggregation |
| Syren (Ferreira et al., 20 Apr 2025) | Rewrite + local PBE | Trace subsumption, cost optimization |
| PTP (Oesch, 2017) | Probabilistic prototype tree | Per-choice adaptation, no population |
| Presyn (Collie et al., 2020) | Probabilistic fragment modeling | Black-box components, Markov sketches |
| RLGTS (Simmons-Edler et al., 2018) | RL + search tree | MDP formulation, Q-network guidance |
| Structured LLM Synthesis (Surana et al., 15 Jun 2025) | Human-LLM structuring (DFDs) | Protocols, reusable submodules |
| Interactive Synthesis (Le et al., 2017) | Incremental, feedback-driven, stepwise | VSA, hypothesizer, sub-DSLs |
| LLM Per-instance w/ Feedback (Stein et al., 26 Oct 2025) | Confidence switch, iterative repair | Symbolic input, structural evaluation |

Each of these methods exemplifies the core principle of PIPS: optimizing the synthesis process for instance-level correctness, efficiency, and adaptability, often by modularizing the synthesis task, leveraging structure and feedback, and aligning search strategies with empirical and theoretical constraints.
