
Bidirectional Inlining in Repositories

Updated 8 January 2026
  • The paper demonstrates how partitioning the repository's dependency graph via upstream inlining and downstream retrieval improves semantic accuracy in generated code.
  • The methodology extracts actual call-site contexts and supplements them with precise callee definitions, ensuring tighter integration across repository files.
  • Quantitative evaluations reveal significant gains in exact match and semantic similarity metrics, validating the InlineCoder framework's effectiveness.

Bidirectional inlining is a repository-level code generation process in which a code model explicitly synthesizes and integrates both upstream (caller) and downstream (callee) context for a target function, enhancing semantic coherence and dependency resolution. The approach partitions the repository’s dependency graph around the function under construction: it inlines the function’s draft implementation into its call sites (upstream inlining) and concurrently augments the prompt with the definitions of invoked callees (downstream retrieval), thereby reframing repository-level code completion as a more tractable, information-rich function-level synthesis problem (Hu et al., 1 Jan 2026). This process is central to the InlineCoder framework, which demonstrates notable improvements in exact match and semantic similarity on realistic repository-level code generation benchmarks.

1. Motivation and Challenges of Repository-Level Context Integration

Repository-level code generation differs fundamentally from file-level or function-level synthesis due to the necessity of coordinating changes and resolving dependencies that span multiple files, modules, and layers of abstraction across a software repository (Li et al., 9 Mar 2025). Traditional retrieval-augmented generation (RAG) approaches typically retrieve relevant snippets based on lexical or embedding similarity, but often fail to capture the intricate data-, control-, and call-graph relationships that determine how code is used or interacts across files (Tao et al., 6 Oct 2025, Liu et al., 2024). Repository-level tasks therefore require:

  • Correct alignment of call signatures, types, and inter-module utilities
  • Semantic coherence and adherence to project-specific coding conventions
  • Robust handling of usage contexts and integration points (callers) as well as dependency targets (callees)
  • Preservation of code correctness, such that test suites still pass after modification

These demands motivate the development of bidirectional inlining as a strategy to present the code model with highly contextualized, structurally-grounded prompts that go beyond surface retrieval.

2. Bidirectional Inlining Process: Methodology

The bidirectional inlining process as instantiated in InlineCoder is characterized by two complementary subroutines: upstream inlining and downstream retrieval, both coordinated through an initial draft implementation (“anchor”) and supplemented by confidence-guided prompt structuring (Hu et al., 1 Jan 2026).

2.1. Upstream Inlining

Upstream inlining propagates the draft implementation of the target function into its usage loci—the callers. This is operationalized as follows:

  1. Call-site enumeration: Traverse the repository’s abstract syntax tree (AST) to identify all occurrences where the target function is invoked.
  2. Caller extraction: For each call site, extract the enclosing function body.
  3. Parameter substitution: For each call f(a_1, …, a_m) to the target f(p_1, …, p_m), replace each parameter p_i in the draft body with the actual argument a_i (i.e., apply the substitution σ(p_i) = a_i).
  4. Return value normalization: Normalize return statements to assign to an explicit result variable, facilitating call-site substitution.
  5. Splicing: Replace the call in the caller’s body with the transformed draft body, preserving syntactic correctness and indentation.

This yields multiple “inlined” caller contexts, each presenting the draft implementation within real-world usage scenarios.
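
The five steps above can be sketched with Python's `ast` module. This is an illustrative sketch only, not the paper's implementation: it assumes single-return drafts, purely positional arguments, call sites of the form `x = target(...)`, and no variable-name clashes between draft and caller. The names `inline_into_caller`, `_SubstParams`, and `_ReturnToAssign` are hypothetical.

```python
import ast
import copy


class _SubstParams(ast.NodeTransformer):
    """Replace formal parameter names with the actual argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping  # parameter name -> argument AST node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class _ReturnToAssign(ast.NodeTransformer):
    """Normalize `return expr` into `<result> = expr` for call-site splicing."""

    def __init__(self, result_name):
        self.result_name = result_name

    def visit_Return(self, node):
        target = ast.Name(id=self.result_name, ctx=ast.Store())
        value = node.value if node.value is not None else ast.Constant(value=None)
        return ast.copy_location(ast.Assign(targets=[target], value=value), node)


def inline_into_caller(draft_src, caller_src, target):
    """Splice the draft body of `target` into each `x = target(...)` call site."""
    draft = ast.parse(draft_src).body[0]
    params = [a.arg for a in draft.args.args]

    class _Splice(ast.NodeTransformer):
        def visit_Assign(self, node):
            v = node.value
            if (isinstance(v, ast.Call) and isinstance(v.func, ast.Name)
                    and v.func.id == target and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)):
                mapping = dict(zip(params, v.args))  # sigma(p_i) = a_i
                body = [copy.deepcopy(stmt) for stmt in draft.body]
                body = [_SubstParams(mapping).visit(stmt) for stmt in body]
                body = [_ReturnToAssign(node.targets[0].id).visit(stmt)
                        for stmt in body]
                return body  # the call is replaced by the transformed draft body
            return node

    tree = _Splice().visit(ast.parse(caller_src))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Example: inlining a draft `scale` into a caller.
draft = "def scale(x, k):\n    y = x * k\n    return y\n"
caller = "def area(w, h):\n    s = scale(w, h)\n    return s\n"
print(inline_into_caller(draft, caller, "scale"))
# def area(w, h):
#     y = w * h
#     s = y
#     return s
```

A production implementation would additionally rename draft-local variables to avoid capture and handle multi-return and keyword-argument cases.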

2.2. Downstream Retrieval

Downstream retrieval contextualizes the function under synthesis by including the implementations of all code entities it is expected to invoke (its callees):

  1. Callee identification: Parse the draft (anchor) implementation’s AST to enumerate all invoked names, supplementing with the model’s own predicted call targets.
  2. Function index lookup: Search the repository index for definitions matching these invoked names, then rank and select the most relevant.
  3. Context budget: Concatenate a bounded number of callee definitions to the prompt, explicitly excluding the target function itself to avoid self-recurrence.

The process ensures that the code model is exposed to both how the new function will be called (upstream scenarios) and to the APIs/utilities available for it to call (downstream dependencies).
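
The three steps above can be sketched as follows. The function names `collect_callees` and `retrieve_definitions` are hypothetical, the repository index is modeled as a plain name-to-source dictionary, and naive alphabetical ordering stands in for the paper's relevance ranking.

```python
import ast


def collect_callees(anchor_src):
    """Enumerate the names invoked by the draft (anchor) implementation."""
    callees = set()
    for node in ast.walk(ast.parse(anchor_src)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):        # e.g. load(...)
                callees.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):  # e.g. obj.parse(...)
                callees.add(node.func.attr)
    return callees


def retrieve_definitions(callees, repo_index, target_name, budget=5):
    """Look up callee definitions under a context budget.

    `repo_index` maps function name -> source text (a hypothetical structure).
    The target function itself is excluded to avoid self-recurrence; alphabetical
    order is a placeholder for a real relevance ranking.
    """
    hits = [repo_index[name] for name in sorted(callees)
            if name != target_name and name in repo_index]
    return hits[:budget]
```
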

3. Prompt Construction and Confidence-Based Guidance

Prompt assembly is a structured concatenation of:

  1. Base repository context (imports and cross-file references)
  2. Upstream inlined caller scenarios, each demarcated for clarity
  3. Downstream (callee) context blocks
  4. A natural-language confidence cue based on perplexity analysis of the initial draft (“anchor”)
  5. The anchor draft implementation
  6. Instruction requesting the final, refined implementation
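
Assuming plain string concatenation with comment-style delimiters, the six-block assembly might look like the sketch below. The function name and template wording are illustrative, not taken from the paper.

```python
def build_prompt(repo_context, inlined_callers, callee_blocks,
                 confidence, anchor, target_name):
    """Assemble the structured prompt; block order follows the six-step recipe."""
    # 2. Upstream inlined caller scenarios, each demarcated for clarity.
    caller_section = "\n\n".join(
        f"# Usage scenario {i + 1}\n{c}" for i, c in enumerate(inlined_callers))
    # 3. Downstream (callee) context blocks.
    callee_section = "\n\n".join(
        f"# Available dependency\n{c}" for c in callee_blocks)
    # 4. Natural-language confidence cue derived from the anchor's perplexity.
    advice = ("trust" if confidence == "high"
              else "partially use" if confidence == "medium"
              else "discard")
    return (
        f"# Repository context\n{repo_context}\n\n"          # 1. base context
        f"{caller_section}\n\n"                               # 2. callers
        f"{callee_section}\n\n"                               # 3. callees
        f"# Draft confidence: {confidence} -- {advice} the draft below.\n"  # 4.
        f"{anchor}\n\n"                                       # 5. anchor draft
        f"# Produce the final implementation of {target_name}.\n"  # 6. instruction
    )
```
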

Confidence estimation is performed by computing the anchor’s token-level perplexity with a small, fast LLM. The prompt labels the anchor as “high”, “medium”, or “low” confidence and instructs the LLM to trust, partially use, or discard it accordingly. This mechanism addresses uncertainty propagation and guides the model’s attention over the assembled context.
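
A minimal sketch of this confidence bucketing, assuming per-token log-probabilities for the anchor are available from the scoring LLM; the threshold values shown are illustrative, not the paper's.

```python
import math


def perplexity(token_logprobs):
    """Token-level perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def confidence_label(token_logprobs, low_ppl=3.0, high_ppl=10.0):
    """Map anchor perplexity to a confidence cue (thresholds are illustrative)."""
    ppl = perplexity(token_logprobs)
    if ppl <= low_ppl:
        return "high"    # model was sure of its draft: trust it
    if ppl >= high_ppl:
        return "low"     # model was unsure: discard the draft
    return "medium"      # partially use the draft
```
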

4. Design Rationale and Theoretical Implications

Bidirectional inlining is specifically designed to address the pathologies identified in earlier RAG and context-window approaches:

  • Repository-to-function reframing: By inlining and projecting the repository context into a single, function-centric view, the approach avoids retrieval noise and irrelevant context, flattening complex call graphs into a linear, synthesizable format.
  • Bidirectional context: Presenting both usage (upstream) and resource (downstream) aspects synergistically supplies the model with input-output constraints, realistic usage modes, and existing code conventions—mechanistically aligning generated code with project architecture.
  • Draft as anchor: The initial draft not only anchors the retrieval of relevant context, but its perplexity provides a self-estimated trust score, dynamically modulating model guidance—an innovation supported by empirical gains in semantic accuracy.
  • Linearized input structure: The translation of AST-based graph relationships into ordered prompt blocks exploits LLM strengths in digesting sequential contexts over unordered lists or raw graphs.

These insights are supported by experiments demonstrating superior performance in code matching and semantic metrics compared to baseline retrieval-augmented approaches (Hu et al., 1 Jan 2026).

5. Quantitative Evaluation and Ablation Results

Bidirectional inlining as implemented in InlineCoder has been evaluated on large-scale, repository-level code generation benchmarks including DevEval and RepoExec (Hai et al., 2024), with both foundation and instruction-tuned LLMs (Li et al., 9 Mar 2025). Notable findings include:

  • On DevEval (average over three models), InlineCoder achieves 11.75% exact match (EM) and 67.03% edit similarity (ES) versus 11.17% EM and 62.08% ES for the strongest vanilla baselines (relative gains of +5.13% EM, +10.86% ES).
  • On RepoExec, average EM increases from 0.93% (vanilla) to 1.90% (InlineCoder), a relative gain of +29.73%.
  • Component ablations confirm that removing upstream inlining, downstream context, or draft guidance each leads to measurable degradations in EM, ES, and BLEU.
  • Return-statement EM and callee-call EM both show gains, indicating better handling of semantic constraints and cross-file usage.

These results indicate the efficacy of bidirectional inlining in repository-level code generation, robustly outperforming prior approaches based solely on similarity retrieval or local context expansion.

6. Limitations and Future Directions

Observed limitations include:

  • Current implementations are restricted to Python and contingent on its AST toolchain, but the paradigm is language-agnostic and theoretically extendable to statically typed languages (Java, C++, TypeScript), where cross-file type and call relationships are more explicitly encoded (Pan et al., 2024).
  • Scaling to very large repositories requires budgeted sampling or summarization for inlining to remain prompt-length compatible; aggressive sharding or compression may be necessary.
  • Multi-language or polyglot repositories present challenges in generalized subtree matching and symbol resolution; adaptation of retrieval rules per language is indicated (Tao et al., 6 Oct 2025).
  • The approach does not yet integrate dynamic, test-based validation in-line with synthesis, which may further improve functional correctness and cross-file dependency integration.
  • Integration of execution-based feedback during generation, online adaptation of retrieval strategies, and extension to incremental feature implementation are identified as promising research avenues (Li et al., 9 Mar 2025).

Overall, the bidirectional inlining process operationalizes a significant advance in repository-level code generation by unifying upstream and downstream semantic context—reframing repository code completion as a tractable, context-rich function-level task and yielding measurable gains in code synthesis fidelity and semantic correctness.
