
LLVM Function Inlining Advances

Updated 31 January 2026
  • LLVM Function Inlining is the process of replacing a call site with the callee's code after profitability and legality checks, enhancing execution speed and binary efficiency.
  • It utilizes a multi-stage pipeline with IR attributes, call-graph analysis, and a cost-threshold model that dynamically adjusts inlining decisions based on function size, hotness, and optimization flags.
  • Recent ML-guided frameworks such as MLGO, MLGOPerf, and ACPO employ reinforcement learning to optimize the trade-off between performance gains and code size, influencing both security and analysis.

Function inlining in LLVM is the process of replacing a call site with the body of the callee function, subject to a series of profitability and legality checks. This optimization plays a critical role in improving both execution speed and, under size-oriented heuristics, reducing binary size; its ramifications extend to downstream passes and have a substantial impact on overall code generation and analysis.

1. LLVM Function Inlining Pipeline and Cost Model

LLVM’s inlining pipeline is managed by a multi-stage architecture comprising front-end source hints, IR-level attributes, a call-graph-based pass manager, hard-coded exclusion heuristics, a per-call-site cost–threshold model, and optional link-time inlining via LTO (Thin or Full). The key decision is made for each call site, where the inliner computes an estimated cost C and a dynamic threshold T. Inlining is performed if C ≤ T. The cost and threshold are influenced by a set of factors, including function size, execution hotness, attributes (e.g., always_inline, inlinehint, noinline), and global optimization flags.

Explicitly, the computation is:

C(f_{\text{call}}) = \sum_b \sum_{i \in b} \text{cost}(i) + \text{penalty} \times \#\{\text{call sites in } f\} + \text{overhead}

T(f_{\text{call}}) = T_0(\text{–O}) + \alpha_{\text{inlinehint}} \mathbf{1}_{\text{inlinehint}} + \alpha_{\text{hot}} \mathbf{1}_{\text{hot}} - \alpha_{\text{optsize}} \mathbf{1}_{\text{optsize}} - \cdots

where T_0 is an optimization-level-dependent constant (e.g., 250 at –O3, 5 at –Oz), and the per-call-site penalty is typically 25 (Abusabha et al., 16 Dec 2025).
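The cost–threshold decision above can be sketched as follows. The T_0 and penalty constants follow the figures quoted in this section; the alpha bonus values, instruction costs, and function names are hypothetical placeholders, not LLVM's actual internals:

```python
# Sketch of the per-call-site inlining decision: inline iff C <= T.
# T0 (250 at -O3, 5 at -Oz) and the per-call-site penalty (25) follow the
# figures quoted above; the alpha_* bonuses are illustrative assumptions.

BASE_THRESHOLDS = {"-O3": 250, "-Oz": 5}
CALL_PENALTY = 25

def estimate_cost(instruction_costs, num_call_sites, overhead=0):
    """C(f) = sum of per-instruction costs + penalty per call site + overhead."""
    return sum(instruction_costs) + CALL_PENALTY * num_call_sites + overhead

def compute_threshold(opt_level, inlinehint=False, hot=False, optsize=False,
                      alpha_hint=75, alpha_hot=100, alpha_optsize=30):
    """T = T0(-O) + bonuses for hints/hotness - penalty under optsize."""
    t = BASE_THRESHOLDS[opt_level]
    if inlinehint:
        t += alpha_hint
    if hot:
        t += alpha_hot
    if optsize:
        t -= alpha_optsize
    return t

def should_inline(cost, threshold):
    return cost <= threshold

cost = estimate_cost(instruction_costs=[5] * 40, num_call_sites=2)  # 250
print(should_inline(cost, compute_threshold("-O3")))  # True (250 <= 250)
print(should_inline(cost, compute_threshold("-Oz")))  # False (250 > 5)
```

The same callee is inlined at –O3 but rejected at –Oz, which is exactly the behavior the optimization-level-dependent T_0 produces.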

2. Inlining Ratios, Build Variability, and “Extreme Inlining”

The degree to which functions are inlined can be quantified via the inlining ratio:

R = \frac{|\{f : f \text{ was inlined at} \geq 1 \text{ call site}\}|}{|\{f : f \text{ appears as an IR function}\}|} \times 100\%

Empirical data shows this ratio varies widely: mean values of 0.83% at –O0, 32.7% at –O3, and 20.5% at –Oz, with maxima up to 79.6% achievable via “extreme inlining” (artificially inflating T and/or suppressing penalties) (Abusabha et al., 16 Dec 2025). Significant factors include application domain, inlining flags, profile-derived function “hotness,” LTO mode, and aggressive settings of –mllvm flags (Abusabha et al., 16 Dec 2025). Full LTO typically increases the ratio by 10–15 percentage points beyond non-LTO builds.
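The ratio R can be computed directly from the set of IR functions and the subset inlined at least once; a minimal sketch with made-up function names:

```python
def inlining_ratio(ir_functions, inlined_at_least_once):
    """R = |{f inlined at >= 1 call site}| / |{f in IR}| * 100%."""
    inlined = set(inlined_at_least_once) & set(ir_functions)
    return 100.0 * len(inlined) / len(set(ir_functions))

# Hypothetical module: 10 IR functions, 3 of which were inlined somewhere.
funcs = [f"f{i}" for i in range(10)]
print(inlining_ratio(funcs, ["f1", "f4", "f7"]))  # 30.0
```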

Notably, this extensive variability has major implications for ML-based binary analysis and reproducibility, since ML models trained under one inlining regime may fail under binaries produced with different inlining configurations (Abusabha et al., 16 Dec 2025).

| Optimization Level | Mean Ratio | Max Ratio |
|---|---|---|
| –O0 | 0.83% | 9.52% |
| –O3 | 32.70% | 66.67% |
| –Oz | 20.47% | 61.67% |
| Extreme (coreutils) | n/a | ~79.6% |

Extreme recipes such as –O3 –flto=full –mllvm -inline-threshold=200000 can nearly inline all feasible sites, dramatically altering the analyzability and security profile of the resulting binary (Abusabha et al., 16 Dec 2025).

3. Machine Learning-Guided Function Inlining

Traditional inlining uses hand-tuned analytic thresholds and cost models. Recent advances have demonstrated that inlining can be re-conceptualized as a structured sequential decision process, suitable for machine learning.

MLGO Framework (Inlining-for-Size)

MLGO recasts LLVM inlining as a Markov Decision Process: each decision point is represented by an 11-dimensional feature vector at the call site, capturing caller-centric, callee-centric, call-site-local, and global call-graph features. The action space is binary, with a learned policy \pi_\theta selecting “inline” or “do not inline” for each call site. The reward is the reduction in native code size, and training aims to maximize the cumulative code-size savings:

J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} r_t\right]

Two algorithms are employed:

  • Policy Gradient/PPO, with a neural architecture (two hidden layers of sizes 40/20).
  • Evolution Strategies (Trofin et al., 2021).

MLGO achieves up to 7% reduction in binary size versus –Oz with negligible (∼1%) compile-time overhead, and exhibits domain- and time-generalization (robustness under codebase and compiler evolution). Integration is via the InlineAdvisor interface, supporting both AOT-compiled and dynamic TensorFlow inference (Trofin et al., 2021).
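A forward pass with the shapes described above (11 input features, hidden layers of 40 and 20, a binary action head) can be sketched with randomly initialized weights standing in for a trained policy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights as a stand-in for a trained MLGO-style policy; only the
# shapes (11 -> 40 -> 20 -> 2) follow the architecture described above.
W1, b1 = rng.normal(size=(11, 40)), np.zeros(40)
W2, b2 = rng.normal(size=(40, 20)), np.zeros(20)
W3, b3 = rng.normal(size=(20, 2)), np.zeros(2)

def policy(features):
    """pi_theta(a | s): probabilities over {do-not-inline, inline}."""
    h = np.tanh(features @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    logits = h @ W3 + b3
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

call_site_features = rng.normal(size=11)  # caller/callee/site/graph features
probs = policy(call_site_features)
action = int(np.argmax(probs))            # 1 = inline, 0 = do not inline
```

In the real pipeline this decision replaces only the profitability check; legality checks still run unchanged before any transformation.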

MLGOPerf and ACPO (Inlining-for-Speed)

MLGOPerf extends MLGO to target performance instead of code size. The RL policy is trained using rewards predicted by a secondary model, “IR2Perf,” which estimates post-inlining speedup based on 20 IR features and PCA reduction. The inlining agent is also augmented with features such as block frequency and loop level. On SPEC CPU2006, MLGOPerf reaches geomean speedups of 1.8% versus O3 at the cost of 17.8% larger binaries (Ashouri et al., 2022). This dual-model architecture provides practical rewards for RL training without real binary executions, and enables a two-level autotune opportunity for downstream code regions (loop unroll/interleave).
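The IR2Perf reward path can be sketched as a PCA-style projection of the IR features followed by a linear regressor whose output serves as the RL reward; the projection matrix and weights below are random placeholders, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for IR2Perf: project 20 IR features onto a reduced basis
# (PCA-style), then apply a linear regressor to predict post-inlining
# speedup. All parameters here are illustrative placeholders.
PCA_BASIS = rng.normal(size=(20, 5))  # 20 IR features -> 5 components
WEIGHTS = rng.normal(size=5)
BIAS = 0.0

def ir2perf_reward(ir_features):
    """Predicted speedup used as the RL reward (no binary execution)."""
    components = ir_features @ PCA_BASIS
    return float(components @ WEIGHTS + BIAS)

reward = ir2perf_reward(rng.normal(size=20))
```

The point of this indirection is cost: the policy can be trained against predicted speedups instead of compiling and running a real binary for every rollout.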

ACPO wraps MLGOPerf’s PPO-trained policy, leveraging 13 static IR and profile-derived features for each call site. The decision function is replaced in the inliner’s pass pipeline, entirely supplanting the hand-tuned “profitability” heuristic but leaving legality checks unperturbed. ACPO delivers a mean speedup of 2.1% (CBench) with a 15% increase in code size, confirming the learned policy’s ability to trade code growth for performance by unlocking further mid-end optimizations (Ashouri et al., 2023).

| Framework | Target | Features | Reported Gain | Code Size Impact |
|---|---|---|---|---|
| MLGO | code size | 11 (static) | up to –7% vs. –Oz | <1% overhead |
| MLGOPerf | performance | 13 (static) | +1.8% (SPEC2006), +2.2% | +12–18% |
| ACPO | performance | 13 (static) | +2.1% (CBench) | +15% |

4. Feature Engineering for Inline Policy

All ML-guided approaches to LLVM inlining rely on fixed-size vectors of numeric features extracted per call site. The specific selection and preprocessing include:

  • Call-graph metrics: depth (height in SCC), number of calls in SCC, fan-in/fan-out counts.
  • Function size metrics: basic block and instruction counts for caller/callee, control-flow edge count.
  • Profile-derived frequency metrics (if available): estimated hot block frequencies, call-site invocation counts.
  • Local call-site features: distance from SCC root, cost estimation, number of constant parameters.
  • Global metrics: total edge and node counts in the call graph.

Normalization of such features (zero mean/unit variance) is performed in the training pipeline for MLGO/MLGOPerf, but is not necessarily implemented inside LLVM at inference time for ACPO (Trofin et al., 2021, Ashouri et al., 2022, Ashouri et al., 2023).
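The zero-mean/unit-variance step can be sketched as follows; the feature names are illustrative, loosely modeled on the categories listed above:

```python
import numpy as np

# Illustrative per-call-site feature names (not LLVM's exact identifiers).
FEATURE_NAMES = [
    "callee_basic_blocks", "callee_instructions", "caller_instructions",
    "callsite_height", "node_count", "edge_count", "num_constant_params",
]

def normalize(feature_matrix, eps=1e-8):
    """Standardize each feature column to zero mean / unit variance,
    as done in the MLGO/MLGOPerf training pipelines."""
    mean = feature_matrix.mean(axis=0)
    std = feature_matrix.std(axis=0)
    return (feature_matrix - mean) / (std + eps)

# Hypothetical batch of 64 call sites' raw integer-valued features.
X = np.random.default_rng(2).integers(
    0, 100, size=(64, len(FEATURE_NAMES))).astype(float)
Xn = normalize(X)
```

As noted above, this standardization lives in the training pipeline; an inference-time consumer inside LLVM must either replicate it or fold the statistics into the model.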

5. Integration, Practical Deployment, and Overheads

ML-guided inliners are integrated into LLVM via the InlineAdvisor interface. In static AOT deployments (e.g., MLGO, ACPO), the TensorFlow or equivalent model is ahead-of-time compiled into the LLVM binary, with no external dependencies and deterministic inference. A dynamic mode is provided for policy improvement and offline training (logging full feature/action trajectories). ACPO adds C++ AOT mode to eliminate per-call inference cost, at the expense of manual model rebuilds (Ashouri et al., 2023).
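LLVM's actual InlineAdvisor is a C++ interface; the Python sketch below only mirrors its plug-in shape, with a default heuristic advisor and an ML-backed advisor answering the same per-call-site query (class and method names are simplified, not the real API):

```python
# Simplified mock of the advisor plug-in shape: the pass pipeline asks an
# advisor object for per-call-site advice, and the advisor implementation
# (heuristic or learned) can be swapped without touching the inliner.

class InlineAdvisor:
    def get_advice(self, call_site):
        raise NotImplementedError

class DefaultAdvisor(InlineAdvisor):
    """Hand-tuned cost-threshold heuristic."""
    def __init__(self, threshold):
        self.threshold = threshold
    def get_advice(self, call_site):
        return call_site["cost"] <= self.threshold

class MLAdvisor(InlineAdvisor):
    """Learned policy; `model` is any callable from features to a decision."""
    def __init__(self, model):
        self.model = model
    def get_advice(self, call_site):
        return bool(self.model(call_site["features"]))

advisor = MLAdvisor(model=lambda feats: sum(feats) > 0)  # toy stand-in model
print(advisor.get_advice({"features": [1, -2, 4]}))  # True
```

In AOT deployments the `model` callable would be the ahead-of-time-compiled network, giving deterministic inference with no external runtime dependency.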

Overheads are modest: MLGO reports ∼1% additional compile time, 0.65% memory consumption, and a binary size increase of 0.08% for clang (Trofin et al., 2021). ACPO reports a 0.4 s/module compile-time increment due to model RPC, which can be avoided with AOT compilation (Ashouri et al., 2023).

6. Security and ML-Based Binary Analysis Implications

Function inlining significantly alters static binary features relevant for machine learning-based security analyses, such as instruction sequences and control-flow. Inlining can be exploited to evade or subvert discriminative and generative ML models deployed for binary classification or malware detection. Variability in inlining ratios—caused by build flags, optimization levels, or explicit inliner settings—undermines consistency between training and deployment environments for such ML models, representing a critical challenge for robust binary analysis (Abusabha et al., 16 Dec 2025). Subtle compiler flags (e.g., thresholds, penalties) can be used maliciously or inadvertently, creating “evasive” binary variants.

7. Limitations and Future Directions

While ML-guided inlining frameworks demonstrate consistent improvement over hand-tuned heuristics, several limitations are identified:

  • Training costs: RL policies require extensive data collection and compute resources (e.g., MLGO’s PPO converges in ≈12 h, ES in ≈60–150 h) (Trofin et al., 2021).
  • Feature expressiveness: Reliance on static IR features may not fully capture dynamic or call-graph-structural properties. Incorporation of richer encodings (code embeddings, GNNs) is a proposed direction (Trofin et al., 2021, Ashouri et al., 2022).
  • Reward proxy limitations: In MLGOPerf, mismatch between per-function and global speedup is observed for ~15% of data, owing to cache and memory effects. More sophisticated or multi-task reward schemes are under consideration (Ashouri et al., 2022).
  • Code size blowup: Performance-oriented policies substantially increase code size (15–25%) (Ashouri et al., 2022, Ashouri et al., 2023), potentially challenging in resource-constrained or embedded contexts.

Future research is expected to refine feature extraction (including dynamic hardware counters), integrate multi-objective RL (balancing size and speed), and generalize models to additional optimization passes such as register allocation, loop unrolling, and vectorization (Trofin et al., 2021, Ashouri et al., 2022, Ashouri et al., 2023).
