LLVM Function Inlining Advances
- LLVM function inlining is the process of replacing a call site with the callee's code after profitability and legality checks, improving execution speed and, under size-oriented heuristics, reducing binary size.
- It utilizes a multi-stage pipeline with IR attributes, call-graph analysis, and a cost-threshold model that dynamically adjusts inlining decisions based on function size, hotness, and optimization flags.
- Recent ML-guided frameworks such as MLGO, MLGOPerf, and ACPO employ reinforcement learning to optimize the trade-off between performance gains and code size, with implications for both binary security and ML-based binary analysis.
Function inlining in LLVM is the process of replacing a call site with the body of the callee function, subject to a series of profitability and legality checks. This optimization plays a critical role in improving both execution speed and, under size-oriented heuristics, reducing binary size; its ramifications extend to downstream passes and have a substantial impact on overall code generation and analysis.
1. LLVM Function Inlining Pipeline and Cost Model
LLVM’s inlining pipeline is managed by a multi-stage architecture comprising front-end source hints, IR-level attributes, a call-graph-based pass manager, hard-coded exclusion heuristics, a per-call-site cost–threshold model, and optional link-time inlining via LTO (Thin or Full). The key decision is made per call site: the inliner computes an estimated cost C(cs) and a dynamic threshold T(cs), and inlines exactly when C(cs) ≤ T(cs). Both quantities are influenced by a set of factors, including function size, execution hotness, attributes (e.g., always_inline, inlinehint, noinline), and global optimization flags.
Explicitly, the threshold computation has the form
T(cs) = T_opt · M(cs),
where T_opt is an optimization-level-dependent base constant (e.g., 250 at –O3, 5 at –Oz), M(cs) aggregates attribute- and hotness-derived bonus multipliers, and the per-call penalty charged on the cost side is typically 25 (Abusabha et al., 16 Dec 2025).
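As a minimal sketch of this decision rule: the base thresholds below use the values just quoted, while the bonus multipliers and the cost accounting are hypothetical placeholders, not LLVM's exact logic.

```python
# Illustrative model of LLVM's per-call-site inlining decision.
# Base thresholds follow the values quoted above; the bonus
# multipliers are hypothetical placeholders.

BASE_THRESHOLD = {"O3": 250, "Oz": 5}  # optimization-level-dependent constant
CALL_PENALTY = 25                      # typical per-call penalty

def should_inline(callee_cost: int, opt_level: str,
                  inlinehint: bool = False, hot: bool = False) -> bool:
    """Inline when the estimated cost does not exceed the dynamic threshold."""
    threshold = BASE_THRESHOLD[opt_level]
    if inlinehint:
        threshold *= 3   # illustrative attribute bonus
    if hot:
        threshold *= 2   # illustrative hotness bonus
    return callee_cost - CALL_PENALTY <= threshold

print(should_inline(200, "O3"))  # small callee at -O3 -> True
print(should_inline(200, "Oz"))  # same callee at -Oz -> False
```

The same callee passes at –O3 and fails at –Oz, which is exactly the mechanism behind the divergent inlining ratios discussed in the next section.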
2. Inlining Ratios, Build Variability, and “Extreme Inlining”
The degree to which functions are inlined can be quantified via the inlining ratio, i.e., the fraction of eligible call sites that are actually inlined:
inlining ratio = (inlined call sites) / (eligible call sites).
Empirical data show this ratio varies widely: mean values of 0.83% at –O0, 32.7% at –O3, and 20.5% at –Oz, with maxima up to 79.6% achievable via “extreme inlining” (artificially inflating the threshold and/or suppressing cost penalties) (Abusabha et al., 16 Dec 2025). Significant factors include application domain, inlining flags, profile-derived function “hotness,” LTO mode, and aggressive –mllvm settings (Abusabha et al., 16 Dec 2025). Full LTO typically increases the ratio by 10–15 percentage points over non-LTO builds.
Notably, this extensive variability has major implications for ML-based binary analysis and reproducibility, since ML models trained under one inlining regime may fail under binaries produced with different inlining configurations (Abusabha et al., 16 Dec 2025).
| Optimization Level | Mean Ratio | Max Ratio |
|---|---|---|
| –O0 | 0.83% | 9.52% |
| –O3 | 32.70% | 66.67% |
| –Oz | 20.47% | 61.67% |
| Extreme (coreutils) | ~79.6% | — |
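The ratios above follow directly from call-site counts; a minimal sketch (how the two counts are collected from a build is assumed to happen elsewhere):

```python
# Minimal helper for the inlining ratio: the percentage of eligible
# call sites that the compiler actually inlined.
def inlining_ratio(inlined_sites: int, eligible_sites: int) -> float:
    if eligible_sites == 0:
        return 0.0
    return 100.0 * inlined_sites / eligible_sites

# Reproducing the order of magnitude of the reported -O3 mean:
print(f"{inlining_ratio(327, 1000):.2f}%")  # 32.70%
```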
Extreme recipes such as –O3 –flto=full –mllvm -inline-threshold=200000 can nearly inline all feasible sites, dramatically altering the analyzability and security profile of the resulting binary (Abusabha et al., 16 Dec 2025).
3. Machine Learning-Guided Function Inlining
Traditional inlining uses hand-tuned analytic thresholds and cost models. Recent advances have demonstrated that inlining can be re-conceptualized as a structured sequential decision process, suitable for machine learning.
MLGO Framework (Inlining-for-Size)
MLGO recasts LLVM inlining as a Markov Decision Process: each decision point is represented by an 11-dimensional feature vector at the call site, capturing caller-centric, callee-centric, call-site-local, and global call-graph features. The action space is binary, with a learned policy selecting “inline” or “do not inline” for each call site. The reward is the reduction in native code size, and the objective is to maximize the cumulative savings
max_π E[Σ_t R_t],
where R_t is the native-size reduction obtained at decision step t. Two algorithms are employed:
- Policy Gradient/PPO, with a neural architecture (two hidden layers of sizes 40/20).
- Evolution Strategies (Trofin et al., 2021).
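The shape of such a policy can be sketched as follows. The weights are random placeholders standing in for PPO/ES-trained parameters; only the structure (11 input features, hidden layers of 40 and 20 units, binary output) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MLGO's learned policy: an 11-dimensional
# call-site feature vector fed through two hidden layers (40 and 20
# units) to a binary inline / do-not-inline decision. In MLGO the
# weights are trained with PPO or Evolution Strategies.
W1, b1 = rng.normal(size=(11, 40)), np.zeros(40)
W2, b2 = rng.normal(size=(40, 20)), np.zeros(20)
W3, b3 = rng.normal(size=(20, 2)), np.zeros(2)

def policy_decision(features: np.ndarray) -> str:
    h1 = np.maximum(0.0, features @ W1 + b1)   # ReLU
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    logits = h2 @ W3 + b3
    return "inline" if logits[1] > logits[0] else "no-inline"

call_site = rng.normal(size=11)  # placeholder feature vector
print(policy_decision(call_site))
```

A network this small keeps per-call-site inference cheap, which is why the reported compile-time overhead stays around 1%.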
MLGO achieves up to 7% reduction in binary size versus –Oz with negligible (∼1%) compile-time overhead, and exhibits domain- and time-generalization (robustness under codebase and compiler evolution). Integration is via the InlineAdvisor interface, supporting both AOT-compiled and dynamic TensorFlow inference (Trofin et al., 2021).
MLGOPerf and ACPO (Inlining-for-Speed)
MLGOPerf extends MLGO to target performance instead of code size. The RL policy is trained using rewards predicted by a secondary model, “IR2Perf,” which estimates post-inlining speedup based on 20 IR features and PCA reduction. The inlining agent is also augmented with features such as block frequency and loop level. On SPEC CPU2006, MLGOPerf reaches geomean speedups of 1.8% versus O3 at the cost of 17.8% larger binaries (Ashouri et al., 2022). This dual-model architecture provides practical rewards for RL training without real binary executions, and enables a two-level autotune opportunity for downstream code regions (loop unroll/interleave).
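The dual-model reward idea can be sketched as follows. The PCA projection and regression weights are random placeholders; the real IR2Perf model is a trained neural estimator rather than this linear stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch of MLGOPerf's reward proxy: a secondary model
# maps 20 IR features, reduced via a PCA-style projection, to a
# predicted post-inlining speedup that serves as the RL reward,
# avoiding real binary executions during training.
N_FEATURES, N_COMPONENTS = 20, 5
pca_projection = rng.normal(size=(N_FEATURES, N_COMPONENTS))
regressor_w = rng.normal(size=N_COMPONENTS)

def predicted_reward(ir_features: np.ndarray) -> float:
    reduced = ir_features @ pca_projection  # dimensionality reduction
    return float(reduced @ regressor_w)     # speedup estimate as reward
```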
ACPO wraps MLGOPerf’s PPO-trained policy, leveraging 13 static IR and profile-derived features for each call site. The decision function is replaced in the inliner’s pass pipeline, entirely supplanting the hand-tuned “profitability” heuristic but leaving legality checks unperturbed. ACPO delivers a mean speedup of 2.1% (CBench) with a 15% increase in code size, confirming the learned policy’s ability to trade code growth for performance by unlocking further mid-end optimizations (Ashouri et al., 2023).
| Framework | Target | Features | Reported Gain | Code Size Impact |
|---|---|---|---|---|
| MLGO | code size | 11 (static) | up to –7% vs. –Oz | <1% overhead |
| MLGOPerf | performance | 13 (static) | +1.8% (SPEC2006), +2.2% | +12–18% |
| ACPO | performance | 13 (static) | +2.1% (CBench) | +15% |
4. Feature Engineering for Inline Policy
All ML-guided approaches to LLVM inlining rely on fixed-size vectors of numeric features extracted per call site. The specific selection and preprocessing include:
- Call-graph metrics: depth (height in SCC), number of calls in SCC, fan-in/fan-out counts.
- Function size metrics: basic block and instruction counts for caller/callee, control-flow edge count.
- Profile-derived frequency metrics (if available): estimated hot block frequencies, call-site invocation counts.
- Local call-site features: distance from SCC root, cost estimation, number of constant parameters.
- Global metrics: total edge and node counts in the call graph.
Normalization of such features (zero mean/unit variance) is performed in the training pipeline for MLGO/MLGOPerf, but is not necessarily implemented inside LLVM at inference time for ACPO (Trofin et al., 2021, Ashouri et al., 2022, Ashouri et al., 2023).
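A minimal sketch of assembling and normalizing such per-call-site feature vectors (feature columns and values are illustrative, not the exact MLGO feature set):

```python
import numpy as np

# Illustrative batch of per-call-site feature vectors; columns could
# be, e.g., callee instruction count, caller basic blocks, SCC depth,
# fan-in. Values are made up for demonstration.
batch = np.array([
    [120.0, 14.0, 3.0, 2.0],
    [ 35.0,  6.0, 1.0, 9.0],
    [480.0, 52.0, 5.0, 1.0],
])

def normalize(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Zero-mean / unit-variance normalization per feature column."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

norm = normalize(batch)
print(norm.mean(axis=0))  # ~ [0, 0, 0, 0]
```

In a training pipeline the mean and standard deviation would be computed once over the training set and reused at inference time.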
5. Integration, Practical Deployment, and Overheads
ML-guided inliners are integrated into LLVM via the InlineAdvisor interface. In static AOT deployments (e.g., MLGO, ACPO), the TensorFlow or equivalent model is ahead-of-time compiled into the LLVM binary, with no external dependencies and deterministic inference. A dynamic mode is provided for policy improvement and offline training (logging full feature/action trajectories). ACPO adds C++ AOT mode to eliminate per-call inference cost, at the expense of manual model rebuilds (Ashouri et al., 2023).
Overheads are modest: MLGO reports ∼1% additional compile time, 0.65% memory consumption, and a binary size increase of 0.08% for clang (Trofin et al., 2021). ACPO reports a 0.4 s/module compile-time increment due to model RPC, which can be avoided with AOT compilation (Ashouri et al., 2023).
6. Security and ML-Based Binary Analysis Implications
Function inlining significantly alters the static binary features relevant to machine-learning-based security analyses, such as instruction sequences and control-flow structure. It can therefore be exploited to evade or subvert discriminative and generative ML models deployed for binary classification or malware detection. Variability in inlining ratios (caused by build flags, optimization levels, or explicit inliner settings) undermines consistency between training and deployment environments for such models, a critical challenge for robust binary analysis (Abusabha et al., 16 Dec 2025). Subtle compiler flags (e.g., raised thresholds or suppressed penalties) can be used maliciously or inadvertently to create “evasive” binary variants.
7. Limitations and Future Directions
While ML-guided inlining frameworks demonstrate consistent improvement over hand-tuned heuristics, several limitations are identified:
- Training costs: RL policies require extensive data collection and compute resources (e.g., MLGO’s PPO converges in ≈12 h, ES in ≈60–150 h) (Trofin et al., 2021).
- Feature expressiveness: Reliance on static IR features may not fully capture dynamic or call-graph-structural properties. Incorporation of richer encodings (code embeddings, GNNs) is a proposed direction (Trofin et al., 2021, Ashouri et al., 2022).
- Reward proxy limitations: In MLGOPerf, mismatch between per-function and global speedup is observed for ~15% of data, owing to cache and memory effects. More sophisticated or multi-task reward schemes are under consideration (Ashouri et al., 2022).
- Code size blowup: Performance-oriented policies substantially increase code size (15–25%) (Ashouri et al., 2022, Ashouri et al., 2023), potentially challenging in resource-constrained or embedded contexts.
Future research is expected to refine feature extraction (including dynamic hardware counters), integrate multi-objective RL (balancing size and speed), and generalize models to additional optimization passes such as register allocation, loop unrolling, and vectorization (Trofin et al., 2021, Ashouri et al., 2022, Ashouri et al., 2023).