Two-Stage Auto-Bidding Algorithm

Updated 4 February 2026

A two-stage auto-bidding algorithm is a structured framework that separates high-level resource planning from tactical, real-time bid optimization, making complex auction constraints tractable.
In advertising settings, the approach pairs budget pacing via Lagrange multipliers with fine-grained impression-level decisions, using submodular surrogate objectives and achieving controlled regret.
It is applied in domains such as online advertising, multi-agent auctions, multi-platform bidding, and energy aggregation, where it supports strict budget adherence and scalable performance.

A two-stage auto-bidding algorithm is a structured framework that decomposes the automated bidding process into two interdependent optimization phases, typically with one stage addressing a strategic/high-level planning problem and the other managing tactical/real-time or fine-grained optimization. This decomposition enables tractable solutions to complex auction or resource allocation problems that feature combinatorial constraints, multi-agent interaction, uncertainty, or exploration-exploitation tradeoffs. The two-stage paradigm is prevalent in online advertising, resource markets, content recommendation, and energy aggregation, with application-specific formulations and theoretical guarantees.

1. General Architecture and Motivation

In canonical two-stage auto-bidding algorithms, the first stage (“outer” or “campaign-level”) solves a high-level resource or pacing problem, often considering budget, incentive design, or distributional objectives over a planning horizon. The second stage (“inner” or “impression-level”) typically makes fine-grained decisions in response to real-time events, such as which bids to submit for individual auctions or resource allocations, subject to constraints and signals from the upper-level solution. This decomposition is driven by (1) scale separation, where global constraints are enforced at a slower time scale, and (2) the need to balance global objectives (e.g., welfare, learning, or incentive compatibility) with per-instance efficiency (Liu et al., 28 Jan 2026, Mou et al., 13 Mar 2025, Aggarwal et al., 26 Feb 2025, Zhao et al., 2020).

2. Dual-Objective and Submodular Surrogates in Content Promotion

In content recommendation and paid promotion, two-stage auto-bidding algorithms address the tradeoff between immediate value (e.g., click maximization) and long-term model improvement (exploratory data acquisition). The first stage sets dynamic budget pacing via a Lagrange multiplier interpreted as a “shadow price,” using multiplicative updates based on observed spend and pacing targets. The second stage conducts impression-level bid optimization, where the optimal bid maximizes the product of auction win probability and the surplus of per-impression marginal utility over the shadow price, adapting for auction format (first- or second-price). Marginal utility incorporates a decomposable surrogate objective—gradient coverage—proven to be submodular and formally connected to Fisher Information and optimal experimental design (Liu et al., 28 Jan 2026).
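The interaction between the two stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of Liu et al.: it uses the standard Lagrangian bid forms (bid $u/\lambda$ in a second-price auction; maximize $P(\text{win} \mid b)\,(u - \lambda b)$ over a bid grid in a first-price auction), an exponential multiplicative update of $\lambda$ once per batch of impressions, and a toy simulation with uniformly distributed utilities and competing bids. The helper names, the step size `eta`, the batch size, and the data are hypothetical.

```python
import math
import random


def second_price_bid(utility, shadow_price):
    """Second price: bid u / lambda, which maximizes the surplus u - lambda * price."""
    return utility / shadow_price


def first_price_bid(utility, shadow_price, win_prob, grid):
    """First price: pick b on a grid maximizing P(win | b) * (u - lambda * b)."""
    return max(grid, key=lambda b: win_prob(b) * (utility - shadow_price * b))


def update_shadow_price(lam, spend, target, eta=0.1, floor=1e-6):
    """Multiplicative pacing update: raise lambda if over target, lower it if under."""
    return max(floor, lam * math.exp(eta * (spend - target) / target))


def simulate(budget=50.0, rounds=1000, batch=50, seed=0):
    """Toy loop with uniform utilities and competing bids (hypothetical data)."""
    rng = random.Random(seed)
    lam, spent, value, batch_spend = 1.0, 0.0, 0.0, 0.0
    batch_target = budget * batch / rounds
    for t in range(1, rounds + 1):
        u = rng.random()          # marginal utility of this impression
        competing = rng.random()  # highest competing bid
        bid = second_price_bid(u, lam)
        # Stage II: per-impression bid; the hard stop keeps spend within budget.
        if bid > competing and spent + competing <= budget:
            spent += competing    # second price: pay the competing bid
            batch_spend += competing
            value += u
        if t % batch == 0:        # Stage I: slower pacing update once per batch
            lam = update_shadow_price(lam, batch_spend, batch_target)
            batch_spend = 0.0
    return lam, spent, value


lam, spent, value = simulate()
print(f"lambda={lam:.2f} spent={spent:.2f} value={value:.2f}")
```

Updating $\lambda$ per batch rather than per impression is a design choice that smooths the noisy win/lose spend signal; the hard budget check is what makes overspending impossible regardless of how well the pacing has converged.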

When impression labels are unavailable, confidence-gated heuristics estimate the relevant gradients, and a zeroth-order variant supports black-box models via two-point finite differences. The reported properties include monotone submodularity of the surrogate, sublinear regret in online auctions, and strict budget adherence, along with an implementation fast enough for millisecond-latency serving.
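The zeroth-order idea can be illustrated with a generic two-point random-direction gradient estimator, which needs only function evaluations of the black-box model. This is a textbook estimator offered as an assumption about the general technique, not the paper's exact variant; the smoothing radius `mu`, the Gaussian directions, and the number of directions are hypothetical choices.

```python
import random


def two_point_gradient(f, x, mu=1e-4, num_dirs=200, seed=0):
    """Zeroth-order estimate of grad f(x) from two-point finite differences.

    Averages ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u over Gaussian
    directions u; for smooth f this approximates the true gradient.
    """
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        slope = (plus - minus) / (2.0 * mu)  # directional derivative along u
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_dirs
    return grad


# Black-box example: f(x) = sum(x_i^2) has true gradient 2x = [2, -4, 1].
print(two_point_gradient(lambda v: sum(t * t for t in v), [1.0, -2.0, 0.5], num_dirs=5000))
```

Each direction costs two model evaluations, and the estimate's variance shrinks as the number of directions grows, so the direction count trades latency against gradient accuracy.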

3. Bi-level and Game-theoretic Two-Stage Mechanisms

In markets with multiple competitive agents (such as advertising marketplaces), two-stage algorithms are instantiated as bi-level reinforcement learning frameworks. The inner stage corresponds to each agent computing an ε-best-response policy to the aggregate policy of others, often solved via policy gradient in a unified, permutation-invariant POMDP. The outer stage, managed by the platform, adjusts the parameters of the shared bidding policy to maximize social welfare, subject to the constraint that the equilibrium of the inner layer approximates a Nash equilibrium (ε-NE).

The bi-level policy gradient (BPG) algorithm, for example, deploys primal-dual Lagrangian methods with first-order policy gradients. Its single-level penalty reformulation and use of shared “unified” inner solvers yield computational complexity per iteration that is independent of the number of agents. This addresses the intractability of direct bi-level or Stackelberg approaches in large-scale environments. Convergence to a min-max saddle point, satisfying both platform-side welfare maximization and agent-side equilibrium constraints, is theoretically guaranteed (Mou et al., 13 Mar 2025).
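The primal-dual mechanics of the outer stage can be seen on a one-dimensional toy problem. This is not BPG: there are no policies, agents, or inner best-response solvers. The concave stand-in welfare $W(\theta)$ and the constraint $g(\theta) \le 0$ (playing the role of the ε-NE requirement) are hypothetical, and the step size and iteration count are arbitrary.

```python
def primal_dual(welfare_grad, g, g_grad, theta=0.0, eta=0.05, steps=2000):
    """Gradient descent-ascent on L(theta, lam) = W(theta) - lam * g(theta).

    Maximizes W subject to g(theta) <= 0, keeping lam >= 0 by projection.
    """
    lam = 0.0
    for _ in range(steps):
        theta += eta * (welfare_grad(theta) - lam * g_grad(theta))  # primal ascent
        lam = max(0.0, lam + eta * g(theta))  # dual step: lam grows while g > 0
    return theta, lam


# Toy instance: W(theta) = -(theta - 3)^2 and constraint theta - 1 <= 0.
# The saddle point is theta = 1 with multiplier lam = 4.
theta, lam = primal_dual(lambda t: -2.0 * (t - 3.0), lambda t: t - 1.0, lambda t: 1.0)
print(round(theta, 4), round(lam, 4))
```

The multiplier acts as the price of violating the equilibrium constraint: it rises while the constraint is violated and pushes the welfare-maximizing parameter back to the feasible boundary.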

4. Median-of-Medians Structure in Multi-Platform Bidding

For advertisers facing multi-platform environments with unknown value and cost functions displaying diminishing returns, two-stage structures appear as sequential search and refinement. The first stage (“exploration phase”) efficiently locates an optimal marginal cost threshold using a median-of-medians elimination process, parameterizing the feasible bid profile across platforms by this threshold and eliminating non-optimal regions in batch. The second stage (“optimization phase”) fractionally allocates marginal increments among platforms to satisfy budget and return-on-spend (ROS) constraints, using a greedy, bang-per-buck approach.
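A simplified sketch of both phases follows, under assumptions that are not from the source. Each platform's diminishing returns are represented as a finite list of (value, cost) increments with non-increasing value-per-cost ratios, the ROS constraint is omitted, and one "query" is an evaluation of total spend at a threshold. Stage I eliminates candidate thresholds using the size-weighted median of the per-platform medians as the pivot, so each query discards a constant fraction of the remaining candidates. Stage II buys every increment above the threshold and fills the leftover budget greedily by bang-per-buck, taking the last increment fractionally. All function names and data are hypothetical.

```python
from statistics import median_low


def spend_at(tau, platforms):
    """One query: total cost if each platform buys every increment with ratio >= tau."""
    return sum(c for incs in platforms for v, c in incs if v / c >= tau)


def weighted_median(pairs):
    """Weighted median of (value, weight) pairs."""
    pairs = sorted(pairs)
    total = sum(w for _, w in pairs)
    acc = 0
    for value, weight in pairs:
        acc += weight
        if 2 * acc >= total:
            return value
    raise ValueError("empty input")


def find_threshold(platforms, budget):
    """Stage I: smallest value-per-cost threshold whose total spend fits the budget."""
    candidates = [sorted({v / c for v, c in incs}) for incs in platforms]
    best = float("inf")  # inf: not even the single best increment fits
    while any(candidates):
        # Pivot: median of the per-platform medians, weighted by list size.
        pivot = weighted_median([(median_low(cs), len(cs)) for cs in candidates if cs])
        if spend_at(pivot, platforms) <= budget:
            best = pivot  # feasible: answer is pivot or a smaller candidate
            candidates = [[r for r in cs if r < pivot] for cs in candidates]
        else:             # infeasible: answer is a larger candidate
            candidates = [[r for r in cs if r > pivot] for cs in candidates]
    return best


def allocate(platforms, budget):
    """Stage II: buy increments above the threshold, then fill greedily by ratio."""
    tau = find_threshold(platforms, budget)
    taken = [[1.0 if v / c >= tau else 0.0 for v, c in incs] for incs in platforms]
    remaining = budget - spend_at(tau, platforms)
    below = sorted(
        ((v / c, j, k) for j, incs in enumerate(platforms)
         for k, (v, c) in enumerate(incs) if v / c < tau),
        key=lambda t: (-t[0], t[1], t[2]),  # best ratio first, keep per-platform order
    )
    for _, j, k in below:
        cost = platforms[j][k][1]
        if cost <= remaining:
            taken[j][k] = 1.0
            remaining -= cost
        else:
            taken[j][k] = remaining / cost  # last increment taken fractionally
            break
    return taken


# Two hypothetical platforms; costs are assumed positive.
PLATFORMS = [[(10.0, 2.0), (6.0, 2.0), (2.0, 2.0)], [(9.0, 3.0), (3.0, 3.0)]]
print(find_threshold(PLATFORMS, 8.0), allocate(PLATFORMS, 8.0))
```

With a budget of 8, the threshold settles at a ratio of 3 (spend 7), and the remaining unit of budget buys half of the next-best increment.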

Learning-augmented variants integrate predictions of the optimal bid vector: if the prediction is accurate, the query complexity drops to $O(m)$, where $m$ is the number of platforms; otherwise, the worst-case complexity degrades gracefully to $O(m\log(mn)\log n)$, matching lower bounds up to constant factors. These algorithms achieve near-optimal query complexity in settings with only query access to the value and cost functions (Aggarwal et al., 26 Feb 2025).
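One generic way to obtain this kind of graceful degradation, offered here as an assumption rather than the construction of Aggarwal et al., is to seed a monotone search with the prediction: query the predicted position, gallop outward with doubling steps until the answer is bracketed, then binary-search the bracket. An exact prediction costs at most two queries, and a prediction that is off by $d$ positions costs $O(\log d)$ queries. Here `pred(i)` is a hypothetical monotone feasibility test, for example whether total spend at the $i$-th smallest candidate threshold fits the budget.

```python
def first_true(pred, n, guess):
    """Smallest i in [0, n) with pred(i) True, for monotone pred (False...True).

    Returns (index, queries); index == n means pred is False everywhere.
    Gallops outward from `guess`, then binary-searches the bracket.
    """
    queries = 0

    def ask(i):
        nonlocal queries
        queries += 1
        return pred(i)

    if n <= 0:
        return 0, 0
    guess = min(max(guess, 0), n - 1)
    step = 1
    if ask(guess):
        hi = guess                      # known True
        lo = guess - step
        while lo >= 0 and ask(lo):      # gallop left while still True
            hi = lo
            step *= 2
            lo = guess - step
        lo = max(lo, -1)                # known False, or -1 sentinel
    else:
        lo = guess                      # known False
        hi = guess + step
        while hi < n and not ask(hi):   # gallop right while still False
            lo = hi
            step *= 2
            hi = guess + step
        hi = min(hi, n)                 # known True, or n sentinel
    while hi - lo > 1:                  # binary search in (lo, hi]
        mid = (lo + hi) // 2
        if ask(mid):
            hi = mid
        else:
            lo = mid
    return hi, queries


for g in (37, 30, 0, 99):
    print(g, first_true(lambda i: i >= 37, 100, g))
```

A good prediction makes the search nearly free, while a bad one costs only a logarithmic factor over plain binary search, which is the consistency-robustness tradeoff that learning-augmented bounds formalize.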

5. Real-time and Hierarchical Coordination in Energy Aggregation

Aggregators of flexible resources (e.g., electric vehicles) employ a two-stage bidding structure to handle incentive design, fleet response uncertainty, and multi-market participation. The first stage (day-ahead planning) solves a mixed-integer linear program (MILP) that chooses EV incentives and market bids (charging, discharging, and offered regulation), using ARIMA forecasts of market prices and models of fleet behavior. The second stage (real-time) solves an hourly recourse optimization based on observed regulation signals, updated forecasts, and realized fleet responses, pursuing the aggregator's profit objective while respecting EV owners' service-level constraints.
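The day-ahead/real-time split is an instance of two-stage stochastic programming with recourse. The sketch below is a deliberately tiny stand-in for the MILPs described above, with all numbers hypothetical: Stage I chooses a single regulation-capacity offer by enumeration, and Stage II's recourse, evaluated per scenario of realized fleet availability, covers any shortfall at the cheaper of a non-performance penalty or a real-time buy-back price. Incentives, multiple markets, and integer fleet decisions are omitted.

```python
# Each scenario (hypothetical): (probability, realized fleet availability in MW,
# real-time buy-back price per MW).
SCENARIOS = [(0.2, 4.0, 30.0), (0.5, 6.0, 20.0), (0.3, 8.0, 15.0)]


def expected_profit(offer, scenarios, cap_price, penalty):
    """Day-ahead capacity revenue minus expected real-time recourse cost."""
    profit = cap_price * offer
    for prob, available, rt_price in scenarios:
        shortfall = max(0.0, offer - available)
        # Stage II recourse: cover the shortfall with the cheaper option.
        profit -= prob * shortfall * min(penalty, rt_price)
    return profit


def plan_day_ahead(offers, scenarios, cap_price, penalty):
    """Stage I: choose the offer with the highest expected profit."""
    return max(offers, key=lambda q: expected_profit(q, scenarios, cap_price, penalty))


best = plan_day_ahead(range(11), SCENARIOS, cap_price=10.0, penalty=25.0)
print(best, expected_profit(best, SCENARIOS, 10.0, 25.0))
```

With these numbers the best offer is 6 MW rather than the 8 MW available in the best case: offering more raises day-ahead revenue, but the expected recourse cost of falling short outweighs it.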

The structure separates slow-timescale control (planning, incentive setting) from fast-timescale control (regulation tracking, non-performance adjustment). Experiments on real market data report high profit, accurate tracking of regulation signals, and robust outcomes for EV owners (Zhao et al., 2020).

6. Summary of Algorithmic Structure and Theoretical Properties

The table below summarizes the structure of each setting:

Problem Domain	Stage I (High Level)	Stage II (Fine Grained)	Guarantees and Reported Properties
Content Promotion	Budget pacing via Lagrange multiplier	Impression-level bid, marginal utility	Submodularity, regret bounds, budget feasibility (Liu et al., 28 Jan 2026)
Multi-Agent Auctions	Platform policy for social welfare	Agent best-response policy	Convergence to ε-NE, per-iteration complexity independent of the number of agents (Mou et al., 13 Mar 2025)
Multi-Platform Bidding	Marginal cost threshold determination	Fractional allocation/gap rounding	Query complexity optimal up to constants, learning-augmented bounds (Aggarwal et al., 26 Feb 2025)
Energy Aggregation	Day-ahead MILP (incentives, bids)	Hourly real-time recourse MILP	Empirical market profit and EV-owner service levels, fast MILP solution (Zhao et al., 2020)

All cases exhibit a clean separation between strategic/resource-level decisions and tactical/instance-level execution, supported by convexity, submodular surrogates, or game-theoretic formulations. The first three settings come with worst-case, regret, or convergence guarantees, while the energy-aggregation setting is validated empirically.

7. Extensions, Limitations, and Directions

A common theme is the extension to settings with partial observability, adversarial environments, or data-driven surrogate modeling. Open directions include tighter integration of online learning and two-stage optimization, robust treatment of model and market mis-specification, dynamic adaptation to cross-agent interactions, and computational improvements in large-scale stochastic two-stage programs.

A plausible implication is that two-stage frameworks, by unifying theory and implementable heuristics, offer a scalable approach to complex auto-bidding, enabling application to diverse domains with high-frequency auction or resource-allocation dynamics.