
Procedural Market-Making

Updated 1 February 2026
  • Procedural market-making is a framework that uses algorithmic recursions and formal control to optimize quoting and inventory decisions.
  • It integrates cost-function pricing and adaptive bandit or reinforcement learning techniques to adjust overround and manage risk with theoretical loss guarantees.
  • Applications include dynamic pricing in limit order books, regime-adaptive quoting, and multi-asset market-making under complex market microstructures.

Procedural market-making refers to a class of algorithmic market-making frameworks in which quoting and inventory decisions are fully specified by recursions or online decision rules, often derived from formal control principles or adversarial regret minimization, and typically implemented as “modular” architectures. This approach contrasts with heuristic or ad hoc quoting algorithms, offering theoretical guarantees on profit, risk, and adaptation. Procedural frameworks often integrate cost-function-based price mechanisms, online learning or bandit optimization, and explicit stochastic control under realistic models of limit order books, order flow, and inventory dynamics.

1. Modular Cost-Function Market-Making and Overround Optimization

Procedural frameworks originated with the modular architecture introduced in "Bandit Market Makers," which combines a cost-function-based automated market maker (CFAMM) with an online bandit algorithm for adaptively tuning the market-maker’s profit margin (the “overround”) (Penna et al., 2011). At each round $t$:

  • The market maker maintains a liability vector $q^t$ over $n$ possible outcomes.
  • A convex, monotonic, bounded-loss cost function $C^t \in \mathcal{C}$, such as the LMSR $C(q) = b \log \sum_{i=1}^{n} \exp(q_i/b)$ for liquidity parameter $b > 0$, determines prices $\pi(q) = \nabla C(q)$.
  • To make profits, an “overround” parameter $a \geq 1$ scales the base cost: $C_a(q) = a\,C_0(q)$. The overround proportionally increases quoted marginal prices, widening bid-ask spreads.
  • Each admissible $a$ is treated as an arm in a continuous-armed adversarial bandit problem; after each trade, the bandit module evaluates the change in worst-case profit to update arm weights (e.g., via EXP3 or CAB), guaranteeing distribution-free regret $O(T^{2/3}\log^{1/3} T)$ relative to the best static overround in hindsight.
  • The architecture preserves the core properties of CFAMMs—bounded loss for all $C \in \mathcal{C}$, computational tractability (e.g., via fast LMSR evaluation), and runtime complexity $O(N)$ per round for $N$ discrete overround arms.

This procedural combination yields risk-controlled, profit-adaptive automated market making, and adapts rapidly to regime changes in trader demand (Penna et al., 2011).
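The cost-function layer above is fully mechanical and admits a short sketch. The function names below are illustrative, not code from Penna et al.; they implement LMSR pricing with an overround multiplier $a$ as defined above:

```python
import math

def lmsr_cost(q, b):
    # LMSR cost C(q) = b * log(sum_i exp(q_i / b)), computed stably.
    m = max(x / b for x in q)
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))

def lmsr_prices(q, b, a=1.0):
    # Marginal prices pi(q) = grad C_a(q) = a * softmax(q / b).
    # With a > 1 the quoted prices sum to a: the "overround" margin.
    m = max(x / b for x in q)
    w = [math.exp(x / b - m) for x in q]
    s = sum(w)
    return [a * wi / s for wi in w]

def trade_cost(q, delta, b, a=1.0):
    # Amount a trader pays to move the liability vector from q to q + delta.
    q_new = [qi + di for qi, di in zip(q, delta)]
    return a * (lmsr_cost(q_new, b) - lmsr_cost(q, b))
```

Charging $a\,C_0$ rather than $C_0$ means every trade contributes an extra margin proportional to $a - 1$, which is exactly the knob the bandit layer tunes.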

2. Reinforcement Learning and Procedural Market-Making Agents

Recent procedural frameworks for market making leverage reinforcement learning (RL), where the quoting and inventory management policies are learned online in simulated or live market environments, often using high-capacity function approximators:

  • RL Formulation: The market making task is modeled as an episodic or continuous-time Markov Decision Process (MDP), with states including inventory, queue positions, observed order book features, and market signals. Actions correspond to discrete or continuous quoting parameters (symmetric or asymmetric), inventory hedging, and sometimes side selection (Spooner et al., 2018, Gašperov et al., 2022, Niu et al., 2023, Vicente, 24 Jul 2025).
  • Reward Shaping: Advanced procedural RL market makers use custom reward functions to balance spread-capture against adverse inventory accumulation, often with asymmetric or time-averaged PnL penalties, explicit risk aversion, or multi-objective formulations (e.g., Pareto-front optimization) (Spooner et al., 2018, Vicente, 24 Jul 2025).
  • Adaptive Architectures: Deep RL market makers often integrate multiple state representations (agent-centric, market-centric, microstructural features) via hierarchical encoders, attention, or temporal-convolutional networks (Niu et al., 2023). Imitation learning from signal-based expert policies can accelerate convergence.
  • Online Adaptation: Procedural RL market makers may employ ensemble selection, discounted Thompson sampling (POW-dTS), and continual policy switching to adapt in non-stationary, competitive multi-agent environments (Vicente, 24 Jul 2025).

Reported empirical results consistently show RL-based procedural market makers outperforming static or online-learning baselines in realized PnL, Sharpe ratio, and risk-adjusted inventory measures, even under significant transaction costs and in multi-agent scenarios (Spooner et al., 2018, Gašperov et al., 2022, Niu et al., 2023, Vicente, 24 Jul 2025, Ganesh et al., 2019).
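The reward-shaping idea in particular admits a compact sketch. The following is an illustrative shaped reward in the spirit of asymmetrically dampened PnL (Spooner et al., 2018); the coefficients and exact functional form are assumptions for demonstration, not the published formula:

```python
def shaped_reward(spread_pnl, inventory_pnl, inventory, eta=0.1, damp=1.0):
    """Shaped per-step reward: full credit for captured spread, the positive
    part of inventory ("speculative") PnL dampened by `damp` so the agent is
    not paid for directional bets, and a running penalty on absolute
    inventory. Coefficients here are illustrative assumptions."""
    dampened = inventory_pnl - damp * max(0.0, inventory_pnl)
    return spread_pnl + dampened - eta * abs(inventory)
```

With `damp=1.0` the agent keeps only the downside of inventory PnL, pushing the learned policy toward spread capture and flat inventory rather than speculation.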

3. Procedural LOB Modeling: Event-Driven and Inventory-Aware Control

Procedural frameworks extend to high-fidelity simulation and control in limit order book environments:

  • Realistic LOB Simulators: Modern procedural agents train and backtest in event-driven simulators replicating full LOB state, with empirical distributions for order sizes, arrival times, cancellations, and drift/spread regime switches (Lu et al., 2018, Gašperov et al., 2022, Zimmer et al., 15 Sep 2025).
  • Non-Markovian and Hawkes-Based Order Flows: State-of-the-art models incorporate long memory and clustering of order flow via multi-exponential Hawkes processes (intensity $\lambda_k(t)$), capturing the feedback and self-excitation observed in real markets (Jusselin, 2020, Gašperov et al., 2022, Wang et al., 7 Aug 2025, Zimmer et al., 15 Sep 2025).
  • Control-Theoretic Bellman/HJB Formulation: Procedural market makers can formalize quote setting and inventory liquidation as solutions to discrete or continuous-time dynamic programming equations (Bellman, HJB, or Quasi-Variational Inequality), with optimal actions computed via finite-difference, value or policy iteration, or deep neural network approximations (Lu et al., 2018, Jusselin, 2020, Law et al., 2019, Zimmer et al., 15 Sep 2025).
  • Practical Algorithms: Efficient procedural strategies result from precomputing Q-tables, training neural networks to approximate value functions, or implementing online backward solvers on realistic microstructural state spaces.
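The Hawkes-driven order flow in these simulators can itself be generated procedurally. Below is a minimal single-exponential-kernel sketch using Ogata's thinning algorithm; the multi-exponential, multi-event-type processes in the cited papers generalize this:

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    on [0, horizon] via Ogata's thinning. Stationarity needs alpha < beta."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while t < horizon:
        # Intensity just after t upper-bounds the intensity until the next
        # event, since it only decays in between.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate arrival time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lam(t)/lam_bar
            events.append(t)
    return events
```

The accepted events cluster: each arrival raises the intensity by `alpha`, which then decays at rate `beta`, reproducing the self-excitation described above (this naive version is $O(n)$ per candidate; production simulators use the exponential kernel's recursive update).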

4. Multi-Asset, Multi-Level, and Inventory-Driven Quoting Protocols

Procedural market-making is generalizable to richer state and action spaces:

  • Multi-Level Ladder and Scaled-Beta Policies: Flexible procedural quoting protocols encode limit-order distributions using parameterized families (e.g., scaled beta distributions) that interpolate between single-level, ladder, and touch-at-best quoting schemes, with inventory-aware control via continuous shape-parameter laws (Jerome et al., 2022).
  • Multi-Price-Level RL/MM: RL agents with multi-level action and state spaces, imitative expert policies, and trend-augmented auxiliary features achieve superior returns and risk/return balances in high-frequency, real-market LOB backtests (Niu et al., 2023).
  • Procedural Internalization and Market-Making for Execution: Extended frameworks embed internal market-making engines into optimal execution models, jointly solving for external limit/market order flow and internal crossing, with closed-form control for internal and interbank spreads, impulse execution, and market impact (Morimoto, 2024).
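A scaled-beta quoting ladder of the kind described by Jerome et al. (2022) can be sketched as follows; the linear inventory-to-shape law and the constant `kappa` are illustrative assumptions, not their exact parameterization:

```python
import math

def beta_pdf(x, a_shape, b_shape):
    # Beta density via the gamma function; shapes >= 1 keep it bounded.
    B = math.gamma(a_shape) * math.gamma(b_shape) / math.gamma(a_shape + b_shape)
    return x ** (a_shape - 1) * (1 - x) ** (b_shape - 1) / B

def ladder_volumes(total_volume, n_levels, inventory, max_inventory, kappa=3.0):
    """Split total quoted volume across n_levels price levels using a
    scaled-Beta profile whose skew depends on normalized inventory.
    inventory = 0 gives a flat ladder; large |inventory| concentrates
    volume toward one end. Shape law and kappa are assumptions."""
    z = inventory / max_inventory               # normalized to [-1, 1]
    a_shape = 1.0 + kappa * max(0.0, -z)
    b_shape = 1.0 + kappa * max(0.0, z)
    xs = [(i + 0.5) / n_levels for i in range(n_levels)]  # level midpoints
    w = [beta_pdf(x, a_shape, b_shape) for x in xs]
    s = sum(w)
    return [total_volume * wi / s for wi in w]
```

Because the shape parameters are continuous functions of inventory, the same parameterized family smoothly interpolates between touch-at-best, ladder, and single-level quoting, which is the point of such protocols.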

5. Algorithmic Implementation and Regime Adaptation

The procedural paradigm mandates explicit algorithms (or “recipes”) at each stage:
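As an illustration of what such a recipe looks like end to end, the toy loop below wires together parameter selection, quote generation, inventory skew, simulated fills, and an epsilon-greedy learner over spread arms. Every dynamic in it (fill probabilities, skew coefficient, mid-price diffusion) is an assumption for demonstration, not a calibrated model from any cited paper:

```python
import random

def run_procedural_mm(n_rounds=2000, spreads=(0.01, 0.02, 0.05), seed=0):
    """Toy procedural market-making loop: each round quotes mid +/- spread/2,
    an epsilon-greedy learner picks the spread arm, fills arrive with a
    probability decaying in the spread, and inventory is marked to mid."""
    rng = random.Random(seed)
    value = [0.0] * len(spreads)  # running average reward per arm
    count = [0] * len(spreads)
    mid, inventory, pnl = 100.0, 0, 0.0
    for _ in range(n_rounds):
        # 1. Select quoting parameters (epsilon-greedy over spread arms).
        if rng.random() < 0.1:
            arm = rng.randrange(len(spreads))
        else:
            arm = max(range(len(spreads)), key=lambda i: value[i])
        s = spreads[arm]
        # 2. Quote around the mid, skewed against accumulated inventory.
        skew = -0.001 * inventory
        bid, ask = mid - s / 2 + skew, mid + s / 2 + skew
        # 3. Simulate fills: wider spreads fill less often.
        p_fill = max(0.0, 0.5 - 5.0 * s)
        r = 0.0
        if rng.random() < p_fill:   # bought at the bid
            inventory += 1
            r += mid - bid
        if rng.random() < p_fill:   # sold at the ask
            inventory -= 1
            r += ask - mid
        # 4. Mid-price moves; inventory is marked to market.
        dm = rng.gauss(0.0, 0.02)
        mid += dm
        r += inventory * dm
        pnl += r
        # 5. Learner update (incremental average of per-round reward).
        count[arm] += 1
        value[arm] += (r - value[arm]) / count[arm]
    best = spreads[max(range(len(spreads)), key=lambda i: value[i])]
    return pnl, inventory, best
```

The point of the sketch is structural: every decision is an explicit function of observed state and learned parameters, with no discretionary step anywhere in the loop.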

6. Regret Analysis, Guarantees, and Theoretical Properties

Procedural frameworks provide explicit theoretical guarantees, an essential distinction from black-box or manual architectures:

  • Distribution-free Regret: In bandit-market-maker frameworks, regret relative to the best static overround is $O(T^{2/3}\log^{1/3} T)$, and the worst-case loss is provably bounded by the cost-function class (Penna et al., 2011).
  • Online Learning Interpretations: Quoting can be reduced to online learning in sequential auction/dynamic pricing games with provably optimal regret bounds (e.g., $O(T^{2/3})$ under minimal distributional assumptions) (Cesa-Bianchi et al., 2024).
  • Control-Optimality: For Markovian and non-Markovian stochastic control models, the solution to the Bellman/HJBQVI equations yields value-optimal procedural strategies, with existence, uniqueness, and convergence results proved for broad classes (e.g., Hawkes-driven LOBs, impulse control, multi-exponential and neural network approximations) (Jusselin, 2020, Law et al., 2019, Lu et al., 2018, Zimmer et al., 15 Sep 2025).
  • Inventory Risk and Liquidation: Inventory-driven control laws, RL reward shaping, and impulse/optimal liquidation protocols ensure bounded and adaptive risk/exposure profiles across varied market conditions (Jerome et al., 2022, Morimoto, 2024, Vicente, 24 Jul 2025).
  • Adaptivity to Participant Behavior: Empirical work demonstrates that procedural RL and bandit agents adapt rapidly to information shocks, competition intensity, and demand regime shifts, maintaining bounded or sublinear regret even with adversarial trader behavior (Penna et al., 2011, Wang et al., 7 Aug 2025).
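The bandit guarantee in the first bullet rests on standard adversarial-bandit machinery. A compact sketch of EXP3 applied to a discrete set of overround arms follows; the normalization of rewards to $[0, 1]$ is assumed, and the interface is illustrative:

```python
import math
import random

def exp3_probs(weights, gamma):
    # EXP3 sampling distribution: exponential weights mixed with
    # uniform exploration at rate gamma.
    k = len(weights)
    s = sum(weights)
    return [(1 - gamma) * w / s + gamma / k for w in weights]

def run_exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """Adversarial-bandit tuning of a discrete parameter (e.g. the
    overround arm). reward_fn(arm, t) must return rewards in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * k
    total = 0.0
    for t in range(horizon):
        probs = exp3_probs(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        r = reward_fn(arm, t)
        total += r
        # Importance-weighted reward estimate; exponential weight update.
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / k)
        m = max(weights)  # renormalize to avoid overflow
        weights = [w / m for w in weights]
    return total, weights
```

The importance weighting makes each arm's reward estimate unbiased regardless of how rarely it is played, which is what yields regret bounds against an adversarial (distribution-free) reward sequence.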

7. Extensions, Generalizations, and Market Design Implications

Procedural market-making frameworks are extensible and inform market design:

  • Prediction, Information, and Bayesian Market Makers: Extensions to Bayesian market makers (BMM) allow for posterior updating, dynamic variance estimation, automatic shock detection, and endogenous spread regulation, at the cost of unbounded worst-case loss. The tradeoff between adaptivity and guaranteed loss-bounding is explicit (Brahma et al., 2010).
  • Multi-Agent and Algorithmic Collusion: In repeated games, independent procedural Q-learners can coordinate (without communication) on supra-competitive quoting strategies, presenting novel challenges for antitrust and algorithmic collusion detection (Han, 2022).
  • Cross-Asset, Regime, and Microstructure Generalization: Frameworks are designed to handle multi-asset, regime-switching, and multi-level LOBs, with modular plug-and-play architecture in simulation and live deployment (Zimmer et al., 15 Sep 2025, Niu et al., 2023, Wang et al., 7 Aug 2025).
  • Market-Making with Auction and Hybrid Mechanisms: Specialized procedural policies have been developed for sessions combining continuous LOB phases and closing auctions, adapting quoting strategies to projected clearing prices and order book dynamics (Graf et al., 24 Jan 2026).

In sum, procedural market-making establishes a mathematically rigorous, empirically effective paradigm for automated liquidity provision, integrating online cost-function or RL-based quoting with explicit control-theoretic or regret-minimizing updates. This supports robust, risk-controlled, profit-effective, and adaptive market-making in increasingly complex and dynamic electronic market environments (Penna et al., 2011, Spooner et al., 2018, Gašperov et al., 2022, Niu et al., 2023, Morimoto, 2024, Vicente, 24 Jul 2025, Jusselin, 2020, Han, 2022, Cesa-Bianchi et al., 2024, Graf et al., 24 Jan 2026).
