
Reference-Free Iterative Monotonic Process

Updated 6 February 2026
  • Reference-free iterative monotonic process is an optimization method that generates a sequence of solutions with guaranteed non-degradation of performance without external reference models.
  • It leverages intrinsic data, surrogate objective construction, and adaptive step-sizing to achieve monotonic improvements in areas such as reinforcement learning, control, and autoformalization.
  • Rigorous theoretical guarantees and empirical validations underscore its robustness and effectiveness in ensuring stable convergence across complex, data-driven tasks.

A reference-free iterative monotonic process is an optimization or learning procedure that constructs a sequence of solutions, iterates, or policies, where each update is guaranteed not to degrade a target objective, and importantly, does so without relying on an explicit reference trajectory, reference state, or model. Monotonicity is realized either in the objective (cost or reward) or in a certified confidence bound on multi-dimensional performance, while ā€œreference-freeā€ indicates that all constraint, correction, or improvement signals are derived from intrinsic properties (data, feedback, convexity, or majorization) rather than external targets or reference models. This paradigm appears across reinforcement learning, control, fixed-point algorithms for variational problems, inverse problems, and formal autoformalization, leading to rigorously justified, convergent, and robust procedures.

1. Foundational Principles and Formal Structure

The key property of a reference-free iterative monotonic process is the guarantee that the objective function value does not deteriorate from one iteration to the next. This is in contrast to standard iterative procedures that may not enforce monotonicity, or methods that rely on reference states, reference trajectories, or external supervision.

Core definition:

Let {x_k} be a sequence produced by an iterative algorithm optimizing a function J(x). The process is monotonic if J(x_{k+1}) ≄ J(x_k) (for maximization) or J(x_{k+1}) ≤ J(x_k) (for minimization) for all k. Reference-free indicates that all decisions and step adjustments are based solely on intrinsic data, solution history, or mathematical structure, not on externally imposed references.

The process may output the next iterate x_{k+1} by

  • maximizing a suitably defined surrogate lower bound,
  • minimizing a locally tight majorizer,
  • taking a gradient or operator step projected onto feasibility domains,
  • solving a constrained maximization subject to divergence or entropy regularization,

with the property that each x_{k+1} improves or at least preserves J relative to x_k.
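The acceptance logic common to these variants can be sketched as a minimal loop. This is an illustrative sketch, not any specific published algorithm: `monotonic_minimize`, the toy objective, and the halving rule are all assumptions chosen for clarity.

```python
def monotonic_minimize(J, step, x0, iters=50):
    """Generic reference-free monotonic descent: a proposed iterate is
    accepted only if it strictly decreases J; otherwise the current point
    is kept and the step size is halved. All decisions use intrinsic
    information (objective values and solution history) only."""
    x, lr = x0, 1.0
    hist = [J(x)]
    for _ in range(iters):
        x_new = step(x, lr)
        if J(x_new) < J(x):   # monotone acceptance test
            x = x_new
        else:
            lr *= 0.5         # intrinsic, adaptive step-sizing
        hist.append(J(x))
    return x, hist

# Example: minimize J(x) = (x - 3)^2 with damped gradient steps.
J = lambda x: (x - 3.0) ** 2
step = lambda x, lr: x - lr * 2.0 * (x - 3.0)
x_star, hist = monotonic_minimize(J, step, x0=0.0)
```

The recorded objective history is non-increasing by construction, regardless of how poorly the proposal rule behaves: a bad proposal is simply rejected and the step size shrunk.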

2. Representative Algorithms and Methodologies

Policy Optimization (Reinforcement Learning)

  • Model-Free Trajectory-Based Methods (MOTO):

The ā€œMOTOā€ algorithm constructs local, quadratic approximations of the state-action QQ-function from trajectory data without learning a dynamics model, and backs these up to enforce exact KL-divergence trust-region constraints per time step. The closed-form update for a linear-Gaussian policy under a quadratic QQ surrogate ensures that each updated policy does not decrease expected return and enforces an exact bound on the update magnitude via the KL constraint (Akrour et al., 2016).

  • Easy Monotonic Policy Iteration (EMPI):

EMPI establishes a surrogate lower bound for policy improvement in MDPs, penalizing the average policy divergence under the old discounted trajectory distribution, and optimizes this surrogate. By designing the step so that the surrogate always increases, monotonic improvement in expected return is guaranteed—no reference trajectory or external model is used (Achiam, 2016).
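The surrogate-lower-bound logic can be made concrete in a one-step bandit analogue (an assumption for illustration, much simpler than EMPI's MDP setting): maximizing L(pi) = E_pi[r] āˆ’ c·KL(pi || pi_old) has the closed-form solution pi ∝ pi_old·exp(r/c), and since L(pi_old) = E_pi_old[r] and KL ≄ 0, any surrogate increase implies a return increase.

```python
import numpy as np

def surrogate_step(pi_old, r, c=1.0):
    """One KL-penalized surrogate-maximizing update (bandit analogue of
    EMPI's lower bound): maximize L(pi) = E_pi[r] - c * KL(pi || pi_old).
    Closed-form maximizer: pi proportional to pi_old * exp(r / c)."""
    pi_new = pi_old * np.exp(r / c)
    return pi_new / pi_new.sum()

rng = np.random.default_rng(0)
r = rng.normal(size=5)          # arbitrary per-action rewards
pi = np.full(5, 0.2)            # uniform initial policy
for _ in range(10):
    pi_next = surrogate_step(pi, r)
    assert pi_next @ r >= pi @ r - 1e-12   # monotone expected return
    pi = pi_next
```

No reference policy or external model appears anywhere: the penalty is anchored to the algorithm's own previous iterate.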

Control and Predictive Control

In iterative LMPC, terminal sets and costs are constructed from data accrued in previous episodes, using learned certificate functions (e.g., neural CLBFs) to define feasible and cost-improving terminal conditions for the MPC subproblem. Monotonicity is enforced by guaranteeing that the performance cost is non-increasing at each iteration, given the exact satisfaction of the control Lyapunov-barrier conditions, with no need for reference trajectory information (Hashimoto et al., 18 Jul 2025).

Optimization, Operator Theory, and PDEs

  • Majorization-Minimization Source Localization (SOLVIT):

For nonconvex, nonsmooth source localization under time-difference-of-arrival (TDOA) measurement, an iterative scheme constructs at each step a quadratic, globally tight surrogate (majorizer) of the cost at the current solution. The next iterate is the global minimizer of this majorizer. By construction, the actual cost does not increase, and the approach is reference-free: all measurements are used without privileging any sensor as a reference (Jyothi et al., 2019).
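A one-dimensional majorization-minimization sketch shows the mechanism (a toy objective chosen for illustration, not SOLVIT's TDOA cost): the nonsmooth term |x āˆ’ a| is majorized at x_k by the quadratic (x āˆ’ a)²/(2w) + w/2 with w = |x_k āˆ’ a|, which is tight at x_k, so minimizing the surrogate can never increase the true cost.

```python
def mm_minimize(a, b, x0, iters=50, eps=1e-12):
    """Majorization-minimization for f(x) = |x - a| + (x - b)^2.
    At x_k, |x - a| <= (x - a)^2 / (2 w) + w / 2 with w = |x_k - a|
    (equality at x_k), so the surrogate is a globally tight quadratic
    majorizer; its closed-form minimizer is the next iterate."""
    f = lambda x: abs(x - a) + (x - b) ** 2
    x, costs = x0, [f(x0)]
    for _ in range(iters):
        w = max(abs(x - a), eps)           # guard against division by zero
        x = (a + 2 * b * w) / (1 + 2 * w)  # argmin of the quadratic surrogate
        costs.append(f(x))
    return x, costs

x_star, costs = mm_minimize(a=1.0, b=0.0, x0=5.0)
```

The recorded cost sequence is non-increasing by construction, exactly the MM descent property the SOLVIT analysis relies on.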

  • Halpern Iteration for Monotone Inclusion:

Halpern-style schemes for variational inequalities and monotone inclusion in Hilbert space iterate scaled combinations of initial points and projected operator steps, with the weights and steps determined adaptively, not by reference models or targets. The process is parameter-free and strongly convergent, with monotonic decay of an appropriate potential (Diakonikolas, 2020). This pattern also appears in fixed-point iterative Galerkin discretizations of monotone PDEs, where the contraction mapping is reference-free, and the process is stopped based on a posteriori error estimators (Congreve et al., 2015, Nevanlinna, 2021).
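The anchoring scheme can be sketched in a few lines. This is a minimal toy instance, assuming the classic weights b_k = 1/(k+2) and a simple nonexpansive map standing in for a resolvent; adaptive variants tune the weights from intrinsic quantities instead.

```python
def halpern(T, x0, iters=200):
    """Halpern iteration x_{k+1} = b_k * x0 + (1 - b_k) * T(x_k) with the
    classic anchoring weights b_k = 1 / (k + 2). Reference-free: the
    anchor is the iteration's own initial point, not an external target."""
    x = x0
    for k in range(iters):
        b = 1.0 / (k + 2)
        x = b * x0 + (1 - b) * T(x)
    return x

# Toy nonexpansive map with fixed point 2 (a stand-in for the resolvent
# of a monotone operator).
T = lambda x: 0.5 * x + 1.0
x = halpern(T, x0=0.0)
```

The iterate converges to the fixed point of T; for merely nonexpansive (non-contractive) operators, where plain Picard iteration can fail, the anchoring term is what secures strong convergence.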

Automated Theorem Formalization

  • Monotonic Reference-Free Refinement for Autoformalization:

In full-theorem autoformalization, candidates are generated and improved using complementary LLM-driven refiners and theorem-prover feedback, with performance scored on a composite metric over formal validity and other soft dimensions. A conservative acceptance policy ensures that, at each step, either the actual composite objective or a certified lower confidence bound does not decrease. No reference to ground-truth formalizations is required; LLM and prover signals are fused to drive monotonic certified progress (Zhang et al., 30 Jan 2026).

3. Theoretical Guarantees and Convergence Properties

A defining advantage of these processes is the theoretical guarantee of monotonicity and typically global convergence (to a stationary point, local minimum, or, in some cases, the true optimum).

  • Majorization-Minimization: Monotonicity is ensured by construction of surrogates tangent and majorizing at each iteration, so f(x^{(k+1)}) ≤ f(x^{(k)}) (Jyothi et al., 2019).
  • Trust Region/KL-Constrained Policy Updates: KL or TV divergence constraints strictly limit the policy update in each iteration, preventing collapse or divergence and yielding provable non-decrease in expected return (Akrour et al., 2016, Achiam, 2016).
  • Certificate-Based MPC: Cost is non-increasing over iterations when certificate assumptions are met, with recursive feasibility and stability also guaranteed under mild conditions (Hashimoto et al., 18 Jul 2025).
  • Fixed-Point and Halpern Iteration: Strict contractiveness, or potential descent, ensures geometric or sublinear convergence in norm to the solution, with near-optimal iteration complexity (Diakonikolas, 2020, Nevanlinna, 2021, Congreve et al., 2015).
  • Autoformalization Acceptance Policies: Conservative selection rules, enforced by lower confidence bounds and hard-masked objectives, guarantee that every accepted change does not degrade true (or certified) utility, and almost sure global convergence is proved (Zhang et al., 30 Jan 2026).

4. Empirical Performance and Applications

Reference-free iterative monotonic processes have demonstrated strong empirical performance in a range of domains:

  • Autoformalization Benchmarks: Near-monotonic, simultaneous improvement of multiple formalization dimensions, with up to 93.44% formal validity and 78.22% overall score on miniF2F, and monotonic gain across all axes despite the adversarial nature of LLMs and theorem provers (Zhang et al., 30 Jan 2026).
  • Trajectory-Based RL and Control Tasks: MOTO and EMPI algorithms outperform approaches relying on model linearization or non-monotonic updates in highly nonlinear simulated control tasks, validating the advantage of the monotonicity and reference-free structure (Akrour et al., 2016, Achiam, 2016, Hashimoto et al., 18 Jul 2025).
  • Source Localization: SOLVIT yields lower RMSE than both reference-based and previous reference-free methods, matching CramĆ©r–Rao lower bounds on synthetic and real data (Jyothi et al., 2019).
  • PDE Solvers: Iterative monotone Galerkin schemes achieve optimal error for strongly monotone problems with a fixed, minimal iteration count per mesh, even in adaptive refinement settings (Congreve et al., 2015).

5. Design Patterns and Algorithmic Templates

Key algorithmic elements characterizing reference-free monotonic processes include:

  • Surrogate Objective Construction: Build a lower bound or majorizer of the true objective at the current iterate, tight at this point, and globally tractable (e.g., quadratic or separable).
  • Update by Global Minimization/Maximization: The iterate is updated by exact or analytical optimization of the surrogate (MM, convex trust region, fixed-point contraction).
  • Intrinsic Step-Sizing: Step sizes or trust-region radii are computed adaptively from contractiveness, local Lipschitz conditions, or dual parameter optimization, not by anchoring to a reference solution.
  • Monotonicity Acceptance Policy: Changes are accepted only when a strict (objective or certified) improvement is obtained, often enforced via statistically conservative bounds when stochastic or approximate scoring is required (Zhang et al., 30 Jan 2026).
  • Reference-Free Evaluation: No reliance on external reference signals, trajectories, or explicit models; all feedback is intrinsic—data, state distributions, learned surrogates, or multi-agent LLM/prover responses.
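The monotonicity acceptance policy under stochastic scoring can be sketched with a Hoeffding lower confidence bound. The function names and the score model here are assumptions for illustration; the point is that only certified (LCB) improvements are ever accepted, so the certified objective cannot degrade whatever the scoring noise does.

```python
import math, random

def lcb(samples, alpha=0.05):
    """Hoeffding lower confidence bound for i.i.d. scores in [0, 1]."""
    n = len(samples)
    return sum(samples) / n - math.sqrt(math.log(1 / alpha) / (2 * n))

def conservative_accept(incumbent_cert, candidate_scores, alpha=0.05):
    """Accept a candidate only if its certified (LCB) value is at least
    the incumbent's certified value; the certified objective is therefore
    non-decreasing by construction."""
    cand_cert = lcb(candidate_scores, alpha)
    accepted = cand_cert >= incumbent_cert
    return accepted, max(cand_cert, incumbent_cert)

random.seed(0)
cert = 0.0
for _ in range(20):
    scores = [min(1.0, max(0.0, random.gauss(0.7, 0.1))) for _ in range(50)]
    accepted, cert_new = conservative_accept(cert, scores)
    assert cert_new >= cert       # certified value never degrades
    cert = cert_new
```

This is the same conservatism used by the autoformalization acceptance policies above: noisy evaluations are admissible, but only their certified lower bounds drive acceptance.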

6. Comparative Table of Selected Reference-Free Iterative Monotonic Processes

| Domain/Algorithm | Reference-Free Mechanism | Monotonicity Guarantee |
|---|---|---|
| MOTO (policy opt.) | KL-constrained update, no model | Per-step return lower bound |
| EMPI (policy iter.) | Avg TV/KL penalization, no ref. | Surrogate objective increase |
| SOLVIT (MM localization) | Majorizer, all TDOA pairs, no ref. | Cost non-increase per iter. |
| Halpern iteration | Parameter-free operator step | Potential descent |
| LMPC (MPC/control) | CLBF certificate, no reference | Non-increasing cost |
| Autoformalization | LLM/prover composite, no gt-ref. | Plug-in/LCB objective non-decrease |

7. Significance and Impact

Reference-free iterative monotonic processes provide provable robustness and convergence without the fragilities introduced by reference tracking, model bias, or overfitting to pre-specified targets. They underpin advances in safe reinforcement learning, control without predefined trajectories, robust majorization in nonconvex estimation problems, monotone optimization in PDE and convex minimization contexts, and certified progress in data-driven and stochastic settings like autoformalization.

By leveraging monotonicity at every step and eliminating dependence on reference trajectories or models, these processes facilitate stable performance, strong generalization, and, increasingly, practical deployment in domains where reference information is unavailable or unreliable.

References:

(Akrour et al., 2016, Achiam, 2016, Hashimoto et al., 18 Jul 2025, Jyothi et al., 2019, Diakonikolas, 2020, Nevanlinna, 2021, Congreve et al., 2015, Zhang et al., 30 Jan 2026).
