
Interval Regret Algorithms

Updated 8 January 2026
  • Interval Regret Algorithms are methods that measure loss over any contiguous interval, enabling adaptive performance in nonstationary or adversarial settings.
  • They employ geometric interval covering and meta-expert tracking techniques to achieve near-optimal regret bounds in online convex optimization and multi-armed bandit problems.
  • These algorithms extend to robust combinatorial optimization, offering finite convergence guarantees through logic-based Benders and LP-based heuristic formulations.

Interval Regret Algorithms are a broad class of methods for online learning and robust combinatorial optimization where performance is measured not only against full-horizon comparators but also via "interval regret": the regret with respect to a comparator over arbitrary contiguous intervals. They provide foundational guarantees in nonstationary or adversarial environments and unify several optimality criteria. These algorithms span online convex optimization, bandit learning, combinatorial min-max regret, and competitive analysis.

1. Formal Definition and Problem Classes

Interval regret, often called strongly adaptive regret, measures the maximum deviation from the best static comparator over every contiguous subinterval. In online convex optimization (OCO) with decision set $\mathcal{K}$ and losses $\ell_t$, the interval regret of an algorithm $A$ on $I=[s,t]$ is defined as

$$\mathrm{Regret}_{I}(A) = \sum_{j=s}^{t}\ell_j(x_j) - \min_{x\in\mathcal{K}}\sum_{j=s}^{t}\ell_j(x).$$

The worst case across all intervals of length $k$ yields the strongly adaptive regret

$$\mathrm{SA\text{-}Regret}(k) = \max_{I:|I|=k}\mathrm{Regret}_{I}(A).$$

This framework generalizes to settings such as multi-armed bandits (interval regret against the best arm on $I$), combinatorial optimization with interval costs (worst-case regret for 0-1 decision variables), Metrical Task Systems (interval regret with movement costs), scheduling under processing-time uncertainty, and two-stage robust optimization with interval uncertainty.
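The definitions above can be made concrete for a simple one-dimensional learner. The following sketch (function names, the comparator grid, and the drifting-target example are our own illustration, not from any cited paper) evaluates interval regret and strongly adaptive regret for online gradient descent with a $1/t$ step size, which adapts poorly after the environment shifts:

```python
import numpy as np

def interval_regret(losses, plays, comparator_set, s, t):
    """Regret of the plays on interval [s, t] (inclusive, 0-indexed)
    against the best fixed comparator from comparator_set."""
    learner_loss = sum(losses[j](plays[j]) for j in range(s, t + 1))
    best_fixed = min(sum(losses[j](x) for j in range(s, t + 1))
                     for x in comparator_set)
    return learner_loss - best_fixed

def sa_regret(losses, plays, comparator_set, k):
    """Strongly adaptive regret: worst interval regret over all
    intervals of length exactly k."""
    T = len(losses)
    return max(interval_regret(losses, plays, comparator_set, s, s + k - 1)
               for s in range(T - k + 1))

# Quadratic losses with a target that jumps at t = 50.
targets = np.concatenate([np.full(50, -1.0), np.full(50, 1.0)])
losses = [lambda x, c=c: (x - c) ** 2 for c in targets]

# Online gradient descent with a decaying 1/t step: near-optimal on the
# full horizon, but slow to react after the shift.
plays, x = [], 0.0
for t in range(len(losses)):
    plays.append(x)
    grad = 2 * (x - targets[t])
    x -= grad / (t + 1)

grid = np.linspace(-1, 1, 21)
print(sa_regret(losses, plays, grid, k=25))  # large: the window after the shift
```

The interval containing the shift exposes the weakness that full-horizon regret hides: on $[50, 74]$ the comparator $x=1$ suffers no loss while the slowly moving learner does.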

2. Interval Regret in Online Convex Optimization

The state-of-the-art interval regret algorithms for OCO leverage meta-expert/specialist tracking frameworks, geometric covering intervals, and adaptive or second-order meta-algorithms. The essential construction uses a set of experts, each running a base OCO subroutine over a prespecified interval, and a meta-algorithm that blends their predictions.

Key methods and guarantees:

  • Meta-expert covering: Instantiating base experts on a geometric set of intervals, activating them as "sleeping experts", and aggregating with multiplicative weights or adaptive hedges (e.g., AdaNormalHedge, Adapt-ML-Prod) (Zhang et al., 2020, Zhang et al., 2019, Zhang et al., 1 Aug 2025).
  • Regret bounds: For convex losses, near-optimal interval regret $O(\sqrt{|I| \log T})$ (Zhang et al., 2020); for exp-concave or strongly convex losses, logarithmic dependence $O((d/\alpha)\log |I|\log T)$ or $O((1/\lambda)\log |I|\log T)$ respectively (Zhang et al., 2019, Zhang et al., 1 Aug 2025).
  • Dual adaptivity: Simultaneously minimizes interval regret for multiple loss curvature classes without parameter tuning (Zhang et al., 1 Aug 2025).
  • Efficient reductions: The FLH (Follow-the-Leading-History) construction reduces meta-expert overhead to $O(\log\log T)$ per round while achieving near-optimal $O(\sqrt{|I|})$ interval regret (Lu et al., 2022).
  • Interior-point methods: Achieve $O(\sqrt{(t-s+1)\log T})$ interval regret with only one linear-system solve per iteration, improving computational efficiency over FTRL and OGD on complex feasible sets (Hazan et al., 2023).
  • Projection-free algorithms: Achieve adaptive regret bounds (e.g., $O(\sqrt{T})$ for SO-accessible sets) with $O(T)$ separation-oracle calls, without orthogonal projections, using infeasible projections and blockwise updates (Garber et al., 2022).
  • Strongly adaptive regret in control: MARC meta-algorithm transfers sublinear total regret of base controllers to sublinear interval regret in time-varying linear dynamical control via specialist tracking and potential functions (Gradu et al., 2020).
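A minimal version of the meta-expert construction behind these methods can be sketched as follows (a simplified illustration, not the exact algorithm of any cited paper; the class and function names are invented): a geometric pool of experts, each alive only on its dyadic interval, aggregated by multiplicative weights over the currently awake experts:

```python
import numpy as np

def dyadic_intervals(T):
    """Geometric interval pool: [k*2^i, (k+1)*2^i - 1] for all levels i."""
    ivals = []
    i = 0
    while 2 ** i <= T:
        length = 2 ** i
        for s in range(0, T, length):
            ivals.append((s, min(s + length - 1, T - 1)))
        i += 1
    return ivals

class SleepingMW:
    """Multiplicative-weights meta-algorithm over sleeping experts:
    only experts whose interval contains t predict and are updated;
    sleeping experts' weights are frozen."""
    def __init__(self, intervals, eta=0.5):
        self.intervals = intervals
        self.w = np.ones(len(intervals))
        self.eta = eta

    def awake(self, t):
        return [k for k, (s, e) in enumerate(self.intervals) if s <= t <= e]

    def combine(self, t, preds):
        idx = self.awake(t)
        w = self.w[idx]
        return float(np.dot(w, [preds[k] for k in idx]) / w.sum())

    def update(self, t, expert_losses):
        for k in self.awake(t):
            self.w[k] *= np.exp(-self.eta * expert_losses[k])

T = 8
meta = SleepingMW(dyadic_intervals(T))
# Round t = 3: one awake expert per dyadic level.
preds = {k: 0.5 for k in meta.awake(3)}
print(meta.combine(3, preds))  # → 0.5 (all awake experts agree)
meta.update(3, {k: 0.1 for k in meta.awake(3)})
```

In a full algorithm each expert would run its own base OCO subroutine (e.g., OGD) on its interval; here the predictions are supplied externally to keep the sketch short.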

3. Interval Regret in Bandit and Competitive Settings

In adversarial multi-armed bandit problems, interval regret emerges as a stringent performance metric. Notable findings include:

  • Lower bound: With only one query per round, worst-case interval regret is provably almost linear: $\Omega(|I|^{1-\epsilon})$ (Lu et al., 2024).
  • Two-query breakthrough: The Strongly Adaptive Bandit Learner (StABL) achieves tight interval regret $\tilde{O}(\sqrt{n|I|})$ for $n$ arms with only two queries per round, using geometric interval coverage and unbiased loss estimators (Lu et al., 2024).
  • Extension to bandit convex optimization: Analogous geometric meta-expert and multi-point probing algorithms yield $O(dGD\sqrt{|I|}\log^2 T)$ interval regret (Lu et al., 2024).
  • Switching cost and MTS: For Metrical Task Systems, interval regret bounds match $\sqrt{(D+1)|I|\log(NT)}$, while simultaneously maintaining the optimal competitive ratio (Daniely et al., 2019).
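The unbiased loss estimators underlying these bandit methods are importance-weighted: querying arm $i$ with probability $p_i$ and rescaling the observed loss by $1/p_i$ recovers the full loss vector in expectation. The sketch below shows the standard single-query EXP3-style estimator for intuition (StABL's actual two-query construction is more involved) and checks unbiasedness empirically:

```python
import numpy as np

def iw_estimate(arm, p, ell):
    """Importance-weighted loss estimate from one queried arm:
    unbiased for the full vector, since E[1{arm=i}/p_i] = 1."""
    est = np.zeros_like(ell)
    est[arm] = ell[arm] / p[arm]
    return est

rng = np.random.default_rng(1)
p = np.array([0.4, 0.3, 0.2, 0.1])    # query distribution over 4 arms
ell = np.array([0.2, 0.9, 0.5, 0.1])  # true (hidden) losses this round

# Empirical check of unbiasedness over many simulated rounds.
R = 100_000
arms = rng.choice(len(p), size=R, p=p)
avg = sum(iw_estimate(a, p, ell) for a in arms) / R
print(np.round(avg, 3))  # close to ell, despite seeing one arm per round
```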

4. Robust Combinatorial Optimization Under Interval Uncertainty

Interval regret plays a central role in robust ("min-max regret") combinatorial optimization, especially with uncertain cost intervals. The prototypical problems are:

Interval Min-Max Regret ILP:

  • Decision variables $x\in\{0,1\}^n$, cost intervals $l_i\le c_i\le u_i$, regret $R(x,S) = F(x,S) - F^*(S)$, robustness $Z(x) = \max_S R(x,S)$ (Carvalho et al., 2019, Assunção et al., 2016, Assunção et al., 2020).
  • The worst-case scenario for $x$ sets $c_i=u_i$ if $x_i=1$ and $c_i=l_i$ otherwise [Aissi et al.].
  • MILP and logic-based Benders decomposition formulations: master maintains $x$ and an auxiliary bound, separation subproblem solves the classical NP-hard problem under worst-case cost vector, and new cuts are generated for violated regrets (Assunção et al., 2020).
  • Finite convergence: Proven for all interval 0-1 min-max regret problems—even with NP-hard subproblems—since each separation generates a new cut from the finite feasible set (Assunção et al., 2020).
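For a concrete instance, consider the selection problem (choose exactly $k$ of $n$ items to minimize total cost) with interval costs. The sketch below (a minimal toy illustration of the framework; the instance and names are ours) evaluates the robustness $Z(x)$ of a fixed selection exactly, using the worst-case scenario characterization above:

```python
import numpy as np

def max_regret(x, l, u, k):
    """Exact worst-case regret of selection x (choose k of n items,
    minimize total cost) under interval costs l_i <= c_i <= u_i.
    The regret-maximizing scenario sets c_i = u_i where x_i = 1 and
    c_i = l_i elsewhere; the inner optimum just picks the k cheapest."""
    c = np.where(x == 1, u, l)
    F_x = c[x == 1].sum()          # cost of x under the worst scenario
    F_star = np.sort(c)[:k].sum()  # optimal selection in that scenario
    return F_x - F_star

l = np.array([2.0, 1.0, 4.0, 3.0])
u = np.array([6.0, 2.0, 5.0, 8.0])
x = np.array([1, 1, 0, 0])         # pick items 0 and 1
print(max_regret(x, l, u, k=2))    # → 3.0
```

For problems where the scenario subproblem is itself NP-hard (e.g., min-max regret shortest path), the `np.sort` line is replaced by a call to an exact solver, which is exactly where the Benders separation step pays off.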

Metaheuristics and heuristics:

  • AMU (Mean-Upper): Solves deterministic ILP under mean and upper scenarios and selects the better; guaranteed 2-approximation (Carvalho et al., 2019, Assunção et al., 2016).
  • SBA: Samples a grid of scenarios between lower and upper bounds and selects among solutions.
  • LP-based heuristic (LPH): Exploits dual information from LP relaxation to construct a single compact MILP yielding near-optimal bounds, often outperforming AMU and Benders in practical instances (Assunção et al., 2016).
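The AMU idea can be sketched on the same kind of toy selection problem (our own instance; the cited papers apply it to general interval 0-1 ILPs): solve the deterministic problem under the midpoint and upper scenarios, then keep whichever candidate has smaller exact max regret:

```python
import numpy as np

def max_regret(x, l, u, k):
    """Worst-case regret of selection x (choose k of n, min cost):
    scenario c_i = u_i where x_i = 1, else l_i."""
    c = np.where(x == 1, u, l)
    return c[x == 1].sum() - np.sort(c)[:k].sum()

def solve_scenario(c, k):
    """Optimal 0-1 selection for fixed costs c: pick the k cheapest."""
    x = np.zeros(len(c), dtype=int)
    x[np.argsort(c)[:k]] = 1
    return x

def amu(l, u, k):
    """Mean-Upper heuristic: solve under the midpoint and the upper
    scenarios, keep the candidate with smaller max regret
    (a 2-approximation for interval min-max regret)."""
    cand = [solve_scenario((l + u) / 2, k), solve_scenario(u, k)]
    return min(cand, key=lambda x: max_regret(x, l, u, k))

l = np.array([2.0, 1.0, 4.0, 3.0])
u = np.array([6.0, 2.0, 5.0, 8.0])
x = amu(l, u, k=2)
print(x, max_regret(x, l, u, 2))
```

Only two deterministic solves are needed, which is why AMU is attractive when the underlying problem is already expensive to solve once.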

Extensions:

  • Min-max regret scheduling under interval uncertainty (Drwal, 2017).
  • Two-stage min-max regret for partial-commitment combinatorial models: Second-stage costs are interval-uncertain, leading to additional complexity; algorithms leverage row-column generation, compact MIP, and greedy heuristics (Goerigk et al., 2020).

5. Unified Principles and Algorithmic Frameworks

Interval regret algorithms share common structural ideas across domains:

  • Geometric interval covering: Any interval $[s,t]$ is efficiently covered by $O(\log(t-s+1))$ dyadic intervals, ensuring meta-algorithms can track local behavior.
  • Sleeping expert meta-algorithms: Experts are activated only on their intervals, with linear or adaptive weighting to aggregate predictions (Zhang et al., 2019, Zhang et al., 1 Aug 2025, Zhang et al., 2020).
  • Surrogate losses and multiple learning rates: Second-order and curvature-friendly surrogates (exp-concave, strongly convex) allow universality and tight bounds without parameter tuning.
  • Cut generation in combinatorial problems: Logic-based Benders replaces polyhedral duality for combinatorial subproblems, ensuring optimal convergence even when subproblems are NP-hard (Assunção et al., 2020).
  • Efficient meta-expert management: Doubly-exponential lifespan spacing reduces computational overhead to $O(\log\log T)$ per round, enabling practical implementation for large $T$ (Lu et al., 2022).
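The covering principle can be made concrete with the standard segment-tree decomposition: any $[s,t]$ splits into $O(\log(t-s+1))$ canonical dyadic blocks, so a meta-algorithm that tracks every dyadic interval implicitly tracks every interval. A minimal sketch (the function name is ours):

```python
def dyadic_cover(s, t):
    """Greedily decompose [s, t] (inclusive) into canonical dyadic
    blocks [k*2^i, (k+1)*2^i - 1]: repeatedly take the largest block
    that starts at s (i.e., is aligned to s) and fits inside [s, t]."""
    pieces = []
    while s <= t:
        # largest power of two dividing s bounds the block's alignment
        i = (s & -s).bit_length() - 1 if s > 0 else 63
        while 2 ** i > t - s + 1:   # shrink until the block fits
            i -= 1
        pieces.append((s, s + 2 ** i - 1))
        s += 2 ** i
    return pieces

print(dyadic_cover(3, 10))  # → [(3, 3), (4, 7), (8, 9), (10, 10)]
```

An interval of length 8 thus needs only 4 dyadic pieces, matching the logarithmic cover size the meta-algorithms rely on.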

6. Computational Results and Practical Implications

Empirical evidence and computational studies consistently show that interval regret methods outperform traditional full-horizon regret minimizers in dynamic or adversarial settings.

  • In robust ILP instances, logic-based Benders and LP-based heuristics provide near-optimal solutions, with LPH improving or matching Benders' bounds with greatly reduced CPU time (Assunção et al., 2016).
  • AMU and SBA heuristics are fast and near-optimal, though LPH is preferred for hard instances (Carvalho et al., 2019).
  • In OCO, adaptive meta-expert methods retain near-optimal interval regret and dynamic regret while incurring only $O(\log^2 T)$ or $O(\log\log T)$ overhead (Zhang et al., 2019, Lu et al., 2022).
  • Bandit interval regret lower bounds make clear the necessity of multi-probing: two-query methods are necessary and sufficient for tight $\tilde{O}(\sqrt{n|I|})$ adaptive regret (Lu et al., 2024).

7. Open Problems and Future Directions

Continued research in interval regret algorithms raises several open questions and future avenues:

  • Dimensionality- and parameter-free interval regret bounds.
  • Extending interval regret guarantees to broader function classes (weak curvature, composite losses).
  • Single-projection (OCO) constructions with optimal interval regret and minimal overhead.
  • Robust combinatorial frameworks for multi-stage and networked uncertainties, requiring new tractable approximations.
  • Practical enhancements in meta-expert pool management and heuristic cut selection for Benders-type methods.

A plausible implication is that interval regret algorithms will anchor future advances in robust online decision processes, adversarial learning environments, and competitive analysis, offering unified, computationally efficient, and universally adaptive frameworks.
