Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints
Published 8 Oct 2025 in cs.LG and cs.GT | (2510.07266v1)
Abstract: We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a collection of downstream decision makers. Each decision maker has their own utility function, as well as a vector of constraint functions, each mapping their actions and an adversarially selected state to reward or constraint violation terms. The downstream decision makers select actions "as if" the state predictions are correct, and the goal of the learner is to produce predictions such that all downstream decision makers choose actions that give them worst-case utility guarantees while minimizing worst-case constraint violation. Within this framework, we give the first algorithm that obtains simultaneous \emph{dynamic regret} guarantees for all of the agents -- where regret for each agent is measured against a potentially changing sequence of actions across rounds of interaction, while also ensuring vanishing constraint violation for each agent. Our results do not require the agents themselves to maintain any state -- they only solve one-round constrained optimization problems defined by the prediction made at that round.
The paper introduces an algorithm for online omniprediction that achieves dynamic swap regret bounds with logarithmic dependence on subsequence complexity.
It provides stateless agent implementations, enabling one-round constrained optimization without maintaining historical state.
The method yields sublinear dynamic regret and bounded long-term constraint violations, offering strong guarantees in adversarial, non-stationary environments.
Problem Setting and Motivation
This work addresses the problem of online omniprediction in the presence of long-term constraints, where a centralized learner broadcasts predictions to a collection of downstream agents, each with their own utility and constraint functions. The agents select actions as if the predictions are correct, and the learner's objective is to ensure that all agents achieve strong utility guarantees while simultaneously minimizing their worst-case constraint violations. The setting is adversarial: outcomes are chosen by an adversary, and the agents' utilities and constraints are linear and Lipschitz in the outcome space.
A key challenge in this domain is to provide regret guarantees not just with respect to a static benchmark (a single action that is feasible across all rounds), but with respect to dynamic benchmarks—sequences of actions that may change over time, as long as the number of changes is sublinear in the time horizon. This is particularly important in adversarial or non-stationary environments, where the optimal action may shift as the environment evolves. The paper further strengthens the benchmark by considering swap regret, allowing agents to compete with the best action modification rule (possibly time-varying) rather than just the best sequence of actions.
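The gap between static and dynamic benchmarks can be made concrete with a small simulation. Everything below is synthetic and illustrative (a 3-action reward stream with a single mid-horizon shift and a placeholder uniform-random learner), not a construction from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
# Synthetic reward stream for 3 actions whose best action shifts mid-horizon.
rewards = rng.uniform(0, 1, (T, 3))
rewards[: T // 2, 0] += 0.5        # action 0 is best in the first half
rewards[T // 2 :, 1] += 0.5        # action 1 is best in the second half

learner = rng.integers(0, 3, T)    # placeholder learner: uniform actions
learner_reward = rewards[np.arange(T), learner].sum()

# Static benchmark: the single best fixed action over all T rounds.
static_best = rewards.sum(axis=0).max()
# Dynamic benchmark: the best action on each piece of a partition with one
# change point, i.e. a comparator sequence that changes once.
dynamic_best = (rewards[: T // 2].sum(axis=0).max()
                + rewards[T // 2 :].sum(axis=0).max())

static_regret = static_best - learner_reward
dynamic_regret = dynamic_best - learner_reward
assert dynamic_regret >= static_regret   # the dynamic benchmark is stronger
```

When the environment shifts, the static benchmark under-counts the achievable reward, which is why competing against a changing comparator sequence is the strictly harder (and more meaningful) goal here.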
Main Contributions
Logarithmic Dependence on Subsequence Complexity
The primary technical contribution is an algorithm that achieves regret and constraint violation guarantees on arbitrary collections of subsequences, with only logarithmic dependence on the number of subsequences. This is a significant improvement over prior work, which incurred linear dependence and thus could not provide nontrivial guarantees for dynamic regret (which requires considering all $O(T^2)$ contiguous intervals). The result is achieved by leveraging a conditionally unbiased prediction algorithm, instantiated to provide both decision calibration and infeasibility calibration for all agents and all relevant subsequences.
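To see why the logarithmic dependence matters, count the contiguous intervals of a horizon of length T. The numbers below are a back-of-the-envelope comparison, not figures from the paper:

```python
import math

T = 10_000
# Number of contiguous intervals [s, e] with 1 <= s <= e <= T:
n_intervals = T * (T + 1) // 2           # O(T^2) subsequences

# A regret overhead linear in the number of subsequences dwarfs the
# horizon itself, while a logarithmic overhead stays negligible.
linear_factor = n_intervals              # about 5.0e7 here
log_factor = math.log(n_intervals)       # log(T(T+1)/2) <= 2 log T

assert linear_factor > T
assert log_factor < 2 * math.log(T)
```

With linear dependence the overhead already exceeds T, so any per-subsequence bound becomes vacuous; with logarithmic dependence it contributes only an $O(\log T)$ factor.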
Dynamic Swap Regret and Stateless Agent Implementation
The algorithm provides dynamic swap regret bounds, a strictly stronger guarantee than dynamic external regret. Swap regret allows agents to compete with the best sequence of action modification rules, which can change over time, and is a superset of dynamic external regret (which corresponds to constant modification rules). The approach also enables a stateless implementation for downstream agents: agents simply solve a one-round constrained optimization problem using the current prediction, without maintaining any state or history. This is in contrast to prior elimination-based approaches, which required agents to track feasible action sets over time.
Theoretical Guarantees
The paper establishes the following key results:
Cumulative Constraint Violation: For any agent and any subsequence S, the cumulative constraint violation is bounded by
$O\big( L_C |\mathcal{A}|\,|S|^{1/4} + L_C \sqrt{|\mathcal{A}|\,|S|} + J (L_C + L_C^2) \sqrt{|S|} \big) \cdot \log(\cdot)$
where $L_C$ is the Lipschitz constant of the constraints, $|\mathcal{A}|$ is the size of the action space, $J$ is the number of constraints, and the logarithmic factor depends on the problem parameters.
Constrained Swap Regret: For any agent and any subsequence S, the constrained swap regret is likewise bounded by a quantity sublinear in $|S|$.
Zero-Margin Benchmarks: For the standard (zero-margin) benchmark, the algorithm achieves regret and constraint violation bounds of order $\tilde{O}(T^{2/3})$.
All guarantees hold simultaneously for all agents and all subsequences, with high probability.
Algorithmic Approach
The core of the approach is a prediction algorithm that ensures conditional unbiasedness with respect to both the agents' decision rules and infeasibility events, across all relevant subsequences. This is achieved by extending the Unbiased-Prediction algorithm of [noarov2023highdimensional], which provides high-dimensional, conditionally unbiased predictions in the online setting.
At each round, the forecaster outputs a prediction $p_t$ for the outcome $y_t$. Each agent then solves the following one-round constrained optimization problem:
$\max_{a \in \mathcal{A}} \; u(a, p_t) \quad \text{subject to} \quad c_j(a, p_t) \le 0 \;\; \forall j \in [J]$
If no action is predicted to be feasible, the agent may select any action; the analysis shows that this event occurs infrequently.
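A minimal sketch of this stateless decision rule follows. The action set, utility, and constraint functions are hypothetical stand-ins; the paper only requires that each agent solve this one-round program given the current prediction:

```python
import numpy as np

def stateless_agent(actions, utility, constraints, p_t):
    """One-round decision rule: act as if the prediction p_t were the true
    state. All arguments here are illustrative stand-ins."""
    feasible = [a for a in actions
                if all(c(a, p_t) <= 0 for c in constraints)]
    if not feasible:
        # No action is predicted feasible; the analysis shows this event
        # is rare, so an arbitrary fallback action suffices.
        return actions[0]
    # Maximize predicted utility over the predicted-feasible actions.
    return max(feasible, key=lambda a: utility(a, p_t))

# Toy instance: two actions in R^2, a linear utility, one linear constraint.
actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
u = lambda a, p: float(a @ p)
c1 = lambda a, p: float(a @ p) - 0.8     # "violated" if a @ p > 0.8
choice = stateless_agent(actions, u, [c1], p_t=np.array([0.5, 0.9]))
# Here action [0, 1] is predicted infeasible (0.9 > 0.8), so [1, 0] is chosen.
```

Note that the rule consults only the current prediction `p_t`; no history, feasible-set bookkeeping, or other state crosses rounds, which is exactly what distinguishes this from the elimination-based approaches mentioned above.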
The prediction algorithm is constructed to ensure that, for any agent, any action, and any subsequence, the sum of prediction errors (in the ℓ∞ norm) is sublinear in the number of rounds where the action is selected or predicted infeasible. This is achieved by maintaining a collection of events corresponding to all relevant subsequences and agent-action pairs, and applying a min-max optimization at each round to ensure unbiasedness across all events.
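The per-event quantity being controlled can be sketched numerically. The predictor and the event below are illustrative (a contiguous interval with artificial near-unbiased noise), not the paper's actual construction:

```python
import numpy as np

def conditional_bias(preds, outcomes, event_mask):
    """l_inf norm of the summed prediction error restricted to the rounds
    where the event fires -- the quantity driven to o(|S|) per event."""
    err = (preds - outcomes)[event_mask]
    return float(np.abs(err.sum(axis=0)).max()) if err.size else 0.0

rng = np.random.default_rng(1)
T, d = 2000, 3
outcomes = rng.uniform(0, 1, (T, d))
preds = outcomes + rng.normal(0, 0.1, (T, d))   # a nearly unbiased predictor

# One illustrative event: a contiguous interval of the horizon. The paper
# maintains such events for all relevant subsequences and agent-action pairs.
interval = np.zeros(T, dtype=bool)
interval[500:1500] = True
bias = conditional_bias(preds, outcomes, interval)
assert bias < 0.1 * interval.sum()   # far below linear in the event length
```

Because errors on the event roughly cancel, the summed bias grows like the square root of the event length rather than linearly, which is the kind of sublinear conditional guarantee the min-max construction enforces across all events simultaneously.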
Technical Analysis
The regret and constraint violation bounds are derived by decomposing the cumulative utility and constraint violation into terms involving the difference between predicted and realized outcomes, and leveraging the conditional unbiasedness guarantees. The analysis uses martingale concentration inequalities (Freedman's and Azuma-Hoeffding's inequalities) to control deviations, and exploits the linearity and Lipschitz properties of the utility and constraint functions.
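As a quick empirical illustration of the Azuma-Hoeffding inequality invoked in the analysis (the martingale here is a synthetic random walk, not the paper's error process):

```python
import math
import numpy as np

# Azuma-Hoeffding: a martingale with increments in [-1, 1] deviates by more
# than sqrt(2 T log(2/delta)) with probability at most delta.
rng = np.random.default_rng(2)
T, delta, trials = 5000, 0.01, 100
bound = math.sqrt(2 * T * math.log(2 / delta))

# Simulate simple +/-1 random walks and measure how often the bound fails.
walks = rng.choice([-1, 1], size=(trials, T)).sum(axis=1)
violations = (np.abs(walks) > bound).mean()
assert violations <= 5 * delta   # the bound holds far more often than 1 - delta
```

The same style of bound, applied to the martingale of prediction errors, is what converts the conditional unbiasedness guarantees into high-probability regret and constraint-violation statements.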
A key technical lemma shows that the number of rounds where a benchmark action is incorrectly predicted to be infeasible is small, scaling as $O(\sqrt{|S|})$ for each subsequence $S$. This ensures that the stateless agent decision rule is sufficient to guarantee vanishing constraint violation and regret.
The dynamic regret bounds are obtained by partitioning the time horizon according to the change points in the dynamic benchmark, and summing the per-interval regret bounds. The logarithmic dependence on the number of subsequences is crucial for ensuring that the overall regret remains sublinear.
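The interval-partition argument can be sanity-checked numerically. Assume, as a simplification of the actual bounds, that each interval contributes exactly the square root of its length:

```python
import math

# If the dynamic benchmark changes K times, split the horizon into K + 1
# intervals and sum per-interval bounds of order sqrt(|I_k|). Simplifying
# assumption: each interval contributes exactly sqrt(|I_k|).
T, K = 100_000, 9
lengths = [T // (K + 1)] * (K + 1)        # equal-length partition of [T]
total = sum(math.sqrt(L) for L in lengths)

# Cauchy-Schwarz: sum_k sqrt(|I_k|) <= sqrt((K + 1) * T), which is
# sublinear in T whenever the number of change points K is o(T).
assert sum(lengths) == T
assert total <= math.sqrt((K + 1) * T) + 1e-9
```

The equal-length partition is the worst case for this sum, and even there the total stays of order $\sqrt{KT}$; a logarithmic (rather than linear) cost per tracked subsequence is what keeps the aggregate sublinear.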
Implications and Future Directions
This work provides a unified and scalable framework for online omniprediction with long-term constraints, supporting strong regret guarantees for arbitrary collections of downstream agents and subsequences. The stateless agent implementation is particularly attractive for practical deployment, as it eliminates the need for agents to maintain complex state or history.
The logarithmic dependence on the number of subsequences opens the door to efficient dynamic regret guarantees in a wide range of adversarial and non-stationary environments. The extension to swap regret further strengthens the robustness of the guarantees, accommodating more general forms of agent adaptation.
Potential future directions include:
Extending the framework to continuous action spaces and non-linear utility/constraint functions.
Improving the rates for the zero-margin benchmark, or closing the gap between the strictly feasible and nominally feasible cases.
Investigating oracle-efficient or computationally scalable implementations for large-scale settings.
Applying the framework to real-world domains such as resource allocation, online markets, or adaptive control, where multiple agents interact with a shared predictive system under constraints.
Conclusion
The paper establishes a new state-of-the-art for online omniprediction with long-term constraints, providing the first dynamic swap regret bounds with only logarithmic dependence on the number of subsequences. The approach is general, robust, and practical, enabling stateless agent implementations and supporting strong guarantees for a broad class of downstream decision makers. The results have significant implications for the design of predictive systems in adversarial and non-stationary environments, and suggest several promising avenues for further research in online learning and sequential decision making.