Avoiding exponential support enumeration for LRA and minimizing TR without Constant-Support

Develop an implicit anytime algorithm with a guaranteed stopping criterion for robust Markov decision processes that do not satisfy the Constant-Support Assumption. The algorithm should solve the long-run average reward (LRA) and minimizing undiscounted total reward (TR) objectives on polytopic uncertainty sets without resorting to an exponential enumeration of the possible successor-support sets.

Background

The paper presents implicit anytime algorithms with stopping criteria for stochastic shortest path (SSP) and maximizing total reward (TR) in robust MDPs that may violate the Constant-Support Assumption. For long-run average reward (LRA) and minimizing total reward under these conditions, the authors explain that their current approach may incur an exponential blowup, because it enumerates the possible supports of transitions.
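To make the source of the blowup concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a polytopic uncertainty set for a single state-action pair is given by its vertices (each a probability distribution over successors), and the helper names `support` and `achievable_supports` are hypothetical. Because all entries are non-negative, the support of a convex combination is the union of the supports of the vertices that receive positive weight, so the achievable supports are exactly the unions over non-empty vertex subsets.

```python
def support(dist, tol=1e-12):
    """Indices of successors that receive positive probability."""
    return frozenset(i for i, p in enumerate(dist) if p > tol)


def achievable_supports(vertices):
    """All supports realized by some distribution in the convex hull of `vertices`.

    Assumption: the uncertainty set is a polytope described by its vertices.
    The result is the closure of the vertex supports under union.
    """
    reachable = set()
    for s in {support(v) for v in vertices}:
        # Either s alone, or s combined with any support found so far.
        reachable |= {s} | {s | r for r in reachable}
    return reachable


if __name__ == "__main__":
    # Worst case: the full probability simplex over n successors,
    # whose vertices are the n unit vectors.
    for n in range(1, 9):
        simplex = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        print(n, len(achievable_supports(simplex)))
```

For the full simplex over n successors, every non-empty subset of successors is an achievable support, so the count is 2^n - 1 for this single state-action pair alone. This is the kind of enumeration an implicit technique would have to avoid.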

They explicitly conjecture that this exponential blowup can be avoided and mark this as a question for future work, indicating the need for an implicit technique that remains efficient while providing correctness guarantees.

References

We provide implicit anytime algorithms for SSP and maximizing TR and explain the complications for LRA and minimizing TR, for which we may need to resort to an exponential blowup by enumerating possible supports. We conjecture that this can be avoided but leave this question for future work.

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient (2412.10185 - Meggendorfer et al., 2024) in Paragraph “Beyond Constant-Support,” Section 5