Avoiding exponential support enumeration for LRA and minimizing TR without Constant-Support
Develop an implicit anytime algorithm with a guaranteed stopping criterion for robust Markov decision processes (RMDPs) that does not rely on the Constant-Support Assumption. The algorithm should solve the long-run average reward (LRA) and minimizing undiscounted total reward (TR) objectives on polytopic uncertainty sets without exponentially enumerating the possible successor-support sets.
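To illustrate the blowup that the problem asks to avoid, the following minimal sketch enumerates the successor supports realizable for a single state-action pair. It assumes interval bounds on the transition probabilities, which are only one special case of a polytopic uncertainty set, and the name `feasible_supports` and its parameters are hypothetical. It is not the algorithm from the cited paper, only a naive baseline showing why enumeration scales poorly.

```python
from itertools import combinations


def feasible_supports(lower, upper):
    """Enumerate supports realizable under lower[i] <= p[i] <= upper[i], sum(p) == 1.

    Assumption: interval bounds, a special case of polytopic uncertainty sets.
    A support is the set of successors that receive strictly positive probability.
    """
    n = len(lower)
    result = []
    for size in range(1, n + 1):
        for support in combinations(range(n), size):
            chosen = set(support)
            # Successors outside the support must be allowed probability 0.
            if any(lower[i] > 0 for i in range(n) if i not in chosen):
                continue
            # Successors inside the support must be allowed positive probability.
            if any(upper[i] <= 0 for i in support):
                continue
            lo_sum = sum(lower[i] for i in support)
            hi_sum = sum(upper[i] for i in support)
            # A chosen successor with lower bound 0 needs slack to become positive.
            needs_slack = any(lower[i] == 0 for i in support)
            if hi_sum >= 1 and (lo_sum < 1 or (lo_sum == 1 and not needs_slack)):
                result.append(support)
    return result
```

With unconstrained bounds, for example `feasible_supports([0] * n, [1] * n)`, every nonempty subset of the `n` successors is a realizable support, so the result has `2**n - 1` entries. An approach that enumerates supports therefore does work exponential in the number of successors in the worst case.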
References
We provide implicit anytime algorithms for SSP and maximizing TR and explain the complications for LRA and minimizing TR, for which we may need to resort to an exponential blowup by enumerating possible supports. We conjecture that this can be avoided but leave this question for future work.
— Solving Robust Markov Decision Processes: Generic, Reliable, Efficient
(2412.10185 - Meggendorfer et al., 2024) in Paragraph “Beyond Constant-Support,” Section 5