Implicit MEC-search for objectives requiring agent fixation

Derive an implicit maximal end-component (MEC) search procedure that identifies simple end component (SEC) candidates for three objectives in polytopic robust Markov decision processes (RMDPs) where the agent policy must be fixed: minimizing total reward with star=c, maximizing total reward with star=∞, and long-run average (LRA) reward. The procedure should avoid explicit model construction while preserving the correctness of the bounded value iteration approach.

Background

The authors’ implicit anytime algorithm relies on identifying simple end components (SECs) by searching for maximal end components (MECs) under a fixed policy. For the objectives they handle implicitly (stochastic shortest path, and maximizing total reward with star=c), fixing the environment policy suffices and yields a finite-action MDP amenable to implicit MEC search.

For the remaining objectives (minimizing total reward with star=c, maximizing total reward with star=∞, and LRA reward), it is instead the agent policy that must be fixed, which results in exponentially large or infinite action spaces and makes implicit MEC discovery nontrivial. The authors conjecture that an implicit MEC-search algorithm for these cases exists and leave this as future work.
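For orientation, the following is a sketch of the standard explicit MEC decomposition (iterated SCC-based refinement) that any implicit variant must reproduce without materializing the model. This is not the paper's algorithm or API; the function names and the dictionary encoding of the MDP (state → action → support set of successors) are purely illustrative assumptions.

```python
def _sccs(states, succ):
    """Strongly connected components (Kosaraju) of the graph on `states`
    with edges given by the successor function `succ`."""
    order, seen = [], set()
    for root in states:  # first pass: record DFS finish order (iteratively)
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ(root)))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt in states and nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ(nxt))))
                    break
            else:
                order.append(node)
                stack.pop()
    rev = {s: set() for s in states}  # second pass: DFS on the reverse graph
    for s in states:
        for t in succ(s):
            if t in states:
                rev[t].add(s)
    comps, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, stack = set(), [root]
        while stack:
            node = stack.pop()
            comp.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        comps.append(comp)
    return comps


def mec_decomposition(mdp):
    """Explicit MEC decomposition of a finite MDP given as
    state -> action -> set of successor states (transition supports only)."""
    work = [{s: {a: set(t) for a, t in acts.items()} for s, acts in mdp.items()}]
    mecs = []
    while work:
        block = work.pop()
        states = set(block)
        # 1. Remove actions whose support leaves the current block.
        pruned = {s: {a: t for a, t in block[s].items() if t <= states}
                  for s in states}
        # 2. Remove states left without any enabled action.
        pruned = {s: acts for s, acts in pruned.items() if acts}
        comps = _sccs(set(pruned), lambda s: set().union(*pruned[s].values()))
        if len(comps) == 1 and comps[0] == states and pruned == block:
            mecs.append(block)  # stable and strongly connected: a MEC
        else:
            for comp in comps:  # otherwise refine each SCC separately
                work.append({s: pruned[s] for s in comp})
    return mecs


# Illustrative toy MDP: {s0, s1} forms one MEC, {s2} another.
mdp = {
    "s0": {"a": {"s1"}},
    "s1": {"a": {"s0"}, "b": {"s2"}},
    "s2": {"a": {"s2"}},
}
mecs = mec_decomposition(mdp)
```

The implicit challenge described above is precisely that, once the agent policy is fixed for these objectives, the action sets iterated over in steps 1 and 2 become exponentially large or infinite, so the pruning cannot be done by enumeration.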

References

In these cases, we can resort to the explicit algorithm for polytopic RMDPs, but a general algorithm is elusive. We conjecture that it is possible to derive an implicit MEC-search algorithm for these cases, but leave it as future work.

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient  (2412.10185 - Meggendorfer et al., 2024) in Appendix F (Implicit Anytime Algorithm for RMDPs without Constant-Support), paragraph “Complications for Other Objectives and Non-Polytopic RMDPs”