Implicit MEC-search for objectives requiring agent fixation
Derive an implicit maximal end-component (MEC) search procedure that identifies simple end component candidates in polytopic robust Markov decision processes (RMDPs) for the objectives in which the agent policy must be fixed: minimizing total reward with ⋆ = c, maximizing total reward with ⋆ = ∞, and long-run average reward. The procedure should avoid explicit construction while preserving the correctness of the bounded value iteration approach.
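For orientation only, the following is a minimal sketch of the textbook explicit MEC decomposition on an ordinary MDP, where each action is described just by its set of possible successors. It is not the implicit procedure asked for above and is not taken from the cited paper; the names `reachable`, `sccs`, `mecs`, and the example `MDP` are all hypothetical. It shows the kind of graph computation that an implicit search would need to reproduce without first fixing every successor set.

```python
def reachable(graph, start):
    """Return all states reachable from `start` (including itself)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sccs(graph):
    """Strongly connected components via mutual reachability (quadratic, for clarity)."""
    reach = {s: reachable(graph, s) for s in graph}
    comps, assigned = [], set()
    for s in graph:
        if s in assigned:
            continue
        comp = frozenset(t for t in reach[s] if s in reach[t])
        comps.append(comp)
        assigned |= comp
    return comps


def mecs(mdp):
    """Maximal end components of `mdp`: state -> action -> set of successors.

    Returns a list of (states, enabled_actions) pairs.
    """
    result, work = [], [frozenset(mdp)]
    while work:
        S = work.pop()
        # Keep only actions whose successors all stay inside the candidate set.
        enabled = {s: [a for a, succ in mdp[s].items() if succ <= S] for s in S}
        alive = frozenset(s for s in S if enabled[s])
        if alive != S:
            # Some state cannot stay inside S; retry with the remaining states.
            if alive:
                work.append(alive)
            continue
        graph = {s: set().union(*(mdp[s][a] for a in enabled[s])) for s in S}
        comps = sccs(graph)
        if len(comps) == 1:
            result.append((S, enabled))  # closed and strongly connected
        else:
            work.extend(comps)  # refine each component separately
    return result


# Hypothetical example: {s0, s1} and {s2} are MECs, s3 is transient.
MDP = {
    "s0": {"a": {"s0", "s1"}, "b": {"s2"}},
    "s1": {"a": {"s0"}},
    "s2": {"a": {"s2"}},
    "s3": {"a": {"s2"}},
}
found = {states for states, _ in mecs(MDP)}
```

The sketch relies on every action having a known, fixed successor set, which is the explicit information the problem statement asks to avoid constructing.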
References
In these cases, we can resort to the explicit algorithm for polytopic RMDPs, but a general algorithm is elusive. We conjecture that it is possible to derive an implicit MEC-search algorithm for these cases, but leave it as future work.
— Solving Robust Markov Decision Processes: Generic, Reliable, Efficient
(2412.10185 - Meggendorfer et al., 2024) in Appendix F (Implicit Anytime Algorithm for RMDPs without Constant-Support), paragraph “Complications for Other Objectives and Non-Polytopic RMDPs”