Implicit computation of the minimal positive transition probability from H‑representation

Develop an implicit method to compute or tightly bound the minimal positive transition probability p_min for polytopic robust Markov decision processes when the uncertainty sets are given in H‑representation, avoiding enumeration of polytope vertices.

Background

The initialization procedure for bounded value iteration requires an a priori bound on the maximum finite expected total reward, which depends on a lower bound p_min on positive transition probabilities under optimal play. For polytopic uncertainty sets represented by vertices (V‑representation), p_min can be obtained by scanning vertices.
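As a minimal sketch of the vertex scan described above, assume each uncertainty set is stored as a plain list of vertices and each vertex is a probability vector over successor states. The function name `p_min_from_vertices` and the data layout are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction

def p_min_from_vertices(uncertainty_sets):
    """Return the smallest strictly positive entry over all vertices.

    `uncertainty_sets` is assumed to hold one list of vertices per
    state-action pair; every vertex is a sequence of probabilities.
    """
    positives = [
        prob
        for vertices in uncertainty_sets  # one polytope per state-action pair
        for vertex in vertices            # each vertex is a distribution
        for prob in vertex
        if prob > 0                       # zero entries are not transitions
    ]
    return min(positives)  # raises ValueError if there is no positive entry

# Two toy uncertainty sets over three successor states (illustrative data).
sets = [
    [[Fraction(1, 2), Fraction(1, 2), 0], [Fraction(1, 10), Fraction(9, 10), 0]],
    [[0, Fraction(1, 4), Fraction(3, 4)]],
]
p_min = p_min_from_vertices(sets)
```

Exact rationals are used here so that the comparison with zero is not affected by rounding; the scan itself is linear in the total number of vertex entries.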

However, when the polytopes are given in H‑representation (as systems of linear inequalities), the authors do not know how to obtain p_min implicitly, that is, without first converting to vertices, whose number can be exponential in the number of inequalities. They flag this as an unresolved methodological gap.
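To make the cost of the explicit route concrete, the following sketch enumerates vertices for one special case only: an interval uncertainty set {p : sum(p) = 1, lower <= p <= upper}. This restriction is an assumption made for illustration and is not the general H‑representation setting of the open question. In this case every vertex has at most one coordinate strictly between its bounds, so the brute-force loop tests n * 2^(n-1) candidates. The function name `interval_vertices` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def interval_vertices(lower, upper):
    """Brute-force vertices of {p : sum(p) = 1, lower <= p <= upper}."""
    n = len(lower)
    found = set()
    for free in range(n):  # the one coordinate fixed by sum(p) = 1
        others = [i for i in range(n) if i != free]
        # Every other coordinate sits at its lower (0) or upper (1) bound.
        for choice in product((0, 1), repeat=n - 1):
            p = [None] * n
            for i, at_upper in zip(others, choice):
                p[i] = upper[i] if at_upper else lower[i]
            p[free] = 1 - sum(p[i] for i in others)
            if lower[free] <= p[free] <= upper[free]:  # feasibility check
                found.add(tuple(p))
    return found

# Illustrative bounds: every probability lies in [1/10, 8/10].
lower = [Fraction(1, 10)] * 3
upper = [Fraction(8, 10)] * 3
vertices = interval_vertices(lower, upper)
p_min = min(prob for v in vertices for prob in v if prob > 0)
```

The number of candidates doubles with each additional successor state, which is the kind of blow-up an implicit method would need to avoid.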

References

In a polytopic RMDP, p_min can be computed by finding the minimum transition probability occurring in any vertex of the polytope (combinations of vertices need not be checked, as p_min is given with respect to some optimal pair of policies, and memoryless deterministic policies which pick only vertices of the polytope are sufficient for optimal play). We are not aware of a way to do this implicitly when given a polytope in H-representation.

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient (2412.10185 - Meggendorfer et al., 2024) in Appendix E (Details for Section 5), subsection “Definition of Procedure: INIT,” paragraph on computing p_min