Computational Hardness of Reinforcement Learning with Partial $q^π$-Realizability

Published 24 Oct 2025 in cs.AI, cs.CC, and cs.LG | (2510.21888v1)

Abstract: This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^{\pi}$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal policy with respect to a predefined policy set $\Pi$, under the assumption that the value functions of all policies in $\Pi$ are linearly realizable. The assumptions of this framework are weaker than those of $q^{\pi}$-realizability but stronger than those of $q^{*}$-realizability, providing a practical model in which function approximation arises naturally. We prove that learning an $\epsilon$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness when $\Pi$ is a parameterized set of greedy (argmax) policies, and we show that, unless NP = RP, an exponential lower bound (in the feature-vector dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those for $q^{*}$-realizability and suggest that computational difficulty persists even when $\Pi$ is expanded beyond the optimal policy. To establish these results, we reduce from two computational problems, $\delta$-Max-3SAT and $\delta$-Max-3SAT(b), to instances of GLinear-$\kappa$-RL (greedy policies) and SLinear-$\kappa$-RL (softmax policies), respectively. Our findings indicate that positive computational results are generally unattainable under partial $q^{\pi}$-realizability, in contrast to $q^{\pi}$-realizability under a generative access model.
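To make the two policy classes in the abstract concrete, here is a minimal sketch of a linearly parameterized greedy (argmax) policy and a softmax policy. It assumes that each state-action pair $(s, a)$ has a feature vector $\phi(s, a) \in \mathbb{R}^d$ and that a policy is indexed by a parameter vector $\theta \in \mathbb{R}^d$; the function names `greedy_action` and `softmax_probs`, and the `temperature` parameter, are hypothetical illustrations rather than the paper's own definitions.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def greedy_action(theta: Sequence[float],
                  action_features: List[Sequence[float]]) -> int:
    """Argmax policy (assumed form): pick the action whose linear score
    <phi(s, a), theta> is largest; ties go to the lowest index."""
    scores = [dot(phi, theta) for phi in action_features]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax_probs(theta: Sequence[float],
                  action_features: List[Sequence[float]],
                  temperature: float = 1.0) -> List[float]:
    """Softmax policy (assumed form): action probabilities proportional to
    exp(<phi(s, a), theta> / temperature)."""
    scores = [dot(phi, theta) / temperature for phi in action_features]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    # One state with three actions, each described by a 2-dimensional feature vector.
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    theta = [2.0, 1.0]  # linear scores: 2.0, 1.0, 1.5
    print(greedy_action(theta, feats))        # 0
    print(softmax_probs(theta, feats))        # stochastic, favours action 0
    print(softmax_probs(theta, feats, 0.01))  # nearly all mass on action 0
```

As the temperature shrinks, the softmax policy concentrates on the greedy action, which is one informal way to see how the two policy sets relate.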