Policy Invariance under Reward Transformations for General-Sum Stochastic Games

Published 16 Jan 2014 in cs.GT and cs.LG | (1401.3907v1)

Abstract: We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy-invariance property offers a possible way to speed up convergence when learning to play a stochastic game.
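
As a rough illustration of why potential-based shaping can leave optimal behavior untouched, the sketch below assumes the standard shaping form from the MDP literature, F(s, s') = gamma * Phi(s') - Phi(s); the abstract itself does not state the formula. In a stochastic game, each player would presumably have its own potential function, and the code here handles one player's reward stream. All names (`phi`, `shaping_reward`, `shaped_rewards`, `discounted_return`) are hypothetical and chosen for this example only.

```python
from typing import Callable, List, Sequence

State = int  # assumption: states are encoded as integers for this toy example


def shaping_reward(phi: Callable[[State], float], s: State,
                   s_next: State, gamma: float) -> float:
    """Assumed shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_rewards(states: Sequence[State], rewards: Sequence[float],
                   phi: Callable[[State], float], gamma: float) -> List[float]:
    """Add the shaping term to each step of one player's reward stream.

    `states` must contain one more entry than `rewards`, because
    rewards[t] is received on the transition states[t] -> states[t + 1].
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("states must have exactly len(rewards) + 1 entries")
    return [
        r + shaping_reward(phi, states[t], states[t + 1], gamma)
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

Along any finite trajectory the shaping terms telescope, so the shaped return differs from the original return by gamma^T * Phi(s_T) - Phi(s_0), where T is the number of transitions. That offset depends only on the start and end states and not on the actions taken in between, which is the intuition behind the invariance result; the equilibrium proof for general-sum games is in the paper itself.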