Model Selection for Inverse Reinforcement Learning via Structural Risk Minimization
Abstract: Inverse reinforcement learning (IRL) usually assumes the reward function model is pre-specified as a weighted sum of features and estimates only the weighting parameters. However, selecting features and determining a proper reward model is nontrivial and experience-dependent. A simplistic model is less likely to contain the ideal reward function, while a highly complex model incurs substantial computation cost and risks overfitting. This paper addresses this trade-off in model selection for IRL problems by introducing the structural risk minimization (SRM) framework from statistical learning. SRM selects an optimal reward function class from a hypothesis set by minimizing both the estimation error and the model complexity. To formulate an SRM scheme for IRL, we estimate the policy gradient from the given demonstrations as the empirical risk, and establish an upper bound on the Rademacher complexity as the model penalty of the hypothesis function classes. We further present the SRM learning guarantee and, in particular, provide its explicit form for the linear weighted-sum setting. Simulations demonstrate the performance and efficiency of our algorithm.
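The abstract's selection rule, choosing the hypothesis class that minimizes empirical risk plus a complexity penalty, can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it stands in a least-squares residual for the policy-gradient empirical risk, uses a generic sqrt(k/n)-style penalty in place of the paper's Rademacher bound, and sweeps nested linear feature classes of increasing dimension; all variable names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "demonstration" data: feature matrix Phi (n samples x d_max features)
# and a target vector y that the linear reward weights should fit.
# These are illustrative stand-ins, not the paper's actual quantities.
n, d_max = 200, 8
Phi = rng.normal(size=(n, d_max))
true_w = np.array([1.0, -0.5, 0.25] + [0.0] * (d_max - 3))
y = Phi @ true_w + 0.1 * rng.normal(size=n)

def empirical_risk(w, Phi_k, y):
    """Mean squared residual, standing in for the policy-gradient criterion."""
    return float(np.mean((Phi_k @ w - y) ** 2))

def complexity_penalty(k, n, B=1.0):
    """Generic penalty growing with class dimension k (Rademacher-style rate)."""
    return B * np.sqrt(k / n)

# SRM sweep: over nested hypothesis classes (first k features), fit the best
# weights in each class, then pick the class minimizing risk + penalty.
scores = {}
for k in range(1, d_max + 1):
    Phi_k = Phi[:, :k]
    w_hat, *_ = np.linalg.lstsq(Phi_k, y, rcond=None)
    scores[k] = empirical_risk(w_hat, Phi_k, y) + complexity_penalty(k, n)

k_star = min(scores, key=scores.get)
print("selected class dimension:", k_star)
```

Because the empirical risk decreases monotonically with class size while the penalty grows, the minimizer balances the two, which is the trade-off the abstract describes between overly simple and overly complex reward models.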