Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis
Abstract: Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in iteration efficiency and robustness. Risk-sensitive policy gradient methods, which incorporate both the expected return and a risk measure, have been explored for their ability to yield more robust policies, yet their iteration complexity remains largely unexplored. In this work, we conduct a rigorous iteration complexity analysis of the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm with an exponential utility function. We establish an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). Furthermore, we investigate whether risk-sensitive algorithms can achieve better iteration complexity than their risk-neutral counterparts. Our analysis indicates that risk-sensitive REINFORCE can potentially converge faster. To validate our analysis, we empirically evaluate the learning performance and convergence efficiency of the risk-neutral and risk-sensitive REINFORCE algorithms in three environments: CartPole, MiniGrid, and Robot Navigation. Empirical results confirm that the risk-averse variants can converge and stabilize faster than their risk-neutral counterparts. More details can be found on our website https://ruiiu.github.io/riskrl.
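To make the exponential-utility setting concrete, here is a minimal sketch of a risk-sensitive REINFORCE update under stated assumptions. We assume the objective $J_\beta(\theta) = \frac{1}{\beta}\mathbb{E}_{\tau\sim\pi_\theta}[\exp(\beta R(\tau))]$, where $\beta<0$ is risk-averse and $\beta>0$ is risk-seeking; the entropic form $\frac{1}{\beta}\log\mathbb{E}[\exp(\beta R)]$ ranks policies identically, since for a fixed $\beta$ both are monotone in the same direction in $\mathbb{E}[\exp(\beta R)]$. This may differ from the paper's exact formulation. The score-function gradient is $\frac{1}{\beta}\mathbb{E}[\exp(\beta R)\nabla_\theta\log p_\theta(\tau)]$, and subtracting the constant baseline $1/\beta$ keeps it unbiased while letting $\beta\to 0$ recover risk-neutral REINFORCE. The two-armed bandit, its payoffs, and the names `train`, `utility_weight`, and `sample_rewards` are hypothetical, chosen only so that a risk-neutral learner and a risk-averse learner visibly prefer different arms.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sample_rewards(actions, rng):
    """Hypothetical two-armed bandit with one-step episodes.
    Arm 0 (safe): always pays 1.0.
    Arm 1 (risky): pays 0.0 or 3.0 with equal probability (mean 1.5)."""
    risky_low = rng.random(actions.shape) < 0.5
    return np.where(actions == 0, 1.0, np.where(risky_low, 0.0, 3.0))


def utility_weight(returns, beta):
    """Per-episode weight (exp(beta * R) - 1) / beta.
    Assumed objective: J(theta) = (1/beta) * E[exp(beta * R)].
    Subtracting the constant 1/beta acts as a baseline (no bias in the
    score-function estimator), and beta = 0 falls back to the weight R,
    i.e. plain risk-neutral REINFORCE."""
    if beta == 0.0:
        return returns
    return np.expm1(beta * returns) / beta


def train(beta, iters=1000, batch=128, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # logits of a tabular softmax policy
    for _ in range(iters):
        probs = softmax(theta)
        actions = rng.choice(2, size=batch, p=probs)
        returns = sample_rewards(actions, rng)
        w = utility_weight(returns, beta)
        # Score function of a softmax policy: grad log pi(a) = onehot(a) - probs
        score = np.eye(2)[actions] - probs
        grad = (w[:, None] * score).mean(axis=0)  # Monte Carlo gradient estimate
        theta += lr * grad  # stochastic gradient ascent
    return softmax(theta)
```

For example, `train(beta=0.0)` should put most of its probability on the risky arm (mean 1.5), while `train(beta=-1.0)` should favor the safe arm, because $\mathbb{E}[e^{-R}]$ is $e^{-1}\approx 0.37$ for the safe arm and $(1+e^{-3})/2\approx 0.52$ for the risky one, and the risk-averse objective prefers the smaller value.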