
Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens

Published 16 Apr 2024 in eess.SY, cs.LG, cs.SY, and math.OC | arXiv:2404.10851v3

Abstract: We provide the first known algorithm that provably achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations for the discounted discrete-time LQR problem with unknown parameters, without relying on two-point gradient estimates. Such estimates are known to be unrealistic in many settings, as they require evaluating two different policies from the exact same initialization, which is typically drawn at random. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates, which either attains $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ rates or relies heavily on stability assumptions.
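The two-point versus one-point distinction at the heart of the abstract can be sketched with standard zeroth-order gradient estimators. This is a generic illustrative sketch, not code from the paper: the function names and the smoothing radius `r` are assumptions, and `f` stands for a (possibly noisy) scalar cost evaluation such as an LQR rollout from a random initial state.

```python
import numpy as np

def sphere_direction(shape, rng):
    """Draw a uniformly random unit direction (shared by both estimators)."""
    u = rng.standard_normal(shape)
    return u / np.linalg.norm(u)

def two_point_gradient(f, K, r, rng):
    # Two-point (coupled) estimator: the two evaluations must share the
    # SAME random initialization/rollout, which the abstract argues is
    # unrealistic when initial states are drawn at random.
    u = sphere_direction(K.shape, rng)
    return (K.size / (2.0 * r)) * (f(K + r * u) - f(K - r * u)) * u

def one_point_gradient(f, K, r, rng):
    # One-point estimator: a single noisy evaluation per query, so no
    # coupling of initializations across two policies is required.
    u = sphere_direction(K.shape, rng)
    return (K.size / r) * f(K + r * u) * u
```

The trade-off this illustrates: for a fixed cost level, the two-point difference cancels the common term $f(K)$, so its variance stays bounded as $r \to 0$, while the one-point estimate's variance grows with $f(K)^2/r^2$. Avoiding two-point queries is therefore typically paid for in extra samples, which is why a $\widetilde{\mathcal{O}}(1/\varepsilon)$ rate without them is notable.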

