
Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Published 19 Dec 2023 in cs.LG, cs.AI, and stat.ML (arXiv:2312.11863v1)

Abstract: Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing work on offline RL theory primarily emphasizes a few restrictive settings, such as linear MDPs or general function approximation with strong assumptions and independent data, which offer little guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty posed by data dependence. In this paper, we establish a non-asymptotic estimation-error bound for pessimistic offline RL with general neural network approximation and $\mathcal{C}$-mixing data, stated in terms of the network structure, the dimension of the dataset, and the concentrability of the data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desirable rate in the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We achieve this using empirical process tools for $\mathcal{C}$-mixing sequences and neural network approximation theory for the Hölder class. We also develop methods to bound the Bellman estimation error caused by function approximation under empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality by using data with low intrinsic dimensionality and function classes with low complexity. Our analysis provides valuable insights into the development of deep offline RL and guidance for algorithm design.
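The two-part error decomposition described in the abstract can be summarized schematically. The form below is a hedged sketch based on standard nonparametric rates for Hölder-smooth targets, not the paper's exact statement; $\alpha$ denotes an assumed smoothness level, $d$ the (intrinsic) data dimension, $C_{\mathrm{eff}}$ a concentrability coefficient, $n$ the sample size, and $\epsilon_{\mathrm{res}}$ the slack in the empirical Bellman residual constraint:

$$
\mathbb{E}\left[ V^{\pi^\star} - V^{\hat{\pi}} \right] \;\lesssim\; \underbrace{C_{\mathrm{eff}} \, n^{-\frac{\alpha}{2\alpha + d}}}_{\text{statistical and approximation error}} \;+\; \underbrace{\epsilon_{\mathrm{res}}}_{\text{constraint slack}},
$$

where the first term vanishes as $n \to \infty$ and the second becomes negligible when the residual constraint is tight, matching the two-part structure claimed in the abstract.

To make the pessimistic, Bellman-residual-constrained estimation concrete, the following is a minimal toy sketch, not the paper's algorithm (which uses deep neural network function classes and $\mathcal{C}$-mixing data): among candidate Q-functions whose empirical Bellman residual falls below a threshold, it returns the most pessimistic value estimate. All names (`n_states`, `eps`, the random candidate class) are illustrative assumptions.

```python
# Hedged sketch of Bellman-residual-constrained pessimism on a toy tabular MDP.
# Not the paper's method: a finite random candidate class stands in for a
# neural network class, and the data are i.i.d. rather than C-mixing.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

# Ground-truth MDP used only to generate an offline dataset.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards

n = 2000
s = rng.integers(n_states, size=n)                 # behavior-policy states
a = rng.integers(n_actions, size=n)                # behavior-policy actions
r = R[s, a]
s2 = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])

def bellman_residual(Q):
    """Empirical squared Bellman optimality residual on the offline dataset."""
    target = r + gamma * Q[s2].max(axis=1)
    return np.mean((Q[s, a] - target) ** 2)

# Center the candidate class at a crude fitted-Q solution.
Q0 = np.zeros((n_states, n_actions))
for _ in range(200):
    Q0 = R + gamma * P @ Q0.max(axis=1)            # (S, A) Bellman backup

candidates = [Q0] + [Q0 + 0.3 * rng.standard_normal(Q0.shape)
                     for _ in range(500)]

eps = bellman_residual(Q0) + 0.05                  # residual constraint level
feasible = [Q for Q in candidates if bellman_residual(Q) <= eps]

# Pessimism: among approximately Bellman-consistent candidates, report the
# lowest value under a fixed initial-state distribution.
mu0 = np.full(n_states, 1.0 / n_states)
Q_pess = min(feasible, key=lambda Q: mu0 @ Q.max(axis=1))
print("pessimistic value estimate:", mu0 @ Q_pess.max(axis=1))
```

Tightening `eps` shrinks the feasible set toward Bellman-consistent functions (driving the second error term above toward zero), while a larger dataset shrinks the first term; the adversarial frameworks analyzed in the paper implement the analogous constrained minimization over a neural network class via a min-max objective.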
