Learning from Visual Observation via Offline Pretrained State-to-Go Transformer

Published 22 Jun 2023 in cs.LG and cs.CV (arXiv:2306.12860v1)

Abstract: Learning from visual observation (LfVO), which aims to recover policies from visual observation data alone, is a promising yet challenging problem. Existing LfVO approaches either adopt inefficient online learning schemes or require additional task-specific information such as goal states, making them unsuited for open-ended tasks. To address these issues, we propose a two-stage framework for learning from visual observation. In the first stage, we introduce and pretrain a State-to-Go (STG) Transformer offline to predict and discriminate latent transitions of demonstrations. In the second stage, the STG Transformer provides intrinsic rewards for downstream reinforcement learning tasks, where an agent learns from intrinsic rewards alone. Empirical results on Atari and Minecraft show that our proposed method outperforms baselines and in some tasks even achieves performance comparable to a policy learned from environmental rewards. These results highlight the potential of using video-only data to solve difficult visual reinforcement learning tasks, rather than relying on complete offline datasets containing states, actions, and rewards. The project's website and code can be found at https://sites.google.com/view/stgtransformer.
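To make the two-stage idea concrete, here is a minimal schematic sketch. It is not the paper's implementation: the linear encoder, the logistic transition discriminator, and all shapes below are illustrative placeholders standing in for the pretrained STG Transformer. The only point it demonstrates is the interface: a model pretrained offline on expert latent transitions (s_t, s_{t+1}) is later queried online to produce an intrinsic reward for the agent's own transitions.

```python
import numpy as np

# Stage 1 (offline, assumed done): a model is pretrained on video-only expert
# data to score latent transitions (s_t -> s_{t+1}) as expert-like or not.
# Stage 2 (online): that score is used as an intrinsic reward for RL.
# Everything below is a toy stand-in for those pretrained components.

LATENT_DIM = 8
FRAME_DIM = 16

rng = np.random.default_rng(0)


def encode(frame: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy stand-in for a visual encoder: a fixed linear projection."""
    return W @ frame.ravel()


def transition_score(s_t: np.ndarray, s_next: np.ndarray,
                     theta: np.ndarray) -> float:
    """Logistic score in (0, 1) that a latent transition looks expert-like."""
    x = np.concatenate([s_t, s_next])
    return 1.0 / (1.0 + np.exp(-float(theta @ x)))


def intrinsic_reward(frame_t: np.ndarray, frame_next: np.ndarray,
                     W: np.ndarray, theta: np.ndarray) -> float:
    """Stage-2 reward: score the agent's observed latent transition."""
    return transition_score(encode(frame_t, W), encode(frame_next, W), theta)


# Toy usage with random "frames" and placeholder pretrained parameters.
W = rng.normal(size=(LATENT_DIM, FRAME_DIM))   # placeholder encoder weights
theta = rng.normal(size=2 * LATENT_DIM)        # placeholder discriminator
f0, f1 = rng.normal(size=FRAME_DIM), rng.normal(size=FRAME_DIM)
r = intrinsic_reward(f0, f1, W, theta)
```

In the actual method, the discriminator is a Transformer over latent state sequences and is trained adversarially on demonstration transitions; the sketch only fixes the signature that the downstream RL loop consumes: a scalar reward per observed transition, with no environment reward required.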
