Fast exploration and learning of latent graphs with aliased observations

Published 13 Mar 2023 in cs.LG and cs.AI (arXiv:2303.07397v4)

Abstract: We consider the problem of recovering a latent graph where the observations at each node are *aliased* and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know which node it is in. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. We provide an algorithm for efficiently exploring (and ultimately recovering) the latent graph. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations, while remaining competitive with existing baselines in the unaliased regime.
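To make the setup concrete, here is a minimal toy sketch (not the paper's algorithm) of the problem the abstract describes: a small latent graph with stochastic, action-conditioned transitions whose nodes emit aliased observations, so two different starting nodes can produce identical observation sequences. All node labels, actions, and the `transitions`/`emission` tables below are illustrative assumptions.

```python
import random

# Hypothetical latent graph: transitions[node][action] -> list of (next_node, prob).
transitions = {
    0: {"a": [(1, 1.0)], "b": [(2, 1.0)]},
    1: {"a": [(3, 0.5), (0, 0.5)], "b": [(0, 1.0)]},
    2: {"a": [(3, 1.0)], "b": [(0, 1.0)]},
    3: {"a": [(3, 1.0)], "b": [(1, 1.0)]},
}
# Aliasing: nodes 1 and 2 both emit observation "x", so they are
# indistinguishable from a single observation alone.
emission = {0: "s", 1: "x", 2: "x", 3: "g"}

def step(node, action, rng):
    """Sample the next latent node given the current node and an action."""
    nexts, probs = zip(*transitions[node][action])
    return rng.choices(nexts, weights=probs, k=1)[0]

def rollout(start, actions, seed=0):
    """Return the observation sequence; the latent nodes stay hidden."""
    rng = random.Random(seed)
    node, obs = start, [emission[start]]
    for a in actions:
        node = step(node, a, rng)
        obs.append(emission[node])
    return obs

# Starting in node 1 vs. node 2 yields the same observations for this
# action sequence, so the agent must probe further (e.g. with action "a",
# whose outcome differs stochastically from node 1 but not node 2).
print(rollout(1, ["b", "a"]))  # ['x', 's', 'x']
print(rollout(2, ["b", "a"]))  # ['x', 's', 'x']
```

An exploration algorithm in this setting must choose actions that disambiguate aliased nodes while estimating the transition probabilities, which is what the paper's method does efficiently.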
