Fast exploration and learning of latent graphs with aliased observations
Abstract: We consider the problem of recovering a latent graph where the observations at each node are \emph{aliased} and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know which node it occupies. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficiently recovering the transition probabilities of a partially observable Markov decision process (POMDP) whose observation probabilities are known. We provide an algorithm for efficiently exploring (and ultimately recovering) the latent graph. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations, while remaining competitive with existing baselines in the unaliased regime.
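The core difficulty the abstract describes can be illustrated with a toy sketch (not the paper's algorithm; the environment, node labels, and observation map below are hypothetical): on a latent 4-cycle where two non-adjacent nodes emit the same observation, a model estimated over observations alone is not Markov, because the aliased observation conflates two latent states with different successors.

```python
import random

# Hypothetical toy environment: a latent cycle of 4 nodes where two
# non-adjacent nodes emit the same observation ("aliasing"), so an
# agent that only sees observations cannot tell those nodes apart.
LATENT_NODES = [0, 1, 2, 3]
OBS = {0: "A", 1: "B", 2: "A", 3: "C"}   # nodes 0 and 2 are aliased
ACTIONS = {"fwd": +1, "back": -1}

def step(node, action):
    """Deterministic transition on the latent cycle; the agent only
    receives the (possibly aliased) observation, not the node id."""
    nxt = (node + ACTIONS[action]) % len(LATENT_NODES)
    return nxt, OBS[nxt]

# Naive exploration: estimate observation-to-observation transition
# counts from a random walk, as an agent with no latent-state model would.
random.seed(0)
counts = {}
node, obs = 0, OBS[0]
for _ in range(10_000):
    action = random.choice(list(ACTIONS))
    node, nxt_obs = step(node, action)
    counts[(obs, action, nxt_obs)] = counts.get((obs, action, nxt_obs), 0) + 1
    obs = nxt_obs

# From observation "A", action "fwd" is seen to lead to both "B"
# (when the latent node was 0) and "C" (when it was 2): the
# observation-level model looks stochastic even though the latent
# dynamics are deterministic.
successors = {o2 for (o, a, o2) in counts if o == "A" and a == "fwd"}
```

Disambiguating such states requires using history (or a learned latent representation), which is what makes efficient exploration under aliasing harder than in the fully observed case.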