Learning World Models with Identifiable Factorization
Abstract: Extracting a stable and compact representation of the environment is crucial for efficient reinforcement learning in high-dimensional, noisy, and non-stationary environments. Different categories of information coexist in such environments -- how to effectively extract and disentangle this information remains a challenging problem. In this paper, we propose IFactor, a general framework that models four distinct categories of latent state variables, distinguished by their interactions with actions and rewards, to capture different aspects of information within the RL system. Our analysis establishes block-wise identifiability of these latent variables, which not only provides a stable and compact representation but also shows that all reward-relevant factors are significant for policy learning. We further present a practical approach to learning a world model with identifiable blocks, removing redundant information while retaining the minimal sufficient information for policy optimization. Experiments in synthetic worlds demonstrate that our method accurately identifies the ground-truth latent variables, substantiating our theoretical findings. Moreover, experiments on variants of the DeepMind Control Suite and RoboDesk showcase the superior performance of our approach over the baselines.
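As a rough illustration of the four-way factorization described above (a sketch of the taxonomy only, not the paper's implementation; the category names are ours), each latent variable can be bucketed by whether it responds to actions and whether it bears on reward:

```python
from itertools import product

def categorize(action_influenced: bool, reward_relevant: bool) -> str:
    """Map a latent variable's interactions with actions and rewards
    to one of the four categories in the IFactor-style taxonomy.
    (Category names are illustrative, not from the paper.)"""
    if action_influenced and reward_relevant:
        return "controllable_reward_relevant"
    if action_influenced:
        return "controllable_reward_irrelevant"
    if reward_relevant:
        return "uncontrollable_reward_relevant"
    # e.g. background distractors: unaffected by actions, irrelevant to reward
    return "uncontrollable_reward_irrelevant"

# The two binary interaction flags induce exactly four blocks.
blocks = {categorize(a, r) for a, r in product([True, False], repeat=2)}
```

Under this view, a policy needs only the two reward-relevant blocks, which is the intuition behind discarding the redundant factors while keeping a minimal sufficient representation.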