Can a Transformer Represent a Kalman Filter?
Abstract: Transformers are a class of autoregressive deep learning architectures that have recently achieved state-of-the-art performance in a variety of vision, language, and robotics tasks. We revisit the problem of Kalman Filtering in linear dynamical systems and show that Transformers can approximate the Kalman Filter in a strong sense. Specifically, for any observable LTI system, we construct an explicit causally-masked Transformer that implements the Kalman Filter up to a small additive error bounded uniformly in time; we call our construction the Transformer Filter. Our construction rests on a two-step reduction: we first show that a softmax self-attention block can exactly represent a Nadaraya-Watson kernel smoothing estimator with a Gaussian kernel, and we then show that this estimator closely approximates the Kalman Filter. We also investigate how the Transformer Filter can be used for measurement-feedback control, and we prove that the resulting nonlinear controllers closely approximate the performance of standard optimal control policies such as the LQG controller.
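The first step of the reduction, that softmax self-attention can exactly realize a Nadaraya-Watson (NW) kernel smoother with a Gaussian kernel, can be checked numerically. Below is a minimal NumPy sketch of this equivalence; the bandwidth `h`, the per-key bias term, and the random data are illustrative assumptions for this sketch, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n past (key, value) pairs and one query, all illustrative.
n, d = 8, 4
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, 1))
query = rng.normal(size=(d,))
h = 1.0  # hypothetical Gaussian kernel bandwidth

# Nadaraya-Watson estimator with a Gaussian kernel:
#   f(q) = sum_i exp(-||q - k_i||^2 / (2 h^2)) v_i
#        / sum_j exp(-||q - k_j||^2 / (2 h^2))
sq_dists = np.sum((keys - query) ** 2, axis=1)
w_nw = np.exp(-sq_dists / (2 * h**2))
nw_out = (w_nw @ values) / w_nw.sum()

# Softmax attention with logits q.k_i / h^2 plus a per-key bias
# -||k_i||^2 / (2 h^2). Expanding ||q - k_i||^2 = ||q||^2 - 2 q.k_i
# + ||k_i||^2 shows the ||q||^2 factor cancels in the softmax
# normalization, so these weights equal the Gaussian kernel weights.
logits = (keys @ query) / h**2 - np.sum(keys**2, axis=1) / (2 * h**2)
w_attn = np.exp(logits - logits.max())  # max-shift also cancels
attn_out = (w_attn @ values) / w_attn.sum()

print(np.allclose(nw_out, attn_out))  # True
```

Note that the per-key bias is not part of vanilla dot-product attention; it can, however, be absorbed by augmenting keys and queries with extra coordinates. The sketch adds it to the logits directly for readability.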