Papers
Topics
Authors
Recent
Search
2000 character limit reached

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations

Published 21 Aug 2025 in cs.CV | (2508.15404v1)

Abstract: Masked Autoencoders (MAEs) have emerged as a powerful pretraining technique for vision foundation models. Despite their effectiveness, they require extensive hyperparameter tuning (masking ratio, patch size, encoder/decoder layers) when applied to novel datasets. While prior theoretical works have analyzed MAEs in terms of their attention patterns and hierarchical latent variable models, the connection between MAE hyperparameters and performance on downstream tasks is relatively unexplored. This work investigates how MAEs learn spatial correlations in the input image. We analytically derive the features learned by a linear MAE and show that masking ratio and patch size can be used to select for features that capture short- and long-range spatial correlations. We extend this analysis to non-linear MAEs to show that MAE representations adapt to spatial correlations in the dataset, beyond second-order statistics. Finally, we discuss some insights on how to select MAE hyper-parameters in practice.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 75 likes about this paper.