A Theoretical Framework for OOD Robustness in Transformers using Gevrey Classes

Published 17 Apr 2025 in cs.LG | (2504.12991v2)

Abstract: We study the robustness of Transformer LLMs under semantic out-of-distribution (OOD) shifts, where training and test data lie in disjoint latent spaces. Using Wasserstein-1 distance and Gevrey-class smoothness, we derive sub-exponential upper bounds on prediction error. Our theoretical framework explains how smoothness governs generalization under distributional drift. We validate these findings through controlled experiments on arithmetic and Chain-of-Thought tasks with latent permutations and scalings. Results show empirical degradation aligns with our bounds, highlighting the geometric and functional principles underlying OOD generalization in Transformers.
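
As a rough illustration of the Wasserstein-1 distance named in the abstract, the sketch below computes it for two equal-size one-dimensional samples, where it reduces to the mean absolute difference between sorted values. This is not the paper's experimental setup: the function name `wasserstein_1_empirical`, the toy sample `train`, and the shift and scaling transforms are assumptions chosen only to show how the distance grows as test data drifts away from training data.

```python
def wasserstein_1_empirical(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples on the real line, the optimal transport plan
    matches the i-th smallest value of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


# Hypothetical toy data standing in for a latent training distribution.
train = [0.0, 1.0, 2.0, 3.0]

# Two illustrative drifts: a constant shift and a scaling.
shifted = [x + 1.0 for x in train]
scaled = [2.0 * x for x in train]

w1_shift = wasserstein_1_empirical(train, shifted)   # 1.0
w1_scaled = wasserstein_1_empirical(train, scaled)   # 1.5
```

In higher dimensions the sorted-matching shortcut no longer applies, so a general optimal transport solver would be needed.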

Authors (3)
