Establish whether Tabular Language Models learn tabular invariances or rely on non-tabular mechanisms

Determine whether Tabular Language Models (TLMs) that serialize tables into text, including models such as Tabula-8B and related fine-tuned LLMs, actually learn tabular invariances (specifically, row-permutation and column-permutation invariance) and generalize across tabular prediction tasks, or whether their observed performance arises primarily from non-tabular mechanisms such as instruction following and format familiarity.

Background

The paper questions the core premise behind Tabular LLMs (TLMs): that with sufficient scale they would generalize across the structure and invariances of tabular data. Because TLMs serialize tables into sequential text and inherit next-token priors, it is unclear whether they learn table-specific invariances such as row- and column-permutation invariance, or whether their performance stems from other sources.

This uncertainty motivates the study’s central question, posed before the authors’ empirical analysis: whether TLMs’ reported generalization reflects genuine tabular reasoning or instead evaluation artifacts and non-tabular capabilities.
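The permutation invariances at issue can be made concrete. The sketch below (an illustrative toy, not the paper's evaluation code; the serialization format and column names are assumptions) shows why serialization breaks the symmetry: reordering rows or columns leaves the table semantically unchanged but produces a different token sequence, so a next-token model has no built-in guarantee of identical behavior on the permuted input.

```python
# Toy table as a list of row dicts with a fixed column order (hypothetical data).
rows = [
    {"age": 34, "income": 72000, "label": "yes"},
    {"age": 51, "income": 48000, "label": "no"},
    {"age": 29, "income": 61000, "label": "yes"},
]
columns = ["age", "income", "label"]

def serialize(table, cols):
    """Serialize a table into 'col: value' text, one row per line -- one
    common style of TLM serialization (format assumed for illustration)."""
    return "\n".join("; ".join(f"{c}: {r[c]}" for c in cols) for r in table)

original = serialize(rows, columns)
row_permuted = serialize(rows[::-1], columns)   # reverse the row order
col_permuted = serialize(rows, columns[::-1])   # reverse the column order

# Same table, different strings: the sequence model sees three distinct inputs,
# while a permutation-invariant tabular model would treat them identically.
print(original != row_permuted)  # True
print(original != col_permuted)  # True
```

Testing whether a TLM's predictions change under such permutations is one direct way to probe whether it has learned these invariances.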

References

Yet it remains unclear whether TLMs actually learn tabular invariances and generalize over tabular data or succeed through other mechanisms entirely.

The Illusion of Generalization: Re-examining Tabular Language Model Evaluation  (2602.04031 - Gorla et al., 3 Feb 2026) in Section 1 (Introduction)