LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces

Published 7 Apr 2026 in cs.CL and cs.AI | (2604.06086v1)

Abstract: Modern Transformer-based LLMs achieve strong performance in natural language processing tasks, yet their latent semantic spaces remain largely uninterpretable black boxes. This paper introduces LAG-XAI (Lie Affine Geometry for Explainable AI), a novel geometric framework that models paraphrasing not as discrete word substitutions, but as a structured affine transformation within the embedding space. By conceptualizing paraphrasing as a continuous geometric flow on a semantic manifold, we propose a computationally efficient mean-field approximation, inspired by local Lie group actions. This allows us to decompose paraphrase transitions into geometrically interpretable components: rotation, deformation, and translation. Experiments on the noisy PIT-2015 Twitter corpus, encoded with Sentence-BERT, reveal a "linear transparency" phenomenon. The proposed affine operator achieves an AUC of 0.7713. By normalizing against random chance (AUC 0.5), the model captures approximately 80% of the non-linear baseline's effective classification capacity (AUC 0.8405), offering explicit parametric interpretability in exchange for a marginal drop in absolute accuracy. The model identifies fundamental geometric invariants, including a stable matrix reconfiguration angle (~27.84°) and near-zero deformation, indicating local isometry. Cross-domain generalization is confirmed via direct cross-corpus validation on an independent TURL dataset. Furthermore, the practical utility of LAG-XAI is demonstrated in LLM hallucination detection: using a "cheap geometric check," the model automatically detected 95.3% of factual distortions on the HaluEval dataset by registering deviations beyond the permissible semantic corridor. This approach provides a mathematically grounded, resource-efficient path toward the mechanistic interpretability of Transformers.


Authors (4)

Summary

The paper introduces LAG-XAI, an affine-Lie framework that models paraphrasing as global affine transformations in Transformer embedding spaces.
The paper decomposes semantic transitions into rotation, deformation, and translation, revealing invariant descriptors such as rotation angle and deformation index.
The paper demonstrates robust performance in anomaly detection and cross-domain generalization, validated through empirical evaluation on multiple datasets.

LAG-XAI: Affine Lie Group Modeling for Interpretable Paraphrasing in Transformer Embedding Spaces

Introduction

This work introduces LAG-XAI, an affine-Lie geometric framework for interpretable paraphrasing in Transformer latent spaces. By formulating paraphrasing as a global affine transformation in the embedding space, LAG-XAI provides a continuum-theoretic and parametric mechanism for understanding semantic transitions, grounded in Lie group theory. The approach decomposes semantic changes into rotation, deformation, and translation within a high-dimensional manifold, thereby yielding interpretable XAI descriptors. Empirical evaluation is carried out on social media data (PIT-2015, TURL) using SBERT embeddings, with additional validation on hallucination detection (HaluEval).

Geometric Formulation of Paraphrasing

LAG-XAI posits that semantic paraphrasing is instantiated as an affine transformation $T(x) = Ax + t$, approximating the local action of the Lie group $\mathrm{Aff}(n)$ on sentence embedding manifolds. Under the manifold hypothesis, semantic transitions are modeled as integral flows along an invariant vector field, leveraging the spatial homogeneity of transformer-induced semantic subspaces.

Lie group-inspired techniques, including polar decomposition and principal component analysis (PCA) of drift vectors, provide empirical surrogates for infinitesimal generators (Lie algebra elements). The transformation matrix $A$ and translation vector $t$ are estimated through geometrically regularized least-squares, incorporating both isometry-promoting Procrustes alignment and constraint to empirical semantic drift directions.
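As a rough illustration of the estimation step, the sketch below fits $A$ and $t$ by least squares with a ridge penalty that shrinks $A$ toward the identity. This penalty is a simplified stand-in chosen for brevity, not the paper's Procrustes-and-drift regularizer, and the function name `fit_affine` and the parameter `lam` are hypothetical.

```python
import numpy as np

def fit_affine(X, Y, lam=1e-2):
    """Fit Y ~ X @ A.T + t, shrinking A toward the identity.

    X, Y: arrays of shape (m, n) holding source and paraphrase embeddings.
    lam:  ridge strength (assumed hyperparameter, not taken from the paper).
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y          # centering removes t from the problem
    n = X.shape[1]
    eye = np.eye(n)
    # Minimise ||Xc B - Yc||_F^2 + lam * ||B - I||_F^2 over B = A^T.
    # Normal equations: (Xc^T Xc + lam I) B = Xc^T Yc + lam I
    B = np.linalg.solve(Xc.T @ Xc + lam * eye, Xc.T @ Yc + lam * eye)
    A = B.T
    t = mu_y - A @ mu_x                  # optimal translation given A
    return A, t
```

With `lam` close to zero this reduces to ordinary least squares; larger values pull $A$ toward the identity map, which is itself an isometry.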

Mechanistic Interpretability and Invariant Extraction

Unlike static similarity metrics, the affine model supports the decomposition of paraphrase transitions into orthogonal (rotation $R$), symmetric (deformation $S$), and translation ($t$) components, with $R$ and $S$ obtained from the polar decomposition $A = RS$. Four statistical invariants yield a structured XAI profile of each semantic transformation: the rotation angle $\theta$, the deformation index (Def), the translation norm (Shift), and the determinant sign (chirality).
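The decomposition can be sketched with a singular value decomposition, as below. The polar factors themselves are standard, but the scalar summaries are assumptions made for illustration: this summary does not give the paper's exact formulas for $\theta$ or Def, so the sketch uses a trace-based mean angle and a normalized Frobenius distance of $S$ from the identity. The function name `polar_invariants` is hypothetical.

```python
import numpy as np

def polar_invariants(A, t):
    """Split A into R (orthogonal) and S (symmetric PSD) with A = R @ S,
    then report illustrative scalar descriptors."""
    U, sigma, Vt = np.linalg.svd(A)
    R = U @ Vt                        # orthogonal polar factor
    S = Vt.T @ np.diag(sigma) @ Vt    # symmetric positive semidefinite factor
    n = A.shape[0]
    # Assumed definition: mean rotation angle from the normalized trace of R.
    cos_theta = np.clip(np.trace(R) / n, -1.0, 1.0)
    # slogdet avoids overflow/underflow of det in high dimensions.
    sign, _ = np.linalg.slogdet(A)
    return {
        "theta_deg": float(np.degrees(np.arccos(cos_theta))),
        # Assumed definition: how far S is from the identity.
        "def": float(np.linalg.norm(S - np.eye(n)) / np.sqrt(n)),
        "shift": float(np.linalg.norm(t)),
        "chirality": float(sign),     # -1.0 signals an orientation flip
    }
```

For a pure rotation the deformation term is zero up to rounding error, which is the sense in which a near-zero Def indicates a locally isometric map.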

Key empirical phenomena include:

Linear Transparency: Approximately 80% of SBERT’s effective paraphrase discrimination capacity arises from a single affine transformation, indicating a linearizable structure in transformer latent spaces.
Local Isometry: The semantic deformation index approaches zero under optimal regularization, indicating that legitimate paraphrasing corresponds to isometries (distance-preserving transformations) on the semantic manifold.
Geometric Constant: The mean structural reconfiguration angle ( $\theta \approx 27.84^\circ$ ) is nearly invariant across corpora and random seeds.
Semantic Chirality: A negative determinant ($\det(A) < 0$) indicates that logically inverting operations (e.g., active-passive transformations, argument inversion) correspond to mirror reflections in embedding space.

Empirical Results and Strong Numerical Findings

On the PIT-2015 dev set, LAG-XAI's affine operator achieves an AUC of 0.7713, compared with 0.8405 for the SBERT cosine baseline. Measured against random chance (AUC 0.5), this amounts to approximately 80% of the baseline's margin. The deformation index is tightly bounded (Def on the order of 0.00025), supporting the local isometry hypothesis.
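The "approximately 80%" figure follows from subtracting the chance level from both AUC values before taking their ratio. A short check of that arithmetic, using only the numbers quoted above (the function name is hypothetical):

```python
def chance_normalized_ratio(auc_model, auc_baseline, auc_chance=0.5):
    """Share of the baseline's above-chance AUC retained by the model."""
    return (auc_model - auc_chance) / (auc_baseline - auc_chance)

ratio = chance_normalized_ratio(0.7713, 0.8405)
print(f"{ratio:.3f}")  # 0.797
```

The unnormalized ratio 0.7713 / 0.8405 would be about 0.92, so the chance-normalized figure is the more conservative of the two.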

The framework generalizes robustly across domains:

Global consensus operators trained on PIT-2015 generalize to TURL with minimal performance degradation, indicating that identified invariants characterize intrinsic, architecture-dependent properties of the semantic manifold.
Domain-specific (local) models exhibit severe overfitting and poor generalization, emphasizing the necessity of global invariant modeling.

Hallucination detection via geometric anomaly is highly effective: on HaluEval, LAG-XAI identifies 95.3% of hallucinated samples (F1 = 92.8%) in a strict zero-shot regime, using only geometric error thresholds.
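A threshold check of this kind might look like the sketch below. It assumes the geometric error is the Euclidean residual between a candidate embedding and the affine prediction, and that the threshold is calibrated as a quantile of residuals on trusted pairs. Both choices are illustrative assumptions; the paper's actual "semantic corridor" criterion is not specified in this summary, and all function names are hypothetical.

```python
import numpy as np

def affine_residuals(X, Y, A, t):
    """Per-pair Euclidean distance between Y and the affine prediction A x + t."""
    return np.linalg.norm(Y - (X @ A.T + t), axis=1)

def calibrate_threshold(residuals, q=0.95):
    """Assumed calibration rule: a high quantile of residuals on trusted pairs."""
    return float(np.quantile(residuals, q))

def flag_anomalies(X, Y, A, t, tau):
    """True where the pair deviates from the affine prediction by more than tau."""
    return affine_residuals(X, Y, A, t) > tau
```

Each check costs one matrix-vector product and one norm per pair, which is what makes it cheap relative to querying an LLM for verification.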

Theoretical and Practical Implications

LAG-XAI’s geometric perspective recasts semantic similarity as structured, decomposable motion rather than opaque pointwise distances. The explicit identification of invariant directions and geometric boundaries supports:

Geometrically equivariant Transformer architectures: Loss functions can regularize toward learned Lie subspaces, facilitating inherent paraphrase-invariance and robustness.
Automated anomaly and hallucination detection: The "cheap geometric check" is computationally lightweight and effective for online monitoring, obviating expensive autoregressive LLM querying.
Controlled latent space augmentation: Embedding adjustment via $(A, t)$ allows continuous, interpretable generation and stylization without post-hoc token manipulation.

The distinction between rotation (syntactic restructuring) and translation (pragmatic drift) additionally enables fine-grained control and auditing in XAI settings.

Limitations

The affine model is fundamentally a first-order local approximation; long-range or highly non-linear transformations (deep summarization, multi-paragraph restructuring) fall outside its scope. PCA- and Procrustes-based generator estimation is an empirically tractable but mathematically approximate surrogate for a true Lie-algebraic mapping, particularly in high dimensions. Generality claims are currently supported for SBERT-based architectures and English; further work is needed for decoder-only LLMs and morphologically rich languages.

Future Directions

Subsequent research should pursue:

Integration of geometric invariants into LLM training, yielding paraphrase-equivariant or -invariant architectures.
Extension to piecewise-affine or higher-order deformation models for capturing strong non-linearities.
Validation across architectures and languages, potentially extracting universal geometric constants for semantic manifold modeling.
Real-time integration into LLM pipelines for dynamic interruption of hallucinations during generation, enforcing semantic guardrails.

Conclusion

LAG-XAI demonstrates that structured affine Lie group actions can explain a substantial portion of semantic motion in Transformer embedding spaces. This approach yields explicit, interpretable geometric invariants, achieves high accuracy and robustness under resource constraints, and supports both mechanistic interpretability and practical anomaly detection. The framework lays the groundwork for a geometric theory of NLP tasks, facilitating the transition from opaque statistical measures to controllable, physics-informed semantics operating directly in latent space.

Reference:

“LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces” (2604.06086)
