TensorLens: End-to-End Transformer Analysis
- TensorLens is a framework that represents a transformer as an input-dependent linear operator using a high-order attention–interaction tensor to encapsulate all computational components.
- It reformulates multi-head attention, layer normalization, feed-forward networks, and residuals into a unified linear Jacobian through precise vectorization techniques.
- Empirical tests demonstrate that TensorLens outperforms traditional methods in visualization, probing, and manipulation of transformer behaviors, supporting tasks like model distillation and ablation.
TensorLens is a theoretical and practical framework for end-to-end transformer analysis, centered on the construction of a high-order attention–interaction tensor that encodes a full transformer as a single, input-dependent linear operator. This tensorial representation captures all computational components of a transformer block—including multi-head attention, feed-forward networks (FFN), layer normalizations, and residual connections—in a unified, expressive formalism. TensorLens provides both the mathematical apparatus and empirical tools for interpretability, visualization, manipulation, and probing of transformer architectures, overcoming limitations of previous attention-aggregation methodologies (Atad et al., 25 Jan 2026).
1. Mathematical Formulation of the High-Order Attention–Interaction Tensor
TensorLens formulates a vanilla $N$-layer transformer, at fixed input $X \in \mathbb{R}^{L \times D}$, as an input-conditioned linear operator $\mathcal{T}(X)$, where $L$ is the sequence length and $D$ the hidden dimension, so that

$$\mathrm{vec}(Y) = \mathcal{T}(X)\,\mathrm{vec}(X),$$

with $\mathcal{T}(X) \in \mathbb{R}^{LD \times LD}$ a matrix unfolding of the underlying high-order tensor. The model-wide tensor is the ordered product of per-layer block-tensors $\mathcal{T}_\ell(X)$:

$$\mathcal{T}(X) = \mathcal{T}_N(X)\,\mathcal{T}_{N-1}(X)\cdots\mathcal{T}_1(X),$$

and thereby the entire forward pass is expressed as a single matrix-vector product.
Each block-tensor $\mathcal{T}_\ell(X)$ encapsulates attention, both residual and non-residual pathways, layer normalizations, and FFN operations as a single linear transformation:

$$\mathcal{T}_\ell(X) = \big(I + F_\ell\, N^{(2)}_\ell\big)\big(I + A_\ell\, N^{(1)}_\ell\big),$$

where $A_\ell$ represents the multi-head attention tensor, $N^{(1)}_\ell$ and $N^{(2)}_\ell$ are the two layer normalization tensors, $F_\ell$ is the FFN linearization tensor, and $I$ the identity. The Kronecker products and diagonalizations required to form these sub-tensors are derived explicitly for each operation, with the entire construction being local, i.e., functionally dependent on the specific input instance $X$ by using statistics (e.g., layernorm means, variances, activation slopes) observed on $X$.
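As a minimal numerical sketch (illustrative shapes and names, not the paper's implementation), the core claim that a frozen attention head acts linearly on the vectorized input can be checked directly via the Kronecker identity $\mathrm{vec}(AXW) = (W^{\top} \otimes A)\,\mathrm{vec}(X)$ (column-major vectorization):

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 4, 3                      # sequence length, hidden dimension
A = rng.standard_normal((L, L))  # frozen (post-softmax) attention matrix
W = rng.standard_normal((D, D))  # combined value/output projection
X = rng.standard_normal((L, D))  # input token embeddings

Y = A @ X @ W                    # one attention head's linear action on X

# Kronecker form: vec(A X W) = (W^T ⊗ A) vec(X), with column-major vec
T = np.kron(W.T, A)              # (L*D, L*D) unfolded block-tensor
vecY = T @ X.flatten(order="F")

assert np.allclose(vecY, Y.flatten(order="F"))
```

The `order="F"` (column-major) flattening matches the standard vectorization identity; a row-major convention would instead give $A \otimes W^{\top}$.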
2. Stepwise Derivation and Structural Intuition
The derivation exploits standard vectorization identities (e.g., $\mathrm{vec}(AXB) = (B^{\top} \otimes A)\,\mathrm{vec}(X)$) to recast all individual sub-layer computations into the form $\mathrm{vec}(Y_{\text{sub}}) = T_{\text{sub}}\,\mathrm{vec}(X)$, specifically:
- Self-attention: Combines token–token interactions (length–length) and feature–feature correlations (dimension–dimension) as a sum of Kronecker products $\sum_h (W^V_h W^O_h)^{\top} \otimes A_h$, where $A_h$ is the per-head attention matrix and $W^V_h, W^O_h$ are the value/output projections.
- Layer Normalization & FFN: Conditioned on fixed input $X$ (so the normalization statistics and activation slopes are frozen), both LayerNorm and FFN become data-dependent diagonal linear operators.
- Residuals: Incorporated via an additive identity; a sub-layer tensor $T$ is vectorized as $I + T$.
- Compositionality: Stacking all blocks yields a nested or concatenated product of the blockwise tensors $\mathcal{T}_\ell(X)$.
This linearization is locally faithful: by definition, $\mathcal{T}(X)$ is the exact Jacobian of the transformer's forward function, patched such that all nonlinearities (softmax, LayerNorm, activation slopes) are “frozen” at values computed on $X$.
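The "frozen nonlinearity" idea can be illustrated on LayerNorm alone (a sketch assuming per-token LayerNorm with cached scalar mean and variance; not the library's code): once the statistics are fixed at the reference input, the map is affine and its Jacobian is exactly diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 5
x = rng.standard_normal(D)        # one token's activations
gamma = rng.standard_normal(D)    # LayerNorm scale parameter
eps = 1e-5

# statistics observed on the fixed input x, then frozen
mu, var = x.mean(), x.var()
sigma = np.sqrt(var + eps)

def ln_frozen(z):
    # LayerNorm with mean/variance frozen at the reference input x
    return gamma * (z - mu) / sigma

# with frozen statistics the map is affine; its Jacobian is diagonal
J = np.diag(gamma / sigma)

# finite-difference check of the diagonal Jacobian
h = 1e-6
J_fd = np.stack([(ln_frozen(x + h * e) - ln_frozen(x)) / h
                 for e in np.eye(D)], axis=1)
assert np.allclose(J, J_fd, atol=1e-4)
```

Without freezing, the mean and variance would themselves depend on the perturbation, producing off-diagonal Jacobian terms; freezing is what makes the per-token operator diagonal.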
3. Computational Construction and Example Pseudocode
To compute $\mathcal{T}(X)$ for a given input $X$, TensorLens uses automatic differentiation. The approach fixes nonlinearities (i.e., computes and freezes softmax weights, norms, and activation slopes at $X$) and computes the output’s total Jacobian with respect to the input:
```python
import torch

def compute_tensor(model, X):
    # 1) run a forward pass to cache all A_h, σ, φ′, etc.
    Y = model.forward_with_cache(X)
    # 2) define a linearized version that uses the cached A_h, φ′, σ
    def lin_model(X_):
        return model.linearized_forward(X_)  # returns (L, D)
    # 3) compute the Jacobian dY/dX: shape (L, D, L, D)
    T = torch.autograd.functional.jacobian(lin_model, X, create_graph=False)
    return T
```
4. Comparison to Previous Attention Aggregation Methodologies
TensorLens differs fundamentally from earlier aggregation schemes. The following table summarizes its relation to major prior approaches:
| Method | Included Components | Notable Omissions |
|---|---|---|
| Attn (head-averaging) | Head-averaged attention matrices per layer | Projections, residuals, FFN, LN |
| Rollout [Abnar & Zuidema] | Chained head-averaged attention across layers | As above |
| Value-weighted [Kobayashi] | Incorporates value/output projections | LayerNorm, residuals, FFN |
| W.AttnResLN [Kobayashi '21] | Residuals, first LN | FFN, second LN |
| GlobEnc [Modarressi '22] | Two LNs added | FFN |
| TensorLens | All linear ops, input/output embeddings, activations | — |
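For contrast with the table, the rollout row can be sketched in a few lines (an illustrative implementation under the commonly used conventions, not the original authors' code); note that it uses only head-averaged attention plus a crude identity term for residuals, omitting projections, FFN, and LayerNorm entirely:

```python
import numpy as np

rng = np.random.default_rng(2)
L, H, layers = 4, 2, 3

def rollout(attn_per_layer):
    """Attention rollout: average heads, add an identity term for the
    residual, renormalize rows, and chain the result across layers."""
    R = np.eye(L)
    for A_heads in attn_per_layer:        # A_heads: (H, L, L), rows sum to 1
        A = A_heads.mean(axis=0)          # head averaging
        A = 0.5 * A + 0.5 * np.eye(L)     # crude residual modelling
        A = A / A.sum(axis=-1, keepdims=True)
        R = A @ R                         # chain across layers
    return R

# random row-stochastic per-head attention matrices for each layer
attn = [rng.dirichlet(np.ones(L), size=(H, L)) for _ in range(layers)]
R = rollout(attn)
assert np.allclose(R.sum(axis=-1), 1.0)   # rows remain a distribution
```

Everything rollout discards (value/output projections, FFN, LayerNorm scaling) is exactly what the TensorLens block-tensors retain.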
Only TensorLens:
- Is fully principled, incorporating all linear operations, both LayerNorms, both FFN projections, activation slopes, residual adds, and embeddings.
- Is exact (first-order) at a given input $X$, as it is the literal Jacobian of the model’s patched forward function (local error bounded by Proposition 1).
- Offers flexible axis collapses to derive generalized or specialized attention maps that subsume previous variants.
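The "axis collapse" idea can be sketched on a random stand-in tensor (shapes only; the actual collapses in the paper contract against model embeddings and norms, and the names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
L, D = 4, 3
T = rng.standard_normal((L, D, L, D))  # stand-in for the unfolded tensor T(X)

# "norm collapse": Frobenius norm over the two feature axes yields an
# L×L token-to-token influence map (one of many possible axis reductions)
M_norm = np.linalg.norm(T, axis=(1, 3))

# "per-class collapse": contract the output feature axis with a class
# direction w (hypothetical), then reduce the remaining input feature axis
w = rng.standard_normal(D)
M_class = np.linalg.norm(np.einsum("idkl,d->ikl", T, w), axis=-1)

assert M_norm.shape == (L, L) and M_class.shape == (L, L)
```

Different contractions of the same tensor reproduce different prior "extended attention" variants, which is the sense in which they are special cases.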
5. Empirical Evaluation and Applications
Extensive empirical tests demonstrate that TensorLens provides superior fidelity and interpretability compared to previous aggregation schemes.
- Perturbation Tests: On DeiT-Base/Small (ImageNet), TensorLens-based maps (“Tensor,Norm” and “Tensor,In+Out”) achieve AUC 0.66/0.82 (versus at most 0.60 for any non-tensor baseline). For BERT-family and Gemma3 models on IMDB, TensorLens achieves AUC 0.10/0.16 (versus 0.09 for non-tensor baselines). On decoder-only LLMs (Pythia-1B, Pico-570M, Phi-1.5 on WikiText-103), TensorLens ranks top-1 or top-2 on HS-MSE and AOPC metrics.
- Relation Decoding: Averaging per-example tensors yields a relation-specific linear map that matches or exceeds the Linear Relation Extraction (LRE) baseline on Pythia-1B, which considers only the embeddings.
- Interpretability and Visualization: By collapsing $\mathcal{T}(X)$ to attention maps (via norms, in+out embeddings, or per-class output projections), TensorLens recovers or extends attribution maps for input token importance.
- Manipulation and Distillation: $\mathcal{T}(X)$ (as the local linearization of the model at $X$) is directly usable for linear distillation (cf. “LoLCats” by Zhang et al. ’24). Model interventions can be effected by masking subtensors within $\mathcal{T}(X)$, with immediate re-evaluation of collapsed attention maps.
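Masking a subtensor and re-evaluating a collapsed map can be sketched as follows (a toy illustration on a random tensor, not the repository's memory-efficient implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
L, D = 4, 3
T = rng.standard_normal((L, D, L, D))  # stand-in for T(X)

def collapse(T):
    # L×L norm-collapsed token-to-token map
    return np.linalg.norm(T, axis=(1, 3))

# ablate all influence flowing out of source token k = 2 by zeroing the
# corresponding subtensor, then re-evaluate the collapsed attention map
T_ablated = T.copy()
T_ablated[:, :, 2, :] = 0.0

M = collapse(T_ablated)
assert np.allclose(M[:, 2], 0.0)       # token 2 no longer contributes
```

The same slicing pattern extends to ablating individual neurons (feature indices) or projection subspaces rather than whole tokens.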
A memory-efficient Jacobian-slice implementation, as well as full code and worked examples, are available at https://github.com/idoatad/TensorLens.
6. Theoretical Guarantees and Scope
TensorLens is theoretically grounded, representing the first complete, input-dependent, high-order tensor formalization of a transformer’s global linear behavior. It encapsulates all prior “extended attention” proposals as strict special cases, achievable via particular axis reductions or omission of components. Proposition 1 in the source material provides local error bounds for the Jacobian approximation. The framework operates directly with input and output embeddings, and includes the capacity to trace, ablate, or visualize the influence of any model subcomponent within $\mathcal{T}(X)$ at the granularity of tokens, neurons, or projection subspaces.
A plausible implication is that TensorLens may serve as a foundational analytic tool for the next generation of mechanistic interpretability and model-editing methodologies, providing fine-grained, exact, and extensible representations of transformer computations.
Reference: [TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors, (Atad et al., 25 Jan 2026)]