Jacobian Scopes: Token-Level Causal Attribution
- Jacobian Scopes are a framework for token-level causal attribution, quantifying input influence using first-order gradients.
- The method uses Semantic, Fisher, and Temperature variants to measure contributions to different output quantities in neural models.
- It leverages automatic differentiation for efficient computation, balancing precision and cost in large-scale interpretability tasks.
Jacobian Scopes constitute a rigorous framework for token-level causal attribution in neural models, particularly LLMs. The concept originates in modern interpretability research, where the objective is to quantify the degree to which each input token influences specific model outputs. Jacobian Scopes apply first-order differentiation of hidden states with respect to inputs to produce local, mechanism-agnostic influence assignments. The methodology generalizes to a variety of output quantities—including specific logits, the probability distribution, and model confidence—through gradient projection. This article provides an authoritative, comprehensive synthesis of the state-of-the-art in Jacobian Scopes, spanning formal definitions, algorithmic methodology, empirical applications, and computational trade-offs (Liu et al., 23 Jan 2026).
1. Formal Definition and Preliminaries
Jacobian Scopes measure the sensitivity of a model's output, via its post-norm final hidden state $h \in \mathbb{R}^d$, to infinitesimal perturbations in each input token embedding $x_i \in \mathbb{R}^d$. For an autoregressive LLM with output logits $z = W_U h$, vocabulary $\mathcal{V}$ (so $W_U \in \mathbb{R}^{|\mathcal{V}| \times d}$), and final-layer hidden representation $h$, the token-wise Jacobian is

$$J_i = \frac{\partial h}{\partial x_i} \in \mathbb{R}^{d \times d},$$

where $i$ indexes input tokens.
A projection vector $v \in \mathbb{R}^d$ encodes the output quantity of interest (e.g., a specific logit, model confidence, or full distribution geometry). The influence score for token $i$ is then

$$s_i = \big\| v^{\top} J_i \big\|_2,$$

interpreted as the maximal first-order change in $v^{\top} h$ induced by a unit-norm perturbation to $x_i$.
This formulation is compelling because it provides exact, local attributions independently of internal model decompositions such as attention heads or circuit pathways.
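The definition above depends only on differentiability of the map from input embeddings to the final hidden state. A minimal numpy sketch, using central finite differences as a stand-in for autodiff and a toy forward map `f` (both illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def influence_scores(f, X, v, eps=1e-5):
    """Jacobian-Scope influence scores s[i] = || v^T J_i ||_2.

    f : differentiable map from stacked input embeddings X (n_tokens, d)
        to the final hidden state h; stands in for a frozen LLM forward pass.
    v : projection vector in hidden-state space.
    """
    n, d = X.shape
    scores = np.zeros(n)
    for i in range(n):
        g = np.zeros(d)  # g = v^T J_i, assembled one input coordinate at a time
        for k in range(d):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, k] += eps
            Xm[i, k] -= eps
            g[k] = v @ (f(Xp) - f(Xm)) / (2 * eps)  # central difference
        scores[i] = np.linalg.norm(g)
    return scores

# Toy stand-in for a transformer: h = tanh(A @ mean of token embeddings).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
f = lambda X: np.tanh(A @ X.mean(axis=0))
X = rng.normal(size=(3, 4))  # 3 "tokens", embedding dimension 4
v = rng.normal(size=4)
scores = influence_scores(f, X, v)
```

In practice a single reverse-mode vector-Jacobian product replaces the inner finite-difference loop; the toy `f` here weights all tokens identically, so all three scores coincide.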
2. Variants of Jacobian Scopes: Semantic, Fisher, and Temperature
Jacobian Scopes are instantiated via three principal variants, each differing in the choice of projection direction $v$ and in the output functional being attributed.
Semantic Scope: Attributes influence to a specific target token $t$ by selecting $v = w_t$, the corresponding row of $W_U$. Thus $v^{\top} h = z_t$, and $s_i = \|\nabla_{x_i} z_t\|_2$. This variant isolates causal contributions to a single logit, requiring only one backward pass.
Fisher Scope: Attributes influence to the entire predictive distribution $p = \mathrm{softmax}(z)$. The Fisher information geometry is pulled back through $W_U$ as $G = W_U^{\top}\big(\operatorname{diag}(p) - p\,p^{\top}\big) W_U$, and influence is measured by $s_i = \sqrt{\operatorname{tr}\big(J_i^{\top} G\, J_i\big)}$. Accurately computing $s_i$ requires $d$ backward passes; efficient approximations are possible for scalability.
Temperature Scope: Attributes influence to model confidence, defined via the inverse effective temperature $\beta = \|h\|_2$ and the normalized direction $\hat{h} = h / \|h\|_2$. Then $v = \hat{h}$ and $s_i = \|\hat{h}^{\top} J_i\|_2$. This variant approximates distribution-wide attribution at the computational cost of a single backward pass.
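The Semantic and Temperature projection choices can be illustrated with random stand-ins for the unembedding, hidden state, and token Jacobian (a hedged numpy sketch; defining the inverse effective temperature as the hidden-state norm is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(1)
d, V = 8, 20
W_U = rng.normal(size=(V, d))   # toy unembedding (stand-in for the LLM's)
h = rng.normal(size=d)          # toy post-norm final hidden state
J_i = rng.normal(size=(d, d))   # toy token Jacobian dh/dx_i

# Semantic Scope: v is the unembedding row of target token t, so v.h = z_t.
t = 3
v_sem = W_U[t]
s_sem = np.linalg.norm(v_sem @ J_i)

# Temperature Scope: beta = ||h|| as inverse effective temperature (assumed
# definition), projecting along the normalized hidden direction h / ||h||.
beta = np.linalg.norm(h)
v_tmp = h / beta
s_tmp = np.linalg.norm(v_tmp @ J_i)
```

Note that $v^{\top} h$ recovers exactly the attributed functional in each case: the target logit for the Semantic Scope and the confidence scalar for the Temperature Scope.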
All variants are agnostic to architectural details, relying on global differentiability from outputs to inputs.
3. Algorithmic Procedures and Computational Complexity
Jacobian Scopes are implemented via automatic differentiation, typically in frameworks supporting full gradient backpropagation. The procedural template is as follows:
- For Semantic and Temperature Scopes, compute a scalar loss (e.g., $z_t$ for Semantic, $\beta = \|h\|_2$ for Temperature), then backpropagate to $x_i$ for all $i$ in a single pass.
- For Fisher Scope, backpropagate each of the $d$ orthonormal output directions to assemble the full Jacobian tensor $\{J_i\}$, then contract with the pulled-back metric $G$ to obtain $s_i$ per token. This scales linearly with the hidden dimension $d$.
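The Fisher-Scope contraction in the second step can be sketched in numpy, assuming the pulled-back metric takes the standard softmax-Fisher form; random Jacobians stand in for the $d$ backward passes:

```python
import numpy as np

def fisher_scope_scores(W_U, h, Js):
    """Fisher Scope contraction (sketch): pull the softmax Fisher metric
    back through the unembedding, then contract with each token Jacobian,
    giving s_i = sqrt(tr(J_i^T G J_i))."""
    z = W_U @ h
    p = np.exp(z - z.max()); p /= p.sum()   # predictive distribution
    F = np.diag(p) - np.outer(p, p)         # Fisher metric in logit space
    G = W_U.T @ F @ W_U                     # pulled back to hidden space
    return np.array([np.sqrt(max(np.trace(J.T @ G @ J), 0.0)) for J in Js])

rng = np.random.default_rng(2)
d, V, n = 6, 12, 4
W_U = rng.normal(size=(V, d))
h = rng.normal(size=d)
Js = [rng.normal(size=(d, d)) for _ in range(n)]  # stand-ins for dh/dx_i
scores = fisher_scope_scores(W_U, h, Js)
```

Because the score is a norm in disguise, it is positively homogeneous in the Jacobian: doubling $J_i$ doubles $s_i$.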
The optimization of this matrix chain product computation, particularly in programs consisting of many sequential differentiable subprograms, draws on the literature on scheduled Jacobian chaining. A dynamic programming heuristic balances serial and parallel computation and nearly matches optimal schedules across realistic chain lengths and machine counts (Märtens et al., 9 May 2025).
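The ordering problem underlying Jacobian chaining can be illustrated with the classic serial matrix-chain dynamic program; the scheduled variant of Märtens et al. additionally distributes work across machines, which this sketch omits:

```python
def chain_order_cost(dims):
    """Minimal serial Jacobian-chain DP: factor i has shape
    dims[i] x dims[i+1]; returns the minimal number of scalar
    multiplications over all bracketings of the chain product."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# A chain (10x100)(100x5)(5x50): bracketing changes cost by an order of
# magnitude (7,500 vs 75,000 scalar multiplications).
best = chain_order_cost([10, 100, 5, 50])  # -> 7500
```

The same asymmetry is why accumulating a deep model's Jacobian in a good order matters for the Fisher Scope's $d$-pass regime.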
4. Applications in LLMs: Attribution, Bias Detection, and Translation
Jacobian Scopes have been empirically validated in multiple contexts:
- Instruction Understanding: Attribution scores for model predictions such as "truthful" or "deceitful" in prompts sharply peak on pivotal tokens ("deceive," "argue").
- Political Bias: Scopes identify context tokens ("Columbia," "the South") as dominant drivers of logit predictions for "liberal" and "conservative," exposing training-set induced model biases.
- Machine Translation: Fisher Scopes yield token-to-token alignments, while Temperature Scopes reveal phrase-level regions of influence that remain consistent even when source and target diverge semantically.
- In-Context Learning (ICL): Applied to time-series forecasting, Scopes reveal the tendency of LLMs to attend to near-cutoff motifs, or to shift focus downstream for stochastic processes such as Brownian motion.
These findings underscore the analytic power of token-level gradient-based attribution and its role in mechanistic interpretability.
5. Methodological Comparison: Advantages and Limitations
Jacobian Scopes contrast with prior interpretability techniques as follows:
- Attention weights: Not reliably causal; reflect distribution of computation, not influence.
- Activation patching: Mechanistic but interventional and computationally expensive.
- Integrated Gradients: Sensitive to out-of-distribution effects; may suffer "attention sink" artifacts.
Jacobian Scopes operate efficiently ($O(1)$ backward passes for Semantic/Temperature, $O(d)$ for Fisher), admit arbitrary projection directions, and are fully model-agnostic. However, they represent only local, first-order linearizations; nonlinear or nonlocal dependencies are not captured. They do not resolve layerwise/headwise mechanisms and are more computationally intensive than forward-only methods for large-scale analyses.
6. Extensions, Open Directions, and Impact
Potential extensions include:
- Spectral projection directions (e.g., top-k logit explanations).
- Scalable low-rank Fisher Scope approximations.
- Compositional attribution by integrating Jacobian Scopes with circuit-tracing or symbolic causal analysis.
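As one illustration of the low-rank direction, a hypothetical truncated-eigendecomposition Fisher Scope: keeping only the top-$k$ eigendirections of the pulled-back metric would cut the required backward passes from $d$ to $k$. The scheme and all names below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def lowrank_fisher_scores(W_U, h, Js, k):
    """Hypothetical low-rank Fisher Scope: factor the pulled-back metric
    as G ~= B B^T using its top-k eigendirections, so that each score
    reduces to a k-row Frobenius norm ||B^T J_i||_F."""
    z = W_U @ h
    p = np.exp(z - z.max()); p /= p.sum()
    G = W_U.T @ (np.diag(p) - np.outer(p, p)) @ W_U
    w, Q = np.linalg.eigh(G)                              # ascending eigenvalues
    B = Q[:, -k:] * np.sqrt(np.clip(w[-k:], 0.0, None))   # G ~= B @ B.T
    return np.array([np.linalg.norm(B.T @ J) for J in Js])

rng = np.random.default_rng(3)
d, V = 5, 9
W_U = rng.normal(size=(V, d))
h = rng.normal(size=d)
Js = [rng.normal(size=(d, d)) for _ in range(3)]
exact = lowrank_fisher_scores(W_U, h, Js, k=d)   # k = d recovers the full scope
approx = lowrank_fisher_scores(W_U, h, Js, k=2)  # 2 passes instead of d
```

Since the discarded eigendirections contribute non-negative terms, the truncated score lower-bounds the exact one.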
These tools have substantial impact, enabling principled bias auditing, debugging of prompt influence, and detailed mechanism analysis in LLMs. A plausible implication is increased adoption for safety-critical model evaluation, interpretability benchmarking, and the development of robust user-facing systems.
The reference implementation and empirical benchmarks are available (Märtens et al., 9 May 2025, Liu et al., 23 Jan 2026), supporting reproducibility and further exploration.