
Jacobian Scopes: Token-Level Causal Attribution

Updated 30 January 2026
  • Jacobian Scopes are a framework for token-level causal attribution, quantifying input influence using first-order gradients.
  • The method comes in Semantic, Fisher, and Temperature variants, each measuring contributions to a different output quantity in neural models.
  • It leverages automatic differentiation for efficient computation, balancing precision and cost in large-scale interpretability tasks.

Jacobian Scopes constitute a rigorous framework for token-level causal attribution in neural models, particularly LLMs. The concept originates in modern interpretability research, where the objective is to quantify the degree to which each input token influences specific model outputs. Jacobian Scopes apply first-order differentiation of hidden states with respect to inputs to produce local, model-agnostic influence assignments. The methodology generalizes to a variety of output quantities—including specific logits, the probability distribution, and model confidence—through gradient projection. This article provides an authoritative, comprehensive synthesis of the state-of-the-art in Jacobian Scopes, spanning formal definitions, algorithmic methodology, empirical applications, and computational trade-offs (Liu et al., 23 Jan 2026).

1. Formal Definition and Preliminaries

Jacobian Scopes measure the sensitivity of a model's output, via its post-norm final hidden state $y$, to infinitesimal perturbations in each input token embedding $x_t \in \mathbb{R}^{d_\mathrm{model}}$. For an autoregressive LLM with output logits $z = Wy$, vocabulary $V$, and final-layer hidden representation $y = h_L(x_{1:T})$, the token-wise Jacobian is

$$J_t := \frac{\partial y}{\partial x_t} \in \mathbb{R}^{d_\mathrm{model} \times d_\mathrm{model}},$$

where $t$ indexes input tokens.

A projection vector $v \in \mathbb{R}^{d_\mathrm{model}}$ encodes the output quantity of interest (e.g., a specific logit, model confidence, or the full distribution geometry). The influence score for token $t$ is then

$$\mathrm{Influence}_t := \| v^\top J_t \|_2,$$

interpreted as the maximal first-order change in $v^\top y$ induced by a unit-norm perturbation to $x_t$.

This formulation is compelling because it provides exact, local attributions independently of internal model decompositions such as attention heads or circuit pathways.
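As a concrete illustration, the influence score can be computed numerically for a small differentiable toy model. Everything below—the model, its dimensions, and the per-token mixing weights—is an illustrative assumption, not from the source; a finite-difference Jacobian stands in for automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 3                            # toy d_model and sequence length (assumed)
A = rng.normal(size=(d, d))
c = np.array([0.5, 1.0, 2.0])          # per-token mixing weights (assumed)

def hidden(x):
    # toy "final hidden state": y = tanh(A @ sum_t c_t x_t)
    return np.tanh(A @ (c[:, None] * x).sum(axis=0))

def influence(x, v, t, eps=1e-6):
    # finite-difference Jacobian J_t = dy/dx_t, then ||v^T J_t||_2
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros_like(x)
        e[t, j] = eps
        J[:, j] = (hidden(x + e) - hidden(x - e)) / (2 * eps)
    return np.linalg.norm(v @ J)

x = rng.normal(size=(T, d))            # token embeddings
v = rng.normal(size=d)                 # projection direction
scores = np.array([influence(x, v, t) for t in range(T)])
```

For this toy model $J_t = c_t \operatorname{diag}(1 - y^2)\,A$, so the scores come out exactly proportional to the mixing weights $c_t$, which makes the computation easy to sanity-check.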

2. Variants of Jacobian Scopes: Semantic, Fisher, and Temperature

Jacobian Scopes are instantiated via three principal variants, each differing in the choice of projection direction $v$ and the output functional being attributed.

Semantic Scope: Attributes influence to a specific target token $\tau$ by selecting $v = w_\tau$, the corresponding row of $W$. Thus $v^\top y = z_\tau$, and $\mathrm{Influence}_t^{\mathrm{sem}} = \| w_\tau^\top J_t \|_2$. This variant isolates causal contributions to a single logit and requires only one backward pass.

Fisher Scope: Attributes influence to the entire predictive distribution. The Fisher information metric $F_u = W^\top [\operatorname{diag}(p) - p p^\top] W$ is pulled back through $J_t$ as $F_t = J_t^\top F_u J_t$, and influence is measured by $\mathrm{Influence}_t^{\mathrm{fish}} = \operatorname{tr}(F_t)$. Computing $F_t$ exactly requires $O(d_\mathrm{model})$ backward passes; efficient approximations are possible for scalability.

Temperature Scope: Attributes influence to model confidence, defined via the inverse effective temperature $\beta_\mathrm{eff} = \|y\|_2$ and the normalized direction $\hat{h} = y/\|y\|_2$. Then $v = \hat{h}$ and $\mathrm{Influence}_t^{\mathrm{temp}} = \| \hat{h}^\top J_t \|_2$. This variant approximates distribution-wide attribution at the computational cost of a single backward pass.
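The three variants differ only in which direction (or metric) is applied to the same Jacobian. Continuing the toy setup from above (the model, vocabulary size, and target token are illustrative assumptions, not from the source), all three scores can be read off one $J_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, V = 4, 3, 6                      # toy d_model, sequence length, vocab (assumed)
A = rng.normal(size=(d, d))
W = rng.normal(size=(V, d))            # toy unembedding matrix
c = np.array([0.5, 1.0, 2.0])          # per-token mixing weights (assumed)

def hidden(x):
    # toy "final hidden state": y = tanh(A @ sum_t c_t x_t)
    return np.tanh(A @ (c[:, None] * x).sum(axis=0))

def jacobian(x, t, eps=1e-6):
    # finite-difference J_t = dy/dx_t
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros_like(x)
        e[t, j] = eps
        J[:, j] = (hidden(x + e) - hidden(x - e)) / (2 * eps)
    return J

x = rng.normal(size=(T, d))
y = hidden(x)
z = W @ y
p = np.exp(z - z.max()); p /= p.sum()  # softmax over the toy vocabulary

tau = 2                                # illustrative target token index
Fu = W.T @ (np.diag(p) - np.outer(p, p)) @ W   # Fisher metric at the output

J = jacobian(x, t=1)
sem  = np.linalg.norm(W[tau] @ J)                   # Semantic: v = w_tau
fish = np.trace(J.T @ Fu @ J)                       # Fisher: tr(F_t)
temp = np.linalg.norm((y / np.linalg.norm(y)) @ J)  # Temperature: v = y / ||y||
```

Because $\operatorname{diag}(p) - p p^\top$ is positive semidefinite, the Fisher score is guaranteed nonnegative, matching its interpretation as a pulled-back metric.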

All variants are agnostic to architectural details, relying on global differentiability from outputs to inputs.

3. Algorithmic Procedures and Computational Complexity

Jacobian Scopes are implemented via automatic differentiation, typically in frameworks supporting full gradient backpropagation. The procedural template is as follows:

  • For Semantic and Temperature Scopes, compute a scalar loss (e.g., $L_\mathrm{sem} = z_\tau$ or $L_\mathrm{temp} = \|y\|_2$), then backpropagate to obtain $\partial L / \partial x_t$ for all $t$ in a single pass.
  • For the Fisher Scope, backpropagate once per orthogonal output direction to assemble the full $J_t$ tensor, then contract with $F_u$ to obtain $\operatorname{tr}(F_t)$ per token. This scales linearly with $d_\mathrm{model}$.
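The efficiency of the single-pass template rests on a standard reverse-mode identity: the gradient of the scalar $L = v^\top y$ with respect to the inputs equals $v^\top J_t$ for every token simultaneously. A sketch for the toy model used above, with the vector-Jacobian product written by hand since the model is simple enough to differentiate manually (all names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 3
A = rng.normal(size=(d, d))
c = np.array([0.5, 1.0, 2.0])          # per-token mixing weights (assumed)

def hidden(x):
    # toy "final hidden state": y = tanh(A @ sum_t c_t x_t)
    return np.tanh(A @ (c[:, None] * x).sum(axis=0))

x = rng.normal(size=(T, d))
v = rng.normal(size=d)

# One "backward pass": v^T J_t for all t at once via the chain rule.
# y = tanh(u), u = A @ sum_t c_t x_t  =>  v^T dy/dx_t = c_t * ((v * (1 - y^2)) @ A)
y = hidden(x)
vbar = (v * (1 - y**2)) @ A            # pull v back through tanh and A
vjp = c[:, None] * vbar[None, :]       # row t equals v^T J_t

# Cross-check against column-by-column finite differences (d evaluations per token).
def jacobian(x, t, eps=1e-6):
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros_like(x)
        e[t, j] = eps
        J[:, j] = (hidden(x + e) - hidden(x - e)) / (2 * eps)
    return J

explicit = np.stack([v @ jacobian(x, t) for t in range(T)])
```

One backward sweep thus replaces $d_\mathrm{model}$ per-direction passes whenever the attributed quantity is a scalar, which is exactly why the Semantic and Temperature Scopes are cheap.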

Optimizing this chain of matrix products, particularly in programs composed of many sequential differentiable subprograms, draws on the literature on scheduled Jacobian chaining. Dynamic-programming heuristics balance serial and parallel computation and come close to optimal schedules, with median cost ratios $\gtrsim 0.94$ for realistic chain lengths and machine counts (Märtens et al., 9 May 2025).

4. Applications in LLMs: Attribution, Bias Detection, and Translation

Jacobian Scopes have been empirically validated in multiple contexts:

  • Instruction Understanding: Attribution scores for model predictions such as "truthful" or "deceitful" in prompts sharply peak on pivotal tokens ("deceive," "argue").
  • Political Bias: Scopes identify context tokens ("Columbia," "the South") as dominant drivers of logit predictions for "liberal" and "conservative," exposing training-set induced model biases.
  • Machine Translation: Fisher Scopes yield token-to-token alignments, while Temperature Scopes reveal phrase-level regions of influence, consistent across semantic divergences.
  • In-Context Learning (ICL): Applied to time-series forecasting, Scopes reveal the tendency of LLMs to attend to near-cutoff motifs or shift focus downstream for stochastic processes (Brownian motion).

These findings underscore the analytic power of token-level gradient-based attribution and its role in mechanistic interpretability.

5. Methodological Comparison: Advantages and Limitations

Jacobian Scopes contrast with prior interpretability techniques as follows:

  • Attention weights: Not reliably causal; reflect distribution of computation, not influence.
  • Activation patching: Mechanistic but interventional and computationally expensive.
  • Integrated Gradients: Sensitive to out-of-distribution effects; may suffer "attention sink" artifacts.

Jacobian Scopes operate efficiently ($O(1)$ backward passes for Semantic/Temperature, $O(d_\mathrm{model})$ for Fisher), admit arbitrary projection directions, and are fully model-agnostic. However, they capture only local, first-order linearizations; nonlinear or nonlocal dependencies are missed. They do not resolve layerwise or headwise mechanisms, and they remain more computationally intensive than forward-only methods for large-scale analyses.
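The first-order limitation is easy to exhibit: at a saturated nonlinearity the local gradient is nearly zero even though a finite perturbation changes the output drastically, so a Jacobian-based score would report negligible influence. A minimal scalar illustration (our construction, not from the source):

```python
import numpy as np

# f saturates quickly; its derivative is f'(x) = 10 * (1 - tanh(10x)^2)
f = lambda x: np.tanh(10 * x)
x0 = 3.0

local_grad = 10 * (1 - np.tanh(10 * x0) ** 2)   # first-order influence: ~0
finite_effect = abs(f(x0) - f(-x0))             # flipping the input moves f by ~2
```

Any first-order attribution at $x_0$ assigns this input essentially zero influence, even though the output depends on it decisively.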

6. Extensions, Open Directions, and Impact

Potential extensions include:

  • Spectral projection directions (e.g., top-k logit explanations).
  • Scalable low-rank Fisher Scope approximations.
  • Compositional attribution by integrating Jacobian Scopes with circuit-tracing or symbolic causal analysis.
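One concrete route to scalable Fisher Scope approximations—our illustration of the idea, not a method from the source—is a Hutchinson-style stochastic trace estimator: since $\operatorname{tr}(F_t) = \mathbb{E}_{r \sim \mathcal{N}(0, I)}\big[(J_t r)^\top F_u (J_t r)\big]$, a handful of random Jacobian-vector products can replace the $O(d_\mathrm{model})$ backward passes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 4, 6                                     # toy dimensions (assumed)
J  = rng.normal(size=(d, d))                    # stand-in for a token Jacobian J_t
W  = rng.normal(size=(V, d))                    # toy unembedding matrix
p  = rng.dirichlet(np.ones(V))                  # stand-in predictive distribution
Fu = W.T @ (np.diag(p) - np.outer(p, p)) @ W    # output-space Fisher metric

exact = np.trace(J.T @ Fu @ J)                  # the quantity tr(F_t)

# Hutchinson estimator: tr(M) = E[r^T M r] for r ~ N(0, I), M = J^T Fu J
k = 5000                                        # number of random probes
r = rng.normal(size=(k, d))
Jr = r @ J.T                                    # rows are J @ r_i  (JVPs)
est = np.einsum('ki,ij,kj->k', Jr, Fu, Jr).mean()
```

With $k \ll d_\mathrm{model}$ probes, the estimate trades a controllable variance for a dramatic reduction in differentiation passes, which is the essence of the low-rank/stochastic approximations suggested above.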

These tools have substantial impact, enabling principled bias auditing, debugging of prompt influence, and detailed mechanism analysis in LLMs. A plausible implication is increased adoption for safety-critical model evaluation, interpretability benchmarking, and the development of robust user-facing systems.

The reference implementation and empirical benchmarks are available (Märtens et al., 9 May 2025, Liu et al., 23 Jan 2026), supporting reproducibility and further exploration.
