Pathway Activation Subspaces (PAS)
- Pathway Activation Subspaces are mathematically defined subspaces that capture low-rank input activations in neural architectures, enabling precise expert specialization.
- They underpin routing in MoE and LoRA architectures by aligning activation energies with expert pathways to improve model stability and performance.
- PAS methods facilitate rank stabilization, interpretability interventions, and continual learning, reducing forgetting while enhancing multi-task tuning.
A Pathway Activation Subspace (PAS) is a technical construct in neural network analysis that formalizes the set of input directions most responsible for an expert’s low-rank responses within mixture-of-experts (MoE) and low-rank adaptation (LoRA) architectures. The PAS framework is applied to characterize, control, and interpret expert specialization and to design routing and stabilization protocols that enable continual multi-task tuning while mitigating catastrophic forgetting and misaligned drift between routers and experts. PASs also provide a mathematical and empirical lens on subspace intervention and patching, specifying when an activation subspace is genuinely causal and mechanistic, or merely an artefact of linear-algebraic correlation.
1. Formal Definition and Construction of Pathway Activation Subspaces
Within LoRA-based mechanisms, a frozen linear layer $W \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ is adapted via a low-rank update $\Delta W_i = B_i A_i$, where $B_i \in \mathbb{R}^{d_{\text{out}} \times r}$, $A_i \in \mathbb{R}^{r \times d_{\text{in}}}$, and rank $r \ll \min(d_{\text{in}}, d_{\text{out}})$. For expert $i$, the incremental output on input $x$ is $\Delta y_i = B_i A_i x$. The PAS of expert $i$ is defined as $\mathcal{S}_i = \operatorname{rowspace}(A_i)$, i.e., the row-space of $A_i$.
The rows of $A_i$ are basis elements spanning $\mathcal{S}_i$. The dimensionality is at most $r$, controlled by the LoRA rank. A PAS thus uniquely characterizes which input features, when projected onto $\mathcal{S}_i$, are directly actionable by the expert’s low-rank pathway. The activation coordinates $z_i = A_i x$ specify a point in this subspace, and the orthogonal projection $P_i x$, with $P_i = A_i^{\top}(A_i A_i^{\top})^{-1} A_i$, recovers the component of $x$ aligned with expert $i$'s PAS (Hou et al., 19 Jan 2026).
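A minimal numpy sketch of this construction (dimensions and variable names are illustrative, not from the paper): the projector onto the row-space of the down-projection matrix recovers exactly the input component the expert's low-rank pathway can act on.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 4
A = rng.standard_normal((r, d_in))   # down-projection; rows span the PAS
B = rng.standard_normal((d_out, r))  # up-projection

# Orthogonal projector onto the PAS S = rowspace(A).
P = A.T @ np.linalg.solve(A @ A.T, A)

x = rng.standard_normal(d_in)
z = A @ x        # activation coordinates of x in the PAS
x_pas = P @ x    # component of x aligned with the PAS

# The expert's incremental output depends only on the PAS component.
assert np.allclose(B @ (A @ x), B @ (A @ x_pas))
```

Because $A P = A$, discarding the component of $x$ orthogonal to the PAS leaves the expert's output unchanged, which is what makes the row-space a natural notion of an expert's "reachable" input features.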
Subspace activation patching, as explored in interpretability contexts, similarly takes a hypothesized subspace $U$ of an activation space and intervenes by orthogonally replacing the projected component of a target activation $a_{\text{tgt}}$ with that of a source activation $a_{\text{src}}$, via $\tilde{a} = a_{\text{tgt}} - P_U\, a_{\text{tgt}} + P_U\, a_{\text{src}}$, where $P_U$ is the orthogonal projector onto $U$ (Makelov et al., 2023).
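The patching operation itself is a few lines of linear algebra. The following sketch (variable names assumed) verifies the defining property: inside the subspace the patched activation matches the source, outside it the target.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 12, 3
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis for subspace U
P = U @ U.T                                       # orthogonal projector onto U

a_src = rng.standard_normal(d)
a_tgt = rng.standard_normal(d)

# Replace the U-component of the target with the source's U-component.
a_patched = a_tgt - P @ a_tgt + P @ a_src

assert np.allclose(P @ a_patched, P @ a_src)                    # matches source in U
assert np.allclose(a_patched - P @ a_patched,
                   a_tgt - P @ a_tgt)                           # matches target off U
```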
2. PAS-Guided Routing and Reweighting in MoE-LoRA
PASs enable routing algorithms to operate in the capability-aligned coordinate system defined by each expert’s functional subspace. For an input $x$, the activation energy for expert $i$ is $e_i(x) = \lVert A_i x \rVert_2^2$. Mixture weights are computed by softmax over these energies: $w_i(x) = \exp(e_i(x)) \big/ \sum_j \exp(e_j(x))$.
Routing decisions are thereby grounded in the actual induced PAS activations, maintaining alignment between router and experts; the combined LoRA update is computed as $\Delta y = \sum_i w_i(x)\, B_i A_i x$. This mechanism prevents the phenomenon termed "misaligned co-drift," where router and experts’ preferences and specialization gradually diverge due to indiscriminate joint updates (Hou et al., 19 Jan 2026).
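The routing scheme described above can be sketched as follows (a hedged illustration; the expert count, dimensions, and names are assumptions): each expert's energy is the squared norm of its PAS coordinates, and the mixture is a softmax over those energies.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, r, E = 16, 8, 4, 3
A = [rng.standard_normal((r, d_in)) for _ in range(E)]   # per-expert down-projections
B = [rng.standard_normal((d_out, r)) for _ in range(E)]  # per-expert up-projections

x = rng.standard_normal(d_in)

# Activation energy per expert: squared norm of the PAS coordinates A_i x.
e = np.array([np.linalg.norm(Ai @ x) ** 2 for Ai in A])

# Mixture weights: softmax over the energies (shifted for numerical stability).
w = np.exp(e - e.max())
w /= w.sum()

# Combined LoRA update: weighted sum over expert pathways.
delta_y = sum(wi * (Bi @ (Ai @ x)) for wi, Ai, Bi in zip(w, A, B))
```

The key design point is that the router consumes $A_i x$ itself, so router scores cannot drift away from what each expert's pathway actually responds to.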
3. Rank Stabilization and Anti-Forgetting via PASs
To protect against forgetting in continual learning, PAS-aware rank stabilization tracks and regularizes the directions within each expert’s PAS that have been historically important across tasks. The per-task importance for expert $i$, direction $k$ at stage $t$ is $I_{i,k}^{(t)} = \mathbb{E}_{x \sim \mathcal{D}_t}\!\big[(A_i x)_k^2\big]$, aggregated over prior tasks as $\bar{I}_{i,k} = \sum_{t' < t} I_{i,k}^{(t')}$. A quadratic penalty is imposed on large changes to critically activated directions: $\mathcal{L}_{\text{stab}} = \sum_{i,k} \tilde{I}_{i,k}\, \lVert a_{i,k} - a_{i,k}^{\text{old}} \rVert_2^2$, where $a_{i,k}$ is the $k$-th row of $A_i$ and $\tilde{I}_{i,k}$ is normalized from $\bar{I}_{i,k}$. The overall loss is $\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda\, \mathcal{L}_{\text{stab}}$.
Retaining the previous $A_i$ matrices and the running importance sums $\bar{I}_{i,k}$ suffices for storage, and the stabilized PAS directions correspond to those with high historical activation, preserving expert specialization and mitigating drift (Hou et al., 19 Jan 2026).
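One plausible reading of this stabilization scheme, sketched in numpy (the importance estimator, the normalization, and the placeholder task loss are assumptions, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(3)
r, d_in, lam = 4, 16, 0.1
A_old = rng.standard_normal((r, d_in))   # expert's A after previous tasks
I_hist = np.zeros(r)                     # running importance per PAS direction

def update_importance(I_hist, A, X):
    """Accumulate per-direction importance I_k ~ E_x[(A x)_k^2] over a task."""
    Z = X @ A.T                          # (n, r) PAS coordinates for a batch
    return I_hist + (Z ** 2).mean(axis=0)

def stab_penalty(A_new, A_old, I_hist):
    """Quadratic drift penalty: each row of A weighted by its importance."""
    I_norm = I_hist / (I_hist.sum() + 1e-8)      # normalized importances
    drift = ((A_new - A_old) ** 2).sum(axis=1)   # squared row-wise change
    return float((I_norm * drift).sum())

X_prev = rng.standard_normal((64, d_in))         # stand-in for prior-task inputs
I_hist = update_importance(I_hist, A_old, X_prev)

A_new = A_old + 0.01 * rng.standard_normal((r, d_in))
L_task = 1.0                                     # placeholder task loss
loss_total = L_task + lam * stab_penalty(A_new, A_old, I_hist)
```

Directions with large historical activation dominate the penalty, so updates for a new task are pushed into the weakly used remainder of each expert's PAS.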
4. Interpretability, Activation Patching, and PASs
PASs connect with mechanistic interpretability paradigms, especially subspace activation patching. The procedure involves hypothesizing a subspace encoding a feature, constructing an orthogonal projector $P_U$ onto it, and patching activations as described above. The efficacy of this approach is assessed using metrics such as the fractional logit-difference decrease (FLDD) and interchange accuracy, as presented in Table 1 from (Makelov et al., 2023):
| Intervention | FLDD (%) | Interchange (%) |
|---|---|---|
| full MLP | –8 | 0.0 |
| v_MLP | 46.7 | 4.2 |
| rowspace(v_MLP) | 13.5 | 0.2 |
| nullspace(v_MLP) | 0 | 0.0 |
| full residual | 123.6 | 54.8 |
| v_resid | 140.7 | 74.8 |
| rowspace(v_resid) | 127.5 | 63.1 |
| nullspace(v_resid) | 13.9 | 0.4 |
| v_grad | 111.5 | 45.1 |
| rowspace(v_grad) | 106.5 | 40.6 |
| nullspace(v_grad) | 2.2 | 0.0 |
Patching directions composed of nullspace (causally disconnected) and dormant directions can produce strong but illusory interpretability signals. Removal of the nullspace component eliminates the effect, indicating that genuine PAS interventions require strong mechanistic alignment with the model’s output pathways (Makelov et al., 2023).
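A constructed toy example of this sanity check (not from the paper): patching along a direction in the nullspace of a linear readout visibly changes the activation while provably leaving the output untouched, so any downstream "effect" attributed to such a direction must come from elsewhere.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
w = rng.standard_normal(d)               # linear readout: y = w . a

# Build a causally disconnected direction: orthogonal to the readout w.
v_null = rng.standard_normal(d)
v_null -= (v_null @ w) / (w @ w) * w
v_null /= np.linalg.norm(v_null)

a_src = rng.standard_normal(d)
a_tgt = rng.standard_normal(d)

# 1-D patch along the nullspace direction.
a_patched = a_tgt + v_null * (v_null @ (a_src - a_tgt))

assert not np.allclose(a_patched, a_tgt)      # the activation did change
assert np.isclose(w @ a_patched, w @ a_tgt)   # the output did not
```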
5. PASs and the Interpretability Illusion: Mechanistic and Empirical Analysis
A critical finding is that subspace interventions can be deceptive: a patching direction mixing a correlational but disconnected vector with a dormant causal vector may induce the appearance of feature localization without mechanistic faithfulness. In toy models and real tasks (such as indirect object identification and factual recall), patching along such "illusory" subspaces mimics genuine flipping of the output, but does so by activating dormant pathways rather than faithfully tracing the original feature (Makelov et al., 2023).
A dormant subspace is one that is inactive on typical data, yet can steer the output if artificially energized. A causally disconnected subspace cannot affect model output for any activation. Notably, the patch direction that is optimal in terms of output influence often mixes these two at equal weights, formalized as $v^{\star} = \tfrac{1}{\sqrt{2}}\big(v_{\text{disconnected}} + v_{\text{dormant}}\big)$ (Makelov et al., 2023).
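The illusion can be reproduced in a constructed toy example (coordinates and names are ours, for illustration): mixing a disconnected direction, which carries the feature on-distribution, with a dormant direction, which reaches the output but is silent on-distribution, yields a patch that moves the output as if the feature had genuinely flipped.

```python
import numpy as np

d = 6
w = np.zeros(d); w[0] = 1.0            # readout reads only coordinate 0

v_dorm = w.copy()                      # output-connected; silent on the data below
v_disc = np.zeros(d); v_disc[1] = 1.0  # in the nullspace of w: disconnected

# The "illusory" direction: equal-weight mix of the two components.
v_star = (v_disc + v_dorm) / np.sqrt(2)

# On-distribution activations: coordinate 0 is always 0 (dormancy),
# coordinate 1 carries the binary feature (correlation without causation).
a_neg = np.zeros(d); a_neg[1] = -1.0   # feature absent
a_pos = np.zeros(d); a_pos[1] = +1.0   # feature present

# Patching a_neg toward a_pos along v_star injects energy into the dormant
# coordinate, moving the output and mimicking a genuine feature flip.
a_patched = a_neg + v_star * (v_star @ (a_pos - a_neg))

assert w @ a_neg == 0.0                # output untouched before the patch
assert np.isclose(w @ a_patched, 1.0)  # output moved after the patch
```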
6. PASs, Rank-1 Editing, and Subspace Equivalence
In factual recall, the connection between PASs and rank-1 editing is established via ROME, which modifies model weights by adding a rank-1 update $\Delta W = u v^{\top}$ to ensure the desired output on average subject activations. The empirical and theoretical correspondence between 1-D subspace patching and rank-1 edits is quantified by matching output directions and variance budgets: the patched 1-D subspace direction plays the role of $v$, while $u$ is scaled so that the edit reproduces the patched output on the relevant activation distribution. The two interventions show high cosine similarity and near-identical rewrite scores across network layers (Makelov et al., 2023).
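The correspondence can be checked on a single linear layer (a constructed illustration, not ROME itself): when the patched coordinate $v^{\top}a$ is constant across the inputs of interest, a 1-D activation patch and a rank-1 weight edit produce identical outputs.

```python
import numpy as np

rng = np.random.default_rng(6)
d_out, d_in = 5, 8
W = rng.standard_normal((d_out, d_in))      # linear layer: y = W a
v = rng.standard_normal(d_in)
v /= np.linalg.norm(v)                      # unit patch direction

c0, c = 0.7, 2.0                            # original and patched v-coordinates

a = rng.standard_normal(d_in)
a += (c0 - v @ a) * v                       # force v^T a = c0 (nonzero)

# Activation patch: set the v-coordinate of a to c.
a_patched = a + (c - v @ a) * v
y_patch = W @ a_patched

# Equivalent rank-1 weight edit: W' = W + u v^T with u = (c - c0)/c0 * (W v),
# which reproduces the patch exactly because v^T a = c0 is fixed.
u = (c - c0) / c0 * (W @ v)
W_edit = W + np.outer(u, v)
y_edit = W_edit @ a

assert np.allclose(y_patch, y_edit)
```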
7. Evaluating the Faithfulness of PASs: Sanity Checks and Formal Criteria
End-to-end flipping success is insufficient for asserting mechanistic faithfulness of a PAS. The following criteria are integral: (a) strong class-based activation correlation; (b) alignment with output-relevant rowspaces rather than the causally disconnected nullspace; (c) dormancy of the irrelevant portion of the PAS on the data distribution; (d) correct positioning within known circuit bottlenecks; (e) generalization to new instances; (f) consistency across patching, ablation, and rank-1 editing interventions. Satisfying these conditions supports a PAS as a genuine mechanistic variable rather than an artefactual correlation (Makelov et al., 2023).
8. Empirical Performance Gains and Impact
On a 7-task multimodal continual learning benchmark (MLLM-CTBench), PAS-based methods outperform conventional baselines in both average accuracy (AP) and resistance to forgetting (backward transfer, BWT):
| Method | AP (%) | BWT |
|---|---|---|
| MoELoRA (softmax) | 43.36 | –6.64 |
| PAS-MoE | 48.46 | –2.15 |
These results demonstrate the quantitative advantage and parameter efficiency of PAS-guided approaches. Gains include a +5.1 percentage-point increase in average performance and substantial reductions in forgetting across domains without additional parameters, indicating that PASs enable robust continual learning and stable expert specialization (Hou et al., 19 Jan 2026).
Pathway Activation Subspaces provide both a theoretical and practical foundation for routed expertise in adaptive neural architectures while also raising deep questions of mechanistic interpretability, causal faithfulness, and the distinction between functional subspaces and illusory activation phenomena. The PAS concept thus serves as a focal point for rigorous model analysis and future methodological development.