
Pathway Activation Subspaces (PAS)

Updated 26 January 2026
  • Pathway Activation Subspaces are mathematically defined subspaces that capture low-rank input activations in neural architectures, enabling precise expert specialization.
  • They underpin routing in MoE and LoRA architectures by aligning activation energies with expert pathways to improve model stability and performance.
  • PAS methods facilitate rank stabilization, interpretability interventions, and continual learning, reducing forgetting while enhancing multi-task tuning.

A Pathway Activation Subspace (PAS) is a technical construct in neural network analysis that formalizes the set of input directions most responsible for an expert’s low-rank responses within mixture-of-experts (MoE) and low-rank adaptation (LoRA) architectures. The PAS framework is applied to characterize, control, and interpret expert specialization and to design routing and stabilization protocols that enable continual multi-task tuning while mitigating catastrophic forgetting and misaligned drift between routers and experts. PASs also underpin a mathematical and empirical lens on subspace intervention and patching, specifying when an activation subspace is genuinely causally mechanistic or merely an artefact of linear algebraic correlation.

1. Formal Definition and Construction of Pathway Activation Subspaces

Within LoRA-based mechanisms, a frozen linear layer $W \in \mathbb{R}^{d_{\mathrm{out}} \times d_{\mathrm{in}}}$ is decomposed via a low-rank update $\Delta W = BA$, where $A \in \mathbb{R}^{r \times d_{\mathrm{in}}}$, $B \in \mathbb{R}^{d_{\mathrm{out}} \times r}$, and rank $r \ll \min(d_{\mathrm{in}}, d_{\mathrm{out}})$. For expert $e$, the incremental output on input $h \in \mathbb{R}^{d_{\mathrm{in}}}$ is $\Delta y_e(h) = B_e(A_e h)$. The PAS of expert $e$ is defined as $\mathcal{S}_e = \mathrm{span}(A_e^\top) \subset \mathbb{R}^{d_{\mathrm{in}}}$, i.e., the row-space of $A_e$.

The basis elements $a_{e,1}^\top, \dots, a_{e,r}^\top$ span $\mathcal{S}_e$; the dimensionality is at most $r$, controlled by the LoRA rank. A PAS thus characterizes which input features, when projected onto $\mathcal{S}_e$, are directly actionable by the expert's low-rank pathway. The activation coordinates $z_e(h) = A_e h \in \mathbb{R}^r$ specify a point in this subspace, and the projection $P_e(h) = A_e^\top (A_e A_e^\top)^+ A_e h$ recovers the component of $h$ aligned with expert $e$'s PAS (Hou et al., 19 Jan 2026).
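The PAS coordinates and projection above can be sketched numerically. This is a minimal illustration with assumed toy dimensions and random data, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 4
A_e = rng.standard_normal((r, d_in))   # LoRA down-projection for expert e
B_e = rng.standard_normal((d_out, r))  # LoRA up-projection for expert e
h = rng.standard_normal(d_in)          # input activation

# Coordinates of h in the PAS (row-space of A_e):
z_e = A_e @ h                          # shape (r,)

# Orthogonal projector onto span(A_e^T) via the pseudoinverse:
P_e = A_e.T @ np.linalg.pinv(A_e @ A_e.T) @ A_e
h_parallel = P_e @ h                   # component of h the expert can act on
h_perp = h - h_parallel                # residual, invisible to this expert

# The expert's incremental output depends only on the PAS-aligned component:
assert np.allclose(B_e @ (A_e @ h), B_e @ (A_e @ h_parallel))
```

The final assertion makes the defining property concrete: the component of $h$ orthogonal to the PAS is annihilated by $A_e$ and therefore contributes nothing to $\Delta y_e(h)$.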

Subspace activation patching, as explored in interpretability contexts, similarly takes a hypothesized subspace $S$ of an activation space and intervenes by orthogonally replacing the projected component of a target activation $\mathrm{act}_B$ with that of a source activation $\mathrm{act}_A$: $\mathrm{act}_B^{\mathrm{patched}} = (I - P_S)\,\mathrm{act}_B + P_S\,\mathrm{act}_A$ (Makelov et al., 2023).
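The patching formula can be sketched directly. All dimensions and activations here are assumed toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 12, 3
# Orthonormal basis for a hypothesized k-dimensional subspace S:
S = np.linalg.qr(rng.standard_normal((d, k)))[0]
P_S = S @ S.T                    # orthogonal projector onto S

act_A = rng.standard_normal(d)   # source activation
act_B = rng.standard_normal(d)   # target activation

# Replace the S-component of the target with that of the source:
act_B_patched = (np.eye(d) - P_S) @ act_B + P_S @ act_A

# Inside S the patched activation matches the source; outside S, the target:
assert np.allclose(P_S @ act_B_patched, P_S @ act_A)
assert np.allclose(act_B_patched - P_S @ act_B_patched,
                   act_B - P_S @ act_B)
```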

2. PAS-Guided Routing and Reweighting in MoE-LoRA

PASs enable routing algorithms to operate in the capability-aligned coordinate system defined by each expert's functional subspace. For an input $h$, the activation energy for expert $e$ is $s_e(h) = \frac{1}{r}\|A_e h\|_2^2 = \frac{1}{r}\sum_{k=1}^r (a_{e,k}^\top h)^2$. Mixture weights $\pi_e(h)$ are computed by a softmax over these energies: $\pi_e(h) = \frac{\exp(s_e(h))}{\sum_{e'} \exp(s_{e'}(h))}$.

Routing decisions are thereby grounded in the actual induced PAS activations, maintaining alignment between router and experts; the routed incremental output is $\Delta y(h) = \sum_e \pi_e(h)\, B_e A_e h$, consistent with the per-expert $\Delta y_e$ above. This mechanism prevents the phenomenon termed "misaligned co-drift," where the router's and experts' preferences and specializations gradually diverge due to indiscriminate joint updates (Hou et al., 19 Jan 2026).
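The energy-based routing and mixture step can be sketched as follows, with assumed toy sizes and random weights standing in for trained LoRA experts:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, r, n_experts = 16, 8, 4, 3
A = rng.standard_normal((n_experts, r, d_in))   # per-expert down-projections
B = rng.standard_normal((n_experts, d_out, r))  # per-expert up-projections
h = rng.standard_normal(d_in)

# Activation energy s_e(h) = (1/r) ||A_e h||^2 for each expert:
s = np.array([np.sum((A[e] @ h) ** 2) / r for e in range(n_experts)])

# Softmax over energies (shifted by the max for numerical stability):
pi = np.exp(s - s.max())
pi /= pi.sum()

# Routed incremental output: mixture of expert low-rank updates on h.
delta_y = sum(pi[e] * (B[e] @ (A[e] @ h)) for e in range(n_experts))
```

Because the energies are computed from the experts' own $A_e$ matrices, the router's preferences cannot drift away from what each expert's PAS actually responds to.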

3. Rank Stabilization and Anti-Forgetting via PASs

To protect against forgetting in continual learning, PAS-aware rank stabilization tracks and regularizes the directions within each expert's PAS that have been historically important across tasks. The per-task importance of direction $k$ for expert $e$ at stage $t$ is $I_{e,k}(t) = \mathbb{E}_{h \sim \mathcal{D}_t}\left[\pi_e(h)\,(a_{e,k}^\top h)^2\right]$, aggregated over prior tasks as $I_{e,k}^{\mathrm{agg}}(t-1) = \sum_{t'=1}^{t-1} I_{e,k}(t')$. A quadratic penalty is imposed on large changes to critically activated directions: $\mathcal{L}_{\mathrm{stab}} = \sum_{e,k} w_{e,k}\,\|a_{e,k}^{(t)} - a_{e,k}^{(t-1)}\|_2^2$, with weights $w_{e,k}$ normalized from $I_{e,k}^{\mathrm{agg}}$. The overall loss is $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda\,\mathcal{L}_{\mathrm{stab}}$.
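A minimal sketch of the stabilization penalty for a single expert, under assumed toy data (random activations and a constant routing weight standing in for $\pi_e(h)$):

```python
import numpy as np

rng = np.random.default_rng(3)
r, d_in, n_samples = 4, 16, 64
A_prev = rng.standard_normal((r, d_in))                   # A_e after task t-1
A_curr = A_prev + 0.01 * rng.standard_normal((r, d_in))   # A_e during task t
H = rng.standard_normal((n_samples, d_in))  # activations h from a prior task
pi = np.full(n_samples, 0.5)                # routing weights pi_e(h), assumed

# Per-direction importance I_{e,k} = E[pi_e(h) (a_{e,k}^T h)^2]:
I = np.mean(pi[:, None] * (H @ A_prev.T) ** 2, axis=0)    # shape (r,)
w = I / I.sum()                                           # normalized weights

# Quadratic drift penalty, weighted by historical importance per direction:
L_stab = np.sum(w * np.sum((A_curr - A_prev) ** 2, axis=1))
```

Directions with high aggregated importance receive large $w_{e,k}$ and are therefore held close to their previous values, while rarely activated directions remain free to adapt to the new task.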

Retaining the previous $A_e$ matrices and the running sums $I_{e,k}^{\mathrm{agg}}$ suffices for storage; the stabilized PAS directions are those with high historical activation, preserving expert specialization and mitigating drift (Hou et al., 19 Jan 2026).

4. Interpretability, Activation Patching, and PASs

PASs connect with mechanistic interpretability paradigms, especially subspace activation patching. The procedure involves hypothesizing a subspace $S$ encoding a feature, constructing an orthogonal projector $P_S$, and patching activations as described above. The efficacy of this approach is assessed using metrics such as the fractional logit-difference decrease (FLDD) and interchange accuracy, as presented in Table 1 of (Makelov et al., 2023):

Intervention          FLDD (%)   Interchange (%)
full MLP              –8         0.0
v_MLP                 46.7       4.2
rowspace(v_MLP)       13.5       0.2
nullspace(v_MLP)      0          0.0
full residual         123.6      54.8
v_resid               140.7      74.8
rowspace(v_resid)     127.5      63.1
nullspace(v_resid)    13.9       0.4
v_grad                111.5      45.1
rowspace(v_grad)      106.5      40.6
nullspace(v_grad)     2.2        0.0

Patching along directions composed of nullspace (causally disconnected) and dormant components can produce strong but illusory interpretability signals. Removing the nullspace component eliminates the effect, indicating that genuine PAS interventions require strong mechanistic alignment with the model's output pathways (Makelov et al., 2023).

5. PASs and the Interpretability Illusion: Mechanistic and Empirical Analysis

A critical finding is that subspace interventions can be deceptive: a patching direction mixing a correlational but disconnected vector with a dormant causal vector may induce the appearance of feature localization without mechanistic faithfulness. In toy models and real tasks (such as indirect object identification and factual recall), patching along such "illusory" subspaces mimics genuine flipping of the output, but does so by activating dormant pathways rather than faithfully tracing the original feature (Makelov et al., 2023).

A dormant subspace is one that is inactive on typical data, yet can steer the output if artificially energized. A causally disconnected subspace cannot affect model output for any activation. Notably, the patch that is optimal in terms of output influence often mixes these two at $\theta = \pi/4$ (equal weights), formalized as $v = \alpha\, v_{\mathrm{disconnected}} + \beta\, v_{\mathrm{dormant}}$ (Makelov et al., 2023).
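The distinction can be made concrete in a toy linear readout. Everything here is an assumed illustration: the disconnected direction is taken from the kernel of a random $W_{\mathrm{out}}$, and a rowspace direction stands in for the dormant one:

```python
import numpy as np

rng = np.random.default_rng(4)
d_out, d_in = 3, 6
W_out = rng.standard_normal((d_out, d_in))  # toy output readout

# Right singular vectors: the last rows span ker(W_out) (rank is d_out < d_in).
_, _, Vt = np.linalg.svd(W_out)
v_disc = Vt[-1]   # causally disconnected: W_out @ v_disc = 0 for any scaling
v_dorm = Vt[0]    # in the rowspace: it can move the output if energized

# Mixing the two at theta = pi/4 (equal weights):
theta = np.pi / 4
v_patch = np.cos(theta) * v_disc + np.sin(theta) * v_dorm

# Only the dormant component contributes to the output-side effect:
effect = W_out @ v_patch
assert np.allclose(effect, np.sin(theta) * (W_out @ v_dorm))
```

The patch direction can thus correlate with a feature through its disconnected half while producing its causal effect entirely through the dormant half, which is exactly the illusion described above.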

6. PASs, Rank-1 Editing, and Subspace Equivalence

In factual recall, the connection between PASs and rank-1 editing is established via ROME, which modifies model weights by adding a rank-1 update $W_{\mathrm{out}}' = W_{\mathrm{out}} + ab^\top$ to ensure the desired output on average subject activations. The empirical and theoretical correspondence between 1-D subspace patching and rank-1 edits is quantified by matching output directions and variance budgets: $v = \alpha\, W_{\mathrm{out}}^+ a + u$ with $u \in \ker W_{\mathrm{out}}$, showing high cosine similarity and near-identical rewrite scores between patching and weight-editing methods across network layers (Makelov et al., 2023).
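The core of this correspondence can be sketched in a toy setting (assumed random readout and activations): pushing the activation along $v = W_{\mathrm{out}}^+ a$ moves the output along $a$, which is the same output-side effect a rank-1 edit targeting direction $a$ aims for.

```python
import numpy as np

rng = np.random.default_rng(5)
d_out, d_in = 3, 6
W_out = rng.standard_normal((d_out, d_in))  # toy readout, full row rank a.s.
a = rng.standard_normal(d_out)              # desired output-space direction

# Activation-space counterpart of the output direction a:
v = np.linalg.pinv(W_out) @ a

h = rng.standard_normal(d_in)
# Patching h -> h + v changes the output by exactly a, since
# W_out @ pinv(W_out) = I when W_out has full row rank:
delta_out = W_out @ (h + v) - W_out @ h
assert np.allclose(delta_out, a)
```

Any kernel component $u \in \ker W_{\mathrm{out}}$ could be added to $v$ without changing this output effect, which is why the correspondence in the paper is stated only up to such a component.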

7. Evaluating the Faithfulness of PASs: Sanity Checks and Formal Criteria

End-to-end flipping success is insufficient for asserting mechanistic faithfulness of a PAS. The following criteria are integral: (a) strong class-based activation correlation; (b) alignment with output-relevant rowspaces (outside $\ker W_{\mathrm{out}}$); (c) dormancy of the irrelevant portion of the PAS on the data distribution; (d) correct positioning within known circuit bottlenecks; (e) generalization to new instances; (f) consistency across patching, ablation, and rank-1 editing interventions. Satisfying these conditions supports treating a PAS as a genuine mechanistic variable rather than an artefactual correlation (Makelov et al., 2023).

8. Empirical Performance Gains and Impact

On a 7-task multimodal continual learning benchmark (MLLM-CTBench), PASs-based methods outperform conventional baselines in both accuracy (AP) and resistance to forgetting (BWT):

Method               AP (%)   BWT
MoELoRA (softmax)    43.36    –6.64
PASs-MoE (Ours)      48.46    –2.15

These results demonstrate the quantitative advantage and parameter efficiency of PAS-guided approaches: a +5.1 percentage point gain in average performance (AP) and a substantial reduction in forgetting (BWT improving from –6.64 to –2.15) without additional parameters, confirming that PASs support robust continual learning and stable expert specialization (Hou et al., 19 Jan 2026).


Pathway Activation Subspaces provide both a theoretical and practical foundation for routed expertise in adaptive neural architectures while also raising deep questions of mechanistic interpretability, causal faithfulness, and the distinction between functional subspaces and illusory activation phenomena. The PAS concept thus serves as a focal point for rigorous model analysis and future methodological development.
