Fisher-Aligned Subspace Compression
- Fisher-Aligned Subspace Compression (FASC) is a method for reducing high-dimensional data by projecting onto subspaces that retain key Fisher information for inference and learning.
- It employs both linear and non-linear techniques, including eigen-decomposition and score-based methods, to identify parameter-sensitive directions while minimizing information loss.
- FASC has broad applications from cosmic shear analysis to large language model optimization, achieving significant dimensionality reduction with minimal compromise on performance.
Fisher-Aligned Subspace Compression (FASC) is a family of methodologies for compressing high-dimensional data or model activations by projecting onto subspaces that optimally preserve Fisher information with respect to a target statistical or learning task. FASC leverages the Fisher Information Matrix (FIM) to identify parameter- or loss-sensitive directions, yielding compressions that retain sufficiency for inference or knowledge-critical computations even at aggressive dimensionality reduction rates. This concept unifies approaches across statistical data analysis, machine learning, and recently, LLM architecture compression, enabling efficient and information-theoretically justified dimension reduction.
1. Foundational Definition and Theoretical Properties
FASC constructs a linear or nonlinear map from the full observation space $\mathbb{R}^N$ to a lower-dimensional representation $\mathbb{R}^n$ (typically $n \ll N$), such that the reduced statistics retain the Fisher information relevant for parameter inference or loss minimization at a specified operating point. Formally, given an observation $d$, model parameters $\theta$, and a likelihood $\mathcal{L}(d \mid \theta)$, the Fisher-aligned summary is defined via the score function:

$$t = \nabla_\theta \ln \mathcal{L}(d \mid \theta)\big|_{\theta_*}.$$

At a fiducial $\theta_*$, the Fisher information is $F_{\alpha\beta} = -\big\langle \partial_\alpha \partial_\beta \ln \mathcal{L} \big\rangle$. Compression to the score function is optimal in the information-theoretic sense: no other $n$-dimensional function of the data achieves lower variance in estimating $\theta$, and all Fisher information available in the data is preserved under regularity conditions (interchangeability of derivative and expectation, differentiability of $\ln \mathcal{L}$) (Alsing et al., 2017). The same underlying principle governs FASC for likelihood-based tasks, empirical loss minimization, and discriminant subspace construction in unsupervised or supervised learning.
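As a concrete illustration, the following sketch computes score-function summaries and the Fisher matrix for a toy Gaussian linear model (a minimal sketch, not the cited authors' code; the function and variable names are illustrative):

```python
import numpy as np

def score_summaries(d, mu_fid, jac, cov):
    """Score-function compression for a Gaussian likelihood with
    parameter-dependent mean: t = J^T C^{-1} (d - mu_*), which is
    n-dimensional (one summary per parameter) and preserves the
    full Fisher information F = J^T C^{-1} J."""
    cinv = np.linalg.inv(cov)
    t = jac.T @ cinv @ (d - mu_fid)   # (n,) Fisher-aligned summaries
    fisher = jac.T @ cinv @ jac       # (n, n) Fisher matrix
    return t, fisher

# Toy example: N = 100 data points, n = 2 parameters.
rng = np.random.default_rng(0)
N, n = 100, 2
jac = rng.normal(size=(N, n))    # Jacobian d mu / d theta at the fiducial point
cov = np.eye(N)                  # data covariance
mu_fid = np.zeros(N)             # fiducial mean
d = rng.multivariate_normal(mu_fid, cov)
t, F = score_summaries(d, mu_fid, jac, cov)
print(t.shape, F.shape)          # (2,) (2, 2)
```

In this linear-Gaussian setting, the $n$ summaries carry the same Fisher information about $\theta$ as the full $N$-dimensional data vector.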
2. Mathematical Formalism and Algorithmic Derivation
FASC for continuous data vectors under local-linear (first-order) approximations has well-developed theory:
- Linearization: For observable mean $\mu(\theta)$ near a fiducial $\theta_*$, $\mu(\theta) \approx \mu_* + J\,(\theta - \theta_*)$, with Jacobian $J_{i\alpha} = \partial \mu_i / \partial \theta_\alpha$.
- Fisher matrix: $F = J^\top C^{-1} J$, using data covariance $C$.
- Eigen-decomposition: Find eigenvectors $v_k$ of $F$, and compute compression vectors $b_k = C^{-1} J v_k$.
- Compressed statistics: $y_k = b_k^\top (d - \mu_*)$. In the eigenbasis of $F$, the Fisher matrix is diagonal, with entries given by the eigenvalues $\lambda_k$. Retaining the top $p$ modes (those with the largest eigenvalues) maximizes information retention (Asgari et al., 2014); a minimal implementation sketch follows this list.
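A minimal sketch of this linear pipeline, assuming the same Gaussian linear model as above (function names and the `n_modes` parameter are illustrative, not taken from the cited papers):

```python
import numpy as np

def fisher_aligned_linear_compression(jac, cov, n_modes):
    """Linear FASC: eigendecompose F = J^T C^{-1} J and build
    compression vectors b_k = C^{-1} J v_k for the top n_modes."""
    cinv = np.linalg.inv(cov)
    fisher = jac.T @ cinv @ jac
    evals, evecs = np.linalg.eigh(fisher)        # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_modes]    # largest eigenvalues first
    v = evecs[:, order]                          # (n_params, n_modes)
    b = cinv @ jac @ v                           # (N, n_modes) compression vectors
    return b, evals[order]

def compress(d, mu_fid, b):
    """Project mean-subtracted data onto the retained Fisher modes."""
    return b.T @ (d - mu_fid)                    # (n_modes,) compressed statistics
```

A trace-based measure of retained information is the ratio of the kept eigenvalues to their total sum, which quantifies how much Fisher information survives the truncation.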
In non-Gaussian or nonlinear settings (e.g., LLM activations), FASC relies on higher-order Taylor expansions of the loss or log-likelihood: $\mathcal{L}(\theta) \approx \mathcal{L}(\theta_*) + g^\top \Delta\theta + \tfrac{1}{2}\,\Delta\theta^\top H\,\Delta\theta$, with the Hessian $H$ replaced by the empirical Fisher $\hat{F} = \frac{1}{n}\sum_{i=1}^{n} g_i g_i^\top$, where $g_i = \nabla \ell_i(\theta_*)$ are per-example gradients. The optimal $r$-dimensional projection $U$ then solves the generalized eigenproblem:

$$\big(\Sigma_{xg}\,\Sigma_{gg}^{-1}\,\Sigma_{xg}^\top\big)\, u = \lambda\,\Sigma_{xx}\,u,$$

where $\Sigma_{xx}$, $\Sigma_{gg}$, $\Sigma_{xg}$ are activation, gradient, and cross-covariance matrices, respectively (Shihab et al., 12 Jan 2026).
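One consistent reading of this eigenproblem can be sketched with `scipy` as follows; the covariance estimators and the ridge term `eps` are assumptions for numerical stability, not details from the paper:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_aligned_subspace(X, G, rank, eps=1e-6):
    """Top-`rank` solutions of (Sxg Sgg^{-1} Sxg^T) u = lam Sxx u.
    X: (n_samples, d) calibration activations; G: (n_samples, d)
    per-example gradients w.r.t. those activations."""
    Xc = X - X.mean(axis=0)
    Gc = G - G.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])   # activation covariance
    Sgg = Gc.T @ Gc / n + eps * np.eye(G.shape[1])   # gradient covariance
    Sxg = Xc.T @ Gc / n                              # cross-covariance
    A = Sxg @ np.linalg.solve(Sgg, Sxg.T)            # Sxg Sgg^{-1} Sgx
    evals, evecs = eigh(A, Sxx)                      # generalized symmetric problem
    return evecs[:, ::-1][:, :rank]                  # (d, rank), top modes first
```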
3. Connections to Information Geometry and Optimal Compression
FASC generalizes and unifies well-known data compression paradigms:
- Linear case (parameter-dependent mean $\mu(\theta)$, fixed covariance $C$): Heavens–Tegmark "MOPED" linear compression; optimal for signals with parameter-dependent mean.
- Quadratic case (fixed $\mu$, parameter-dependent $C(\theta)$): Karhunen–Loève transform/power spectrum estimation; optimal for parameter-sensitive variance.
- Nonlinear expansions: Sufficient statistics include derivatives of arbitrary order, recoverable by score-based regression or likelihood-ratio estimation (Alsing et al., 2017).
These constructions frame FASC as a practical realization of the information geometry principle—projecting onto the subspace in which the FIM is maximally preserved.
4. Applications: From Astronomy to LLMs
Early deployments of FASC targeted astronomical and cosmological data reduction:
- Cosmic shear analysis: Compress shear two-point COSEBI statistics with over 360 elements down to 7–35 sufficient compressed summaries (matching the number of model parameters), recovering nearly all of the full data vector's Fisher figure-of-merit with only negligible degradation of parameter errors (Asgari et al., 2014).
- Generalized optimal compression: Alsing & Wandelt demonstrated that, in principle, any $N$-dimensional data vector can be compressed to $n$ summaries (for $n$ parameters of interest) without Fisher information loss (Alsing et al., 2017).
More recently, FASC has been adapted to LLM post-training compression, addressing knowledge retention in resource-constrained deployments:
- Activation compression: Standard SVD retains high-variance activation modes, which often miss knowledge-bearing subspaces aligned with the gradient of the model loss. FASC instead selects subspaces by minimizing a second-order surrogate of the loss, aligning subspace selection with directions of high loss curvature (the empirical Fisher metric) and retaining factual-knowledge performance at aggressive rank reductions (Shihab et al., 12 Jan 2026); the sketch below illustrates the contrast.
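To make the contrast concrete, the following sketch compares variance-based SVD with a curvature-weighted variant using a diagonal empirical Fisher approximation; the diagonal weighting is a deliberate simplification of the generalized eigenproblem in Section 2, not the paper's exact procedure:

```python
import numpy as np

def svd_subspace(X, rank):
    """Variance-based baseline: top right-singular subspace of activations."""
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return vt[:rank].T                            # (d, rank)

def fisher_weighted_subspace(X, G, rank, eps=1e-8):
    """Curvature-aware variant: rescale each coordinate by the square root
    of its diagonal empirical Fisher value (mean squared gradient) so the
    SVD favors high-loss-curvature directions, not just high-variance ones."""
    w = np.sqrt((G ** 2).mean(axis=0) + eps)      # sqrt of diagonal Fisher
    _, _, vt = np.linalg.svd((X - X.mean(axis=0)) * w, full_matrices=False)
    return (vt[:rank] * w).T                      # (d, rank) projection directions
```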
5. Diagnostic Metrics and Layer-Wise Subspace Selection
FASC has prompted the development of new diagnostic metrics:
- Dependence Violation Score (DVS): Quantifies the cross-covariance between layer activations and gradients. Layers with high DVS display strong activation–gradient coupling and benefit most from Fisher-aligned compression; in practice, a fixed DVS threshold serves as an effective empirical criterion for applying FASC instead of SVD in transformer models (Shihab et al., 12 Jan 2026).
- Subspace divergence: Principal-angle analysis demonstrates that the subspaces selected by FASC and SVD diverge only in high-DVS layers, substantiating the loss-sensitivity criterion.
The algorithmic workflow includes calibration-data collection, computation of empirical covariances, gating by DVS, and random-projection-based acceleration for scalability in wide layers; both diagnostics are sketched below.
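A sketch of these diagnostics, assuming a Frobenius-norm normalization for DVS (the paper's exact normalization may differ) and using `scipy.linalg.subspace_angles` for the principal-angle analysis:

```python
import numpy as np
from scipy.linalg import subspace_angles

def dependence_violation_score(X, G):
    """DVS as normalized activation-gradient cross-covariance:
    ||Sxg||_F / sqrt(||Sxx||_F * ||Sgg||_F). An assumed normalization;
    the paper's exact definition may differ."""
    Xc = X - X.mean(axis=0)
    Gc = G - G.mean(axis=0)
    n = X.shape[0]
    sxg = np.linalg.norm(Xc.T @ Gc / n)           # Frobenius norms
    sxx = np.linalg.norm(Xc.T @ Xc / n)
    sgg = np.linalg.norm(Gc.T @ Gc / n)
    return sxg / np.sqrt(sxx * sgg)

def subspace_divergence(U_fasc, U_svd):
    """Largest principal angle (radians) between two column subspaces;
    values near zero mean FASC and SVD select essentially the same space."""
    return subspace_angles(U_fasc, U_svd).max()
```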
6. Limitations, Failure Modes, and Practical Recommendations
Limitations of FASC are rooted in its reliance on local approximations and regularity:
- Taylor-expansion validity: If the statistical or loss landscape is strongly non-Gaussian or parameter covariances vary sharply, first-order FASC can be suboptimal and higher-order expansions may be necessary (Asgari et al., 2014, Alsing et al., 2017).
- Covariance estimation: In high-dimensional inference, inverting an ill-conditioned $N \times N$ data covariance requires a number of mock realizations well in excess of $N$ for stability. FASC mitigates this by compressing to $n \ll N$ summaries, but inherits errors if the covariance is poorly estimated.
- Domain shift and layer selection: In LLMs, early or late layers with noise-like activations may yield misleading DVS values; DVS gating and hybrid pipelines with standard SVD are recommended (Shihab et al., 12 Jan 2026).
Robustness checks should perturb the fiducial point and covariance, and iterative FASC may be deployed to refine around the MAP or MLE.
7. Empirical Performance and Outlook
Empirical evidence substantiates FASC's efficacy:
- In cosmic shear, FASC achieves a reduction in data dimension by an order of magnitude, with negligible loss in parameter constraints (Asgari et al., 2014).
- For LLMs, at matched rank reduction FASC yields 6–8 percentage points higher accuracy than SVD-based compression on knowledge-heavy tasks (MMLU, LAMA), matching the factual recall of uncompressed models nearly twice the size, with minimal added computational overhead (Shihab et al., 12 Jan 2026).
- Cross-architecture robustness is demonstrated, and the DVS metric provides a fundamental diagnostic of knowledge storage and compression utility.
Future research directions include hybridization with quantization and pruning, domain-adaptive calibration, nonlinear extension to address highly non-Gaussian targets, and expert-specific Fisher compression in mixture-of-expert architectures (Shihab et al., 12 Jan 2026).
| Domain/Application | Data Dim. | Compression Factor | Fisher/Performance Loss | Citation |
|---|---|---|---|---|
| Cosmic shear (COSEBIs) | >360 | ~10–50× (to 7–35 summaries) | Negligible | (Asgari et al., 2014) |
| LLM activations (Mistral-7B) | ~4K per layer | ~2× (aggressive rank reduction) | Minimal; 6–8 pp above SVD | (Shihab et al., 12 Jan 2026) |
FASC thus provides a principled framework for subspace selection that maximally retains the information content pertinent to the problem of interest, grounding dimensionality reduction in the geometry of parameter sensitivity across scientific and machine learning applications.