Fisher Information Matrix Scores

Updated 17 November 2025
  • Fisher Information Matrix (FIM) scores are quantitative measures derived from the local curvature of the likelihood, guiding uncertainty assessment in parameter estimation.
  • They come in two forms: observed information (the sample-based Hessian) and expected information (its average over the data distribution), a distinction that affects the coverage properties of the resulting confidence intervals.
  • Under standard regularity conditions, the expected FIM yields smaller mean-squared coverage error for confidence intervals than the observed FIM.

The Fisher Information Matrix (FIM) is a central object in asymptotic theory and interval estimation for parametric models, providing local curvature information about the likelihood and thus quantifying parameter uncertainty. "FIM scores" typically refer to the quantitative entries of the FIM or its inverse and, by extension, to the resulting measures of statistical precision or confidence in inference tasks. Two primary forms of the Fisher information matrix arise in practice: the observed FIM, computed directly from the sample via the negative Hessian of the log-likelihood at the maximum-likelihood estimate (MLE), and the expected FIM, computed as the expectation of the observed information over the data-generating distribution. A long-standing question in theory and practice concerns which form yields more accurate coverage properties for confidence regions or intervals constructed via asymptotic normality of the MLE.

1. Formal Definitions: Observed and Expected FIM

Let $X = (X_1, \dots, X_n)$ be independent random variables with joint likelihood $L(\theta; X) = \prod_{i=1}^n p_i(X_i; \theta)$ and log-likelihood $\ell(\theta; X) = \log L(\theta; X)$. The multivariate parameter is $\theta \in \mathbb{R}^p$.

The two principal forms of the Fisher information matrix are:

  • Observed information at $\theta$:

$I_{\mathrm{obs}}(\theta; X) = -\nabla_\theta^2\, \ell(\theta; X)$

  • Expected information at $\theta$:

$I_{\mathrm{exp}}(\theta) = E_X\!\left[ I_{\mathrm{obs}}(\theta; X) \right]$

At the MLE $\hat{\theta}_n$, one typically uses

$\bar{H}_n(\hat{\theta}_n) = n^{-1} \nabla_\theta^2 \left[ -\ell(\hat{\theta}_n; X) \right] \quad \text{(observed)}$

and its expectation, $\bar{F}_n(\hat{\theta}_n) = n^{-1} I_{\mathrm{exp}}(\hat{\theta}_n)$, for the expected information.

The inverse of either matrix provides the standard asymptotic (plug-in) estimator for the covariance matrix of the MLE.
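
To make the two objects concrete, here is a minimal sketch (not from the paper; it uses the classic Cauchy location family, a standard case in which observed and expected information genuinely differ, whereas for exponential families in canonical form the two coincide). For $X_i \sim \mathrm{Cauchy}(\theta)$, the per-observation expected information is the constant $1/2$, while the observed information depends on the realized sample:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def neg_loglik(theta, x):
    # Cauchy location negative log-likelihood, constants dropped:
    # sum of log(1 + (x_i - theta)^2)
    return np.sum(np.log1p((x - theta) ** 2))

def observed_info(theta, x):
    # I_obs = -d^2/dtheta^2 loglik = sum 2*(1 - u^2)/(1 + u^2)^2, u = x - theta
    u = x - theta
    return np.sum(2.0 * (1.0 - u**2) / (1.0 + u**2) ** 2)

n = 200
theta_true = 1.0
x = theta_true + rng.standard_cauchy(n)

# 1-D MLE, bracketed around the sample median for stability
med = np.median(x)
res = minimize_scalar(neg_loglik, bounds=(med - 5, med + 5), args=(x,), method="bounded")
theta_hat = res.x

I_obs = observed_info(theta_hat, x)  # sample-dependent Hessian (total, not averaged)
I_exp = n / 2.0                      # expected information: n * (1/2) for Cauchy location

print(f"theta_hat = {theta_hat:.4f}")
print(f"observed info = {I_obs:.2f}, expected info = {I_exp:.2f}")
```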

2. Asymptotic Normality and Confidence Region Construction

Classical large-sample theory, under standard regularity conditions (smoothness of likelihood, identifiability, and interchange of differentiation and expectation), yields the following expansion for the score at the MLE:

$0 = \nabla_\theta \ell(\hat{\theta}_n; X) = \nabla_\theta \ell(\theta^*; X) - I_{\mathrm{obs}}(\tilde{\theta}_n; X)\,(\hat{\theta}_n - \theta^*)$

for some $\tilde{\theta}_n$ between $\hat{\theta}_n$ and the true parameter $\theta^*$. Rearrangement and scaling by $\sqrt{n}$ gives

$\sqrt{n}\,(\hat{\theta}_n - \theta^*) = \left[ n^{-1} I_{\mathrm{obs}}(\tilde{\theta}_n; X) \right]^{-1} n^{-1/2}\, \nabla_\theta \ell(\theta^*; X)$

By the central limit theorem and consistency, as $n \to \infty$:

$n^{-1/2}\, \nabla_\theta \ell(\theta^*; X) \xrightarrow{d} N\!\left(0, \bar{F}(\theta^*)\right)$

$n^{-1} I_{\mathrm{obs}}(\tilde{\theta}_n; X) \xrightarrow{p} \bar{F}(\theta^*)$

where $\bar{F}(\theta^*) = \lim_{n \to \infty} n^{-1} I_{\mathrm{exp}}(\theta^*)$. So $\sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{d} N\!\left(0, \bar{F}(\theta^*)^{-1}\right)$.

Approximate confidence intervals for a scalar parameter $\theta_j$ (the $j$th component of $\theta$) take the form:

$\hat{\theta}_{n,j} \pm z_{1-\alpha/2} \sqrt{\hat{\sigma}_j^2 / n}$

where $z_{1-\alpha/2}$ is the standard normal quantile at nominal level $1 - \alpha$. The variance term $\hat{\sigma}_j^2$ is typically substituted by the $(j,j)$ entry of either $\bar{H}_n^{-1}$ or $\bar{F}_n^{-1}$ evaluated at $\hat{\theta}_n$.
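
As a small illustration, the sketch below (a hypothetical helper continuing the Cauchy example; `theta_hat`, `I_obs`, and `I_exp` are assumed from that earlier snippet) builds the same Wald interval from either plug-in. In the scalar case, $\hat{\sigma}_j^2 / n$ equals the reciprocal of the total (full-sample) information:

```python
import numpy as np
from scipy.stats import norm

def wald_ci(theta_hat, total_info, level=0.95):
    """Wald interval from a Fisher-information plug-in.

    `total_info` is the full-sample information (n times the averaged matrix),
    so the asymptotic variance of theta_hat is approximately 1 / total_info.
    """
    z = norm.ppf(0.5 + level / 2.0)
    se = np.sqrt(1.0 / total_info)
    return theta_hat - z * se, theta_hat + z * se

# Continuing the Cauchy sketch above (theta_hat, I_obs, I_exp assumed defined):
lo_o, hi_o = wald_ci(theta_hat, I_obs)  # observed-FIM interval
lo_e, hi_e = wald_ci(theta_hat, I_exp)  # expected-FIM interval
print(f"observed-FIM CI: ({lo_o:.3f}, {hi_o:.3f})")
print(f"expected-FIM CI: ({lo_e:.3f}, {hi_e:.3f})")
```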

3. MSE Criterion for FIM-Based Interval Coverage

Rather than comparing confidence interval lengths, Jiang & Spall focus on actual coverage probability: how closely the constructed interval attains the nominal confidence level, on average. For component $\theta_j$, let $\sigma_j^{*2} = \left[ \bar{F}(\theta^*)^{-1} \right]_{jj}$ denote the true (unknown) asymptotic variance, and use

$CP_j(\hat{\sigma}_j^2) = P\!\left( \theta_j^* \in \hat{\theta}_{n,j} \pm z_{1-\alpha/2}\, \hat{\sigma}_j / \sqrt{n} \;\middle|\; \hat{\sigma}_j^2 \right)$

to denote the realized coverage when $\hat{\sigma}_j^2$ is the plug-in variance estimate (either $[\bar{H}_n(\hat{\theta}_n)^{-1}]_{jj}$ or $[\bar{F}_n(\hat{\theta}_n)^{-1}]_{jj}$).

Define the mean-squared error (MSE) of the coverage error for each estimator:

$\mathrm{MSE}_j(\hat{\sigma}_j^2) = E\left[ \left( CP_j(\hat{\sigma}_j^2) - (1 - \alpha) \right)^2 \right]$

A smaller MSE indicates that the approximate interval more closely (in the mean-squared sense) attains its nominal coverage on average.
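
A rough Monte Carlo check of this criterion on the same hypothetical Cauchy setup is sketched below; for simplicity it uses the squared deviation of empirical coverage from the nominal level as a crude proxy for the paper's realization-wise MSE of coverage:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
theta_true, n, level, reps = 1.0, 50, 0.95, 2000
z = norm.ppf(0.5 + level / 2.0)

def cauchy_mle(x):
    # 1-D MLE of the Cauchy location parameter, bracketed around the median
    med = np.median(x)
    obj = lambda t: np.sum(np.log1p((x - t) ** 2))
    return minimize_scalar(obj, bounds=(med - 5, med + 5), method="bounded").x

hits_obs = hits_exp = 0
for _ in range(reps):
    x = theta_true + rng.standard_cauchy(n)
    th = cauchy_mle(x)
    u = x - th
    i_obs = np.sum(2.0 * (1.0 - u**2) / (1.0 + u**2) ** 2)  # observed information
    i_exp = n / 2.0                                         # expected information
    hits_obs += abs(th - theta_true) <= z / np.sqrt(max(i_obs, 1e-12))
    hits_exp += abs(th - theta_true) <= z / np.sqrt(i_exp)

cov_obs, cov_exp = hits_obs / reps, hits_exp / reps
print(f"coverage: observed-FIM {cov_obs:.3f}, expected-FIM {cov_exp:.3f} (nominal {level})")
print(f"squared coverage error: obs {(cov_obs - level) ** 2:.2e}, exp {(cov_exp - level) ** 2:.2e}")
```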

4. Main Theorem: Superiority of the Expected FIM in Coverage MSE

Theorem (Jiang & Spall):

Under the paper's regularity assumptions (existence and boundedness of derivatives, LLN and CLT for i.n.i.d. data, etc.),

$\mathrm{MSE}_j\!\left( \left[ \bar{F}_n(\hat{\theta}_n)^{-1} \right]_{jj} \right) \;\le\; \mathrm{MSE}_j\!\left( \left[ \bar{H}_n(\hat{\theta}_n)^{-1} \right]_{jj} \right), \qquad j = 1, \dots, p$

If $\bar{H}_n(\hat{\theta}_n)$ and $\bar{F}_n(\hat{\theta}_n)$ differ nontrivially in the limit, this inequality is strict.

Interpretation:

Asymptotically, confidence intervals constructed with the expected information $\bar{F}_n(\hat{\theta}_n)$ never have larger mean-squared coverage error than those constructed with the observed information $\bar{H}_n(\hat{\theta}_n)$, and typically perform strictly better component-wise.

Proof Sketch

  • MLE Error Expansion:

$\hat{\theta}_{n,j} - \theta_j^* = n^{-1/2} \sum_{k=1}^{p} \left[ \bar{F}(\theta^*)^{-1} \right]_{jk} Z_k + o_p\!\left(n^{-1/2}\right)$

where the $Z_k$ are standardized normal scores and the $\left[ \bar{F}(\theta^*)^{-1} \right]_{jk}$ are entries of $\bar{F}(\theta^*)^{-1}$.

  • Inverse Matrix Expansions:

$\bar{H}_n(\hat{\theta}_n)^{-1} = \bar{F}(\theta^*)^{-1} + \Delta_n + o_p\!\left(n^{-1/2}\right)$

$\bar{F}_n(\hat{\theta}_n)^{-1} = \bar{F}(\theta^*)^{-1} + o_p\!\left(n^{-1/2}\right)$

with $\Delta_n = O_p\!\left(n^{-1/2}\right)$, zero mean, and controlled variance.

  • Taylor Expansion of the Coverage Function:

$CP_j(\hat{\sigma}_j^2) = (1 - \alpha) + \kappa_j \left( \hat{\sigma}_j^2 - \sigma_j^{*2} \right) + \text{higher-order terms}$, where $\kappa_j$ is the derivative of the normal coverage probability with respect to the variance plug-in, evaluated at $\sigma_j^{*2}$.

  • Combine Terms to Compare Mean-Squared Errors:

The difference in MSEs is driven by the variance of the fluctuation term $\Delta_n$, which is absent for $\bar{F}_n(\hat{\theta}_n)$ but present for $\bar{H}_n(\hat{\theta}_n)$.
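
The key driver, the zero-mean $O_p(n^{-1/2})$ fluctuation of the normalized observed information around the expected information, is easy to see numerically (same hypothetical Cauchy family; for simplicity the observed information is evaluated at the true $\theta$):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.0
for n in (50, 200, 800):
    fluct = []
    for _ in range(2000):
        u = rng.standard_cauchy(n) - theta
        # normalized observed information at the true theta
        h_bar = np.mean(2.0 * (1.0 - u**2) / (1.0 + u**2) ** 2)
        fluct.append(h_bar - 0.5)  # deviation from expected info (= 1/2 per observation)
    fluct = np.asarray(fluct)
    # mean deviation ~ 0; sd shrinks like 1/sqrt(n), so sqrt(n)*sd is roughly constant
    print(f"n={n:4d}: mean dev {fluct.mean():+.4f}, sd {fluct.std():.4f}, "
          f"sqrt(n)*sd {np.sqrt(n) * fluct.std():.3f}")
```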

5. Practical Implications and Application Guidelines

  • Interval Estimation:

For constructing confidence intervals or regions based on the asymptotic normality of the MLE, the user must choose whether to "score" parameter uncertainty with the observed Hessian or with the expected Fisher information.

  • Empirical Recommendation:

Jiang & Spall's result establishes that, under regularity, using the expected FIM never worsens and generally improves the MSE of coverage. Whenever a closed-form or accurate numerical estimate of $I_{\mathrm{exp}}(\hat{\theta}_n)$ is available, its use in the construction of confidence intervals or regions is strongly justified.

  • Exceptions and Variations:

In scalar cases with available ancillary statistics (Efron & Hinkley), observed information conditional on those statistics can sometimes yield even better coverage. However, for multivariate and i.n.i.d. settings, the expected FIM is at least as good as, and often strictly better than, the observed FIM for mean-squared coverage error.

  • Moderate and Finite Sample Sizes:

While the superiority of $\bar{F}_n(\hat{\theta}_n)$ manifests asymptotically as $n$ increases, the observed FIM may fluctuate more in small samples due to data-induced noise. When $I_{\mathrm{exp}}(\hat{\theta}_n)$ can be computed efficiently, it is typically preferred in practice for "scoring" uncertainty.

6. Broader Context and Impact

The findings of Jiang & Spall directly challenge a persistent heuristic preference for the observed FIM in finite-sample estimation without conditional ancillarity. The result is robust across multivariate, non-i.i.d. scenarios and is not limited to specific model classes. In application domains such as precision forecasting, uncertainty quantification, and experimental design, adopting the expected FIM for uncertainty "scoring" in MLE-based inference yields intervals whose empirical coverage is at least as accurate as, and often better calibrated than, that of observed-FIM-based procedures.

Concisely: in all regular settings where the plug-in asymptotic normal confidence region is used to quantify parameter uncertainty, the expected FIM should be used in place of the observed FIM. Inverting $\bar{F}_n(\hat{\theta}_n)$ at the MLE $\hat{\theta}_n$ yields, per parameter, intervals whose realized coverage is no worse, and generally better, in the mean-squared sense than that of their observed-FIM-based counterparts. This conclusion now stands on rigorous asymptotic and componentwise grounds (Jiang, 2021).
