
Maximum Likelihood Estimation (MLE)

Updated 15 January 2026
  • MLE is a statistical method for parameter estimation that maximizes the likelihood function to fit probabilistic models.
  • MLE offers rigorous asymptotic properties and connections to information theory, ensuring estimator consistency and efficiency.
  • Computational methods like gradient-based optimization and EM algorithms facilitate MLE in complex models including high-dimensional and latent-variable scenarios.

Maximum Likelihood Estimation (MLE) is a fundamental statistical framework for parameter estimation in probabilistic models. It underpins a substantial portion of modern inference methodology, statistical computation, and experimental physics. The approach treats the model density, evaluated at the observed data, as a function of the parameters (the likelihood function) and selects as estimator the parameter value(s) that maximize it. MLE has rigorous connections to information theory, convex analysis, large-sample theory, and computational optimization.

1. Mathematical Foundations and General Framework

Given observed data $y$ and a probabilistic model parameterized by $\theta \in \Theta$, where $P(Y = y \mid \theta) = f(y;\theta)$, the likelihood function is defined as $L(\theta) = f(y;\theta)$. The log-likelihood, $\ell(\theta) = \ln f(y;\theta)$, is often more tractable analytically and computationally, especially for exponential family models (Vella, 2018).

For scalar $\theta \in \mathbb{R}$, the MLE $\hat\theta$ solves

$$\frac{d}{d\theta}\ell(\theta) = 0\,, \qquad \frac{d^2}{d\theta^2}\ell(\theta) < 0 \quad \text{at } \theta = \hat\theta.$$

The score function, $U(\theta) = \frac{\partial}{\partial\theta}\ell(\theta)$, forms the basis of the likelihood equations; the solution of $U(\theta) = 0$ yields the MLE. In the multivariate setting, the score vector and Hessian generalize to

$$U(\theta) = \nabla_\theta \ell(\theta)\,, \qquad H(\theta) = \frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta^{T}}.$$

A maximum is achieved if $H(\hat\theta)$ is negative definite.
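As a minimal illustration of these conditions, the sketch below (not from any of the cited papers) fits an exponential-distribution rate parameter by numerically minimizing the negative log-likelihood and checks the result against the closed-form stationary point of the score equation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)  # true rate = 0.5

# Negative log-likelihood for Exponential(rate λ): ℓ(λ) = n ln λ - λ Σ y_i
def neg_loglik(lam):
    return -(len(y) * np.log(lam) - lam * y.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")
lam_hat = res.x

# Setting the score U(λ) = n/λ - Σ y_i to zero gives λ̂ = 1/ȳ; the second
# derivative -n/λ² is negative, so this stationary point is a maximum.
assert abs(lam_hat - 1.0 / y.mean()) < 1e-3
```

Here the numerical optimizer and the analytic score-equation solution agree, which is the generic situation for smooth, unimodal likelihoods.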

2. Existence, Uniqueness, and Geometric Characterization

MLE existence and uniqueness depend on model structure, sample configuration, and geometric constraints. For exponential families, particularly log-linear models and Gaussian graphical models, the sufficient statistic $t$ must lie in the relative interior of a convex set (the marginal cone $C_A$ or the cone of sufficient statistics $C_G$). Precisely, $t \in \operatorname{ri} C_A$ is necessary and sufficient for existence; otherwise, the likelihood is maximized on the boundary, estimability is restricted, and extended MLE constructs emerge (Fienberg et al., 2011, Uhler, 2010).
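The simplest instance of this boundary phenomenon is a Bernoulli sample in which every observation is a success: the sufficient statistic sits on the boundary of its convex support and no finite MLE of the natural parameter exists. A small sketch (illustrative, not from the cited papers):

```python
import numpy as np

# For a Bernoulli sample, the sufficient statistic is t = Σ y_i.  The MLE of
# the natural parameter η = logit(p) exists iff t lies in the interior of
# {0, ..., n}; if every observation is 1 (t = n, a boundary point), the
# likelihood increases monotonically in η and the supremum is not attained.
y = np.ones(10)           # all successes: t = n, on the boundary
t, n = y.sum(), len(y)

def neg_loglik(eta):      # -ℓ(η) = -(t η - n ln(1 + e^η))
    return -(t * eta - n * np.log1p(np.exp(eta)))

# The negative log-likelihood keeps decreasing as η grows: no finite maximizer.
assert neg_loglik(5.0) > neg_loglik(10.0) > neg_loglik(15.0)
```

In the extended-MLE formalism this boundary case is handled by allowing the estimate $\hat p = 1$ (i.e., $\eta \to \infty$) on the closure of the parameter space.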

For log-concave MTP$_2$ and LLC density classes, MLE existence is ensured with high probability when $n \geq 3$ (MTP$_2$) or $n \geq 2$ (LLC), regardless of ambient dimension, due to boundedness imposed by shape constraints and the closure properties of convex hulls or L$^\#$-convex hulls (Robeva et al., 2018). In discrete algebraic models, solvability can be attacked via dual varieties and conormal varieties, where the MLE computation is recast as an intersection problem in projective geometry, often decreasing algebraic complexity relative to the primal system (Rodriguez, 2014).

3. Statistical Efficiency, Fisher Information, and Asymptotics

The Fisher information matrix,

$$I(\theta) = \mathbb{E}\bigl[\nabla_\theta \ell(\theta)\,\nabla_\theta \ell(\theta)^\top\bigr] = -\mathbb{E}\bigl[H(\theta)\bigr],$$

encapsulates local curvature and determines estimator variance. The Cramér–Rao bound asserts that for any unbiased estimator $\tilde\theta$,

$$\operatorname{Cov}(\tilde\theta) \succeq I(\theta)^{-1}.$$
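The bound is easy to check by simulation in the Bernoulli model, where $I(p) = 1/(p(1-p))$ per observation and the MLE is the sample mean. A minimal sketch (parameter values are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 500, 2000

# Fisher information of a single Bernoulli(p) observation: I(p) = 1/(p(1-p))
I = 1.0 / (p * (1 - p))

# The MLE for each replicate is the sample mean; its variance should approach
# the Cramér–Rao bound 1/(n I(p)) = p(1-p)/n, since the MLE here is unbiased.
p_hats = rng.binomial(n, p, size=reps) / n
empirical_var = p_hats.var()
crb = 1.0 / (n * I)
assert abs(empirical_var - crb) / crb < 0.15
```

The sample mean attains the bound exactly in this model, which is the hallmark of an efficient estimator.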

Under regularity conditions, the MLE is asymptotically normal and achieves this lower bound. In continuous-time Wishart processes, the asymptotic distribution of the MLE depends on ergodicity: for symmetric $b$ with $-b \succ 0$ and $\alpha > d+1$, $\sqrt{T}\,(\widehat b_T - b,\, \widehat\alpha_T - \alpha)$ converges in distribution to a joint Gaussian law, establishing efficiency via local asymptotic normality (LAN) (Alfonsi et al., 2015). Nonergodic regimes yield altered convergence rates and, in limiting cases, non-Gaussian limits.

Generalized closed-form MLE produces estimators $\hat\theta_n$ that, under reasonable conditions, are consistent and asymptotically normal, with limiting covariance governed by the sensitivity matrix $J(\theta_0)$ and score variance $K(\theta_0)$:

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to N\bigl(0,\; J(\theta_0)^{-1} K(\theta_0)\, [J(\theta_0)^{-1}]^\top\bigr),$$

recovering classical Fisher information results when the model is an ordinary exponential family (Ramos et al., 2021).
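To make the collapse of the sandwich covariance to the inverse Fisher information concrete, the sketch below (an illustration under a correctly specified exponential model, not code from the cited paper) estimates $K(\theta_0)$ from simulated scores and compares $J^{-1} K J^{-\top}$ with $I(\theta_0)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
lam0, n = 0.5, 200_000
y = rng.exponential(scale=1.0 / lam0, size=n)

# Per-observation log-likelihood for Exponential(rate λ):
# ℓ_i(λ) = ln λ - λ y_i ,  score U_i = 1/λ - y_i ,  Hessian H_i = -1/λ².
U = 1.0 / lam0 - y
K = U.var()               # score variance K(θ0), estimated by simulation
J = 1.0 / lam0**2         # sensitivity J(θ0) = -E[H_i], exact here

# For a correctly specified model K = J = I(θ0), so the sandwich
# J⁻¹ K J⁻ᵀ collapses to the inverse Fisher information 1/J.
sandwich = K / J**2
assert abs(sandwich - 1.0 / J) / (1.0 / J) < 0.05
```

Under misspecification $K \neq J$ in general, and the sandwich form is the one that remains valid.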

4. Computational Methods and Algorithmic Innovations

Many likelihood functions do not yield closed-form MLEs, especially in high dimensions or with latent-variable structure. Classical gradient- and Hessian-based maximization is standard, but tailored methods address additional structure:

  • The Frank–Wolfe (conditional gradient) algorithm efficiently computes nonparametric MLEs under MTP$_2$/LLC constraints via tent-function parameterizations, solving a finite-dimensional convex program (Robeva et al., 2018).
  • The Equilibrium Expectation (EE) algorithm for exponential random graph models (ERGMs) utilizes properties of Markov chains at equilibrium and on-the-fly parameter adaptation, avoiding repeated full MCMC sampling. This significantly accelerates MLE computation for very large network data compared to MCMC-based moment approximation (Byshkin et al., 2018).
  • For models with latent variables, inequalities involving truncated likelihood-ratio integrals with respect to varying posterior measures allow monotonic likelihood improvement without requiring closed-form E/M-steps, providing a more general monotonicity guarantee than standard EM (Olsen, 2019).
  • In log-linear and contingency table models with incomplete (censored) data, EM algorithms allocate latent or censored counts proportionally to the current parameter estimate, updating iteratively until the likelihood equations are satisfied (Markov, 2011).

5. Special Models and Application Domains

MLE is central in a wide variety of contexts. In optical measurements, maximizing the likelihood over multinomial photon count statistics on a pixel array directly translates physical photon distributions into parameter estimates, with the Fisher information matrix quantifying sensitivity and guiding experimental design; extreme parameter sensitivity is achieved in quantum weak measurements and off-null ellipsometry via iterative re-centering combined with MLE (Vella, 2018).

In high-dimensional structured distributions, such as those under total positivity (MTP$_2$) and log-concavity, MLE is feasible and well-posed with very limited samples when shape constraints are leveraged (Robeva et al., 2018). In Gaussian graphical models, algebraic elimination techniques yield exact sample-size thresholds for generic existence of the MLE, with the ML degree measuring computational algebraic complexity in sparse regimes (Uhler, 2010). In dose-response models such as Emax, analytic characterization of MLE existence, Firth-type bias corrections, and design augmentation protocols address scenarios where standard maximum-likelihood theory breaks down (Aletti et al., 2023).

For stochastic processes, such as matrix-valued Wishart processes in finance, MLE theory delivers rate-optimal, asymptotically normal estimators even in complex time-varying and nonergodic settings, with extended Laplace transform results providing new computational tools (Alfonsi et al., 2015). Markov-modulated jump-diffusion models, increasingly important in financial econometrics, employ blockwise EM algorithms combining mixture, stochastic, and threshold-based steps for regime, jump, and diffusion parameter estimation from incomplete data (Eslava et al., 2022).

6. Extensions, Generalizations, and Open Directions

Generalizations of MLE encompass generalized closed-form approaches, permitting real-time estimation in nonstandard models with low computational burden, and procedures for models with interval-censored or incomplete data via observed-range or blockwise EM algorithms (Ramos et al., 2021, Markov, 2011). Algebraic-geometric reformulations via dual varieties provide efficient computational strategies and deeper understanding of MLE solution landscapes (Rodriguez, 2014).

Several open questions persist:

  • Precise combinatorial descriptions of sample-size thresholds for MLE existence in graphical models.
  • Full characterization of the complexity (ML degree) and algorithmic techniques for sparse or constrained discrete or continuous models.
  • Design of general proposal schemes for monotonic likelihood-improving updates in latent-variable models that surpass standard EM (Olsen, 2019).
  • Further understanding of estimator bias and regularity for small-sample, high-dimensional, and boundary cases, especially under complex experimental designs (Aletti et al., 2023).

MLE thus remains a central paradigm, continuously extended and refined to accommodate high-dimensionality, model structure, computational constraints, and domain-specific measurement processes across statistical science (Vella, 2018, Robeva et al., 2018, Alfonsi et al., 2015, Byshkin et al., 2018, Fienberg et al., 2011, Uhler, 2010, Ramos et al., 2021, Aletti et al., 2023, Olsen, 2019, Eslava et al., 2022, Rodriguez, 2014, Markov, 2011).
