
Online Monotone Density Estimation

Updated 10 February 2026
  • Online monotone density estimation is a framework for sequentially predicting a nonincreasing probability density on $[0,1]$ under the log-loss.
  • The online Grenander estimator adapts the classical isotonic MLE to the sequential setting, achieving $O(n^{1/3})$ cumulative excess KL risk under well-specified i.i.d. conditions.
  • The expert aggregation approach discretizes the density space and applies exponential weighting to secure an $O(\sqrt{n \log n})$ adversarial regret bound while adapting quickly to change-points.

Online monotone density estimation addresses the sequential prediction of an unknown probability density that is known a priori to be monotone nonincreasing on $[0,1]$. At each time $t$, the estimator observes a real-valued data stream $x_1, x_2, \ldots$ and outputs a measurable function $\hat{f}_t$ of the past data, so that for every $t$, $\hat{f}_t$ estimates the underlying density $q \in \mathcal{D}$, where

$$\mathcal{D} = \left\{ f : [0,1] \to [0, \infty) \,\Big|\, \int_0^1 f(u)\,du = 1,\ f\ \text{nonincreasing} \right\}.$$

Performance is measured by the sequential log-loss $\ell_t(f) = -\log f(X_t)$ and its cumulative version $L(\hat{f}, n) = \sum_{t=1}^n \ell_t(\hat{f}_t) = -\sum_{t=1}^n \log \hat{f}_t(X_t)$. The theoretical framework considers two benchmarks: the well-specified stochastic setting (oracle risk under i.i.d. draws from $q$) and an adversarial setting (pathwise regret against the best monotone density in hindsight, subject to amplitude bounds).
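As a concrete illustration (a minimal sketch, not taken from the paper; the function name and example densities are ours), the cumulative log-loss of a predictable sequence of density estimates is just a sum of negative log-densities:

```python
import math

def cumulative_log_loss(estimates, xs):
    """L(f_hat, n) = -sum_t log f_hat_t(x_t).

    estimates[t] is the density fitted on xs[:t] only
    (predictable: it never sees xs[t]); xs are the observations.
    """
    return -sum(math.log(f(x)) for f, x in zip(estimates, xs))

# Example: the constant initial guess f_1 = 1 incurs zero log-loss;
# a decreasing density is rewarded on small x, penalized on large x.
uniform = lambda x: 1.0
steep = lambda x: 2.0 - 2.0 * x  # triangular monotone density on [0, 1]
xs = [0.1, 0.4, 0.7]
loss_uniform = cumulative_log_loss([uniform] * 3, xs)
loss_steep = cumulative_log_loss([steep] * 3, xs)
```

Lower cumulative log-loss means the predicted densities put more mass where the observations actually fell.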

1. Formal Problem Statement

The data consist of sequentially observed values $X_1, X_2, \ldots$ assumed to lie in $[0,1]$ (after linear rescaling). At each $t$, the algorithm outputs an estimator $\hat{f}_t$ based only on $X_1, \ldots, X_{t-1}$. The performance of a sequence of estimators $\{\hat{f}_t\}$ is evaluated in two principal regimes:

  • Stochastic benchmark: $X_1, \ldots, X_n$ are i.i.d. from some $q \in \mathcal{D}$. The cumulative excess Kullback-Leibler risk is

$$\mathrm{Risk}_Q(\hat{f}; n) = \mathbb{E}_Q\left[L(\hat{f}, n)\right] - \mathbb{E}_Q\left[L(q, n)\right] = \sum_{t=1}^n \mathbb{E}_Q\left[\mathrm{KL}(q \| \hat{f}_t)\right].$$

  • Adversarial benchmark (pathwise regret): For arbitrary (well-spaced) sequences, regret is measured against the best monotone density in hindsight within amplitude bounds $a \leq f \leq b$:

$$\mathrm{Regret}(\hat{f}; n, a, b) = L(\hat{f}, n) - \min_{f \in \mathcal{D}_{a,b}} L(f, n),$$

where $\mathcal{D}_{a,b} = \{f \in \mathcal{D} : a \leq f \leq b\}$.

The adversarial regime requires minimal regularity on the input: points are well-spaced (no two closer than $n^{-\beta}$) and bounded away from the upper boundary (all $\leq 1 - n^{-\gamma}$).

2. The Online Grenander (OG) Estimator

The classical Grenander estimator is the offline (batch) maximum likelihood estimator (MLE) over $\mathcal{D}$: a piecewise-constant histogram with breakpoints at the order statistics. The online analogue (OG) operates as follows at each $t$:

  • Input: amplitude bounds $0 \leq a \leq b \leq \infty$; initialize $\hat{f}_1 \equiv 1$.
  • For $t = 2, 3, \ldots$: solve

$$\hat{f}_t \leftarrow \underset{f \in \mathcal{D}_{a,b}}{\mathrm{argmax}} \sum_{i=1}^{t-1} \log f(X_i).$$

Theorem 2.1 (Excess KL risk of OG).

If $X_1, \ldots, X_n$ are i.i.d. $\sim Q$ with $q \in \mathcal{D}_{a,b}$, then

$$\mathrm{Risk}_Q(\hat{f}^{OG}_{a,b}; n) \leq \Gamma_{OG}(a, b)\, n^{1/3}.$$

The derivation utilizes classical entropy arguments: the (offline) Grenander estimator achieves $\mathbb{E}\left[\mathrm{KL}(q \| \tilde{f}^{MLE}_{t-1, a, b})\right] = O((t-1)^{-2/3})$; summing over $t$ gives an aggregate $O(n^{1/3})$ cumulative risk.
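The batch MLE that OG re-solves at each round admits the classical least-concave-majorant construction: the estimator is the left derivative of the least concave majorant (LCM) of the empirical CDF. A minimal self-contained sketch (ignoring the amplitude clipping to $[a,b]$, which the online version applies on top) is:

```python
def grenander(xs):
    """Batch Grenander MLE over nonincreasing densities on [0, 1].

    xs: sorted observations in (0, 1]. Returns (breaks, heights):
    the estimated density equals heights[i] on [breaks[i], breaks[i+1]),
    i.e. the left slope of the least concave majorant of the ECDF.
    Amplitude clipping to [a, b] is omitted in this sketch.
    """
    n = len(xs)
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    if pts[-1][0] < 1.0:
        pts.append((1.0, 1.0))  # estimator is 0 beyond the largest point
    hull = [pts[0]]
    for p in pts[1:]:
        # pop while slopes fail to strictly decrease, so the kept
        # vertices form the upper (concave) hull of the ECDF points
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p[0] - x2) <= (p[1] - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    breaks = [x for x, _ in hull]
    heights = [(hull[i + 1][1] - hull[i][1]) / (hull[i + 1][0] - hull[i][0])
               for i in range(len(hull) - 1)]
    return breaks, heights
```

The resulting heights are automatically nonincreasing and integrate to 1, so the output lies in $\mathcal{D}$; OG would re-run this (with clipping) on $X_1, \ldots, X_{t-1}$ at every round $t$.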

3. The Expert Aggregation (EA) Estimator

To bypass the computational burden of solving an isotonic MLE at each step, the Expert Aggregation (EA) estimator discretizes $\mathcal{D}_{a,b}$ into a finite net of monotone histograms and employs exponential weighting:

  • Expert set construction:
    • Grid breakpoints $B \approx \{0, \Delta, 2\Delta, \ldots, 1\}$ with $\Delta = n^{-(\beta+1)}$.
    • Log-height grid $\Lambda = \{\log a + j V \Delta : j = 0, 1, \ldots\}$ with $V = \log(b/a)$.
    • $\mathcal{E}_{k, n, \beta}$: all monotone histograms with at most $k$ bins, breakpoints from $B$, heights from $\exp(\Lambda)$, normalized to integrate to 1 and lying in $[a, b]$.
    • Cardinality: $|\mathcal{E}_{k, n, \beta}| \lesssim (n^{\beta+1})^{2k}$.
  • Exponential weighting over the $m = |\mathcal{E}_{k, n, \beta}|$ experts $\{g_j\}$: $w_1(j) = 1/m$, and for $t \geq 1$

$$w_{t+1}(j) \propto w_t(j)\, g_j(X_t), \qquad \hat{f}^{EA}_{t+1}(x) = \sum_{j=1}^m w_{t+1}(j)\, g_j(x).$$

  • Mixability lemma: for the log-loss, $\sum_{t=1}^n \log \hat{f}^{EA}_t(X_t) \geq \max_{j=1,\ldots,m} \sum_{t=1}^n \log g_j(X_t) - \log m$.
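The weighting scheme above can be sketched directly. This is a toy with two hand-made monotone experts standing in for the full net $\mathcal{E}_{k,n,\beta}$; `ea_forecast` and both example densities are illustrative names, not from the paper:

```python
import math

def ea_forecast(experts, xs):
    """Exponentially weighted mixture forecaster under log-loss.

    experts: list of density functions g_j on [0, 1];
    xs: observation sequence. Returns the cumulative log-loss of the
    mixture predictions f_t(x) = sum_j w_t(j) g_j(x).
    """
    m = len(experts)
    w = [1.0 / m] * m              # uniform prior weights w_1(j) = 1/m
    loss = 0.0
    for x in xs:
        g = [gj(x) for gj in experts]
        fx = sum(wj * gx for wj, gx in zip(w, g))
        loss += -math.log(fx)      # mixture's log-loss this round
        w = [wj * gx for wj, gx in zip(w, g)]
        z = sum(w)
        w = [wj / z for wj in w]   # multiplicative update, renormalized
    return loss

experts = [lambda x: 1.0, lambda x: 2.0 - 2.0 * x]  # two monotone densities
xs = [0.05, 0.1, 0.2, 0.15]
mix_loss = ea_forecast(experts, xs)
best = min(-sum(math.log(g(x)) for x in xs) for g in experts)
```

By the mixability lemma, `mix_loss` can exceed the best expert's cumulative log-loss by at most $\log m$, regardless of the data sequence.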

Theorem 2.2 (Pathwise regret of EA).

Under mild regularity (data in $S_n(\beta, \gamma)$), choosing $k \asymp \sqrt{n / \log n}$ for $\mathcal{E}_{k, n, \beta}$ yields

$$\mathrm{Regret}(\hat{f}^{EA}_{a,b}; n, a, b) \leq \Gamma(a, b, \beta) \sqrt{n \log n}.$$

The proof combines histogram compression (bin merging for a $k$-bin approximation, error $O(nV/k)$), breakpoint and height rounding (error $O(nV\Delta + kV)$), the cardinality bound $\log |\mathcal{E}| \lesssim k \log n$, and exponential-weights log-loss mixability. Selecting $k \asymp \sqrt{n / \log n}$ balances the approximation and aggregation terms and produces the displayed regret rate.

4. Log-Optimal Sequential $p$-to-$e$ Calibration

An application of online monotone density estimation appears in sequential hypothesis testing: calibrating $p$-values to $e$-values. Under the null hypothesis, the $p$-values $P_t$ are sequentially super-uniform: $\Pr(P_t \leq u \mid \text{past}) \leq u$. A $p$-to-$e$ calibrator is a nonincreasing function $h : [0,1] \to [0, \infty)$ with $\int_0^1 h(p)\,dp \leq 1$, so that $h(P_t)$ is a valid $e$-value. Admissible calibrators correspond exactly to monotone densities on $[0,1]$.

Under an alternative $Q$ with density $q \in \mathcal{D}$, the log-optimal calibrator is the true density: $h^{opt}(\cdot) = q(\cdot)$. Estimating the log-optimal $p$-to-$e$ calibrator therefore reduces to online monotone density estimation of $q$.

An empirical adaptive procedure: at each $t$, set $\hat{h}_t = \hat{f}_t$ (OG or EA) fitted to the observed $P_1, \ldots, P_{t-1}$, and define the $e$-process $M_t = \prod_{s=1}^t \hat{h}_s(P_s)$.
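The running product is most stably tracked on the log scale. A minimal sketch follows; for self-containedness it uses a classical fixed calibrator $h(p) = \kappa p^{\kappa-1}$ (a nonincreasing density on $(0,1]$ for $\kappa \in (0,1)$) as a stand-in for the paper's data-driven $\hat{h}_t$, and the names `e_process` and `h` are ours:

```python
import math

def e_process(p_values, calibrator):
    """Log-scale trajectory of the e-process M_t = prod_{s<=t} h(P_s).

    calibrator: a nonincreasing density h on (0, 1]; here a fixed
    choice stands in for the predictable, data-driven \hat h_t.
    Returns [log M_1, ..., log M_n].
    """
    log_m, traj = 0.0, []
    for p in p_values:
        log_m += math.log(calibrator(p))  # one multiplicative e-value step
        traj.append(log_m)
    return traj

kappa = 0.5
h = lambda p: kappa * p ** (kappa - 1.0)  # integrates to 1 on [0, 1]
traj = e_process([0.01, 0.04, 0.5], h)
```

Small $p$-values push $\log M_t$ up (evidence against $H_0$), and a level-$\alpha$ sequential test rejects once $M_t \geq 1/\alpha$, by Ville's inequality.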

Theorem 3.1 (Asymptotic log-optimality).

If $P_1, \ldots, P_n$ are i.i.d. $\sim Q$ with $q \in \mathcal{D}_{a,b}$, then for either the OG- or EA-based calibrator,

$$\frac{1}{n}\left[\log M_n(\hat{h}) - \log M_n(h^{opt})\right] \to 0 \quad \text{almost surely}.$$

Corollary (Consistency of the $e$-process).

If $Q \neq \mathrm{Uniform}$ and $\int q \log q > 0$, then $\log M_n(\hat{h})$ grows linearly at rate $\int q \log q > 0$, implying that the sequential test rejects $H_0$ in finite time with probability 1.

5. Empirical Investigation

Numerical assessments are performed in both well-specified and mis-specified regimes:

  • Stochastic setting ($n = 1000$, $B = 50$ replicates):
    • Densities: linear ($q(u) = 5/4 - u/2$), quadratic ($q(u) = 3(1-u)^2$), and a 4-bin piecewise-constant density.
    • Both OG and EA closely track the oracle log-likelihood $-L(q, t)$; EA demonstrates smaller finite-sample regret, particularly when $q$ is milder (e.g., linear).
  • Change-point (mis-specified) setting:
    • Data follow one linear monotone form for $t \leq 200$, then switch to another.
    • EA quickly adapts to the new regime via weight reallocation among experts—incurring substantially lower post-change regret—whereas OG, which relies on the full data history, adapts more slowly.

These experiments corroborate the theoretical $O(n^{1/3})$ excess-risk bound for OG and the $O(\sqrt{n \log n})$ pathwise regret bound for EA. A plausible implication is that expert aggregation offers practical advantages in nonstationary environments.

6. Summary Table: Algorithmic and Statistical Properties

| Estimator | Stochastic (i.i.d.) excess risk | Adversarial regret bound |
| --- | --- | --- |
| Online Grenander | $O(n^{1/3})$ (Theorem 2.1) | None established; recomputes the MLE each round |
| Expert Aggregation | $O(n^{1/3})$ (via expert approximation) | $O(\sqrt{n \log n})$ (Theorem 2.2) |

Both algorithms support online monotone density estimation, but EA provides tighter adversarial guarantees and superior adaptivity in nonstationary or change-point settings.

7. Context and Significance

Online monotone density estimation extends classical nonparametric MLEs to the sequential, potentially adversarial setting, providing both KL-risk guarantees and worst-case regret bounds. Its connection to sequential $p$-to-$e$ calibration highlights the foundational role of monotonicity constraints in adaptive hypothesis testing. The link to exponential weighting and mixture-of-experts methods places the problem at the intersection of shape-constrained inference, online learning, and sequential testing theory, synthesizing classical statistical theory (Grenander estimator, entropy arguments) with modern online learning methods (expert aggregation, mixability) (Hore et al., 9 Feb 2026).
