A complete characterization of testable hypotheses

Published 8 Jan 2026 in math.ST, cs.IT, and math.PR | (2601.05217v1)

Abstract: We revisit a fundamental question in hypothesis testing: given two sets of probability measures $\mathcal{P}$ and $\mathcal{Q}$, when does a nontrivial (i.e., strictly unbiased) test for $\mathcal{P}$ against $\mathcal{Q}$ exist? Le Cam showed that, when $\mathcal{P}$ and $\mathcal{Q}$ have a common dominating measure, a test that has power exceeding its level by more than $\varepsilon$ exists if and only if the convex hulls of $\mathcal{P}$ and $\mathcal{Q}$ are separated in total variation distance by more than $\varepsilon$. The requirement of a dominating measure is frequently violated in nonparametric statistics. In a passing remark, Le Cam described an approach to address more general scenarios, but he stopped short of stating a formal theorem. This work completes Le Cam's program, by presenting a matching necessary and sufficient condition for testability: for the aforementioned theorem to hold without assumptions, one must take the closures of the convex hulls of $\mathcal{P}$ and $\mathcal{Q}$ in the space of bounded finitely additive measures. We provide simple elucidating examples, and elaborate on various subtle measure theoretic and topological points regarding compactness and achievability.


Summary

The paper introduces a necessary and sufficient condition for testability by leveraging weak-* closures in the space of bounded finitely additive measures.
It extends classical minimax results by addressing limitations in scenarios where the dominating measure assumption fails in nonparametric contexts.
The findings emphasize the crucial role of finitely additive measures in accurately characterizing minimax risk and enhancing robust hypothesis testing.

Complete Characterization of Testable Hypotheses: A Formal Essay

Introduction and Problem Formulation

The paper "A complete characterization of testable hypotheses" (2601.05217) addresses a central foundational question in hypothesis testing: given two collections $\mathcal{P}$ and $\mathcal{Q}$ of probability measures on a measurable space, under what conditions does there exist a nontrivial (i.e., strictly unbiased) test that can distinguish between these hypotheses? The work provides a definitive answer, extending and correcting the classical results that depend on restrictive domination and closure assumptions, and resolves several open measure-theoretic subtleties regarding the existence and achievability of hypothesis tests, especially in nonparametric or undominated contexts that arise in modern statistics.

Let $\mathcal{P}$ and $\mathcal{Q}$ denote arbitrary nonempty sets of countably additive probability measures on $(\Omega, \mathcal{F})$. A test is a measurable function $\phi \colon \Omega \to [0,1]$; its worst-case type-I error is $\sup_{\mu \in \mathcal{P}} \mathbb{E}_\mu[\phi]$ and its worst-case power is $\inf_{\nu \in \mathcal{Q}} \mathbb{E}_\nu[\phi]$. A test is nontrivial if its worst-case power strictly exceeds its worst-case type-I error.
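The definitions above can be made concrete on a finite sample space. The following is a minimal sketch, not taken from the paper: it assumes $\Omega = \{0, 1, 2\}$, represents each measure as a list of point probabilities and a test as a list of values in $[0,1]$, and the helper names (`expectation`, `worst_case_type1`, `worst_case_power`, `gap`) are hypothetical.

```python
# Illustrative sketch on a finite sample space {0, 1, 2}; all names are hypothetical.

def expectation(dist, test):
    """E_dist[test] for a distribution and a test given as lists over the sample space."""
    return sum(p * t for p, t in zip(dist, test))

def worst_case_type1(test, nulls):
    """sup over the null family of E_mu[test]."""
    return max(expectation(mu, test) for mu in nulls)

def worst_case_power(test, alts):
    """inf over the alternative family of E_nu[test]."""
    return min(expectation(nu, test) for nu in alts)

def gap(test, nulls, alts):
    """Worst-case power minus worst-case type-I error; positive means nontrivial."""
    return worst_case_power(test, alts) - worst_case_type1(test, nulls)

# Assumed toy families: two null distributions, one alternative.
nulls = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
alts = [[0.0, 0.5, 0.5]]

reject_on_1_or_2 = [0.0, 1.0, 1.0]   # reject unless outcome 0 is observed
constant_test = [0.3, 0.3, 0.3]      # ignores the data entirely

print(gap(reject_on_1_or_2, nulls, alts))
print(gap(constant_test, nulls, alts))
```

In this toy setup the first test is nontrivial, while any constant test has zero gap, since its expectation is the same under every probability measure.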

Classical Results and Limitations

The classical minimax theorem, stemming from work by Le Cam and Kraft, asserts:

If $\mathcal{P}$ and $\mathcal{Q}$ are dominated by a common measure $\gamma$, a nontrivial test exists if and only if the closures of their convex hulls (in total variation (TV) distance) are separated by a nonzero gap.

Formally,

$\exists\,\phi:\inf_{\nu \in \mathcal{Q}} \mathbb{E}_\nu[\phi] > \sup_{\mu \in \mathcal{P}} \mathbb{E}_\mu[\phi] + \varepsilon \iff d_\text{TV}(\operatorname{cl}_\text{TV} \operatorname{co} \mathcal{P}, \operatorname{cl}_\text{TV} \operatorname{co} \mathcal{Q}) > \varepsilon$
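As a numerical sanity check of this equivalence, the sketch below compares the best achievable gap with the TV distance between convex hulls on a three-point sample space. It is an illustration under stated assumptions, not the paper's construction: the families are made up, both optimizations are done by coarse grid search, and the convention $d_\text{TV}(\mu,\nu) = \tfrac12 \sum_i |\mu_i - \nu_i|$ is assumed.

```python
from itertools import product

def tv(p, q):
    """Total variation distance between two distributions on a finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mix(p1, p2, lam):
    """Convex combination lam * p1 + (1 - lam) * p2."""
    return [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]

# Assumed toy families: null = {p1, p2}, alternative = {q}.
p1, p2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
q = [0.25, 0.25, 0.5]

# TV distance between the individual null members and q (ignores the convex hull).
vertex_tv = min(tv(p1, q), tv(p2, q))

# TV distance from q to the convex hull of {p1, p2}, by grid search over the weight.
hull_tv = min(tv(mix(p1, p2, k / 100), q) for k in range(101))

def gap(phi):
    """Power under q minus the worst type-I error over {p1, p2}."""
    e = lambda dist: sum(a * t for a, t in zip(dist, phi))
    return e(q) - max(e(p1), e(p2))

# Best gap over a grid of randomized tests phi in [0, 1]^3.
grid = [k / 10 for k in range(11)]
best_gap = max(gap(phi) for phi in product(grid, repeat=3))

print(vertex_tv, hull_tv, best_gap)
```

Here the best gap agrees with the hull distance rather than with the larger distance to the individual null members, which is why the statement involves convex hulls. A finite sample space is always dominated (by counting measure), so this check stays within the scope of the classical result.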

However, the dominating measure condition fails in many nonparametric settings, e.g., for the family of all distributions with a given mean and bounded variance, for location-symmetric distributions, or for TV balls around a distribution. In these cases the classical equivalence can break down: TV separation of the convex hulls remains necessary for a nontrivial test, but it is no longer sufficient, because the TV closure can be too small. The hulls may be well separated in TV even though no nontrivial test exists.

Main Theorem: Necessary and Sufficient Condition via Weak-* Closure in ba

The central contribution is a necessary and sufficient condition for the existence of nontrivial tests, without any domination assumption. The criterion fundamentally involves convex hull closures in the weak-* topology of the space of bounded finitely additive measures (ba):

Theorem:

Let $\mathcal{P}, \mathcal{Q}$ be nonempty subsets of $\mathcal{M}_1$, the set of countably additive probability measures on $(\Omega, \mathcal{F})$, and let $\varepsilon \geq 0$. Then

$\exists\,\phi\,:\, \inf_{\nu \in \mathcal{Q}} \mathbb{E}_\nu[\phi] > \sup_{\mu \in \mathcal{P}} \mathbb{E}_\mu[\phi] + \varepsilon \iff d_\text{TV}(\overline{\operatorname{co}}^* \mathcal{P}, \overline{\operatorname{co}}^* \mathcal{Q}) > \varepsilon$

and moreover,

$R(\mathcal{P}, \mathcal{Q}) = 1 - d_\text{TV}(\overline{\operatorname{co}}^* \mathcal{P}, \overline{\operatorname{co}}^* \mathcal{Q}),$

where $\overline{\operatorname{co}}^* \mathcal{P}$ is the weak-* (ba) closure of the convex hull, and $R$ is the minimax risk.
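For two simple hypotheses on a finite sample space, no closure is needed and the risk identity can be checked directly. The sketch below assumes that $R$ denotes the smallest achievable sum of type-I and type-II errors (the summary does not spell out the definition) and uses a likelihood-ratio style test; the distributions and function names are hypothetical.

```python
def tv(p, q):
    """Total variation distance between two distributions on a finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def risk(p, q, phi):
    """Type-I error under p plus type-II error under q for the test phi."""
    type1 = sum(a * t for a, t in zip(p, phi))
    type2 = sum(b * (1 - t) for b, t in zip(q, phi))
    return type1 + type2

def likelihood_ratio_test(p, q):
    """Reject exactly on outcomes that are more likely under q than under p."""
    return [1.0 if b > a else 0.0 for a, b in zip(p, q)]

# Assumed toy pair of simple hypotheses.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

phi_star = likelihood_ratio_test(p, q)
print(risk(p, q, phi_star), 1 - tv(p, q))   # the two values should agree
print(risk(p, q, [0.5, 0.5, 0.5]))          # a constant test has risk 1
```

The likelihood-ratio test attains $1 - d_\text{TV}(p, q)$, whereas a test that ignores the data has risk 1, so a risk strictly below 1 corresponds to the existence of a nontrivial test.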

This result generalizes classical minimax results and is the first to provide a matching necessary and sufficient condition without extraneous technical constraints. Notably, finitely additive probability measures are essential in the characterization, even though all observable phenomena and statistical models, as traditionally formulated, use countably additive measures.

Examples and Subtleties

The TV closure of convex hulls in the dominated setting coincides with the closure in the weak-* (ba) topology, but when domination fails (e.g., in certain nonparametric examples), these sets differ and TV separation no longer characterizes testability.
Taking the closure in the weak topology (weak convergence of probabilities) is also incorrect, as this can be either too large or too small (explicit counterexamples are given in the text).
The proof leverages convex duality (Fan's minimax theorem) in the space of ba measures, exploiting the fact that norm-bounded, weak-* closed convex subsets of ba are weak-* compact.
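To see concretely how TV separation can fail to imply testability without domination, consider the following elementary example. It is chosen here for illustration and is not claimed to be one of the paper's own examples; $\lambda$ denotes Lebesgue measure on $[0,1]$ and $\delta_x$ the point mass at $x$.

```latex
\[
  \mathcal{P} = \{\delta_x : x \in [0,1]\}, \qquad \mathcal{Q} = \{\lambda\}.
\]
% Every element of co(P) is supported on a finite set S, and lambda(S) = 0, so
\[
  d_\text{TV}(\operatorname{co}\mathcal{P}, \operatorname{co}\mathcal{Q}) = 1 .
\]
% Yet for every test phi : [0,1] -> [0,1],
\[
  \inf_{\nu \in \mathcal{Q}} \mathbb{E}_\nu[\phi]
  = \int_0^1 \phi \, d\lambda
  \le \sup_{x \in [0,1]} \phi(x)
  = \sup_{\mu \in \mathcal{P}} \mathbb{E}_\mu[\phi],
\]
% so no test has power exceeding its level.
```

The family $\mathcal{P}$ has no common dominating measure, so the classical result does not apply. Combined with the theorem at $\varepsilon = 0$, the absence of a nontrivial test forces $d_\text{TV}(\overline{\operatorname{co}}^* \mathcal{P}, \overline{\operatorname{co}}^* \mathcal{Q}) = 0$, even though the convex hulls themselves are at TV distance 1.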

Implications for Statistical Practice and Theory

Measure-Theoretic Considerations

The weak-* closure in ba is the maximal extension of a hypothesis class preserving all test risks. No further relaxation is possible without losing statistical meaning.
The necessity of finitely additive measures is not a matter of mathematical taste. There exist (even simple) hypothesis testing problems for which the minimax characterization necessarily concerns non-countably additive charges present only in $\mathsf{ba}$; the countably additive part alone does not suffice for the correct minimax risk.

Connection to e-Variables and Recent Developments

When the alternative is a singleton (a composite null tested against a simple alternative), the weak-* ba closure aligns (in the countably additive part) with the effective null set defined via e-variables and the bipolar theorem, as characterized in [larsson2024numeraire]. However, for general composite-vs-composite problems, the use of the ba closure is essential for proper risk analysis.

Relation to Le Cam's Generalized Tests

Le Cam previously described necessary and sufficient conditions for testability when tests are permitted to be arbitrary functionals (potentially not measurable functions on the original probability space, but continuous functionals on measures). The present result shows that, for the risk associated with tests that are sample-measurable functions, a strictly stronger notion of closure (ba weak-*) must be used to obtain a tight characterization of the minimax risk and of the achievable power.

Practical and Theoretical Consequences

In robust/hard nonparametric testing scenarios, the difference between ba and standard probability measure closures can affect both the existence of tests and the achievable minimax risk.
The work clarifies that—even when practitioners are only interested in observable, countably additive models—determining whether any test whatsoever can nontrivially distinguish two composite hypotheses may require essentially broader measure-theoretic considerations.

Future Directions

The necessity of finitely additive measures in the correct minimax characterization raises several questions for modern statistical methodology:

Is it possible to operationalize or approximate the (often non-constructive) extremal ba measures appearing in the risk bounds?
How does the result propagate to sequential analysis, adaptive inference, or situations involving additional structure (e.g., exchangeability, partial sufficiency)?
What are the computational or algorithmic implications for robust hypothesis testing frameworks where no dominating measure exists?

Conclusion

This work establishes the complete framework for the existence and risk calculation of nontrivial tests between arbitrary composite hypotheses. The transition from TV or weak closures in probability laws to the weak-* closure in the space of bounded finitely additive measures is both necessary and sufficient for nonparametric, undominated settings. The results rectify gaps in the classical theory, refine measure-theoretic underpinnings, and provide a foundational reference for both theoretical and applied researchers dealing with nonstandard hypothesis testing problems (2601.05217).
