A comparison of methods for Poisson regression in the presence of background

Published 3 Apr 2026 in stat.ME, astro-ph.IM, and stat.AP | (2604.02664v1)

Abstract: This paper provides a statistical analysis of three common methods of regression for Poisson data in the presence of Poisson background, namely the joint fit with two parametric models for the source and the background, the use of a non-parametric model for the background known as the wstat method, and the regression with a fixed background. The non-parametric background method, which is a popular method for spectral data, is found to be significantly biased, especially in the low-count and background-dominated regimes. Similar conclusions apply to the fixed-background regression. The joint-fit method, on the other hand, simultaneously affords reliable hypothesis testing by means of the usual Cash statistic and unbiased reconstruction of source parameters. We also investigate the effect of non-parametric regression on the number of effective degrees of freedom by means of the Efron degree of freedom function. We find that the wstat method adds a significantly larger number of degrees of freedom, compared to the number of free parameters in the source model. The other two methods have a number of degrees of freedom consistent with the number of adjustable parameters, at least for the simple models investigated in this paper.


Authors (4)

Summary

The paper demonstrates that joint likelihood fitting yields unbiased source estimates, while the wstat and fixed-background methods suffer from systematic bias in low-count regimes.
Monte Carlo simulations and theoretical analysis reveal that nonparametric approaches can overfit by increasing the effective number of degrees of freedom of the model.
In practice, parametric joint models should be preferred over nonparametric methods in background-dominated settings to ensure reliable hypothesis testing.

Comparative Assessment of Poisson Regression Methods with Background Contributions

Introduction

Poisson regression underlies the modeling of event counts in astronomy and related disciplines when discrete phenomena must be inferred in the presence of measurement noise and systematically varying backgrounds. The statistical treatment is especially nontrivial for low count-rate regimes, typical in high-energy astrophysics, where both the source and the background follow Poisson statistics, and traditional Gaussian approximations fail. The paper "A comparison of methods for Poisson regression in the presence of background" (2604.02664) presents a rigorous analysis of three established approaches: (1) joint likelihood fitting with parametric models for both source and background, (2) the wstat (W) method using a nonparametric, stepwise-constant background model, and (3) Poisson regression with fixed background. Numerical experiments and theoretical considerations clarify the properties and biases of each.

Poisson Data Model and Regression Approaches

The core model comprises independent observations over a source region and a background region, each yielding per-bin counts according to Poisson processes with possibly distinct means and exposures. The primary aim is to estimate parameters describing the source signal, with or without simultaneous inference of background parameters. The three methods considered are:

Joint Fit: Both the source and background regions are assigned parametric models with a joint Poisson likelihood function. The method rigorously incorporates background uncertainty, but requires a valid background model with its own parameters.
wstat (Nonparametric Background): The background is modeled as a stepwise-constant function with one parameter per bin, optimized along with the source parameters using a restricted MLE approach. This method eschews a parametric background but can introduce high model complexity and potential overfitting.
Fixed Background: The background is held fixed to measured counts. The source is fitted without accounting for stochastic background fluctuations. This can lead to model misspecification, particularly in background-dominated data.
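The three fit statistics can be written down compactly for the simplest case. The sketch below is illustrative only and rests on assumptions not spelled out in this summary: constant models, equal exposures for the two regions, a source-region mean of $\theta + \beta$ per bin and a background-region mean of $\beta$ per bin. All function names are hypothetical. For wstat, the per-bin background is profiled out analytically at fixed $\theta$ by solving the quadratic that sets the derivative of the per-bin log-likelihood to zero.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def dev(obs, mu):
    """One bin of the Cash statistic, 2*[mu - obs + obs*ln(obs/mu)], with 0*ln(0) = 0."""
    if obs == 0:
        return 2.0 * mu
    return 2.0 * (mu - obs + obs * math.log(obs / mu))

def cstat_joint(theta, beta, y, z):
    """Joint fit: source-region mean theta + beta, background-region mean beta."""
    return sum(dev(yi, theta + beta) + dev(zi, beta) for yi, zi in zip(y, z))

def profile_background(theta, yi, zi):
    """Per-bin background maximizing the likelihood at fixed theta (equal exposures).

    Positive root of 2*b**2 + (2*theta - yi - zi)*b - zi*theta = 0, clamped at zero.
    """
    a = yi + zi - 2.0 * theta
    return max((a + math.sqrt(a * a + 8.0 * zi * theta)) / 4.0, 0.0)

def wstat(theta, y, z):
    """W statistic: one free background level per bin, profiled out analytically."""
    total = 0.0
    for yi, zi in zip(y, z):
        b = profile_background(theta, yi, zi)
        total += dev(yi, theta + b) + dev(zi, b)
    return total

def cstat_fixed(theta, y, z):
    """Fixed background: the measured z_i is treated as the known background mean."""
    return sum(dev(yi, theta + zi) for yi, zi in zip(y, z))

# One simulated data set (parameter values chosen to match the Figure 1 setting).
rng = random.Random(1)
theta_true, beta_true, n_bins = 1.0, 1.0, 100
y = [poisson_draw(rng, theta_true + beta_true) for _ in range(n_bins)]
z = [poisson_draw(rng, beta_true) for _ in range(n_bins)]

print("joint C :", cstat_joint(theta_true, beta_true, y, z))
print("wstat W :", wstat(theta_true, y, z))
print("fixed C :", cstat_fixed(theta_true, y, z))
```

Because W minimizes over a separate background level in every bin, it can never exceed the joint statistic evaluated at the same $\theta$ with any constant background. That is the sense in which the stepwise background adds model flexibility.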

A typical realization of Poisson counts for $\theta=\beta=1$ demonstrates the overlap in source and background regimes and the respective model fits for each method.

Figure 1: Sample data and best-fit models ($\theta = \beta = 1$, $N = 100$), illustrating stepwise (wstat), parametric (joint), and fixed background fits.

Statistical Properties and Goodness-of-Fit Behavior

Asymptotics and Wilks' Theorem

All three estimators are derived from maximum likelihood principles, which under appropriate regularity and large-sample conditions enable the use of likelihood ratio test statistics (notably the Cash, or C, statistic and the W statistic) with $\chi^2$ distributions per Wilks' theorem. However, these conditions fail at low counts or when the background model is nonparametric, as demonstrated for the W method. Deviations of the mean and variance of the fit statistic from their $\chi^2$ values are significant for small Poisson means.
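The low-count departure from $\chi^2$ behavior can be checked directly for a single bin. The following sketch sums over the Poisson distribution to obtain the mean and variance of one bin's Cash contribution, evaluated at the true mean with no fitted parameters. This is a simplification of the fitted case treated in the paper, and the function names are hypothetical. A $\chi^2$ variable with one degree of freedom would have mean 1 and variance 2.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def cash_term(k, mu):
    """Per-bin Cash contribution 2*[mu - k + k*ln(k/mu)], with 0*ln(0) = 0."""
    if k == 0:
        return 2.0 * mu
    return 2.0 * (mu - k + k * math.log(k / mu))

def cash_moments(mu):
    """Mean and variance of one bin's Cash contribution at the true mean mu.

    The sum is truncated far in the upper tail, where the probability is negligible.
    """
    k_max = int(mu + 20.0 * math.sqrt(mu) + 50)
    mean = sum(poisson_pmf(k, mu) * cash_term(k, mu) for k in range(k_max + 1))
    second = sum(poisson_pmf(k, mu) * cash_term(k, mu) ** 2 for k in range(k_max + 1))
    return mean, second - mean ** 2

for mu in (0.1, 1.0, 10.0, 100.0):
    mean, var = cash_moments(mu)
    print(f"mu={mu:6.1f}  E[C_i]={mean:.3f}  Var[C_i]={var:.3f}")
```

At large means the moments approach the $\chi^2$ values, while for means well below one the expected contribution per bin falls substantially below 1, so comparing a low-count fit statistic with the nominal $\chi^2$ expectation is misleading.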

Degrees of Freedom and the Efron Function

For linear or parametric Poisson regression, the effective degrees of freedom align with the number of model parameters. Nonlinear or nonparametric approaches, however, necessitate more general complexity measures. The Efron function, $df(\mu)$, defined as the sum of the covariances between fitted means and observed data normalized by per-point variances, quantifies the effective degrees of freedom, capturing overfitting or redundancy. For the W method, $df(\mu)$ may substantially exceed the number of nominal source parameters, particularly in background-dominated regimes, whereas for parametric joint fits it remains consistent.
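In symbols, the definition above reads $df(\mu) = \sum_i \mathrm{cov}(\hat\mu_i, y_i) / \mathrm{var}(y_i)$, with $\mathrm{var}(y_i) = \mu_i$ for Poisson data. A minimal Monte Carlo sketch of this quantity follows, for two reference fits that are not taken from the paper: a one-parameter constant fit and a saturated fit with one parameter per bin. The function names, bin count, and replicate count are illustrative assumptions.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def efron_df(fit, true_means, n_rep=2000, seed=0):
    """Monte Carlo estimate of sum_i cov(mu_hat_i, y_i) / var(y_i), var(y_i) = mu_i."""
    rng = random.Random(seed)
    n = len(true_means)
    sum_y, sum_m, sum_my = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(n_rep):
        y = [poisson_draw(rng, mu) for mu in true_means]
        m = fit(y)  # fitted means, one per bin
        for i in range(n):
            sum_y[i] += y[i]
            sum_m[i] += m[i]
            sum_my[i] += m[i] * y[i]
    df = 0.0
    for i in range(n):
        # Unbiased sample covariance between the fitted mean and the datum in bin i.
        cov = (sum_my[i] - sum_m[i] * sum_y[i] / n_rep) / (n_rep - 1)
        df += cov / true_means[i]
    return df

def constant_fit(y):
    """One free parameter: every bin is fitted by the sample mean."""
    mean = sum(y) / len(y)
    return [mean] * len(y)

def saturated_fit(y):
    """One free parameter per bin: the fit reproduces the data exactly."""
    return list(y)

true_means = [1.0] * 20
df_constant = efron_df(constant_fit, true_means)
df_saturated = efron_df(saturated_fit, true_means)
print(f"constant fit : df ~ {df_constant:.2f}")
print(f"saturated fit: df ~ {df_saturated:.2f}")
```

The two cases bracket the behavior discussed here. The constant fit should return a value near 1 and the saturated fit a value near the number of bins, so a method whose per-bin background follows the data closely is expected to land well above its nominal parameter count.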

Simulations: Bias and Distributional Properties

Extensive Monte Carlo experiments probe the performance of the three methods over a range of source-to-background ratios. The empirical cumulative distribution functions (eCDFs) of both the fit statistic and the parameter estimates are analyzed for representative cases.

Figure 2: eCDFs of fit statistics and parameter biases, $\theta = \beta = 1$, $N = 100$; comparison of joint, wstat, and fixed background methods.

The joint fit approach provides unbiased estimates of the source parameter across all data regimes, with fit statistics conforming to theoretical expectations.
The wstat method introduces systematic positive bias in the estimated source parameters when both source and background means are of order unity or less, particularly in low-count, background-dominated scenarios. In these cases, W statistics are systematically smaller than the corresponding fixed-background C statistics but reflect large effective model complexity due to overfitting.
The fixed background method is also prone to significant bias under background-dominated conditions, sometimes exceeding the wstat bias.
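A small simulation in the spirit of these experiments can be sketched as follows. It is not the paper's code: it assumes constant models with equal exposures (source-region mean $\theta + \beta$, background-region mean $\beta$), uses the unconstrained closed-form joint estimate $\hat\theta = \bar{y} - \bar{z}$, and finds the wstat and fixed-background estimates by a golden-section search, which relies on both statistics being convex in $\theta$ under these assumptions. All names and the replicate count are hypothetical.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def dev(obs, mu):
    """Per-bin Cash contribution, with 0*ln(0) = 0."""
    return 2.0 * mu if obs == 0 else 2.0 * (mu - obs + obs * math.log(obs / mu))

def wstat(theta, y, z):
    """W statistic with the per-bin background profiled out (equal exposures)."""
    total = 0.0
    for yi, zi in zip(y, z):
        a = yi + zi - 2.0 * theta
        b = max((a + math.sqrt(a * a + 8.0 * zi * theta)) / 4.0, 0.0)
        total += dev(yi, theta + b) + dev(zi, b)
    return total

def cstat_fixed(theta, y, z):
    """Cash statistic with the background fixed at the measured counts."""
    return sum(dev(yi, theta + zi) for yi, zi in zip(y, z))

def argmin_convex(f, lo, hi, n_iter=40):
    """Golden-section search for the minimizer of a convex function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(n_iter):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

rng = random.Random(0)
theta_true, beta_true, n_bins, n_rep = 1.0, 1.0, 50, 200
est = {"joint": [], "wstat": [], "fixed": []}
for _ in range(n_rep):
    y = [poisson_draw(rng, theta_true + beta_true) for _ in range(n_bins)]
    z = [poisson_draw(rng, beta_true) for _ in range(n_bins)]
    hi = max(y) + 1.0  # the minimizer cannot exceed the largest source-region count
    est["joint"].append(sum(y) / n_bins - sum(z) / n_bins)
    est["wstat"].append(argmin_convex(lambda t: wstat(t, y, z), 1e-9, hi))
    est["fixed"].append(argmin_convex(lambda t: cstat_fixed(t, y, z), 1e-9, hi))

mean_est = {name: sum(values) / n_rep for name, values in est.items()}
for name, value in mean_est.items():
    print(f"{name:6s} mean theta_hat = {value:.3f} (true value {theta_true})")
```

Comparing the three printed averages with the true value gives a rough view of the bias of each method in this one setting. Lowering `theta_true` and `beta_true` moves the experiment toward the low-count, background-dominated regime where the findings above report the largest discrepancies.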

Figure 3: (Top) eCDFs for $\theta = \beta = 0.1$; (Bottom) eCDFs for [...], [...]; both [...].

Figure 4: eCDFs for the limiting case [...], [...], [...]; highlighting truncation and systematic bias in wstat and fixed-background fits.

Implications for Hypothesis Testing and Model Selection

The empirical results indicate that for joint fits, hypothesis testing and confidence interval construction maintain statistical rigor, even at low counts, provided correct expectations for mean and variance of the fit statistic are used. For nonparametric (wstat) and fixed-background approaches, hypothesis tests premised on a one-to-one correspondence between parameter count and degrees of freedom become invalid. The W method in particular may present spurious goodness-of-fit through excessive effective degrees of freedom, and can strongly overestimate source intensities in challenging regimes.

The paper conjectures that using the Efron $df(\mu)$ function as the number of degrees of freedom in Wilks-based inference may correct for this complexity in nonparametric and nonlinear models, but this is validated only in parametric, constant-model settings.

Practical and Theoretical Implications

From a practical perspective, classical Cash-statistic joint Poisson fitting, in which the background is modeled parametrically, should be the standard for inference in astrophysical Poisson count data where background is relevant and modeling is feasible. The wstat method, although widespread in spectral analysis software, should be restricted to cases with high mean counts and strong source dominance over the background, as its use elsewhere is empirically shown to be biased and susceptible to overfitting. Fixed-background regression is demonstrably inadequate except when the background is of negligible magnitude.

From a statistical modeling standpoint, the consideration of effective model complexity (via $df(\mu)$ or analogous quantities) is essential for hypothesis testing and model selection beyond simple parametric settings. The results underscore the necessity of proper goodness-of-fit benchmarks and caution against applying naive interpretations of fit statistics and conventional p-values in complex or sparse data settings.

Prospects for improvement include development of adjusted nonparametric background models (e.g., via adaptive rebinning analogous to Bayesian blocks) to mitigate the nonlinearity and excessive complexity of the stepwise wstat procedure, retaining the flexibility where background modeling is otherwise infeasible.

Conclusion

This work delivers a comprehensive statistical evaluation of major Poisson regression techniques in the context of background-limited data, combining theoretical insight with exhaustive numerical investigations. The findings underscore the superiority of joint parametric modeling for bias-free estimation and rigorously interpretable hypothesis testing. Nonparametric background approaches such as wstat entail significant bias and overfitting risks under low-count or background-dominated conditions, necessitating careful scrutiny of their application and interpretation. The adaptation of the Efron function and related concepts of effective complexity to Poisson regression provides a pathway for improved inference in nonlinear or over-parameterized contexts.

The implications extend broadly to statistical methodology in high-energy astrophysics, particle physics, and other domains where Poisson statistics with background are characteristic, and reinforce the importance of complexity-aware, model-consistent regression analysis.
