Hawkes Processes: Power Law Kernels

Updated 12 January 2026

Hawkes processes with power law kernels are self-exciting point processes whose excitation kernel decays slowly, giving them the long-range memory needed to model clustered events.
Their kernels can be fitted parametrically or recovered with nonparametric estimation methods, and they capture near-critical behavior and scale-free dynamics in complex systems.
Applications include quantitative finance, seismology, and neuroscience, where these models effectively replicate burstiness and heavy-tailed event distributions.

A Hawkes process with power law kernel is a self-exciting point process in which the excitation (memory) kernel decays slowly, typically as an inverse power of time. This structure imparts long-range dependence (“long memory”) to the process, making it fundamentally different from the Markovian (exponential) case. Power law kernels are prevalent across quantitative finance, seismology, and neuroscience, especially for modeling phenomena where the influence of past events decays slowly, such as in high-frequency limit order books, event clustering in earthquakes, and rough volatility dynamics.

1. Mathematical Formulation and Kernel Structure

Let $N(t)$ denote the counting process with conditional intensity $\lambda(t)$. The Hawkes intensity with a power law kernel is

$\lambda(t) = \mu + \int_{0}^{t} \phi(t-s)dN(s),$

where $\mu > 0$ is the baseline and $\phi$ is the excitation kernel.

A typical power law form is $\phi(t) = \frac{\alpha}{(t+\varepsilon)^{\beta}}$ with $\alpha > 0$, $\beta > 1$, $\varepsilon > 0$, so that $\phi(t) \propto t^{-\beta}$ for large $t$ (Batra, 19 Mar 2025). The offset $\varepsilon$ keeps the kernel finite at $t = 0$. In multivariate settings (e.g., bivariate for buy/sell events), the kernel generalizes to $\phi_{ij}(t)$, governing the excitation from dimension $j$ to dimension $i$:

$\lambda_i(t) = \mu_i + \sum_{j=1}^d \int_{0}^{t} \phi_{ij}(t-s)dN_j(s).$
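The multivariate intensity can be evaluated directly from its definition. The following is a minimal sketch, assuming every pair $(i,j)$ shares the same exponent $\beta$ and offset $\varepsilon$ and differs only in its amplitude; the function names and the numerical values are illustrative and not taken from the cited works.

```python
def power_law_kernel(t, alpha, beta, eps):
    """phi(t) = alpha / (t + eps)**beta for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return alpha / (t + eps) ** beta


def intensities(t, events, mu, alpha, beta, eps):
    """Return [lambda_1(t), ..., lambda_d(t)].

    events[j]   : list of event times of component j
    mu[i]       : baseline of component i
    alpha[i][j] : amplitude of phi_ij (excitation from j to i)
    beta, eps   : shared exponent and offset (an assumption of this sketch)
    """
    d = len(mu)
    lam = []
    for i in range(d):
        total = mu[i]
        for j in range(d):
            for s in events[j]:
                if s < t:  # only events strictly before t contribute
                    total += power_law_kernel(t - s, alpha[i][j], beta, eps)
        lam.append(total)
    return lam


# One event in component 0 at time 1.0, evaluated at t = 2.0 (illustrative numbers).
lam = intensities(2.0, [[1.0], []], mu=[0.5, 0.2],
                  alpha=[[0.1, 0.0], [0.3, 0.0]], beta=2.0, eps=1.0)
```

Each evaluation sums over the full event history, because a power law kernel has no recursive update of the kind available for exponential kernels.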

Mittag-Leffler kernel (“fractional Hawkes”):

A specific, analytically tractable case is the Mittag-Leffler kernel,

$\phi(t) = t^{\beta-1} E_{\beta,\beta}(-t^{\beta}),$

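where $E_{\beta,\beta}$ is the two-parameter Mittag-Leffler function and $0 < \beta < 1$. Its Laplace transform is $\mathcal{L}\{\phi\}(s) = 1/(1+s^{\beta})$, and it has a power law tail $\sim t^{-(\beta+1)}$ (Chen et al., 2020, Habyarimana et al., 2022). Here $\beta$ is the Mittag-Leffler index, not the decay exponent of the power law form above; the tail exponent is $1+\beta$. Because the transform equals 1 at $s = 0$, this kernel integrates to one, so a subcritical model multiplies it by a branching ratio below one.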
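For small and moderate arguments the Mittag-Leffler kernel can be evaluated from the defining power series $E_{a,b}(z) = \sum_{k \ge 0} z^k / \Gamma(ak+b)$. The sketch below assumes a plain truncated series, which is not how production code would handle large $t$; the function names and the truncation length are illustrative.

```python
import math


def mittag_leffler(a, b, z, n_terms=80):
    """Truncated power series E_{a,b}(z) = sum_k z**k / Gamma(a*k + b).

    Only reliable for moderate |z|; large arguments need an asymptotic
    expansion or a dedicated algorithm, which this sketch does not include.
    """
    return sum(z ** k / math.gamma(a * k + b) for k in range(n_terms))


def ml_kernel(t, beta):
    """phi(t) = t**(beta-1) * E_{beta,beta}(-t**beta) for t > 0."""
    return t ** (beta - 1.0) * mittag_leffler(beta, beta, -(t ** beta))


# Sanity check: beta = 1 reduces to the exponential kernel exp(-t).
check = abs(ml_kernel(1.0, 1.0) - math.exp(-1.0))
```

The check at $\beta = 1$ uses the identity $E_{1,1}(z) = e^{z}$, which recovers the Markovian exponential kernel as a limiting case.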

2. Stability, Stationarity, and Memory Properties

Stationarity requires that the matrix of $L^1$ norms of the kernel components, $M_{ij} = \int_0^\infty \phi_{ij}(t)dt$, has spectral radius $\rho(M) < 1$ (Batra, 19 Mar 2025, Bacry et al., 2014, Bacry et al., 2011). For univariate power law kernels this condition reads

$\int_0^\infty \frac{\alpha}{(t+\varepsilon)^{\beta}}dt = \frac{\alpha}{(\beta-1)\varepsilon^{\beta-1}} < 1.$

Power law kernels with $\beta \leq 1$ are not integrable at infinity, so their $L^1$ norm is infinite and the stationarity condition cannot hold.
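The stationarity check is a one-line computation once the $L^1$ norms are known. The sketch below uses the closed-form norm of the kernel $\alpha/(t+\varepsilon)^{\beta}$ and, for the bivariate case, the explicit largest eigenvalue of a non-negative $2 \times 2$ matrix; the parameter values are illustrative assumptions.

```python
import math


def l1_norm(alpha, beta, eps):
    """Closed-form integral of alpha / (t + eps)**beta over [0, inf)."""
    if beta <= 1.0:
        return math.inf  # kernel is not integrable
    return alpha / ((beta - 1.0) * eps ** (beta - 1.0))


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 matrix with non-negative entries."""
    (a, b), (c, d) = m
    disc = (a - d) ** 2 + 4.0 * b * c  # non-negative when b, c >= 0
    return 0.5 * (a + d + math.sqrt(disc))


# Illustrative bivariate example: shared beta and eps, different amplitudes.
beta, eps = 2.0, 1.0
amplitudes = [[0.5, 0.2], [0.2, 0.5]]
M = [[l1_norm(a, beta, eps) for a in row] for row in amplitudes]
rho = spectral_radius_2x2(M)
is_stationary = rho < 1.0
```

With $\beta = 2$ and $\varepsilon = 1$ the norm of each component equals its amplitude, so $\rho(M) = 0.7$ and the example is stationary. For more than two dimensions a general eigenvalue routine would replace the closed form.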

Long-range memory: For $1<\beta<2$, the kernel is integrable but its mean delay $\int_0^\infty t\,\phi(t)dt$ is infinite. This slow decay creates long memory, producing autocorrelations that decay as power laws. In the fractional Hawkes process, the Mittag-Leffler index $\beta \in (0,1)$ tunes the memory: smaller $\beta$ leads to heavier tails and slower convergence to stationarity (Chen et al., 2020).

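Critical and nearly critical cases: For kernels with tail $\phi(t) \sim c\, t^{-(1+\alpha)}$ (here $\alpha$ is a tail index, not the amplitude of Sec. 1, and the decay exponent is $\beta = 1+\alpha$), critical branching corresponds to the normalization $\|\phi\|_1 = 1$. In this regime, the process loses any characteristic memory time scale and acquires nontrivial scaling limits (see Sec. 5).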

3. Parameter Estimation and Nonparametric Recovery

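Likelihood-based estimation: Maximum-likelihood inference maximizes the log-likelihood over the observation window $[0,T]$,

$\mathcal{L}(\theta) = \sum_{i=1}^d \left[ \int_0^T \!\log\lambda_i(t;\theta)dN_i(t) - \int_0^T \!\lambda_i(t;\theta)dt \right]$

subject to positivity constraints on the kernel parameters, where $\theta$ collects the baselines $\mu_i$ and the parameters of the kernels $\phi_{ij}$ (Batra, 19 Mar 2025).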
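For the univariate kernel $\alpha/(t+\varepsilon)^{\beta}$ both terms of the log-likelihood have closed forms: the first is a sum of $\log\lambda(t_k)$ over events, and the compensator integrates the kernel analytically. The following is a minimal sketch under that assumption, not the estimation code of the cited work; the function name and test values are illustrative.

```python
import math


def log_likelihood(events, T, mu, alpha, beta, eps):
    """Univariate log-likelihood on [0, T] for phi(t) = alpha / (t + eps)**beta.

    events must be sorted and lie in [0, T]; requires beta > 1.
    Cost is O(n^2) because the power law kernel has no recursive update.
    """
    log_term = 0.0
    for k, t_k in enumerate(events):
        lam = mu + sum(alpha / (t_k - s + eps) ** beta for s in events[:k])
        log_term += math.log(lam)
    # Compensator: mu*T plus the integral of phi from 0 to T - t_k per event.
    c = alpha / (beta - 1.0)
    compensator = mu * T + sum(
        c * (eps ** (1.0 - beta) - (T - t_k + eps) ** (1.0 - beta))
        for t_k in events
    )
    return log_term - compensator


ll_empty = log_likelihood([], T=2.0, mu=1.0, alpha=0.5, beta=2.0, eps=1.0)
ll_one = log_likelihood([1.0], T=2.0, mu=1.0, alpha=0.5, beta=2.0, eps=1.0)
```

In practice this function would be passed (negated) to a numerical optimizer, with the positivity constraints enforced through bounds or a log-parametrization.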

Nonparametric estimation: Techniques based on second-order statistics and covariance inversion allow empirical recovery of the kernel's shape,

$\Phi(t) \approx C t^{-p}$

where exponents $p \approx 1$ have been empirically observed for high-frequency market data (Bacry et al., 2011, Bacry et al., 2014). Robust procedures utilize multiscale grids and adaptive quadrature to recover slow decay over up to six decades in time.
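Once a kernel estimate is available on a grid, the exponent $p$ can be read off from a straight-line fit in log-log coordinates. The sketch below covers only this last fitting step on synthetic data; it does not implement the covariance-inversion procedure of the cited works, and the grid and constants are illustrative assumptions.

```python
import math


def fit_power_law_exponent(ts, values):
    """Least-squares fit of log(values) = log(C) - p * log(ts); returns (C, p)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(y_bar - slope * x_bar), -slope


# Synthetic check on a geometric grid spanning six decades (illustrative values).
grid = [10.0 ** (-3 + 6 * k / 60) for k in range(61)]
C_hat, p_hat = fit_power_law_exponent(grid, [2.0 * t ** -1.1 for t in grid])
```

A geometric grid gives every decade equal weight in the fit, which matters when the decay is tracked over many orders of magnitude in time.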

4. Statistical and Empirical Characteristics

Goodness-of-fit: Power law models outperform exponential models in negative log-likelihood and capture clustering and burstiness in high-frequency transaction data (Batra, 19 Mar 2025).
Memory and clustering: Slower decay kernels generate realistic heavy-tailed inter-arrival time distributions, strong self-excitation, and cross-excitation effects observed in empirical order flow (Batra, 19 Mar 2025, Bacry et al., 2014).
Long-range dependence: Estimated exponents near 1 ($p \approx 1$) sit at the boundary of integrability, since $t^{-p}$ is integrable at infinity only for $p > 1$, indicating persistent self-excitation and scale-free event dependence (Bacry et al., 2011).
Endogeneity: In high-frequency financial data, the endogenous (self- and cross-excited) component is dominant; exogenous arrivals form a small fraction of events (Bacry et al., 2014).
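The clustering and endogeneity described above can be explored by simulation. For a stationary univariate process with branching ratio $n = \|\phi\|_1 < 1$ the mean event rate is $\mu/(1-n)$, so a fraction $n$ of events is endogenous. The sketch below uses thinning (a standard simulation technique, not necessarily the one used in the cited studies) and relies on the kernel being decreasing; the parameter values are illustrative.

```python
import random


def simulate_thinning(mu, alpha, beta, eps, T, seed=0):
    """Simulate event times on [0, T) by thinning, starting from an empty history."""
    rng = random.Random(seed)
    events = []
    t = 0.0

    def excitation(u):
        # Sum of kernel contributions from all events at or before time u.
        return sum(alpha / (u - s + eps) ** beta for s in events)

    while True:
        # The kernel is decreasing, so the intensity just after t bounds
        # the intensity until the next accepted event.
        lam_bar = mu + excitation(t)
        t += rng.expovariate(lam_bar)
        if t >= T:
            return events
        if rng.random() * lam_bar <= mu + excitation(t):
            events.append(t)


# Illustrative parameters: branching ratio alpha / ((beta-1) * eps**(beta-1)) = 0.6.
events = simulate_thinning(mu=1.0, alpha=0.3, beta=1.5, eps=1.0, T=50.0, seed=1)
gaps = [b - a for a, b in zip(events, events[1:])]
```

Because the simulation starts from an empty history, the empirical rate on a short window sits below the stationary value $\mu/(1-n)$; with slowly decaying kernels this transient can be long.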

5. Scaling Limits, Rough Volatility, and Criticality

Nearly unstable (critical) regime: As $\|\phi\|_1 \to 1$, Hawkes processes with $\phi(t) \sim t^{-(1+\alpha)}$, $\alpha \in (1/2,1)$, converge under scaling to rough fractional diffusions or Volterra processes, written heuristically as

$dV_t = \kappa(\theta - V_t)dt + \xi V_t^{1/2}dB^H_t,$

where $H = \alpha - 1/2$ is the Hurst index (Jaisson et al., 2015, Horst et al., 2023). For example, $\alpha = 0.6$ gives $H = 0.1$. In finance, this offers a microstructural explanation of “rough” volatility (Hurst exponents below $1/2$). Both the subcritical (fractional Brownian) and supercritical (fractional CIR-type) limits emerge, depending on the approach to criticality (Xu et al., 23 Apr 2025).

Genealogical encoding: The cluster representation leverages the underlying branching structure. Critical Hawkes processes can be described using Kesten trees to capture infinite genealogies under unit branching ratio (Kirchner, 2017).
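In the subcritical case the branching structure can be simulated directly: each event produces a Poisson number of children with mean $n = \|\phi\|_1$, and each child is delayed by a draw from the normalized kernel. The sketch below assumes the kernel $\alpha/(t+\varepsilon)^{\beta}$, whose normalized form has the invertible distribution function $1 - (\varepsilon/(t+\varepsilon))^{\beta-1}$; the helper names and parameter values are illustrative, and the critical case with Kesten trees is not covered.

```python
import math
import random


def poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cluster(n, beta, eps, rng):
    """Event times of one cluster, measured from its immigrant at time 0.

    Each event has Poisson(n) children; child delays follow the normalized
    kernel (beta-1) * eps**(beta-1) / (t+eps)**beta, sampled by inverse CDF.
    Requires n < 1 (so the cluster is finite almost surely) and beta > 1.
    """
    times = [0.0]
    pending = [0.0]
    while pending:
        parent = pending.pop()
        for _ in range(poisson(rng, n)):
            u = 1.0 - rng.random()  # uniform on (0, 1]
            child = parent + eps * (u ** (-1.0 / (beta - 1.0)) - 1.0)
            times.append(child)
            pending.append(child)
    return times


rng = random.Random(0)
sizes = [len(simulate_cluster(0.5, 1.5, 1.0, rng)) for _ in range(20000)]
mean_size = sum(sizes) / len(sizes)  # theory for n = 0.5: 1 / (1 - n) = 2
```

The mean cluster size $1/(1-n)$ diverges as $n \to 1$, which is the branching-process view of the critical regime.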

Scaling asymptotics: For power law kernels,

$\mathbb{E}[N(t)] \sim \mu_0 t + O(t^{\alpha+1}), \quad \operatorname{Var}[N(t)] \sim t^{2\alpha+1}$

with corrections described by second-order regular variation (Horst et al., 2023). Explicit two-term asymptotics are possible for pure power law and Mittag-Leffler kernels.

6. Markovian Representations and Numerical Approximations

While power law kernels are inherently non-Markovian, general Volterra (and thus Hawkes with power law memory) dynamics can be approximated arbitrarily closely via finite mixtures of exponentials. This allows for Markovian “lifts” of the system:

Finite-dimensional Markov approximation: Any $L^1 \cap L^2$ kernel (including the shifted power law kernel of Sec. 1 with exponent above 1; the unshifted $t^{-\alpha}$ with $\alpha > 1$ is not integrable at the origin) can be approximated by $\sum_{k=1}^n \eta_k e^{-\beta_k t}$, leading to finite-dimensional Markov jump-diffusion representations with explicit error bounds on the intensity and path (Khabou et al., 15 Jul 2025).
Mittag-Leffler “fractional” Hawkes: Power law memory kernels whose Laplace transforms are rational functions of $s^{\beta}$ (e.g., Mittag-Leffler kernels) allow explicit solution of first- and second-order statistics, spectral densities, and simulation via mixture-of-exponentials methods (Chen et al., 2020, Habyarimana et al., 2022).
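One way to obtain the weights $\eta_k$ and rates $\beta_k$ for a shifted power law is to discretize the identity $(t+\varepsilon)^{-\beta} = \frac{1}{\Gamma(\beta)}\int_0^\infty x^{\beta-1} e^{-\varepsilon x} e^{-xt}dx$ on a geometric grid of rates. The sketch below does this with an equally spaced rule in $\log x$; this quadrature is an assumption chosen for illustration, not the construction of the cited work, and the grid limits are hand-picked.

```python
import math


def exp_mixture(beta, eps, n=51, u_min=-20.0, u_max=5.0):
    """Rates x_k and weights w_k with (t+eps)**(-beta) ~= sum_k w_k * exp(-x_k * t).

    Equally spaced rule in u = log(x) applied to
    (t+eps)**(-beta) = 1/Gamma(beta) * int_0^inf x**(beta-1) e**(-eps*x) e**(-x*t) dx.
    """
    h = (u_max - u_min) / (n - 1)
    rates = [math.exp(u_min + k * h) for k in range(n)]
    weights = [h * x ** beta * math.exp(-eps * x) / math.gamma(beta) for x in rates]
    return rates, weights


def approx_kernel(t, rates, weights):
    """Evaluate the exponential mixture at time t."""
    return sum(w * math.exp(-x * t) for x, w in zip(rates, weights))


rates, weights = exp_mixture(beta=1.5, eps=1.0)
# Largest relative error against (t + 1)**(-1.5) at a few test times.
rel_err = max(
    abs(approx_kernel(t, rates, weights) * (t + 1.0) ** 1.5 - 1.0)
    for t in (0.0, 1.0, 10.0, 100.0)
)
```

Each exponential term can then be tracked by its own state variable with a recursive update, which is what makes the lifted system Markovian. Covering longer time horizons requires extending the grid toward smaller rates.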

7. Field-Theoretic and Functional Perspectives

Infinite-dimensional Markov embedding: Any power law kernel $g(t) = A(t+\tau_0)^{-1-\theta}$ with $\theta > 0$ can be viewed as a continuum superposition of exponentials, $g(t) = \frac{A}{\Gamma(1+\theta)}\int_0^\infty x^{\theta} e^{-\tau_0 x} e^{-xt}dx$, leading to an infinite-dimensional Markov field representation. The corresponding master equation, in Laplace transform, exhibits a transcritical bifurcation at the critical branching ratio $n=1$, resulting in an intermediate asymptotic regime where the probability density function (PDF) of the intensity follows a power law (Kanazawa et al., 2020).

Quantum field analogies: The functional master equation for the intensity field admits a formal mapping to non-Hermitian quantum field theory, with the branching structure arising as a non-local jump operator. Near the critical point, the intensity PDF exhibits a cross-over from a power-law to an exponential cutoff, as the characteristic intensity diverges (Kanazawa et al., 2020).

In summary, Hawkes processes with power law kernels provide a framework for modeling phenomena characterized by bursty, long-range self-excitation. They exhibit distinctive memory and clustering properties, have tractable scaling limits that link microstructure to rough volatility models, and admit both nonparametric estimation and Markovian approximations. These features make them a central component of contemporary stochastic modeling in high-frequency finance and beyond.