Efficient smoothness selection for nonparametric Markov-switching models via quasi restricted maximum likelihood

Published 18 Nov 2024 in stat.ME and stat.CO | (2411.11498v1)

Abstract: Markov-switching models are powerful tools that allow capturing complex patterns from time series data driven by latent states. Recent work has highlighted the benefits of estimating components of these models nonparametrically, enhancing their flexibility and reducing biases, which in turn can improve state decoding, forecasting, and overall inference. Formulating such models using penalized splines is straightforward, but practically feasible methods for a data-driven smoothness selection in these models are still lacking. Traditional techniques, such as cross-validation and information criteria-based selection suffer from major drawbacks, most importantly their reliance on computationally expensive grid search methods, hampering practical usability for Markov-switching models. Michelot (2022) suggested treating spline coefficients as random effects with a multivariate normal distribution and using the R package TMB (Kristensen et al., 2016) for marginal likelihood maximization. While this method avoids grid search and typically results in adequate smoothness selection, it entails a nested optimization problem, thus being computationally demanding. We propose to exploit the simple structure of penalized splines treated as random effects, thereby greatly reducing the computational burden while potentially improving fixed effects parameter estimation accuracy. Our proposed method offers a reliable and efficient mechanism for smoothness selection, rendering the estimation of Markov-switching models involving penalized splines feasible for complex data structures.


Summary

The paper's primary contribution is introducing a qREML algorithm to efficiently select smoothness in nonparametric Markov-switching models.
It leverages penalized likelihood and random effects for spline coefficients, significantly reducing computational burdens compared to traditional methods.
Empirical case studies on caracaras, Spanish energy prices, and African elephant movement demonstrate enhanced model fit and faster computation.

Efficient Smoothness Selection for Nonparametric Markov-Switching Models

This paper introduces an efficient method for smoothness selection in nonparametric Markov-switching models using a quasi restricted maximum likelihood (qREML) approach. The method addresses the computational cost of traditional smoothness selection techniques such as cross-validation and information criteria, which is particularly burdensome for Markov-switching models. The paper shows how ideas from generalized linear mixed models (GLMMs) carry over to hidden Markov models (HMMs): the nested numerical optimization required by marginal maximum likelihood is replaced by an iteration that alternates between fitting the penalized model and updating the penalty strengths, which drastically reduces the computational burden.

Motivation and Background

Markov-switching models are widely used for modeling time series data with latent state dynamics. Incorporating nonparametric methods, such as penalized splines, enhances model flexibility and reduces biases. However, selecting the appropriate smoothness for these splines remains challenging due to the computational demands of traditional methods. The authors highlight that an inadequate choice of the parametric family of state-dependent distributions can lead to an overestimation of the number of states. Marginal maximum likelihood within a random effects framework has emerged as a promising alternative but still involves nested optimization, adding computational complexity.

Model Formulation and Parameter Estimation

The paper begins by formulating a basic HMM, defining the observed process $\{X_t\}$ and the latent state process $\{S_t\}$. The conditional density (or probability mass function) of $X_t$ given $S_t = i$ is denoted by $f_i(x_t)$. The state process is modeled as a first-order Markov chain with transition probability matrix $\bm{\Gamma}^{(t)}$. The authors then discuss several extensions of the basic HMM that incorporate nonparametric elements:

Nonparametric state-dependent densities: State-dependent densities $f_i(x)$ are modeled as finite linear combinations of basis functions $B_k(x)$, i.e., $f_i(x) = \sum_{k=0}^K \alpha_k^{(i)} B_k(x)$, where the coefficients $\alpha_k^{(i)}$ are constrained to ensure that $f_i$ is a valid probability density function.
Markov-switching generalized additive models (MS-GAMs): The expectation of the state-dependent distribution depends on covariates $\bm{z}_t$, i.e., $g(\mu_t^{(i)}) = \eta^{(i)}_t = \beta_0^{(i)} + \sum_{q=1}^Q s_q^{(i)}(z_{tq})$, where $s_q^{(i)}$ are smooth functions represented as finite linear combinations of basis functions.
Nonparametric modeling of transition probabilities: Transition probabilities $\gamma_{ij}^{(t)}$ are modeled as functions of covariates using a multinomial logistic regression, i.e., $\gamma_{ij}^{(t)} = \frac{\exp(\eta^{(ij)}_{t})}{\sum_{k=1}^N \exp(\eta^{(ik)}_{t})}$, where the predictors $\eta^{(ij)}_{t}$ include smooth functions of covariates.
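The multinomial logit link in the last item can be illustrated with a minimal Python sketch. It is not the paper's implementation: the helper names, the coefficient values, the convention $\eta^{(ii)}_t = 0$ for identifiability, and the sine/cosine curve standing in for a spline-based smooth of the time of day are all assumptions made for illustration.

```python
import math

def transition_matrix(eta):
    """Row-wise multinomial logit: gamma_ij = exp(eta_ij) / sum_k exp(eta_ik)."""
    gamma = []
    for row in eta:
        shift = max(row)  # subtract the row maximum for numerical stability
        weights = [math.exp(v - shift) for v in row]
        total = sum(weights)
        gamma.append([w / total for w in weights])
    return gamma

def predictors_at(hour, coefs):
    """Hypothetical linear predictors at a given time of day.

    Diagonal entries are fixed at 0 (assumed reference category); each
    off-diagonal entry follows a 24-hour sine/cosine curve, used here as a
    stand-in for a smooth function of the covariate.
    """
    n = len(coefs)
    angle = 2.0 * math.pi * hour / 24.0
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                b0, b1, b2 = coefs[i][j]
                eta[i][j] = b0 + b1 * math.sin(angle) + b2 * math.cos(angle)
    return eta

# Made-up coefficients (intercept, sine, cosine) for a 2-state model
coefs = [[None, (-2.0, 1.0, 0.5)],
         [(-1.5, -0.5, 1.0), None]]
gamma_noon = transition_matrix(predictors_at(12.0, coefs))
```

Each row of `gamma_noon` is a probability distribution over the next state, so any unconstrained smooth predictor yields a valid transition matrix.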

The paper uses the forward algorithm to calculate the likelihood recursively:

$\mathcal{L}(\bm{\theta}) = \bm{\delta}^{(1)} \bm{P}(x_1) \bm{\Gamma}^{(2)} \bm{P}(x_2) \bm{\Gamma}^{(3)} \dotsc \bm{\Gamma}^{(T)} \bm{P}(x_T) \bm{1},$

where $\bm{\delta}^{(1)}$ is the initial distribution, $\bm{\Gamma}^{(t)}$ is the transition probability matrix, and $\bm{P}(x_t)$ is the matrix of state-dependent densities or probability mass functions. Model flexibility is controlled by penalizing the curvature of the nonparametric components using a quadratic penalty, resulting in a penalized log-likelihood:

$l_p(\bm{\theta}; \bm{\lambda}) = l(\bm{\theta}) - \frac{1}{2} \sum_{i=1}^p \lambda_i \bm{b}_i^\intercal \bm{S}_i \bm{b}_i,$

where $\bm{\lambda}$ is a vector of smoothing strengths and $\bm{S}_i$ is a penalty matrix.
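The likelihood recursion and the quadratic penalty above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the state-dependent densities are passed in pre-evaluated, and the forward variables are rescaled at every step to avoid numerical underflow (a standard device that the summary does not mention explicitly).

```python
import math

def forward_loglik(delta, gammas, dens):
    """Log-likelihood of an N-state HMM via the scaled forward algorithm.

    delta  : initial distribution (length N)
    gammas : list of T-1 transition matrices; gammas[t-1] governs the move
             into the observation with 0-based index t
    dens   : dens[t][i] = f_i(x_t), the state-dependent density values
    """
    n = len(delta)
    phi = [delta[i] * dens[0][i] for i in range(n)]
    total = sum(phi)
    loglik = math.log(total)
    phi = [p / total for p in phi]
    for t in range(1, len(dens)):
        gamma = gammas[t - 1]
        # one step of phi * Gamma * P(x_t)
        phi = [sum(phi[i] * gamma[i][j] for i in range(n)) * dens[t][j]
               for j in range(n)]
        total = sum(phi)
        loglik += math.log(total)   # accumulate the log of the scaling factors
        phi = [p / total for p in phi]
    return loglik

def quadratic_form(b, S):
    """Compute b^T S b for a coefficient vector b and penalty matrix S."""
    k = len(b)
    return sum(b[r] * S[r][c] * b[c] for r in range(k) for c in range(k))

def penalized_loglik(loglik, lambdas, coefs, penalties):
    """l_p = l - 0.5 * sum_i lambda_i * b_i^T S_i b_i."""
    penalty = sum(lam * quadratic_form(b, S)
                  for lam, b, S in zip(lambdas, coefs, penalties))
    return loglik - 0.5 * penalty
```

In a full model the density values and transition matrices would themselves depend on the spline coefficients; here they are treated as given so that the structure of the two formulas stays visible.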

Quasi Restricted Maximum Likelihood (qREML)

The core contribution of the paper is the qREML algorithm for efficient smoothness selection. The authors treat spline coefficients as random effects with a Gaussian distribution and derive an approximate marginal log-likelihood by integrating out both the spline coefficients and the fixed effects. The marginal log-likelihood is approximated using a Laplace approximation around the mode $(\Hat{\bm{a}}, \Hat{\bm{b}})$:

$l(\bm{\lambda}) \approx \frac{1}{2} \sum_{i=1}^p (K - m_i) \log(\lambda_i) + l(\Hat{\bm{a}}, \Hat{\bm{b}}) - \frac{1}{2} \sum_{i=1}^p \lambda_i \Hat{\bm{b}}_i^\intercal \bm{S}_i \Hat{\bm{b}}_i - \frac{1}{2} \log \det (\bm{V}^\intercal \bm{J}_p(\bm{\lambda}) \bm{V}),$

where $\bm{J}_p(\bm{\lambda})$ is the Hessian of the penalized log-likelihood and $\bm{V}$ is a matrix related to the basis functions. This leads to an iterative procedure for updating the penalty strength:

$\lambda_i = \frac{K - \lambda_i \Tr\bigl( (\bm{J}_p(\bm{\lambda})^{-1})_{ii} \bm{S}_i \bigr) - m_i}{\Hat{\bm{b}}_i^\intercal \bm{S}_i \Hat{\bm{b}}_i}.$
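A heuristic sketch of where this update comes from may help; it is not the paper's derivation. It assumes that $\bm{V}$ can be taken as the identity, that $\bm{J}_p(\bm{\lambda})$ is the Hessian of the negative penalized log-likelihood (so that it is positive definite at the mode), and that the indirect dependence of the Hessian on $\bm{\lambda}$ through the mode is ignored. Because the mode maximizes $l_p$, the second and third terms of the approximation depend on $\lambda_i$ only explicitly, so that

$\frac{\partial l(\bm{\lambda})}{\partial \lambda_i} \approx \frac{K - m_i}{2 \lambda_i} - \frac{1}{2} \Hat{\bm{b}}_i^\intercal \bm{S}_i \Hat{\bm{b}}_i - \frac{1}{2} \Tr\bigl( (\bm{J}_p(\bm{\lambda})^{-1})_{ii} \bm{S}_i \bigr).$

Setting this derivative to zero and multiplying by $2 \lambda_i$ gives

$K - m_i - \lambda_i \Tr\bigl( (\bm{J}_p(\bm{\lambda})^{-1})_{ii} \bm{S}_i \bigr) = \lambda_i \Hat{\bm{b}}_i^\intercal \bm{S}_i \Hat{\bm{b}}_i.$

Dividing by the quadratic form yields the update, read as a fixed-point iteration: the right-hand side is evaluated at the current penalty strength and the current mode, and the result becomes the next value of $\lambda_i$.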

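The computational savings arise from two sources. First, the fixed effects are treated like random effects, that is, integrated out together with the spline coefficients, which simplifies the outer optimization problem. Second, the penalty of a penalized spline is linear in the penalty strength parameters, and this structure is what permits an explicit update in place of a numerical outer optimization.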
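The alternation between model fitting and penalty updating can be illustrated on a deliberately simple toy problem. The sketch below is not an HMM and not the LaMa implementation: it assumes observations $y_k \sim N(b_k, 1)$ with an identity penalty matrix (so $m = 0$), for which the inner fit and the trace term are available in closed form. All names are hypothetical.

```python
def fit_inner(y, lam):
    """Inner step: maximize -0.5*sum((y_k - b_k)^2) - 0.5*lam*sum(b_k^2).

    For this toy model the penalized mode is available in closed form.
    """
    return [v / (1.0 + lam) for v in y]

def qreml_toy(y, lam=1.0, tol=1e-10, max_iter=1000):
    """Iterate lam <- (K - m - lam * tr(J_p^{-1} S)) / (b_hat^T S b_hat)."""
    K = len(y)   # number of penalized coefficients
    m = 0        # identity penalty has full rank, so its null space is empty
    for iteration in range(1, max_iter + 1):
        b_hat = fit_inner(y, lam)
        # Here J_p = (1 + lam) * I and S = I, hence tr(J_p^{-1} S) = K / (1 + lam)
        trace_term = K / (1.0 + lam)
        quad = sum(b * b for b in b_hat)   # b_hat^T S b_hat
        lam_new = (K - m - lam * trace_term) / quad
        if abs(lam_new - lam) <= tol * (1.0 + abs(lam)):
            return lam_new, iteration
        lam = lam_new
    return lam, max_iter

y = [2.0, -1.0, 0.5, 3.0, -2.5, 1.5]
lam_hat, n_iter = qreml_toy(y)
```

In this toy model the marginal distribution is $y_k \sim N(0, 1 + 1/\lambda)$, whose maximum likelihood estimate is $\lambda = K / (\sum_k y_k^2 - K)$ whenever $\sum_k y_k^2 > K$. The iteration converges to exactly this value, which shows, in the simplest possible setting, how repeated penalized fits can replace a nested optimization of the marginal likelihood.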

Practical Implementation and Case Studies

The authors implemented the qREML algorithm in the R package LaMa, including functions qreml() and penalty(). The implementation leverages the R package RTMB for automatic differentiation in the penalized likelihood estimation.

The paper presents three case studies to demonstrate the practical use of qREML for different types of nonparametric Markov-switching models:

Caracaras activity levels: A 3-state nonparametric HMM is fitted to acceleration data from juvenile striated caracaras, modeling the state-dependent distributions nonparametrically. The nonparametric model fitted via qREML is shown to achieve a better fit than parametric HMMs, and an AIC comparison also favors the nonparametric model.
Spanish energy prices: An MS-GAMLSS is used to model Spanish energy prices as a function of oil prices, with smooth functions linking the oil price to the mean and standard deviation of the state-dependent distributions. The authors show that fitting the model using qREML is significantly faster than using full REML or marginal maximum likelihood methods.
African elephant: A 2-state HMM is fitted to the movement track of an African elephant, modeling the transition probabilities nonparametrically as a function of the time of day using cyclic P-splines. The results show that the nonparametric model captures the diurnal variation in behavior and accounts for time-varying uncertainty in the transition probabilities.

\begin{figure} \includegraphics[width=1\textwidth]{figs/caracaras_simple.pdf} \caption{Histogram of the acceleration data complemented with the state-dependent distributions and the marginal distribution of the 3-state (left panel) and 4-state (right panel) normal HMM fitted to the caracara data.} \label{fig:caracaras_simple} \end{figure}

\begin{figure} \includegraphics[width=1\textwidth]{figs/energy_oil.pdf} \caption{Estimated conditional means (solid lines) and quantiles (dashed lines) of the state-dependent distribution as a function of the oil price (left panel) and energy price time series colored according to the Viterbi-decoded state sequence (right panel).} \label{fig:energy} \end{figure}

\begin{figure} \includegraphics[width=1\textwidth]{figs/elephant_transprobs.pdf} \caption{Transition probabilities as a function of the time of day of the parametric model (left panel) and nonparametric model (right panel). Pointwise 95\% confidence intervals (shown in gray) are obtained by sampling from the joint distribution of the MLE, for the nonparametric model as described in Section \ref{subsec:practical}.} \label{fig:transprobs_elephant} \end{figure}

Simulation Experiments

The authors conducted simulation experiments to demonstrate the practicality of the qREML approach. The simulation setup involved generating data from a 2-state Gaussian HMM with transition probabilities varying smoothly with covariates. The results indicated that the smoothness penalty was chosen adequately in almost all runs, producing satisfactory function estimates. The convergence speed of the qREML algorithm increased with increasing sample size $T$.
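For readers who want to experiment with a comparable setup, the following Python sketch generates data from a 2-state Gaussian HMM whose switching probabilities vary smoothly with a covariate. It only mirrors the general description above: the means, standard deviations, covariate distribution, and the shapes of the smooth effects are hypothetical choices, not the values used in the paper.

```python
import math
import random

def simulate_hmm(T, seed=1):
    """Simulate covariates, states, and observations from a 2-state Gaussian HMM."""
    rng = random.Random(seed)
    mu, sigma = [0.0, 3.0], [1.0, 1.5]   # hypothetical state-dependent parameters
    z = [rng.uniform(-1.0, 1.0) for _ in range(T)]   # hypothetical covariate
    states, obs = [], []
    s = rng.randrange(2)                 # initial state drawn uniformly
    for t in range(T):
        if t > 0:
            # Hypothetical smooth covariate effects on the logit scale
            if s == 0:
                eta = -2.0 + 1.5 * math.sin(math.pi * z[t])
            else:
                eta = -2.0 + 1.5 * z[t] ** 2
            p_switch = 1.0 / (1.0 + math.exp(-eta))
            if rng.random() < p_switch:
                s = 1 - s
        states.append(s)
        obs.append(rng.gauss(mu[s], sigma[s]))
    return z, states, obs

z, states, obs = simulate_hmm(500)
```

Repeating the simulation for several values of `T` and refitting the model each time is one way to study how the selected smoothness and the number of iterations behave as the sample size grows.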

Conclusion

The qREML algorithm introduced in this paper provides an efficient and practical approach for smoothness selection in nonparametric Markov-switching models. The method addresses the computational limitations of traditional techniques and enables the application of these models to complex data structures. The authors show that qREML is especially promising for models requiring high computational resources. They also show that this smoothness selection procedure can be extended to several related classes of models, including continuous-time HMMs, state-space models and Markov-modulated Poisson processes.