Interrupted Time Series Models

Updated 27 January 2026

Interrupted Time Series models are quasi-experimental methods that assess intervention effects by comparing pre- and post-event longitudinal data.
These models incorporate ARMA/ARIMA error structures to account for autocorrelation and seasonal trends, ensuring robust statistical inference.
Recent extensions include multilevel, nonlinear, Bayesian, and copula-based frameworks, enhancing flexibility in analyzing complex data.

Interrupted time series (ITS) models are quasi-experimental frameworks used to infer the effect of a discrete intervention or policy change imposed at a known point in time, leveraging dense and equally spaced longitudinal outcome data before and after the event. Formally introduced to the social sciences by Campbell and Stanley (1963) and developed for rigorous statistical analysis by Box and Tiao (1975) in the ARMA time-series context, modern ITS designs are essential for causal inference in scenarios where randomization is impossible or impractical (Berk, 2021).

1. Foundational Model Structure and Formal Specification

The canonical ITS model considers a univariate time series $\{y_t: t = 1, \dots, T\}$ with an intervention at time $T_0$. The mean function $m_t(\kappa, \zeta)$ depends on baseline parameters $\kappa$ (e.g., intercept $\mu$, pre-intervention slope $\beta_1$) and intervention parameters $\zeta$ (e.g., immediate level change $\delta$, slope change $\gamma$):

$y_t = m_t(\kappa, \zeta) + N_t$

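Here, $N_t$ is serially correlated noise, typically modeled as an ARMA or ARIMA process. In practice, one frequently implements an autoregressive model with exogenous intervention regressors (ARX) or a regression with ARIMA errors:

$y_t = \mu + \beta_1 t + \delta D_t + \gamma (t - T_0) D_t + N_t$

$\phi(B) N_t = \theta(B) \varepsilon_t$

where $D_t = 1$ for $t \ge T_0$ and $D_t = 0$ otherwise, $B$ is the backshift operator, and $\varepsilon_t$ is white noise. Seasonal components (e.g., a period of $s = 7$ for weekly seasonality in daily data) and differencing (e.g., $(1 - B)^d$) are frequently incorporated as needed (Berk, 2021).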
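The segmented mean function can be illustrated with a minimal sketch that builds the regressors for level and slope change and fits them by ordinary least squares. The function names, the simulated coefficients, and the AR(1) noise parameter are illustrative assumptions, not values from the cited work.

```python
import numpy as np

def its_design(T, T0):
    """Design matrix with columns [1, t, D_t, (t - T0) * D_t] for t = 1..T."""
    t = np.arange(1, T + 1, dtype=float)
    D = (t >= T0).astype(float)            # post-intervention indicator
    return np.column_stack([np.ones(T), t, D, (t - T0) * D])

def fit_segmented_ols(y, T0):
    """Least-squares estimates of (mu, beta_1, delta, gamma)."""
    X = its_design(len(y), T0)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated example (assumed values): AR(1) noise around a known mean function
rng = np.random.default_rng(0)
T, T0 = 120, 61
true_coef = np.array([10.0, 0.05, -2.0, -0.03])
eps = rng.normal(scale=0.5, size=T)
N = np.zeros(T)
N[0] = eps[0]
for i in range(1, T):
    N[i] = 0.4 * N[i - 1] + eps[i]         # AR(1) errors with phi = 0.4
y = its_design(T, T0) @ true_coef + N
print(fit_segmented_ols(y, T0))
```

Plain least squares gives point estimates only; its default standard errors ignore the autocorrelation in $N_t$, which is why the error model matters for inference.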

2. Intervention Parameterization and Formulation

ITS permits various parameterizations for intervention effects:

Step (level change): Sudden shift at $T_0$; modeled by $\delta\,\mathbb{1}\{t \ge T_0\}$.
Ramp (slope change): Change in trajectory post-$T_0$; modeled by $\gamma\,(t - T_0)\,\mathbb{1}\{t \ge T_0\}$.
Pulse, multiple steps, or more complex structures: Modeled by additional indicator or basis functions.
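The parameterizations listed above correspond to simple regressor sequences. The sketch below generates them for time indexed $t = 1, \dots, T$; the helper names and the `width` argument of the pulse are hypothetical.

```python
def step(T, T0):
    """1 from T0 onward, else 0 (time indexed t = 1..T)."""
    return [1.0 if t >= T0 else 0.0 for t in range(1, T + 1)]

def ramp(T, T0):
    """Time elapsed since T0, zero before T0."""
    return [float(t - T0) if t >= T0 else 0.0 for t in range(1, T + 1)]

def pulse(T, T0, width=1):
    """1 for `width` periods starting at T0, else 0."""
    return [1.0 if T0 <= t < T0 + width else 0.0 for t in range(1, T + 1)]

print(step(6, 4))   # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(ramp(6, 4))   # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
print(pulse(6, 4))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Multiple steps or more complex shapes can be assembled by adding several such columns to the regression.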

The transfer function formalism (Box et al., 2016) enables flexible mapping of discrete policy impulses into dynamic expected outcomes (Berk, 2021).
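As a rough illustration of the transfer-function idea, one common first-order form passes the intervention input through the recursion $r_t = \lambda r_{t-1} + \omega x_t$. A step input then produces a gradual approach to a new level, and a pulse input produces a decaying transient. The parameter names `omega` and `decay` (for $\lambda$) are assumptions made for this sketch.

```python
def first_order_response(x, omega, decay):
    """Apply r_t = decay * r_{t-1} + omega * x_t, starting from r = 0."""
    out, prev = [], 0.0
    for x_t in x:
        prev = decay * prev + omega * x_t
        out.append(prev)
    return out

step_input = [0.0] * 3 + [1.0] * 7            # intervention switched on and kept on
pulse_input = [0.0] * 3 + [1.0] + [0.0] * 6   # one-period shock

gradual = first_order_response(step_input, omega=2.0, decay=0.5)
transient = first_order_response(pulse_input, omega=2.0, decay=0.5)
print(gradual)     # rises toward omega / (1 - decay) = 4.0
print(transient)   # jumps to 2.0, then halves each period
```

With `decay = 0` the step response reduces to the abrupt level change of the basic model.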

In multivariate or hierarchical contexts, the mean structure generalizes to accommodate multiple units, each indexed by $j$:

$y_{jt} = \mu_j + \beta_{1j} t + \delta_j \mathbb{1}\{t \ge \tau\} + \gamma_j (t - \tau) \mathbb{1}\{t \ge \tau\} + N_{jt}$

where $\tau$ is a (possibly estimated) common intervention point across units, and the baseline parameters ($\mu_j$, $\beta_{1j}$), level change $\delta_j$, and slope change $\gamma_j$ may vary by unit (Cruz et al., 2018).

For proportional outcomes with excess zeros and ones, marginalized zero-one-inflated Beta time series models encode covariate effects on the marginal mean, while still capturing temporal structure and allowing explicit inference on both level and slope changes (Ye et al., 2022).

3. Stochastic Error Structure and Quasi-Experimental Assumptions

The serial dependence of $N_t$ is critical for both inference and uncertainty quantification. ITS typically accounts for autocorrelation via AR($p$), ARMA($p$, $q$), or state-space approaches. In complex scenarios, a white-noise error assumption may be insufficient; hence AR(1) or higher-order processes are frequently fit segment-wise (e.g., separate pre- and post-intervention autocorrelation and variance) (Cruz et al., 2017):

$N_t = \phi_k N_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_k^2), \quad k = \text{pre for } t < T_0, \; k = \text{post for } t \ge T_0$

Changes in stochastic properties (variance, autocorrelation) are not only modeled but also directly tested, enabling valid inference on intervention-induced variance reduction or increased autocorrelation post-intervention (Cruz et al., 2017).
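A minimal sketch of segment-wise AR(1) estimation follows. It takes residuals from an already fitted mean function and fits a lag-1 regression separately before and after $T_0$. This is a simplified moment-style estimator and not the likelihood-based procedure of the cited work; as a further simplification, the dependence between the last pre-intervention and first post-intervention residual is ignored. All names and simulated values are assumptions.

```python
import numpy as np

def ar1_fit(resid):
    """Lag-1 regression through the origin: returns (phi_hat, sigma2_hat)."""
    r = np.asarray(resid, dtype=float)
    phi = float(r[1:] @ r[:-1] / (r[:-1] @ r[:-1]))
    innov = r[1:] - phi * r[:-1]
    return phi, float(innov @ innov / len(innov))

def segmentwise_ar1(resid, T0):
    """Fit AR(1) separately for t < T0 and t >= T0 (time indexed from 1)."""
    r = np.asarray(resid, dtype=float)
    return {"pre": ar1_fit(r[: T0 - 1]), "post": ar1_fit(r[T0 - 1:])}

# Simulated residuals (assumed values): weaker autocorrelation and larger
# variance before the intervention than after it
rng = np.random.default_rng(1)

def simulate_ar1(n, phi, sigma):
    out = np.zeros(n)
    for i in range(1, n):
        out[i] = phi * out[i - 1] + rng.normal(scale=sigma)
    return out

resid = np.concatenate([simulate_ar1(200, 0.2, 1.0), simulate_ar1(200, 0.7, 0.5)])
print(segmentwise_ar1(resid, T0=201))
```

Comparing the two fitted pairs gives an informal view of whether variance or autocorrelation shifted; a formal test would require the likelihood machinery described above.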

State-space models enable estimation of latent trends and time-varying level effects, with intervention coefficients encoded as time-invariant or time-varying state elements. The Kalman filter and smoother provide optimal estimation and uncertainty for the time-localized effect and structural trend (Brakel et al., 2010).
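The state-space formulation can be sketched with a small Kalman filter for a local-level model in which the intervention coefficient is a time-invariant state element. The variances, the vague initial covariance standing in for a diffuse prior, and the function name are all assumptions; a production analysis would estimate the variances and use a proper diffuse initialization and smoother.

```python
import numpy as np

def kalman_intervention(y, T0, obs_var=1.0, level_var=0.01, init_var=1e6):
    """Filter y_t = level_t + delta * D_t + noise, with level_t a random walk.

    The state vector is [level_t, delta]; delta does not evolve over time.
    Returns the filtered state means (T x 2) and the final state covariance.
    """
    T = len(y)
    a = np.zeros(2)                         # state mean
    P = np.eye(2) * init_var                # vague prior covariance
    Q = np.diag([level_var, 0.0])           # only the level receives innovations
    filtered = np.zeros((T, 2))
    for i in range(T):
        t = i + 1                           # time indexed from 1
        P = P + Q                           # prediction step (identity transition)
        Z = np.array([1.0, 1.0 if t >= T0 else 0.0])
        v = y[i] - Z @ a                    # one-step prediction error
        F = Z @ P @ Z + obs_var             # prediction error variance
        K = P @ Z / F                       # Kalman gain
        a = a + K * v
        P = P - np.outer(K, Z @ P)
        filtered[i] = a
    return filtered, P

# Noise-free toy series: level 10, shifting by +3 at T0 = 21
y = np.array([10.0] * 20 + [13.0] * 20)
states, P_final = kalman_intervention(y, T0=21)
print(states[-1])    # approximately [10, 3]
```

Making `delta` time-varying would amount to giving the second diagonal entry of `Q` a positive variance.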

4. Model Selection, Multiplicity, and Robust Inference

Adaptive model selection (choice of ARIMA orders $(p, d, q)$, seasonal forms, intervention window, etc.) introduces test multiplicity and inflates the overall Type I error rate. For $K$ candidate models, each tested independently at level $\alpha$, the familywise error rate is:

$\mathrm{FWER} = 1 - (1 - \alpha)^K$
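To see how quickly the familywise error rate grows with the number of candidate models, the following sketch evaluates $1 - (1 - \alpha)^K$ for a few values of $K$, assuming independent tests at a common level. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_models):
    """FWER for n_models independent tests, each conducted at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_models

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

At $\alpha = 0.05$, ten candidate models already give a familywise error rate of about 0.40 under independence.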

A common but suboptimal practice is to cherry-pick plausible models based on residual diagnostics; this implicitly tests multiple hypotheses and risks false positives. To guard against this, rigorous corrections are required:

Bonferroni: Adjust the per-test level $\alpha$ to $\alpha / K$, where $K$ is the number of candidate models.
Max-t bootstrap: Simultaneous confidence by bootstrapping the maximum t-statistic across all models, thus reflecting the dependence structure and providing sharper control of the familywise error rate (Berk, 2021).

These adjustments are essential for post-model-selection inference, especially when the analysis is exploratory or includes multiple plausible specifications.
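The two corrections can be sketched as follows. The max-t part assumes that a matrix of bootstrap t-statistics is already available, with one row per bootstrap replicate and one column per candidate model; how those replicates are generated (for example, by refitting every candidate model on each resampled series) is outside this sketch. Function names and the stand-in data are assumptions.

```python
import numpy as np

def bonferroni_level(alpha, n_models):
    """Per-test level that bounds the familywise error rate by alpha."""
    return alpha / n_models

def max_t_critical_value(boot_t, alpha=0.05):
    """(1 - alpha) quantile of the per-replicate maximum |t| across models.

    boot_t: array of shape (B, K) holding centered bootstrap t-statistics
    for the intervention coefficient under each of K candidate models.
    """
    max_abs = np.max(np.abs(np.asarray(boot_t, dtype=float)), axis=1)
    return float(np.quantile(max_abs, 1.0 - alpha))

def simultaneous_rejections(t_obs, boot_t, alpha=0.05):
    """Flag models whose observed |t| exceeds the max-t critical value."""
    crit = max_t_critical_value(boot_t, alpha)
    return np.abs(np.asarray(t_obs, dtype=float)) > crit

# Stand-in bootstrap output (NOT a real bootstrap): independent normal draws
rng = np.random.default_rng(2)
fake_boot_t = rng.standard_normal((2000, 6))
print(bonferroni_level(0.05, 6))
print(max_t_critical_value(fake_boot_t))
```

Because the maximum is taken within each replicate, correlation among the candidate models' statistics is carried into the critical value, which is why max-t is typically less conservative than Bonferroni.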

5. Change-Point Estimation and Sensitivity

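While many ITS analyses fix the intervention at an a priori date, advanced models search for the most likely change point over a candidate window $\mathcal{W}$ around the formal date:

$\hat{\tau} = \arg\max_{\tau \in \mathcal{W}} \ell(\tau)$

where $\ell(\tau)$ denotes the log-likelihood maximized over the remaining parameters with the change point fixed at $\tau$.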

Likelihood- or Wald-based procedures, often controlling for familywise error or false discovery rate across multiple possible change points, permit estimation of anticipatory or lagged effects rather than assuming instantaneous policy impact (Cruz et al., 2017; Cruz et al., 2018).
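A minimal sketch of a change-point search over a candidate window is given below. It refits a segmented regression at each candidate date and keeps the one with the smallest residual sum of squares. Under the simplifying assumption of independent Gaussian errors, this is equivalent to maximizing the profile likelihood over the change point; the cited methods additionally model autocorrelation, which this sketch omits. Function names are hypothetical.

```python
import numpy as np

def segmented_rss(y, tau):
    """Residual sum of squares of the level-and-slope-change fit at tau."""
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    D = (t >= tau).astype(float)
    X = np.column_stack([np.ones(T), t, D, (t - tau) * D])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def estimate_change_point(y, candidates):
    """Return (tau_hat, rss_by_tau) over the candidate change points."""
    rss = {tau: segmented_rss(y, tau) for tau in candidates}
    return min(rss, key=rss.get), rss

# Toy series (assumed values): true change at t = 30, formal date t = 28
rng = np.random.default_rng(3)
t = np.arange(1, 61, dtype=float)
D = (t >= 30).astype(float)
y = 5.0 + 0.1 * t + 3.0 * D - 0.2 * (t - 30) * D + rng.normal(scale=0.3, size=60)
tau_hat, rss = estimate_change_point(y, range(23, 34))
print(tau_hat)
```

Printing `rss` across the window also gives a quick view of how sharply the data discriminate between neighboring dates.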

Sensitivity to the intervention date is often high; substantive conclusions may change in sign and magnitude when $T_0$ is shifted by a few periods, particularly if policy debates or implementation ramp-up affect responses before or after the official launch (Berk, 2021).

6. Extensions: Multilevel, Nonlinear, and Bayesian ITS

Recent developments expand ITS methodology to address multi-unit, multilevel, nonlinear, and Bayesian contexts.

Multilevel Nonlinear ITS: Combines latent time series decomposition (random walk, vector-autoregression for secular/seasonal trends) with group-specific generalized additive model (GAM) smooths for flexible, nonlinear interruption effect modeling. Hierarchical model selection and regularization priors provide partial pooling and control over subgroup variability. This approach enables poststratification for direct population generalization (Waken et al., 7 Nov 2025).
Bayesian Hierarchical ITS: Models with exchangeable or smoothed spatial and temporal random effects, group-by-time interaction, and explicit segmentation for exposure groups, implemented with latent Gaussian models and inference via INLA (Gascoigne et al., 2023).
Copula-Based ITS for Bounded Outcomes: For proportional outcomes with mass at zero and one, marginalized three-part models embed copula dependence and explicitly encode segmented regression on the marginal mean. Covariate effects, change points, and hypothesis tests are all on the population-average scale (Ye et al., 2022).
Robust ITS for Variance and Correlation Changes: Identifies not only mean shifts, but also differences in residual variance and autocorrelation pre/post intervention, with likelihood-based estimation and formal hypothesis tests on all components (Cruz et al., 2017).

7. Design, Power, Practical Guidelines, and Limitations

ITS power depends critically on series length, autocorrelation, and effect size. For multilevel designs, power increases with the number of units and the length of the time series; for highly autocorrelated series, longer series are advised (Cruz et al., 2018). For comparative ITS, closed-form expressions for the variance of treatment effect estimators support sample-size and power calculations that incorporate autocorrelation, cluster randomization, and unequal spacing (Schochet, 2021).
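One rough way to see why autocorrelation erodes power is the standard large-sample approximation for the variance of a sample mean under AR(1) errors, which implies an effective sample size of $n(1 - \rho)/(1 + \rho)$. This is a textbook heuristic for a mean, not the power formula of the cited papers, and the function name is hypothetical.

```python
def effective_sample_size(n, rho):
    """Approximate number of independent observations in an AR(1) series."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)

for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, round(effective_sample_size(100, rho), 1))
```

With a lag-1 autocorrelation of 0.9, a series of 100 observations carries roughly the information of five independent ones under this approximation.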

Key guidelines for valid ITS analysis include:

Pre-specification of intervention functional form, plausible covariates, and seasonal structure prior to model fitting;
Limiting the order search over $(p, d, q)$ to an a priori grid, avoiding stepwise procedures;
Fitting all candidate models, confirming white-noise residuals, and applying simultaneous inference corrections;
Interpreting post-selection estimates as model-based associations, not as definitive causal effects, unless robustness is established across models and via sensitivity analyses;
Embedding ITS within broader research strategies (replication, meta-analysis, triangulation) to support causal inference (Berk, 2021).
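Two of the guidelines above, a pre-specified order grid and a white-noise check on residuals, can be sketched as follows. The grid values are hypothetical, and the residual check is a hand-written Ljung-Box statistic compared against the 95% chi-square quantile for 10 degrees of freedom (18.307). When applied to residuals of a fitted ARMA model, the degrees of freedom would usually be reduced by the number of estimated ARMA parameters.

```python
import itertools
import numpy as np

# Hypothetical grid of ARIMA orders (p, d, q), fixed before any model fitting
ORDER_GRID = list(itertools.product((0, 1, 2), (0, 1), (0, 1)))

def ljung_box_q(resid, max_lag=10):
    """Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..max_lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = len(r)
    denom = r @ r
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = (r[k:] @ r[:-k]) / denom    # lag-k sample autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_95_DF10 = 18.307                       # 0.95 quantile of chi-square, 10 df

rng = np.random.default_rng(4)
white = rng.standard_normal(200)
print(len(ORDER_GRID), "candidate orders")
print(ljung_box_q(white), "vs critical value", CHI2_95_DF10)
```

The length of `ORDER_GRID` is the number of candidate models that any simultaneous-inference correction should account for.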

ITS validity is sensitive to exogeneity of the intervention, absence of unobserved concurrent shocks, accurate modeling of autocorrelation and seasonality, and diligent control of statistical and model selection error.

References:

(Berk, 2021) Post-Model-Selection Statistical Inference with Interrupted Time Series Designs: An Evaluation of an Assault Weapons Ban in California
(Cruz et al., 2018) Assessing Health Care Interventions via an Interrupted Time Series Model: Study Power and Design Considerations
(Gascoigne et al., 2023) Bayesian Interrupted Time Series for evaluating policy change on mental well-being: an application to England's welfare reform
(Miratrix, 2020) Using Simulation to Analyze Interrupted Time Series Designs
(Brakel et al., 2010) Intervention analysis with state-space models to estimate discontinuities due to a survey redesign
(Schochet, 2021) Statistical Power for Estimating Treatment Effects Using Difference-in-Differences and Comparative Interrupted Time Series Designs with Variation in Treatment Timing
(Cruz et al., 2017) A Robust Interrupted Time Series Model for Analyzing Complex Healthcare Intervention Data
(Waken et al., 7 Nov 2025) Multilevel non-linear interrupted time series analysis
(Ye et al., 2022) A marginalized three-part interrupted time series regression model for proportional data