Non-Stationary Markov Process
- Non-stationary Markov processes are stochastic models with time-dependent probabilities that accurately capture systems with changing dynamics and external influences.
- They are formulated in both discrete and continuous frameworks, using time-indexed transition matrices or generator matrices to model abrupt shifts, periodicity, and smooth drifts.
- They underpin applications in epidemiology, queueing theory, reinforcement learning, and statistical physics, driving novel computational and control schemes.
A non-stationary Markov process is a stochastic process in which the Markov property holds, but the transition probabilities or rates depend explicitly on time or on other evolving external variables. Unlike classical stationary Markov processes, where the law of evolution is time-homogeneous, non-stationary variants capture systems with dynamically changing environments, parameters, or underlying structures. This generalization is fundamental in accurately describing complex real-world phenomena that exhibit temporal heterogeneity, abrupt regime shifts, or gradual evolution, and underpins a large spectrum of modeling and algorithmic approaches in statistical physics, epidemiology, queueing, control, and reinforcement learning.
1. Mathematical Formulations
Discrete and Continuous-Time Models
A discrete-time Markov decision process (MDP) with time-dependent transitions is formally defined as the tuple $(\mathcal{S}, \mathcal{A}, \{P_t\}_{t \ge 0}, \{R_t\}_{t \ge 0})$, where for each $t$, $P_t(s' \mid s, a)$ is the transition kernel
and $R_t(s, a)$ is the immediate reward function (Keplinger et al., 16 Jan 2025). The core characteristic is that either $P_t$ or $R_t$ (or both) depend on $t$ or on an exogenous process $\theta_t$.
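As a minimal runnable sketch, a two-state non-stationary MDP with a drifting kernel $P_t$ and a time-dependent reward $R_t$ can be simulated as follows; all numeric values here are illustrative assumptions, not parameters from the cited works:

```python
import random

def P_t(t, s, a):
    """Next-state distribution [p(0), p(1)] at time t; the stay-probability
    drifts smoothly with t (illustrative non-stationarity)."""
    drift = min(0.4, 0.05 * t)
    stay = 0.9 - drift if a == 0 else 0.5
    return [stay, 1.0 - stay] if s == 0 else [1.0 - stay, stay]

def R_t(t, s, a):
    """Time-dependent immediate reward (illustrative)."""
    return 1.0 if (s == 0 and t % 2 == 0) else 0.0

def rollout(policy, horizon, seed=0):
    """Accumulate reward along one trajectory under a time-aware policy."""
    rng, s, total = random.Random(seed), 0, 0.0
    for t in range(horizon):
        a = policy(t, s)
        total += R_t(t, s, a)
        s = 0 if rng.random() < P_t(t, s, a)[0] else 1
    return total

reward = rollout(lambda t, s: 0, horizon=20)
```

Note that both the kernel and the policy take `t` explicitly; this is the structural difference from the stationary case, where neither would.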
In continuous time, a non-stationary Markov jump process is characterized by a (possibly time-dependent) generator (rate) matrix $Q(t) = (q_{ij}(t))$ with
$$\frac{d}{dt} P(s, t) = P(s, t)\, Q(t), \qquad P(s, s) = I,$$
where each off-diagonal $q_{ij}(t) \ge 0$ and $\sum_j q_{ij}(t) = 0$ (Tiomela et al., 22 May 2025, Fischer et al., 9 Jun 2025).
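A numerical sketch of the forward equation, assuming a two-state jump process with an illustrative periodic rate matrix, integrates $P'(t) = P(t)\,Q(t)$ by an Euler scheme:

```python
import numpy as np

def Q(t):
    """Time-dependent generator: rows sum to zero (illustrative rates)."""
    lam = 1.0 + 0.5 * np.sin(t)   # rate 0 -> 1, periodic in t
    mu = 2.0                      # rate 1 -> 0, constant
    return np.array([[-lam, lam], [mu, -mu]])

def transition_matrix(t0, t1, steps=2000):
    """Euler integration of P'(t) = P(t) Q(t) with P(t0) = I."""
    P = np.eye(2)
    dt = (t1 - t0) / steps
    for k in range(steps):
        P = P + dt * (P @ Q(t0 + k * dt))
    return P

P = transition_matrix(0.0, 1.0)
```

Because each Euler factor $I + \Delta t\,Q(t)$ has unit row sums, the integrated matrix stays stochastic up to floating-point error.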
Examples of Explicit Non-Stationarity
- Time-dependent transition parameters: transition probabilities $p_{ij}(t)$ with $\sum_j p_{ij}(t) = 1$ for each $t$ (Tiomela et al., 22 May 2025).
- Exogenous parameter-driven transitions: $P_t = P(\theta_t)$ with $\theta_t$ a stochastic process, e.g., a Markov chain or random walk (Keplinger et al., 16 Jan 2025).
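The exogenous-parameter case can be sketched as follows; the clamped random walk for $\theta_t$ and its mapping to a stay-probability are illustrative assumptions:

```python
import random

def simulate(horizon, seed=0):
    """Two-state chain whose transition kernel P(theta_t) is driven by an
    exogenous parameter theta_t following a clamped random walk."""
    rng = random.Random(seed)
    theta, s, states = 0.5, 0, []
    for _ in range(horizon):
        # exogenous driver: random walk clamped to [0.1, 0.9]
        theta = min(0.9, max(0.1, theta + rng.uniform(-0.05, 0.05)))
        stay = theta if s == 0 else 1.0 - theta   # kernel P(theta_t)
        s = s if rng.random() < stay else 1 - s
        states.append(s)
    return states

traj = simulate(100)
```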
2. Modeling Frameworks and Classes
Markov Chains and Processes
Various classes arise from the specific structure of non-stationarity:
| Class | Defining Feature | Key Reference |
|---|---|---|
| Piecewise-stationary | Blocks of constant $P$, switching at change points | (Keplinger et al., 16 Jan 2025) |
| Smoothly time-varying | $P_t$ drifts continuously with $t$ | (Keplinger et al., 16 Jan 2025) |
| Periodic (cyclostationary) | $P_{t+T} = P_t$ for some period $T$ | (Fischer et al., 9 Jun 2025) |
| Exogenous parameter-driven | $\theta_t$ stochastic, $P_t = P(\theta_t)$ | (Keplinger et al., 16 Jan 2025) |
| Path-dependent Markovian | Transition rates depend on both the elapsed time and the current state | (Barraza et al., 5 Mar 2025) |
| Copula-based non-stationary | Markov property encoded via time-varying copulas | (Gobbi et al., 2017) |
| Switching MDP (SNS-MDP) | Underlying unobserved mode Markov chain | (Amiri et al., 24 Mar 2025) |
Non-stationarity may be abrupt (stepwise), continuous (drift), or periodic, with modeling choices depending on the dynamics under study (Keplinger et al., 16 Jan 2025).
3. Analytical Results and Computational Schemes
Chapman–Kolmogorov and Balance Systems
Non-stationary Markov processes obey time-dependent forward equations. For discrete time, balance systems relate compartment counts across successive steps via increment equations (Tiomela et al., 22 May 2025). In continuous time, the Kolmogorov forward equation generalizes as
$$\frac{d}{dt}\, p(t) = p(t)\, Q(t),$$
or, for controlled settings, the same equation with explicit policy dependence in $Q(t)$ (Tiomela et al., 22 May 2025, Fischer et al., 9 Jun 2025).
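In discrete time, the Chapman–Kolmogorov composition $P(s, u) = P(s, t)\,P(t, u)$ for $s \le t \le u$ can be checked numerically, since the two-time transition matrix is the ordered product of one-step matrices; the one-step matrices below are illustrative:

```python
import numpy as np

def P_step(t):
    """One-step transition matrix at time t (illustrative, time-varying)."""
    a = 0.5 + 0.4 * np.sin(0.3 * t)   # stay-probability in [0.1, 0.9]
    return np.array([[a, 1 - a], [0.3, 0.7]])

def P(s, u):
    """Two-time transition matrix: ordered product P_s P_{s+1} ... P_{u-1}."""
    M = np.eye(2)
    for t in range(s, u):
        M = M @ P_step(t)
    return M

lhs = P(0, 10)            # direct product over [0, 10)
rhs = P(0, 4) @ P(4, 10)  # composed through the intermediate time t = 4
```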
Limit Theorems and Long-run Behavior
Law of Large Numbers (LLN) and Central Limit Theorems (CLT) have been established for non-stationary Markov jump processes:
- Under mild regularity, the time-averaged cumulative reward converges almost surely to a limiting mean as $t \to \infty$.
- If transitions and rewards are periodic in , then the time-averaged reward converges to the periodic mean, and normalized fluctuations are asymptotically normal (Fischer et al., 9 Jun 2025).
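The periodic LLN can be illustrated numerically; the period-2 matrices and rewards below are assumptions, and the reference mean is computed from the stationary law of the period-skeleton chain $P_0 P_1$:

```python
import numpy as np

P0 = np.array([[0.7, 0.3], [0.4, 0.6]])   # kernel at even times
P1 = np.array([[0.2, 0.8], [0.5, 0.5]])   # kernel at odd times
r0 = np.array([1.0, 0.0])                 # reward at even times
r1 = np.array([0.0, 1.0])                 # reward at odd times

# Stationary distribution of the even-time skeleton chain M = P0 P1.
M = P0 @ P1
w, v = np.linalg.eig(M.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Periodic mean: average the even-time and odd-time expected rewards.
exact = 0.5 * (pi @ r0 + (pi @ P0) @ r1)

# Monte Carlo time average along one long trajectory.
rng = np.random.default_rng(0)
s, total, N = 0, 0.0, 100_000
for t in range(N):
    P, r = (P0, r0) if t % 2 == 0 else (P1, r1)
    total += r[s]
    s = rng.choice(2, p=P[s])
empirical = total / N
```

The empirical average approaches `exact` as the horizon grows, consistent with the periodic LLN.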
For certain classes, explicit limit cycles or absorbing structures can arise, as in time-inhomogeneous chains with feedback or reinforcement (Awoniyi, 2023).
Performance Approximations
For slowly varying transition parameters, rigorous first-order corrections to stationary performance measures (e.g., discounted rewards, hitting times, expected occupation times) are derived via linear systems with perturbed matrices, providing first-order-accurate approximations at computational cost identical to the stationary case (Zheng et al., 2018).
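The first-order scheme of (Zheng et al., 2018) is not reproduced here; as a simpler zeroth-order illustration, one can freeze the slowly varying matrix at a given time and solve the linear system for its stationary distribution (matrix and drift are illustrative assumptions):

```python
import numpy as np

def frozen_stationary(P):
    """Solve pi P = pi with sum(pi) = 1 as an overdetermined linear system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def P_slow(t, eps=0.01):
    """Slowly varying kernel: the drift rate eps sets the time scale."""
    a = 0.6 + 0.3 * np.sin(eps * t)
    return np.array([[a, 1 - a], [0.5, 0.5]])

pi_0 = frozen_stationary(P_slow(0))   # quasi-stationary law at t = 0
```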
4. Stochastic Diffusion, Anomalous Dynamics, and Memory
Non-Stationary Anomalous Diffusion
Markovian replication processes (NMRP) on lattices, with time-dependent replication probability $p(t)$, yield generalized telegrapher equations whose coefficients are determined by $p(t)$ (Choi et al., 2017). Classification is governed by the functional form of $p(t)$ (alternating, power-law, or marginal), producing a spectrum of diffusion behaviors (sub-, super-, or ultra-slow diffusion).
A further generalization introduces both state and time dependence in the transition rates. The dynamics balance a contagion term against a time-damping term, with a phase diagram (sub-, superdiffusive, ballistic, and hyperballistic regimes) indexed by the corresponding rate exponents (Barraza et al., 5 Mar 2025). Non-stationarity is necessary for all regimes except the ballistic case.
Deviations from Gaussianity and violations of the classical CLT arise generically due to non-stationarity and autocorrelation (Barraza et al., 5 Mar 2025, Choi et al., 2017).
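As a loose illustration of such non-stationary spreading (this is not the NMRP model itself), a persistent random walk whose direction-flip probability decays with time exhibits faster-than-diffusive growth of the mean-squared displacement:

```python
import numpy as np

def walk(T, seed):
    """Persistent walk: flip probability 0.5 / sqrt(t) decays with time,
    so late-time excursions become increasingly ballistic (illustrative)."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 1.0
    xs = np.empty(T)
    for t in range(1, T + 1):
        if rng.random() < 0.5 / t**0.5:
            v = -v
        x += v
        xs[t - 1] = x
    return xs

# Ensemble mean-squared displacement over 200 independent walkers.
paths = np.array([walk(2000, s) for s in range(200)])
msd = (paths**2).mean(axis=0)
```

For ordinary diffusion the MSD grows linearly in time; here the decaying flip rate pushes it well above that, a simple analogue of the superdiffusive regimes discussed above.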
5. Algorithmic and Control Implications
Reinforcement Learning and Decision Processes
Non-stationarity in MDPs fundamentally impacts both policy structure and algorithm design:
- Time-indexed value functions and Bellman recursions: $V_t$ and $Q_t$ are recomputed for each $t$, requiring time-aware dynamic programming or Q-learning (Chen et al., 17 Nov 2025, Tiomela et al., 22 May 2025, Keplinger et al., 16 Jan 2025).
- Switching environments: SNS-MDPs with latent Markovian mode switches retain TD-learning and Q-learning convergence due to ergodicity of the joint process (Amiri et al., 24 Mar 2025).
- Delayed reinforcement: In delayed MDPs, optimal policies must be non-stationary Markov (i.e., time-indexed, not time-invariant), as stationary Markov policies can be strictly sub-optimal when the delay is positive (Derman et al., 2021).
- Algorithmic approaches: ASP(RL), hybridization with logical solvers, and smooth forgetting via exponential weights in value estimation support adaptation to evolving dynamics (Keplinger et al., 16 Jan 2025, Touati et al., 2020, Ferreira et al., 2017).
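The time-aware dynamic programming above can be sketched as finite-horizon backward induction with one value function per stage; the two-state, two-action MDP below is an illustrative assumption:

```python
import numpy as np

H, S, A = 10, 2, 2   # horizon, states, actions

def P_t(t, a):
    """S x S transition matrix at time t under action a (illustrative drift)."""
    stay = max(0.1, (0.9 - 0.05 * t) if a == 0 else 0.5)
    return np.array([[stay, 1 - stay], [1 - stay, stay]])

def R_t(t):
    """Reward vector over states at time t (illustrative, period-2)."""
    return np.array([1.0, 0.0]) if t % 2 == 0 else np.array([0.0, 1.0])

# Backward induction: one value function V[t] and one policy slice per stage.
V = np.zeros((H + 1, S))
policy = np.zeros((H, S), dtype=int)
for t in reversed(range(H)):
    Qvals = np.stack([R_t(t) + P_t(t, a) @ V[t + 1] for a in range(A)])
    V[t] = Qvals.max(axis=0)
    policy[t] = Qvals.argmax(axis=0)
```

The point of the sketch is structural: unlike the stationary case, no single value function or policy suffices; both are indexed by $t$.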
Practical Benchmarks
Simulation toolkits such as NS-Gym enable systematic benchmarking of algorithms on non-stationary environments, offering a modular framework for emulating parametric (e.g., periodic, abrupt, or drifting) evolution of underlying MDP parameters (Keplinger et al., 16 Jan 2025).
6. Statistical, Dynamical, and Nonparametric Models
Bayesian nonparametric models construct non-stationary Markovian dynamics on real-valued data without pre-imposed functional forms or stationarity assumptions. For example, transition densities can be specified via Dirichlet process mixtures of bivariate normals, yielding time-homogeneous but marginally non-stationary Markov models suitable for capturing evolving or heteroscedastic time series (DeYoreo et al., 2016).
Similarly, copula-based constructions facilitate both the representation and verification of -mixing (absolute regularity) under time-varying dependence parameters, with explicit bounds on mixing rates related to the maximal-correlation coefficients of the evolving copulas (Gobbi et al., 2017).
7. Applications and Empirical Insights
Non-stationary Markov process modeling is central to a range of empirical domains:
- Epidemiological modeling: Time-varying compartment transition rates enable accurate simulation of disease waves, policy response, and resource allocation, outperforming stationary models which fail to capture non-equilibrium dynamics (Tiomela et al., 22 May 2025, Barraza et al., 5 Mar 2025).
- Healthcare and system maintenance: Feedback-driven non-stationary Markov chains predict treatment or repair cycles, optimizing resource management in complex service systems (Awoniyi, 2023).
- Queueing and service operations: Time-of-day or week-dependent rates require LLN/CLT development for performance analysis under realistic, fluctuating workloads (Fischer et al., 9 Jun 2025).
- Communications and adaptive protocols: Switching MDPs capture network channels with Markovian mode-switching (e.g., due to fading), guiding robust protocol adaptation (Amiri et al., 24 Mar 2025).
- Algorithmic robustness: Benchmark environments synthesized via NS-Gym, as well as theoretical regret bounds for non-stationary linear MDPs, illustrate the necessity of temporal adaptation and model update mechanisms (Keplinger et al., 16 Jan 2025, Touati et al., 2020).
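The epidemiological point above can be illustrated with a deterministic discrete-time SIR sketch whose transmission rate drops at an assumed intervention time; all parameters here are illustrative assumptions, not fitted values:

```python
def sir(T, n=1000, beta0=0.4, beta1=0.1, gamma=0.1, t_policy=50):
    """Discrete-time SIR with a time-varying transmission rate beta(t):
    beta drops from beta0 to beta1 at t_policy, mimicking an intervention."""
    s, i, r = float(n - 10), 10.0, 0.0
    hist = []
    for t in range(T):
        beta = beta0 if t < t_policy else beta1
        new_inf = beta * s * i / n   # time-varying S -> I flow
        new_rec = gamma * i          # constant I -> R flow
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        hist.append((s, i, r))
    return hist

hist = sir(200)
```

A stationary model with a single beta cannot reproduce the post-intervention decline; the time dependence is what bends the epidemic curve.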
Empirical evidence across these domains consistently demonstrates superior fidelity and policy efficacy when explicitly modeling or learning with non-stationary Markovian dynamics.
In summary, non-stationary Markov processes provide a canonical framework for representing, analyzing, and controlling complex systems in which time or exogenous factors drive structural shifts. Their mathematical characterization demands explicit temporal indexing or dynamic parameter evolution, and their effective deployment encompasses new algorithms, limit theorems, and empirical methodologies, all underpinned by a diverse and technically rigorous research literature (Tiomela et al., 22 May 2025, Amiri et al., 24 Mar 2025, Keplinger et al., 16 Jan 2025, Choi et al., 2017, Awoniyi, 2023, Barraza et al., 5 Mar 2025, Fischer et al., 9 Jun 2025, Ferreira et al., 2017, Zheng et al., 2018, Chen et al., 17 Nov 2025, Derman et al., 2021, DeYoreo et al., 2016, Touati et al., 2020, Gobbi et al., 2017).