HDP-HSMM: Bayesian Model with Explicit Durations
- HDP-HSMM is a Bayesian nonparametric time-series model that generalizes HMMs by incorporating explicit, arbitrary state-duration distributions.
- It addresses the geometric sojourn-time limitation of conventional models, leading to improved segmentation in applications like speaker diarization and driving pattern analysis.
- Efficient inference is achieved via blocked Gibbs samplers, and robust variants mitigate over-segmentation by merging redundant states.
A Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM) is a Bayesian nonparametric time-series model that generalizes both the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) and traditional hidden semi-Markov models (HSMMs). It allows an unbounded number of latent states and, crucially, supports arbitrary explicit state-duration distributions, overcoming the geometric sojourn-time limitation of conventional HMMs and HDP-HMMs. The model is particularly suited for sequential data where the temporal dynamics exhibit non-geometric segment lengths and where the number of dynamical regimes is unknown or potentially infinite (Johnson et al., 2012, Wang et al., 2017, Aguirre et al., 2023, Taniguchi et al., 2015).
1. Generative Construction and Hierarchical DP Priors
The HDP-HSMM extends the HDP-HMM by enriching the Markov process over hidden states with explicit, state-dependent duration distributions. The generative process is as follows:
- Global stick-breaking (GEM) prior for base state weights:
  $\beta \sim \mathrm{GEM}(\gamma)$
  forms a random probability measure over the (potentially infinite) set of states.
- State-specific transition measures:
  $\pi_j \sim \mathrm{DP}(\alpha, \beta), \quad j = 1, 2, \ldots$
  Each $\pi_j$ is a probability vector governing transitions out of state $j$, with base measure $\beta$. Self-transitions are removed by renormalization, $\bar{\pi}_{jk} \propto \pi_{jk}(1 - \delta_{jk})$.
- State-specific emission and duration parameters:
  $\theta_j \sim H, \quad \omega_j \sim G$
  $\theta_j$ parametrizes the emission distribution $f(y \mid \theta_j)$; $\omega_j$ parametrizes the duration law $g(d \mid \omega_j)$ (e.g., Poisson, negative-binomial, or any explicit non-geometric dwell law).
- Sequential generation, for segments $s = 1, 2, \ldots$:
  - Select the next super-state $z_s \sim \bar{\pi}_{z_{s-1}}$, with $z_0$ as a dummy start state.
  - Draw a segment length $D_s \sim g(\cdot \mid \omega_{z_s})$.
  - Emit observations i.i.d. from $f$: $y_{t_s}, \ldots, y_{t_s + D_s - 1} \overset{\text{iid}}{\sim} f(\cdot \mid \theta_{z_s})$, where $t_s = 1 + \sum_{s' < s} D_{s'}$.
- Joint probability of a segmentation and data:
  $p(z_{1:S}, D_{1:S}, y_{1:T}) = \prod_{s=1}^{S} \bar{\pi}_{z_{s-1} z_s}\, g(D_s \mid \omega_{z_s}) \prod_{t = t_s}^{t_s + D_s - 1} f(y_t \mid \theta_{z_s})$
Unlike the plain HDP-HMM, the HDP-HSMM explicitly decouples duration statistics from transition dynamics, thus breaking the geometric dwell-time constraint inherent to Markovian state transitions (Johnson et al., 2012, Wang et al., 2017); a minimal generative sketch follows.
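Below is a minimal generative sketch under a weak-limit truncation; the truncation level, concentration parameters, Gaussian emissions, and Poisson durations are illustrative assumptions, not choices prescribed by the cited papers.

```python
# Weak-limit HDP-HSMM generative sketch (illustrative hyperparameters).
import numpy as np

rng = np.random.default_rng(0)
L, T, gamma, alpha = 10, 500, 4.0, 4.0

beta = rng.dirichlet(np.full(L, gamma / L))          # weak-limit GEM(gamma)
pi = rng.dirichlet(alpha * beta, size=L)             # pi_j ~ Dir(alpha * beta)
np.fill_diagonal(pi, 0.0)                            # remove self-transitions...
pi /= pi.sum(axis=1, keepdims=True)                  # ...and renormalize: pi_bar
theta = rng.normal(0.0, 5.0, size=L)                 # emission means, theta_j ~ H
omega = rng.gamma(2.0, 5.0, size=L)                  # Poisson duration rates, omega_j ~ G

y, z, t, state = [], [], 0, rng.integers(L)          # dummy start state
while t < T:
    d = 1 + rng.poisson(omega[state])                # D_s ~ g(. | omega_{z_s}), support >= 1
    y.extend(rng.normal(theta[state], 1.0, size=d))  # i.i.d. emissions f(. | theta_{z_s})
    z.extend([state] * d)
    t += d
    state = rng.choice(L, p=pi[state])               # z_{s+1} ~ pi_bar_{z_s}
y, z = np.array(y[:T]), np.array(z[:T])              # truncate the final segment at T
```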
2. Posterior Inference: Blocked Gibbs Samplers
Inference in HDP-HSMMs is based on Markov chain Monte Carlo (MCMC) techniques, notably direct-assignment and weak-limit (finite truncation) blocked Gibbs samplers, leveraging segment-based updating to accelerate mixing:
- Direct-assignment sampler:
- Alternates updating super-state labels ($z_s$) and durations ($D_s$) using a Chinese Restaurant Franchise–style update.
- In each iteration:
- Update segment labels conditioned on the rest:
  $p(z_s = k \mid z_{\setminus s}, y) \propto p(z_s = k \mid z_{s-1})\, p(z_{s+1} \mid z_s = k)\, p(y_{t_s : t_s + D_s - 1}, D_s \mid k)$,
  where $p(y_{t_s : t_s + D_s - 1}, D_s \mid k)$ is the marginal likelihood of the segment under $(\theta_k, \omega_k)$.
- Re-segment given the current super-state chain via backward messages and forward sampling.
- Update stick-breaking weights $\beta$, transition distributions $\pi_j$, and emission and duration parameters $(\theta_j, \omega_j)$ from conjugate posteriors.
- Weak-limit approximation:
- Employs a finite Dirichlet prior of dimension $L$ for computational tractability:
  $\beta \sim \mathrm{Dir}(\gamma/L, \ldots, \gamma/L), \quad \pi_j \sim \mathrm{Dir}(\alpha\beta_1, \ldots, \alpha\beta_L), \quad j = 1, \ldots, L.$
- Blocked Gibbs steps alternate segment sampling (via semi-Markov message passing), emission/duration parameter updates, and parameter resampling.
Both approaches exploit semi-Markov message passing (backward-filtering/forward-sampling) to efficiently handle arbitrary duration laws. The weak-limit sampler generally exhibits improved mixing over direct assignment, especially for moderate truncation levels $L$ (Johnson et al., 2012, Taniguchi et al., 2015). A minimal sketch of the recursion appears below.
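The following is a minimal sketch of the backward-filtering/forward-sampling recursion for a finite (weak-limit) HSMM. It is not the authors' reference implementation: the array layout, the prefix-sum trick for segment likelihoods, and the omission of right-censoring for the final segment are simplifying assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def hsmm_messages(loglike, dur_logpmf, log_trans):
    """Backward messages for an explicit-duration HSMM with L states.

    loglike[t, i]     : log f(y_t | theta_i), shape (T, L)
    dur_logpmf[d-1, i]: log g(d | omega_i),   shape (T, L)
    log_trans[i, j]   : log pi_bar_{ij} (self-transitions excluded), (L, L)
    """
    T, L = loglike.shape
    cum = np.vstack([np.zeros(L), np.cumsum(loglike, axis=0)])  # prefix sums
    beta = np.full((T + 1, L), -np.inf)
    beta[T] = 0.0                                   # no data remains after T
    betastar = np.full((T, L), -np.inf)
    for t in range(T - 1, -1, -1):
        dmax = T - t
        seg = cum[t + 1:t + dmax + 1] - cum[t]      # log-lik of y_{t:t+d-1} per d
        # betastar[t, i]: a new segment of state i starts at t
        betastar[t] = logsumexp(dur_logpmf[:dmax] + seg + beta[t + 1:t + dmax + 1], axis=0)
        # beta[t, i]: a segment of state i ended at t-1; marginalize the successor
        beta[t] = logsumexp(log_trans + betastar[t][None, :], axis=1)
    return beta, betastar

def sample_stateseq(loglike, dur_logpmf, log_trans, log_pi0, rng):
    """One blocked-Gibbs resampling of (z_1, ..., z_T) via forward sampling."""
    T, L = loglike.shape
    beta, betastar = hsmm_messages(loglike, dur_logpmf, log_trans)
    cum = np.vstack([np.zeros(L), np.cumsum(loglike, axis=0)])
    z, t, logp = np.empty(T, dtype=int), 0, log_pi0 + betastar[0]
    while t < T:
        i = rng.choice(L, p=np.exp(logp - logsumexp(logp)))   # next super-state
        dmax = T - t
        logd = (dur_logpmf[:dmax, i]
                + cum[t + 1:t + dmax + 1, i] - cum[t, i]
                + beta[t + 1:t + dmax + 1, i])
        d = 1 + rng.choice(dmax, p=np.exp(logd - logsumexp(logd)))  # duration
        z[t:t + d] = i
        t += d
        if t < T:
            logp = log_trans[i] + betastar[t]                 # transition out of i
    return z
```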
3. Explicit-Duration Modeling and Its Impact
The distinguishing feature of the HDP-HSMM is the explicit parametrization of state dwell-time distributions. Each state's segment lengths are governed by an arbitrary duration law $g(\cdot \mid \omega_j)$, which can be Poisson, negative binomial, delayed geometric, or another suitable discrete measure. This flexibility permits modeling of non-geometric sojourns, critical for applications where HMMs and their nonparametric extensions (e.g., HDP-HMM) fail due to their implicit geometric assumption (Johnson et al., 2012):
- In a standard HMM (and HDP-HMM), the probability that state $i$ persists for $d$ consecutive observations is $p(d) = \pi_{ii}^{d-1}(1 - \pi_{ii})$, i.e., geometric.
- The HDP-HSMM instead allows $p(d) = g(d \mid \omega_i)$ for any explicit duration law, fully decoupled from the transition matrix.
- The model supports direct incorporation of prior knowledge about expected segment durations and enforces interpretable, application-specific dwell-time constraints.
A plausible implication is that the HDP-HSMM avoids "over-segmentation" and reduces false state switching, resulting in more semantically interpretable latent states (Johnson et al., 2012, Wang et al., 2017, Aguirre et al., 2023).
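As a concrete illustration of the contrast, the snippet below (with assumed, illustrative parameters) compares an HMM's implicit dwell law at self-transition probability 0.9 with an explicit shifted-Poisson duration law of the same mean dwell time; the geometric pmf always peaks at $d = 1$, whereas the Poisson concentrates near the mean.

```python
# Both dwell laws below have mean 10; only the shapes differ.
import numpy as np
from scipy import stats

d = np.arange(1, 31)
geom = stats.geom.pmf(d, p=0.1)          # HMM dwell law, self-transition 0.9
pois = stats.poisson.pmf(d - 1, mu=9.0)  # shifted Poisson duration, mean 10
print(d[np.argmax(geom)], d[np.argmax(pois)])  # mode at 1 vs. mode near 10
```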
4. Empirical Results and Applications
Speaker Diarization
In speaker diarization tasks (NIST Rich Transcription data), HDP-HSMMs outperform both the sticky HDP-HMM and the plain HDP-HMM:
- HDP-HSMM reaches low normalized Hamming error in substantially fewer Gibbs iterations than the sticky HDP-HMM, which requires 5k–30k iterations to attain comparable error.
- The inferred number of speakers closely matches ground truth in most cases.
- The approach models non-geometric speaking turns, yielding superior interpretability (Johnson et al., 2012).
Morse-Code Pattern Discovery
For Morse code audio, the HDP-HSMM recovers all true latent classes (dot, dash, silence), correctly modeling segment lengths, whereas the HDP-HMM collapses states (tone/silence only) and cannot distinguish dots from dashes due to the geometric duration assumption (Johnson et al., 2012).
Driving Pattern Analysis
Applied to naturalistic car-following data, HDP-HSMMs uncover a compact set of semantically meaningful primitive patterns per driver, supporting clustering and semantic labeling of driving behaviors. The explicit dwell-time modeling prevents the spurious rapid switching commonly observed with the HDP-HMM and sticky HDP-HMM (Wang et al., 2017, Aguirre et al., 2023).
Extension to Language Acquisition
The HDP-HSMM forms a building block for hierarchical models integrating language and acoustic constraints—e.g., the Hierarchical Dirichlet Process Hidden Language Model (HDP-HLM) for unsupervised discovery of words and phonemes directly from continuous speech signals (Taniguchi et al., 2015).
5. Robust Variants and Over-Splitting Mitigation
The nonparametric nature of the HDP prior can induce redundant state splitting (i.e., multiple near-identical clusters). The robust HDP-HSMM (rHDP-HSMM) addresses this by merging states with similar emission parameters within each Gibbs iteration:
- Redundant states are identified if their emission parameters are sufficiently close, $d(\theta_j, \theta_k) < \epsilon$, for a user-specified threshold $\epsilon$.
- Merging involves reassigning all timepoints from the redundant set to a single representative state and down-weighting the stick-breaking mass of the others (see the sketch after this list).
- Empirical studies show rHDP-HSMM yields fewer, more interpretable states, faster convergence, and stable parameter estimates, recovering the true number of states in the large majority of replicates where the unmodified HDP-HSMM almost never does (Aguirre et al., 2023).
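A schematic rendering of the merge step follows, assuming emissions summarized by their mean vectors and a Euclidean distance rule; the published criterion and the exact down-weighting of stick-breaking mass may differ in detail.

```python
import numpy as np

def merge_redundant_states(z, emission_means, eps):
    """Fold together states whose emission means lie within eps.

    z              : (T,) integer state labels from the current Gibbs sweep
    emission_means : dict mapping state label -> mean vector (np.ndarray)
    eps            : user-specified merge threshold
    """
    labels = sorted(set(z.tolist()))
    for idx, a in enumerate(labels):
        for b in labels[idx + 1:]:
            if np.linalg.norm(emission_means[a] - emission_means[b]) < eps:
                z[z == b] = a   # reassign all of b's timepoints to representative a
    return z
```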
6. Comparative Perspective and Implementation Considerations
The following comparison summarizes key differences relevant to practical time-series modeling:
| Model | Duration Law | State Cardinality | Transition Prior | Tendency to Over-Split? |
|---|---|---|---|---|
| HMM, HDP-HMM | Implicit geometric | Fixed/HDP | Categorical/HDP | Yes (HDP-HMM) |
| Sticky HDP-HMM | Implicit geometric (with self-bias) | HDP | Sticky HDP | Reduced |
| HDP-HSMM | Arbitrary explicit | HDP | HDP/Segmented | Yes (mitigated by rHDP) |
| rHDP-HSMM | Arbitrary explicit | HDP | HDP/Segmented | Suppressed (via merging) |
Efficient inference is achieved through blocked Gibbs sampling with backward-filter/forward-sample recursions; per-iteration computational complexity is $O(T^2 L + T L^2)$ for the truncated weak-limit sampler with truncation level $L$ on a length-$T$ sequence. Changepoint detection and precomputing emission likelihoods further reduce computational overhead (Johnson et al., 2012).
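A brief accounting, following the standard semi-Markov message-passing analysis, shows where the quadratic term arises and why capping the maximum segment length at $D_{\max}$ (as changepoint-based truncation effectively does) removes it:

$$\underbrace{\textstyle\sum_{t=1}^{T}(T - t)\,L}_{\text{duration sums}} \;+\; \underbrace{T L^2}_{\text{transition sums}} \;=\; O(T^2 L + T L^2) \;\longrightarrow\; O(T D_{\max} L + T L^2) \quad \text{when } d \le D_{\max}.$$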
7. Extensions and Ongoing Developments
The HDP-HSMM's explicit semi-Markov structure and Bayesian nonparametric machinery provide a foundation for further model extensions, such as:
- Factorial source models for power disaggregation (Johnson et al., 2012).
- Double-articulation analyzers for direct language acquisition from continuous speech via extensions like the HDP-HLM (Taniguchi et al., 2015).
- Robust inference via explicit state-merging schemes (rHDP-HSMM) to address over-fragmentation (Aguirre et al., 2023).
The model and its inference algorithms have been implemented in toolkits such as "pyhsmm", which supports blocked Gibbs sampling among other inference schemes, facilitating application to complex real-world structured time-series data; a brief usage sketch follows.
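The sketch below follows the pattern of the pyhsmm README; hyperparameter names and values are illustrative and may vary across library versions.

```python
# Fitting a weak-limit HDP-HSMM with pyhsmm (illustrative hyperparameters).
import numpy as np
import pyhsmm
import pyhsmm.basic.distributions as distributions

data = np.loadtxt('example_data.txt')     # assumed (T, obs_dim) observation array
obs_dim, Nmax = data.shape[1], 25         # Nmax: weak-limit truncation level

obs_hypparams = dict(mu_0=np.zeros(obs_dim), sigma_0=np.eye(obs_dim),
                     kappa_0=0.3, nu_0=obs_dim + 5)   # Gaussian emission prior
dur_hypparams = dict(alpha_0=2 * 30, beta_0=2)        # Poisson duration prior

model = pyhsmm.models.WeakLimitHDPHSMM(
    alpha=6., gamma=6., init_state_concentration=6.,
    obs_distns=[distributions.Gaussian(**obs_hypparams) for _ in range(Nmax)],
    dur_distns=[distributions.PoissonDuration(**dur_hypparams) for _ in range(Nmax)])

model.add_data(data, trunc=60)            # cap durations to speed message passing
for _ in range(150):                      # blocked Gibbs sweeps
    model.resample_model()
print(model.stateseqs[0])                 # one posterior sample of z_{1:T}
```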
References:
(Johnson et al., 2012, Johnson et al., 2012, Wang et al., 2017, Taniguchi et al., 2015, Aguirre et al., 2023)