Progressive k-Annealing: Adaptive Clustering
- Progressive k-Annealing is a data-adaptive learning paradigm that dynamically adjusts prototypes and model complexity through a temperature-based annealing process.
- It utilizes a temperature-parameterized free-energy formulation with Gibbs soft assignments and stochastic approximation to ensure robustness and convergence.
- The method automatically grows clusters via bifurcation phenomena, is robust to initialization, and extends to hierarchical and reinforcement learning frameworks.
Progressive k-Annealing is a data-adaptive learning paradigm in which both the number of prototypes ($k$) and their locations are dynamically adapted online as a function of a decreasing “annealing” parameter, generally the temperature $T$ or its inverse $\beta = 1/T$. This approach extends classical deterministic annealing for clustering and classification, enabling automatic model complexity growth through bifurcation phenomena while mitigating sensitivity to initialization and poor local minima. The methodology is grounded in free-energy minimization, stochastic approximation, and Bregman divergence regularization, yielding an interpretable, robust, and complexity-adaptive framework for unsupervised and supervised learning (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).
1. Free-Energy Formulation and Annealing Principle
At its core, Progressive k-Annealing replaces the non-convex hard-clustering objective with a temperature-parameterized free-energy functional:

$$F_T = D - T\,H,$$

where $D = \mathbb{E}\big[\sum_j p(j|X)\, d_\phi(X, \mu_j)\big]$ is the expected divergence between data points and prototypes, and $H = -\mathbb{E}\big[\sum_j p(j|X)\log p(j|X)\big]$ is the Shannon entropy of the assignment probabilities. Here $p(j|x)$ are soft assignments, and $d_\phi$ is a user-selected Bregman divergence (e.g., squared Euclidean distance, KL divergence) (Mavridis et al., 2021). As $T \to 0$, the entropy term vanishes, recapitulating hard assignments as in Lloyd’s algorithm; as $T \to \infty$, the assignments become uniform and complexity collapses to $k = 1$.
The soft assignment takes the Gibbs form:

$$p(j|x) = \frac{\exp\!\big(-d_\phi(x, \mu_j)/T\big)}{\sum_{l=1}^{k} \exp\!\big(-d_\phi(x, \mu_l)/T\big)},$$

which ensures the differentiability and tractability of the optimization path as $T$ is lowered. The cluster centroids update in closed form as weighted means:

$$\mu_j = \frac{\mathbb{E}\big[X\, p(j|X)\big]}{\mathbb{E}\big[p(j|X)\big]},$$

for all members of the Bregman divergence class (Mavridis et al., 2021).
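As a minimal illustration, the Gibbs assignment and the weighted-mean centroid update can be written as follows, assuming the squared-Euclidean divergence and a small in-memory dataset (the function names are illustrative):

```python
import math

def gibbs_assignments(x, mus, T):
    """Soft assignment p(j|x) proportional to exp(-||x - mu_j||^2 / T)."""
    logits = [-sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / T for mu in mus]
    m = max(logits)                       # log-sum-exp shift for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]

def update_centroids(data, mus, T):
    """Closed-form weighted-mean update mu_j = E[X p(j|X)] / E[p(j|X)]."""
    d, k = len(mus[0]), len(mus)
    num = [[0.0] * d for _ in range(k)]   # accumulates E[X p(j|X)]
    den = [0.0] * k                       # accumulates E[p(j|X)]
    for x in data:
        p = gibbs_assignments(x, mus, T)
        for j in range(k):
            den[j] += p[j]
            for i in range(d):
                num[j][i] += p[j] * x[i]
    return [[num[j][i] / den[j] for i in range(d)] for j in range(k)]
```

At large $T$ the assignments are nearly uniform and all centroids collapse toward the data mean; as $T$ decreases, assignments harden and the centroids separate.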
2. Bifurcation Phenomenon and Automatic Model Complexity Growth
Unlike fixed-$k$ clustering, the annealing process in Progressive k-Annealing enables the number of clusters to increase naturally via bifurcations as $T$ decreases. The critical temperature at which an existing prototype bifurcates is characterized by a loss of local stability of the free-energy minimum, most precisely by the criterion:

$$\det\!\Big[I - \tfrac{2}{T}\,\nabla^2\phi(\mu_j)\,\Sigma_{X|\mu_j}\Big] = 0,$$

where $\nabla^2\phi$ is the Hessian of the Bregman generator $\phi$, and $\Sigma_{X|\mu_j}$ is the local covariance matrix in the dual space. In the canonical Euclidean case, the condition simplifies to $T_c = 2\,\lambda_{\max}(\Sigma_{X|\mu_j})$ (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022). In practice, split detection can be implemented via the “virtual split” heuristic: after convergence at a given $T$, perturb each prototype by $\pm\delta$, update, and observe whether the perturbed prototypes diverge (indicating a true bifurcation and an increment in $k$) or coalesce (no split) (Mavridis et al., 2021).
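In the Euclidean case, the criterion above reduces to comparing $T$ with twice the largest eigenvalue of a cluster's local covariance; a minimal sketch with NumPy (the helper names are illustrative):

```python
import numpy as np

def critical_temperature(cluster_points):
    """Euclidean-case split criterion: a prototype loses stability when
    T drops below 2 * lambda_max of its local covariance Sigma_{X|mu}."""
    X = np.asarray(cluster_points, dtype=float)
    C = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # biased local covariance
    return 2.0 * float(np.linalg.eigvalsh(C)[-1])          # eigvalsh is ascending

def should_split(cluster_points, T):
    """True once the temperature has crossed below the critical value."""
    return T < critical_temperature(cluster_points)
```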
3. Stochastic Approximation and Online Prototype Updates
Progressive k-Annealing is realized via online, gradient-free stochastic approximation algorithms. For each prototype $\mu_i$, running estimates of the cluster responsibility $\rho_i$ and the assignment-weighted data sum $\sigma_i$ are tracked:

$$\rho_i(n+1) = \rho_i(n) + \alpha(n)\big[\hat{p}(i|x_n) - \rho_i(n)\big], \qquad \sigma_i(n+1) = \sigma_i(n) + \alpha(n)\big[x_n\,\hat{p}(i|x_n) - \sigma_i(n)\big],$$

with $\mu_i(n) = \sigma_i(n)/\rho_i(n)$, where the weights $\hat{p}(i|x_n)$ are derived from the Gibbs assignment, and $\alpha(n)$ is the step size ($\sum_n \alpha(n) = \infty$, $\sum_n \alpha^2(n) < \infty$). For settings involving local parametric models within clusters, a two-timescale stochastic approximation is employed: cluster prototypes evolve on a slow timescale $\alpha(n)$, and local model parameters are updated on a faster timescale $\beta(n)$, with $\alpha(n)/\beta(n) \to 0$ (Mavridis et al., 2022).
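The single-timescale recursions above can be sketched in pure Python, assuming the Euclidean divergence (the class and variable names `rho`/`sigma` mirror the running estimates and are illustrative):

```python
import math

class OnlinePrototype:
    """Tracks rho_i (responsibility mass) and sigma_i (weighted data sum);
    the prototype location is recovered as mu_i = sigma_i / rho_i."""
    def __init__(self, mu):
        self.rho = 1e-6                  # tiny init avoids division by zero
        self.sigma = [m * self.rho for m in mu]

    @property
    def mu(self):
        return [s / self.rho for s in self.sigma]

def online_step(prototypes, x, T, alpha):
    """One stochastic-approximation update driven by a single sample x."""
    logits = [-sum((xi - mi) ** 2 for xi, mi in zip(x, p.mu)) / T
              for p in prototypes]
    m = max(logits)                      # Gibbs weights, numerically stabilized
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    for p, wi in zip(prototypes, w):
        phat = wi / z
        # rho <- rho + alpha*(phat - rho); sigma <- sigma + alpha*(phat*x - sigma)
        p.rho += alpha * (phat - p.rho)
        p.sigma = [s + alpha * (phat * xi - s) for s, xi in zip(p.sigma, x)]
```

With a Robbins-Monro schedule such as $\alpha(n) = 1/(n+2)$, a single prototype fed alternating samples drifts to their running mean.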
Convergence of these updates to a local minimizer of the free energy $F_T$ is guaranteed under standard regularity and step-size conditions (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).
4. Algorithmic Workflow and Hyper-Parameter Control
The progressive k-annealing workflow can be summarized as follows (Mavridis et al., 2021, Mavridis et al., 2022):
Initialization:
- Select a Bregman divergence, a temperature schedule (typically $T_{t+1} = \gamma T_t$ with $\gamma \in (0,1)$), a perturbation size $\delta$, convergence/merge/pruning tolerances $\epsilon$, $\epsilon_m$, $\epsilon_p$, and a maximum $k_{\max}$.
- Start with $k = 1$ prototype.
Annealing Loop:
- For each temperature $T_t$:
- For each prototype, attempt a virtual split by $\pm\delta$.
- Iterate stochastic approximation updates until convergence.
- Prune coalesced or idle prototypes (small $\rho_i$).
- Increment $k$ when a genuine bifurcation is detected.
- Decrease $T$ for the next stage.
Termination:
- Optionally fine-tune assignments with hard clustering at $T = 0$.
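The workflow above can be sketched end-to-end as follows, using batch fixed-point updates at each temperature for brevity (an online variant would substitute the stochastic-approximation recursions); all names and default hyper-parameters here are illustrative:

```python
import numpy as np

def anneal_cluster(X, T0=None, gamma=0.8, delta=1e-3, eps=1e-4,
                   eps_merge=1e-2, T_min=1e-3, k_max=16, iters=200):
    """Progressive k-annealing sketch: geometric cooling, virtual splits,
    and merging of coalesced prototypes (Euclidean divergence)."""
    X = np.asarray(X, dtype=float)
    T = T0 if T0 is not None else 4.0 * np.var(X)   # heuristic: start above T_c
    mus = X.mean(axis=0, keepdims=True)             # k = 1
    rng = np.random.default_rng(0)
    while T > T_min and len(mus) <= k_max:
        # virtual split: perturb each prototype by +/- delta
        cand = np.concatenate([mus + delta * rng.standard_normal(mus.shape),
                               mus - delta * rng.standard_normal(mus.shape)])
        for _ in range(iters):                      # fixed-point iteration at this T
            d2 = ((X[:, None, :] - cand[None]) ** 2).sum(-1)
            p = np.exp(-(d2 - d2.min(1, keepdims=True)) / T)
            p /= p.sum(1, keepdims=True)            # Gibbs soft assignments
            new = (p.T @ X) / p.sum(0)[:, None]     # weighted-mean centroids
            done = np.abs(new - cand).max() < eps
            cand = new
            if done:
                break
        # merge coalesced prototypes (perturbation did not trigger a bifurcation)
        keep = []
        for mu in cand:
            if not any(np.linalg.norm(mu - m) < eps_merge for m in keep):
                keep.append(mu)
        mus = np.array(keep)
        T *= gamma                                  # cool for the next stage
    return mus
```

On well-separated data, this sketch starts from a single prototype and bifurcates into one prototype per mode once $T$ crosses the corresponding critical values.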
Hyper-parameter roles:
- $T_0$ determines initial smoothness (single-cluster stability).
- $\gamma$ sets the annealing rate (finer annealing steps vs. computational effort).
- $\delta$ and the merge/pruning thresholds ($\epsilon_m$, $\epsilon_p$) regularize the minimal cluster "resolution" and suppress redundant splits.
The table below summarizes key functional aspects:
| Element | Description | Typical Range |
|---|---|---|
| $T_0$ | Initial temperature (smoothness) | |
| $\gamma$ | Decay factor for temperature | $0.7$--$0.9$ |
| $\delta$ | Split perturbation size | |
| $\epsilon$ | Convergence tolerance for updates | Small |
| $\epsilon_m$ | Prototype merge threshold | Moderate ($\sim$ data scale) |
| $\epsilon_p$ | Pruning threshold for idle prototypes | Small |
5. Application Domains and Hierarchical Extensions
Progressive k-Annealing provides a general-purpose, online, robust clustering and classification methodology. It extends seamlessly to hierarchical and multi-resolution settings, as realized in Multi-Resolution Online Deterministic Annealing (MRODA), whereby codebooks are organized as a tree and ODA is invoked recursively within each node. This hierarchical organization localizes search, preserves computational tractability (per-sample cost scales with the tree depth $d$ and the per-node codebook size rather than the total number of cells), and naturally exploits data locality and variable-rate partitioning akin to deep architectures. Within each resolution, bifurcations drive local complexity growth, and the process yields adaptive, interpretable, variable-depth partitioning (Mavridis et al., 2022).
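The tree recursion can be illustrated structurally, with the node-level annealing clusterer abstracted behind a caller-supplied `fit_node` function (all names here are illustrative, not the MRODA implementation):

```python
import numpy as np

def build_tree(X, fit_node, depth, max_depth, min_points=4):
    """MRODA-style recursion sketch: fit a codebook at this node with the
    supplied clusterer, then recurse into each child's data partition."""
    X = np.asarray(X, dtype=float)
    if depth >= max_depth or len(X) < min_points:
        return {"mu": X.mean(axis=0), "children": []}   # leaf node
    mus = fit_node(X)                                   # codebook for this node
    if len(mus) < 2:                                    # no bifurcation here
        return {"mu": X.mean(axis=0), "children": []}
    labels = np.argmin(((X[:, None] - mus[None]) ** 2).sum(-1), axis=1)
    children = [build_tree(X[labels == j], fit_node, depth + 1, max_depth)
                for j in range(len(mus)) if np.any(labels == j)]
    return {"mu": X.mean(axis=0), "children": children}
```

Because each sample descends one branch per level, lookup touches only the codebooks along its root-to-leaf path.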
In reinforcement learning, the two-timescale stochastic approximation of k-annealing integrates with Q-learning via joint updates: fast-timescale temporal difference (TD) learning for Q-values and slower prototype updates, yielding an adaptive state-action aggregation scheme (Mavridis et al., 2022).
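A structural sketch of one joint update, assuming a discrete-action task, Euclidean nearest-prototype state aggregation, and illustrative step-size schedules (this is not the authors' exact algorithm):

```python
import numpy as np

def two_timescale_step(mus, Q, s, a, r, s_next, n, gamma_rl=0.99):
    """Fast-timescale TD(0) update of aggregate Q-values; slow-timescale
    drift of the state prototypes toward the states they represent."""
    beta = 1.0 / (1 + n) ** 0.6          # fast step size (Q-values)
    alpha = 1.0 / (1 + n)                # slow step size; alpha/beta -> 0
    i = int(np.argmin(((mus - s) ** 2).sum(-1)))       # aggregate state of s
    j = int(np.argmin(((mus - s_next) ** 2).sum(-1)))  # aggregate state of s'
    td = r + gamma_rl * Q[j].max() - Q[i, a]           # TD error on aggregated MDP
    Q[i, a] += beta * td                 # fast: temporal-difference learning
    mus[i] += alpha * (s - mus[i])       # slow: prototype tracks its cell
    return i, td
```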
6. Convergence Guarantees and Empirical Performance
Convergence of Progressive k-Annealing is established via the ODE method for stochastic approximation. Provided the step sizes satisfy $\sum_n \alpha(n) = \infty$ and $\sum_n \alpha^2(n) < \infty$, and standard martingale-difference conditions hold, the iterates converge almost surely to locally stable equilibria of the corresponding ODE in the parameter space. In the two-timescale case, joint convergence of prototype and local-model parameters is guaranteed (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).
Empirical evaluations show that Progressive k-Annealing:
- Automatically discovers the effective number of clusters $k$, with graceful complexity-performance trade-offs as measured by distortion or classification error.
- Matches or surpasses the online convergence rate and accuracy of batch deterministic annealing, k-means, and linear SVMs, and approaches the accuracy of shallow neural networks or random forests, often with greater interpretability (Mavridis et al., 2021).
- Is robust to initialization and avoids poor local minima by annealing from high-temperature solutions.
- Enables online “dial-in” of desired model complexity by stopping annealing early, providing flexible computational and representational control.
7. Relation to Classical Deterministic Annealing and Extensions
Progressive k-Annealing is a direct online, gradient-free stochastic-approximation realization of classical deterministic annealing frameworks (e.g., Rose ’98), where the number of clusters $k$ emerges at eigenvalue-driven bifurcation points as the temperature is reduced (Mavridis et al., 2021, Mavridis et al., 2022). The methodology extends to hierarchical architectures, two-timescale estimation, variable-resolution clustering/classification, and function approximation within each partition. In contrast to offline deterministic annealing, progressive k-annealing operates in a single pass, with the capacity for $k$ to grow as data and complexity demand, and provides online adaptivity and interpretability (Mavridis et al., 2022).