
Progressive k-Annealing: Adaptive Clustering

Updated 6 February 2026
  • Progressive k-Annealing is a data-adaptive learning paradigm that dynamically adjusts prototypes and model complexity through a temperature-based annealing process.
  • It utilizes a temperature-parameterized free-energy formulation with Gibbs soft assignments and stochastic approximation to ensure robustness and convergence.
  • The method automatically grows clusters via bifurcation phenomena, is robust to initialization, and extends to hierarchical and reinforcement learning frameworks.

Progressive k-Annealing is a data-adaptive learning paradigm in which both the number of prototypes $k$ and their locations are dynamically adapted online as a function of a decreasing "annealing" parameter, typically the temperature $T$ (or its inverse $\beta$). This approach extends classical deterministic annealing for clustering and classification, enabling automatic model complexity growth through bifurcation phenomena while mitigating sensitivity to initialization and poor local minima. The methodology is grounded in free-energy minimization, stochastic approximation, and Bregman divergence regularization, yielding an interpretable, robust, and complexity-adaptive framework for unsupervised and supervised learning (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).

1. Free-Energy Formulation and Annealing Principle

At its core, Progressive k-Annealing replaces the non-convex hard-clustering objective with a temperature-parameterized free-energy functional:

$$F_T[M,p] = D[M,p] - T\,H[p]$$

where

$$D[M,p] = \mathbb{E}_X\Bigl[\sum_{i=1}^k p(\mu_i|X)\, d(X,\mu_i)\Bigr]$$

is the expected divergence between data points and prototypes, and

$$H[p] = -\mathbb{E}_X\Bigl[\sum_{i=1}^k p(\mu_i|X)\, \log p(\mu_i|X)\Bigr]$$

is the Shannon entropy of the assignment probabilities. Here $p(\mu_i|x)$ are soft assignments, and $d(\cdot,\cdot)$ is a user-selected Bregman divergence (e.g., squared Euclidean distance, KL divergence) (Mavridis et al., 2021). As $T \to 0$, the entropy term vanishes, recovering hard assignments as in Lloyd's algorithm; as $T \to \infty$, the assignments become uniform and the complexity collapses to $k = 1$.

The soft assignment takes the Gibbs form:

$$p(\mu_i|x) = \frac{\exp(-d(x,\mu_i)/T)}{\sum_{j=1}^k \exp(-d(x,\mu_j)/T)}$$

which ensures differentiability and tractability of the optimization path as $T$ is lowered. The cluster centroids update in closed form as weighted means:

$$\mu_i^* = \frac{\mathbb{E}_X\left[X\, p(\mu_i|X)\right]}{\mathbb{E}_X\left[p(\mu_i|X)\right]}$$

for all members of the Bregman divergence class (Mavridis et al., 2021).
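As a concrete illustration, the Gibbs assignment and the weighted-mean centroid update can be sketched for the squared-Euclidean case. This is a minimal batch fixed-point iteration, not the authors' online algorithm; the synthetic data and function names are illustrative only:

```python
import numpy as np

def gibbs_assignments(X, mu, T):
    """Soft assignments p(mu_i | x) under the squared-Euclidean divergence."""
    # d[n, i] = ||x_n - mu_i||^2 for every point/prototype pair
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logits = -d / T
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def centroid_update(X, p):
    """Weighted-mean update mu_i = E[X p(mu_i|X)] / E[p(mu_i|X)]."""
    return (p.T @ X) / p.sum(axis=0)[:, None]

# Two well-separated Gaussian blobs; two prototypes started near the origin.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
mu = rng.normal(0, 0.1, (2, 2))
for _ in range(50):
    mu = centroid_update(X, gibbs_assignments(X, mu, T=0.5))
```

Because the chosen $T$ is far below the critical temperature of this data, the nearly coincident prototypes separate and settle near the two blob means.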

2. Bifurcation Phenomenon and Automatic Model Complexity Growth

Unlike fixed-$k$ clustering, the annealing process in Progressive k-Annealing enables the number of clusters to increase naturally via bifurcations as $T$ decreases. The critical temperature $T_c$ at which an existing prototype bifurcates is characterized by a loss of local stability of the free-energy minimum, precisely by the criterion:

$$\det\left[I - \frac{1}{T}\, H_\phi(\mu_i)^{1/2}\, \widetilde{C}_i\, H_\phi(\mu_i)^{1/2}\right] = 0$$

where $H_\phi$ is the Hessian of the Bregman generator $\phi$, and $\widetilde{C}_i$ is the local covariance matrix in the dual space. In the canonical Euclidean case, the condition simplifies to $\lambda_{\max}(C_i) = T_c/2$ (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022). In practice, split detection can be implemented via the "virtual split" heuristic: after convergence at a given $T$, perturb each prototype by $\pm\delta$, update, and observe whether the perturbed prototypes diverge (indicating a true bifurcation and an increment in $k$) or coalesce (no split) (Mavridis et al., 2021).
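In the Euclidean case, the critical temperature of a prototype can be estimated directly from the responsibility-weighted covariance via $T_c = 2\,\lambda_{\max}(C_i)$. The helper below is a hypothetical sketch of that computation; the data and names are illustrative:

```python
import numpy as np

def critical_temperature(X, p_i):
    """Euclidean split criterion: T_c = 2 * lambda_max(C_i), where C_i is the
    responsibility-weighted covariance of the data around prototype i."""
    w = p_i / p_i.sum()                  # normalized responsibilities
    mu = w @ X                           # weighted mean (the prototype location)
    diff = X - mu
    C = (w[:, None] * diff).T @ diff     # weighted covariance matrix
    return 2.0 * np.linalg.eigvalsh(C).max()

# A single prototype owning a bimodal 1-D dataset: variance ~ 1.04, so T_c ~ 2.1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.2, (100, 1)), rng.normal(1, 0.2, (100, 1))])
Tc = critical_temperature(X, np.ones(len(X)))
```

Once the schedule drops below this `Tc`, a virtual split of that prototype would diverge rather than coalesce.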

3. Stochastic Approximation and Online Prototype Updates

Progressive k-Annealing is realized via online, gradient-free stochastic approximation algorithms. For each prototype $i$, running estimates of the cluster responsibility $\rho_i$ and the assignment-weighted data sum $\sigma_i$ are tracked:

$$\begin{aligned} \rho_i(n+1) &= \rho_i(n) + \alpha_n\left[w_i(n) - \rho_i(n)\right] \\ \sigma_i(n+1) &= \sigma_i(n) + \alpha_n\left[x_n\, w_i(n) - \sigma_i(n)\right] \\ \mu_i(n+1) &= \frac{\sigma_i(n+1)}{\rho_i(n+1)} \end{aligned}$$

where the weights $w_i(n)$ are derived from the Gibbs assignment, and $\alpha_n$ is the step size ($\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$). For settings involving local parametric models within clusters, a two-timescale stochastic approximation is employed: cluster prototypes evolve on a slow timescale $\{\alpha_n\}$, and local model parameters are updated on a faster timescale $\{\beta_n\}$, with $\alpha_n/\beta_n \to 0$ (Mavridis et al., 2022).
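The single-timescale recursion above can be sketched as one online update per sample (squared-Euclidean Gibbs weights; the initialization and step-size schedule here are toy choices for illustration):

```python
import numpy as np

def oda_step(x, mu, rho, sigma, T, alpha):
    """One online stochastic-approximation update for all prototypes."""
    d = ((mu - x) ** 2).sum(axis=1)
    w = np.exp(-(d - d.min()) / T)
    w /= w.sum()                          # Gibbs weights w_i(n)
    rho += alpha * (w - rho)              # responsibility estimate rho_i
    sigma += alpha * (w[:, None] * x - sigma)  # weighted data sum sigma_i
    return sigma / rho[:, None], rho, sigma    # mu_i = sigma_i / rho_i

# Stream samples from N(0, 1) with a single prototype started far away at 5.
rng = np.random.default_rng(2)
mu = np.array([[5.0]])
rho = np.array([1e-2])
sigma = mu * rho
for n in range(2000):
    x = rng.normal(0.0, 1.0, size=1)
    mu, rho, sigma = oda_step(x, mu, rho, sigma, T=1.0, alpha=1.0 / (n + 2))
```

With the Robbins-Monro step sizes $\alpha_n = 1/(n+2)$, the prototype tracks the running weighted mean of the stream.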

Convergence of these updates to a local minimizer of $F_T$ is guaranteed under standard regularity and step-size conditions (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).

4. Algorithmic Workflow and Hyper-Parameter Control

The progressive k-annealing workflow can be summarized as follows (Mavridis et al., 2021, Mavridis et al., 2022):

Initialization:

  • Select the Bregman divergence, a temperature schedule $\{T_n\}$ (typically $T_{n+1} = \gamma T_n$ with $\gamma \in (0,1)$), the perturbation $\delta$, tolerances $\varepsilon_c$, $\epsilon_n$, $\epsilon_r$, and the maximum $k_{\max}$.
  • Start with $k = 1$ prototype.

Annealing Loop:

  • For each temperature $T_n$:
    • For each prototype, attempt a virtual split by $\pm\delta$.
    • Iterate the stochastic approximation updates until convergence.
    • Prune coalesced or idle prototypes (small $\rho_i$).
    • Increment $k$ when a genuine bifurcation is detected.
    • Decrease $T$ for the next stage.

Termination:

  • Optionally fine-tune assignments with hard clustering at $T \approx 0$.
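Putting the loop together, the workflow can be sketched in compact batch form (fixed-point iterations stand in for the online SA updates; the hyper-parameter values, merge rule, and helper names are illustrative, not the published implementation):

```python
import numpy as np

def soft_assign(X, mu, T):
    """Gibbs soft assignments under the squared-Euclidean divergence."""
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logits = -d / T
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def anneal(X, T_max=16.0, gamma=0.8, n_stages=20, delta=1e-2,
           eps_n=0.1, k_max=8, seed=0):
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0, keepdims=True)        # start with k = 1
    T = T_max
    for _ in range(n_stages):
        r = delta * rng.standard_normal(mu.shape)
        mu = np.vstack([mu + r, mu - r])      # virtual split of every prototype
        for _ in range(100):                  # converge at this temperature
            p = soft_assign(X, mu, T)
            mu = (p.T @ X) / p.sum(axis=0)[:, None]
        keep = []                             # prune coalesced duplicates
        for i in range(len(mu)):
            if all(np.linalg.norm(mu[i] - mu[j]) > eps_n for j in keep):
                keep.append(i)
        mu = mu[keep][:k_max]
        T *= gamma                            # cool down
    return mu

# Three 1-D Gaussian clusters; k should grow from 1 via bifurcations.
rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(c, 0.2, (100, 1)) for c in (-3.0, 0.0, 3.0)])
centers = np.sort(anneal(X).ravel())
```

Splits that survive the merge step mark genuine bifurcations; copies that coalesce are pruned, so $k$ grows only when the temperature crosses a critical value.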

Hyper-parameter roles:

  • $T_{\max}$ determines the initial smoothness (single-cluster stability).
  • $\gamma$ sets the annealing rate (finer proximal steps vs. computational effort).
  • $\delta$ and the merge/prune thresholds ($\epsilon_n$, $\epsilon_r$) regularize the minimal cluster "resolution" and suppress redundant splits.

The table below summarizes key functional aspects:

| Element | Description | Typical Range |
| --- | --- | --- |
| $T_{\max}$ | Initial temperature (smoothness) | $T_{\max} \gg \max_{x,\mu} d(x,\mu)$ |
| $\gamma$ | Decay factor for temperature | $0.7$–$0.9$ |
| $\delta$ | Split perturbation size | $\sim \varepsilon_c$ |
| $\varepsilon_c$ | Convergence tolerance for updates | small ($\ll 1$) |
| $\epsilon_n$ | Prototype merge threshold | moderate ($\sim$ data scale) |
| $\epsilon_r$ | Pruning threshold for idle prototypes | small |

5. Application Domains and Hierarchical Extensions

Progressive k-Annealing provides a general-purpose, online, robust clustering and classification methodology. It extends seamlessly to hierarchical and multi-resolution settings, as realized in Multi-Resolution Online Deterministic Annealing (MRODA), in which codebooks are organized as a tree and ODA is invoked recursively within each node. This hierarchical organization localizes the search, preserves computational tractability ($O(k^2 L)$ for depth $L$), and naturally exploits data locality and variable-rate partitioning akin to deep architectures. Within each resolution, bifurcations drive local complexity growth, and the process yields adaptive, interpretable variable-depth partitioning (Mavridis et al., 2022).

In reinforcement learning, the two-timescale stochastic approximation of k-annealing integrates with Q-learning via joint updates: fast-timescale temporal difference (TD) learning for Q-values and slower prototype updates, yielding an adaptive state-action aggregation scheme (Mavridis et al., 2022).
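The two-timescale coupling can be sketched in miniature. Here a piecewise-constant local model per prototype stands in for the TD/Q-value updates, a hard nearest-prototype assignment replaces the Gibbs weights for brevity, and the step-size exponents and names are purely illustrative:

```python
import numpy as np

def two_timescale_step(x, y, mu, theta, n):
    """Fast step for the local model theta_i, slow step for the prototype mu_i."""
    alpha = 1.0 / (n + 2)               # slow timescale {alpha_n}
    beta = 1.0 / (n + 2) ** 0.6         # fast timescale {beta_n}; alpha_n/beta_n -> 0
    i = int(np.argmin((mu - x) ** 2))   # hard nearest-prototype assignment
    theta[i] += beta * (y - theta[i])   # fast: local model tracks its target
    mu[i] += alpha * (x - mu[i])        # slow: prototype drifts toward its data
    return mu, theta

# Stream: inputs near -1 / +1, targets -2 / +2; two prototypes, two local models.
rng = np.random.default_rng(4)
mu = np.array([-0.5, 0.5])
theta = np.zeros(2)
for n in range(5000):
    x = rng.choice([-1.0, 1.0]) + 0.1 * rng.standard_normal()
    y = 2.0 if x > 0 else -2.0
    mu, theta = two_timescale_step(x, y, mu, theta, n)
```

Because the local models move on the faster timescale, each `theta[i]` effectively sees a quasi-static prototype, which is the separation the two-timescale convergence analysis relies on.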

6. Convergence Guarantees and Empirical Performance

Convergence of Progressive k-Annealing is established via the ODE method for stochastic approximation. Provided the step sizes satisfy $\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$ and standard martingale-difference conditions, the iterates converge almost surely to locally stable equilibria of the corresponding ODE in the parameter space. In the two-timescale case, joint convergence of prototype and local-model parameters is guaranteed (Mavridis et al., 2021, Mavridis et al., 2022, Mavridis et al., 2022).

Empirical evaluations show that Progressive k-Annealing:

  • Automatically discovers the effective number of clusters $k$, with graceful complexity–performance trade-offs as measured by distortion or classification error.
  • Matches or surpasses the online convergence rate and accuracy of batch deterministic annealing, k-means, and linear SVMs, and approaches the accuracy of shallow neural networks and random forests, often with greater interpretability (Mavridis et al., 2021).
  • Is robust to initialization and avoids poor local minima by annealing from high-$T$ solutions.
  • Enables online "dial-in" of the desired model complexity by stopping the annealing early, providing flexible computational and representational control.

7. Relation to Classical Deterministic Annealing and Extensions

Progressive k-Annealing is a direct online, gradient-free stochastic approximation realization of classical deterministic annealing frameworks (e.g., Rose '98), in which the number of clusters $k$ emerges at eigenvalue-driven bifurcation points as the temperature is reduced (Mavridis et al., 2021, Mavridis et al., 2022). The methodology extends to hierarchical architectures, two-timescale estimation, variable-resolution clustering/classification, and function approximation within each partition. In contrast to offline deterministic annealing, progressive k-annealing operates in a single pass, with the capacity to grow $k$ as the data and task complexity demand, and provides online adaptivity and interpretability (Mavridis et al., 2022).
