High-Dimensional Sparse Clustering via Iterative Semidefinite Programming Relaxed K-Means
Abstract: We propose an iterative algorithm for clustering high-dimensional data, where the true signal lies in a much lower-dimensional space. Our method alternates between feature selection and clustering, without requiring precise estimation of sparse model parameters. Feature selection is performed by thresholding a rough estimate of the discriminative direction, while clustering is carried out via a semidefinite programming (SDP) relaxation of K-means. In the isotropic case, the algorithm is motivated by the minimax separation bound for exact recovery of cluster labels using varying sparse subsets of features. This bound highlights the critical role of variable selection in achieving exact recovery. We further extend the algorithm to settings with unknown sparse precision matrices, avoiding full model parameter estimation by computing only the minimally required quantities. Across a range of simulation settings, we find that the proposed iterative approach outperforms several state-of-the-art methods, especially in higher dimensions.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.