
High-dimensional Clustering onto Hamiltonian Cycle

Published 27 Apr 2023 in cs.AI (arXiv:2304.14531v2)

Abstract: Clustering aims to group unlabelled samples based on their similarities and has become an important tool for analysing high-dimensional data. However, most clustering methods merely produce pseudo labels and therefore cannot simultaneously present the similarities between different clusters and the outliers. This paper proposes a new framework, High-dimensional Clustering onto Hamiltonian Cycle (HCHC), to address these problems. First, HCHC combines global and local structure in a single deep-clustering objective and refines the labels into relative probabilities, mining the similarities between different clusters while preserving the local structure within each cluster. Then, the cluster anchors are sorted along the optimal Hamiltonian cycle generated from the cluster similarities and placed on the circumference of a circle. Finally, each sample is mapped so that a higher probability for a cluster places it closer to the corresponding anchor. In this way, our framework makes three aspects visible simultaneously: clusters (formed by samples with high probabilities), cluster similarities (represented as circular distances), and outliers (recognized as dots far from all clusters). The experiments illustrate the superiority of HCHC.
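The circular mapping step can be illustrated with a short sketch. The Python snippet below is a rough approximation of the idea described in the abstract, not the authors' implementation: the names `P` (a sample-by-cluster matrix of relative probabilities) and `S` (a cluster-similarity matrix) are hypothetical inputs, and a brute-force cycle search stands in for whatever Hamiltonian-cycle solver the paper actually uses. Anchors are placed on a circle in cycle order, and each sample is drawn at the probability-weighted average of the anchors, so high-probability samples land near their cluster's anchor.

```python
# Minimal sketch of the Hamiltonian-cycle circular layout (illustration only,
# not the HCHC reference implementation). `P` and `S` are assumed inputs.
import itertools
import numpy as np

def hamiltonian_order(S):
    """Brute-force the cycle over clusters that maximizes the total similarity
    between neighbouring clusters (only feasible for a small number of clusters)."""
    K = S.shape[0]
    best_order, best_score = None, -np.inf
    for perm in itertools.permutations(range(1, K)):
        order = (0,) + perm
        score = sum(S[order[i], order[(i + 1) % K]] for i in range(K))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order)

def map_to_circle(P, S):
    """Place cluster anchors on the unit circle in cycle order and map each
    sample to the probability-weighted average of the anchors."""
    K = S.shape[0]
    order = hamiltonian_order(S)
    angles = np.empty(K)
    angles[order] = 2 * np.pi * np.arange(K) / K      # cycle order -> angles
    anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # K x 2
    return P @ anchors                                # n_samples x 2 coordinates

# Toy usage: 3 clusters, 4 samples.
S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
P = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.10, 0.85],
              [0.34, 0.33, 0.33]])   # nearly uniform row plots near the centre
print(map_to_circle(P, S))
```

In this simplified weighted-average layout a sample with no dominant cluster falls toward the centre rather than "far from all clusters"; how HCHC actually separates outliers is specified in the paper itself, not in this sketch.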
