
Constrained Hierarchical Clustering via Graph Coarsening and Optimal Cuts

Published 7 Dec 2023 in cs.LG and math.OC | (2312.04209v1)

Abstract: Motivated by extracting and summarizing relevant information in short sentence settings, such as satisfaction questionnaires, hotel reviews, and X/Twitter, we study the problem of clustering words in a hierarchical fashion. In particular, we focus on clustering with horizontal and vertical structural constraints. Horizontal constraints are typically cannot-link and must-link constraints among words, while vertical constraints are precedence constraints among cluster levels. We overcome state-of-the-art bottlenecks by formulating the problem in two steps. First, a soft-constrained regularized least-squares problem guides the result of a sequential graph coarsening algorithm towards the horizontal feasible set. Then, flat clusters are extracted from the resulting hierarchical tree by computing optimal cut heights based on the available constraints. We show that the resulting approach compares favorably with existing algorithms and is computationally light.
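
To make the two-step idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in SciPy's average-linkage agglomerative clustering for the graph coarsening step, and it uses a simple additive distance penalty in place of the regularized least-squares formulation. The function name `constrained_flat_clusters`, the `penalty` parameter, and the toy distance matrix are all hypothetical and chosen only for illustration. Vertical (precedence) constraints are not modeled.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def constrained_flat_clusters(D, must_link, cannot_link, penalty=0.05):
    """Illustrative stand-in, NOT the paper's method.

    Step 1: softly bias pairwise distances with must-link / cannot-link
            constraints, then build a tree (average linkage).
    Step 2: pick the cut height whose flat clustering satisfies the
            largest number of constraints.
    """
    D = np.asarray(D, dtype=float).copy()
    for i, j in must_link:      # pull must-link pairs closer together
        D[i, j] = D[j, i] = max(D[i, j] - penalty, 0.0)
    for i, j in cannot_link:    # push cannot-link pairs further apart
        D[i, j] = D[j, i] = D[i, j] + penalty

    # Hierarchical tree built from the condensed, penalized distances.
    Z = linkage(squareform(D, checks=False), method="average")

    best_labels, best_score = None, -1
    for h in Z[:, 2]:           # candidate cut heights = merge heights
        labels = fcluster(Z, t=h, criterion="distance")
        score = sum(labels[i] == labels[j] for i, j in must_link)
        score += sum(labels[i] != labels[j] for i, j in cannot_link)
        if score > best_score:  # keep the lowest height on ties
            best_labels, best_score = labels, score
    return best_labels, Z


# Toy example: items 0,1 are close, items 2,3 are close.
D = np.array([
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
])
labels, Z = constrained_flat_clusters(D, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

In this sketch the constraints influence both the tree (through the penalized distances) and the cut (through the constraint count), which mirrors the two-step structure described in the abstract only loosely.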

