
HPSCAN: Human Perception-Based Scattered Data Clustering

Published 27 Apr 2023 in cs.LG and cs.HC | (2304.14185v4)

Abstract: Cluster separation is a task typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, these algorithms are based on non-perceptual metrics, and our experiments demonstrate that their output does not reflect human cluster perception. To bridge the gap between human cluster perception and machine-computed clusters, we propose HPSCAN, a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on such data, we crowdsourced the labeling of 7,320 bivariate (scatterplot) datasets to 384 human participants. We train our HPSCAN model on these human-annotated data. Instead of rendering these data as scatterplot images, we use their x and y point coordinates as input to a modified PointNet++ architecture, enabling direct inference on point clouds. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate the perceptual agreement of cluster separation for real-world data. We also report the training and evaluation protocol for HPSCAN and introduce a novel metric that measures the accuracy between a clustering technique and a group of human annotators. We explore predicting point-wise human agreement to detect ambiguities. Finally, we compare our approach to ten established clustering techniques and demonstrate that HPSCAN is capable of generalizing to unseen and out-of-scope data.
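To make the idea of scoring a clustering against a group of human annotators concrete, the following is a minimal sketch. It is not the metric introduced in the paper: it substitutes the plain Rand index (pairwise agreement) averaged over annotators, and the function names and toy labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair in one cluster, or both separate it)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_agreement_with_group(predicted, annotations):
    """Average Rand index between one clustering and each annotator's labeling.
    Assumption: a simple stand-in, not the paper's annotator-group metric."""
    return sum(rand_index(predicted, a) for a in annotations) / len(annotations)


# Hypothetical toy data: six points labeled by three annotators.
annotators = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, different label names
    [0, 0, 1, 1, 2, 2],  # a finer, partly different partition
]
predicted = [0, 0, 0, 1, 1, 1]

score = mean_agreement_with_group(predicted, annotators)
print(f"mean agreement with annotators: {score:.3f}")
```

Comparing pairs of points rather than raw label values keeps the score independent of how each annotator happens to name their clusters, which matters because cluster IDs from different people (or algorithms) are arbitrary.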
