
Massively Parallel Computation of Similarity Matrices from Piecewise Constant Invariants

Published 10 Apr 2024 in stat.CO, cs.MS, and math.AT | (2404.07183v1)

Abstract: We present a computational framework for piecewise constant functions (PCFs) and use it for several types of computations that are useful in statistics, such as averages and similarity matrices. We give a linear-time, allocation-free algorithm for working with pairs of PCFs at machine precision. From this, we derive algorithms for computing reductions of several PCFs. The algorithms have been implemented in a highly scalable fashion for parallel execution on CPU and, in some cases, (multi-)GPU, and are provided in a Python package. In addition, we provide support for multidimensional arrays of PCFs and vectorized operations on these. As a stress test, we have computed a distance matrix from 500,000 PCFs using 8 GPUs.
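To make the idea of a linear-time, allocation-free pairwise pass concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and does not use the package's API. The representation (a pair of lists `(times, values)` with strictly increasing breakpoints starting at 0), the choice of the L1 distance, the finite integration bound `b`, and the names `pcf_l1_distance` and `distance_matrix` are all assumptions made for illustration.

```python
def pcf_l1_distance(f, g, b):
    """Integrate |f - g| over [0, b) for two piecewise constant functions.

    Assumed (hypothetical) representation: f = (times, values), where
    times is strictly increasing, times[0] == 0, and the function equals
    values[i] on [times[i], times[i + 1]); the last value holds up to b.
    """
    ft, fv = f
    gt, gv = g
    i = j = 0          # index of the current piece in f and in g
    t = 0.0            # left end of the current common interval
    total = 0.0
    while t < b:
        # Next breakpoint of each function, or b if there is none left.
        next_f = ft[i + 1] if i + 1 < len(ft) else b
        next_g = gt[j + 1] if j + 1 < len(gt) else b
        t_next = min(next_f, next_g, b)
        # Both functions are constant on [t, t_next).
        total += abs(fv[i] - gv[j]) * (t_next - t)
        # Advance whichever function(s) have a breakpoint at t_next.
        if i + 1 < len(ft) and next_f <= t_next:
            i += 1
        if j + 1 < len(gt) and next_g <= t_next:
            j += 1
        t = t_next
    return total


def distance_matrix(pcfs, b):
    """Symmetric matrix of pairwise distances (sequential, for illustration)."""
    n = len(pcfs)
    d = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1, n):
            d[r][c] = d[c][r] = pcf_l1_distance(pcfs[r], pcfs[c], b)
    return d


f = ([0.0, 1.0, 2.0], [1.0, 3.0, 0.0])
g = ([0.0, 1.5], [2.0, 0.0])
print(pcf_l1_distance(f, g, 3.0))
```

Each breakpoint of either function is visited once, so the pass is linear in the total number of breakpoints and keeps only a few scalars of state. The entries of the matrix are independent of one another, which is what makes a computation of this kind a natural candidate for parallel execution; the sequential double loop here only shows the structure.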

