Massively Parallel Computation of Similarity Matrices from Piecewise Constant Invariants
Abstract: We present a computational framework for piecewise constant functions (PCFs) and use it for several types of computations that are useful in statistics, such as averages and similarity matrices. We give a linear-time, allocation-free algorithm for working with pairs of PCFs at machine precision. From this, we derive algorithms for computing reductions of several PCFs. The algorithms have been implemented in a highly scalable fashion for parallel execution on CPU and, in some cases, (multi-)GPU, and are provided in a Python package. In addition, we provide support for multidimensional arrays of PCFs and vectorized operations on these. As a stress test, we have computed a distance matrix from 500,000 PCFs using 8 GPUs.
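To make the idea of a linear-time, allocation-free pairwise traversal concrete, the following is a minimal sketch in plain Python. It is not the package's implementation or API. The representation (a list of `(time, value)` breakpoints starting at time 0, with each value held until the next breakpoint and the last value held forever) and the names `l1_distance` and `distance_matrix` are assumptions made for illustration only.

```python
import math


def l1_distance(f, g):
    """L1 distance between two PCFs (hypothetical helper, illustration only).

    Each PCF is assumed to be a list of (time, value) pairs with strictly
    increasing times starting at 0. The value at index i holds on
    [time_i, time_{i+1}); the last value holds on [time_last, infinity).
    """
    i = j = 0          # current piece of f and of g
    t = 0.0            # left end of the current common interval
    total = 0.0
    while True:
        # Next breakpoint of each function, or infinity if none remains.
        tf = f[i + 1][0] if i + 1 < len(f) else math.inf
        tg = g[j + 1][0] if j + 1 < len(g) else math.inf
        t_next = min(tf, tg)
        diff = abs(f[i][1] - g[j][1])
        if t_next == math.inf:
            # Both functions are on their final piece: the tail is
            # either zero or makes the integral diverge.
            return total if diff == 0 else math.inf
        total += diff * (t_next - t)
        t = t_next
        # Advance whichever function(s) have a breakpoint at t_next.
        if tf == t_next:
            i += 1
        if tg == t_next:
            j += 1


def distance_matrix(pcfs):
    """Symmetric matrix of pairwise L1 distances (sequential sketch)."""
    n = len(pcfs)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d[a][b] = d[b][a] = l1_distance(pcfs[a], pcfs[b])
    return d


f = [(0, 1), (2, 0)]
g = [(0, 0), (1, 2), (3, 0)]
print(l1_distance(f, g))  # 4.0
```

The loop visits every breakpoint of either function exactly once and keeps only a few scalars, so the work is linear in the total number of breakpoints and needs no intermediate buffers. Each entry of the distance matrix is independent of the others, which is the property a parallel CPU or GPU implementation would exploit; the double loop here is purely sequential.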