Structured Transforms Across Spaces with Cost-Regularized Optimal Transport

Published 9 Nov 2023 in cs.LG, math.OC, and stat.ML | (2311.05788v2)

Abstract: Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points. When these measures live in the same metric space, the ground cost often defaults to its distance. When instantiated across two different spaces, however, choosing that cost in the absence of aligned data is a conundrum. As a result, practitioners often resort to solving instead a quadratic Gromov-Wasserstein (GW) problem. We exploit in this work a parallel between GW and cost-regularized OT, the regularized minimization of a linear OT objective parameterized by a ground cost. We use this cost-regularized formulation to match measures across two different Euclidean spaces, where the cost is evaluated between transformed source points and target points. We show that several quadratic OT problems fall in this category, and consider enforcing structure in the linear transform (e.g., sparsity), by introducing structure-inducing regularizers. We provide a proximal algorithm to extract such transforms from unaligned data, and demonstrate its applicability to single-cell spatial transcriptomics/multiomics matching tasks.


Summary

  • The paper demonstrates equivalence between quadratic OT formulations like Gromov-Wasserstein and cost-regularized OT to impose structured transforms across high-dimensional spaces.
  • The paper introduces Prox-ROT, a proximal algorithm leveraging sparsity and low-rank regularizers to efficiently handle structured linear transforms in complex datasets.
  • The method extends Monge map theory to linear, cost-regularized settings and shows strong numerical improvements in single-cell and spatial transcriptomics data integration.

Structured Transforms Across Spaces with Cost-Regularized Optimal Transport

The paper "Structured Transforms Across Spaces with Cost-Regularized Optimal Transport" addresses a significant challenge in optimal transport (OT): matching probability distributions across heterogeneous high-dimensional spaces when no ground cost function is given. The authors propose an approach that leverages cost-regularized optimal transport to solve such problems, including those encountered in single-cell spatial transcriptomics and multi-omics integration.

Overview of Contributions

  1. Cost-Regularized OT Framework: The paper provides a theoretical foundation by demonstrating equivalence between quadratic OT formulations—like Gromov-Wasserstein (GW)—and cost-regularized OT problems. This equivalence makes it possible to impose structure on transport tasks across different Euclidean spaces.
  2. Structured Linear Transforms: The authors propose a proximal algorithm, Prox-ROT, that enforces structure on the linear transformation between spaces. This is achieved through structure-inducing regularizers, such as sparsity and low-rank penalties, which are particularly beneficial for high-dimensional data.
  3. Monge Map Existence and Extension: The work extends Monge map theory, traditionally developed for OT problems where both distributions live in the same space, to linear cost-regularized problems across spaces. This yields new theoretical insights into the existence and practical computation of Monge maps in these settings.
  4. Practical Applications and Numerical Results: The paper applies these theoretical insights to real-world datasets, showing improvements over existing methods in tasks like integrating multi-omics single-cell data and aligning spatial transcriptomics measurements.
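The general recipe described above—alternating an OT solve, with the cost evaluated between transformed source points and target points, against a proximal update on the linear transform—can be sketched as follows. This is a hedged NumPy illustration, not the authors' Prox-ROT implementation: the function names, the plain Sinkhorn solver, the step size, and the choice of an entrywise l1 (sparsity) regularizer are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=200):
    """Entropic OT coupling for cost matrix C and marginals a, b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 (entrywise soft-thresholding)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def prox_rot_sketch(X, Y, lam=0.1, step=0.1, n_outer=50):
    """Alternate (i) an entropic OT solve with cost ||A x_i - y_j||^2 and
    (ii) a proximal-gradient step on A with the l1 penalty lam * ||A||_1."""
    n, d = X.shape
    m, p = Y.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    A = np.zeros((p, d))                    # linear transform R^d -> R^p
    for _ in range(n_outer):
        XA = X @ A.T                        # transformed source points, (n, p)
        C = ((XA[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        P = sinkhorn(C / max(C.mean(), 1e-12), a, b)  # normalize cost scale
        # gradient of sum_ij P_ij * ||A x_i - y_j||^2 with respect to A
        G = 2.0 * (((P.sum(1)[:, None] * XA) - P @ Y).T @ X)
        A = soft_threshold(A - step * G, step * lam)
    return A, P
```

The soft-thresholding step is what produces the sparse transform; swapping it for singular-value thresholding would instead encourage low rank, mirroring the other regularizer the summary mentions.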

Strong Numerical Results and Implications

The paper reports enhanced performance in single-cell data integration tasks measured by Label Transfer Accuracy and demonstrates efficient handling of high-dimensional spatial transcriptomics data through adaptive feature selection and low-rank solution spaces. These results suggest the method's practicality and efficiency in processing large datasets common in modern biological research.
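Label Transfer Accuracy is computed from an estimated coupling between an annotated source dataset and a target dataset. The sketch below shows a standard way to score it—predicting each target cell's label as the source label that sends it the most coupling mass—offered as an assumed recipe, not code from the paper.

```python
import numpy as np

def label_transfer_accuracy(P, source_labels, target_labels):
    """Predict each target point's label as the source label sending it the
    most coupling mass in P, then score against the ground-truth labels."""
    source_labels = np.asarray(source_labels)
    labels = np.unique(source_labels)
    # mass each label's source points send to every target point: (n_labels, m)
    mass = np.stack([P[source_labels == c].sum(axis=0) for c in labels])
    pred = labels[mass.argmax(axis=0)]
    return float((pred == np.asarray(target_labels)).mean())
```

For instance, a coupling that routes each annotated cell's mass to a target cell of the same type scores 1.0, while a coupling that mixes types scores proportionally lower.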

Implications for Future Work

This research opens pathways to further exploration of regularization techniques within the field of optimal transport, particularly for applications in bioinformatics. Future work may consider extending these techniques to neural network architectures or exploring different types of regularization approaches to tackle even more complex multimodal datasets.

In summary, the paper presents a methodologically sound approach to handling complex multivariate distribution matching problems by bridging gaps between linear and quadratic OT frameworks through cost-regularization. By offering theoretical results along with practical algorithmic solutions, this work significantly enriches the toolkit for researchers dealing with high-dimensional, heterogeneous data in fields like computational biology and beyond.