Recent Advances in Optimal Transport for Machine Learning

Published 28 Jun 2023 in cs.LG, math.PR, and stat.ML | arXiv:2306.16156v2

Abstract: Optimal Transport has recently been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. Rooted in a rich history and theory, it has offered new solutions to problems in machine learning such as generative modeling and transfer learning. In this survey we explore the contributions of Optimal Transport to Machine Learning over the period 2012–2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer, and reinforcement learning. We further highlight recent developments in computational Optimal Transport and its extensions, such as partial, unbalanced, Gromov, and Neural Optimal Transport, and their interplay with Machine Learning practice.
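As a minimal illustration of the computational Optimal Transport tooling the survey discusses, the sketch below compares two discrete solvers on toy data using the POT library: the exact linear-programming solver for the Kantorovich problem, min over couplings G of <G, M>, and Cuturi-style entropic regularization via Sinkhorn iterations. The Gaussian sample data, sample sizes, and regularization strength are illustrative assumptions, not values from the paper.

```python
# A minimal sketch (not from the paper): exact vs. entropic OT on toy data,
# using the POT library (https://pythonot.github.io/).
import numpy as np
import ot  # pip install pot

rng = np.random.default_rng(0)
n = 50
xs = rng.normal(loc=0.0, scale=1.0, size=(n, 2))  # source samples
xt = rng.normal(loc=3.0, scale=1.0, size=(n, 2))  # target samples

a = np.full(n, 1.0 / n)   # uniform weights on source points
b = np.full(n, 1.0 / n)   # uniform weights on target points
M = ot.dist(xs, xt)       # pairwise squared Euclidean cost matrix

# Exact Kantorovich problem: min_G <G, M> over couplings of (a, b),
# solved with a network-simplex linear program.
G_exact = ot.emd(a, b, M)

# Entropic regularization (Sinkhorn): penalizes the entropy of the plan,
# trading a small bias for much faster, parallelizable iterations.
reg = 0.1  # illustrative regularization strength (assumed, not from the paper)
G_sink = ot.sinkhorn(a, b, M, reg)

print("squared W2, exact:   ", float(np.sum(G_exact * M)))
print("squared W2, entropic:", float(np.sum(G_sink * M)))
```

The extensions named in the abstract have analogous entry points in the same toolbox (for example, unbalanced and Gromov-Wasserstein solvers); the snippet above only demonstrates the balanced, discrete case.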
