Multivariate Density Estimation via Variance-Reduced Sketching
Abstract: Multivariate density estimation is of great interest in various scientific and engineering disciplines. In this work, we introduce a new framework called Variance-Reduced Sketching (VRS), specifically designed to estimate multivariate density functions with a reduced curse of dimensionality. Our VRS framework treats multivariate functions as infinite-size matrices/tensors and applies a new sketching technique, motivated by the numerical linear algebra literature, to reduce the variance in density estimation problems. We demonstrate the robust numerical performance of VRS through a series of simulated experiments and real-world data applications. Notably, VRS shows remarkable improvement over existing neural network density estimators and classical kernel methods in numerous distribution models. Additionally, we offer theoretical justifications for VRS to support its ability to deliver density estimation with a reduced curse of dimensionality.
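To make the "density as a matrix" idea concrete, the following is a minimal sketch and not the VRS algorithm itself. It assumes a two-dimensional density on the unit square, discretizes it with a histogram (a finite stand-in for the infinite-size matrix), and applies a generic randomized range finder from the numerical linear algebra literature to project the noisy histogram onto a low-rank subspace. The function names `histogram_density` and `sketch_low_rank`, the grid size, the Beta test distribution, and the rank are all illustrative assumptions rather than choices taken from the paper.

```python
import numpy as np


def histogram_density(samples, bins):
    """Empirical 2-D density on [0, 1]^2, returned as a (bins x bins) matrix."""
    hist, _, _ = np.histogram2d(
        samples[:, 0],
        samples[:, 1],
        bins=bins,
        range=[[0.0, 1.0], [0.0, 1.0]],
        density=True,
    )
    return hist


def sketch_low_rank(matrix, rank, rng):
    """Generic randomized range finder (illustrative, not the paper's sketch).

    Multiplies `matrix` by a Gaussian test matrix, orthonormalizes the
    result, and projects `matrix` onto that `rank`-dimensional subspace.
    """
    omega = rng.standard_normal((matrix.shape[1], rank))
    q, _ = np.linalg.qr(matrix @ omega)  # orthonormal columns, shape (bins, rank)
    return q @ (q.T @ matrix)


rng = np.random.default_rng(0)

# Independent coordinates give a product density, which is rank 1 when
# viewed as a matrix indexed by (x, y). This is an assumed toy example.
samples = rng.beta(2.0, 5.0, size=(5000, 2))

raw = histogram_density(samples, bins=40)          # noisy, full-rank estimate
smoothed = sketch_low_rank(raw, rank=1, rng=rng)   # low-rank projected estimate

print("raw rank:", np.linalg.matrix_rank(raw))
print("sketched rank:", np.linalg.matrix_rank(smoothed))
```

The projection discards the components of the histogram that lie outside the sketched subspace, which for a genuinely low-rank density are mostly sampling noise. That is the intuition behind reducing variance by sketching; the estimator and its guarantees in the paper may differ substantially from this toy version.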