Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models
Abstract: In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning infeasible, the tensor-product structure of these features causes the number of model parameters to grow exponentially with the dimensionality of the data, prohibiting the treatment of high-dimensional problems. One possible approach to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension than their non-quantized counterparts, at no additional computational cost, while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data, and how it provides models with increased generalization capabilities. We finally benchmark our approach on a large-scale regression task, achieving state-of-the-art results on a laptop computer.
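The quantization idea in the abstract can be illustrated concretely for polynomial features: the length-2^d vector of monomials (1, x, ..., x^(2^d - 1)) factors exactly as a Kronecker product of d two-dimensional vectors [1, x^(2^k)], since any exponent can be written in binary. A quantized model then stores its weights as a tensor train over these d small modes and is evaluated without ever forming the exponentially long feature vector. The sketch below is illustrative only and not the authors' implementation; the function names and the chosen core ranks are assumptions.

```python
import numpy as np

def quantized_poly_features(x, d):
    """Build the length-2**d monomial vector (1, x, ..., x**(2**d - 1))
    as a Kronecker product of d small factors [1, x**(2**k)]."""
    phi = np.ones(1)
    for k in reversed(range(d)):  # k = d-1 is the slowest-varying bit
        phi = np.kron(phi, np.array([1.0, x ** (2 ** k)]))
    return phi

def tt_predict(cores, x):
    """Evaluate a quantized (tensor-train) model f(x) by contracting
    each core with its local factor [1, x**(2**k)]; the cost is linear
    in d rather than exponential."""
    result = np.ones((1, 1))
    for k, G in enumerate(cores):  # G has shape (r_prev, 2, r_next)
        v = np.array([1.0, x ** (2 ** k)])
        result = result @ np.einsum('i,aib->ab', v, G)
    return result[0, 0]
```

For d = 10 the quantized feature map represents 1024 monomials with ten rank-2 factors, and a rank-r tensor train stores O(d r^2) weights instead of 2^d, which is the parameter saving the paper exploits.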