- The paper introduces recurrence width as a unified structural parameter that subsumes traditional displacement rank and recurrence-based methods.
- It presents a divide-and-conquer algorithm that uses precomputed transition matrices to perform matrix-vector multiplication in near-linear time.
- The framework extends to matrices with quasiseparable operators, enabling efficient inversion and applications in coding theory, polynomial evaluation, and neural networks.
Two-Pronged Advances in Structured Dense Matrix Multiplication
This paper presents a unified framework for fast matrix-vector multiplication with structured dense matrices, introducing the concept of recurrence width and extending the class of low-displacement-rank matrices from the classical fixed operators to general quasiseparable operators. The work provides both algorithmic generalizations and a synthesis of previously disparate approaches, with implications for computational linear algebra, coding theory, and efficient neural network architectures.
Background and Motivation
Matrix-vector multiplication for generic dense N×N matrices requires Θ(N^2) operations. However, many structured matrices—such as Toeplitz, Hankel, Vandermonde, Cauchy, and orthogonal polynomial transforms—admit sub-quadratic or even near-linear time algorithms due to their compact parameterizations. The challenge is to identify broad classes of such matrices, develop unified representations, and design algorithms that efficiently exploit their structure for both multiplication and inversion.
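As a concrete instance of the near-linear regime, a Toeplitz matrix can be applied to a vector in O(N log N) time by embedding it in a 2N×2N circulant matrix, whose action the FFT diagonalizes. A minimal NumPy sketch (the helper name is illustrative, not from the paper):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r
    (c[0] == r[0]) by x in O(N log N) via circulant embedding."""
    n = len(x)
    # First column of the 2N x 2N circulant that contains T as its top-left block.
    col = np.concatenate([c, [0.0], r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

# Compare against the dense O(N^2) product.
n = 8
c = np.arange(1.0, n + 1)                              # first column
r = np.concatenate([[1.0], -np.arange(2.0, n + 1)])    # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.default_rng(0).standard_normal(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```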
Prior approaches have focused on two main axes:
- Displacement rank: Matrices A for which LA − AR (for fixed operator matrices L, R) has low rank, encompassing many classical structured matrices.
- Orthogonal polynomial transforms: Matrices whose rows are evaluations of orthogonal polynomials, characterized by low-width recurrences.
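The displacement property is easy to check numerically. A small sketch using the standard unit f-circulant operators Z_f (ones on the subdiagonal, f in the top-right corner): for any Toeplitz matrix T, the displacement Z_1 T − T Z_{−1} is supported on one row and one column, so its rank is at most 2.

```python
import numpy as np

def Z(n, f):
    """Unit f-circulant: ones on the subdiagonal, f in the top-right corner."""
    M = np.zeros((n, n))
    M[np.arange(1, n), np.arange(n - 1)] = 1.0
    M[0, n - 1] = f
    return M

n = 6
t = np.random.default_rng(1).standard_normal(2 * n - 1)
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])  # Toeplitz
E = Z(n, 1) @ T - T @ Z(n, -1)   # displacement LA - AR with L = Z_1, R = Z_{-1}
assert np.linalg.matrix_rank(E) <= 2
```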
Despite their algorithmic similarities, these classes have been treated separately, with different techniques and representations. This paper bridges these approaches by introducing a more general and fine-grained notion of structure.
Recurrence Width: A Unified Structural Parameter
The central contribution is the definition of recurrence width. For a matrix A ∈ F^{N×N}, recurrence width t means that the rows (or columns) of A can be generated by a linear recurrence of width t with polynomial coefficients:

a_i(X) = Σ_{j=1}^{t} g_{i,j}(X) · a_{i−j}(X)

where a_i(X) is the i-th row polynomial and the g_{i,j}(X) are polynomials of bounded degree.
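A familiar width-2 example (used here for illustration, not taken from the paper): the Chebyshev polynomials satisfy T_i(X) = 2X·T_{i−1}(X) − T_{i−2}(X), so the matrix whose rows hold their coefficient vectors has recurrence width 2, with g_{i,1}(X) = 2X and g_{i,2}(X) = −1.

```python
import numpy as np

def chebyshev_coefficient_matrix(N):
    """Rows a_i hold the coefficients of T_i(X), generated by the width-2
    recurrence a_i(X) = 2X * a_{i-1}(X) - a_{i-2}(X)."""
    A = np.zeros((N, N))
    A[0, 0] = 1.0
    if N > 1:
        A[1, 1] = 1.0
    for i in range(2, N):
        A[i, 1:] = 2 * A[i - 1, :-1]   # multiply a_{i-1}(X) by 2X
        A[i, :] -= A[i - 2, :]         # subtract a_{i-2}(X)
    return A

A = chebyshev_coefficient_matrix(5)
# T_4(X) = 8X^4 - 8X^2 + 1, coefficients in ascending order.
assert np.allclose(A[4], [1, 0, -8, 0, 8])
```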
Key properties:
- Generalization: Recurrence width subsumes all previously studied classes (orthogonal polynomial transforms, Toeplitz/Hankel/Vandermonde/Cauchy-like matrices).
- Finer granularity: It distinguishes between matrices that previous displacement rank or recurrence-based frameworks could not separate.
- Algorithmic unification: The same core algorithm applies to all matrices with bounded recurrence width, regardless of their origin.
Fast Algorithms for Matrix-Vector Multiplication
The paper develops a divide-and-conquer algorithm for matrix-vector multiplication with matrices of low recurrence width. The main steps are:
- Preprocessing: Compute transition matrices T_{[l:r]} that encode the recurrence over dyadic intervals. This step requires O(t^ω M(N) log N) operations, where M(N) is the cost of degree-N polynomial multiplication and ω is the matrix multiplication exponent.
- Multiplication: Given a vector b, the product A^T b (and, by transposition, Ab) can be computed in O(t^2 M(N) log N) operations. The algorithm recursively partitions the problem, leveraging the recurrence structure to reduce the effective problem size at each step.
- Inversion and solvers: For invertible matrices of low recurrence width, the same structural properties allow for efficient solvers (i.e., computing A^{-1}b) in O(t^2 M(N) log^2 N) time, using recursive block elimination and the same precomputed transitions.
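The role of the precomputed transitions can be illustrated in the simplest possible setting: a width-2 recurrence with constant scalar coefficients, where each step is a fixed 2×2 companion matrix and the dyadic intervals form a product tree. This toy sketch (Fibonacci numbers; function names are illustrative) omits the polynomial coefficients that the full algorithm handles:

```python
import numpy as np

def dyadic_transitions(companions):
    """Build transition matrices over dyadic intervals by pairwise products.

    companions: list (length a power of 2) of per-step t x t companion
    matrices C_i with state_{i+1} = C_i @ state_i. Returns all tree levels.
    """
    levels = [companions]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent covers a dyadic interval: apply the left child first.
        levels.append([prev[k + 1] @ prev[k] for k in range(0, len(prev), 2)])
    return levels

# Width-2 example: the Fibonacci recurrence f_i = f_{i-1} + f_{i-2},
# so every per-step companion matrix is [[1, 1], [1, 0]].
N = 8
C = [np.array([[1.0, 1.0], [1.0, 0.0]]) for _ in range(N)]
levels = dyadic_transitions(C)
root = levels[-1][0]                    # T_[0:N], the product of all N steps
state = root @ np.array([1.0, 0.0])     # start from (f_1, f_0) = (1, 0)
assert np.isclose(state[0], 34.0)       # (f_9, f_8) = (34, 21)
```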
Pseudocode Sketch
```python
def fast_matvec(A_params, b):
    # A_params: recurrence coefficients, initial conditions, etc.
    # Precompute transition matrices T_{[l:r]} for all dyadic intervals
    T = precompute_transitions(A_params)
    # Recursively compute the product using divide-and-conquer
    return recursive_multiply(T, b)
```
The algorithm is optimal up to polylogarithmic factors with respect to the input size for fixed t.
Extension to Displacement Rank with Quasiseparable Operators
The second major contribution is the extension of fast algorithms to matrices with low displacement rank with respect to quasiseparable L and R operators. Quasiseparable matrices generalize banded and block companion matrices, and are defined by low-rank off-diagonal blocks.
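Quasiseparability can be checked directly from the definition: a matrix is t-quasiseparable if every off-diagonal block (taken strictly above or strictly below the diagonal) has rank at most t. A small NumPy sketch verifying that a tridiagonal matrix, the simplest banded case, is 1-quasiseparable:

```python
import numpy as np

def quasiseparable_order(A):
    """Maximum rank over all off-diagonal blocks A[:i, i:] and A[i:, :i]."""
    n = A.shape[0]
    return max(
        max(np.linalg.matrix_rank(A[:i, i:]) for i in range(1, n)),
        max(np.linalg.matrix_rank(A[i:, :i]) for i in range(1, n)),
    )

rng = np.random.default_rng(2)
n = 8
# A tridiagonal matrix: each block strictly above (or below) the diagonal
# contains a single nonzero entry, hence has rank at most 1.
A = (np.diag(rng.standard_normal(n))
     + np.diag(rng.standard_normal(n - 1), 1)
     + np.diag(rng.standard_normal(n - 1), -1))
assert quasiseparable_order(A) == 1
```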
- Main result: If A satisfies LA − AR = E with L, R t-quasiseparable and E of rank r, then A and A^T admit matrix-vector multiplication in O(r t^ω M(N) log^2 N + r t^2 M(N) log^3 N) operations.
- Algorithmic reduction: The problem reduces to computing rational functions of the form b^T (XI − R)^{-1} c for vectors b, c, which can be solved recursively using the Woodbury identity and the self-similar structure of quasiseparable matrices.
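The Woodbury identity invoked here is the standard matrix inversion lemma, (A + UCV)^{-1} = A^{-1} − A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}: it trades an N×N inverse for an r×r one when the perturbation UCV has rank r. A quick numerical check of the identity itself:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 8, 2
A = np.diag(rng.uniform(1.0, 2.0, n))   # easy-to-invert base matrix
U = rng.standard_normal((n, r))
C = np.eye(r)
V = rng.standard_normal((r, n))

Ainv = np.diag(1.0 / np.diag(A))
# Woodbury: only an r x r system appears on the right-hand side.
small = np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U)
woodbury = Ainv - Ainv @ U @ small @ V @ Ainv
assert np.allclose(woodbury, np.linalg.inv(A + U @ C @ V))
```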
This result strictly generalizes all previous displacement rank-based fast multiplication results, including those for block companion matrices and all classical structured matrices.
Practical and Theoretical Implications
Main Results and Claims
- Optimality: The algorithms run in time quasi-linear in the size of the parameterization (i.e., optimal up to log factors) for all classes considered.
- Generality: All known structured dense matrices with fast multiplication algorithms are captured as special cases.
- Expressivity: The recurrence width hierarchy is strict; recurrences of width t cannot represent all matrices of width t+1.
Applications
- Polynomial evaluation/interpolation: Multipoint evaluation of multivariate polynomials reduces to multiplication by low recurrence width matrices.
- Coding theory: Encoding for multiplicity codes and related constructions can be performed efficiently using these algorithms.
- Neural networks: Structured layers (e.g., Toeplitz-like, block companion) used for model compression and acceleration are special cases; the framework allows for more expressive and efficient structured layers.
- Control theory: The Sylvester and Stein equations, central in system theory, are efficiently solvable for large classes of operators.
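The polynomial-evaluation connection is most familiar in the univariate case: evaluating p(X) = Σ_j c_j X^j at points x_0, …, x_{N−1} is exactly the Vandermonde product Vc with V[i, j] = x_i^j, so fast algorithms for V are fast multipoint-evaluation algorithms. A NumPy check of the identity (not of the fast algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
x = rng.standard_normal(N)                 # evaluation points
c = rng.standard_normal(N)                 # polynomial coefficients, ascending
V = np.vander(x, N, increasing=True)       # V[i, j] = x_i ** j
# Multipoint evaluation == Vandermonde matrix-vector product.
assert np.allclose(V @ c, np.polynomial.polynomial.polyval(x, c))
```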
Limitations and Trade-offs
- Preprocessing cost: For very large t or r, preprocessing may dominate unless amortized over many multiplications.
- Numerical stability: The algorithms focus on exact arithmetic; stability for floating-point computation is not addressed and may require further analysis.
- Parameter recovery: Efficiently recovering the structured parameterization from a given matrix is possible for low recurrence width, but may be hard in general.
Future Directions
- Numerical analysis: Extending the framework to stable floating-point algorithms for real/complex matrices.
- Sparse and approximate structure: Exploiting sparsity or approximate low recurrence width for further acceleration.
- Learning structure: Automatic discovery of recurrence width or displacement structure in data-driven settings.
- Generalized operators: Extending to broader classes of operators beyond quasiseparable, possibly via hierarchical or multilevel structures.
Conclusion
This work provides a comprehensive and unified theory for fast matrix-vector multiplication with structured dense matrices, subsuming and generalizing all previously known classes. The introduction of recurrence width as a structural parameter enables both theoretical insight and practical algorithms, with broad applicability across computational mathematics, coding theory, and machine learning. The extension to quasiseparable displacement operators further broadens the class of tractable matrices, establishing a new foundation for structured linear algebra.