- The paper introduces recurrence width as a unified structural parameter that subsumes traditional displacement rank and recurrence-based methods.
- It presents a divide-and-conquer algorithm that uses precomputed transition matrices to perform matrix-vector multiplication in near-linear time.
- The framework extends to matrices with quasiseparable operators, enabling efficient inversion and applications in coding theory, polynomial evaluation, and neural networks.
Two-Pronged Advances in Structured Dense Matrix Multiplication
This paper presents a unified framework for fast matrix-vector multiplication with structured dense matrices, introducing the concept of recurrence width and extending the class of low-displacement-rank matrices from the classical fixed operators to general quasiseparable operators. The work provides both algorithmic generalizations and a synthesis of previously disparate approaches, with implications for computational linear algebra, coding theory, and efficient neural network architectures.
Background and Motivation
Matrix-vector multiplication for generic dense N×N matrices requires Θ(N^2) operations. However, many structured matrices—such as Toeplitz, Hankel, Vandermonde, Cauchy, and orthogonal polynomial transforms—admit sub-quadratic or even near-linear time algorithms due to their compact parameterizations. The challenge is to identify broad classes of such matrices, develop unified representations, and design algorithms that efficiently exploit their structure for both multiplication and inversion.
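As a concrete instance of the near-linear regime, a Toeplitz matrix can be applied to a vector in O(N log N) time by embedding it in a 2N×2N circulant matrix, whose action the FFT diagonalizes. A minimal NumPy sketch (the helper name is illustrative, not from the paper):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix with first column c and first row r
    (c[0] == r[0]) by x in O(N log N) via circulant embedding."""
    n = len(x)
    # First column of the 2N x 2N circulant that contains T as its top-left block.
    col = np.concatenate([c, [0.0], r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

# Compare against the dense O(N^2) product.
n = 8
c = np.arange(1.0, n + 1)                              # first column
r = np.concatenate([[1.0], -np.arange(2.0, n + 1)])    # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.default_rng(0).standard_normal(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```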
Prior approaches have focused on two main axes:
- Displacement rank: Matrices A for which LA − AR (for fixed operator matrices L, R) has low rank, encompassing many classical structured matrices.
- Orthogonal polynomial transforms: Matrices whose rows are evaluations of orthogonal polynomials, characterized by low-width recurrences.
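The displacement property is easy to check numerically. A small sketch using the standard unit f-circulant operators Z_f (ones on the subdiagonal, f in the top-right corner): for any Toeplitz matrix T, the displacement Z_1 T − T Z_{−1} is supported on one row and one column, so its rank is at most 2.

```python
import numpy as np

def Z(n, f):
    """Unit f-circulant: ones on the subdiagonal, f in the top-right corner."""
    M = np.zeros((n, n))
    M[np.arange(1, n), np.arange(n - 1)] = 1.0
    M[0, n - 1] = f
    return M

n = 6
t = np.random.default_rng(1).standard_normal(2 * n - 1)
T = np.array([[t[i - j + n - 1] for j in range(n)] for i in range(n)])  # Toeplitz
E = Z(n, 1) @ T - T @ Z(n, -1)   # displacement LA - AR with L = Z_1, R = Z_{-1}
assert np.linalg.matrix_rank(E) <= 2
```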
Despite their algorithmic similarities, these classes have been treated separately, with different techniques and representations. This paper bridges these approaches by introducing a more general and fine-grained notion of structure.
Recurrence Width: A Unified Structural Parameter
The central contribution is the definition of recurrence width. For a matrix A ∈ F^{N×N}, recurrence width t means that the rows (or columns) of A can be generated by a linear recurrence of width t with polynomial coefficients:

a_i(X) = Σ_{j=1}^{t} g_{i,j}(X) · a_{i−j}(X)

where a_i(X) is the i-th row polynomial and the g_{i,j}(X) are polynomials of bounded degree.
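A familiar width-2 example (used here for illustration, not taken from the paper): the Chebyshev polynomials satisfy T_i(X) = 2X·T_{i−1}(X) − T_{i−2}(X), so the matrix whose rows hold their coefficient vectors has recurrence width 2, with g_{i,1}(X) = 2X and g_{i,2}(X) = −1.

```python
import numpy as np

def chebyshev_coefficient_matrix(N):
    """Rows a_i hold the coefficients of T_i(X), generated by the width-2
    recurrence a_i(X) = 2X * a_{i-1}(X) - a_{i-2}(X)."""
    A = np.zeros((N, N))
    A[0, 0] = 1.0
    if N > 1:
        A[1, 1] = 1.0
    for i in range(2, N):
        A[i, 1:] = 2 * A[i - 1, :-1]   # multiply a_{i-1}(X) by 2X
        A[i, :] -= A[i - 2, :]         # subtract a_{i-2}(X)
    return A

A = chebyshev_coefficient_matrix(5)
# T_4(X) = 8X^4 - 8X^2 + 1, coefficients in ascending order.
assert np.allclose(A[4], [1, 0, -8, 0, 8])
```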
Key properties:
- Generalization: Recurrence width subsumes all previously studied classes (orthogonal polynomial transforms, Toeplitz/Hankel/Vandermonde/Cauchy-like matrices).
- Finer granularity: It distinguishes between matrices that previous displacement rank or recurrence-based frameworks could not separate.
- Algorithmic unification: The same core algorithm applies to all matrices with bounded recurrence width, regardless of their origin.
Fast Algorithms for Matrix-Vector Multiplication
The paper develops a divide-and-conquer algorithm for matrix-vector multiplication with matrices of low recurrence width. The main steps are:
- Preprocessing: Compute transition matrices T_{[l:r]} that encode the recurrence over dyadic intervals. This step requires O(t^ω M(N) log N) operations, where M(N) is the cost of degree-N polynomial multiplication and ω is the matrix multiplication exponent.
- Multiplication: Given a vector b, the product A^T b (and, by transposition, Ab) can be computed in O(t^2 M(N) log N) operations. The algorithm recursively partitions the problem, leveraging the recurrence structure to reduce the effective problem size at each step.
- Inversion and solvers: For invertible matrices of low recurrence width, the same structural properties allow for efficient solvers (i.e., computing A^{-1}b) in O(t^2 M(N) log^2 N) time, using recursive block elimination and the same precomputed transitions.
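The role of the precomputed transitions can be illustrated in the simplest possible setting: a width-2 recurrence with constant scalar coefficients, where each step is a fixed 2×2 companion matrix and the dyadic intervals form a product tree. This toy sketch (Fibonacci numbers; function names are illustrative) omits the polynomial coefficients that the full algorithm handles:

```python
import numpy as np

def dyadic_transitions(companions):
    """Build transition matrices over dyadic intervals by pairwise products.

    companions: list (length a power of 2) of per-step t x t companion
    matrices C_i with state_{i+1} = C_i @ state_i. Returns all tree levels.
    """
    levels = [companions]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent covers a dyadic interval: apply the left child first.
        levels.append([prev[k + 1] @ prev[k] for k in range(0, len(prev), 2)])
    return levels

# Width-2 example: the Fibonacci recurrence f_i = f_{i-1} + f_{i-2},
# so every per-step companion matrix is [[1, 1], [1, 0]].
N = 8
C = [np.array([[1.0, 1.0], [1.0, 0.0]]) for _ in range(N)]
levels = dyadic_transitions(C)
root = levels[-1][0]                    # T_[0:N], the product of all N steps
state = root @ np.array([1.0, 0.0])     # start from (f_1, f_0) = (1, 0)
assert np.isclose(state[0], 34.0)       # (f_9, f_8) = (34, 21)
```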
Pseudocode Sketch
```python
def fast_matvec(A_params, b):
    # A_params: recurrence coefficients, initial conditions, etc.
    # Precompute transition matrices T_{[l:r]} for all dyadic intervals
    T = precompute_transitions(A_params)
    # Recursively compute the product using divide-and-conquer
    return recursive_multiply(T, b)
```
The algorithm is optimal up to polylogarithmic factors with respect to the input size for fixed t.
Extension to Displacement Rank with Quasiseparable Operators
The second major contribution is the extension of fast algorithms to matrices with low displacement rank with respect to quasiseparable L and R operators. Quasiseparable matrices generalize banded and block companion matrices, and are defined by low-rank off-diagonal blocks.
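Quasiseparability can be checked directly from the definition: a matrix is t-quasiseparable if every off-diagonal block (taken strictly above or strictly below the diagonal) has rank at most t. A small NumPy sketch verifying that a tridiagonal matrix, the simplest banded case, is 1-quasiseparable:

```python
import numpy as np

def quasiseparable_order(A):
    """Maximum rank over all off-diagonal blocks A[:i, i:] and A[i:, :i]."""
    n = A.shape[0]
    return max(
        max(np.linalg.matrix_rank(A[:i, i:]) for i in range(1, n)),
        max(np.linalg.matrix_rank(A[i:, :i]) for i in range(1, n)),
    )

rng = np.random.default_rng(2)
n = 8
# A tridiagonal matrix: each block strictly above (or below) the diagonal
# contains a single nonzero entry, hence has rank at most 1.
A = (np.diag(rng.standard_normal(n))
     + np.diag(rng.standard_normal(n - 1), 1)
     + np.diag(rng.standard_normal(n - 1), -1))
assert quasiseparable_order(A) == 1
```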
- Main result: If A satisfies LA − AR = E with L, R t-quasiseparable and E of rank r, then A and A^T admit matrix-vector multiplication in O(r t^ω M(N) log^2 N + r t^2 M(N) log^3 N) operations.
- Algorithmic reduction: The problem reduces to computing rational functions of the form b^T (XI − R)^{-1} c for vectors b, c, which can be solved recursively using the Woodbury identity and the self-similar structure of quasiseparable matrices.
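The Woodbury identity invoked here is the standard matrix inversion lemma, (A + UCV)^{-1} = A^{-1} − A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}: it trades an N×N inverse for an r×r one when the perturbation UCV has rank r. A quick numerical check of the identity itself:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 8, 2
A = np.diag(rng.uniform(1.0, 2.0, n))   # easy-to-invert base matrix
U = rng.standard_normal((n, r))
C = np.eye(r)
V = rng.standard_normal((r, n))

Ainv = np.diag(1.0 / np.diag(A))
# Woodbury: only an r x r system appears on the right-hand side.
small = np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U)
woodbury = Ainv - Ainv @ U @ small @ V @ Ainv
assert np.allclose(woodbury, np.linalg.inv(A + U @ C @ V))
```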
This result strictly generalizes all previous displacement rank-based fast multiplication results, including those for block companion matrices and all classical structured matrices.
Practical and Theoretical Implications
Main Results and Claims
- Optimality: The algorithms run in time quasi-linear in the size of the parameterization (i.e., optimal up to log factors) for all classes considered.
- Generality: All known structured dense matrices with fast multiplication algorithms are captured as special cases.
- Expressivity: The recurrence width hierarchy is strict; recurrences of width t cannot represent all matrices of width t+1.
Applications
- Polynomial evaluation/interpolation: Multipoint evaluation of multivariate polynomials reduces to multiplication by low recurrence width matrices.
- Coding theory: Encoding for multiplicity codes and related constructions can be performed efficiently using these algorithms.
- Neural networks: Structured layers (e.g., Toeplitz-like, block companion) used for model compression and acceleration are special cases; the framework allows for more expressive and efficient structured layers.
- Control theory: The Sylvester and Stein equations, central in system theory, are efficiently solvable for large classes of operators.
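The polynomial-evaluation connection is most familiar in the univariate case: evaluating p(X) = Σ_j c_j X^j at points x_0, …, x_{N−1} is exactly the Vandermonde product Vc with V[i, j] = x_i^j, so fast algorithms for V are fast multipoint-evaluation algorithms. A NumPy check of the identity (not of the fast algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
x = rng.standard_normal(N)                 # evaluation points
c = rng.standard_normal(N)                 # polynomial coefficients, ascending
V = np.vander(x, N, increasing=True)       # V[i, j] = x_i ** j
# Multipoint evaluation == Vandermonde matrix-vector product.
assert np.allclose(V @ c, np.polynomial.polynomial.polyval(x, c))
```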
Limitations and Trade-offs
- Preprocessing cost: For very large t or r, preprocessing may dominate unless amortized over many multiplications.
- Numerical stability: The algorithms focus on exact arithmetic; stability for floating-point computation is not addressed and may require further analysis.
- Parameter recovery: Efficiently recovering the structured parameterization from a given matrix is possible for low recurrence width, but may be hard in general.
Future Directions
- Numerical analysis: Extending the framework to stable floating-point algorithms for real/complex matrices.
- Sparse and approximate structure: Exploiting sparsity or approximate low recurrence width for further acceleration.
- Learning structure: Automatic discovery of recurrence width or displacement structure in data-driven settings.
- Generalized operators: Extending to broader classes of operators beyond quasiseparable, possibly via hierarchical or multilevel structures.
Conclusion
This work provides a comprehensive and unified theory for fast matrix-vector multiplication with structured dense matrices, subsuming and generalizing all previously known classes. The introduction of recurrence width as a structural parameter enables both theoretical insight and practical algorithms, with broad applicability across computational mathematics, coding theory, and machine learning. The extension to quasiseparable displacement operators further broadens the class of tractable matrices, establishing a new foundation for structured linear algebra.