Tensor Algebra Processing Primitives (TAPP)

Updated 19 January 2026
  • Tensor Algebra Processing Primitives (TAPP) are a suite of algebraic operations and programming abstractions that unify and accelerate tensor computations across varied scientific and machine learning domains.
  • The framework provides mathematically rigorous operations such as t-product, tensor contractions, and decompositions, implemented via optimized APIs and backend-portable interfaces.
  • TAPP enables performance portability on diverse hardware, supporting applications from hyperspectral imaging to deep neural networks and quantum chemistry.

Tensor Algebra Processing Primitives (TAPP) are a suite of algebraic operations and standardized programming abstractions designed to unify, express, and efficiently execute tensor operations across scientific computing, machine learning, and high-performance data analysis. TAPP encompasses mathematically rigorous multilinear primitives, reference implementations, and—more recently—a common C-based interface for performance-portable tensor contraction, providing a foundation analogous to BLAS for matrix algebra but extended to the much richer domain of tensors. TAPP has emerged through contributions in tubal tensor algebra, Einstein-product-based multilinear theories, sparse tensor compilation, and platform-agnostic abstractions for deep learning and scientific applications.

1. Mathematical Foundations of TAPP

At its core, TAPP formalizes a minimal set of operations sufficient to realize the full spectrum of multilinear algebra. These include tensor-tensor contractions (generalizing matrix multiplication), various decomposition schemes (such as t-SVD, Tucker, CP), and mode-specific or structured products (e.g., convolution, Kronecker, Hadamard, Khatri-Rao). In the tubal framework, third-order tensors are modeled as matrices whose elements are tubes (vectors of length $n$), equipped with the t-product:

  • For $A \in \mathbb{R}^{m \times p \times n}$ and $B \in \mathbb{R}^{p \times \ell \times n}$, the t-product is defined as

$$A * B = \text{squeeze}\left( \operatorname{bcirc}(A) \cdot \operatorname{unfold}(B) \right)$$

where $\operatorname{bcirc}(A)$ is a block-circulant matrix of tubes, and $\operatorname{unfold}(B)$ stacks tubes column-wise. This operation is equivalent to matrix multiplication over a ring where the scalars are tubes and multiplication corresponds to circular convolution (Avron et al., 3 Jun 2025).

  • The $\star_M$-product extends this to arbitrary invertible $M \in \mathbb{C}^{n \times n}$, defining the tube-tube product as $a \star_M b = M^{-1}(Ma \circ Mb)$ (with $\circ$ the Hadamard product). It is proven that $\star_M$ is the unique tubal product satisfying associativity, commutativity (on tubes), existence of identity, and module structure over $\mathbb{R}^n$ (Avron et al., 3 Jun 2025).
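The bcirc/unfold construction in the t-product definition can be transcribed directly into NumPy. The sketch below is a naive illustration of the definition: it forms the block-circulant matrix explicitly, which practical implementations avoid.

```python
import numpy as np

def bcirc(A):
    """Block-circulant matrix built from the frontal slices of A (m x p x n):
    block (i, j) is the frontal slice A[:, :, (i - j) mod n]."""
    m, p, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def unfold(B):
    """Stack the frontal slices of B (p x l x n) column-wise into an (n*p) x l matrix."""
    p, l, n = B.shape
    return B.transpose(2, 0, 1).reshape(n * p, l)

def squeeze_fold(C, m, n):
    """Inverse of unfold: refold an (n*m) x l matrix into an m x l x n tensor."""
    return C.reshape(n, m, -1).transpose(1, 2, 0)

def t_product(A, B):
    """A * B = squeeze(bcirc(A) @ unfold(B)) for A (m x p x n), B (p x l x n)."""
    m, _, n = A.shape
    return squeeze_fold(bcirc(A) @ unfold(B), m, n)

A = np.random.rand(2, 3, 4)
B = np.random.rand(3, 5, 4)
assert t_product(A, B).shape == (2, 5, 4)
```

For 1 x 1 x n tensors (single tubes), this reduces to circular convolution of the two tubes, matching the ring interpretation above.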

Beyond the tubal setting, TAPP in the generalized multilinear (Einstein-product) algebra is built around the $N$-mode contraction:

$$(\mathscr{A} *_N \mathscr{B})_{i_1,\dots,i_P,j_1,\dots,j_M} = \sum_{k_1=1}^{K_1} \cdots \sum_{k_N=1}^{K_N} \mathscr{A}_{i_1,\dots,i_P,k_1,\dots,k_N}\, \mathscr{B}_{k_1,\dots,k_N,j_1,\dots,j_M}$$

where contraction is performed over consecutive shared modes (Pandey et al., 2023). These algebraic primitives typically preserve associativity, distributivity, and mimic the operator structure of matrix algebra under suitable matricization.
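Because the contraction runs over the last $N$ modes of $\mathscr{A}$ and the first $N$ modes of $\mathscr{B}$, it maps directly onto `np.tensordot`. A minimal illustration:

```python
import numpy as np

# A has shape (I1, ..., IP, K1, ..., KN); B has shape (K1, ..., KN, J1, ..., JM).
# Contracting over the N shared modes yields shape (I1, ..., IP, J1, ..., JM).
N = 2
A = np.random.rand(3, 4, 5, 6)   # (I1, I2, K1, K2)
B = np.random.rand(5, 6, 7)      # (K1, K2, J1)

C = np.tensordot(A, B, axes=N)   # axes=N: last N modes of A with first N of B
assert C.shape == (3, 4, 7)

# Equivalent explicit einsum over the shared indices k, l:
C2 = np.einsum('abkl,klj->abj', A, B)
assert np.allclose(C, C2)
```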

2. Algorithmic and Programming Abstractions

TAPP is operationalized through formal APIs and concise operator languages, supporting both dense and sparse regimes:

  • The TAPP C interface (Brandejs et al., 12 Jan 2026) provides an abstract, backend-portable API for tensor contraction $D := \alpha \cdot A \star B + \beta \cdot C$, where tensors are described by opaque descriptors (shape, stride, data type) and contractions are specified by index-labeled plans. The interface supports explicit configuration of execution resources and is implemented with handle-based resource management, error reporting, and support for both real and complex types.
  • The Tensor-Tensor Product Toolbox (Lu, 2018) realizes TAPP in MATLAB as reference t-product and t-SVD routines, exploiting fast implementations via batched FFT, facewise matrix multiplication, and carefully optimized in-place layouts.
  • In sparse settings, TAPP encompasses schema languages to describe data structures (e.g., BSTs, block lists) for dynamic sparsity, along with abstract append_first, append_rest, and build primitives to assemble arbitrary pointer-based tensor formats (Chou et al., 2021). TAPP-generated code is competitive with, or outperforms, hand-tuned libraries for sparse algebra kernels.
  • At the architectural level, TAPP-inspired virtual Tensor ISAs (e.g., Tensor Processing Primitives—TPP) expose a compact set of 2D memory-to-memory operators (unary, binary, ternary), which can be composed to define all layers in DL and HPC workloads (Georganas et al., 2021). High-dimensional tensor computations are decomposed into tiles and dispatched as highly optimized micro-kernels for vector CPUs, GPUs, or custom accelerators.
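The contraction semantics targeted by these interfaces can be sketched in NumPy. The function below is an illustrative stand-in, not the real C API: `tapp_contract` and its signature are assumptions chosen to mirror the index-labeled plan idea, with `np.einsum` playing the role of the backend kernel.

```python
import numpy as np

def tapp_contract(alpha, A, idx_A, B, idx_B, beta, C, idx_D):
    """Sketch of the contraction semantics D := alpha*(A . B) + beta*C with
    index-labeled modes, in the spirit of an index-labeled contraction plan.
    Function and argument names are illustrative, not the actual TAPP C API."""
    spec = f'{idx_A},{idx_B}->{idx_D}'
    return alpha * np.einsum(spec, A, B) + beta * C

A = np.random.rand(4, 5, 6)   # modes a, k, l
B = np.random.rand(5, 6, 3)   # modes k, l, b
C = np.zeros((4, 3))          # modes a, b
D = tapp_contract(2.0, A, 'akl', B, 'klb', 0.0, C, 'ab')  # contract over k, l
assert D.shape == (4, 3)
```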

3. Structural and Algebraic Properties

The algebraic operations included in TAPP frameworks typically adhere to strong mimetic properties:

| Primitive | Associativity | Distributivity | Existence of identity | Invertibility |
| --- | --- | --- | --- | --- |
| t-product / $\star_M$ | Yes | Both slots | Yes | Yes, iff each FFT-domain frontal slice is invertible (Avron et al., 3 Jun 2025) |
| Einstein product | Yes | Both slots | Yes (Kronecker delta) | Yes, iff the matricization is invertible (Pandey et al., 2023) |
| Hadamard / Kronecker | Yes | Both slots | Yes (all-ones or identity) | Yes (elementwise / non-singular) |

These properties enable best-approximation theorems (e.g., Eckart–Young optimality for t-SVD (Avron et al., 3 Jun 2025)), module structures over rings of tubes, and seamless generalization of classical linear algebra results to the tensor regime (e.g., orthogonality/unitarity, norm definitions, SVD/EVD decomposition).
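These mimetic properties can be checked numerically at the tube level. The sketch below assumes the $\star_M$ definition from Section 1 with $M$ taken as the DFT matrix (so that $\star_M$ reduces to circular convolution); it is an illustration, not library code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = np.fft.fft(np.eye(n))   # DFT matrix: M @ a == np.fft.fft(a); any invertible M works

def star_M(a, b):
    """Tube-tube product a *_M b = M^{-1}((M a) o (M b))."""
    return np.linalg.solve(M, (M @ a) * (M @ b))

a, b, c = rng.random(n), rng.random(n), rng.random(n)

# Identity tube e satisfies M e = all-ones vector.
e = np.linalg.solve(M, np.ones(n))

assert np.allclose(star_M(star_M(a, b), c), star_M(a, star_M(b, c)))  # associativity
assert np.allclose(star_M(a, b), star_M(b, a))                        # commutativity on tubes
assert np.allclose(star_M(a, e), a)                                   # identity element
```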

4. Implementation and Complexity Considerations

TAPP implementations exploit a range of computational techniques to reach high efficiency:

  • The t-product is efficiently computed via FFT along the third (tube) mode, batched matrix-matrix multiplies in the Fourier domain, and inverse FFT (Lu, 2018). The overall complexity is $O(m p n \log n)$ for the t-product and $O(n \cdot \max(m,p)^3)$ for the t-SVD, with memory layouts engineered to avoid the explicit formation of block-circulant matrices when possible.
  • For sparse tensors, blocked sparse matrix multiplications (BSMMs) are heavily leveraged, often mapping high-order contractions to collections of rectangular GEMMs over dense tiles, combined via a block-CSR format. Distributed-memory parallelism uses 2D block-cyclic distribution with hand-tuned communication and dense kernel invocation (Sivkov et al., 2019).
  • The TAPP C reference implementation provides portability through exhaustive type dispatch, stride computation, and loop-generation logic verified against existing libraries (e.g., TBLIS, cuTENSOR) (Brandejs et al., 12 Jan 2026). At run time, resource allocation and autotuning are exposed through key-value extension channels.
  • Compiler-level TAPP frameworks (e.g., TACO, dynamic sparse tensor algebra compilation) generate iteration and assembly routines tailored for the logical data structure of nonzeros, yielding efficient kernels for evolving sparse formats (Chou et al., 2021).
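The FFT-domain evaluation in the first bullet can be sketched as follows. This is a minimal real-valued illustration: optimized libraries add batching, layout tuning, and in-place transforms.

```python
import numpy as np

def t_product_fft(A, B):
    """t-product via FFT along the tube mode: O(m*p*n*log n) for the transforms
    plus one facewise matrix multiply per frontal slice in the Fourier domain.
    The block-circulant matrix is never formed explicitly."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    # One (m x p) @ (p x l) multiply per frontal slice k.
    Ch = np.einsum('ipk,plk->ilk', Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))

A = np.random.rand(2, 3, 8)
B = np.random.rand(3, 5, 8)
assert t_product_fft(A, B).shape == (2, 5, 8)
```

Multiplying by the identity tensor (identity matrix in the first frontal slice, zeros elsewhere) returns the original tensor, consistent with the algebraic properties in Section 3.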

5. Applications Across Scientific and Machine Learning Domains

TAPP primitives support a diverse range of high-impact applications:

  • Hyperspectral and medical imaging: Tubal algebra and t-product-based decompositions preserve native multi-dimensional structure, outperforming matrix-based flattening in compressibility and noise-robustness (Avron et al., 3 Jun 2025).
  • Neural dynamics and graph processing: Tensor convolution along temporal and topological modes is realized via t-product or Einstein-product in dynamic graph convolution networks (Avron et al., 3 Jun 2025).
  • Stable and parameter-efficient neural architectures: t-product layers replace matrix-multiply in deep learning, providing improved stability and compression (Avron et al., 3 Jun 2025).
  • Low-rank tensor recovery and completion: Nuclear norm minimization and t-SVD are used for recovery of missing values under multilinear constraints (Lu, 2018).
  • Multi-domain signal processing: Einstein contraction and multi-modal convolution define filtering, channel equalization, and transceiver design in systems such as MIMO-CDMA (Pandey et al., 2023).
  • Quantum chemistry and many-body physics: High-order tensor contractions are central to coupled-cluster methods, with TAPP standardization promoting backend interoperability (e.g., DIRAC package using TBLIS or cuTENSOR) (Brandejs et al., 12 Jan 2026).
  • Large-scale PDE solvers and 4D reconstruction: TAPP enables separable and low-rank formulations, drastically reducing storage and computation (e.g., a 33M-unknown 4D MRI problem solved with a $>10\times$ speedup over matrix-based approaches) (Morozov et al., 2010).

6. Standardization and Extensibility under the TAPP Initiative

The emergence of a common TAPP standard (Brandejs et al., 12 Jan 2026) is grounded in consensus-driven specification, engaging academic, industrial, and hardware communities. The standard defines:

  • A uniform C application interface for contractions ($D = \alpha A \star B + \beta C$) with support for explicit mode labeling, dynamic execution resources, and guaranteed correctness through a verified reference.
  • Backward compatibility and drop-in support for vendor libraries (e.g., performance parity with TBLIS, transparent GPU acceleration with cuTENSOR, and quantum chemistry codes such as DIRAC).
  • Extensible performance-tuning via “virtual key-value” APIs, algorithmic preferences, and planned “trampoline” dispatch layers to select optimal kernels at runtime.
  • Ongoing efforts to extend the primitive set to sparse/dense combinations, tensor decompositions, and more complex indexed updates, driven by a curated benchmark suite (TAPP-Bench) representative of AI, chemistry, and ML workloads.

7. Outlook and Future Research Directions

Research and community priorities around TAPP include expanding support for:

  • Additional operations beyond core contractions, notably decompositions (CP, Tucker), generalized outer products, and structured updates.
  • Performance-portable implementations of input reductions, broadcasts, and array-wise extension to sparse and block-sparse tensor storage formats.
  • Deep integration with novel hardware backends, with abstraction layers supporting runtime autodetection, autotuning, and migration to next-generation accelerators or specialized photonic computing platforms (Redrouthu et al., 2022).
  • Theoretical advances in the algebraic categorization of tensor products, decompositions, and their uniqueness properties, particularly in the context of structured or sparse domains.

TAPP’s evolving ecosystem will continue to deliver a rigorous operational and algebraic foundation for portable, high-performance tensor computations across scientific, engineering, and data-intensive domains (Brandejs et al., 12 Jan 2026, Avron et al., 3 Jun 2025, Lu, 2018, Pandey et al., 2023, Georganas et al., 2021, Sivkov et al., 2019, Chou et al., 2021, Morozov et al., 2010).
