Non-uniform breakpoint spacing for exp2 piecewise-linear approximation

Investigate non-uniformly spaced breakpoints for the piecewise-linear approximation of exp2 that FSA’s SystolicAttention uses to implement the element-wise exponential in FlashAttention. Determine whether non-uniform spacing reduces the mean relative error compared to the current uniformly spaced eight-segment scheme.

Background

FSA implements the element-wise exp operation in FlashAttention as exp2 and approximates it with piecewise-linear interpolation executed inside the systolic array. The input is split into integer and fractional parts (x = x_i + x_f) with x_f ∈ (-1, 0]; since 2^x = 2^(x_i) · 2^(x_f), only the fractional factor has to be approximated, and only over a bounded interval. The design divides that interval into uniform segments and streams per-segment slope/intercept coefficients so that the existing MAC units can be reused for the interpolation.
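The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the FSA hardware datapath: the names make_table and pwl_exp2 are hypothetical, the coefficients are assumed to come from chords through 2^x at the breakpoints, x_i is taken as ceil(x) so that x_f lands in (-1, 0], and all arithmetic is float64 rather than the reduced precision of the MAC units.

```python
import numpy as np

def make_table(breakpoints):
    """Per-segment slope/intercept of the chords through 2**x at the breakpoints.

    Assumption: chord interpolation; the source only says "piecewise linear".
    """
    b = np.asarray(breakpoints, dtype=np.float64)
    y = np.exp2(b)
    slope = (y[1:] - y[:-1]) / (b[1:] - b[:-1])
    intercept = y[:-1] - slope * b[:-1]
    return b, slope, intercept

def pwl_exp2(x, table):
    """Approximate 2**x as 2**x_i * (slope * x_f + intercept)."""
    b, slope, intercept = table
    x = np.asarray(x, dtype=np.float64)
    x_i = np.ceil(x)               # integer part, chosen so x_f is in (-1, 0]
    x_f = x - x_i
    # Index of the segment containing x_f (the last segment also owns x_f == 0).
    seg = np.clip(np.searchsorted(b, x_f, side="right") - 1, 0, len(slope) - 1)
    frac = slope[seg] * x_f + intercept[seg]   # one multiply-add per element
    return np.ldexp(frac, x_i.astype(np.int32))  # exact scaling by 2**x_i

# Uniform 8-segment table on [-1, 0], matching the scheme described above.
uniform_table = make_table(np.linspace(-1.0, 0.0, 9))
print(pwl_exp2(-2.3, uniform_table), 2.0 ** -2.3)
```

Because make_table accepts any increasing breakpoint array from -1 to 0, the same code evaluates a non-uniform candidate by passing different breakpoints; the segment count, and hence the number of streamed coefficients, stays the same.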

The authors evaluate the approximation over all negative normal fp16 values and select an 8-segment uniform scheme, reporting a mean absolute error (MAE) of ≈ 1.4e-4 and a mean relative error (MRE) of ≈ 2.728e-2. They explicitly leave the exploration of non-uniformly spaced breakpoints, which could further reduce the MRE, for future work, so the design and evaluation of such schemes remains an open question within their framework.
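A possible starting point for this investigation is a small harness that enumerates the negative normal fp16 inputs and compares breakpoint layouts by MAE and MRE. The sketch below is an assumption-laden model, not a reproduction of the paper's setup: it computes in float64 (so its numbers need not match the reported ones), uses np.interp as a stand-in for the slope/intercept lookup, averages the relative error only where the float64 reference has not underflowed, and introduces a hypothetical one-parameter breakpoint family, warped_breakpoints, purely for illustration.

```python
import numpy as np

def negative_normal_fp16():
    """All negative normal fp16 values: sign bit set, exponent field 1..30."""
    bits = np.arange(0x8400, 0xFC00, dtype=np.uint16)
    return bits.view(np.float16).astype(np.float64)

def errors(breakpoints, x):
    """Return (MAE, MRE) of chord-interpolated exp2 against np.exp2 in float64."""
    b = np.asarray(breakpoints, dtype=np.float64)
    x_i = np.ceil(x)
    x_f = x - x_i                                   # fractional part in (-1, 0]
    frac = np.interp(x_f, b, np.exp2(b))            # piecewise-linear 2**x_f
    approx = np.ldexp(frac, x_i.astype(np.int32))   # exact scaling by 2**x_i
    ref = np.exp2(x)
    abs_err = np.abs(approx - ref)
    # Assumption: relative error is averaged only where the reference is a
    # normal float64, because 2**x underflows for the most negative inputs.
    ok = ref >= np.finfo(np.float64).tiny
    return abs_err.mean(), (abs_err[ok] / ref[ok]).mean()

def warped_breakpoints(n_segments, gamma):
    """Hypothetical family: gamma == 1 is uniform, gamma > 1 clusters near 0."""
    return -np.linspace(1.0, 0.0, n_segments + 1) ** gamma

x = negative_normal_fp16()
for gamma in (1.0, 1.5, 2.0):
    mae, mre = errors(warped_breakpoints(8, gamma), x)
    print(f"gamma={gamma}: MAE={mae:.3e}  MRE={mre:.3e}")
```

In this idealized model the scaling by 2^(x_i) is exact, so the relative error depends only on x_f, and any difference between layouts reflects how the fp16 inputs distribute their fractional parts. A faithful study would also need to model the reduced-precision arithmetic of the systolic array, which this sketch deliberately omits.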

References

We choose to use 8 segments in our FSA implementation, which achieves a MAE of 0.00014 and an MRE of 0.02728. Although non-uniformly spaced breakpoints could further reduce the MRE, we leave this exploration for future work.

SystolicAttention: Fusing FlashAttention within a Single Systolic Array (2507.11331 - Lin et al., 15 Jul 2025) in Accuracy Analysis, subsubsection "Error Analysis of Piecewise Linear Approximation"