
Tapered Precision Encoding: Posit and Takum

Updated 5 March 2026
  • Tapered-precision encoding is a family of floating-point representations that dynamically reallocate bits between exponent and fraction, concentrating precision on values near unity.
  • Posit employs a variable-length regime field offering efficient coding for values near unity while sacrificing precision at the extremes of its range, whereas takum uses bounded fields to guarantee a minimum precision and a fixed dynamic range.
  • Both formats facilitate hardware acceleration with reduced power and area requirements, enhancing numerical performance in HPC and machine-learning applications.

Tapered-Precision Encoding (Posit/Takum)

Tapered-precision encoding refers to a family of nonuniform floating-point representations in which the distribution of precision (mantissa/fraction width) and dynamic range (exponent width) varies systematically along the real line. The hallmark of these formats is a regime field—usually variable-length—that dynamically reallocates bitwidth between exponent and fraction depending on the value’s magnitude, providing high local precision near unity and proportionally increased exponent range for very large or very small magnitudes. The principal modern realizations are the posit format, pioneered by Gustafson and Yonemoto, and the more recent takum format, which modifies regime and exponent handling for improved dynamic range, coding efficiency, and integer representation.

1. Mathematical Definition and Bit-Level Structure

Both posit and takum formats partition an $n$-bit word into:

  • a 1-bit sign field,
  • a regime field (a variable-length prefix code in posits, a small fixed or variable-length field in takum),
  • a short exponent or characteristic field,
  • and a fraction (mantissa) field occupying the remaining bits.

Posit$\langle n, es \rangle$

Let $n$ be the total bit width and $es$ the exponent size parameter:

| Field | Width (bits, variable) | Value contribution |
|---|---|---|
| Sign ($s$) | 1 | $(-1)^s$ |
| Regime ($k$) | $1 \ldots n-1$ (run of bits) | $useed^k$, where $useed = 2^{2^{es}}$ |
| Exponent ($e$) | up to $es$ bits (if space permits) | $2^e$ |
| Fraction ($f$) | remaining bits | $1+f$, $f \in [0,1)$ |

The regime encodes an integer $k$ by unary run length: a run of $m \ge 1$ leading 1s gives $k = m-1$; a run of $m \ge 1$ leading 0s gives $k = -m$. The represented real value is:

$x = (-1)^s \, useed^k \, 2^e \, (1+f)$

Special cases: the all-zeros word encodes $0$; a $1$ followed by all zeros encodes NaR (Not a Real).
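The decoding rules above can be sketched in Python. This is an expository sketch, not a rounding-aware implementation; it assumes the classic posit convention that negative words are decoded from their two's complement.

```python
def decode_posit(bits: int, n: int, es: int) -> float:
    """Decode an n-bit posit word (given as an unsigned integer) with
    exponent size es.  Expository sketch: exact for small n, but not a
    full, rounding-aware implementation."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")               # NaR: 1 followed by all zeros
    sign = bits >> (n - 1)
    if sign:                              # negative posits decode from the
        bits = (-bits) & mask             # two's complement of the word
    body = bits & ((1 << (n - 1)) - 1)    # the n-1 bits after the sign
    first = (body >> (n - 2)) & 1         # leading regime bit
    run = 0
    while run < n - 1 and ((body >> (n - 2 - run)) & 1) == first:
        run += 1
    k = run - 1 if first else -run        # unary run-length regime value
    rest = max(n - 2 - run, 0)            # bits left after the terminator
    tail = body & ((1 << rest) - 1)
    e_bits = min(es, rest)                # exponent may be truncated;
    e = (tail >> (rest - e_bits)) << (es - e_bits)  # missing bits are 0
    f_bits = rest - e_bits
    f = (tail & ((1 << f_bits) - 1)) / (1 << f_bits)
    useed = 2 ** (2 ** es)
    return (-1) ** sign * useed ** k * 2 ** e * (1 + f)
```

For example, with $n=8$, $es=1$ the word `0b01000000` decodes to 1.0, `0b01100000` (regime $k=1$, $useed=4$) to 4.0, and the maximum-magnitude word `0b01111111` to $4^6 = 4096$.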

Takum$\langle n \rangle$

Several concrete variants have appeared; the “linear” takum and the logarithmic takum (“Takum LNS”) are most prominent.

Linear takum ($n \ge 12$):

| Field | Width (bits) | Value contribution |
|---|---|---|
| Sign ($S$) | 1 | $(-1)^S$ |
| Direction ($D$) | 1 | selects the encoding of regime/characteristic |
| Regime ($R_2R_1R_0$) | 3 | $r \in \{0,1,\ldots,7\}$ |
| Characteristic ($C$) | $r$ (0–7) | $c \in [-255, 254]$ (mapping varies with $D$) |
| Mantissa ($M$) | $p = n - 5 - r$ | $m \in [0,1)$ |

Decoded logarithmic value:

$\ell = (-1)^S (c+m), \quad x = (-1)^S \, 2^{\ell}$

For the linear (floating-point) variant, $x = (-1)^S (1+f)\,2^e$ with appropriate field mappings.

The regime and characteristic fields are bounded ($r \le 7$, $|c| \le 255$), making the dynamic range constant for all $n \ge 12$ and decoupling it from the overall word size.
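For concreteness, a logarithmic-takum decoder can be sketched as follows. The $D$-dependent mapping used here ($r = R$, $c = 2^r - 1 + C$ for $D = 1$; $r = 7 - R$, $c = -2^{r+1} + 1 + C$ for $D = 0$) is an assumption filling in the "varies with $D$" entry of the table above; consult the takum specification for the normative rules.

```python
def decode_takum(bits: int, n: int) -> float:
    """Decode an n-bit logarithmic takum (n >= 12) given as an unsigned
    integer.  Sketch only; the D-dependent characteristic mapping below
    is an assumption consistent with the field table above."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                  # NaR
    S = bits >> (n - 1)                      # sign
    D = (bits >> (n - 2)) & 1                # direction
    R = (bits >> (n - 5)) & 0b111            # 3-bit regime field
    r = R if D else 7 - R                    # regime value in 0..7
    p = n - 5 - r                            # mantissa width
    C = (bits >> p) & ((1 << r) - 1)         # r-bit characteristic field
    c = (2 ** r - 1 + C) if D else (-2 ** (r + 1) + 1 + C)
    m = (bits & ((1 << p) - 1)) / (1 << p)   # mantissa in [0, 1)
    ell = (-1) ** S * (c + m)                # logarithmic value
    return (-1) ** S * 2.0 ** ell
```

Under these assumptions a 12-bit word `0b010000000000` ($S=0$, $D=1$, $r=0$, $c=0$) decodes to 1.0, and its two's complement `0b110000000000` to −1.0, mirroring posit-style negation.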

2. Tapering Mechanism and Precision-Range Tradeoff

The defining principle of tapered-precision encoding is to allocate more bits to the fraction near $|x| \approx 1$ and to shift those bits to the exponent (or regime plus exponent, or characteristic) as $|x|$ deviates far from unity.

  • Posit: The variable-length regime means that extreme values consume many bits for the regime, leaving few for the fraction—precision “tapers” off (decreases) far from $|x| = 1$. Around $k = 0$, nearly all remaining bits go to the fraction, maximizing local relative precision.
  • Takum: The regime field is either fixed-width (linear takum) or bounded (logarithmic takum), so precision loss at the range extremes is limited (at least $n-12$ bits of precision for $n \ge 12$). Takum thus provides a lower bound on minimum local precision, preventing catastrophic drops in accuracy for large $|x|$ and enforcing a constant, bounded dynamic range independent of $n$ (Hunhold, 2024, Hunhold, 2024).
  • Comparative: Posits achieve maximum coding efficiency and minimum bit cost for exponents near zero, but for large exponents the regime run length increases linearly, consuming more bits. Takum maintains a bounded regime-plus-characteristic length, so its bit cost plateaus for large $|e|$.
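The trade-off can be made quantitative with two small helpers that count fraction bits as a function of the regime, using only the field layouts defined in Section 1:

```python
def posit_fraction_bits(n: int, es: int, k: int) -> int:
    """Fraction bits left in an n-bit posit with regime value k.
    The regime run (k+1 ones for k >= 0, -k zeros for k < 0) plus its
    terminating bit grows with |k| and squeezes out the fraction."""
    run = k + 1 if k >= 0 else -k
    regime = min(run + 1, n - 1)        # run plus terminator, capped
    return max(n - 1 - regime - es, 0)

def takum_mantissa_bits(n: int, r: int) -> int:
    """Mantissa bits in an n-bit takum with regime value r (0..7):
    bounded fields leave at least n - 12 bits in the worst case."""
    return n - 5 - r

# posit32 (es = 2): 27 fraction bits at k = 0, only 17 at k = 10
print(posit_fraction_bits(32, 2, 0), posit_fraction_bits(32, 2, 10))
# takum32: never fewer than 20 mantissa bits, since r <= 7
print(min(takum_mantissa_bits(32, r) for r in range(8)))
```

The posit count keeps shrinking as $|k|$ grows, while the takum count plateaus at $n-12$, which is exactly the bounded-precision claim above.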

3. Hardware Implementation and Algorithmic Decoding

Implementing tapered-precision arithmetic, especially for variable-length fields, is nontrivial.

  • Posit Hardware: Hardware must extract the run-length-encoded regime, align fields, and perform fused arithmetic. Designs such as the TALU (Dube et al., 30 Sep 2025) use parallel Q-function threshold comparators and a two-stage pipeline, achieving 54.6× lower power and 19.8× smaller area compared to unified MAC units for floats and posits. Bit-sliced logic time-multiplexes between integer, IEEE, and posit operations. In DPU-style architectures (Shah et al., 2021), precision scalability (8/16/32 bits) is supported within a single datapath running at up to 538 GOPS/W.
  • Takum Hardware: Takum’s bounded regime and characteristic fields permit fixed-width datapaths, smaller logic, and consistent latency. VHDL implementations for both linear and LNS takums demonstrate up to 50% LUT and 38% latency reduction versus state-of-the-art posit codecs (Hunhold, 2024). The absence of long variable run decoding translates to smaller, faster combinational critical paths and excellent FPGA scaling.
  • Encoding/Decoding: Both formats require special handling for zero and NaR. In takum, characteristic extraction, biasing, and bit-wise operations are fixed for all $n \ge 12$, and the minimal required operations are OR, shift, increment, and conditional negation.
  • Integration with SIMD ISAs: Using takum as the base format for AVX10.2-style SIMD instructions reduces ISA complexity by ∼60–75%, improves area/power, and simplifies decoder multiplexing (Hunhold, 18 Mar 2025).

4. Numerical Properties: Precision, Range, and Integer Representation

Dynamic Range

  • Posit: Dynamic range grows hyper-exponentially with $n$ and $es$ ($2^{2^{es}(n-2)}$), but at the cost of a steep precision drop-off in the tails (Hunhold, 2024).
  • Takum: Constant, bounded dynamic range ($2^{-255}$ to $2^{254}$ for both LNS and linear variants) achieved as soon as $n \ge 12$; this property is formally proved and enables robustness for large and small magnitudes (Hunhold, 2024, Hunhold, 2024, Hunhold et al., 2024).

Precision Profile and Rounding

  • Posit: Highest precision near unity ($\epsilon = 2^{-(n-2-es)}$) but rough ulp cliffs at regime boundaries; rounding is usually round-to-nearest-even, but catastrophic rounding errors may occur at boundaries.
  • Takum: Guarantees minimum precision $p \ge n-12$, smoother ulp growth across the exponent field, and no catastrophic jumps at field boundaries; rounding can be unified geometrically across segments.

Integer Representability

  • Posit: The largest consecutive integer that an $n$-bit posit can safely represent is $2^{\lfloor 4(n-3)/5 \rfloor}$, strictly less than IEEE 754 at equivalent bit widths (Hunhold, 2024).
  • Takum: Achieves or exceeds IEEE 754’s safe integer range: for $n = 32$, takum matches $2^{24}$ (float32); for $n = 64$, takum attains $2^{55}$ (float64: $2^{53}$). Asymptotically, takum’s representability grows like $2^n/n$, surpassing the $2^{4n/5}$ scaling of posit.
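To make the comparison concrete, the posit bound can be evaluated directly. This sketch uses only the formula quoted above; the IEEE limits are the standard $2^{24}$ (float32) and $2^{53}$ (float64) safe-integer ranges.

```python
def posit_safe_int_log2(n: int) -> int:
    """log2 of the largest consecutive integer an n-bit posit can
    represent, per the bound 2^floor(4(n-3)/5) quoted above."""
    return 4 * (n - 3) // 5

# IEEE 754 safe-integer limits for comparison: float32 = 2^24, float64 = 2^53
for n, ieee in [(32, 24), (64, 53)]:
    print(f"n={n}: posit 2^{posit_safe_int_log2(n)} vs IEEE 2^{ieee}")
```

At $n = 32$ the posit bound is $2^{23}$ versus float32's $2^{24}$, and at $n = 64$ it is $2^{48}$ versus float64's $2^{53}$, illustrating the "strictly less" claim.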

5. Numerical Performance in Scientific and Machine Learning Applications

HPC and Scientific Kernels

  • NAS Parallel Benchmarks: 32-bit posits provided $0.6$–$1.4$ decimal digits better accuracy than IEEE float32 in conjugate gradient, multigrid, LU/BT solvers, and FFT—a direct consequence of tapered precision near unity. Quire-based FMA in posit32 yielded additional accuracy, approaching float128 in some tests (Chien et al., 2019).
  • Sparse Linear Solvers: In sparse LU, QR, and GMRES, both posits and takums outperformed bfloat16 and IEEE floats at identical bit-widths. Linear takum delivered consistently lower forward and backward errors; for GMRES, takum8 enabled stable convergence where even float8 failed (Hunhold et al., 2024).
  • Spectral Methods (FFTs, PDEs): At low bit-width (8/16), takum outperformed bfloat16, float16, and posit in both accuracy and range; at 32+ bits, takums matched or narrowly outperformed posit across supercritical FFT/PDE tasks (Hunhold et al., 29 Apr 2025).
  • Krylov Subspace Methods: In the Arnoldi method, takum32 and takum64 converged reliably with significantly lower relative error than posit32/64 or IEEE64, due in part to smoother ulp distribution and higher dynamic range (Hunhold et al., 29 Apr 2025).

Machine Learning and TinyML

  • Training DNNs: 8–16-bit posits with layerwise centering, warm-up, and tensor-wise $es$ allow stable DNN training without model degradation, reducing the required bit-width and saving memory/energy. Hardware MAC units achieve up to 80% power savings and 72% area reduction compared to float32 (Lu et al., 2019).
  • Quantized Inference: Tapered fixed-point (TFX) with unary regimes, inspired by posits, delivers significantly better quantization performance in TinyML with only 17–30% energy increase over fixed-point, closing the performance gap to full-precision models by up to 31% (Langroudi et al., 2021).

6. Broader Family: Variants and Extensions

The tapered-precision design space includes:

  • Morris HEB/BiasHEB/UnaryHEB: Morris-style tapers with a hidden exponent bit, providing up to two to three orders of magnitude more dynamic range and substantial improvements in exact-addition rates and golden-zone population compared to posit and IEEE—especially valuable in ML settings (Ciocirlan et al., 2023).
  • Redundant Signed Radix-2 (NONADJ): Ternary encodings with nonadjacent signed digits produce unprecedented range and carry-free logic; dynamic range exceeds both posit and IEEE, though at the cost of requiring ternary logic hardware (Schoenbaum, 2021).
  • Tekum (Balanced Ternary): A full balanced-ternary trit implementation of tapered precision, offering symmetric range ($\sim 10^{\pm 87}$ at 64 bits), simpler encode/decode, intrinsic carry-free arithmetic, and optimal truncation rounding. Tekum’s ternary nature positions it for emerging CNTFET and Josephson technologies (Hunhold, 25 Nov 2025).

7. Practical Implications, Limitations, and Future Directions

Advantages and Impacts

  • Uniform support for low, medium, and high precision—single hardware datapaths process 8/16/32/64 bit operations across arithmetic types (Dube et al., 30 Sep 2025, Shah et al., 2021).
  • ISA simplification and vectorization benefits (e.g., AVX10.2 T8–T64 replacing all multiple 8/16/32-bit IEEE variants) with attendant area/power savings (Hunhold, 18 Mar 2025).
  • Universal machine learning acceleration with robust stability for mixed-precision pipelines, removal of subnormal/overflow edge cases, and elimination of “catastrophic rounding” events.

Limitations

  • Variable-length regime fields in posits increase hardware complexity (long critical paths for barrel shifters, leading-one detectors), lead to catastrophic ulp cliffs, and degrade integer representability in some settings (Hunhold, 2024, Hunhold, 2024).
  • Takum and derivatives, while solving many posit issues, require more involved encode/decode logic than purely fixed-point or IEEE. Some designs (e.g., Morris bias) require additional range-limiting or field-bias calculations.

Prospects

  • Pursuing standardization of tapered-precision in general-purpose ISAs and compilers.
  • Integration of quire/dot-product accumulators and expanded hardware support for vector/SIMD execution.
  • Application-specific regime/exponent allocation and hybrid architectures blending posit, takum, and balanced-ternary (tekum) arithmetic as device-level ternary technologies mature (Hunhold, 25 Nov 2025).
  • Robustification of integer representation and error-propagation analysis in deep combinatorial and scientific codes (Hunhold, 2024).

References (by arXiv id)

  • "Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications" (Chien et al., 2019)
  • "A Compact, Low Power Transprecision ALU for Smart Edge Devices" (Dube et al., 30 Sep 2025)
  • "Training Deep Neural Networks Using Posit Number System" (Lu et al., 2019)
  • "A Tapered Floating Point Extension for the Redundant Signed Radix 2 System Using the Canonical Recoding" (Schoenbaum, 2021)
  • "Spectral Methods via FFTs in Emerging Machine Number Formats: OFP8, Bfloat16, Posit, and Takum Arithmetics" (Hunhold et al., 29 Apr 2025)
  • "Design and Implementation of a Takum Arithmetic Hardware Codec in VHDL" (Hunhold, 2024)
  • "Beating Posits at Their Own Game: Takum Arithmetic" (Hunhold, 2024)
  • "Evaluation of Bfloat16, Posit, and Takum Arithmetics in Sparse Linear Solvers" (Hunhold et al., 2024)
  • "Tekum: Balanced Ternary Tapered Precision Real Arithmetic" (Hunhold, 25 Nov 2025)
  • "DPU: DAG Processing Unit for Irregular Graphs with Precision-Scalable Posit Arithmetic in 28nm" (Shah et al., 2021)
  • "Integer Representations in IEEE 754, Posit, and Takum Arithmetics" (Hunhold, 2024)
  • "Streamlining SIMD ISA Extensions with Takum Arithmetic: A Case Study on Intel AVX10.2" (Hunhold, 18 Mar 2025)
  • "A Number Representation Systems Library Supporting New Representations Based on Morris Tapered Floating-point with Hidden Exponent Bit" (Ciocirlan et al., 2023)
  • "TENT: Efficient Quantization of Neural Networks on the tiny Edge with Tapered FixEd PoiNT" (Langroudi et al., 2021)
  • "Numerical Performance of the Implicitly Restarted Arnoldi Method in OFP8, Bfloat16, Posit, and Takum Arithmetics" (Hunhold et al., 29 Apr 2025)