Tapered-Precision Representations
- Tapered-Precision Representations are non-uniform numerical formats that adapt exponent, fraction, and regime fields to allocate high precision near key values like unity and zero.
- They employ diverse encoding schemes (e.g., TFX, Takum, Tekum) to achieve superior dynamic range and reduced quantization error compared to classical floating-point systems.
- TPRs enable efficient hardware and software implementations, significantly improving neural network quantization and general-purpose computing through tailored precision and dynamic range.
Tapered-Precision Representations (TPRs) are a broad class of numerical formats for real (and extended) arithmetic in which the distribution of representable values—precision and dynamic range—varies non-uniformly across the number line. Unlike classical floating-point systems, TPRs allocate exponent, fraction, and sometimes regime fields variably or with fields whose definition is locally adapted, yielding finer discretization (higher precision) near key regions (often unity or zero) and coarser representation at large/small magnitudes. Notable instances include posit, redundant nonadjacent radix-2, Takum and Tekum (logarithmic and ternary-tapered), various tapered fixed-point and Morris-family formats, along with hardware-specific variants targeting efficient inference and computation. TPRs have demonstrated substantial advantages in ultra-low-bit quantization, dynamic range, coding efficiency, and resilience to rounding artifacts, particularly on neural workloads, hardware accelerators, ternary logic platforms, and general-purpose computing.
1. Mathematical Principles and Core Formats
Tapered-precision designs generally comprise a set of fields whose size, role, or interpretation is adapted according to the magnitude of the stored value. Representative formats include:
- Tapered Fixed-Point (TFX(n, IS, SC)): Total bitwidth is partitioned into a sign, an up-to--bit "signed unary" regime (or integer), explicit fraction bits, and typically a small scaling exponent common to a group or layer. , with the integer itself derived from unary run-length. By tuning , the TFX format can concentrate density near zero (small ) or stretch dynamic range (large ), leading to a tapered granularity that aligns with parameter or data distributions (Langroudi et al., 2021).
- Redundant Signed Radix 2 (Canonical-Recoding Tapered FP): Numbers are encoded as vectors of ternary digits in canonical (nonadjacent) form, split at a uniquely defined boundary into exponent and mantissa. The exponent is mapped from a run of nonzeros, the mantissa from the remainder. The value is . Unified field structure allows peak -trit precision near , with smoothly decaying local precision for large exponents (Schoenbaum, 2021).
- Takum Arithmetic: Each -bit word is parsed into a sign bit, direction bit, 3-bit regime, -bit characteristic, and bits of mantissa. Decoding proceeds via true logarithmic coding: , with , . The regime controls the size of , which can be up to 8 bits, and (mantissa) sets the local granularity. Notably, dynamic range is independent of beyond , and relative error bounds are strictly tighter than linear-fraction floats at the same (Hunhold, 2024).
- Tekum (Balanced Ternary Tapered Arithmetic): Words are trits (balanced ternary digits), split into regime (), exponent (), and fraction (). The regime determines the size of the exponent field, which is biased as a function of , with sign implicit from the top nonzero trit. Real numbers are decoded as (or logarithmic variant), and dynamic range is designed to span at (trits) (Hunhold, 25 Nov 2025).
- Morris-family Tapered Float with Hidden Exponent Bit (HEB): MorrisHEB, MorrisBiasHEB, and MorrisUnaryHEB generalize tapered allocation. All use a hidden high bit for the exponent, allowing implicit leading-one normalization. MorrisUnaryHEB, for instance, has a unary run-length regime field followed by a variable-length exponent and remaining bits as fraction, yielding a dense “golden zone” with maximal local precision near unity (Ciocirlan et al., 2023).
2. Dynamic Range, Precision, and Quantization Error
TPRs offer highly non-uniform error profiles tailored to application needs:
- Dynamic Range: Takum’s dynamic range is fixed at (approximately ) for all (Hunhold, 2024), whereas Tekum achieves at modest word sizes (Hunhold, 25 Nov 2025). Morris-family and canonical recoding variants realize ranges surpassing IEEE-754 and posit for comparable bit or trit widths (Ciocirlan et al., 2023, Schoenbaum, 2021).
- Precision “Tapering”: Relative error (ULP spacing) and the number of fraction bits (effective precision) vary by regime or exponent length. In TFX, the quantization step is narrowest near zero and widens linearly with —empirically reducing quantization error by 20–50% in deep networks relative to fixed-point (Langroudi et al., 2021). Takum and Tekum maintain minimum precision bits or equivalent per value, with worst-case error at regime extremes, but achieve maximum density in the “golden zone” ().
- Benchmarks: On neural models (ConvNet, ResNet-18), 8-bit TFX yields, e.g., accuracy on MNIST ( over FP) and on CIFAR-10 ( over FP), with only 17–30% energy-delay overhead (Langroudi et al., 2021). MorrisUnaryHEB achieves denser “golden zone” and up to exactness in additions on 12-bit random data (Ciocirlan et al., 2023).
| Format | Dynamic Range | “Golden Zone” Population (16b/trit) | Worst-case Error |
|---|---|---|---|
| IEEE-754 binary16 | 26,587 | ||
| Posit16 | 26,587 | Variable | |
| MorrisUnary16 | 30,201 | Region-dependent | |
| Tekum10 (16b) | trits | – | |
| Redundant-canonical 16trit | Full field at |
3. Encoding, Decoding, and Arithmetic Algorithms
Encoding and decoding schemes are format-specific but share key traits:
- Regime and Exponent Partitioning: Principle of run-length or prefix-coding for regime (e.g., signed unary in TFX, run of like trits/bits in Tekum, unary regime in MorrisUnaryHEB).
- Hidden Bits and Exponent Bias: HEB formats utilize an implicit leading '1' in the exponent to maximize usable code space (Ciocirlan et al., 2023).
- Canonical Recoding: Redundant radix-2 codes ensure only one maximal contiguous run of nonzero digits, with unique point-separation mapping (Schoenbaum, 2021).
- Addition and Multiplication: Most formats translate input fields to magnitudes, perform fixed-point alignment or logarithmic combination (Takum: Gaussian log; Tekum: ternary integer addition), and re-encode, with normalization and rounding ensuring legal field partitioning (Ciocirlan et al., 2023, Hunhold, 2024, Hunhold, 25 Nov 2025). Takum, for example, performs multiplication and inversion in the -domain (exponent plus fraction), while addition requires Gaussian log correction.
- Exactness: Certain formats guarantee exact inversion (100%), high rates of exact root and multiplication (up to for Takum16), and increased probability of exact sums around unity for tapered regimes (Hunhold, 2024, Ciocirlan et al., 2023).
4. Hardware, Implementation, and Energy Considerations
Several TPRs are explicitly designed for hardware simplicity or energy/area efficiency:
- TFX (TENT) Accelerator: 16×16 systolic array, each MAC employing TFX decoding (leading-zero count, regime extraction), accumulation in a wide “quire,” and post-accumulation re-encoding. Postponed quantization mitigates truncation error. Decoder logic adds some area/latency overhead versus pure fixed-point but avoids floating-point ALU cost (Langroudi et al., 2021).
- Takum/Tekum: Parsing regime and characteristic fields never exceeds first 12 bits/trits, making logic constant for all precision levels. Arithmetic (besides addition) is a series of fixed-point operations and small lookup-table evaluations (e.g., for in Takum) (Hunhold, 2024). Tekum leverages integer-to-balanced-ternary arithmetic, which can be implemented with fixed-width adders; no floating-point normalization step is required (Hunhold, 25 Nov 2025).
- Ternary Hardware: Tekum and canonical-recoding formats naturally target balanced ternary logic, with potential information and energy density gains per trit but requiring further maturity of ternary hardware, e.g., CNTFET and Josephson junction circuits (Hunhold, 25 Nov 2025, Schoenbaum, 2021).
- Software Libraries: MorrisHEB and related types have high-level implementations in Scala, enabling rapid prototyping and benchmarking, with seamless switching between tapered and IEEE/posit types (Ciocirlan et al., 2023).
5. Comparative Evaluation and Empirical Results
Quantitative comparisons among major TPRs, posit, and IEEE-754 reflect:
- Dynamic Range: Canonical-recoding, Takum, Tekum, and MorrisUnaryHEB have dynamic ranges superior to posit and (often by many orders-of-magnitude) IEEE-754 at comparable wordwidths (Hunhold, 2024, Schoenbaum, 2021, Hunhold, 25 Nov 2025, Ciocirlan et al., 2023).
- Precision Distribution: Peak precision is realized at or near unity (full field in canonical-recoding, maximized golden zone in MorrisUnaryHEB and Takum). Precision tapers toward range extremities, matching the heavy-tailed characteristics common in DNN weights and certain scientific workloads (Langroudi et al., 2021, Ciocirlan et al., 2023, Hunhold, 2024).
- Empirical Accuracy: Softmax classification on MNIST/CIFAR-10 (TFX), SI physical constant representation (Takum), and unary-ops accumulation (Morris-family) indicate that TPRs can match or exceed IEEE-754 and posit, especially in low-precision, edge, and inference scenarios (Langroudi et al., 2021, Hunhold, 2024, Ciocirlan et al., 2023).
- Arithmetic Closure: Takum achieves perfect closure for inversion, high closure for multiplication/root; Tekum’s symmetric structure encodes and NaR with monotonic order, and canonical recoding allows generalization to integer/boolean/complex types with type safety (Hunhold, 2024, Hunhold, 25 Nov 2025, Schoenbaum, 2021).
6. Extensions, Applications, and Open Directions
Extensions and applications for TPRs span multiple domains:
- Neural Quantization: TFX (in TENT) enables per-layer quantization matching the statistical range of DNN weights/activations, reducing overflow/clipping and maximizing utilization of representable points. No retraining or complex RL-parameter search is needed (Langroudi et al., 2021).
- Complex, Vector, and Algebraic Types: Canonical-recoding/point-signature frameworks naturally encode integers, booleans, complex numbers, vectors, quaternions, and octonions, supporting direct hardware type tagging (Schoenbaum, 2021).
- ISA and Security: Some tapered formats (e.g., canonical-recoding/point-signature) enable type-safe instruction/data tagging, blocking certain memory corruption or code-injection attacks (Schoenbaum, 2021).
- Ternary Platforms and Beyond: Tekum and recoded schemes are positioned for next-generation ternary logic platforms, targeting energy, bandwidth, and information density improvements (Hunhold, 25 Nov 2025).
- Potential Future Work: Open challenges include subnormal/gradual underflow representation, fused multiply-add or dot-product pipelines for ternary wordlengths, further error analysis for ML and scientific workloads, and hardware design for efficient ternary FMA. Takum’s constant dynamic range, algebraic properties, and hardware simplicity point to further adoption in both scientific HPC and embedded/edge inference (Hunhold, 2024, Hunhold, 25 Nov 2025).
7. Format Diversity and Selection Guidance
Several TPRs are tailored to different application and hardware priorities:
- MorrisBiasHEB: Drop-in IEEE-754 replacement, comparable arithmetic, better range, simple decode/encode (Ciocirlan et al., 2023).
- MorrisUnaryHEB: Extra-wide golden zone, highest addition exactness, recommended for ML inference/training (Ciocirlan et al., 2023).
- Takum: Uniform dynamic range beyond , optimal for broad-range scientific and general-purpose (Hunhold, 2024).
- Tekum: Balanced ternary, optimal for future ternary logic hardware with minimal field overhead (Hunhold, 25 Nov 2025).
- TFX (TENT): Edge inference under tight power envelopes ( bits), per-layer matching to DNN stats, hardware-friendly (Langroudi et al., 2021).
A plausible implication is that selection should be guided by intended system architecture, workload locality (e.g., prevalence of “dense” vs. “tail” magnitudes), energy/area constraints, and infrastructure readiness for ternary/binary encodings. The convergence of hardware, coding, and numerical analysis in TPRs indicates a growing opportunity for new classes of algorithms and hardware-software co-design.