Optimal precision allocation for deterministic quantization
Determine the optimal per-layer precision allocation for mixed-precision integer quantization in transformer inference (for example, assigning INT16 to attention mechanisms and INT8 to feed-forward layers), and characterize how the required precision interacts with context length. The objective is to minimize calibration degradation while preserving deterministic inference guarantees.
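To make the search space concrete, the following sketch shows one way a per-layer precision policy and deterministic symmetric integer quantization could be expressed. The policy dictionary, function names, and layer types are illustrative assumptions, not part of the source; the key property demonstrated is that quantization is a pure function of the tensor and bit width, so repeated runs yield bit-identical results.

```python
import numpy as np

# Hypothetical precision policy (illustrative, not from the source):
# attention layers get INT16, feed-forward layers get INT8.
PRECISION_POLICY = {"attention": 16, "ffn": 8}  # bits per layer type

def quantize(x: np.ndarray, bits: int):
    """Deterministic symmetric per-tensor quantization to signed integers.

    The scale and rounding depend only on the input tensor and bit width,
    so repeated runs produce bit-identical integer outputs.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8, 32767 for INT16
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:                      # all-zero tensor: any scale works
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integer codes back to real values for error measurement."""
    return q.astype(np.float64) * scale

# Compare per-layer round-trip error under the policy above.
rng = np.random.default_rng(0)
for layer_type, bits in PRECISION_POLICY.items():
    w = rng.standard_normal(1024)
    q, s = quantize(w, bits)
    err = float(np.max(np.abs(dequantize(q, s) - w)))
    print(f"{layer_type}: INT{bits}, max abs round-trip error {err:.2e}")
```

The open problem is then which bit width to assign to each layer type, and how those assignments must change as context length grows, subject to the determinism constraint illustrated above.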
References
The optimal precision allocation per layer type and the interaction between precision and context length are open optimization problems with direct impact on production deployment.
— On the Foundations of Trustworthy Artificial Intelligence (2603.24904, Dunham, 26 Mar 2026), Section 12 (Open Problems), "Higher-precision deterministic quantization"