Validation of Arenas and 3:4 Sparsity at 70B+ Scale

Validate the behavior of the Arenas annealing residual synapse mechanism and the 3:4 structured sparsity pattern of the Sherry 1.25-bit ternary quantization framework when applied to larger, server-grade large language models of 70B or more parameters.

Background

Sherry introduces a hardware-aligned 3:4 structured sparsity pattern and an Arenas annealing residual synapse mechanism to achieve 1.25-bit ternary quantization with efficient edge deployment. The paper evaluates models of up to 3B parameters and demonstrates competitive performance and speed advantages at those scales.
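
The bit-width arithmetic is worth making concrete. In the usual N:M convention (as in NVIDIA's 2:4 sparsity), 3:4 means exactly three nonzero values in every group of four. If those three values are ternary signs under a shared scale, each group admits 4 x 2^3 = 32 = 2^5 codes: 2 bits for the zero position plus 3 sign bits, i.e., 5 bits per 4 weights = 1.25 bits per weight. The sketch below illustrates this reading; the group-wise projection, the absmean scale, and the function names are assumptions for illustration, not Sherry's published algorithm.

    import numpy as np

    def ternary_3of4(group, scale):
        # Assumed 3:4 pattern: the smallest-magnitude weight in each
        # group of four is pruned to zero; the other three keep their
        # signs, scaled by a shared factor. Illustrative only.
        q = scale * np.sign(group)
        q[np.argmin(np.abs(group))] = 0.0
        return q

    def quantize_layer(w):
        # Simple absmean scale, as in BitNet-style ternary schemes (an
        # assumption here, not necessarily Sherry's scaling rule).
        scale = np.abs(w).mean()
        groups = w.reshape(-1, 4).copy()
        for g in groups:
            g[:] = ternary_3of4(g, scale)
        return groups.reshape(w.shape)

    # Storage under this encoding: 2 bits for the zero position plus
    # 3 sign bits per group -> 5 bits / 4 weights = 1.25 bits/weight.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 16)).astype(np.float32)
    wq = quantize_layer(w)
    assert np.all((wq.reshape(-1, 4) == 0).sum(axis=1) == 1)

Zeroing the smallest-magnitude weight per group is the simplest magnitude-based choice; the paper's "fine-grained sparsification" may select the pruned position differently.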

However, the behavior of sparsity patterns and training mechanisms can change in larger models due to differences in optimization dynamics, model capacity, and hardware execution characteristics. The authors explicitly note that the behavior of Arenas and the 3:4 sparsity pattern at server-grade scales (70B+ parameters) has not yet been validated, leaving open the question of Sherry's applicability to much larger LLMs.
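
The Arenas mechanism itself is not specified in this excerpt, but the name "annealing residual synapse" suggests a full-precision residual path whose contribution is annealed to zero over quantization-aware training, so that inference uses only the ternary weights. The following is a generic sketch of that idea under that assumption; it is not the mechanism as defined in the paper, and the cosine schedule is an arbitrary choice.

    import numpy as np

    def anneal_alpha(step, total_steps):
        # Cosine decay from 1.0 to 0.0 over training (illustrative).
        return 0.5 * (1.0 + np.cos(np.pi * min(step / total_steps, 1.0)))

    def effective_weights(w_latent, quantize, step, total_steps):
        # Early in training (alpha near 1) the layer sees mostly the
        # full-precision weights; by the end (alpha = 0) only the
        # quantized weights remain, so inference pays no residual cost.
        # A generic annealed-residual construction, not Arenas itself.
        wq = quantize(w_latent)
        alpha = anneal_alpha(step, total_steps)
        return wq + alpha * (w_latent - wq)

Here quantize could be the quantize_layer sketch above; in an actual QAT loop, gradients would reach w_latent through the residual term directly and through the quantizer via a straight-through estimator.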

References

"While we demonstrate that Sherry achieves a superior Pareto frontier for these scales, the behavior of the Arenas mechanism and the 3:4 sparsity pattern on larger, server-grade models (70B+) remains to be validated."

Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification (Huang et al., 12 Jan 2026, arXiv:2601.07892), Section 'Limitation', paragraph 'Edge-Centric Model Scale'.