- The paper validates the linear scaling law of the expressibility gap in LLMs with an unprecedented R² of 0.9997.
- It employs float32 margin recomputation and Fisher information distance maximization to refine token margins with minimal collateral damage.
- Empirical results reveal a distinct mid-layer ambiguity regime and a concentration of gains among high-frequency structural tokens.
Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of LLMs
Summary and Contributions
This paper presents an empirical investigation of the geometric structure induced by the Voronoi tessellation in the latent representation space of an LLM, specifically Qwen3.5-4B-Base, a 4.2B-parameter transformer with a Gated DeltaNet architecture and a large vocabulary (248,320 tokens). By directly validating and extending the latent semantic manifold framework introduced by Mabrok (2026), the authors analyze how discrete tokenization interacts with continuous semantic representations. They confirm that the expressibility gap, i.e., the measure of undecidable regions in representation space due to vocabulary limitations, follows the predicted linear scaling law. This is demonstrated by a log-log regression yielding R²=0.9997, supporting Theorem 10.5 from Mabrok's theoretical work.
A core methodological advance is the elimination of quantization artifacts arising from bfloat16 inference, replaced here by float32 margin recomputation, which restores fine-grained analysis of the geometric structure. The paper establishes strong baseline metrics for both the gap coefficient (α=0.762) and layerwise correlations between geometric regularization losses and cross-entropy (Spearman ρ=0.836 at the final layer). The authors identify a pronounced mid-layer ambiguity regime (layers 24–28) in which CE-MRP correlation is negative, suggesting structural geometric uncertainty persists prior to final decoding.
Two forms of post-hoc intervention for margin refinement (MRP) are systematically compared: direct margin maximization and Fisher information distance maximization. The Fisher method uniquely achieves substantial margin improvement (+28% at λMRP=0.6) with no degradation in downstream benchmark performance, and collateral damage remains constant across tested λ values, contrasting with the destructive effects observed in aggressive direct margin maximization. The interventions reveal a fixed reservoir of “geometrically accessible” corrections (approximately 16,300 positions per 256K evaluated), but both frequency and token-class audits highlight a strong concentration of gains among high-frequency structural and head tokens as λMRP increases.
Theoretical Framework and Methodology
The latent semantic manifold framework models contextual hidden states at each layer l as a smooth Riemannian submanifold M^(l) ⊂ ℝ^d. The final unembedding projects these states onto the vocabulary, yielding a Voronoi tessellation: a discrete partition that determines which token is assigned to each region of the semantic manifold. The Voronoi margin is the logit gap m(h) = ℓ_{t*}(h) − ℓ_{t**}(h) between the top-1 token t* and the top-2 token t**. The expressibility gap η(ε) is the fraction of positions where the margin falls below a threshold ε, quantifying how much semantic context fails to be confidently expressed.
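The two definitions above can be made concrete with a minimal sketch. The function names and the toy logits are illustrative assumptions, not taken from the paper; only the definitions of m(h) and η(ε) come from the text.

```python
def voronoi_margin(logits):
    """Logit gap between the top-1 and top-2 tokens at one position."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return top1 - top2


def expressibility_gap(margins, eps):
    """Fraction of positions whose margin falls below the threshold eps."""
    return sum(m < eps for m in margins) / len(margins)


# Toy logits for four positions over a five-token vocabulary (illustrative only).
positions = [
    [2.0, 1.5, 0.1, -1.0, 0.0],  # margin 0.5
    [3.0, 2.9, 0.0, 0.0, 0.0],   # margin about 0.1
    [0.2, 0.1, 4.0, 1.0, 0.0],   # margin 3.0
    [1.0, 1.0, 0.5, 0.0, -2.0],  # margin 0.0 (exact tie)
]
margins = [voronoi_margin(row) for row in positions]
eta = expressibility_gap(margins, eps=0.25)  # two of four positions fall below 0.25
```

Sweeping `eps` over a grid and recording `expressibility_gap` at each value gives the empirical η(ε) curve that the scaling law describes.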
The linear scaling law for η(ε), with gap coefficient α=0.762, is validated at unprecedented precision on Qwen3.5-4B-Base, extending Mabrok’s prior multi-model empirical confirmation. The float32 margin recomputation is crucial for recovering the continuous margin structure that bfloat16 quantization obscures.
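To see why reduced precision matters for small margins, the sketch below emulates bfloat16 rounding by keeping the top 16 bits of a float32 encoding. This is an illustration under assumptions: the helper names and the two example logits are hypothetical, and the paper's actual recomputation pipeline is not described here.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value (ties to even),
    emulated by keeping the top 16 bits of its float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# Two hypothetical logits that differ by 0.01. Near 12, bfloat16 values are
# spaced 0.0625 apart, so both logits land on the same representable number.
top1, top2 = 12.31, 12.30
bf16_margin = to_bfloat16(top1) - to_bfloat16(top2)
fp32_margin = to_float32(top1) - to_float32(top2)
```

In this toy case the bfloat16 margin collapses to an exact tie, while the float32 margin keeps the small gap. Collapses of this kind are the sort of quantization artifact that would distort the low-ε end of the η(ε) curve.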
Margin refinement techniques are rigorously characterized:
- Direct Margin Maximization: Low-complexity loss that directly separates the top candidates, but the method exhibits rapidly escalating collateral damage as the intervention strength λMRP increases, ultimately degrading performance at the most aggressive settings.
- Fisher Information Distance Maximization: Uses a probability-weighted Fisher metric (derived from softmax covariance) to maximize distinguishability among top-k candidates. Unlike margin maximization, Fisher maximization moves hidden representations along directions of greatest information divergence, enabling more natural separation.
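The ingredients of the two objectives can be sketched as follows. This is not the paper's loss: the hinge form of the margin penalty and all function names are assumptions. The one standard fact used is that, for a softmax distribution p parametrized by its logits, the Fisher information matrix is the softmax covariance diag(p) − p pᵀ, so the squared Fisher length of a small logit displacement δ equals the variance of δ under p.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_quadratic_form(logits, delta):
    """delta^T (diag(p) - p p^T) delta, i.e. the variance of delta under p."""
    p = softmax(logits)
    mean = sum(pi * di for pi, di in zip(p, delta))
    return sum(pi * (di - mean) ** 2 for pi, di in zip(p, delta))


def direct_margin_loss(logits, target_margin=1.0):
    """Hinge penalty on the top-1/top-2 logit gap (hypothetical form)."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return max(0.0, target_margin - (top1 - top2))


# The same displacement (push token 0 up, token 1 down) at two positions.
delta = [1.0, -1.0, 0.0]
ambiguous = fisher_quadratic_form([1.0, 1.0, -5.0], delta)  # near-tie
confident = fisher_quadratic_form([6.0, 0.0, -5.0], delta)  # clear winner
```

The Fisher length of the displacement is much larger at the near-tie than at the confident position, so an objective built on this metric weights changes by how much they alter the predictive distribution. The hinge penalty, by contrast, depends only on the raw logit gap.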
Experiments were performed using dose-response sweeps over λMRP, reporting detailed margin statistics, per-position churn analysis, banded accuracy audits, and downstream evaluation on six text generation benchmarks.
Empirical Results
Scaling Law and Margin Statistics
- The scaling law holds with R²=0.9997, a log-log slope consistent with theoretical and prior empirical ranges, and gap coefficient α=0.762.
- Qwen3.5-4B’s median margin (1.03) exceeds that of prior models, indicating stronger tessellation quality.
- Bias toward high-frequency tokens is evident, with gains increasingly concentrated as λMRP is raised.
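A log-log regression of the kind used to test the scaling law can be sketched as below. It assumes the law has the form η(ε) ≈ α·ε^β with β near 1. The data points are synthetic, generated from an exactly linear law purely to exercise the fit; they are not the paper's measurements, and `fit_loglog` is a hypothetical helper.

```python
import math


def fit_loglog(eps_values, eta_values):
    """Ordinary least squares on (log eps, log eta).

    Returns (slope, coefficient, r_squared), where the fitted model is
    eta = coefficient * eps ** slope.
    """
    xs = [math.log(e) for e in eps_values]
    ys = [math.log(n) for n in eta_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return slope, math.exp(intercept), 1.0 - ss_res / ss_tot


# Synthetic thresholds and gaps drawn from an exactly linear law (illustrative).
eps_grid = [0.01, 0.02, 0.05, 0.1, 0.2]
eta_grid = [0.762 * e for e in eps_grid]
slope, coefficient, r_squared = fit_loglog(eps_grid, eta_grid)
```

On real measurements the slope and R² indicate how closely η(ε) follows a power law, and the exponentiated intercept plays the role of the gap coefficient.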
Layerwise Geometry Evolution
- Early layers (4–20): margins near zero, geometry uninformative.
- Mid layers (24–28): geometric ambiguity regime, negative CE-MRP correlation, indicating substantial uncertainty persists prior to decoding.
- Final layers: strong positive correlation, redundancy between CE and MRP loss, geometry directly aligns with prediction quality.
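The layerwise correlations summarized above are Spearman rank correlations between per-position cross-entropy and the geometric loss. The sketch below is stdlib-only, with hypothetical helper names and toy values chosen to show a positively and a negatively correlated layer; none of the numbers come from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy per-position losses (illustrative only).
ce_loss = [2.0, 1.0, 3.0, 0.5]
mrp_final_layer = [1.8, 0.9, 4.0, 0.2]  # same ordering as CE
mrp_mid_layer = [0.3, 0.9, 0.1, 1.2]    # opposite ordering to CE
rho_final = spearman(ce_loss, mrp_final_layer)
rho_mid = spearman(ce_loss, mrp_mid_layer)
```

Computing `spearman` once per layer yields a curve in which a sign change marks the transition into or out of the ambiguity regime.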
Intervention Efficacy and Damage Profile
- Both margin maximization and Fisher methods yield similar counts of correctable positions, but their damage profiles differ. Margin maximization exhibits super-linear escalation of collateral damage and flip ratio collapse as intervention strength increases, while Fisher maximization maintains constant damage (~5,300 positions) and preserves downstream benchmarks.
- Runner-up rotation is a dominant mechanism, with Fisher effectively rotating the second-place token to one that is more naturally separable, improving margins without forced displacement.
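A per-position churn audit of this kind can be sketched as follows. The record layout and function names are assumptions, the token ids are toy values, and runner-up rotation is taken here to mean positions where the top-1 token is unchanged but the second-place token differs.

```python
def churn_audit(targets, base_top1, refined_top1):
    """Count positions corrected and positions damaged by an intervention."""
    fixed = broken = 0
    for target, before, after in zip(targets, base_top1, refined_top1):
        if before != target and after == target:
            fixed += 1   # wrong before, right after
        elif before == target and after != target:
            broken += 1  # right before, wrong after (collateral damage)
    return {"fixed": fixed, "broken": broken, "net": fixed - broken}


def runner_up_rotations(base_top1, refined_top1, base_top2, refined_top2):
    """Positions where top-1 is unchanged but the runner-up token changed."""
    return sum(
        b1 == r1 and b2 != r2
        for b1, r1, b2, r2 in zip(base_top1, refined_top1, base_top2, refined_top2)
    )


# Toy token ids for six positions (illustrative only).
targets = [5, 7, 7, 2, 9, 4]
base_top1 = [5, 3, 7, 2, 1, 4]
refined_top1 = [5, 7, 7, 8, 9, 4]
base_top2 = [6, 7, 1, 8, 9, 0]
refined_top2 = [6, 3, 2, 2, 1, 3]

audit = churn_audit(targets, base_top1, refined_top1)
rotations = runner_up_rotations(base_top1, refined_top1, base_top2, refined_top2)
```

Tracking `fixed` and `broken` separately, rather than only the net change in accuracy, is what exposes the difference between the two damage profiles.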
Token-Level Analysis
- Frequency and token-class audits dissect where gains accrue. At a lower λMRP setting, 84% of net corrections come from high-frequency tokens; at higher λMRP, the concentration rises above 92%.
- Structural tokens (punctuation, formatting, whitespace) account for most corrections; gains in content and entity-like tokens diminish, and in instruct-style fine-tuned models entity-like tokens become net negative.
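A token-class audit reduces to grouping per-position outcomes by class and comparing net corrections. The sketch below uses hypothetical class labels and toy records; how the paper assigns tokens to frequency bands or classes is not specified here.

```python
from collections import Counter


def net_corrections_by_class(records):
    """records: iterable of (token_class, correct_before, correct_after).

    Each position contributes +1 if it was fixed, -1 if it was broken,
    and 0 if its correctness did not change.
    """
    net = Counter()
    for token_class, before, after in records:
        net[token_class] += int(after) - int(before)
    return dict(net)


def share_of_net(net_by_class, token_class):
    """Share of total net corrections attributable to one class."""
    total = sum(net_by_class.values())
    return net_by_class[token_class] / total if total else float("nan")


# Toy audit: five structural fixes, one structural break, one content fix,
# one entity fix offset by one entity break, one unchanged content position.
records = (
    [("structural", False, True)] * 5
    + [("structural", True, False)]
    + [("content", False, True), ("content", True, True)]
    + [("entity", False, True), ("entity", True, False)]
)
net = net_corrections_by_class(records)
structural_share = share_of_net(net, "structural")
```

Because a class can have a negative net count, shares computed this way can exceed 1 for the remaining classes, which is one reason to report per-class counts alongside percentages.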
Downstream Evaluation
- Performance is flat on all downstream benchmarks: the maximum deviation from the baseline mean is below 0.005, indicating that margin refinement does not degrade task-level capabilities within the validated λMRP range.
Implications and Future Directions
Practical
Fisher information distance maximization provides a post-training mechanism for geometric margin enhancement that is both efficient and minimally disruptive within a carefully constrained range of the intervention strength λMRP. Gains are concentrated in high-frequency structural tokens, suggesting utility for calibration, speculative decoding confidence, and possibly domain-specific applications where structural clarity dominates. However, aggregate token accuracy improvements and flat benchmarks do not guarantee uniformly improved utility across token classes, which calls for caution when applying bulk geometric polishing to domain-sensitive content and entity tokens.
Theoretical
The expressibility gap scaling law is validated to the highest precision reported. The persistence of the linear scaling character under post-hoc geometric reshaping demonstrates the topological invariance of the Voronoi tessellation structure, though its precise allocation of separability remains plastic.
The identification of a mid-layer ambiguity regime aligns with recent findings on hallucination-associated neurons and reinforces the role of representational uncertainty in model calibration and latent geometry.
Open Problems and Future Research
The evidence motivates several open directions:
- Token-value-aware refinement: Current Fisher methods require weighting or protection schemes to avoid head/structural token concentration at higher intervention strengths. More surgical interventions are needed for content- and entity-focused accuracy improvements.
- Parameter-scope ablation: Isolation of the minimal subset of parameters responsible for geometric improvements (e.g., output layer, adapters) may enable more precise editing.
- Training-time geometry shaping: Incorporation of margin-oriented objectives during pretraining may provide a more favorable substrate for subsequent polishing.
- Calibration metrics and rare-token performance: Extension to entropy penalty baselines, Brier score, and ECE metrics is necessary to characterize practical calibration outcomes beyond margin width.
- Model diversity and scaling: Larger models may afford greater slack for geometry-aware refinement, per evidence from intrinsic dimension analyses.
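For the calibration metrics mentioned in the list above, a minimal sketch of the Brier score and expected calibration error (ECE) is given below. It uses the top-label variant, scoring only the confidence assigned to the predicted token, with equal-width bins; both choices are assumptions, and the toy values are illustrative.

```python
def brier_score(confidences, correct):
    """Mean squared gap between top-1 confidence and the 0/1 outcome."""
    return sum((c - float(y)) ** 2 for c, y in zip(confidences, correct)) / len(confidences)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        index = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((c, y))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_conf - accuracy)
    return ece


# Toy top-1 confidences and correctness flags for four positions.
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0]
brier = brier_score(confidences, correct)
ece = expected_calibration_error(confidences, correct)
```

Comparing these metrics before and after an intervention would show whether wider margins translate into better-calibrated probabilities or merely into more confident ones.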
Conclusion
This work rigorously validates the latent semantic manifold framework in a large-scale transformer, demonstrating that the expressibility gap scaling law holds robustly, and confirming that the Voronoi tessellation in representation space remains both plastic and topologically invariant under post-hoc geometric intervention. Fisher information distance maximization is shown to be an effective and low-damage method for margin refinement, but practical utility is constrained by a concentration of gains among high-frequency structural tokens, especially as intervention strength increases. The results foreground the importance of token-value-aware refinement for future work in geometric editing, both for domain-sensitive deployment and for further theoretical advances in LLM latent manifold analysis.