Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models

Published 8 Apr 2026 in cs.LG and cs.CL | (2604.06767v1)

Abstract: LLMs operate on discrete tokens but compute in continuous vector spaces, inducing a Voronoi tessellation over the representation manifold. We study this tessellation empirically on Qwen3.5-4B-Base, making two contributions. First, using float32 margin recomputation to resolve bfloat16 quantization artifacts, we validate Mabrok's (2026) linear scaling law of the expressibility gap with $R^2$ = 0.9997 - the strongest confirmation to date - and identify a mid-layer geometric ambiguity regime where margin geometry is anti-correlated with cross-entropy (layers 24-28, $ρ$ = -0.29) before crystallizing into alignment at the final layer ($ρ$ = 0.836). Second, we show that the Voronoi tessellation of a converged model is reshapable through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. We compare direct margin maximization against Fisher information distance maximization across a dose-response sweep. Both methods find the same ceiling of ~16,300 correctable positions per 256K evaluated, but differ critically in collateral damage. Margin maximization damage escalates with intervention strength until corrections are overwhelmed. Fisher damage remains constant at ~5,300 positions across the validated range ($λ$ = 0.15-0.6), achieving +28% median margin improvement at $λ$ = 0.6 with invariant downstream benchmarks - a geometric reorganization that compresses the expressibility gap while preserving its scaling law. However, frequency and token-class audits reveal that gains concentrate in high-frequency structural tokens (84% of net corrections at $λ$ = 0.6), with content and entity-like contributions shrinking at higher $λ$. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.

Authors (1)

Marshall Brett

Summary

The paper validates the linear scaling law of the expressibility gap in LLMs with an unprecedented R² of 0.9997.
It employs float32 margin recomputation and Fisher information distance maximization to refine token margins with minimal collateral damage.
Empirical results reveal a distinct mid-layer ambiguity regime and a concentration of gains among high-frequency structural tokens.

Summary and Contributions

This paper presents an empirical investigation of the geometric structure induced by the Voronoi tessellation in the latent representation space of an LLM, specifically Qwen3.5-4B-Base, a 4.2B-parameter transformer with a Gated DeltaNet architecture and a large vocabulary (248,320 tokens). By directly validating and extending the latent semantic manifold framework introduced by Mabrok (2026), the author analyzes how discrete tokenization interacts with continuous semantic representations and confirms that the expressibility gap, i.e., the measure of undecidable regions in representation space due to vocabulary limitations, follows the predicted linear scaling law. This is demonstrated by a log-log regression yielding $R^2 = 0.9997$, the strongest confirmation to date, supporting Theorem 10.5 from Mabrok's theoretical work.

A core methodological step is recomputing margins in float32, which removes the quantization artifacts of bfloat16 inference and restores fine-grained analysis of the geometric structure. The paper establishes strong baseline metrics for both the gap coefficient ($\alpha = 0.762$) and layerwise correlations between geometric regularization losses and cross-entropy (Spearman $\rho = 0.836$ at the final layer). The author identifies a pronounced mid-layer ambiguity regime (layers 24–28) in which the CE-MRP correlation is negative, suggesting that structural geometric uncertainty persists prior to final decoding.

Two post-hoc margin refinement procedures (MRP) are systematically compared: direct margin maximization and Fisher information distance maximization. The Fisher method achieves a substantial median margin improvement (+28% at $\lambda_\mathrm{MRP}=0.6$) with no degradation in downstream benchmark performance, and its collateral damage remains constant across the tested $\lambda_\mathrm{MRP}$ values, in contrast to the destructive effects observed under aggressive direct margin maximization. The interventions reveal a fixed reservoir of “geometrically accessible” corrections (approximately 16,300 positions per 256K evaluated), but both frequency and token-class audits show a strong concentration of gains among high-frequency structural and head tokens as $\lambda_\mathrm{MRP}$ increases.

Theoretical Framework and Methodology

The latent semantic manifold framework models contextual hidden states at each layer as a smooth Riemannian submanifold $M^{(l)} \subset \mathbb{R}^d$. The final unembedding projects these states, yielding a Voronoi tessellation: a discrete partitioning defining which token is assigned to each region of the semantic manifold. The Voronoi margin is the logit gap $m(h) = \ell_{t^*}(h) - \ell_{t^{**}}(h)$ between the top-1 and top-2 candidate tokens. The expressibility gap $\eta(\varepsilon)$ is the fraction of positions where the margin falls below threshold $\varepsilon$, quantifying how much semantic context fails to be confidently expressed.
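The two definitions above can be made concrete with a minimal sketch. The function names and the toy logits are illustrative assumptions, not taken from the paper; only the formulas for $m(h)$ and $\eta(\varepsilon)$ come from the text.

```python
def voronoi_margin(logits):
    """Logit gap between the top-1 and top-2 tokens at one position."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return top1 - top2


def expressibility_gap(margins, eps):
    """Fraction of positions whose margin falls below the threshold eps."""
    return sum(m < eps for m in margins) / len(margins)


# Toy logits for three positions over a four-token vocabulary (made-up values).
logits_per_position = [
    [2.0, 0.5, -1.0, 0.0],   # top-1 = 2.0, top-2 = 0.5
    [1.0, 0.75, 0.0, -2.0],  # top-1 = 1.0, top-2 = 0.75
    [0.0, 3.0, 2.5, 1.0],    # top-1 = 3.0, top-2 = 2.5
]
margins = [voronoi_margin(row) for row in logits_per_position]
gap = expressibility_gap(margins, eps=0.5)
```

Sweeping `eps` over a grid and recording `gap` at each value gives the empirical curve $\eta(\varepsilon)$ that the scaling law describes.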

The linear scaling law for $\eta(\varepsilon)$ is validated at unprecedented precision on Qwen3.5-4B-Base, extending Mabrok’s prior multi-model empirical confirmation. The float32 margin recomputation is crucial for restoring the continuous structure that bfloat16 quantization obscures.
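To see why reduced precision can distort small margins, the sketch below emulates bfloat16 rounding with the standard library and compares a margin before and after rounding. It only illustrates the rounding effect and is not the paper's recomputation pipeline; `to_bfloat16` and the two logit values are assumptions introduced for this example.

```python
import struct


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value, returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of a float32; round to nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Two illustrative logits that differ by 0.02 (made-up values).
top1, top2 = 12.40, 12.38
margin_full = top1 - top2                            # small but positive
margin_bf16 = to_bfloat16(top1) - to_bfloat16(top2)  # both round to 12.375
```

Between 8 and 16, adjacent bfloat16 values are 0.0625 apart, so both logits land on the same representable number and the margin collapses to an exact tie. Recomputing margins at higher precision avoids this kind of artificial tie.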

Margin refinement techniques are rigorously characterized:

Direct Margin Maximization: Low-complexity loss that directly separates top candidates, but the method exhibits rapidly escalating collateral damage as $\lambda_\mathrm{MRP}$ increases, until the damage overwhelms the corrections at high intervention strength.
Fisher Information Distance Maximization: Uses a probability-weighted Fisher metric (derived from softmax covariance) to maximize distinguishability among top-k candidates. Unlike margin maximization, Fisher maximization moves hidden representations along directions of greatest information divergence, enabling more natural separation.
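The contrast between the two objectives can be sketched in logit space. This summary does not give the paper's exact losses, so both functions below are assumptions: `margin_hinge_loss` is a generic hinge on the top-1/top-2 gap, and `fisher_sq_length` uses the standard Fisher information of a softmax distribution with respect to its logits, $\mathrm{diag}(p) - pp^\top$, to measure how distinguishable a small logit displacement is.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin_hinge_loss(logits, target_margin=1.0):
    """Hypothetical direct-margin loss: penalize gaps below target_margin."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return max(0.0, target_margin - (top1 - top2))


def fisher_sq_length(logits, delta):
    """Quadratic form delta^T (diag(p) - p p^T) delta for a logit displacement.

    This equals the variance of delta under the softmax distribution p.
    """
    p = softmax(logits)
    mean = sum(pi * di for pi, di in zip(p, delta))
    return sum(pi * (di - mean) ** 2 for pi, di in zip(p, delta))


loss_narrow = margin_hinge_loss([1.0, 0.75, 0.0])       # gap 0.25 -> loss 0.75
loss_wide = margin_hinge_loss([2.0, 0.5, 0.0])          # gap 1.5  -> loss 0.0
fisher_split = fisher_sq_length([0.0, 0.0], [1.0, -1.0])
fisher_shift = fisher_sq_length([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

The hinge only looks at two logits, whereas the Fisher quadratic form weights every direction by the current probabilities: a uniform shift of all logits has zero length because it leaves the distribution unchanged, while moving two equally likely candidates apart has maximal effect.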

Experiments were performed using dose-response sweeps over $\lambda_\mathrm{MRP}$, reporting detailed margin statistics, per-position churn analysis, banded accuracy audits, and downstream evaluation on six text generation benchmarks.

Empirical Results

Scaling Law and Margin Statistics

The scaling law holds with $R^2 = 0.9997$, a slope consistent with theoretical and prior empirical ranges, and gap coefficient $\alpha = 0.762$.
Qwen3.5-4B’s median margin (1.03) exceeds that of prior models, indicating stronger tessellation quality.
Bias toward high-frequency tokens is evident, with gains increasingly concentrated as $\lambda_\mathrm{MRP}$ is raised.
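A log-log regression of the kind used to test the scaling law can be sketched as an ordinary least-squares fit of $\log \eta$ against $\log \varepsilon$. The power-law form $\eta \approx c\,\varepsilon^{s}$, the function name, and the synthetic data below are assumptions for illustration; the numbers are not the paper's measurements.

```python
import math


def loglog_fit(eps_values, gap_values):
    """Least-squares fit of log(gap) = log(c) + s * log(eps).

    Returns (slope s, coefficient c, R^2 of the fit in log space).
    """
    xs = [math.log(e) for e in eps_values]
    ys = [math.log(g) for g in gap_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, math.exp(intercept), 1.0 - ss_res / ss_tot


# Synthetic, exactly linear data: gap = 0.5 * eps (illustrative only).
eps_grid = [0.01, 0.02, 0.05, 0.1, 0.2]
gaps = [0.5 * e for e in eps_grid]
slope, coefficient, r_squared = loglog_fit(eps_grid, gaps)
```

On exactly linear data the fit recovers a slope of 1 and the generating coefficient, with $R^2$ equal to 1 up to floating-point error; real measurements would deviate from this, which is what the reported $R^2$ quantifies.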

Layerwise Geometry Evolution

Early layers (4–20): margins near zero, geometry uninformative.
Mid layers (24–28): geometric ambiguity regime, negative CE-MRP correlation, indicating substantial uncertainty persists prior to decoding.
Final layers: strong positive correlation, redundancy between CE and MRP loss, geometry directly aligns with prediction quality.
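The layerwise correlations above are Spearman rank correlations, which can be computed as the Pearson correlation of ranks. The sketch below assumes per-position cross-entropy values and per-position geometric loss values are available for each layer; the layer labels and numbers are made up to show one anti-correlated and one aligned case.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-position values: (cross-entropy, geometric loss) per layer.
cross_entropy = [0.2, 0.9, 1.5, 2.4]
geometric_loss = {
    "mid": [3.0, 2.2, 2.6, 1.0],    # tends to fall as cross-entropy rises
    "final": [0.1, 0.5, 0.4, 1.2],  # tends to rise with cross-entropy
}
rho = {layer: spearman(cross_entropy, g) for layer, g in geometric_loss.items()}
```

A negative value for a layer corresponds to the ambiguity regime described above, and a strongly positive value to the final-layer alignment.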

Intervention Efficacy and Damage Profile

Both margin maximization and Fisher methods yield similar counts of correctable positions, but their damage profiles differ. Margin maximization exhibits super-linear escalation of collateral damage and flip ratio collapse as intervention strength increases, while Fisher maximization maintains constant damage (~5,300 positions) and preserves downstream benchmarks.
Runner-up rotation is a dominant mechanism, with Fisher effectively rotating the second-place token to one that is more naturally separable, improving margins without forced displacement.
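A per-position churn analysis of this kind can be sketched by comparing top-1 predictions before and after an intervention against the target tokens. The function name and toy token ids are illustrative, and the flip ratio is assumed here to mean corrections divided by damage, since this summary does not define it.

```python
def churn_counts(targets, before, after):
    """Count positions corrected and damaged by an intervention.

    corrections: wrong before, right after.
    damage:      right before, wrong after.
    """
    corrections = damage = 0
    for target, old, new in zip(targets, before, after):
        if old != target and new == target:
            corrections += 1
        elif old == target and new != target:
            damage += 1
    # Assumed definition: flip ratio = corrections / damage.
    flip_ratio = corrections / damage if damage else float("inf")
    return {
        "corrections": corrections,
        "damage": damage,
        "net": corrections - damage,
        "flip_ratio": flip_ratio,
    }


# Toy token ids for six positions (made-up values).
targets = [1, 2, 3, 4, 5, 6]
before = [1, 9, 3, 9, 5, 9]
after = [1, 2, 9, 4, 5, 9]
churn = churn_counts(targets, before, after)
```

Under this reading, a method whose damage grows faster than its corrections shows a falling flip ratio as intervention strength rises, while a method with constant damage keeps the ratio stable.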

Token-Level Analysis

Frequency and token-class audits dissect where gains accrue. At $\lambda_\mathrm{MRP} = 0.6$, 84% of net corrections are from high-frequency tokens; at higher $\lambda_\mathrm{MRP}$, the concentration rises above 92%.
Structural tokens (punctuation, formatting, whitespace) account for most corrections; gains in content and entity-like tokens diminish, and in instruct-style fine-tuned models entity-like tokens become net negative.
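A token-class audit can be sketched by attributing each correction or damaged position to the class of its target token and then computing each class's share of the net gain. The classifier below (punctuation and whitespace count as structural, everything else as content) and the toy tokens are assumptions for illustration, not the paper's taxonomy.

```python
import string
from collections import Counter


def token_class(token):
    """Hypothetical classifier: punctuation/whitespace-only tokens are structural."""
    if all(ch in string.punctuation or ch.isspace() for ch in token):
        return "structural"
    return "content"


def net_corrections_by_class(targets, before, after):
    """Net corrections (corrections minus damage) per target-token class."""
    net = Counter()
    for target, old, new in zip(targets, before, after):
        if old != target and new == target:
            net[token_class(target)] += 1
        elif old == target and new != target:
            net[token_class(target)] -= 1
    return dict(net)


# Toy predictions for six positions (made-up values).
targets = [",", ".", "\n", "Paris", "cat", "."]
before = [";", ".", " ", "Paris", "dog", ","]
after = [",", ".", "\n", "Paris", "cat", "."]
net_by_class = net_corrections_by_class(targets, before, after)
structural_share = net_by_class["structural"] / sum(net_by_class.values())
```

A share close to 1 for the structural class means the aggregate improvement is carried almost entirely by punctuation and formatting tokens, which is the concentration effect the audits report.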

Downstream Evaluation

Performance is flat on all downstream benchmarks, with a maximum deviation from the baseline mean below 0.005, indicating that margin refinement does not degrade task-level capabilities within the validated $\lambda_\mathrm{MRP}$ range.

Implications and Future Directions

Practical

Fisher information distance maximization provides a post-training mechanism for geometric margin enhancement that is both efficient and minimally disruptive within the validated parameter range ($\lambda_\mathrm{MRP} = 0.15$ to $0.6$). Gains are concentrated in high-frequency structural tokens, suggesting utility for calibration, speculative decoding confidence, and possibly domain-specific applications where structural clarity dominates. However, aggregate token accuracy improvements and flat benchmarks do not guarantee uniformly improved utility across token classes, so caution is warranted when applying bulk geometric polishing to domain-sensitive content and entity tokens.

Theoretical

The expressibility gap scaling law is validated to the highest precision reported. The persistence of the linear scaling character under post-hoc geometric reshaping demonstrates the topological invariance of the Voronoi tessellation structure, though its precise allocation of separability remains plastic.

The identification of a mid-layer ambiguity regime aligns with recent findings on hallucination-associated neurons and reinforces the role of representational uncertainty in model calibration and latent geometry.

Open Problems and Future Research

The evidence motivates several open directions:

Token-value-aware refinement: Current Fisher methods require weighting or protection schemes to avoid head/structural token concentration at higher intervention strengths. More surgical interventions are needed for content- and entity-focused accuracy improvements.
Parameter-scope ablation: Isolation of the minimal subset of parameters responsible for geometric improvements (e.g., output layer, adapters) may enable more precise editing.
Training-time geometry shaping: Incorporation of margin-oriented objectives during pretraining may provide a more favorable substrate for subsequent polishing.
Calibration metrics and rare-token performance: Extension to entropy penalty baselines, Brier score, and ECE metrics is necessary to characterize practical calibration outcomes beyond margin width.
Model diversity and scaling: Larger models may afford greater slack for geometry-aware refinement, per evidence from intrinsic dimension analyses.

Conclusion

This work rigorously validates the latent semantic manifold framework in a large-scale transformer, demonstrating that the expressibility gap scaling law holds robustly, and confirming that the Voronoi tessellation in representation space remains both plastic and topologically invariant under post-hoc geometric intervention. Fisher information distance maximization is shown to be an effective and low-damage method for margin refinement, but practical utility is constrained by a concentration of gains among high-frequency structural tokens, especially as intervention strength increases. The results foreground the importance of token-value-aware refinement for future work in geometric editing, both for domain-sensitive deployment and for further theoretical advances in LLM latent manifold analysis.
