Geometry-Weighted Embedding Losses
- The paper introduces a novel loss design that enforces geometric invariance through local isometry and curvature penalties, enhancing representation quality.
- Methodologies include first-order distortion and second-order bending penalties implemented via convex functions and Monte Carlo integration to maintain manifold structure.
- Applications span autoencoders, classification, and contrastive learning, with empirical gains on benchmarks demonstrating improved regularity and robust performance.
Geometry-weighted embedding losses comprise a suite of regularization and objective function designs for deep learning that explicitly incorporate geometric structure into the representation space. By weighting loss terms according to local distance laws, embedding curvature, or inter-class angular separation, these methods induce latent geometries that align with intrinsic or task-specific similarities, yielding superior regularity, interpretability, and downstream performance in a wide range of settings. Geometry-weighted losses have been developed for autoencoders, classification, contrastive learning, and distributional modeling, each with theory-backed formulations and empirically validated gains on benchmark tasks.
1. Foundational Principles of Geometry-Weighted Losses
The central principle in geometry-weighted losses is the direct penalization of geometric distortion in the learned embedding space relative to a source manifold or target class structure. Classical objectives such as softmax cross-entropy and triplet contrastive loss do not guarantee isometry, curvature control, or explicit angular separation. Geometry-weighted approaches modify losses to enforce:
- Local isometry: penalizing deviations between Euclidean/latent distances and manifold distances.
- Extrinsic flatness: minimizing curvature through second-order penalties that measure bending of the embedding map.
- Inter-class separation: encouraging orthogonal or equiangular configurations of class/channel subspaces or prototypes.
- Metric-aware class assignment: weighting misclassification by underlying costs or distances in label space.
These principles lead to regularization schemes that are faithful to the structure of the input or label manifold, making representations more robust, transferable, and interpretable.
2. Loss Designs for Low-Bending, Low-Distortion Manifold Embeddings
Geometry-weighted loss design for autoencoder and manifold learning proceeds by directly penalizing two key failure modes in embedding maps: non-isometric distance scaling and excessive curvature. Methods introduced by Braunsmann et al. (Braunsmann et al., 2022, Braunsmann et al., 2021) formalize this via:
- First-order distortion penalty: For nearby pairs $x, \tilde{x}$ with geodesic distance $d_{\mathcal{M}}(x, \tilde{x})$ and encoder $\phi$, the quotient $Q = \|\phi(x) - \phi(\tilde{x})\| / d_{\mathcal{M}}(x, \tilde{x})$ is penalized through a convex function such as $(Q - 1)^2$ or $Q + Q^{-1} - 2$, with the unique minimum at $Q = 1$ enforcing local isometry.
- Second-order bending penalty: For the geodesic midpoint $m$ of $x$ and $\tilde{x}$, the second difference quotient $\|\phi(x) - 2\phi(m) + \phi(\tilde{x})\| / d_{\mathcal{M}}(x, \tilde{x})^2$ penalizes deviation from linearity (extrinsic flatness).
- Monte Carlo integration: Losses are summed over batches of such pairs, with careful sampling within an $\varepsilon$-neighborhood to ensure uniqueness and faithful derivative approximation.
In the limit $\varepsilon \to 0$, these empirical losses converge to local functionals involving the Riemannian Jacobian and Hessian of $\phi$, enforcing a regular, flat, distance-preserving parameterization. The trade-off between isometry and flatness is controlled via the relative weight of the bending term, and hyperparameters such as $\varepsilon$ and batch size are tuned empirically (Braunsmann et al., 2021).
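The two penalties above can be sketched in a few lines of numpy. This is a minimal, illustrative implementation under simplifying assumptions (function names are mine, and for the demo the "geodesic" distance is taken to equal the latent distance so both penalties vanish); it is not the authors' exact formulation.

```python
import numpy as np

def distortion_penalty(z_x, z_y, geo_dist, eps=1e-8):
    """First-order penalty: convex function of the quotient of latent
    to geodesic distance, with unique minimum at 1 (local isometry)."""
    q = np.linalg.norm(z_x - z_y, axis=-1) / (geo_dist + eps)
    return q + 1.0 / (q + eps) - 2.0  # convex in q, minimized at q = 1

def bending_penalty(z_x, z_m, z_y, geo_dist, eps=1e-8):
    """Second-order penalty: second difference quotient through the
    embedded geodesic midpoint; zero when the midpoint maps to the
    latent midpoint (extrinsic flatness)."""
    second_diff = z_x - 2.0 * z_m + z_y
    return np.linalg.norm(second_diff, axis=-1) / (geo_dist**2 + eps)

# Monte Carlo estimate over a batch of sampled pairs and midpoints.
rng = np.random.default_rng(0)
z_x, z_y = rng.normal(size=(32, 8)), rng.normal(size=(32, 8))
z_m = 0.5 * (z_x + z_y)                 # perfectly "flat" midpoints
d = np.linalg.norm(z_x - z_y, axis=-1)  # demo: geodesic = latent distance
loss = distortion_penalty(z_x, z_y, d).mean() + bending_penalty(z_x, z_m, z_y, d).mean()
```

Because the demo batch is constructed to be exactly isometric and flat, `loss` is numerically zero; in training, `z_x`, `z_m`, `z_y` would be encoder outputs and `d` a precomputed geodesic distance.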
3. Geometry-Weighted Losses in Classification and Contrastive Settings
The geometry of embedding spaces for supervised classification can be directly manipulated via loss design and explicit geometric priors:
- Spherical and hyperbolic softmax losses: Choice of embedding geometry (Euclidean, hyperbolic, spherical) modifies the similarity function and the induced representation clusters (Scott et al., 2021). Spherical (cosine) losses and angular-margin variants (ArcFace) operate on the unit sphere, while hyperbolic losses utilize the Poincaré ball for hierarchical arrangements.
- Prototype-enforced contrastive losses: Introducing fixed class prototypes with a target Gram matrix during supervised contrastive learning steers the entire embedding geometry to a desired configuration, such as an equiangular tight frame (ETF). Increasing the number of prototype anchors leads to "Neural Collapse," where class means and features converge onto the chosen frame (Gill et al., 2023).
| Loss Variant | Embedding Geometry | Highlights |
|---|---|---|
| Spherical/ArcFace | Unit sphere | High accuracy, calibration, retrieval |
| Hyperbolic | Poincaré ball | Hierarchical task alignment |
| Prototype SCL | Target Gram matrix | Controlled class separation, ETF |
| Euclidean | Vector space | Best far-transfer, robust scaling |
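As a concrete example of the spherical row above, an angular-margin logit in the ArcFace style replaces the true-class cosine $\cos\theta$ with $\cos(\theta + m)$ before rescaling. The numpy sketch below is illustrative (function name and default `margin`/`scale` values are assumptions, though $m = 0.5$, $s = 64$ are commonly used):

```python
import numpy as np

def arcface_logits(features, weights, labels, margin=0.5, scale=64.0):
    """Angular-margin logits on the unit sphere (ArcFace-style):
    add `margin` to the true-class angle, then rescale all logits."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(f @ w, -1.0, 1.0)     # cosine similarity to each class
    theta = np.arccos(cos)
    out = cos.copy()
    rows = np.arange(len(labels))
    out[rows, labels] = np.cos(theta[rows, labels] + margin)
    return scale * out
```

The result feeds into an ordinary softmax cross-entropy; the margin lowers the true-class logit, forcing features toward their class direction on the sphere.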
4. Metric-Weighted Softmax and Distributional Losses
Geometry-weighted objectives are also constructed by integrating pairwise costs or distances defined on the label space or output manifold:
- Geometric Fenchel–Young loss: Mensch et al. (Mensch et al., 2019) propose a generalization of logistic regression incorporating a cost matrix that encodes misclassification costs or squared distances in an embedding. The loss is derived from entropy-regularized optimal transport, with a "geometric softmax" operator computed via a strictly convex quadratic program over the simplex.
- Sparse/singular prediction and task-aware weighting: This framework supports infinite/continuous label spaces, and yields sparse outputs concentrated on geometric features of the label manifold. The geometric softmax attenuates or sharpens class probabilities according to both predicted scores and geometric proximity.
Empirical results show improved Hausdorff divergence in ordinal regression and higher-fidelity reconstructions in VAE-driven drawing generation.
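The exact geometric softmax requires solving a QP over the simplex; as a simplified, hypothetical stand-in for intuition only (not the Fenchel–Young construction itself), a cost matrix can weight the expected misclassification cost under an ordinary softmax:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cost_weighted_loss(scores, labels, C):
    """Expected misclassification cost under the softmax distribution.
    C[i, j] is the cost of predicting class j when the truth is class i;
    unlike plain cross-entropy, geometrically close errors stay cheap."""
    p = softmax(scores)
    return (p * C[labels]).sum(axis=-1).mean()
```

With an ordinal cost matrix `C[i, j] = |i - j|`, predicting a neighboring class incurs far less loss than predicting a distant one, which is the behavior the geometric softmax formalizes.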
5. Low-Rank, Orthogonal Embedding via OLÉ Loss
The OLÉ framework (Lezama et al., 2017) constitutes a plug-and-play geometry-weighted loss targeting deep metric learning objectives:
- Low-dimensional subspace collapse for intra-class compactness: Minimizing the nuclear norm of each class's feature block forces within-class samples into a linear subspace of minimal dimension.
- Orthogonality for inter-class margin: Maximizing the rank of the concatenated feature matrix aligns class subspaces orthogonally, which enforces maximal angular separation and margin between categories.
- Drop-in integration: OLÉ is appended after penultimate feature layers, with gradients computed through singular value decomposition. The loss is used jointly with cross-entropy and regularization; the main additional hyperparameter is the trade-off weight on the OLÉ term.
Empirical evaluation demonstrates steady improvements in classification accuracy, especially in low-data regimes, and robust novelty rejection owing to the explicit geometric structure in feature space.
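The OLÉ objective can be sketched directly from the two bullet points: per-class nuclear norms pull each class into a low-dimensional subspace, while subtracting the whole-matrix nuclear norm pushes class subspaces apart. The numpy version below is a sketch (the `delta` floor, which keeps features from collapsing to zero, and the exact normalization should be checked against the paper):

```python
import numpy as np

def ole_loss(features, labels, delta=1.0):
    """OLE-style objective (sketch): sum of per-class nuclear norms
    (intra-class low rank) minus the nuclear norm of the full feature
    matrix (inter-class orthogonality). Lower is better; zero when the
    class subspaces are mutually orthogonal."""
    nuc = lambda M: np.linalg.norm(M, ord='nuc')
    intra = sum(max(delta, nuc(features[labels == c]))
                for c in np.unique(labels))
    return intra - nuc(features)
```

For two classes lying along orthogonal directions the loss is zero, while two classes sharing a direction pay a positive penalty, which matches the margin interpretation above.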
6. Practical Implementation and Hyperparameter Tuning
Geometry-weighted losses require attention to implementation detail:
- Pair selection: For manifold embedding losses, draw pairs uniformly or from graph neighborhoods, compute geodesic distances and midpoints analytically or via solvers (Braunsmann et al., 2021, Braunsmann et al., 2022).
- Prototype injection: In contrastive learning, batch formation augments real samples with multiple copies of fixed prototypes. Hyperparameters include the number of prototypes and the choice of target Gram matrix (Gill et al., 2023).
- Weighting coefficients: For distortion/bending losses, tune the distortion and bending weights so that the two terms have balanced gradients. For geometric softmax, cost matrices and entropic regularization weights are selected according to downstream objectives and task semantics (Mensch et al., 2019).
- Computational cost: SVD-based OLÉ incurs 10–33% training overhead; geometric softmax may require quadratic programming or Frank–Wolfe optimization, but is tractable for moderate class/dimension counts.
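One way to operationalize the "balanced gradients" tuning rule above is to rescale the regularizer so its gradient norm matches the main term's. This heuristic is my own assumption-level recipe, not a procedure from the cited papers:

```python
import numpy as np

def balanced_weight(grad_main, grad_reg, eps=1e-12):
    """Weight for a geometric regularizer chosen so that, at the current
    iterate, its gradient norm equals the main loss term's. A simple
    heuristic; in practice one would smooth this over training steps."""
    n_main = np.linalg.norm(grad_main)
    n_reg = np.linalg.norm(grad_reg)
    return n_main / n_reg if n_reg > eps else 1.0
```

The total gradient then becomes `grad_main + balanced_weight(...) * grad_reg`, keeping neither term dominant early in training.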
7. Impact and Theoretical Guarantees
Geometry-weighted losses are supported by rigorous convergence and optimality results:
- Mosco convergence: Sampling-based losses for autoencoder embeddings converge to local geometric energies under mild regularity and boundedness conditions (Braunsmann et al., 2022).
- Global minimizers: Prototype-steered SCL yields unique minimizers matching the target frame in the unconstrained feature model (Gill et al., 2023).
- Convexity and differentiability: Geometric Fenchel–Young losses are strictly convex and admit efficient gradient computation, ensuring stable optimization (Mensch et al., 2019).
- Empirical generalization: Across vision and distributional learning benchmarks, geometry-weighted approaches demonstrably improve accuracy, calibration, robustness to data scarcity, and representation regularity (Scott et al., 2021, Lezama et al., 2017).
Geometry-weighted embedding losses thus provide a principled and flexible paradigm for sculpting latent representations to match data-intrinsic or task-induced geometric invariants, with stable optimization, clear theoretical grounding, and measurable empirical benefit.