
Hyperbolic Sentence Embeddings

Updated 7 January 2026
  • Hyperbolic sentence embeddings are geometric representations that encode hierarchical relationships using manifolds of constant negative curvature.
  • They utilize operations like Möbius addition and centroid algorithms to preserve syntactic order and compositionality in language.
  • Empirical evaluations show these embeddings enhance classification and entailment accuracy on tasks with inherent hierarchical structure.

Hyperbolic sentence embeddings are geometric representations of sentences embedded in manifolds of constant negative curvature, most commonly the Poincaré ball or the Lorentz hyperboloid. Because volume grows exponentially with radius in hyperbolic space, such embeddings carry an inductive bias toward latent hierarchies in language, offering a distinct alternative to flat Euclidean representations. Key developments span theoretical underpinnings, manifold-based neural operations, centroid algorithms, training objectives, and empirical evaluation across classification and entailment tasks. Hyperbolic sentence embeddings have demonstrated particular utility in applications with hierarchical structure or entailment-style semantics.

1. Hyperbolic Manifold Models and Operations

Hyperbolic embedding models begin with the selection of a target manifold. The $d$-dimensional Poincaré ball, $B^d = \{x \in \mathbb{R}^d : \|x\| < 1\}$, carries the metric tensor $g_x = \left(\frac{2}{1 - \|x\|^2}\right)^2 g^E$ and geodesic distance $d(u, v) = \operatorname{arccosh}\left(1 + 2\frac{\|u - v\|^2}{(1 - \|u\|^2)(1 - \|v\|^2)}\right)$ (Petrovski, 2024, Gerek et al., 2022, Dhingra et al., 2018). The Lorentz hyperboloid is defined as $\mathcal{H}^d = \{X \in \mathbb{R}^{d+1} : \langle X, X \rangle_L = -1,\ X_0 > 0\}$, with distance $d_\mathcal{H}(X, Y) = \operatorname{arccosh}\left(-\langle X, Y \rangle_L\right)$ (Gerek et al., 2022, Patil et al., 25 May 2025).
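
As a concrete reference, the two distance functions above can be sketched in NumPy. This is a minimal illustration, not code from any of the cited papers; `lift_to_hyperboloid` is a helper name chosen here, and it maps an arbitrary Euclidean vector onto the hyperboloid (it is not the Poincaré-to-Lorentz isometry):

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball B^d (curvature -1)."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

def lorentz_dist(X, Y):
    """Geodesic distance on the hyperboloid H^d; X, Y are (d+1)-vectors."""
    # Lorentzian inner product: -X_0 * Y_0 + <X_rest, Y_rest>
    inner = -X[0] * Y[0] + np.dot(X[1:], Y[1:])
    return np.arccosh(max(-inner, 1.0))  # clamp for numerical safety

def lift_to_hyperboloid(x):
    """Place a Euclidean point x in R^d onto H^d via X_0 = sqrt(1 + ||x||^2)."""
    return np.concatenate(([np.sqrt(1.0 + np.sum(x ** 2))], x))
```

Both distances are symmetric and vanish only when the arguments coincide, which the clamp in `lorentz_dist` preserves under floating-point error.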

Central operations—Möbius addition and Möbius scalar multiplication in the Poincaré model, and the Lorentzian sum in the hyperboloid—replace Euclidean vector arithmetic. Möbius addition is $u \oplus_M v = \frac{(1 + 2\langle u, v \rangle + \|v\|^2)u + (1 - \|u\|^2)v}{1 + 2\langle u, v \rangle + \|u\|^2\|v\|^2}$, and scalar multiplication is $r \otimes_M v = \tanh(r\,\tanh^{-1}\|v\|)\, v/\|v\|$ (Petrovski, 2024). These operations are neither commutative nor associative, inherently encoding sensitivity to word order and tree structure.
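
A minimal NumPy sketch of these two gyrovector operations (function names are ours, not from the cited work):

```python
import numpy as np

def mobius_add(u, v):
    """Mobius addition u (+)_M v in the Poincare ball (curvature -1)."""
    uv, u2, v2 = np.dot(u, v), np.sum(u ** 2), np.sum(v ** 2)
    num = (1 + 2 * uv + v2) * u + (1 - u2) * v
    return num / (1 + 2 * uv + u2 * v2)

def mobius_scalar(r, v, eps=1e-9):
    """Mobius scalar multiplication r (x)_M v."""
    n = np.linalg.norm(v)
    if n < eps:
        return np.zeros_like(v)
    return np.tanh(r * np.arctanh(min(n, 1 - 1e-9))) * v / n
```

Evaluating `mobius_add(u, v)` and `mobius_add(v, u)` on generic inputs gives different points, demonstrating the non-commutativity the text describes; the origin acts as the identity, and results stay inside the unit ball.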

2. Centroid Algorithms and Sentence-Level Composition

Standard Euclidean averaging is geometrically inconsistent under negative curvature; sentence-level pooling in hyperbolic space thus leverages centroids:

  • The Riemannian Fréchet mean minimizes the sum of squared geodesic distances, $c^* = \arg\min_{c}\sum_{i=1}^n d(c, x_i)^2$ (Gerek et al., 2022), computed by gradient descent on the manifold: $c_{t+1} = \exp_{c_t}\!\left(\eta\,\frac{1}{n}\sum_{i=1}^n \log_{c_t}(x_i)\right)$ in the Lorentz model (the mean of the logarithmic maps is the descent direction), or $c_{t+1} = c_t \oplus \left(\sum_i(-c_t \oplus x_i) \otimes \frac{\eta}{n}\right)$ in the Poincaré model (Gerek et al., 2022).
  • The Einstein midpoint, closed-form for $n = 2$, generalizes via recursive mass-weighted averaging on the Lorentz manifold; normalization ensures the result remains on the manifold (Gerek et al., 2022).
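
The Poincaré iteration above can be sketched as follows. As a numerical-safety assumption of ours, the gyro-translated points are averaged before Möbius scaling (rather than summed and then scaled by $\eta/n$), which keeps the scaling argument strictly inside the unit ball:

```python
import numpy as np

def mobius_add(u, v):
    """Mobius addition in the Poincare ball."""
    uv, u2, v2 = np.dot(u, v), np.sum(u ** 2), np.sum(v ** 2)
    num = (1 + 2 * uv + v2) * u + (1 - u2) * v
    return num / (1 + 2 * uv + u2 * v2)

def mobius_scalar(r, v, eps=1e-9):
    """Mobius scalar multiplication r (x)_M v."""
    n = np.linalg.norm(v)
    if n < eps:
        return np.zeros_like(v)
    return np.tanh(r * np.arctanh(min(n, 1 - 1e-9))) * v / n

def frechet_mean_poincare(points, eta=0.3, iters=50):
    """Iterative Frechet-mean sketch in the Poincare ball via gyro-updates."""
    c = np.zeros_like(points[0])  # start at the origin
    for _ in range(iters):
        # translate all points into c's gyro-frame and average them
        step = np.mean([mobius_add(-c, x) for x in points], axis=0)
        c = mobius_add(c, mobius_scalar(eta, step))
    return c
```

The fixed point of the update is reached when the averaged gyro-translated points vanish, i.e. when `c` balances the input set; for a single input the iteration converges to that point, and for a symmetric set it stays at the origin.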

Alternatively, recursive composition via Möbius addition along syntactic parse trees yields sentence representations that encode both hierarchy and constituency (Petrovski, 2024). Möbius averaging and binary-tree algorithms provide single-pass computational alternatives.
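
The simplest order-sensitive composition, a left-to-right Möbius summation over a sentence's word embeddings, might look like this (a sketch of ours, not the cited parse-tree algorithm):

```python
import numpy as np

def mobius_add(u, v):
    """Mobius addition in the Poincare ball."""
    uv, u2, v2 = np.dot(u, v), np.sum(u ** 2), np.sum(v ** 2)
    num = (1 + 2 * uv + v2) * u + (1 - u2) * v
    return num / (1 + 2 * uv + u2 * v2)

def mobius_sentence_embedding(word_vecs):
    """Fold a word-vector sequence left to right with Mobius addition."""
    out = np.zeros_like(word_vecs[0])  # origin is the additive identity
    for w in word_vecs:
        out = mobius_add(out, w)
    return out
```

Because Möbius addition is non-commutative, reversing the word order generally changes the resulting sentence embedding, unlike Euclidean averaging.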

3. Hyperbolic Neural Network Layers and Sentence Encoding

Hyperbolic neural architectures generalize fundamental subnetworks:

  • Feed-forward and recurrent networks: Linear maps are lifted to the manifold by conjugating with the logarithmic and exponential maps at the origin, $M \otimes_c x = \exp_0^c(M \log_0^c(x))$, with pointwise nonlinearities $\phi^{\otimes_c}(x) = \exp_0^c(\phi(\log_0^c(x)))$ (Ganea et al., 2018). Biases are applied by Möbius translation.
  • Hyperbolic recurrent units: A gated recurrent unit (GRU) cell with Möbius versions of the gating, candidate, and update steps respects manifold constraints throughout (Ganea et al., 2018). Sentence encoding typically begins by mapping word vectors to the ball, then feeding the word sequence through a hyperbolic RNN/GRU, with the final hidden state serving as the sentence embedding.
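
A sketch of the origin exp/log maps and the resulting hyperbolic linear layer, with curvature magnitude `c` as in the formulas above (a minimal reading of those formulas, not the authors' reference implementation):

```python
import numpy as np

def exp0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    n = np.linalg.norm(v)
    if n < eps:
        return np.zeros_like(v)
    sc = np.sqrt(c)
    return np.tanh(sc * n) * v / (sc * n)

def log0(x, c=1.0, eps=1e-9):
    """Logarithmic map at the origin (inverse of exp0)."""
    n = np.linalg.norm(x)
    if n < eps:
        return np.zeros_like(x)
    sc = np.sqrt(c)
    return np.arctanh(min(sc * n, 1 - eps)) * x / (sc * n)

def mobius_matvec(M, x, c=1.0):
    """Hyperbolic linear map: M (x)_c x = exp0(M @ log0(x))."""
    return exp0(M @ log0(x, c), c)

def hyp_nonlin(x, phi=np.tanh, c=1.0):
    """Pointwise nonlinearity lifted to the manifold."""
    return exp0(phi(log0(x, c)), c)
```

The identity matrix acts as the identity map, and outputs always land back inside the ball, so these layers can be stacked without leaving the manifold.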

Hierarchical Mamba (HiM) uses state-space sequence models with manifold-projected outputs and a learnable curvature parameter $c$; Poincaré and Lorentz projections are applied after normalization and scaling (Patil et al., 25 May 2025). Combined with mean pooling of hidden states, this yields robust, hierarchy-preserving embeddings.

4. Loss Functions, Optimization, and Hierarchical Induction

Losses are rooted in manifold distances and hierarchical relationships:

  • Margin-based losses: Centripetal loss enforces hierarchical structure: $\mathcal{L}_{\mathrm{centri}} = \sum_{(e, e^+, e^-)} \max(d_c(e^+, 0) - d_c(e, 0) + \beta, 0)$ (Patil et al., 25 May 2025); a clustering loss tightens sibling clusters.
  • Binary/ternary entailment: The pairwise energy $E(u, v) = \beta\, d(u, v) + (1 - \beta)\max(0, \|v\| - \|u\|)$ leverages both proximity and radial order (Petrovski, 2024). Cross-entropy over hyperbolic multinomial logistic regression is often adopted for classification (Ganea et al., 2018, Patil et al., 25 May 2025).
  • Optimization: Euclidean parameters are updated with Adam/AdamW; hyperbolic parameters use Riemannian SGD, which rescales Euclidean gradients by the inverse metric, applies a retraction (typically the exponential map), and projects back into the ball if needed (Ganea et al., 2018, Petrovski, 2024, Gerek et al., 2022).
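
One Riemannian SGD step on a Poincaré-ball parameter might be sketched as below; as a simplification of ours, a first-order retraction (a plain Euclidean step followed by projection) stands in for the full exponential map:

```python
import numpy as np

def rsgd_step_poincare(x, euc_grad, lr=0.01, eps=1e-5):
    """One Riemannian SGD step on a Poincare-ball parameter: rescale the
    Euclidean gradient by the inverse metric, take a retraction step,
    and project back inside the unit ball if needed."""
    scale = (1.0 - np.sum(x ** 2)) ** 2 / 4.0  # inverse metric factor
    x_new = x - lr * scale * euc_grad          # first-order retraction
    norm = np.linalg.norm(x_new)
    if norm >= 1.0:                            # clamp to stay in the ball
        x_new = x_new / norm * (1.0 - eps)
    return x_new
```

The metric factor $(1 - \|x\|^2)^2 / 4$ shrinks toward zero near the boundary, so effective step sizes decay exactly where the geometry becomes extreme; the final projection guards against the rare overshoot.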

Empirical hierarchy induction on Penn Treebank parses shows a strong correlation ($r = 0.671$) between embedding norm and tree height, confirming that hierarchical depth is encoded by radial distance from the origin (Dhingra et al., 2018).

5. Empirical Performance and Task-Specific Evaluation

Extensive experiments confirm domain-specific advantages:

  • Entailment and hierarchy tasks: Hyperbolic embeddings outperform Euclidean baselines on SICK textual entailment ($86.8\%$ binary, $81.2\%$ 3-way) and learn partial orders rapidly in toy tasks; on SNLI, performance is on par with or slightly behind Order Embeddings (Petrovski, 2024).
  • Classification benchmarks: Both Riemannian Fréchet mean and Einstein midpoint centroids improve k-NN and SVM classification accuracy by 0.5–1.0% over Euclidean composition in text classification (20News, Turkish corpora) (Gerek et al., 2022).
  • Long-sequence reasoning and multi-hop inference: HiM (Hierarchical Mamba) yields stable and high F1 on deeply hierarchical ontologies (WordNet, SNOMED-CT, DOID, FoodOn), with HiM-Lorentz offering lower variance and robustness, and HiM-Poincaré capturing fine-grained distinctions (Patil et al., 25 May 2025).
  • Downstream generalization: Held-out perplexity and MPQA polarity tasks show small but persistent gains for hyperbolic models; other semantic tasks register mixed results, with clear advantages for tasks inherently hierarchical or entailment-based (Dhingra et al., 2018).

Performance Table (selected binary entailment):

Model                                     SNLI-Binary   SICK-Binary
Euclidean Averaging + FFNN                83.7%         85.6%
LSTM + FFNN                               83.2%         75.5%
Möbius Summation + FFNN                   82.8%         86.8%
Möbius Summation + FFNN ($c = 0.03$)      85.5%         86.7%
Order Embeddings                          88.3%         85.2%

6. Strengths, Limitations, and Task Alignment

Hyperbolic embeddings deliver substantial representational benefits:

  • Strengths: Exponential volume growth admits near-isometric tree embeddings, and radial ordering naturally encodes specificity versus generality, with norm corresponding to abstraction level (Petrovski, 2024, Dhingra et al., 2018, Gerek et al., 2022). Non-commutative, non-associative composition matches linguistic constituency.
  • Limitations: Gains are inconsistent in purely semantic or similarity tasks; continuous latent hierarchies are difficult to inspect compared to explicit graph-structured models; performance critically depends on task alignment with hierarchical or entailment structure (Dhingra et al., 2018).
  • All hyperbolic centroid schemes robustly outperform naïve Euclidean averaging on classification and ranking (Gerek et al., 2022).

A plausible implication is that hyperbolic embeddings should be considered preferentially for applications with strong hierarchical or entailment relations, but may provide limited advantage for flat or similarity-focused tasks.

7. Practical Considerations and Future Directions

Implementations must carefully manage manifold boundary conditions (clamping in Poincaré, normalization in Lorentz), and select centroid computation per cost/accuracy tradeoffs (Gerek et al., 2022). Recent advances in learnable curvature (HiM), linear-time state-space models, and hybrid losses have substantially narrowed the accuracy gap with more mature Euclidean and order-based methods (Patil et al., 25 May 2025). Future work will likely further integrate hyperbolic layers in deep architectures, expand unsupervised hierarchical induction, and refine optimization for large-scale, high-dimensional language data.

Hyperbolic sentence embeddings thus provide a principled, theoretically grounded pathway to harnessing the hierarchical nature of text in modern language understanding pipelines, with demonstrable efficacy in entailment, classification, and transitive reasoning under manifold constraints.
