
Siegel Neural Networks

Updated 14 November 2025
  • Siegel neural networks are discriminative architectures defined on Siegel spaces, which generalize SPD matrices and complex-hyperbolic geometry.
  • They employ novel formulations for multiclass logistic regression and fully-connected layers with Riemannian optimization, achieving state-of-the-art performance on radar clutter and node classification tasks.
  • The design leverages closed-form layer constructions and group symmetries, but faces challenges in parameter efficiency and computational overhead.

Siegel neural networks are a class of discriminative architectures defined over Siegel spaces: Riemannian symmetric spaces (RSS) generalizing both symmetric positive definite (SPD) matrices and complex-hyperbolic geometry. By leveraging the quotient structure and symmetries of Siegel upper half-spaces $\mathbb{SH}_m$, these networks enable learning and classification with data that naturally reside on disconnected or highly curved geometric domains. They introduce new formulations for multiclass logistic regression (MLR) and fully-connected (FC) layers, allowing end-to-end training with Riemannian optimization tools, and achieve state-of-the-art performance on radar clutter classification and node classification tasks.

1. Geometric Foundation: The Siegel Upper Half-Space

The Siegel upper half-space $\mathbb{SH}_m$ of rank $m$ is defined as

$\mathbb{SH}_m = \left\{\, x = u + iv \;\middle|\; u \in \mathrm{Sym}_m,\ v \in \mathrm{Sym}_m^+ \right\},$

where $\mathrm{Sym}_m$ denotes the $m \times m$ real symmetric matrices and $\mathrm{Sym}_m^+$ the symmetric positive definite matrices of the same size.
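As a quick concrete check of this definition, the following NumPy sketch samples a point of $\mathbb{SH}_m$ and verifies membership (the function names are illustrative, not from the paper):

```python
import numpy as np

def random_siegel_point(m, rng):
    """Sample x = u + iv in SH_m: u symmetric, v SPD."""
    a = rng.standard_normal((m, m))
    u = (a + a.T) / 2.0                 # symmetric real part
    b = rng.standard_normal((m, m))
    v = b @ b.T + m * np.eye(m)         # SPD imaginary part
    return u + 1j * v

def in_siegel(x, tol=1e-10):
    """Check that Re(x) is symmetric and Im(x) is SPD."""
    u, v = x.real, x.imag
    sym = np.allclose(u, u.T, atol=tol) and np.allclose(v, v.T, atol=tol)
    spd = np.all(np.linalg.eigvalsh((v + v.T) / 2) > 0)
    return sym and spd

rng = np.random.default_rng(0)
x = random_siegel_point(3, rng)
print(in_siegel(x))  # True
```

Note that the complex conjugate $\bar{x} = u - iv$ fails the test, since $-v$ is negative definite.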

Siegel spaces possess a transitive isometric action by the real symplectic group

$\mathrm{Sp}_{2m} = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : ab^T = ba^T,\ cd^T = dc^T,\ ad^T - bc^T = I_m \right\},$

through generalized Möbius transformations:

$s = \begin{pmatrix} a & b \\ c & d \end{pmatrix} : \quad x \mapsto (a x + b)(c x + d)^{-1}.$

The stabilizer of $iI_m$ is $\mathrm{SpO}_{2m} = \mathrm{Sp}_{2m} \cap O_{2m}$, making the symmetric-space realization explicit: $\mathbb{SH}_m \cong \mathrm{Sp}_{2m} / \mathrm{SpO}_{2m}$, a space of rank $m$ with nonpositive sectional curvature.
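The group action can be checked numerically. The sketch below applies the symplectic "inversion" $J = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$, whose Möbius action is $x \mapsto -x^{-1}$, and confirms the image stays in the Siegel space (a minimal illustration, not the paper's code):

```python
import numpy as np

def mobius(s, x):
    """Apply a symplectic block matrix s = [[a, b], [c, d]] to x."""
    m = x.shape[0]
    a, b = s[:m, :m], s[:m, m:]
    c, d = s[m:, :m], s[m:, m:]
    return (a @ x + b) @ np.linalg.inv(c @ x + d)

def in_siegel(x, tol=1e-8):
    u, v = x.real, x.imag
    return (np.allclose(u, u.T, atol=tol)
            and np.allclose(v, v.T, atol=tol)
            and np.all(np.linalg.eigvalsh((v + v.T) / 2) > 0))

rng = np.random.default_rng(1)
m = 3
a0 = rng.standard_normal((m, m)); u = (a0 + a0.T) / 2
b0 = rng.standard_normal((m, m)); v = b0 @ b0.T + np.eye(m)
x = u + 1j * v                      # a point in SH_3

# Symplectic inversion J: its Mobius action sends x to -x^{-1}.
J = np.block([[np.zeros((m, m)), -np.eye(m)],
              [np.eye(m), np.zeros((m, m))]])
y = mobius(J, x)
print(in_siegel(y))  # True
```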

A canonical $\mathrm{Sp}_{2m}$-invariant metric on $\mathbb{SH}_m$ is the Siegel metric

$ds^2 = \operatorname{tr}\!\left(v^{-1}\, dx \; v^{-1}\, d\bar{x}\right).$

For $m = 1$, this reduces to the Poincaré upper half-plane metric $ds^2 = (du^2 + dv^2)/v^2$.

On any noncompact RSS, one defines a vector-valued (Weyl-chamber-valued) distance between two points as the congruence-invariant translation, under the isometry group, in a fixed maximal flat. This metric structure underpins the network constructions.
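Assuming the classical Siegel metric $ds^2 = \operatorname{tr}(v^{-1}\, dx\, v^{-1}\, d\bar{x})$, one can verify numerically that for $m = 1$ it reduces to the Poincaré half-plane metric:

```python
import numpy as np

def siegel_ds2(x, dx):
    """Quadratic form ds^2 = tr(v^{-1} dx v^{-1} conj(dx)) at x = u + iv."""
    v = x.imag
    vinv = np.linalg.inv(v)
    return np.trace(vinv @ dx @ vinv @ dx.conj()).real

# For m = 1 this should equal the half-plane metric (du^2 + dv^2) / v^2.
u, v = 0.3, 2.0
du, dv = 0.1, -0.2
x = np.array([[u + 1j * v]])
dx = np.array([[du + 1j * dv]])
lhs = siegel_ds2(x, dx)
rhs = (du**2 + dv**2) / v**2
print(abs(lhs - rhs) < 1e-12)  # True
```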

2. Layer Construction on $\mathbb{SH}_m$

2.1 Multiclass Logistic Regression (MLR)

In Euclidean settings, MLR relies on linear scoring: $p(y = k \mid x) \propto \exp\left(\langle a_k, x \rangle + b_k\right)$. This score is interpreted as proportional to the exponential of the signed distance from $x$ to a class hyperplane, scaled by $\lVert a_k \rVert$.

For $\mathbb{SH}_m$, two MLR constructions are defined:

(i) Quotient-Structure MLR (QMLR)

A class hyperplane is parameterized by a pair of points in $\mathbb{SH}_m$, so each class carries two manifold-valued parameters. Theorem 2.1 expresses the signed distance from an input $x$ to this hyperplane in closed form via the quotient structure of $\mathbb{SH}_m$. Class scores scale these signed distances, and class probabilities follow from a softmax over the scores.

(ii) Vector-Valued-Distance MLR (VMLR)

Fix a direction $\lambda$ in a model Weyl chamber and a basepoint in $\mathbb{SH}_m$. A score is built from the vector-valued distance between the input and the basepoint, paired with the chosen direction; Proposition 2.7 supplies an upper bound on the Riemannian distance in these terms, and the class scores are defined accordingly. Class probabilities are again given by a softmax over the scores.

In both cases, the model is trained with the cross-entropy loss

$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log p\,(y_i \mid x_i).$
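Both MLR variants feed their scores into a standard softmax with cross-entropy. A minimal NumPy sketch (the scores below are hypothetical stand-ins for the Siegel signed distances):

```python
import numpy as np

def softmax_xent(scores, y):
    """Mean cross-entropy -log p(y_i | x_i) from per-class scores.

    scores: (N, K) signed-distance class scores; y: (N,) integer labels.
    """
    z = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

# Hypothetical scores for 4 inputs and 3 classes.
scores = np.array([[2.0, -1.0,  0.5],
                   [0.1,  3.0, -2.0],
                   [-1.0, 0.0,  1.5],
                   [0.2,  0.2,  0.2]])
y = np.array([0, 1, 2, 0])
print(softmax_xent(scores, y) > 0.0)  # True: cross-entropy is positive here
```

With uniform scores over $K$ classes the per-sample loss is exactly $\log K$, which gives a quick correctness check.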

2.2 Fully-Connected (FC) Layers

Two FC designs are given for $\mathbb{SH}_m$:

(i) Affine via Group Action (AFC)

The layer weights form a symplectic matrix $s = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{Sp}_{2m}$, and the layer applies the corresponding Möbius transformation:

$x \mapsto (a x + b)(c x + d)^{-1},$

which maps $\mathbb{SH}_m$ to itself.

(ii) Dimensionality-Reducing FC (DFC)

Let $W \in \mathrm{St}(m, n)$ be a point on the Stiefel manifold of orthonormal $n$-frames in $\mathbb{R}^m$, with $n < m$; the layer maps $x \mapsto W^{T} x\, W$, sending $\mathbb{SH}_m$ to $\mathbb{SH}_n$: the real part stays symmetric and the imaginary part stays SPD.

Pointwise nonlinearities, such as an SPD-valued ReLU on the imaginary part, follow these mappings.
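One natural reading of the dimensionality-reducing map is congruence with the Stiefel weight, $x \mapsto W^{T} x W$; treating this form as an assumption, the sketch below checks that it sends $\mathbb{SH}_5$ to $\mathbb{SH}_2$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2

# A point x = u + iv in SH_5.
a = rng.standard_normal((m, m)); u = (a + a.T) / 2
b = rng.standard_normal((m, m)); v = b @ b.T + np.eye(m)
x = u + 1j * v

# W on the Stiefel manifold St(m, n): orthonormal columns via QR.
W, _ = np.linalg.qr(rng.standard_normal((m, n)))

y = W.T @ x @ W            # candidate DFC map: SH_5 -> SH_2
u2, v2 = y.real, y.imag
print(np.allclose(u2, u2.T), np.all(np.linalg.eigvalsh(v2) > 0))  # True True
```

The imaginary part $W^{T} v W$ is SPD because $W$ has full column rank, so the image is a valid Siegel point of lower dimension.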

3. Training Procedures and Riemannian Optimization

3.1 Riemannian Backpropagation

Parameters may reside in vector spaces or on manifolds:

  • For SPD-valued parameters $v \in \mathrm{Sym}_m^+$: project the Euclidean gradient onto the tangent space $\mathrm{Sym}_m$ and update via the exponential retraction

$v \leftarrow v^{1/2} \exp\!\left(-\eta\, v^{-1/2}\, \xi\, v^{-1/2}\right) v^{1/2},$

where $\xi$ denotes the symmetric Riemannian gradient and $\eta$ the learning rate.

  • For Stiefel-valued parameters $W$: take a gradient step in the tangent space, then re-orthonormalize via QR.
  • For $\mathbb{SH}_m$-valued parameters: compute a tangent gradient (via Jacobians of the distance) and retract to the manifold (via the group action or a geodesic).

Standard Riemannian optimizers such as Riemannian SGD or Riemannian Adam (e.g., from Geoopt) apply directly with conventional hyperparameters.
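A plain-NumPy sketch of the SPD update, assuming the affine-invariant exponential retraction shown above (the retraction actually used may differ):

```python
import numpy as np

def spd_funm(v, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, Q = np.linalg.eigh((v + v.T) / 2)
    return (Q * f(w)) @ Q.T

def spd_retract(v, xi, lr):
    """Exponential-map update exp_v(-lr * xi) for the affine-invariant metric."""
    s = spd_funm(v, np.sqrt)                            # v^{1/2}
    sinv = spd_funm(v, lambda w: 1.0 / np.sqrt(w))      # v^{-1/2}
    inner = spd_funm(sinv @ (-lr * xi) @ sinv, np.exp)  # matrix exponential
    return s @ inner @ s

rng = np.random.default_rng(3)
m = 4
a = rng.standard_normal((m, m))
v = a @ a.T + np.eye(m)                  # SPD starting parameter
g = rng.standard_normal((m, m))
g = (g + g.T) / 2                        # symmetric (tangent) gradient
v_new = spd_retract(v, g, lr=0.1)
print(np.all(np.linalg.eigvalsh(v_new) > 0))  # stays SPD: True
```

The update is a congruence of a positive definite matrix by an invertible one, so positive definiteness is preserved for any step size.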

3.2 Regularization and Projection

No additional regularization is required beyond maintaining parameter feasibility via manifold-valued retractions. Optional spectral- or Frobenius-norm penalties on tangent-space parameters can control model complexity.

4. Empirical Performance and Evaluation

4.1 Applications and Experimental Setup

Radar clutter classification: Uses simulated autoregressive (AR) Gaussian time series of fixed order, summarized as points in a Siegel feature space. Four datasets with parameter pairs (3, 2), (4, 2), (5, 2), (6, 2) and varying sample sizes. Network: one FC layer (AFC or DFC), followed by QMLR. Training: Riemannian Adam, learning rate 1e-3, batch size 32, 80 epochs.
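The AR time-series inputs can be simulated in a few lines of NumPy; the order and coefficients below are illustrative, not the experimental settings:

```python
import numpy as np

def simulate_ar(coeffs, n, rng):
    """Simulate a zero-mean Gaussian AR(p) series x_t = sum_k a_k x_{t-k} + e_t."""
    p = len(coeffs)
    x = np.zeros(n + p)                   # zero-padded warm-up of length p
    for t in range(p, n + p):
        # x[t-p:t][::-1] lists x_{t-1}, ..., x_{t-p} to align with a_1, ..., a_p.
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + rng.standard_normal()
    return x[p:]

rng = np.random.default_rng(4)
series = simulate_ar([0.5, -0.3], n=512, rng=rng)   # a stable AR(2) process
print(series.shape)  # (512,)
```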

Node classification: Datasets (Glass, Iris, Zoo from UCI) are small graphs. All-pairs "ground-truth" cosine distances are embedded into $\mathbb{SH}_m$ by minimizing a stress objective that matches pairwise Siegel distances to the target distances.

Network: AFC followed by QMLR or VMLR. Training: Riemannian Adam, learning rate 1e-3, 100 epochs.
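As an illustration of this kind of distance-matching (stress) embedding, the sketch below runs gradient descent on a stress objective in Euclidean space; the paper embeds into $\mathbb{SH}_m$, and a flat target space is used here only to keep the example short:

```python
import numpy as np

def stress(Z, D):
    """Half the sum of squared gaps between embedded and target distances."""
    G = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return ((G - D) ** 2).sum() / 2.0

rng = np.random.default_rng(5)
n, d = 8, 2
P = rng.standard_normal((n, d))                        # hidden configuration
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)   # target distances

Z = 0.1 * rng.standard_normal((n, d))                  # embedding to optimize
s0 = stress(Z, D)
lr, eps = 0.01, 1e-9
for _ in range(500):
    diff = Z[:, None, :] - Z[None, :, :]
    G = np.linalg.norm(diff, axis=-1)
    coef = (G - D) / (G + eps)                         # eps guards the diagonal
    np.fill_diagonal(coef, 0.0)
    grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1) # d stress / d Z
    Z -= lr * grad
print(stress(Z, D) < s0)  # stress decreased
```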

4.2 Key Quantitative Results

Table 1. Radar Clutter Classification (mean ± std over 10 runs)

Method                       Dataset 1   Dataset 2   Dataset 3   Dataset 4
kNN (Kähler dist.)             76.22       93.00       76.75       73.20
SPDNet [17]                    63.44       41.50       45.88       66.80
SiegelNet–AFC–QMLR (Ours)      80.94       96.50       91.00       85.60

Table 2. Node Classification

Method                       Glass   Iris    Zoo
kNN                          29.65   31.66   33.33
LogEig [21]                  41.54   34.33   51.04
SiegelNet–BFC–BMLR [25]      41.12   37.26   48.12
SiegelNet–AFC–QMLR (Ours)    45.79   38.20   53.37

Siegel neural networks demonstrate superior performance across all datasets compared to SPD-based and kNN baselines.

5. Analysis, Limitations, and Prospects

5.1 Advantages

  • Expressivity: Siegel spaces naturally generalize SPD and complex-hyperbolic settings, enabling the representation of intricate correlations and dependencies.
  • Closed-form FC layers: The symplectic group action allows explicit formulae for affine mappings within the space.
  • Empirical results: State-of-the-art accuracy on radar signal and node classification benchmarks.

5.2 Limitations

  • Parameter efficiency: QMLR structure requires two points per class, effectively doubling the parameter count relative to Euclidean and SPD analogues.
  • Computational overhead: Riemannian distance calculations involve eigen-decompositions and matrix logarithms. Retractions and Cayley transforms further increase computation.
  • Curvature restriction: Only nonpositive curvature is supported; thus, structures with intrinsic positive curvature are not accommodated.
  • Architectural scope: Convolutional, batch-normalization, pooling, and attention layers on $\mathbb{SH}_m$ have not been developed.

5.3 Potential Extensions

  • Compact MLR: Design of more parameter-efficient Siegel hyperplane representations.
  • Convolutional layers: Definition of local Siegel-valued filters via horospheres or $\mathrm{Sp}_{2m}$-equivariant constructions.
  • Horospherical nonlinearities: Proposals to mimic ReLU via projections onto convex Weyl chambers.
  • Generative models: Development of Riemannian normalizing flows on $\mathbb{SH}_m$.
  • Hybrid manifolds: Integration of Siegel spaces with other curvature components in product manifold networks.

Siegel neural networks formalize geometric deep learning within a rich family of symmetric spaces, providing theoretical and practical advances for data with complex intrinsic geometry.
