SIREN: Sinusoidal Representation Networks
- SIREN architectures are multilayer perceptrons that use sine activations to represent continuous signals, enabling accurate modeling of images, audio, and physical fields.
- The design leverages principled weight initialization and frequency scaling to stabilize gradient propagation and preserve both low- and high-frequency details.
- Variants like FM-SIREN and SIREN² enhance frequency diversity and adaptability, achieving improved PSNR, noise robustness, and efficient parameter utilization.
A SIREN (Sinusoidal Representation Network) is a multilayer perceptron (MLP) that uses the sinusoidal activation function in each hidden layer to parameterize continuous, differentiable implicit neural representations (INRs) of signals such as images, audio, 3D geometry, and physical fields. SIREN architectures are defined by their characteristic use of sine activation, principled initialization to maintain signal propagation and frequency fidelity, and the ability to directly model signal derivatives necessary for representing physical PDEs. Subsequent advances have introduced neuron-specific frequency multipliers (e.g., FM-SIREN), adaptive weight initialization strategies (WINNER/SIREN²), and specialized variants for diverse tasks such as noise-robust INR, high-fidelity audio compression, and multi-modal registration.
1. Mathematical Definition and Architectural Principles
A canonical SIREN consists of a fully connected stack of layers, where each hidden layer applies an affine transformation followed by the elementwise sine nonlinearity. For input $x \in \mathbb{R}^{d_{\mathrm{in}}}$, parameters $\theta = \{W_i, b_i\}_{i=0}^{L}$, and layer input/output dimensions $n_i, m_i$, the mapping is:

$$\Phi(x) = W_L\,(\phi_{L-1} \circ \cdots \circ \phi_0)(x) + b_L,$$

where each hidden layer applies

$$\phi_i(h_i) = \sin(W_i h_i + b_i), \qquad W_i \in \mathbb{R}^{m_i \times n_i},\ b_i \in \mathbb{R}^{m_i}.$$

In practice, SIREN uses a frequency scaling hyperparameter $\omega_0$ (typically $\omega_0 = 30$) for the first layer only:

$$\phi_0(x) = \sin\!\big(\omega_0 (W_0 x + b_0)\big).$$
In all subsequent layers, $\omega_i = 1$ (unless using alternative variants). This configuration ensures that the network's post-activation distribution remains stable across layers and retains sensitivity to both low- and high-frequency signals (Sitzmann et al., 2020).
Weight initialization is critical for signal propagation: for layers $i > 0$, set $W_i \sim \mathcal{U}\!\big(-\sqrt{6/n_i},\ \sqrt{6/n_i}\big)$, where $n_i$ is the layer fan-in; for the first layer, $W_0 \sim \mathcal{U}(-1/n_0,\ 1/n_0)$. This stabilizes the variance of pre-activations and ensures sinusoids do not saturate or become degenerate, preserving the network's representational bandwidth (Sitzmann et al., 2020).
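As a concrete sketch of the definitions above, a minimal numpy implementation of the forward pass and the principled initialization, assuming the first-layer-only $\omega_0$ convention described here:

```python
import numpy as np

def siren_init(fan_in, fan_out, first_layer, rng=None):
    """Principled SIREN uniform initialization (Sitzmann et al., 2020):
    first layer ~ U(-1/n, 1/n); deeper layers ~ U(-sqrt(6/n), sqrt(6/n))."""
    rng = rng or np.random.default_rng(0)
    bound = 1.0 / fan_in if first_layer else np.sqrt(6.0 / fan_in)
    W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    b = rng.uniform(-bound, bound, size=fan_out)
    return W, b

def siren_forward(x, layers, omega0=30.0):
    """Apply sin(omega * (W h + b)) per hidden layer, with omega = omega0
    on the first layer and omega = 1 afterwards; the final layer is affine."""
    h = x
    for i, (W, b) in enumerate(layers[:-1]):
        omega = omega0 if i == 0 else 1.0
        h = np.sin(omega * (W @ h + b))
    W, b = layers[-1]
    return W @ h + b
```

The output layer is left affine so the network range is not confined to $[-1, 1]$.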
2. Theoretical Rationale and Frequency Properties
The use of periodic activation is motivated by the superior capacity of $\sin(\cdot)$ to represent signals with rich local and global oscillatory content. Unlike ReLU or tanh nonlinearities that suppress or rapidly decay high-frequency components, the sine function admits nonzero derivatives of all orders and can represent arbitrarily oscillatory features.
Deeper SIRENs are theoretically analogous to deep Fourier synthesizers; stacking sine activations constructs increasingly complex superpositions of harmonics. Maintaining pre-activation variance at initialization (via principled weight scaling) is essential to preserve high-frequency expressivity and avoid frequency collapse through depth (Sitzmann et al., 2020).
Closed-form derivatives (available via automatic differentiation) make SIREN architectures directly suitable for learning solutions and derivatives of PDEs. For example, a single sine layer

$$\phi(x) = \sin\!\big(\omega_0 (W x + b)\big)$$

has Jacobian

$$\frac{\partial \phi}{\partial x} = \omega_0\, \mathrm{diag}\!\big(\cos(\omega_0 (W x + b))\big)\, W,$$

i.e., the derivative of a SIREN is itself a SIREN with phase-shifted activations, and higher-order derivatives follow recursively. This analytic differentiability underpins applications from signal fitting to PDE-solving (Sitzmann et al., 2020).
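The phase-shift property can be verified numerically for a scalar sine unit; a minimal check, not from the paper:

```python
import numpy as np

# Scalar sine unit phi(x) = sin(omega0 * (w x + b)); its derivative is
# phi'(x) = omega0 * w * cos(omega0 * (w x + b)) -- a phase-shifted sinusoid.
omega0, w, b = 30.0, 0.07, 0.2

def phi(x):
    return np.sin(omega0 * (w * x + b))

def dphi(x):
    return omega0 * w * np.cos(omega0 * (w * x + b))

x, eps = 0.3, 1e-6
fd = (phi(x + eps) - phi(x - eps)) / (2 * eps)  # central finite difference
assert abs(fd - dphi(x)) < 1e-4
```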
3. Variants: Frequency-Multiplied and Adaptive Initialization SIRENs
FM-SIREN
FM-SIREN addresses a core limitation of the original SIREN: feature redundancy induced by a layer-wide fixed frequency multiplier $\omega_0$, which leads to correlated, overlapping frequency responses across neurons. Drawing on Nyquist–Shannon sampling theory and the discrete sine transform (DST), FM-SIREN assigns to each neuron $k$ in a layer a neuron-specific multiplier $\omega_k$, with the $N$ multipliers chosen as DST frequencies determined by the number of neurons $N$ and the domain sampling rate $f_s$. The resulting set spans the representable band up to the Nyquist limit, eliminating redundancy and ensuring near-orthogonality across features (Alsakabi et al., 27 Sep 2025).
Empirically, this yields a 49–50% reduction in off-diagonal covariance of hidden features (quantified via the Frobenius norm of covariance matrices), outperforms baseline SIREN in PSNR and MSE across audio, image, 3D, and NeRF fitting, and does so at unchanged computational cost (Alsakabi et al., 27 Sep 2025). FM-SIREN matches or exceeds the performance of a deep (5-layer) SIREN with only 2 layers.
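The exact allocation rule is specified in the paper; the sketch below assumes a simple DST-style schedule — $N$ multipliers uniformly spaced up to the Nyquist angular frequency $\pi f_s$ — purely to illustrate the distinctness and band-spanning properties (the function and formula are illustrative, not the paper's):

```python
import numpy as np

def fm_multipliers(n_neurons, sample_rate):
    """Hypothetical DST-style allocation: N distinct frequency multipliers
    uniformly spaced up to the Nyquist angular frequency pi * f_s."""
    k = np.arange(1, n_neurons + 1)
    return k * np.pi * sample_rate / n_neurons  # spans (0, pi * f_s]

omegas = fm_multipliers(64, 100.0)
assert len(set(omegas.round(9))) == 64        # all distinct: no redundancy
assert np.isclose(omegas[-1], np.pi * 100.0)  # top neuron reaches Nyquist
```

The key design point carried over from the paper is that every neuron receives a distinct multiplier, so hidden features cover disjoint bands instead of sharing one $\omega_0$.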
WINNER (SIREN²)
SIREN exhibits spectral bias at initialization: pre-activations have low-frequency spectral support, impeding learning of signals with significant high-frequency content. When the target signal's spectral centroid lies outside the network's support, a "spectral bottleneck" arises: the network collapses to near-zero outputs for all frequencies.
WINNER (Weight Initialization with Noise for Neural Representations; SIREN²) injects adaptive Gaussian noise into the first two layers' weights, with noise magnitude determined by the spectral centroid $f_c$ of the target signal. This increases pre-activation variance and widens spectral support without increasing trainable parameters:

$$\widetilde{W}_i = W_i + \sigma_i\, \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, I), \quad i \in \{0, 1\},$$

with the noise scales $\sigma_i$ computed as functions of $f_c$.
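A hedged sketch of the mechanism: estimate the target's spectral centroid, then perturb only the first two weight matrices. The actual mapping from centroid to noise scales is derived in the paper; here the $\sigma_i$ arguments are placeholders:

```python
import numpy as np

def spectral_centroid(signal, fs):
    """Magnitude-weighted mean frequency of the target signal."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return (freqs * spec).sum() / spec.sum()

def winner_perturb(weights, sigmas, rng=None):
    """Add zero-mean Gaussian noise to the first two layers' weights only;
    the scales sigmas = (sigma_0, sigma_1) are illustrative placeholders."""
    rng = rng or np.random.default_rng(0)
    out = [W.copy() for W in weights]
    for i, sigma in zip(range(2), sigmas):
        out[i] += sigma * rng.standard_normal(out[i].shape)
    return out

# Pure 10 Hz tone sampled at 100 Hz -> centroid near 10 Hz:
t = np.arange(1000) / 100.0
c = spectral_centroid(np.sin(2 * np.pi * 10 * t), 100.0)
assert abs(c - 10.0) < 0.5
```

Layers beyond the first two keep their principled SIREN initialization untouched.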
Empirically, SIREN² achieves state-of-the-art PSNR in audio, image, and 3D fitting, outperforming both SIREN and parameter-intensive SIREN+RFF, and also enhances denoising in DIP/Noise2Self setups. The effect on spectral energy is accurately predicted by the accompanying analysis (Chandravamsi et al., 16 Sep 2025).
4. Key Applications and Performance
SIREN and its variants serve as implicit neural representations for a range of tasks:
- Signal Fitting: SIRENs and FM-SIRENs precisely reconstruct 1D audio, 2D images, and 3D shapes from coordinate–value pairs, achieving high PSNR and low MSE compared to non-periodic or PE-based MLPs (Sitzmann et al., 2020, Alsakabi et al., 27 Sep 2025).
- Physical PDE Solving: SIRENs are directly applied to systems defined via derivatives, such as:
- Poisson equation: via supervision only on gradients or Laplacians.
- Eikonal equation: for signed distance function (SDF) learning with unit-gradient-norm ($\|\nabla \Phi\| = 1$) and surface-normal constraints.
- Helmholtz/wave equation: for learning complex wavefields and FWI velocity models (Sitzmann et al., 2020).
- Noise-Robust INR: Appropriately limiting width, depth, or frequency scaling enables SIREN to function as a mesh-free, noise-robust regressor for pressure fields from image velocimetry in challenging, noisy environments, outperforming classical mesh-based alternatives (Miotto et al., 29 Jan 2025).
- Audio Compression: Architecture variants (Siamese SIREN) share backbones but separate output heads, facilitating implicit audio compression and self-noise estimation with substantially reduced parameter counts and improved objective and perceptual metrics after quantization (Lanzendörfer et al., 2023).
- Registration and Multi-modal Fusion: SIREN architectures have been extended to semantic-guided registration pipelines for multi-robot Gaussian Splatting maps and as collaborative transformer systems for text-to-audio generation (Shorinwa et al., 10 Feb 2025, Wang et al., 6 Oct 2025).
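As one concrete instance of the derivative-supervised losses above, the eikonal constraint can be sketched as a penalty on the gradient norm; central differences stand in for autograd here, and `eikonal_penalty` is an illustrative helper, not taken from the cited works:

```python
import numpy as np

def eikonal_penalty(f, pts, eps=1e-4):
    """Penalize deviation of the spatial gradient norm from 1, as used for
    SDF learning; gradients via central differences (autograd in practice)."""
    grads = []
    for d in range(pts.shape[1]):
        e = np.zeros(pts.shape[1]); e[d] = eps
        grads.append((f(pts + e) - f(pts - e)) / (2 * eps))
    grad_norm = np.linalg.norm(np.stack(grads, axis=1), axis=1)
    return np.mean((grad_norm - 1.0) ** 2)

# A true SDF of the unit sphere has gradient norm 1 away from the center:
sdf = lambda p: np.linalg.norm(p, axis=1) - 1.0
pts = np.random.default_rng(0).uniform(0.2, 1.0, (32, 3))
assert eikonal_penalty(sdf, pts) < 1e-6
```

In training, `f` would be the SIREN itself and the penalty would be added to the data-fitting loss.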
5. Training, Initialization, and Implementation Details
SIRENs require careful architectural configuration:
- Depth/Width: Depths of 5–10 layers and hidden widths of 256–512 neurons balance representational capacity and runtime (Sitzmann et al., 2020).
- Initialization: Principled uniform initialization (as outlined above) is crucial to avoid vanishing/exploding gradients and to maintain frequency propagation.
- Optimization: Training uses Adam (default $\beta_1 = 0.9$, $\beta_2 = 0.999$), with learning rates typically on the order of $10^{-4}$ to $10^{-5}$.
- Loss Functions: For regression, use MSE on value or derivatives; for PDE constraints, additional loss terms target gradients, Laplacians, or physical constraints.
- Gradient Supervision: For PDE tasks, losses on spatial derivatives or Laplacians are computed via autograd.
- Sampling: Coordinate minibatches are directly sampled from the input domain (e.g., spatial, spatiotemporal grids).
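The recipe above can be sketched end-to-end in a minimal numpy training loop; as a simplifying assumption, plain full-batch gradient descent stands in for Adam and a single hidden layer for the usual 5–10:

```python
import numpy as np

# Fit a 1-hidden-layer SIREN to a 1D signal by coordinate regression.
rng = np.random.default_rng(0)
n_hidden, omega0, lr = 64, 30.0, 1e-4

x = np.linspace(-1, 1, 256)[None, :]   # coordinates, shape (1, 256)
t = np.sin(4 * np.pi * x)              # target signal

W1 = rng.uniform(-1, 1, (n_hidden, 1))  # first layer: U(-1/n, 1/n), n = 1
b1 = np.zeros((n_hidden, 1))
W2 = rng.uniform(-1, 1, (1, n_hidden)) * np.sqrt(6.0 / n_hidden)
b2 = np.zeros((1, 1))

def forward(x):
    z = omega0 * (W1 @ x + b1)          # scaled pre-activation
    h = np.sin(z)
    return W2 @ h + b2, h, z

loss0 = np.mean((forward(x)[0] - t) ** 2)
for _ in range(500):
    y, h, z = forward(x)
    g = 2 * (y - t) / x.shape[1]        # dL/dy for the MSE loss
    gW2 = g @ h.T
    gb2 = g.sum(axis=1, keepdims=True)
    gh = W2.T @ g
    gz = gh * np.cos(z)                 # backprop through the sine
    gW1 = omega0 * gz @ x.T
    gb1 = omega0 * gz.sum(axis=1, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
loss = np.mean((forward(x)[0] - t) ** 2)
assert loss < loss0                      # training reduces the MSE
```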
The frequency multipliers in FM-SIREN are computed once per layer (e.g., per audio sampling rate, image dimension, or voxel grid size) and stored as an $N$-sized vector per layer (Alsakabi et al., 27 Sep 2025).
6. Comparative Empirical Results
Key empirical results for SIREN and variants include:
| Architecture | Audio PSNR | Image PSNR | 3D IoU/PSNR | NeRF PSNR | Notes |
|---|---|---|---|---|---|
| SIREN | 13.4–59.4 | 21.3–38.9 | 0.960 | 33.02 | Baseline, fixed $\omega_0$ |
| FM-SIREN | 62.7 | 32.29 | 0.990 | 33.17 | 10–50% reduction in redundancy, faster |
| SIREN² (WINNER) | 62.7–95.2 | 36.1–75.2 | 55 dB | — | Adaptive freq. support, state of the art |
| Siamese SIREN | — | — | — | — | 40% fewer params, best audio metrics |
SIREN²/WINNER matches or exceeds parameter-heavy SIREN+RFF, without increased optimizer or memory cost (Chandravamsi et al., 16 Sep 2025). FM-SIREN matches deep classic SIRENs with only 2 layers, and in NeRF reconstruction yields 14% faster training at unchanged fidelity (Alsakabi et al., 27 Sep 2025).
7. Limitations and Open Directions
- Spectral Bottleneck: Even with appropriate initialization and frequency allocation, SIREN's ability to learn the highest frequencies is fundamentally linked to architecture and initialization. WINNER can underfit extremely low-frequency content if noise scales are excessive (Chandravamsi et al., 16 Sep 2025).
- Parameter Sensitivity: FM-SIREN's frequency allocation requires knowledge of the signal domain's effective sampling rate. Incorrect estimation can degrade near-orthogonality of hidden features (Alsakabi et al., 27 Sep 2025).
- Hyperparameter Tuning: Frequency multipliers in vanilla SIREN ($\omega_0$), initialization constants, and adaptive noise scales ($\sigma_i$) are task/domain-specific. Porting to new signal domains may require retuning.
- Generalization to Other Periodic Activations: Research suggests extending adaptive initialization and neuron-specific frequency selection to broader classes of periodic activations and positional encoding schemes is a promising avenue (Chandravamsi et al., 16 Sep 2025).
- Integration with Hypernetworks: SIREN functions parameterized by hypernetworks support priors over distribution families of signals but inherit any representational limitations of the core SIREN.
SIREN architectures and their recent extensions define a mathematically principled and empirically validated toolkit for high-fidelity implicit neural representation, with both theoretical underpinnings grounded in harmonic analysis and practical advances in frequency diversity, initialization, compression, and robustness (Sitzmann et al., 2020, Alsakabi et al., 27 Sep 2025, Chandravamsi et al., 16 Sep 2025, Miotto et al., 29 Jan 2025, Lanzendörfer et al., 2023).