HoloMambaRec: Efficient SSM & CGH
- HoloMambaRec is a scalable, hardware-efficient architecture that integrates selective state space encoding with holographic binding to capture long-range dependencies.
- It employs a selective state space encoder and circular convolution-based HRRs to efficiently model item attributes and temporal correlations under strict latency constraints.
- Empirical results demonstrate superior recommendation metrics and high-fidelity, real-time full-color holographic video processing on commodity hardware.
HoloMambaRec is a scalable, hardware-efficient architecture deployed in two distinct, high-impact domains: (1) sequential recommendation under strict latency and memory constraints and (2) real-time computer-generated holography (CGH) for full-color video. In both contexts, the defining principles are lightweight state space modeling, efficient representations, and explicit spatial-temporal or meta-data awareness. HoloMambaRec leverages a selective state space encoder inspired by Mamba-style models and, where relevant, holographic reduced representations (HRRs) for embedding and circular convolution binding. Its general scientific contribution is to demonstrate that long-range dependencies and structured attributes can be captured without incurring quadratic attention or memory growth. Below, both canonical instantiations are detailed, highlighting methodology, formalism, empirical performance, and significance.
1. Model Architecture and Components
In sequential recommendation, HoloMambaRec comprises:
- Holographic embedding layer: Each item and its attribute are mapped to fixed-dimension embeddings and combined using circular convolution, yielding a single representation $\tilde{e}_t$. This enables direct modeling of attribute relationships while keeping the parameter footprint constant.
- Selective state space encoder: A recurrent-style backbone processes input sequences using input-conditioned, discretized dynamical systems. The encoder operates in 2–3 stacked layers, each updating a hidden state and providing per-step outputs for next-item prediction.
- Temporal bundling and inference-time compression: Forward-compatible mechanisms allow for vector superposition of consecutive steps, compressing sequence length and enabling rapid inference without full re-encoding.
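The holographic embedding layer described above can be sketched in a few lines of NumPy. The helper names (`bind_item_attribute`, `circular_convolution`), the unlearned LayerNorm, and the embedding initialization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def circular_convolution(x, y):
    """Bind two vectors via circular convolution, computed with the FFT."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def layer_norm(v, eps=1e-5):
    """LayerNorm over the embedding dimension (no learned scale/shift here)."""
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def bind_item_attribute(item_emb, attr_emb, alpha=0.5):
    """Combine an item embedding with one attribute embedding; alpha weights
    the bound term. Dimensionality is unchanged by the binding."""
    return layer_norm(item_emb + alpha * circular_convolution(item_emb, attr_emb))

rng = np.random.default_rng(0)
d = 64
e_item = rng.standard_normal(d) / np.sqrt(d)
e_attr = rng.standard_normal(d) / np.sqrt(d)
bound = bind_item_attribute(e_item, e_attr)
print(bound.shape)  # (64,) -- same dimension as the unbound item embedding
```

Because binding happens in the embedding space rather than via extra parameters, adding attributes leaves the model size untouched.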
In high-speed, full-color CGH video, "HoloMambaRec" denotes a three-level asymmetric U-Net built from multi-receptive-field interaction (MRFI) modules, and incorporates bidirectional spatial-temporal Mamba blocks. The global MRFI branch engages long-range temporal dependencies via state space scanning, while the local branch captures fine-grained spatial details using 3D convolutions.
2. Holographic Reduced Representations and Binding
Attribute-aware modeling in recommendation systems is realized with HRRs:
- Representation: Each item embedding $e(i_t)$ and attribute embedding $e(a_t)$ are bound via circular convolution:
$\tilde{e}_t = \mathrm{LayerNorm}\big( e(i_t) + \alpha \cdot (e(i_t) \circledast e(a_t)) \big)$
where $\circledast$ denotes circular convolution, defined for $x, y \in \mathbb{R}^d$ as $(x \circledast y)_j = \sum_{k=0}^{d-1} x_k\, y_{(j-k) \bmod d}$, typically implemented via the FFT in $O(d \log d)$.
- Unbinding: Recovery of $x$ from $z = x \circledast y$ is approximated by circular correlation with $y$:
$\hat{x} \approx y \star z, \quad (y \star z)_j = \sum_{k=0}^{d-1} y_k\, z_{(j+k) \bmod d}$
- Properties: The embedding dimensionality is fixed and attribute binding does not inflate model size, ensuring computational tractability for large catalogs or multi-attribute scenarios.
A plausible implication is that the algebraic structure of HRRs enables future expansion to encode richer side-information or superposed temporal bundles without incurring parameter blow-up.
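A minimal NumPy sketch of the bind/unbind cycle makes the "approximate recovery" property concrete. The function names and the unit-norm random initialization are assumptions for illustration; correlation is computed as a frequency-domain conjugate, the standard FFT trick:

```python
import numpy as np

def bind(x, y):
    # circular convolution via FFT: (x ⊛ y)_j = sum_k x_k y_{(j-k) mod d}
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def unbind(z, y):
    # circular correlation via FFT: conjugate one spectrum before multiplying
    return np.fft.irfft(np.fft.rfft(z) * np.conj(np.fft.rfft(y)), n=len(z))

rng = np.random.default_rng(42)
d = 512
x = rng.standard_normal(d) / np.sqrt(d)  # approximately unit-norm, a common HRR convention
y = rng.standard_normal(d) / np.sqrt(d)

z = bind(x, y)
x_hat = unbind(z, y)

# recovery is approximate: x_hat correlates strongly with x, weakly with y
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos(x_hat, x) > 0.5, abs(cos(x_hat, y)) < 0.3)  # True True
```

The residual noise shrinks as $d$ grows, which is why HRR-style binding remains usable at practical embedding dimensions.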
3. Selective State Space Modeling
Central to both instantiations is the Mamba-inspired selective state space encoder:
- Update equations: Each step receives the bound token $\tilde{e}_t$ and generates intermediate features via learned projections and convolutions. The state update is:
$h_t = \bar{A}\, h_{t-1} + \bar{B}_t\, \tilde{e}_t$
where $\bar{A}$ is a fixed diagonal matrix and $\bar{B}_t$, $C_t$ are input-dependent. Output computation involves gated mixing of state and input:
$o_t = g_t \odot (C_t h_t) + (1 - g_t) \odot \tilde{e}_t$
The gate $g_t$ controls expressivity and modulates the contribution of memory versus input at each step.
- Efficient inference: No pairwise (quadratic) attention is present. Sequence processing is $O(L)$ for length $L$, supporting constant-time per-step recurrent inference over very long user or temporal histories.
- Memory complexity: The encoder's parameter count per layer is fixed and independent of sequence length, with no key–value caches, enabling deployment on commodity hardware or VRAM-limited devices.
In CGH, the bidirectional Mamba process scans spatial-temporal volumes in both forward and backward directions, capturing correlations necessary for temporally coherent video reconstruction.
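The recurrence above can be sketched as a single NumPy step function. The weight names (`W_B`, `W_C`, `W_out`, `W_g`), shapes, and the sigmoid gate are a simplified illustrative parameterization, not the paper's exact selective-scan kernel:

```python
import numpy as np

def selective_ssm_step(h, x, A_diag, W_B, W_C, W_out, W_g):
    """One step of a simplified selective SSM recurrence.
    h: (N,) recurrent state; x: (d,) bound input token."""
    B_t = W_B @ x                             # input-dependent write, shape (N,)
    h = A_diag * h + B_t                      # diagonal state transition
    C_t = W_C @ x                             # input-dependent read weights, (N,)
    y = W_out @ (C_t * h)                     # project state back to (d,)
    g = 1.0 / (1.0 + np.exp(-(W_g @ x)))      # gate in (0,1), shape (d,)
    return h, g * y + (1.0 - g) * x           # gated mix of memory and input

rng = np.random.default_rng(1)
d, N, L = 32, 16, 100
A_diag = np.exp(-rng.uniform(0.01, 1.0, N))   # stable decay: 0 < A_ii < 1
W_B = rng.standard_normal((N, d)) / d
W_C = rng.standard_normal((N, d)) / d
W_out = rng.standard_normal((d, N)) / N
W_g = rng.standard_normal((d, d)) / d

h = np.zeros(N)
for t in range(L):                            # O(L) scan, O(N) state: no KV cache
    h, o = selective_ssm_step(h, rng.standard_normal(d),
                              A_diag, W_B, W_C, W_out, W_g)
print(h.shape, o.shape)  # (16,) (32,)
```

Note that memory usage is constant in $L$: only the state `h` persists between steps, which is the property the complexity table below relies on.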
4. Training Paradigms and Loss Functions
- Sequential recommendation: Masked cross-entropy loss over next-item prediction:
$\mathcal{L}_{\text{rec}} = -\sum_{t} \log p_\theta\!\left(i_{t+1} \mid i_{\le t}, a_{\le t}\right)$
Standard dropout and LayerNorm are applied for regularization; SSM blocks are left dropout-free to enable direct performance comparison.
- CGH video: Hybrid losses in both image and Fourier domains, combining mean-squared error (MSE) with focal frequency loss (FFL):
$\mathcal{L}_{\text{CGH}} = \mathcal{L}_{\text{MSE}} + \lambda \cdot \mathcal{L}_{\text{FFL}}$
where $\lambda$ weights the frequency-domain term.
This dual-domain objective ensures fidelity in both low-frequency (global) and high-frequency (edge/detail) components, crucial for holographic display and color separation.
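A sketch of this dual-domain objective, assuming a common FFL formulation in which large spectral errors are up-weighted; the focusing exponent `alpha` and mixing weight `lam` are illustrative choices, not the paper's hyperparameters:

```python
import numpy as np

def focal_frequency_loss(pred, target, alpha=1.0):
    """Frequency-domain error, reweighted so hard (large-error) frequencies
    dominate; alpha controls how sharply the weighting focuses."""
    F_pred, F_tgt = np.fft.fft2(pred), np.fft.fft2(target)
    dist = np.abs(F_pred - F_tgt) ** 2
    w = (np.sqrt(dist) / (np.sqrt(dist).max() + 1e-12)) ** alpha
    return float((w * dist).mean())

def hybrid_loss(pred, target, lam=0.1):
    """Image-domain MSE plus frequency-domain FFL, mixed by lam."""
    mse = float(((pred - target) ** 2).mean())
    return mse + lam * focal_frequency_loss(pred, target)

rng = np.random.default_rng(0)
target = rng.random((64, 64))
noisy = target + 0.05 * rng.standard_normal((64, 64))
print(hybrid_loss(target, target))      # 0.0 for a perfect reconstruction
print(hybrid_loss(noisy, target) > 0.0)  # True
```

The MSE term anchors low-frequency global structure while the FFL term penalizes high-frequency errors that would otherwise blur edges in the reconstructed hologram.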
5. Computational Complexity and Hardware Considerations
| Model | Time/step | Memory Requirement | Sequence Scalability |
|---|---|---|---|
| Transformer (SASRec) | $O(L)$ per step with KV cache; $O(L^2)$ per sequence | $O(L \cdot d)$ key–value cache | Limited for large $L$ |
| HoloMambaRec (SSM) | $O(1)$ per step; $O(L)$ per sequence | $O(d)$ recurrent state, no cache | Long histories |
| CGH-HoloMambaRec | $O(L)$ (SSM + CNN) | Fixed activation footprint | Real-time (267 FPS FHD) |
HoloMambaRec eliminates the quadratic bottleneck of attention, supports long-horizon modeling, and enables real-time holography on a single RTX 8000 GPU, using less memory than prior DCM or transformer-based solutions (Parthasarathy et al., 13 Jan 2026, Zhang et al., 27 Aug 2025).
6. Extensions, Practical Features, and Future Directions
- Temporal bundling: Superposing consecutive HRR-bound vectors using learnable role vectors compresses the effective sequence length, enabling faster inference and reduced scan cost. The superposition error is controllable for moderate bundle sizes and does not materially degrade accuracy.
- Inference-time compression: Summarization or caching of recurrent state steps is possible post-training, further reducing production latency.
- SGDDM (holography): Spectrum-guided depth-division multiplexing applies learned Fourier masks to phase optimization, eliminating color crosstalk with single-shot angular spectrum separation and enabling full-color display at high frame rates.
- Empirical ablations: MRFI modules with a global:local ratio of 0.8:0.2 optimize the trade-off between long-range modeling and memory footprint. Pure global branches slightly improve PSNR at higher memory cost, whereas pure local branches fail to capture necessary dependencies.
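The temporal-bundling idea from the list above can be sketched with the same FFT-based binding machinery: several step vectors are bound to distinct role vectors and summed into one superposition, and correlating with a role approximately retrieves its step. The random (rather than learned) roles and the bundle size are illustrative assumptions:

```python
import numpy as np

def bind(x, y):
    # circular convolution via FFT
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

def unbind(z, y):
    # circular correlation via FFT (approximate inverse of bind)
    return np.fft.irfft(np.fft.rfft(z) * np.conj(np.fft.rfft(y)), n=len(z))

rng = np.random.default_rng(7)
d, k = 1024, 4                                       # embedding dim, bundle size
steps = rng.standard_normal((k, d)) / np.sqrt(d)     # k consecutive step vectors
roles = rng.standard_normal((k, d)) / np.sqrt(d)     # role vectors (random here, learnable in practice)

# one d-dimensional vector now stands in for k sequence steps
bundle = sum(bind(roles[i], steps[i]) for i in range(k))

# querying with role i retrieves step i; the other steps act as noise
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
recovered = unbind(bundle, roles[2])
sims = [cos(recovered, steps[i]) for i in range(k)]
print(int(np.argmax(sims)))  # 2: the queried step dominates the superposition
```

Retrieval noise grows with the bundle size $k$ and shrinks with $d$, which matches the claim that superposition error stays controllable for moderate bundles.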
Future research includes multi-attribute joint binding, deeper compression and throughput analysis, optimization of selective scan kernels, and investigation of alternative binding schemes (e.g., tensor fusion) within the same memory budget.
7. Empirical Performance and Comparative Results
- Sequential recommendation: On Amazon Beauty (sparse), HoloMambaRec achieves HR@10=0.0467, NDCG@10=0.0290, outperforming SASRec (0.0340/0.0203) and matching GRU4Rec under an aggressive 10-epoch budget. On MovieLens-1M (denser), it attains HR@10=0.1722, NDCG@10=0.0966, besting SASRec and GRU4Rec (Parthasarathy et al., 13 Jan 2026). Learning curves confirm faster, more stable convergence relative to transformer baselines.
- Full-color video CGH: HoloMambaRec achieves 35.44 dB PSNR, 0.95 SSIM, and 267 FPS on FHD video, surpassing DCM by >2.6 dB while running >2.6× faster with 18.6% lower memory usage. It is over 1000× faster than ViT-based baselines and achieves temporal coherence with minimal color crosstalk, as evidenced by spectral separation and edge fidelity in both simulated and real-world reconstructions (Zhang et al., 27 Aug 2025). Its warping error is also the lowest among tested methods.
A plausible implication is that the unified selective SSM + HRR design in recommendation and CGH allows highly extensible, production-ready models, supporting larger catalogs, longer temporal windows, and real-time video processing without prohibitive hardware scaling.
HoloMambaRec exemplifies the combination of efficient state space modeling, metadata binding, and hardware-conscious design, with demonstrated advantages in both sequential recommendation and computer-generated holography. By eschewing quadratic attention and explicitly modeling attributes or spatial-temporal correlations, it enables practical deployment in scenarios previously limited by inference latency or memory constraints.