Masked Raw Signal Reconstruction
- Masked Raw Signal Reconstruction is a set of methods that recover incomplete or corrupted signal data using observed masking patterns and signal priors.
- The techniques range from deep learning models such as transformer-based masked autoencoders to classical iterative and convex optimization methods, and are applied across diverse modalities.
- Key advances include robust mathematical frameworks, diverse masking strategies (including curriculum and spectral masking), and performance validated with metrics such as MSE, Pearson correlation, and PSNR.
Masked raw signal reconstruction refers to a class of computational and statistical methods for recovering unobserved, corrupted, or intentionally hidden components of raw signal data, based on an observed, “masked” or incomplete version of the signal and possibly side information such as the mask pattern or signal priors. This paradigm encompasses both modern deep learning approaches such as masked autoencoder models, and classical signal processing and inverse problem formulations leveraging harmonic, spectral, or algebraic structure. Techniques are adapted to diverse modalities—fMRI, EEG, communications IQ streams, remote sensing, and more—and may operate in temporal, spatial, spectral, or abstract patch domains.
1. Core Principles and Mathematical Frameworks
Masked raw signal reconstruction is structured around the following central components:
- Masking Operation: The observed signal is generated from the true signal and a mask (often binary, possibly stochastic or deterministic). Representative formulations include per-entry masking, per-patch masking, or convolutional masking; e.g., for a binary mask $m$, the observation is $\tilde{x}_i = x_i$ if $m_i = 1$, with masked entries replaced (by zero or a learned mask token) otherwise (Qu et al., 2024, Liu et al., 1 Aug 2025).
- Reconstruction Objective: Recovery is formulated as minimizing a loss function over the masked components, commonly mean squared error (MSE) over the masked index set $\mathcal{M}$. In deep models, this becomes
$$\mathcal{L} = \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \left\| \hat{x}_i - x_i \right\|_2^2,$$
or a spectral or perceptual loss in some contexts (Qu et al., 2024, Liu et al., 1 Aug 2025, Kweon et al., 2023, Chen et al., 2022).
- Model Classes: Reconstruction may be accomplished by:
- Deep masked autoencoders (MAE, ViT-based, temporal/spatial/patch-wise) (Qu et al., 2024, Liu et al., 1 Aug 2025, Naskar et al., 20 Aug 2025, Kweon et al., 2023, Chen et al., 2022)
- Classical iterative harmonic inversion, spectral Wiener-type filtering, or SDP/convex programming (Hamann et al., 2023, Jaganathan et al., 2016, Bandeira et al., 2013, Nishizawa et al., 2013)
- Data-driven hybrid variants (as in CuMoLoS-MAE) combining curriculum masking and stochastic ensembles for uncertainty quantification (Naskar et al., 20 Aug 2025)
- Setting: Can be applied to time series, spatial fields, complex-valued signals, multidimensional arrays, or abstract domains (e.g., block or patchwise for fMRI, atmospheric grids, remote sensing images).
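The components above can be illustrated with a minimal, framework-free sketch. This is not any cited system's implementation; the function names (`apply_mask`, `masked_mse`) are hypothetical, and entry-wise Bernoulli masking with zero replacement is assumed:

```python
import random

def apply_mask(x, mask_ratio, seed=0):
    """Bernoulli entry-wise masking: keep x[i] where m[i] == 1, zero it otherwise."""
    rng = random.Random(seed)
    m = [0 if rng.random() < mask_ratio else 1 for _ in x]
    x_masked = [xi if mi else 0.0 for xi, mi in zip(x, m)]
    return x_masked, m

def masked_mse(x_hat, x, m):
    """MSE computed only over the masked index set M = {i : m[i] == 0}."""
    errs = [(xh - xt) ** 2 for xh, xt, mi in zip(x_hat, x, m) if mi == 0]
    return sum(errs) / len(errs)
```

A reconstruction model would receive `x_masked` (and possibly `m` as side information) and be trained so that `masked_mse(model(x_masked), x, m)` is small; observed entries typically contribute no gradient.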
The choice of framework is directed by the statistical properties of the signal, the mask structure, and the inferential aims (imputation, denoising, uncertainty estimation, representation learning, or transfer).
2. Masking Strategies and Data Modalities
Masking strategies are critical for both model design and experimental evaluation:
- Random entry-wise masking: Each element is independently masked with fixed probability (Bernoulli masking) (Qu et al., 2024, Liu et al., 1 Aug 2025).
- Patch/block masking: Non-overlapping patches along time or space are masked jointly—common in fMRI (temporal patches), remote sensing grids (micro-patches), image transformer models (fixed-size windows) (Qu et al., 2024, Naskar et al., 20 Aug 2025, Chen et al., 2022).
- Curriculum masking: The mask ratio is scheduled from low to high during training to promote progressive learning from denser to sparser contexts (e.g., CuMoLoS-MAE uses a cosine ramp from 50% to 70% masking over 30 epochs) (Naskar et al., 20 Aug 2025).
- Domain-specific masking: In neural recordings, structured masking over cortical regions or time blocks is deployed to model physiologically motivated occlusions (Qu et al., 2024). In communication IQ data, patches may align with symbol intervals (Liu et al., 1 Aug 2025).
- Convolutional or spectral masking: Observations may be masked via filtering, blurring, or spectral domain occlusions—requiring specialized deconvolutional or spectral coupling inversion (Bahmani et al., 2014, Mack et al., 2019, Hamann et al., 2023).
Empirical studies support optimal masking ratios in the range 40–75% for effective self-supervised representation learning, with higher ratios favoring learning of global structure at the cost of local fidelity (Qu et al., 2024, Liu et al., 1 Aug 2025).
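The curriculum schedule described above (a cosine ramp of the mask ratio, e.g., 50% to 70% over 30 epochs as in CuMoLoS-MAE) can be sketched as a simple scheduling function; the name `curriculum_mask_ratio` is illustrative, not from any cited codebase:

```python
import math

def curriculum_mask_ratio(epoch, total_epochs=30, r_start=0.5, r_end=0.7):
    """Cosine ramp of the mask ratio from r_start to r_end over total_epochs."""
    t = min(max(epoch / total_epochs, 0.0), 1.0)  # clamp progress to [0, 1]
    return r_start + (r_end - r_start) * 0.5 * (1.0 - math.cos(math.pi * t))
```

The ramp starts flat, steepens mid-training, and flattens again near the end, so the model sees the hardest (sparsest-context) masking only after denser contexts have been learned.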
3. Architectures and Algorithms
Deep Masked Autoencoder Families
Modern approaches rely on transformer-based encoder–decoder models (MAE, ViT) adapted per modality:
- fMRI (spatiotemporal MAE):
- Input: a parcel-by-time matrix $X \in \mathbb{R}^{P \times T}$, with $P$ parcels and $T$ time-frames (Qu et al., 2024).
- Patch embedding: temporal blocks of consecutive frames, linearly projected to the model hidden size.
- Encoder: Multi-layer transformer (e.g., 8 layers, 16 heads).
- Decoder: Transformer, reconstructs masked patches to raw fMRI values.
- Loss: MSE on masked entries.
- Raw IQ Modulation (RIS-MAE):
- Input: complex-valued IQ sample sequences, divided into non-overlapping patches of fixed size.
- Encoder: Transformer with positional encoding.
- High masking ratio (75%), patch prediction head, and task-tailored downstream classifiers after self-supervised pretraining (Liu et al., 1 Aug 2025).
- Remote Sensing (CuMoLoS-MAE):
- Micro-patch division into small pixel blocks, curriculum masking, ensemble MC inference for PCI (Naskar et al., 20 Aug 2025).
- Image/vision (LoMaR):
- Local masked reconstruction within small windows of neighboring patches; efficient for high-resolution images, reducing computational cost without accuracy loss (Chen et al., 2022).
- EEG/PSG:
- Encoder–decoder mapping from masked single-channel EEG to multi-signal outputs, using cosine similarity loss and MSE evaluation (Kweon et al., 2023).
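Common to the MAE-style architectures above is the patchify-then-mask front end: the signal is split into patches, a random subset is hidden, and only visible patches are encoded. A minimal 1-D sketch follows; `patchify` and `random_patch_mask` are illustrative names, and a 75% ratio (as in RIS-MAE) is assumed in the usage:

```python
import random

def patchify(signal, patch_len):
    """Split a 1-D signal into non-overlapping patches (truncating any remainder)."""
    n = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n)]

def random_patch_mask(patches, mask_ratio, seed=0):
    """Shuffle patch indices and hide the first mask_ratio fraction, MAE-style:
    the encoder sees only the visible patches; masked indices go to the decoder."""
    rng = random.Random(seed)
    idx = list(range(len(patches)))
    rng.shuffle(idx)
    n_mask = int(round(mask_ratio * len(patches)))
    masked_idx = sorted(idx[:n_mask])
    visible_idx = sorted(idx[n_mask:])
    visible = [patches[i] for i in visible_idx]
    return visible, visible_idx, masked_idx
```

In a full model, `visible` would be linearly embedded with positional encodings and passed to the transformer encoder, while the decoder inserts mask tokens at `masked_idx` and regresses the raw values of the hidden patches.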
Classical and Hybrid Approaches
- Iterative Harmonic/Spectral Expansion: Harmonic iterative expansion (IHE) or spectral Wiener-type filtering reconstructs masked large-scale fields, especially under isotropic Gaussian priors (e.g., CMB maps), with reconstruction error decreasing over successive iterations for moderate mask sizes (Nishizawa et al., 2013, Hamann et al., 2023).
- Convex and SDP Methods: Masked autocorrelation and cross-correlation measurements are inverted via SDP lifting (replacing the signal $x$ with the rank-one matrix $X = x x^{*}$), prominent in phase retrieval and blind channel estimation, with stability guarantees and empirical superiority over classical polynomial methods (Jaganathan et al., 2016).
- Compressive Sensing via Masked Convolutions: Random mask deconvolution systems for imaging blur and subsampling, with theoretical RIP and conditioning guarantees, leveraging sparse recovery algorithms such as Basis Pursuit (Bahmani et al., 2014).
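The iterative-expansion idea can be conveyed with a toy alternating-projection scheme in the spirit of Gerchberg–Papoulis inpainting: enforce the observed samples, then project onto a smoothness (band-limiting) prior, and repeat. This is a didactic sketch, not the cited IHE algorithm; the moving-average `smooth` is a stand-in for a true harmonic-space low-pass:

```python
def smooth(x):
    """3-point circular moving average as a stand-in low-pass (smoothness) prior."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def iterative_inpaint(y, m, n_iters=200):
    """Alternate between the smoothness prior and re-imposing observed
    samples (m[i] == 1), filling masked entries iteratively."""
    x = list(y)
    for _ in range(n_iters):
        x = smooth(x)
        x = [yi if mi else xi for yi, mi, xi in zip(y, m, x)]
    return x
```

For smooth underlying fields and isolated masked entries, each iteration contracts the error at the masked positions, mirroring how the classical methods converge for moderate mask sizes.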
4. Empirical Results and Quantitative Performance
Empirical results consistently demonstrate the efficacy of masked reconstruction across domains:
- fMRI MAE (50% masking): strong Pearson correlation and low MSE over masked entries, with task-dependent reconstruction quality across language/motor and working-memory conditions (Qu et al., 2024).
- IQ Modulation RIS-MAE (75% masking, cross-domain): competitive few-shot overall accuracy (OA) on multiple datasets using only a small fraction of labels, outperforming supervised baselines by several percentage points and maintaining high accuracy on previously unseen modulations (Liu et al., 1 Aug 2025).
- Remote Sensing CuMoLoS-MAE (after curriculum): PSNR 29.45 dB, SSIM 0.7857, FID 1.87; uncertainty maps correlate with absolute reconstruction error (Naskar et al., 20 Aug 2025).
- EEG PSG: per-output-channel MSE in the range 1.9–10.2 across sleep stages and reconstructed signals, from masked single-channel EEG (Kweon et al., 2023).
- Classical Harmonic Expansion: low RMS reconstruction error across moderate SNRs, with performance superior to standard minimization- or SVD-based approaches for smooth priors and moderate mask sizes (Nishizawa et al., 2013, Hamann et al., 2023).
- Compressive Deconvolution: stable recovery of $s$-sparse signals with a mask count scaling with the sparsity level up to logarithmic factors (Bahmani et al., 2014).
Performance is modulated by mask ratio, architecture depth, patch size, and domain-specific regularity. For deep architectures, masking also regularizes the model and promotes extraction of transferable representations (Qu et al., 2024, Liu et al., 1 Aug 2025, Chen et al., 2022).
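Of the metrics quoted above, PSNR is derived directly from the masked MSE and the signal's peak value; the small helper below shows the standard relation (the function name `psnr` and the unit peak default are illustrative):

```python
import math

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if mse == 0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, with signals normalized to a peak of 1, an MSE of 0.01 corresponds to 20 dB, so PSNR figures such as those reported for CuMoLoS-MAE translate directly into masked-MSE levels.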
5. Transfer Learning, Representation, and Uncertainty
- Transfer and Taskonomy: Masked reconstruction provides an avenue for learning representations transferable between tasks, quantified by taskonomy matrices (difference in transfer vs. gold-standard reconstruction MSE) (Qu et al., 2024). This reveals cognitive task similarity, subtask clustering, and guides source task selection for decoding.
- Uncertainty Quantification: Monte Carlo ensembling over masks enables explicit per-pixel or per-entry uncertainty maps; the CuMoLoS-MAE posterior predictive variance tightly correlates with reconstruction error, supporting principled confidence estimation in practical settings (Naskar et al., 20 Aug 2025).
- Cross-modal Learning: Demonstrated in masked EEG-PSG, where even with high masking, models can reconstruct absent modalities (EOG, EMG) via learned self-supervised cross-modal structure (Kweon et al., 2023).
- Representation Robustness: Self-supervised learning with aggressive masking improves downstream viability in limited-data settings, supporting cross-domain generalization and data-efficient transfer (Liu et al., 1 Aug 2025, Chen et al., 2022).
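The Monte Carlo mask-ensembling idea above can be sketched without any deep model: reconstruct the signal under many independent random masks, then take per-entry mean and variance of the ensemble as the predictive estimate and uncertainty map. The mean-fill "reconstructor" here is a deliberately trivial stand-in for a trained network, and all names are hypothetical:

```python
import random

def interpolate_fill(y, m):
    """Toy reconstructor: fill masked entries with the mean of observed ones."""
    obs = [yi for yi, mi in zip(y, m) if mi]
    mean = sum(obs) / len(obs)
    return [yi if mi else mean for yi, mi in zip(y, m)]

def mc_mask_ensemble(x, mask_ratio, n_samples=32, seed=0):
    """Reconstruct under many random masks; per-entry ensemble mean and
    variance give a predictive estimate and an uncertainty map."""
    rng = random.Random(seed)
    recons = []
    for _ in range(n_samples):
        m = [0 if rng.random() < mask_ratio else 1 for _ in x]
        if not any(m):  # guarantee at least one observed entry
            m[rng.randrange(len(x))] = 1
        recons.append(interpolate_fill(x, m))
    n = len(x)
    mean = [sum(r[i] for r in recons) / n_samples for i in range(n)]
    var = [sum((r[i] - mean[i]) ** 2 for r in recons) / n_samples for i in range(n)]
    return mean, var
```

Entries whose value is hard to infer from context vary across mask draws and receive high variance, which is the mechanism behind uncertainty maps that track reconstruction error.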
6. Applications, Limitations, and Extensions
Masked raw signal reconstruction is foundational in multiple domains:
- Neuroimaging: Denoising, imputation, and cognitive representation discovery in fMRI and EEG; transfer learning-based taskonomy for enhanced decoding (Qu et al., 2024, Kweon et al., 2023).
- Communications: Self-supervised pretraining for robust modulation classification from raw I/Q streams under varying channel conditions (Liu et al., 1 Aug 2025).
- Remote Sensing: High-resolution reconstruction and uncertainty modeling for atmospheric profile gap-filling, real-time assimilation, and fine-scale feature recovery (Naskar et al., 20 Aug 2025).
- Cosmic Field Recovery: Iterative and spectral methods for masked sky maps—critical for the CMB and galaxy surveys (Nishizawa et al., 2013, Hamann et al., 2023).
- General Inverse Problems: Stable deconvolution with random masks, phase retrieval, blind channel estimation, and related quadratic inverse problems (Bahmani et al., 2014, Jaganathan et al., 2016, Bandeira et al., 2013).
Limitations include scalability of classical harmonic techniques at high resolution, need for knowledge of signal statistics in spectral methods, and hyperparameter sensitivity in deep models. Extensions span uncertainty calibration, adaptation to novel modalities, multi-modal integration, and curriculum-based masking schemes.
7. Comparative Summary of Approaches
| Approach | Mask Type | Recovery Model | Key Reference |
|---|---|---|---|
| fMRI MAE | Random entry/patch | Transformer MAE | (Qu et al., 2024) |
| PSG EEG MAE | Entrywise | Autoencoder (EEG→multi) | (Kweon et al., 2023) |
| RIS-MAE (IQ signals) | Patchwise | Transformer MAE | (Liu et al., 1 Aug 2025) |
| CuMoLoS-MAE | Patch, curriculum | ViT + MC ensemble | (Naskar et al., 20 Aug 2025) |
| LoMaR Vision | Local window | Transformer encoder | (Chen et al., 2022) |
| Harmonic Expansion | Masking in space | Iterative spectral | (Nishizawa et al., 2013, Hamann et al., 2023) |
| SDP/PhaseLift | Linear masks | SDP lifting | (Jaganathan et al., 2016, Bandeira et al., 2013) |
| Random-mask Deconv. | Modulation+Conv | Linear/L1 optimization | (Bahmani et al., 2014) |
The field of masked raw signal reconstruction synthesizes insights from self-supervised deep learning, harmonic analysis, convex optimization, and statistical estimation, providing robust, efficient, and generalizable methods for recovering hidden or corrupted signal components in complex real-world domains.