RFDiffusion Family
- RFDiffusion Family is a suite of denoising diffusion models that generate de novo protein structures via SE(3)-equivariant methods and, in a related line of work, high-fidelity RF signals.
- The framework integrates RoseTTAFold-derived feature extractors with geometric deep learning to accurately model 3D atomic structures and complex radio-frequency waveforms.
- Interpretability extensions like FoldSAE enable tunable control over secondary structure features, enhancing design accuracy and practical application outcomes.
The RFDiffusion family encompasses a suite of denoising diffusion generative models, primarily designed for de novo protein structure generation and, in related but orthogonal work, for high-fidelity RF signal synthesis. Distinguished by SE(3)-equivariant architectures and integration with RoseTTAFold-derived feature extractors, RFDiffusion core models and their interpretability extensions—such as FoldSAE—represent a major advance in the programmable, data-driven creation of complex molecular and signal structures. The family includes not only models for protein design but also for radio-frequency waveform generation, unified by the application of diffusion probabilistic modeling in high-dimensional, domain-specific data spaces (Yang et al., 2 Apr 2025, Zarzecki et al., 27 Nov 2025, Chi et al., 2024).
1. Core Diffusion Formalism and Probabilistic Framework
At the core of RFDiffusion for protein design lies an adaptation of the discrete-time Denoising Diffusion Probabilistic Model (DDPM) formalism to 3D atomic structures. The forward process corrupts inputs via two Markov chains: (1) Gaussian noise applied to each residue frame's translation and (2) isotropic Gaussian SO(3) (IGSO(3)) noise on each frame's rotation. For residue-level backbones, the translational component is captured in closed form as

$$q\left(x^{(t)} \mid x^{(0)}\right) = \mathcal{N}\!\left(x^{(t)};\ \sqrt{\bar{\alpha}_t}\, x^{(0)},\ (1-\bar{\alpha}_t)\, I\right), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s),$$

with iterative frame interpolation for the rotations $r^{(t)}$, obtained by geodesic interpolation of each residue frame under IGSO(3) noise, leading to the fully corrupted backbone frames $T^{(t)} = \left(r^{(t)}, x^{(t)}\right)$. The learned reverse kernel $p_\theta\!\left(x^{(t-1)} \mid x^{(t)}\right)$ is regressed to undo this noise, parameterized either as the predicted mean $\mu_\theta\!\left(x^{(t)}, t\right)$ or the noise term $\epsilon_\theta\!\left(x^{(t)}, t\right)$.
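The closed-form forward corruption above can be sketched in a few lines of NumPy. This is a minimal illustration of the translational (Gaussian) chain only; the rotational IGSO(3) chain is omitted, and the schedule values and array shapes are assumptions for the example, not the paper's settings.

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a standard DDPM.

    Sketch of the translational noise chain described above; the
    rotational IGSO(3) component is omitted for brevity.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # \bar{alpha}_t
    eps = rng.standard_normal(x0.shape)        # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

betas = np.linspace(1e-4, 0.02, 1000)          # illustrative linear schedule
rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 3))             # toy C-alpha coordinates
xt, eps = ddpm_forward(x0, t=999, betas=betas, rng=rng)
```

At the final timestep, $\bar{\alpha}_t$ is near zero and `xt` is approximately pure noise, which is the starting point for reverse sampling.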
In the RF signal domain, RF-Diffusion generalizes diffusion by introducing a composite "Time-Frequency Diffusion" mechanism. The forward process first applies a frequency-domain Gaussian blur via the DFT and its inverse, and then injects complex Gaussian noise in the time domain:

$$x^{(t)} = \sqrt{1-\beta_t}\; \mathcal{F}^{-1}\!\left(G_t \odot \mathcal{F}\!\left(x^{(t-1)}\right)\right) + \sqrt{\beta_t}\, \epsilon, \qquad \epsilon \sim \mathcal{CN}(0, I),$$

where $G_t$ is a time-dependent frequency-domain kernel, $\mathcal{F}$ denotes the DFT, and $\odot$ is elementwise multiplication (Chi et al., 2024).
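One forward step of this time-frequency mechanism can be sketched with NumPy's FFT. The Gaussian low-pass kernel shape and the `sigma_f` bandwidth schedule below are illustrative assumptions, not the exact kernel used in the paper.

```python
import numpy as np

def tf_diffusion_step(x, t, betas, sigma_f, rng):
    """One forward step of a time-frequency diffusion sketch:
    Gaussian blur in the frequency domain, then complex Gaussian
    noise in the time domain. `sigma_f[t]` sets the blur bandwidth
    (an illustrative choice, not the paper's exact kernel)."""
    n = x.shape[-1]
    freqs = np.fft.fftfreq(n)
    # time-dependent low-pass kernel G_t in the frequency domain
    g_t = np.exp(-0.5 * (freqs / sigma_f[t]) ** 2)
    x_blur = np.fft.ifft(g_t * np.fft.fft(x))
    beta = betas[t]
    noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return np.sqrt(1.0 - beta) * x_blur + np.sqrt(beta) * noise

betas = np.linspace(1e-4, 0.05, 100)
sigma_f = np.linspace(0.5, 0.05, 100)          # kernel narrows over time
rng = np.random.default_rng(0)
x = np.exp(2j * np.pi * 0.1 * np.arange(256))  # toy complex tone
x_t = tf_diffusion_step(x, t=50, betas=betas, sigma_f=sigma_f, rng=rng)
```

As `t` grows, the narrowing kernel progressively destroys high-frequency structure while the injected noise dominates, mirroring the blur-then-noise factorization in the equation above.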
2. Model Architecture and SE(3)-Equivariant Networks
The canonical RFDiffusion architecture in protein modeling employs a RoseTTAFold-derived feature extractor—pretrained on structure prediction—to generate rich sequence and pairwise embeddings. These, along with noisy geometric inputs, feed into an SE(3)-Transformer engineered to preserve equivariance with respect to rigid motions in three-dimensional space. The model is organized as stacked residual blocks, each updating per-residue embedding streams along the 1D, 2D, and 3D feature axes. Self-conditioning is used throughout: at each denoising step, the previous output is concatenated with the corrupted input as an auxiliary conditioning vector, enhancing stability and long-horizon generation (Yang et al., 2 Apr 2025, Zarzecki et al., 27 Nov 2025).
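The self-conditioning scheme can be made concrete with a small reverse-sampling loop. This is a structural sketch only: `denoise_fn` is a stand-in for the SE(3)-Transformer, and the toy update rule is an assumption chosen for readability, not the model's actual reverse kernel.

```python
import numpy as np

def sample_with_self_conditioning(denoise_fn, x_T, num_steps):
    """Reverse-sampling loop with self-conditioning: the previous
    prediction of the clean structure is fed back as an auxiliary
    input at every step. `denoise_fn(x_t, t, x0_prev) -> x0_hat`
    stands in for the SE(3)-Transformer."""
    x_t = x_T
    x0_prev = np.zeros_like(x_T)               # no prediction yet at t = T
    for t in reversed(range(num_steps)):
        x0_hat = denoise_fn(x_t, t, x0_prev)   # conditioned on last output
        x0_prev = x0_hat                       # carry prediction forward
        # toy update: move x_t a fraction of the way toward x0_hat
        x_t = x_t + (x0_hat - x_t) / (t + 1)
    return x_t

rng = np.random.default_rng(0)
x_T = rng.standard_normal((8, 3))              # toy noisy coordinates
toy_denoiser = lambda x_t, t, x0_prev: np.zeros_like(x_t)   # stub network
x_0 = sample_with_self_conditioning(toy_denoiser, x_T, num_steps=10)
```

The key point is the extra `x0_prev` argument: because the network always sees its own previous estimate, trajectories are stabilized over long denoising horizons.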
For RF data, the architecture consists of two-stage Hierarchical Diffusion Transformers—first a spatial denoising stage, independently applied to each time slice, followed by a time–frequency deblurring stage across the sequence. Attention-Based Diffusion Blocks (ADBs) with complex-valued multi-head self- and cross-attention and Adaptive LayerNorm are implemented throughout. Complex domain operators and phase modulation encoding (PME) ensure appropriate handling of sequence order and complex magnitudes/phases (Chi et al., 2024).
3. Training Regimes, Loss Functions, and Data
RFDiffusion models for protein design are trained on curated PDB backbones (plus small-molecule coordinates for all-atom extensions). Training is mediated by multi-scale noise schedules for local and global perturbations, typically annealed linearly or by a cosine schedule over the $T$ diffusion steps. The loss combines an MSE regression of predicted noise (or denoised coordinates) with a KL divergence between true and model reverse kernels. Fine-tuning of RoseTTAFold backbone weights is conducted for the diffusion objective, and all-atom models add further side-chain embedding and grafting modules (Yang et al., 2 Apr 2025).
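A single training step under the noise-prediction objective can be sketched as follows. The `model` callable is a placeholder for the fine-tuned RoseTTAFold/SE(3)-Transformer stack, and the simple unweighted MSE shown here is the reduced form of the variational objective for Gaussian kernels; schedule values are illustrative.

```python
import numpy as np

def diffusion_training_step(model, x0, betas, rng):
    """One training-step sketch for the noise-prediction objective:
    draw a random timestep, corrupt x0 in closed form, and regress
    the injected noise. `model(xt, t) -> eps_hat` is a placeholder
    for the denoising network."""
    T = len(betas)
    t = rng.integers(T)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = model(xt, t)
    loss = np.mean((eps_hat - eps) ** 2)       # simple MSE objective
    return loss

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 200)           # illustrative schedule
x0 = rng.standard_normal((64, 3))              # toy backbone coordinates
zero_model = lambda xt, t: np.zeros_like(xt)   # stub predictor
loss = diffusion_training_step(zero_model, x0, betas, rng)
```

In practice the gradient of `loss` with respect to the model parameters drives the update; the stub predictor here exists only to make the sketch runnable.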
RF-Diffusion for RF signals utilizes standardized sequence length normalization, power calibration, and context conditioning. The loss is a squared error between predicted and analytically derived means for the reverse process across time steps, with all parameters, including transformer weights and complex-valued kernels, trained by AdamW under exponential moving average stabilization (Chi et al., 2024).
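The exponential-moving-average stabilization mentioned above is a generic technique and admits a one-line sketch; the decay value below is an illustrative assumption, not the paper's setting.

```python
def ema_update(ema_params, params, decay=0.999):
    """Exponential-moving-average weight stabilization: the evaluation
    copy of each parameter drifts slowly toward the live training
    value, smoothing out per-step optimizer noise."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

ema = [1.0, 2.0]                   # smoothed (evaluation) parameters
cur = [2.0, 0.0]                   # live (training) parameters
new = ema_update(ema, cur, decay=0.9)
```

The EMA copy, not the raw training weights, is typically what is used at inference time.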
4. Variational and Interpretability Extensions: FoldSAE
FoldSAE represents a post-hoc interpretability and controllability extension to the RFDiffusion family. It introduces a sparse autoencoder (SAE) module attached to the residual stream of a critical block (main_04) in the denoising network, without modifying existing weights or introducing extra layers in the core diffusion model. The SAE is constructed with a single linear encoder, TopK sparsification (no explicit L1 penalty), and a linear decoder. This enables the extraction of mono-semantic latent features corresponding directly to secondary structure content (e.g., α-helix vs β-sheet).
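The TopK sparse autoencoder described above has a compact generic form. The sketch below shows the linear-encoder / TopK / linear-decoder structure; the dimensions, initialization scale, and `k` are illustrative assumptions, not FoldSAE's actual hyperparameters.

```python
import numpy as np

class TopKSAE:
    """Minimal TopK sparse autoencoder of the kind FoldSAE attaches
    to a residual stream: linear encoder, keep only the k largest
    activations, linear decoder. No explicit L1 penalty is needed
    because TopK enforces sparsity directly."""
    def __init__(self, d_model, d_latent, k, rng):
        self.W_enc = rng.standard_normal((d_model, d_latent)) * 0.02
        self.W_dec = rng.standard_normal((d_latent, d_model)) * 0.02
        self.b_enc = np.zeros(d_latent)
        self.k = k

    def encode(self, h):
        z = h @ self.W_enc + self.b_enc
        # TopK sparsification: zero all but the k largest entries
        drop = np.argsort(z)[:-self.k]
        z_sparse = z.copy()
        z_sparse[drop] = 0.0
        return z_sparse

    def decode(self, z):
        return z @ self.W_dec

    def __call__(self, h):
        return self.decode(self.encode(h))

rng = np.random.default_rng(0)
sae = TopKSAE(d_model=64, d_latent=256, k=8, rng=rng)
h = rng.standard_normal(64)        # toy residual-stream activation
z = sae.encode(h)                  # at most 8 nonzero latents
h_rec = sae(h)
```

Because at most `k` latents fire per input, individual latent coordinates tend to become mono-semantic, which is what makes them usable as secondary-structure dials.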
Steering is enabled at inference by modulating selected SAE latent coordinates with a hyperparameter λ, which up/down-weights features correlated with target structures. This plug-and-play mechanism allows direct, tunable control of folding outcomes, as evidenced by monotonically increasing strand or helix fractions with λ sweeps and preservation of biological plausibility as assessed by FBD and MMD metrics (Zarzecki et al., 27 Nov 2025).
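A common way to realize such steering, sketched below under assumed frozen SAE maps `encode`/`decode`, is to amplify the chosen latents by a factor of (1 + λ) and write only the change in the SAE reconstruction back into the residual stream; the exact intervention rule FoldSAE uses may differ.

```python
import numpy as np

def steer(h, encode, decode, feature_ids, lam):
    """Inference-time steering sketch: scale selected SAE latents by
    (1 + lambda), then add the change in reconstruction back to the
    residual stream, leaving the SAE-unexplained residual untouched.
    `encode`/`decode` are the frozen SAE maps."""
    z = encode(h)
    z_steered = z.copy()
    z_steered[feature_ids] *= (1.0 + lam)   # lam > 0 amplifies, lam < 0 suppresses
    return h - decode(z) + decode(z_steered)

# toy check with an identity "SAE" so the effect is exact
identity = lambda v: v.copy()
h = np.array([1.0, -2.0, 0.5])
h_steered = steer(h, identity, identity, feature_ids=[0], lam=1.0)
```

Sweeping `lam` over a range then yields the monotone shifts in helix or strand content described above.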
5. Empirical Performance, Benchmarks, and Case Studies
RFDiffusion has demonstrated high design hit-rates (tens of percent for binders with nanomolar-range affinities) without template selection, generation of novel backbones within 2 Å RMSD to target folds in over 70% of cases, and 2–3× lower coordinate MSE compared to RoseTTAFold inpainting/naive decoders. The all-atom variant achieves ligand pocket geometries with side-chain placements within 1.5 Å and sub-μM binding affinities.
Notable case studies include:
- Generation of de novo scaffolds for neutralizing snake-venom toxins, with in vivo efficacy;
- Design of α-helix mimetics for pan-viral fusion inhibition;
- Creation of ligand binding sites with atomic-level precision (Yang et al., 2 Apr 2025).
In RF synthesis, RF-Diffusion achieves state-of-the-art structural similarity index (SSIM) and Fréchet inception distance (FID) scores, outperforming DDPM, DCGAN, and CVAE baselines for Wi-Fi and FMCW radar signals. For downstream applications, such as data augmentation for gesture recognition and 5G channel estimation, RF-Diffusion provides substantial improvements in classifier accuracy and SNR over previous methods (Chi et al., 2024).
6. Limitations and Directions for Future Development
Documented limitations of RFDiffusion include decreased fidelity in modeling structures under highly dynamic solvent environments, insufficient sampling of intrinsically disordered or non-canonical protein regions, and reduced performance at rare small-molecule/interface chemistries due to training data sparsity. Addressing these gaps, proposed extensions comprise:
- Conditional diffusion guidance via environmental or multimodal conditioning (e.g., pH, cryo-EM density);
- Incorporation of coarse-grained molecular dynamics to better capture conformational ensembles;
- Generative adversarial training to expand negative sampling diversity and increase structural sharpness;
- Use of fragment libraries and additional retrievals for low-data or membrane-related targets (Yang et al., 2 Apr 2025).
FoldSAE highlights the potential of mechanistic interpretability in generative protein models and suggests future work on more granular controls, hybrid training objectives, and integration with additional post-hoc explainability modules (Zarzecki et al., 27 Nov 2025).
Table: Key RFDiffusion Family Models and Domains
| Model Variant | Domain | Key Features |
|---|---|---|
| RFDiffusion (backbone/AA) | Protein design | SE(3)-equivariant diffusion, RoseTTAFold backbone, all-atom/ligand pockets |
| FoldSAE | Protein design interpretability | Sparse autoencoder, block/neuron-level secondary structure control |
| RF-Diffusion | Radio-frequency signals | Time-Frequency diffusion, hierarchical transformer, complex-valued ops |
This summary contextualizes RFDiffusion as the central generative framework for programmable biomolecular and RF data generation, integrating geometric deep learning with modern diffusion and interpretability paradigms (Yang et al., 2 Apr 2025, Zarzecki et al., 27 Nov 2025, Chi et al., 2024).