
RFDiffusion: SE(3)-Equivariant Protein Design

Updated 22 January 2026
  • RFDiffusion is an SE(3)-equivariant denoising diffusion model that achieves de novo protein backbone generation with state-of-the-art performance.
  • It leverages score-based SDEs and integrates geometric neural networks to control translations and rotations in protein structures.
  • Empirical benchmarks demonstrate versatile applications in motif scaffolding, binder design, and symmetric assembly, while sparse-autoencoder interpretability enables feature-level control.

RFDiffusion is an SE(3)-equivariant denoising diffusion generative model for de novo protein backbone generation. It uses score-based stochastic differential equations (SDEs) to denoise rigid-body representations of protein structures, coupling translational and rotational noise on the geometric manifold of SE(3). Building on architectural foundations from structure prediction networks such as RoseTTAFold, RFDiffusion achieves state-of-the-art performance across diverse protein design tasks, including motif scaffolding, binder generation, and symmetric assembly. Its statistical and neural design enables conditional and unconstrained generation across a range of backbone lengths and structural topologies, and it provides a practical framework for incorporating interpretability and controllability into generative protein modeling (Qin et al., 23 Apr 2025, Yang et al., 2 Apr 2025, Zarzecki et al., 27 Nov 2025, Yu et al., 27 Jul 2025).

1. Mathematical Formulations and Generative Process

RFDiffusion parameterizes a protein of length $N$ as a set of rigid frames:

$$\{F_i = (R_i, x_i)\}_{i=1}^N$$

where $R_i \in SO(3)$ is the local backbone rotation and $x_i \in \mathbb{R}^3$ is the C$_\alpha$ position (Yang et al., 2 Apr 2025, Qin et al., 23 Apr 2025). The forward diffusion process applies independent isotropic Brownian motion in translation and rotation:

$$dx_i(t) = dB^{\mathbb{R}^3}_i(t), \qquad dR_i(t) = R_i(t) \circ dB^{SO(3)}_i(t)$$

In the DDPM discretization, this corresponds to:

$$q(x^t_i \mid x^{t-1}_i) = \mathcal{N}\big(x^t_i;\, \sqrt{1-\beta_t}\, x^{t-1}_i,\, \beta_t I\big), \qquad q(R^t_i \mid R^{t-1}_i) = \mathrm{WrappedNormal}\big(R^t_i;\, R^{t-1}_i,\, \sigma_t^2\big)$$

Masking can fix designated residues for motif scaffolding.
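The translational part of this forward discretization can be sketched in a few lines of NumPy (a minimal illustration; the wrapped-normal rotational noising on SO(3) is analogous but omitted, and the schedule values below are illustrative assumptions, not the paper's):

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One DDPM forward step on C-alpha coordinates:
    q(x^t | x^{t-1}) = N(sqrt(1 - beta_t) * x^{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_jump(x0, betas, t, rng):
    """Closed-form jump to step t via alpha_bar_t = prod_s (1 - beta_s)."""
    alpha_bar_t = np.prod(1.0 - betas[:t])
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((100, 3))       # toy 100-residue backbone (C-alpha only)
betas = np.linspace(1e-4, 0.02, 200)     # assumed linear noise schedule
x_T = forward_jump(x0, betas, len(betas), rng)
```

The closed-form jump is the standard DDPM identity; it avoids looping over single steps during training.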

The reverse process (generative denoising) employs an SE(3)-equivariant neural network to predict the score (denoised backbone or noise), updating coordinates with:

$$x^{(t-1)} = \frac{1}{\sqrt{\alpha_t}}\left(x^{(t)} - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x^{(t)}, t)\right) + \sigma_t \eta$$

where $\epsilon_\theta$ is the learned denoising prediction, and analogous updates apply to rotations using appropriate $SO(3)$ representations (Yu et al., 27 Jul 2025, Zarzecki et al., 27 Nov 2025).
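For translations, this ancestral-sampling update can be sketched as follows (the zero-filled `eps_pred` is a stand-in for the trained SE(3)-equivariant network, and the scalar schedule values are illustrative assumptions):

```python
import numpy as np

def reverse_step(x_t, eps_pred, alpha_t, alpha_bar_t, sigma_t, rng):
    """Ancestral sampling step:
    x^{t-1} = (x^t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
              + sigma_t * eta,  with eta ~ N(0, I)."""
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_t)
    eta = rng.standard_normal(x_t.shape)
    return mean + sigma_t * eta

rng = np.random.default_rng(1)
x_t = rng.standard_normal((100, 3))
eps_pred = np.zeros_like(x_t)     # dummy predictor in place of the network
x_prev = reverse_step(x_t, eps_pred, alpha_t=0.99, alpha_bar_t=0.5, sigma_t=0.1, rng=rng)
```

Rotational updates would replace the Gaussian step with a wrapped-normal step on SO(3).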

Losses combine score-matching for translation and rotation, atom coordinate reconstruction, 2D distance maps, and local geometric constraints (dihedral and planar angle MSEs), typically weighted progressively by stage:

$$\mathcal{L} = w_1 L_\text{dsm} + w_2 L_\text{bb} + w_3 L_\text{2D} + w_4 L_{\phi,\psi} + w_5 L_\text{planar}$$

(Yang et al., 2 Apr 2025, Qin et al., 23 Apr 2025).
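The stage-weighted objective is a plain weighted sum, which can be expressed as (a sketch; the term values and weights below are placeholders, not the published settings):

```python
def total_loss(terms, weights):
    """Stage-weighted objective L = sum_k w_k * L_k over named loss terms."""
    return sum(weights[k] * terms[k] for k in weights)

# Illustrative per-term values and stage weights (placeholders).
terms = {"dsm": 0.8, "bb": 0.4, "2d": 0.2, "dihedral": 0.1, "planar": 0.05}
weights = {"dsm": 1.0, "bb": 0.5, "2d": 0.25, "dihedral": 0.25, "planar": 0.1}
loss = total_loss(terms, weights)
```

In practice the weights change over training stages, shifting emphasis from score matching toward fine geometric terms.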

2. Network Architecture and Denoising Implementation

RFDiffusion’s architecture extends RoseTTAFold's three-track backbone and incorporates SE(3)-equivariant graph neural networks, notably the SE(3)-Transformer (Yang et al., 2 Apr 2025, Qin et al., 23 Apr 2025). Each residue carries geometric and contextual features, and the denoising head predicts per-residue translation and rotation corrections.

The main network passes noisy sequence (1D), pairwise (2D), and frame (3D) features through multiple stacked blocks (typically 36 or more). Each block executes triangle attention, invariant point attention, and equivariant message passing to maintain SE(3) symmetry. Self-conditioning incorporates previous denoised predictions at each iteration, stabilizing sample trajectories (Yu et al., 27 Jul 2025). Select blocks (such as "main_04") are shown to concentrate secondary structure signals and enable interpretable interventions on feature space (Zarzecki et al., 27 Nov 2025).
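Self-conditioning can be illustrated as feeding the previous step's denoised estimate back into the model at each iteration (a toy sketch with a dummy denoiser and a simplified deterministic update; the real network predicts rigid frames, not raw coordinates):

```python
import numpy as np

def sample_self_conditioned(denoise, x_T, T):
    """Sampling loop where x0_prev (the last denoised estimate) is an extra
    network input; here the update simply interpolates x_t toward the
    current x0 prediction with a step size that grows as t -> 0."""
    x_t = x_T
    x0_prev = np.zeros_like(x_T)
    for t in reversed(range(T)):
        x0_pred = denoise(x_t, t, x0_prev)
        w = 1.0 / (t + 1)
        x_t = (1.0 - w) * x_t + w * x0_pred
        x0_prev = x0_pred                 # self-conditioning feedback
    return x_t

# Dummy denoiser: shrinks coordinates, nudged by the self-condition input.
denoise = lambda x, t, sc: 0.5 * x + 0.1 * sc
rng = np.random.default_rng(2)
x_T = rng.standard_normal((10, 3))
x_final = sample_self_conditioned(denoise, x_T, T=50)
```

The feedback term is what stabilizes trajectories: each prediction sees what the model committed to on the previous step.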

3. Training Protocols and Data

The model is trained on the nonredundant Protein Data Bank (PDB), comprising >100,000 high-resolution single-chain structures without explicit length restriction (Qin et al., 23 Apr 2025). For unconditional backbone design, training sets span 19,703 PDB chains (length 60–512), while motif-scaffolding uses a length-restricted subset and mask randomization (Yu et al., 27 Jul 2025).

Optimization employs AdamW with linear warmup to LR ≈ 1e-4 and cosine decay over up to 2 million updates, batch size ≈64 per GPU. The loss combines denoising, geometric, and auxiliary terms with stage-dependent weighting. Some precise hyperparameters (public vs. proprietary checkpoints, optimizer details) are not consistently reported and require reference to official repositories or checkpoints (Yu et al., 27 Jul 2025, Qin et al., 23 Apr 2025).
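The stated schedule (linear warmup to a peak of about 1e-4, then cosine decay) can be sketched as below; the warmup length and floor are assumptions, since they are not consistently reported:

```python
import math

def lr_schedule(step, peak_lr=1e-4, warmup=10_000, total=2_000_000, min_lr=0.0):
    """Linear warmup to peak_lr over `warmup` steps, then cosine decay
    to min_lr over the remaining steps."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

This is the conventional warmup-plus-cosine shape used with AdamW; official checkpoints may differ in the exact constants.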

4. Empirical Performance and Benchmarks

RFDiffusion demonstrates leading designability and accuracy for unconditional backbone generation and competitive performance for motif-scaffolding:

| Task | scTM Score | RMSD (Å) | Comments |
|---|---|---|---|
| Uncond. backbone, L=100 | 0.97 ± 0.01 | 0.52 ± 0.10 | Highest among all methods |
| Uncond. backbone, L=500 | 0.90 ± 0.11 | 3.65 ± 2.95 | Graceful degradation |
| Motif-scaffold (Design24) | mid-0.9s | 0.3–0.5 | Ranks 3rd (FrameFlow best) |

Secondary structure, novelty, and fold diversity are all strong; however, compared to flow-matching models (e.g., FrameFlow), RFDiffusion can be less diverse in generated structures. For de novo protein-protein binders and symmetric assemblies, experimental hit rates and assembly yields consistently surpass those from traditional methods and prior deep generative approaches such as RFjoint and hallucination (Qin et al., 23 Apr 2025, Yang et al., 2 Apr 2025).

Benchmarks using unified training and evaluation frameworks confirm these findings; all methods, including RFDiffusion, show decreased accuracy with increasing backbone length, but RFDiffusion’s performance degrades more slowly than most score- and flow-based competitors (Yu et al., 27 Jul 2025).

5. Interpretability and Control via Sparse Representations

Interpretability of RFDiffusion’s internal representations has been advanced by application of sparse autoencoders (SAEs) to targeted network blocks, identifying features with strong correspondence to secondary structure elements (Zarzecki et al., 27 Nov 2025). Specifically:

  • Ablation studies localize secondary structure signals to the "main_04" block.
  • Top-K SAEs are trained on block activations, yielding high explained variance (99.1%) and a dictionary of monosemantic features.
  • Logistic-probe analyses distinguish helix- and strand-promoting features, exposing "antagonistic" indices for fine-grained control.

A steering mechanism modulates these features via a continuous hyperparameter $\lambda$, offering direct, tunable control over helix/strand content during the diffusion process. Quantitatively, sweeping $\lambda$ from $-5$ to $+5$ alters the helical fraction from ≈10% to ≈50%, with reciprocal changes in strand content and stable coil propensity. Sequence-and-structure post-hoc assessment (ProteinMPNN, ESM2 FID/MMD) validates that such manipulation preserves manifold plausibility and does not destabilize global model behavior.
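Assuming a linear encoder/decoder pair for the SAE (a hypothetical simplification; real Top-K SAEs also apply a sparsifying top-k nonlinearity), the λ-steering intervention can be sketched as:

```python
import numpy as np

def steer(h, W_enc, W_dec, helix_idx, strand_idx, lam):
    """Encode block activations into SAE feature space, shift the antagonistic
    helix/strand feature pair by +/- lambda, and decode back."""
    z = h @ W_enc                  # feature activations
    z[:, helix_idx] += lam         # promote helix-associated feature ...
    z[:, strand_idx] -= lam        # ... while suppressing its strand antagonist
    return z @ W_dec               # steered activations

rng = np.random.default_rng(3)
d, m = 8, 32                       # toy activation / dictionary sizes
W_enc = rng.standard_normal((d, m)) / np.sqrt(d)
W_dec = np.linalg.pinv(W_enc)      # toy decoder: pseudo-inverse of the encoder
h = rng.standard_normal((5, d))
h_steered = steer(h, W_enc, W_dec, helix_idx=0, strand_idx=1, lam=2.0)
```

With this choice of decoder, λ = 0 is an identity intervention, so steering strength degrades gracefully to unmodified sampling.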

6. Strengths, Usage Contexts, and Limitations

RFDiffusion’s strengths encompass:

  • SE(3)-equivariance, capturing both translation and rotation in protein geometry (Yang et al., 2 Apr 2025, Yu et al., 27 Jul 2025).
  • Self-conditioning for sample stability.
  • Versatility in conditional generation: motif scaffolding, interface constraints, secondary structure control.
  • High designability and experimental verification on diverse de novo design problems.
  • Integrability with downstream sequence decoders (e.g., ProteinMPNN) and post-processing tools.

Limitations include:

  • Two-stage workflow (structure, then sequence).
  • Absence of explicit modeling for dynamic ensembles, ligand binding, or non-canonical chemistries.
  • Implicit reliance on PDB training distributions; membrane proteins and non-natural residues remain underrepresented.
  • Proprietary optimizer and some hyperparameter choices in official releases inhibit full reproducibility (Yu et al., 27 Jul 2025, Yang et al., 2 Apr 2025).

7. Extensions, Applications, and Future Directions

Current applications include de novo binders (e.g., snake-venom neutralizers, cytokine receptors, pMHC complexes), symmetric nanomaterials, and functional site scaffolds (Yang et al., 2 Apr 2025, Qin et al., 23 Apr 2025). All-atom and ligand-binding extension pipelines are also reported. Future research will likely pursue:

  • End-to-end joint sequence-structure diffusion.
  • More dynamic architectural conditioning, including explicit small-molecule and solvent interactions.
  • Integration of physicochemical priors and wet-lab feedback (active learning, reinforcement learning).
  • Enhanced interpretability and programmability at the feature level, broadening user control over topological and functional outcomes (Zarzecki et al., 27 Nov 2025).

8. RFDiffusion in Other Domains

The term "RFDiffusion" has separately referred to an efficient algorithm for field integration on graphs encoding point clouds (Choromanski et al., 2023). In this context, RFDiffusion operates as a fast kernel operator on $\varepsilon$-neighborhood graphs via random feature factorization, primarily for mesh interpolation, Wasserstein distance acceleration, and field harmonics, distinct from its use in protein generative modeling.
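The random-feature idea in that setting can be illustrated with standard random Fourier features, which approximate a Gaussian kernel matrix $K$ so that a field integration $y = Kv$ costs $O(Nm)$ instead of $O(N^2)$ (a generic sketch of the technique, not the paper's exact construction):

```python
import numpy as np

def random_fourier_features(X, m, sigma, rng):
    """phi(x) such that phi(x) . phi(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    d = X.shape[1]
    W = rng.standard_normal((d, m)) / sigma    # frequencies w ~ N(0, I / sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, m)       # random phases
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))               # toy point cloud
v = rng.standard_normal(50)                    # field values on the points
Phi = random_fourier_features(X, m=4096, sigma=1.0, rng=rng)
y_fast = Phi @ (Phi.T @ v)                     # O(N m) approximation of K @ v
```

Factoring $K \approx \Phi\Phi^\top$ and associating the products right-to-left is what turns the quadratic matrix-vector product into two thin matrix-vector products.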


RFDiffusion combines architectures rooted in structure prediction, mathematically principled SE(3)-equivariant score-based diffusion, and recent advances in model interpretability and control, establishing itself as a benchmark for de novo protein backbone design and a platform for future methodological development (Qin et al., 23 Apr 2025, Yang et al., 2 Apr 2025, Zarzecki et al., 27 Nov 2025, Yu et al., 27 Jul 2025).
