
Factorized Diffusion Policies in Robotics

Updated 16 February 2026
  • Factorized Diffusion Policies (FDP) are methods that decompose diffusion-based robot policies into modular components to enhance sample efficiency and robustness.
  • FDP leverages observation modality prioritization, expert model factorization, and parameter space decomposition to tailor policy architectures for diverse tasks.
  • Empirical results show significant improvements in success rates, training efficiency, and resilience to sensor noise across various robotic manipulation benchmarks.

Factorized Diffusion Policies (FDP) comprise a class of techniques that decompose diffusion-based policies into modular components for improved sample efficiency, robustness, multitask generalization, and adaptability in robot skill learning. FDP frameworks enable either (a) observation modality prioritization—where the influence of distinct sensing modalities can be explicitly controlled—or (b) factorization of the action distribution into a product or mixture of expert diffusion models, each capturing different behavioral sub-modes or subtasks. This modularization can be achieved by architectural partitioning, score aggregation, or via parameter space decomposition. FDP models have demonstrated significant empirical benefits in both low-data regimes and settings characterized by distributional shift or catastrophic forgetting.

1. Fundamentals of Diffusion Policies in Robot Learning

Diffusion models are generative frameworks that define a fixed Gaussian noising process ("forward" process) and learn to reverse it via a parameterized denoising kernel ("reverse" process), typically predicting the added noise at each step. In robotic skill imitation, these methodologies have been leveraged to map multi-modal sensory observations (e.g., proprioception, vision, tactile) to target action trajectories. The canonical approach formulates the reverse kernel as

p(a_{t-1} \mid a_t, o^{1},\dots,o^{M}) \approx \mathcal{N}\left(a_t - w_t\,\varepsilon_\theta(a_t, o^{1},\dots,o^{M}, t),\; \sigma_t^2 \mathbf{I}\right),

where ε_θ is a neural network predicting the conditional noise. In standard diffusion policy implementations, all modalities are concatenated or otherwise fused prior to conditioning (“joint” conditioning). However, this approach yields suboptimal data efficiency when modalities are unequally informative, and it exposes the policy to robustness failures from spurious correlations or sensor-specific noise (Patil et al., 20 Sep 2025).
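Concretely, one step of this reverse kernel can be sketched in NumPy. This is an illustrative sketch only: `w_t` and `sigma_t` stand in for the scalar schedule coefficients, and `eps_pred` for the network output ε_θ; none of these names come from the cited papers.

```python
import numpy as np

def reverse_step(a_t, eps_pred, w_t, sigma_t, rng):
    """One reverse (denoising) step of the kernel above:
    mean = a_t - w_t * eps_pred, plus Gaussian noise of scale sigma_t."""
    mean = a_t - w_t * eps_pred
    return mean + sigma_t * rng.standard_normal(a_t.shape)

rng = np.random.default_rng(0)
a_t = rng.standard_normal(7)   # current noisy action (7-DoF, purely illustrative)
eps = 0.5 * a_t                # stand-in for eps_theta(a_t, o^{1:M}, t)
a_prev = reverse_step(a_t, eps, w_t=0.1, sigma_t=0.05, rng=rng)
```

Iterating this step from pure Gaussian noise down to t = 0 yields the sampled action trajectory.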

2. Mathematical Formulations and Model Factorizations

FDP methodologies operationalize modularity by separating the generative policy or its conditioning space in a principled manner:

2.1 Observation Modality Prioritization

Rather than conditioning the denoising kernel jointly on all modalities, FDP selects k < M prioritized modalities o^{1:k} and treats the remaining o^{k+1:M} as secondary. The policy score ∇_{a_t} log p(a_t | o^{1:M}) is factorized as

\nabla_{a_t}\log p(a_t \mid o^{1:M}) = \nabla_{a_t}\log p(a_t \mid o^{1:k}) + \nabla_{a_t}\log p(o^{k+1:M} \mid a_t, o^{1:k}),

corresponding to a base score model conditioned on the priority modalities and a residual model capturing the information gain from the secondary ones. The two models are trained sequentially using mean-squared error (MSE) losses on the conditional noise prediction, with the residual model correcting the frozen base (Patil et al., 20 Sep 2025).
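At sampling time, this factorization amounts to summing the two noise predictions before each denoising step. A minimal sketch, in which `eps_base` and `eps_res` are hypothetical callables standing in for the trained base and residual networks:

```python
import numpy as np

def factorized_eps(a_t, t, o_priority, o_all, eps_base, eps_res):
    """Aggregate noise prediction: the frozen base sees only the
    prioritized modalities o^{1:k}; the residual sees all modalities
    o^{1:M} and corrects the base."""
    return eps_base(a_t, o_priority, t) + eps_res(a_t, o_all, t)

# Dummy stand-ins for the two networks (illustrative only).
eps_base = lambda a, o, t: 0.9 * a
eps_res  = lambda a, o, t: 0.1 * a
a_t = np.ones(4)
eps_hat = factorized_eps(a_t, t=5, o_priority=None, o_all=None,
                         eps_base=eps_base, eps_res=eps_res)
```

Because the residual is trained against the frozen base, its output is a correction term rather than an independent prediction; zeroing it recovers the base policy.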

2.2 Modular Task and Behavioral Factorization

For multitask or highly multimodal action distributions, FDP can be instantiated as a composition of N specialized diffusion experts. Each expert p_i(a|o) models a distinct behavioral sub-mode. The overall action distribution is represented as a product of experts:

p(a \mid o) \propto \prod_{i=1}^{N} p_i(a \mid o)^{w_i(o)},

with a learned router predicting the convex weights w_i(o). At each diffusion step, the aggregate score is

\nabla_{a^k}\log p(a^k \mid o) \approx \sum_{i=1}^{N} w_i(o)\,\varepsilon_{\theta_i}(a^k, o, k),

enabling the policy to exploit regime-specific expert predictions (Liu et al., 26 Dec 2025).
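The weighted aggregation above can be sketched as follows. The `experts` and `router` callables are hypothetical stand-ins for trained denoising networks and a router MLP; only the convex-combination logic is from the source.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def aggregate_score(a_k, o, k, experts, router):
    """Product-of-experts aggregation: convex router weights w_i(o)
    times each expert's noise prediction at diffusion step k."""
    w = softmax(router(o))
    preds = [eps_i(a_k, o, k) for eps_i in experts]
    return sum(w_i * p for w_i, p in zip(w, preds))

# Two toy experts and a router that weights them equally.
experts = [lambda a, o, k: np.full_like(a, 1.0),
           lambda a, o, k: np.full_like(a, 3.0)]
router = lambda o: np.zeros(2)   # equal logits -> w = [0.5, 0.5]
score = aggregate_score(np.zeros(4), o=None, k=10, experts=experts, router=router)
```

Because the weights are a softmax output, the aggregate score is always a convex combination of expert predictions, which is what lets individual experts be swapped or fine-tuned without destabilizing the composite policy.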

2.3 Parameter Space Decomposition

An alternative form of factorization decomposes the model parameters themselves, e.g., via truncated Singular Value Decomposition (SVD) of network weights. In rank-r factorized diffusion policies, each layer weight is split into a low-rank (trainable) component and an orthogonal (frozen) component, modulating network expressivity and computational cost as training progresses (Sun et al., 6 Feb 2025).
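The basic split can be illustrated with NumPy's SVD. This is a sketch of the decomposition only; the paper's rank scheduling and training details are not reproduced here.

```python
import numpy as np

def split_rank_r(W, r):
    """Split a layer weight into a trainable rank-r factor pair (A, B)
    and a frozen remainder spanning the discarded singular directions,
    so that W = A @ B + W_frozen exactly."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]      # trainable left factor, shape (out, r)
    B = Vt[:r, :]             # trainable right factor, shape (r, in)
    W_frozen = W - A @ B      # frozen residual (remaining singular directions)
    return A, B, W_frozen

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
A, B, W_frozen = split_rank_r(W, r=8)
```

Only A and B receive gradients, so backpropagation cost scales with r rather than the full layer width, while the forward pass still uses the exact original weight A @ B + W_frozen.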

3. FDP Architectures and Training Procedures

The architectural instantiations of FDP depend on the targeted factorization:

  • Observation-prioritized FDP: The base network (e.g., UNet or DiT) is conditioned on prioritized modalities only; the residual network uses all modalities and injects corrections via FiLM-style or zero-initialized adapter connections. Training proceeds with the base model first, after which it is frozen and the residual is trained (Patil et al., 20 Sep 2025).
  • Expert/Modular FDP: Each expert is an independent denoising network (typically sharing lower-level encoders) and receives a weight from a separate router MLP per observation. All modules are trained end-to-end via the aggregated noise-prediction loss (Liu et al., 26 Dec 2025).
  • Parameter-factorized FDP: Each layer is decomposed via SVD; the number of trainable singular vectors is adjusted over epochs via a scheduling strategy. This implementation does not alter the computation graph for the forward pass, but reduces backpropagation cost (Sun et al., 6 Feb 2025).

Example FDP Training Loop (Observation Prioritization)

for epoch in base_training_epochs:
    sample (a_0, o^{1:k}), t, ε
    compute a_t = √ᾱ_t · a_0 + √(1 − ᾱ_t) · ε
    L_base = ||ε − ε_base(a_t, o^{1:k}, t)||^2
    update ε_base

freeze(ε_base)

for epoch in residual_training_epochs:
    sample (a_0, o^{1:M}), t, ε
    compute a_t = √ᾱ_t · a_0 + √(1 − ᾱ_t) · ε
    L_res = ||ε − ε_base(a_t, o^{1:k}, t) − ε_res(a_t, o^{1:M}, t)||^2
    update ε_res

4. Empirical Results and Evaluation

FDP methods have been evaluated on diverse robotic manipulation benchmarks, including RLBench (vision + proprioception), Adroit (hand, state + prop), Robomimic (env-state + prop), M3L insertion (vision + tactile), and real-world tasks (Close Drawer, Put Block in Bowl, etc.).

Summary of Key Empirical Findings

| Setting | Metric | FDP Result | Baseline | Gain |
|---|---|---|---|---|
| RLBench, 10 demos | Success rate | 44% (prop > vision) | 29% (joint DiT) | +15 pt |
| M3L insertion, 100 demos | Success rate | 48–50% (vision > tactile) | 22% | +26 pt |
| RLBench, distractor shift | Success rate under distribution shift | ≈70% (prop > vision) | ≈30% (joint DiT) | +40 pt |
| Real robot (occlusion, distractors) | Success rate (Fold Towel, Put In Bowl, …) | ~60% (prop > vision) | ~5–15% (joint DiT) | +40 pt |
| MetaWorld multitask (Liu et al., 26 Dec 2025) | Avg. success over 6 tasks | 74.8% (FDP) | 70.8% (DP), 69.8% (SDP) | +4–5 pt |
| Adaptation / fine-tuning (few demos) | Retention after adaptation | >90% with 27% of params | Full fine-tune baseline | Comparable |
| Parameter factorization (Sun et al., 6 Feb 2025) | Training time, simulated tasks | 7–10% reduction, no success-rate drop (CT: 4.03 h vs. 4.35 h) | — | 7–10% faster |
| Parameter factorization (Sun et al., 6 Feb 2025) | Training time, real tasks | Up to 18% online speedup | — | — |

Additional findings:

  • Robustness: Factorized observation policies drastically outperform joint models under vision corruptions (distractors, occlusion).
  • Low-data efficiency: Prioritizing informative modalities yields absolute gains up to 20 pt over baselines in sparse demonstration settings.
  • Multitask transfer and forgetting: Modular experts yield efficient adaptation to new tasks with no catastrophic loss on old skills, especially when combined with a small replay buffer.
  • Parameter factorization: Rank scheduling (e.g., sigmoid, τ = 0.5) enables faster batch times (up to ~20%) with minimal or no loss in performance across a range of simulated and real setups (Sun et al., 6 Feb 2025).
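A sigmoid rank schedule of the kind mentioned above can be sketched as follows. This is an illustrative parameterization only: the midpoint fraction τ and the steepness value are assumptions, and the published schedule (including whether rank ramps up or down over training) may differ.

```python
import math

def rank_schedule(epoch, total_epochs, r_max, tau=0.5, steepness=10.0):
    """Sigmoid rank schedule (illustrative): the number of trainable
    singular directions ramps from near 0 toward r_max, crossing
    r_max / 2 at epoch = tau * total_epochs."""
    x = steepness * (epoch / total_epochs - tau)
    return max(1, round(r_max / (1.0 + math.exp(-x))))
```

At each epoch the returned rank determines how many singular directions of each layer receive gradients, trading expressivity against per-batch backpropagation cost.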

5. Analysis, Implications, and Limitations

FDPs introduce a task/setting-dependent flexibility:

  • Sample efficiency is improved by enforcing a strong prior over primary modalities, reducing overfitting to redundant or noisy sensory features (Patil et al., 20 Sep 2025).
  • Robustness arises since the residual corrector cannot fully override the base predictions, limiting the failure modes induced by sensor noise or novel distractors.
  • Efficient multitask learning: Modular FDPs (expert composition) facilitate specialization, allowing individual modules to adapt or be replaced without disrupting baseline competencies, thereby addressing catastrophic forgetting (Liu et al., 26 Dec 2025).
  • Parameter management: Dynamic low-rank scheduling decreases computational demands, promoting practical training in resource-constrained or online-interactive (e.g., DAgger) settings (Sun et al., 6 Feb 2025).

Limitations cited across FDP literature include:

  • Hyperparameter burden: Modality ranking, number/ordering of experts, and rank schedule require tuning; automated strategies are an open area.
  • Static prioritization: Current observation-priority schemes are trajectory-invariant; dynamic or state-dependent prioritization may provide further gains.
  • Scope: Some approaches (notably rank-based) have been explored primarily in imitation learning, not fully on-policy RL.
  • Component specialization: While component-wise specialization is demonstrated empirically, systematic analysis of functional roles remains an open area.

6. Relation to Modular Learning and Generative Modeling

FDPs relate fundamentally to modular and compositional learning, mixture-of-experts (MoE) models, and conditional or product-of-experts score aggregation in generative modeling. FDP’s soft aggregation of per-expert scores mitigates MoE routing instabilities and encourages skill specialization (Liu et al., 26 Dec 2025). The residual correction paradigm inherits from classifier guidance and classifier-free guidance in diffusion models (see Dhariwal & Nichol 2021). Parameter factorization is conceptually aligned with adaptive pruning, low-rank adaptation, and efficient subspace optimization in large-scale models.

A plausible implication is that FDP-style decompositions can serve as a substrate for future advances in data-efficient, robust, and scalable policy learning in diverse real-world robotic domains (Patil et al., 20 Sep 2025, Liu et al., 26 Dec 2025, Sun et al., 6 Feb 2025).

7. Outlook and Future Directions

Prominent open directions for FDP include:

  • Automated prioritization and dynamic routing over observation modalities and experts, enabling context-sensitive adaptation.
  • Integration with Vision-Language-Action (VLA) models for safe finetuning and generalization to previously unseen input modalities (Patil et al., 20 Sep 2025).
  • Heterogeneous architectures for expert modules (e.g., combining UNet and transformer backbones).
  • Systematic removal or ablation of expert components to elucidate distributed skill encoding (Liu et al., 26 Dec 2025).
  • Application to lifelong and continual learning scenarios with ongoing task acquisition, seeking robust knowledge retention beyond replay buffer approaches.

Factorized Diffusion Policies thus offer a unified perspective on compositional generative policy design, delivering quantifiable gains in efficiency, robustness, and flexibility across robot learning challenges.
