
Curriculum Consistency Model (CCM)

Updated 21 January 2026
  • Curriculum Consistency Model (CCM) is an adaptive framework that employs a PSNR-based metric to standardize the learning challenge across timesteps in generative distillation.
  • It dynamically calibrates teacher iteration steps to stabilize the curriculum and reduce cumulative error, thereby enhancing convergence in both image synthesis and text-to-image tasks.
  • Empirical results show that CCM achieves superior single-step FID and compositional fidelity, accelerating inference and generalizing across various model architectures.

The Curriculum Consistency Model (CCM) is an adaptive framework for distilling generative models, specifically designed to optimize the sampling efficiency in diffusion and flow matching architectures. CCM employs a Peak Signal-to-Noise Ratio (PSNR)–based learning complexity metric and dynamically selects teacher iteration steps to ensure a uniform challenge across timesteps. This approach stabilizes the training curriculum, alleviates accumulated error in knowledge transfer, and demonstrably enhances convergence in both image synthesis and text-to-image tasks. The method generalizes to various model families, including Stable Diffusion XL and Stable Diffusion 3, yielding competitive single-step sampling performance and improved compositional and semantic fidelity (Liu et al., 2024).

1. Curriculum-Learning Complexity Metric via PSNR

CCM quantifies "curriculum difficulty" at each distillation step using the PSNR between student and teacher predictions. For a noisy input $x_t$, the student model $f_\theta$ produces a one-step estimate $x_{\mathrm{est}} = f_\theta(x_t, t, 1)$, while the teacher target $x_{\mathrm{target}}$ is derived from an ODE solver advancing from $t$ to $u > t$, formulated as $x_{\mathrm{target}} = f_{\theta^-}(\mathrm{Solver}(x_t, t, u; \phi), u, 1)$. The pixel-wise mean squared error (MSE) is:

\mathrm{MSE}(x_{\mathrm{est}}, x_{\mathrm{target}}) = \frac{1}{D} \| x_{\mathrm{est}} - x_{\mathrm{target}} \|_2^2,

where $D$ is the dimensionality of the image. PSNR, a decibel-scaled metric, is computed as:

\mathrm{PSNR}(t; u) = 10 \log_{10}\!\left( \frac{(2^n - 1)^2}{\mathrm{MSE}(x_{\mathrm{est}}, x_{\mathrm{target}})} \right),

for bit-depth $n$. The Knowledge Discrepancy of the Curriculum (KDC) is then defined as:

\mathrm{KDC}_t^u = 100 - \mathrm{PSNR}(t; u),

with higher KDC values signaling greater learning challenge.
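As a concrete illustration, the PSNR and KDC computations above can be sketched in NumPy. This is a minimal sketch of the metric only (function names and the 8-bit default are illustrative choices, not the paper's code):

```python
import numpy as np

def psnr(x_est: np.ndarray, x_target: np.ndarray, bit_depth: int = 8) -> float:
    """Decibel-scaled PSNR between student and teacher predictions."""
    mse = np.mean((x_est - x_target) ** 2)   # pixel-wise MSE over all D dimensions
    peak = (2 ** bit_depth - 1) ** 2         # (2^n - 1)^2 for bit-depth n
    return 10.0 * np.log10(peak / mse)

def kdc(x_est: np.ndarray, x_target: np.ndarray, bit_depth: int = 8) -> float:
    """Knowledge Discrepancy of the Curriculum: higher means a harder target."""
    return 100.0 - psnr(x_est, x_target, bit_depth)
```

For example, two constant 8-bit images differing by 10 gray levels give an MSE of 100 and a PSNR of about 28.1 dB, hence a KDC of about 71.9.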

2. Adaptive Curriculum Through Teacher Iterations

Empirical analyses reveal that KDC drops as $t \rightarrow 1$ (i.e., as noise intensity diminishes). Traditional Consistency Models (CM) employ a fixed step size $\ell = u - t$, but this yields trivial discrepancies at large $t$ (insufficient challenge) and highly divergent ones at small $t$ (overly difficult). CCM introduces an adaptive protocol, setting $u$ such that:

\mathrm{KDC}_t^u \approx T_{\mathrm{KDC}}, \quad \forall t \in [0, 1),

where $T_{\mathrm{KDC}}$ is a pre-set threshold (e.g., 60 dB). With base step size $s \ll 1$, the teacher iterates:

u \leftarrow \min(u + s, 1), \quad x_u \leftarrow \mathrm{Solver}(x_t, t, u; \phi),

until the KDC criterion is met. The number of steps $n = \lceil (u - t)/s \rceil$ increases where the system is less noisy, equalizing learning complexity across the trajectory.
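The adaptive search for $u$ can be sketched as a short loop. Here `solver_step` and `teacher_predict` are hypothetical callables standing in for the ODE solver and EMA teacher, and `kdc` is the Section 1 metric; this is an illustrative sketch, not the paper's implementation:

```python
def find_adaptive_u(x_t, t, x_est, solver_step, teacher_predict, kdc,
                    s=0.05, t_kdc=60.0):
    """Iterate the teacher until the KDC threshold is reached (or u hits 1).

    solver_step(x, t, u): advances x from time t to u along the ODE trajectory.
    teacher_predict(x, u): the EMA teacher's one-step estimate f_{theta^-}(x, u, 1).
    Both callables are hypothetical stand-ins for the real components.
    """
    u, x_curr = t, x_t
    while True:
        u = min(u + s, 1.0)
        x_curr = solver_step(x_curr, t, u)
        x_target = teacher_predict(x_curr, u)
        if kdc(x_est, x_target) >= t_kdc or u >= 1.0:
            return u, x_target
```

The `u >= 1.0` backstop guarantees termination even when the discrepancy never reaches the threshold.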

3. CCM Training Loop and Optimization

The CCM training procedure conducts adaptive teacher iteration, target generation, and loss computation:

Initialize student f_θ, teacher EMA θ⁻ ← θ
for training iteration = 1 to N do
    Sample data x1 ∼ p_data
    Sample t ∼ Uniform(0, 1)
    Generate noisy state x_t via forward ODE / diffusion
    # 1. Student one-step estimate
    x_est ← f_θ(x_t, t, 1)
    # 2. Find KDC-adjusted target by multi-step teacher iteration
    u ← t
    x_curr ← x_t
    repeat
        u ← min(u + s, 1)
        x_curr ← Solver(x_curr, t, u; φ)
        x_target_candidate ← f_{θ⁻}(x_curr, u, 1)
        KDC ← 100 − PSNR(x_est, x_target_candidate)
    until KDC ≥ T_KDC or u = 1
    x_target^KDC ← x_target_candidate
    # 3. Compute distillation loss
    L_distill ← d(x_est, x_target^KDC)
    # 4. (Optional) Adversarial loss
    L_GAN ← E[log D(x1)] + E[log(1 − D(x_est))]
    # 5. Backprop & update θ
    θ ← θ − η ∇_θ [L_distill + λ_GAN · L_GAN]
    # 6. Update EMA teacher (stop-gradient on θ⁻)
    θ⁻ ← μ θ⁻ + (1 − μ) θ
end for
The loss function $d(\cdot, \cdot)$ may be L2, L1, or LPIPS, and the solver is commonly an Euler ODE discretization.
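The training loop above can be exercised end to end on a toy problem. Everything below is a deliberately simplified stand-in (a scalar "image", a linear student $f_\theta(x) = \theta x$, an interpolating "solver", gradient clipping added for stability); it only illustrates the control flow of one CCM-style update, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, theta_ema = 0.5, 0.5        # student and EMA-teacher parameters
eta, mu, s, T_KDC = 1e-3, 0.95, 0.05, 60.0

def kdc(a, b, peak=255.0):
    """KDC = 100 - PSNR, here on scalars; epsilon avoids log(0)."""
    mse = (a - b) ** 2 + 1e-12
    return 100.0 - 10.0 * np.log10(peak ** 2 / mse)

for step in range(100):
    x1 = rng.normal(loc=10.0)                  # "data" sample
    t = rng.uniform(0.0, 1.0)
    x_t = t * x1 + (1.0 - t) * rng.normal()    # noisy state
    x_est = theta * x_t                        # student one-step estimate
    # adaptive teacher iteration until the KDC threshold (or u = 1)
    u, x_curr = t, x_t
    while True:
        u = min(u + s, 1.0)
        x_curr = u * x1 + (1.0 - u) * x_curr   # toy solver step toward the data
        x_target = theta_ema * x_curr          # EMA-teacher prediction
        if kdc(x_est, x_target) >= T_KDC or u >= 1.0:
            break
    # gradient of the L2 distillation loss w.r.t. theta, clipped for stability
    grad = np.clip(2.0 * (x_est - x_target) * x_t, -1.0, 1.0)
    theta -= eta * grad
    theta_ema = mu * theta_ema + (1.0 - mu) * theta   # EMA update
```

The inner `while` loop mirrors steps 2 of the pseudocode; the clipped scalar gradient replaces backpropagation, which a real implementation would delegate to an autodiff framework.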

4. Consistency-Distillation Loss Formulation

Standard $N$-step Consistency Models use the loss:

\mathcal{L}_{CD^N} = \mathbb{E}_{n}\left[ \lambda(\sigma_n) \, d\big(f_\theta(x_{\sigma_{n+1}}, \sigma_{n+1}, \epsilon),\, f_{\theta^-}(\hat{x}_{\phi, \sigma_n}, \sigma_n, \epsilon)\big) \right]

CCM replaces the static $(\sigma_n, \sigma_{n+1})$ pair with a dynamically selected, KDC-thresholded $(t, u)$. The CCM loss is:

\mathcal{L}_{CCM}(\theta; \phi) = \mathbb{E}_{t \sim U[0,1)}\, \mathbb{E}_{u \,|\, \mathrm{KDC}_t^u \approx T_{\mathrm{KDC}}}\, \mathbb{E}_{x_1,\, x_t \mid x_1} \left[ d\big(f_\theta(x_t, t, 1),\, x_{\mathrm{target}}^{\mathrm{KDC}}(u, 1)\big) \right]

This loss enforces a consistent discrepancy at every training point, redistributing semantic and low-level focus appropriately.

5. Empirical Performance and Generalization

CCM achieves notable improvements in both unconditional and conditional synthesis. Key empirical metrics include:

  • Single-step FID on CIFAR-10: 1.64 (previous CM best ≈ 1.98)
  • Single-step FID on ImageNet 64×64 (conditional): 2.18 (vs. 1.92 for CTM on a diffusion backbone; CCM was evaluated mainly on a flow-matching base)
  • Text-to-image (T2I) results: For SD3 (28→4 steps), original CLIP/FID: 28.09/99.61 → CCM: 32.42/32.54; for SDXL (40→4 steps), original CLIP/FID: 30.41/70.28 → CCM: 32.60/28.90
  • Compositionality enhancements across T2I-CompBench metrics
  • Over 70% user preference for CCM in direct sample comparison
  • Inference speed: Single-step CCM matches quality achieved by 50–100 step OT-CFM, enabling 50–100× acceleration

Table 1. Selected empirical metrics for CCM

| Task | Baseline | CCM Outcome |
|---|---|---|
| CIFAR-10, FID (NFE=1) | ≈1.98 (prior CM best) | 1.64 |
| ImageNet 64×64, FID (NFE=1) | 1.92 (CTM) | 2.18 |
| SD3 T2I, CLIP/FID | 28.09/99.61 | 32.42/32.54 |
| SDXL T2I, CLIP/FID | 30.41/70.28 | 32.60/28.90 |

Compositionality, semantic alignment, and robustness to text and object relationships are consistently improved, narrowing the gap to full-step models.

6. Theoretical Rationale and Implications

By maintaining KDC at a constant threshold, CCM ensures that the perceived learning challenge is uniform, avoiding the extremes of overly easy or overly difficult curriculum steps. In contrast, traditional CM approaches, in which the distillation step $\ell$ often shrinks over training, diminish knowledge gaps and attenuate learning of pertinent semantic features. CCM expands $\ell$ at high $t$ (low noise), increasing the strength and reducing the frequency of knowledge transfer steps. This mechanism minimizes the cumulative error prevalent in conventional multi-step distillation (the "curse of consistency"). Matching a fixed PSNR discrepancy also yields robust convergence in both fine detail (near $t = 1$) and essential semantics/structure (near $t = 0$). The approach adapts the teacher's forecast horizon, stabilizing the student's progression and directly connecting curriculum learning theory with the knowledge transfer protocol in generative modeling.

A plausible implication is that CCM reframes consistency distillation via curriculum theory, offering a unified method for balancing semantic and low-level detail preservation in compressed sampling. This suggests avenues for future research in adaptive curriculum metrics and their interaction with complex generative tasks.

7. Significance and Outlook

CCM integrates a PSNR-governed curriculum-consistency metric with adaptive teacher iteration, establishing state-of-the-art single-step sampling in image generation. Extensions to high-resolution text-to-image synthesis, flow matching architectures, and challenging compositionality tasks demonstrate generalization and efficiency gains. The model’s empirical results and theoretical foundation suggest broader applicability in domains requiring balanced knowledge transfer and scalable sampling. The curriculum principle embedded in CCM may inspire subsequent work in dynamic, metric-driven distillation frameworks for generative modeling and related tasks (Liu et al., 2024).
