FrozenRep Baseline Overview
- The paper demonstrates that freezing pretrained features leads to efficient adaptation and stability across diverse tasks.
- FrozenRep methods update only lightweight task-specific heads, significantly reducing overfitting and computational costs.
- Empirical comparisons across GANs, action localization, continual learning, and time series forecasting confirm robust performance with limited trainable parameters.
A "FrozenRep" (frozen representation) baseline refers broadly to methods that freeze a major part of a neural network's parameters—typically pretrained feature extractors or lower backbone layers—while only training a limited set of downstream, classifier, or head parameters. This design leverages strong, pretrained, and stable representations for efficient adaptation and transfer, while controlling for overfitting, catastrophic forgetting, or excessive training cost. The concept has been independently instantiated across several subdomains: transfer learning for generative models, spatiotemporal action localization, rehearsal-free continual category discovery, and efficient time series modeling. Rigorous experimental comparisons have shown that such baselines often outperform complex alternatives under data constraints or continual learning settings.
1. Core Principles of Frozen Representation Baselines
The fundamental principle in all FrozenRep approaches is to exploit the inductive biases and generalization capabilities of a pretrained feature extractor and constrain subsequent representation drift by freezing its parameters during adaptation or further learning. Only the final task-specific layers—be it classifier heads, adversarial discriminators, or transformer blocks—are updated, substantially reducing the number of trainable parameters, the risk of overfitting on small or noisy datasets, and the computational overhead associated with large-scale optimization.
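Concretely, the freezing principle reduces to excluding backbone parameters from the optimizer's update set. A minimal sketch (hypothetical parameter names, plain NumPy standing in for an autograd framework):

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """Apply SGD only to parameters whose names are not in `frozen`."""
    return {name: (p if name in frozen else p - lr * grads[name])
            for name, p in params.items()}

# Toy model: a "backbone" weight that stays frozen and a "head" weight that trains
params = {"backbone.w": np.ones(3), "head.w": np.ones(3)}
grads = {"backbone.w": np.full(3, 2.0), "head.w": np.full(3, 2.0)}
new = sgd_step(params, grads, frozen={"backbone.w"})
# backbone.w is untouched; head.w takes a normal gradient step
```

In autograd frameworks the same effect is typically achieved by disabling gradient tracking on the frozen parameters rather than by filtering inside the update, but the optimization-level picture is identical.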
This approach is well-motivated in scenarios such as low-resource transfer learning, continual learning without data replay, and parameter-efficient long-range modeling, where retaining powerful representations while limiting model adaptation is beneficial (Mo et al., 2020, Girdhar et al., 2018, Zhang et al., 12 Mar 2025, Singh et al., 25 Aug 2025).
2. Methodological Instantiations Across Domains
2.1 FreezeD for GAN Transfer Learning
FreezeD ("Freeze the Discriminator") is a baseline for fine-tuning generative adversarial networks (GANs) on small target datasets. Starting with a pretrained generator G and discriminator D, the method freezes the first k (e.g., 4) lower layers of D and tunes only the remaining upper layers and the generator. Losses (standard or hinge) are computed as usual, but gradients with respect to the frozen layers are blocked; only the classifier-like upper layers of D and all layers of G remain trainable. Optimizer settings and learning rates are left as in the upstream paradigm (e.g., Adam for 50K/20K steps for StyleGAN/SNGAN-projection). This procedure prevents forgetting of generic low-level image features, mitigates overfitting, and stabilizes adversarial training dynamics, especially under data scarcity (Mo et al., 2020).
2.2 FrozenRep for Spatiotemporal Action Localization
In action detection, the "FrozenRep" baseline denotes a Faster R-CNN head trained atop a completely frozen I3D (Inflated 3D ConvNet) backbone, pretrained on large-scale video datasets such as Kinetics. All layers up to a fixed block (Mixed_4f) are frozen, providing robust spatiotemporal features. Region proposals are extracted from the frozen features, RoI-pooled across frames, and passed through a small trainable head (subsequent I3D blocks plus classification/regression heads). This construction enables precise action localization while substantially outperforming fine-tuning approaches and non-spatiotemporal backbones, improving mean AP from 11–15% to roughly 22% on AVA v2.1 (Girdhar et al., 2018).
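Because the backbone never changes, its features can be computed once, cached, and reused for the whole training run; only the lightweight head is optimized on top of them. A toy sketch of this pattern (a random frozen projection stands in for the frozen I3D trunk; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.standard_normal((8, 4))      # stands in for the frozen backbone

def extract(x):
    # Frozen feature extractor: its parameters never receive gradient updates
    return np.tanh(x @ W_frozen)

X = rng.standard_normal((32, 8))
y = (X[:, 0] > 0).astype(float)
F = extract(X)                              # computed once and cached

w = np.zeros(4)                             # lightweight trainable head
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(F @ w)))      # logistic head on frozen features
    w -= 0.5 * F.T @ (p - y) / len(y)       # gradient step on the head only
```

Caching frozen features this way is one source of the training-cost savings the paper reports: the expensive backbone forward pass is paid once per example, not once per epoch.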
2.3 Freeze and Cluster for Rehearsal-Free Continual Category Discovery
For continual category discovery, the "Freeze and Cluster" (FAC) baseline freezes a self-supervised or pretrained backbone (e.g., DINO ViT-B/16) after adapting it only once on supervised base classes. Subsequent sessions are entirely unlabeled; new classes are discovered by extracting frozen features, estimating the novel class count via over-clustering and cluster merging (with thresholds set by base-class statistics), and assigning pseudo-labels with k-means. Only the classifier head is further updated—no representation adaptation occurs—and generative replay maintains performance on previous classes by modeling old clusters as Gaussians and sampling from them. Ablation shows that freezing the feature extractor is critical, as continued adaptation on noisy clusters degrades performance (Zhang et al., 12 Mar 2025).
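The generative-replay component can be sketched as follows (illustrative NumPy code, not the paper's implementation): each old cluster is modeled as a diagonal Gaussian over frozen features and sampled from in later sessions, so no raw exemplars need to be stored.

```python
import numpy as np

def gaussian_replay(features, labels, n_per_class, seed=0):
    """Model each old cluster as a diagonal Gaussian over frozen features
    and sample synthetic replay features plus their class labels."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c in np.unique(labels):
        fc = features[labels == c]
        mu, sd = fc.mean(axis=0), fc.std(axis=0) + 1e-6
        xs.append(rng.normal(mu, sd, size=(n_per_class, fc.shape[1])))
        ys.append(np.full(n_per_class, c))
    return np.vstack(xs), np.concatenate(ys)

feats = np.random.default_rng(1).standard_normal((50, 16))
labs = np.repeat([0, 1], 25)
rx, ry = gaussian_replay(feats, labs, n_per_class=10)
```

Because the backbone is frozen, the feature distribution of old classes is stationary, which is precisely what makes this Gaussian approximation viable across sessions.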
2.4 Frozen Blocks in Reservoir-Augmented Time Series Transformers
In time series forecasting, FreezeTST alternates frozen (randomly initialized, never updated) and trainable Transformer encoder blocks, harnessing the so-called reservoir mechanism. These frozen blocks, akin to high-dimensional echo-state networks, act as contractive, nonlinear memory stores. Trainable blocks subsequently attend over these fixed features for downstream predictions. This approach halves the number of trainable parameters and accelerates training by 20–30% while maintaining top-level forecasting accuracy on ETTh/ETTm, Weather, and ILI datasets, verifying the effectiveness of frozen features for long-range memory extraction and parameter efficiency (Singh et al., 25 Aug 2025).
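A toy sketch of the reservoir idea (illustrative only, not FreezeTST's architecture): a randomly initialized, never-updated block applies a contractive nonlinear map, and a trainable block reads out from the fixed features it produces.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

def frozen_block(h, W):
    # Reservoir-style block: random weights, tanh contraction, never updated
    return np.tanh(h @ W)

W_frozen = rng.standard_normal((d, d)) * 0.1   # small scale keeps the map contractive
W_train = rng.standard_normal((d, d)) * 0.1    # in the real model, these are learned

h = rng.standard_normal((4, d))                # batch of token embeddings
h = frozen_block(h, W_frozen)                  # fixed nonlinear memory store
h = np.tanh(h @ W_train)                       # trainable block reads the fixed features
```

Alternating such blocks is what halves the trainable parameter count: every other block's weights are simply excluded from the optimizer.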
3. Algorithmic and Implementation Details
3.1 Representative Pseudocode Patterns
Common steps in FrozenRep baselines include:
- Initial backbone pretraining on large datasets.
- Optional supervised adaptation on an initial labeled session.
- Feature extraction using frozen parameters for subsequent adaptation.
- Downstream adaptation confined to lightweight heads:
- In GANs: only the generator and top layers of the discriminator updated.
- In action localization: only Faster R-CNN head trained, frozen I3D used as backbone.
- In continual learning: only linear classifier updated, backbone fixed.
- In time series: alternating frozen/trainable blocks, only some blocks updated.
Example for FreezeD (Mo et al., 2020):
```python
G, D = G0, D0
freeze_first_k_layers(D, k)
for step in range(T):
    x = sample_real_images(batch=B)
    z = sample_latent_vectors(batch=B)
    # Discriminator step
    D_real, D_fake = D(x), D(G(z))
    loss_D = compute_gan_loss(D_real, D_fake)
    update_unfrozen_layers(D, loss_D)
    # Generator step
    D_fake = D(G(z))
    loss_G = compute_gan_loss(D_fake)
    update_all(G, loss_G)
```
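One way `freeze_first_k_layers` could be realized, sketched here with a toy layer class rather than a real autograd framework (in practice one would disable gradient tracking on the frozen layers' parameters):

```python
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True   # analogous to requires_grad in autograd frameworks

def freeze_first_k_layers(layers, k):
    """Mark the first k layers so the optimizer skips their parameters."""
    for layer in layers[:k]:
        layer.trainable = False

# Hypothetical 7-block discriminator; freeze the 4 lowest blocks as in FreezeD
D = [Layer(f"disc_block_{i}") for i in range(7)]
freeze_first_k_layers(D, 4)
trainable = [layer.name for layer in D if layer.trainable]
```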
3.2 Hyperparameter Regimes
Critical hyperparameters across studies are few:
- Number of frozen layers (k): empirically tuned, often 1–4 for deep backbones.
- Learning rates, batch sizes, and optimizer settings as in original training are typically retained.
- For clustering: over-cluster factor and merging thresholds obtained from base-class analysis.
- For frozen blocks: random initialization (e.g., Xavier), optional spectral-norm scaling (Lipschitz constraints), alternation schedule for hybrid blocks.
Effectiveness is robust to most choices so long as the majority of informative representations remain frozen, with a small downstream head maintained for task adaptation.
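The over-cluster-and-merge estimator mentioned above can be sketched as follows (illustrative greedy merging; in the paper the threshold is derived from base-class statistics rather than fixed by hand):

```python
import numpy as np

def estimate_class_count(centroids, threshold):
    """Start from deliberately too many k-means centroids, then greedily
    merge pairs closer than `threshold`; the surviving count estimates
    the number of novel classes."""
    cents = [np.asarray(c, dtype=float) for c in centroids]
    merged = True
    while merged:
        merged = False
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                if np.linalg.norm(cents[i] - cents[j]) < threshold:
                    cents[i] = (cents[i] + cents[j]) / 2.0
                    del cents[j]
                    merged = True
                    break
            if merged:
                break
    return len(cents)

# Two nearby centroids merge into one class; the distant one stays separate
cents = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
k_est = estimate_class_count(cents, threshold=1.0)
```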
4. Empirical Results and Performance Benchmarks
Across diverse domains, FrozenRep baselines exhibit strong empirical performance:
| Task/Method | FrozenRep Variant | Test Metric | Performance |
|---|---|---|---|
| GAN transfer (StyleGAN) | FreezeD | FID (Lower=Better) | Cat: 69.6, Dog: 61.4 |
| Action localization | FrozenRep (I3D) | mAP (%) | 21.9 (val), 21.91 (test) |
| Continual discovery | FAC | Last/Old/New Acc. | CUB: 66.2/81.2/59.6 |
| Time series forecasting | FreezeTST | MSE (ETTh2, H=96) | 0.274 |
Relative to non-freezing and more complex competitors, freezing representations consistently yields lower overfitting, greater stability, and top-tier or second-best accuracy, despite lower parameter and computational cost (Mo et al., 2020, Girdhar et al., 2018, Zhang et al., 12 Mar 2025, Singh et al., 25 Aug 2025).
5. Mechanistic Insights, Advantages, and Limitations
Mechanisms and Advantages
Key reasons for FrozenRep effectiveness:
- Lower and mid-level deep features are highly transferable across modestly shifted domains.
- Freezing prevents catastrophic forgetting and overfitting in the face of limited or noisy target data.
- Fewer trainable parameters reduce optimization stochasticity and encourage stable dynamics.
- In continual learning, freezing prohibits representation drift due to pseudo-label noise.
- In sequence modeling, random frozen blocks provide rich nonlinear memory—with trainable blocks acting as selectors/query mechanisms.
Limitations
- If domain shift is substantial, an over-constrained frozen backbone may limit adaptation.
- In continual discovery, the method assumes each session contains only novel classes; mixed known/unknown labeling complicates clustering.
- For rare categories (e.g., low-frequency actions in AVA), the frozen backbone may lack sufficient discrimination.
- In practice, cluster-count estimators may mildly overestimate the number of novel classes (Zhang et al., 12 Mar 2025).
- For full parameter efficiency, careful selection of which blocks/layers to freeze is required—freezing too many may harm adaptation while too few may fail to regularize sufficiently.
6. Ablations, Sensitivity Analyses, and Future Directions
Ablation Insights
- GAN transfer: Freezing 3–4 discriminator layers yields optimal FID; freezing the entire discriminator or only the head degrades performance (Mo et al., 2020).
- Action localization: Kinetics pretraining combined with freezing raises val-mAP by about 2 points; augmentations and class-agnostic regression are necessary to exceed 21 mAP (Girdhar et al., 2018).
- Continual category discovery: FAC ablation shows supervised adaptation, generative replay, and logit normalization all contribute substantially, with freezing being critical to arresting performance degradation (Zhang et al., 12 Mar 2025).
- Time series: Even with all Transformer layers frozen except the head, performance tracks a fully trained Transformer to within 1% MSE (Singh et al., 25 Aug 2025).
Future Research Directions
- Extending frozen representation strategies to settings with partial or ambiguous class overlap.
- Incorporating self-supervised relabeling or active querying to support adaptation under limited supervision.
- Compositional freezing: dynamically determining layers to freeze based on data statistics or validation loss.
- Extensions to other modalities (audio, multimodal fusion), especially as pretrained backbones become more general-purpose.
The "FrozenRep" family of baselines, spanning GAN fine-tuning, temporal detection, continual learning, and transformer-based forecast models, demonstrates that extensive pretraining followed by selective freezing and lightweight adaptation remains an unusually strong, efficient, and robust paradigm in contemporary deep learning (Mo et al., 2020, Girdhar et al., 2018, Zhang et al., 12 Mar 2025, Singh et al., 25 Aug 2025).