FedOLF: Federated Learning with Ordered Layer Freezing
- The paper introduces a novel federated learning paradigm using ordered layer freezing that reduces resource consumption and improves convergence.
- It employs both static and adaptive layer freezing schedules, ensuring efficient local updates and rigorous theoretical convergence guarantees.
- Empirical evaluations show up to 82% memory reduction and significant accuracy gains across diverse architectures like CNNs, ResNets, and Transformers.
Federated Learning with Ordered Layer Freezing (FedOLF) is an advanced paradigm in federated learning that strategically freezes model layers in a predetermined order, providing efficient distributed training under resource constraints, heterogeneous data, and limited device capacities. In FedOLF, layers are systematically frozen—typically from the shallowest upwards—so that each client only trains (and communicates updates for) the top-most unfrozen layers of the global model. This methodology ensures substantial reductions in computational, communication, and memory overheads, and is supported by formal convergence guarantees and empirical improvements in model accuracy, generalization, and resource usage. FedOLF subsumes and generalizes prior innovations in gradual unfreezing, progressive block freezing, and layer-wise scheduling, and has demonstrated broad efficacy across CNNs, ResNets, Transformers, and hybrid quantum–classical models.
1. Formal Problem Statement and FedOLF Mechanism
FedOLF operates within the conventional FL objective:
$$\min_{w}\; F(w) = \sum_{k=1}^{K} p_k F_k(w), \qquad p_k \ge 0,\;\; \sum_{k} p_k = 1,$$
where $w = (w^{(1)}, \dots, w^{(L)})$ denotes a neural network partitioned into $L$ ordered layers, and $F_k$ is the empirical risk over client $k$'s local data (Niu et al., 29 Dec 2025). In each global round $t$, clients receive the global model but are assigned a freezing index $\ell_k$ according to device capability. Layers $1$ to $\ell_k$ are frozen; only the top layers $\ell_k + 1$ to $L$ are updated locally.
The client-side update step, aggregating only unfrozen layers, is:
$$w_k^{(j)} \leftarrow w_k^{(j)} - \eta\,\nabla_{w^{(j)}} F_k(w_k), \qquad j > \ell_k,$$
where $w_k^{(j)} = w^{(j)}$ for $j \le \ell_k$ is the frozen lower part, and $\nabla_{w^{(j)}} F_k$ is the active-layer gradient (Niu et al., 29 Dec 2025). The server aggregates updates for each layer via weighted averaging, integrating unchanged values for frozen layers.
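The per-round mechanics above can be sketched in a few lines (a hypothetical minimal implementation, not the authors' code; layer parameters are plain NumPy arrays and `grad_fn` stands in for local backpropagation):

```python
import numpy as np

def client_update(global_layers, freeze_idx, grad_fn, lr=0.1, steps=5):
    """Run local SGD on layers freeze_idx..L-1 only; layers 0..freeze_idx-1
    stay frozen at their received global values."""
    layers = [w.copy() for w in global_layers]
    for _ in range(steps):
        grads = grad_fn(layers)                  # one gradient array per layer
        for j in range(freeze_idx, len(layers)):
            layers[j] = layers[j] - lr * grads[j]  # update unfrozen layers only
    return layers

def server_aggregate(client_layers, weights):
    """Per-layer weighted average. Clients that froze a layer return it
    unchanged, so the average reproduces the global value for those layers."""
    num_layers = len(client_layers[0])
    return [sum(w * cl[j] for w, cl in zip(weights, client_layers))
            for j in range(num_layers)]
```

In a real system the communication saving comes from clients transmitting only the unfrozen suffix; the full-vector averaging here is just the simplest way to show the aggregation rule.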
In block-wise extensions—such as SmartFreeze (Yebo et al., 2024)—the global model is split into blocks of consecutive layers, trained and frozen in stagewise fashion, with memory and compute cost dropping proportionally with increased freezing.
2. Layer Freezing Schedules and Adaptive Unfreezing
Ordered layer freezing can be implemented with static schedules—where layers are consistently frozen based on device profiles—or adaptive schedules that track layer convergence and heterogeneity metrics. In FedBug (Kao et al., 2023), local training is divided into a "Gradual Unfreezing (GU)" phase, in which one additional layer is thawed every $\tau$ iterations, and a "vanilla" phase of full-model training. The schedule function $s(t)$ determines the thawed prefix $\{1, \dots, s(t)\}$ at iteration $t$.
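The bottom-up schedule reduces to a one-line function (a sketch; the one-layer-per-$\tau$-iterations rule follows the description above, while starting with a single thawed layer is an assumption):

```python
def thawed_prefix(t, tau, num_layers):
    """Number of layers unfrozen from the bottom at local iteration t.
    One extra layer thaws every tau iterations; once all layers are thawed,
    training continues in the full-model ("vanilla") phase."""
    return min(num_layers, 1 + t // tau)
```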
More sophisticated approaches, such as the dynamic layer-wise adaptive freezing in AdeptHEQ-FL (Jahin et al., 9 Jul 2025), rank layers by their relative parameter change
$$\Delta_\ell^{(t)} = \frac{\bigl\|w_\ell^{(t)} - w_\ell^{(t-1)}\bigr\|}{\bigl\|w_\ell^{(t-1)}\bigr\|},$$
and freeze layers whose exponentially averaged change $\bar{\Delta}_\ell^{(t)}$ falls below a threshold $\epsilon$.
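A minimal sketch of this criterion, assuming hypothetical names (`beta` for the exponential-averaging factor, `eps` for the freezing threshold) rather than the paper's exact hyperparameters:

```python
import numpy as np

def update_freeze_mask(prev_layers, new_layers, ema, frozen, beta=0.9, eps=1e-3):
    """Compute each unfrozen layer's relative parameter change, fold it into
    an exponential moving average, and freeze layers whose averaged change
    drops below eps. Returns the updated EMA list and freeze mask."""
    for j, (w_old, w_new) in enumerate(zip(prev_layers, new_layers)):
        if frozen[j]:
            continue  # already frozen: no further tracking needed
        delta = np.linalg.norm(w_new - w_old) / (np.linalg.norm(w_old) + 1e-12)
        ema[j] = beta * ema[j] + (1 - beta) * delta
        if ema[j] < eps:
            frozen[j] = True
    return ema, frozen
```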
Progressive frameworks also use metric-driven schedules, e.g., block perturbation metrics for per-block freezing as in SmartFreeze (Yebo et al., 2024), or mask-based scheduling for sequential layer expansion (Jang et al., 2024). These adapt the freezing process in real-time, facilitating both rapid initial training of shallow layers and preservation of deeper model adaptability.
3. Memory, Computation, and Communication Efficiency
FedOLF sharply reduces resource consumption by restricting training and backward computation to a limited set of active layers, which minimizes peak activation memory and compute requirements. For all variants, frozen layers are used only in the forward pass, with no gradient updates or backward storage allocation. SmartFreeze achieves up to 82% reduction in memory footprint and 2.02× training speedup on hardware-constrained testbeds, with participant rates reaching 100% versus 0–40% in canonical baselines (Yebo et al., 2024).
Federated Layer-wise Learning (FLL) (Guo et al., 2023) reports that training memory usage falls to 7–22% of full-model FedAvg, with similar drops in FLOPs and communication volume. Integrating Depth Dropout (randomly skipping frozen layers in forward/backward passes) further halves memory and compute with minimal accuracy degradation.
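Depth Dropout itself is simple to sketch (hypothetical names; the skip probability `p_skip` is an assumption, and skipping whole layers presumes a residual-style architecture where each layer's input and output shapes match):

```python
import random

def forward_with_depth_dropout(x, layers, frozen, rng, p_skip=0.5):
    """Forward pass in which each frozen layer is randomly skipped with
    probability p_skip, cutting average compute; trainable layers always run.
    Layers here are simple callables."""
    for layer, is_frozen in zip(layers, frozen):
        if is_frozen and rng.random() < p_skip:
            continue  # skip this frozen layer's forward computation
        x = layer(x)
    return x
```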
In heterogeneity-aware designs, client selection is governed by feasibility constraints—memory budget, system compute rate, and statistical diversity—optimized per stage via utility scoring and RL-CD-based community detection (Yebo et al., 2024). Communication overhead is further controlled by approaches such as Tensor Operation Approximation (TOA), which sparsifies frozen layers for transmission, preserving accuracy more robustly than classical quantization (Niu et al., 29 Dec 2025).
4. Theoretical Analysis and Convergence Guarantees
FedOLF achieves provable convergence under standard FL assumptions. In (Niu et al., 29 Dec 2025), under $L$-smoothness, bounded gradient variance, and bounded error induced by freezing:
- For a learning rate $\eta = \mathcal{O}(1/\sqrt{T})$, FedOLF attains an $\mathcal{O}(1/\sqrt{T})$ bound on the expected squared gradient norm, up to the freezing-error term, over $T$ rounds.
- The core bounding technique adapts the descent lemma to partitioned parameter space, with extra error terms accounting for frozen gradients and client sampling.
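The bounding step can be sketched as follows (a hedged reconstruction under the stated assumptions, not the paper's exact statement). Writing $\nabla_U$ for the gradient restricted to the unfrozen layers and $g_U^t$ for its stochastic estimate, the descent lemma applied to the partial update gives

```latex
\mathbb{E}\, F(w^{t+1})
  \;\le\; F(w^t)
  \;-\; \eta\,\bigl\|\nabla_U F(w^t)\bigr\|^2
  \;+\; \frac{L\eta^2}{2}\,\mathbb{E}\bigl\|g_U^t\bigr\|^2 .
```

The residual $\|\nabla F(w^t)\|^2 - \|\nabla_U F(w^t)\|^2$, i.e. the squared gradient norm over the frozen layers, is exactly the term absorbed into the bounded freezing-error assumption.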
FedBug establishes that bottom-up unfreezing yields a strictly smaller contraction ratio than vanilla FedAvg in toy linear models, accelerating alignment and mitigating client drift (Kao et al., 2023). Sequential expansion methods (Jang et al., 2024) formalize the equivalence between unfreezing schedules and layer freezing via mask-based parameter updates; convergence rates match FedAvg in the unfrozen subspace, with empirical reductions in variance as freezing progress increases.
No substantial loss in model generality or convergence speed is observed when deploying ordered freezing or dynamic freezing schedules, even in non-IID federated environments with statistical and system heterogeneity (Jahin et al., 9 Jul 2025).
5. Empirical Evaluation and Comparative Results
FedOLF variants are consistently superior to dropout-based and random-freezing baselines under non-IID data splits, as well as in computational and memory-constrained environments.
| Method | Memory Reduction | Accuracy | Speedup |
|---|---|---|---|
| SmartFreeze | up to 82% | 83.1% (absolute) | 2.02× |
| FLL + Depth Dropout | 78–92% | ≤1% loss | — |
| FedOLF (TOA) | 50–70% | +0.3–6.4% | — |
On EMNIST, CIFAR-10, CIFAR-100, and CINIC-10, FedOLF achieves 0.3%–6.4% absolute accuracy improvements versus best prior works, along with robust memory and energy savings (Niu et al., 29 Dec 2025). Block-wise freezing in SmartFreeze enables inclusion of all clients, regardless of device capacity, and maintains or improves model utility (Yebo et al., 2024). Layer-sparing in AdeptHEQ-FL realizes 40–60% communication reduction without impacting accuracy, even in hybrid classical–quantum settings (Jahin et al., 9 Jul 2025).
Experiments with sequential expansion scheduling show significant accuracy gains under heavy class and data heterogeneity (up to +6.8% and +8.1% over FedBABU), with computation cost falling by as much as 64% versus FedAvg (Jang et al., 2024).
Depth Dropout has negligible impact on downstream accuracy (0.2%), while further reducing computation and communication (Guo et al., 2023).
6. Applications, Extensions, and Future Directions
FedOLF is compatible with robust aggregation schemes, heterogeneity-aware personalization, and advanced resource-management strategies (Kao et al., 2023, Yebo et al., 2024). Integrations with quantization (TOA), privacy-preserving aggregation (HE, DP), and dynamic block sizing extend its utility to IoT, edge, and quantum federated systems (Niu et al., 29 Dec 2025, Jahin et al., 9 Jul 2025).
It is applicable to diverse architectures: CNNs, ResNets (block-wise granular freezing), Vision Transformers (encoder-group freezing), and hybrid CNN–PQCs. Meta-learning of block freezing criteria and continuous optimization of schedule parameters may further enhance adaptive efficiency.
A plausible implication is that ordered layer freezing constitutes a convergent design space for federated systems balancing model expressivity, communication, and compute constraints under practical heterogeneity. The use of freezing schedules as design variables enables trade-off exploration between model alignment, personalization, and resource allocation.
7. Relation to FL Theory and Open Issues
FedOLF resolves the client drift problem by systematically limiting local adaptation to higher layers while keeping low-level features globally aligned, yielding strong improvements in heterogeneous and non-IID settings (Kao et al., 2023). It generalizes prior approaches such as gradual unfreezing, progressive training, block-wise sparsification, and mask-based scheduling.
Remaining challenges include optimal schedule learning under nonlinear deep architectures, compounded client–system heterogeneity, and exact trade-off characterization between freezing granularity, gradient staleness, and personalization. The extension of FedOLF schemes to privacy-sensitive and adversarial environments (integration with secure aggregation, differential privacy noise models) represents a promising direction, as indicated in block-wise community selection extensions (Yebo et al., 2024, Jahin et al., 9 Jul 2025).
FedOLF is a foundational algorithmic family for energy-, memory-, and communication-efficient federated learning, addressing contemporary constraints in edge-based, collaborative, and quantum-classical domains.