EfficientNet Architecture Family
- EfficientNet is a family of CNN architectures designed for high classification accuracy and parameter efficiency through systematic scaling and neural architecture search.
- EfficientNet-V1 introduced compound scaling with a unified baseline, while EfficientNet-V2 refined this approach with training-aware search and fused block designs.
- Both generations achieve state-of-the-art ImageNet accuracy and excellent transfer learning performance while reducing computational cost and model size.
EfficientNet is a family of convolutional neural network (CNN) architectures optimized for classification accuracy, parameter efficiency, and training/inference speed, based on systematic network scaling and neural architecture search (NAS). Originating from work by Tan and Le at Google Research, EfficientNet has two primary generations: EfficientNetV1, which introduced compound scaling and a new baseline discovered via NAS, and EfficientNetV2, which advanced the methodology with training-aware architecture search, new fused block designs, and progressive training methods. Both generations have become standard references for high-performance, resource-efficient ConvNet design in computer vision applications (Tan et al., 2019, Tan et al., 2021).
1. Neural Architecture Search and EfficientNet-B0 Baseline
The foundation of the EfficientNet family is the EfficientNet-B0 architecture, obtained through neural architecture search on a variant of the MnasNet search space. The search space includes:
- Operator choices: mobile inverted bottleneck convolutions (MBConv), with varying kernel sizes (3×3, 5×5), expansion ratios (1 or 6), optional squeeze-and-excitation (SE) modules, and residual skips.
- Stage-wise variation: each of the seven MBConv stages (between the initial 3×3 conv stem and the final head) can select its own operator configuration and number of block repeats.
The NAS process optimizes a multi-objective reward, ACC(m) × (FLOPS(m)/T)^w, where T = 400M is the FLOP target and w = −0.07, enforcing a trade-off between accuracy and computational cost. The resulting architecture (EfficientNet-B0) forms the template for subsequent scaling (Tan et al., 2019).
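The multi-objective reward can be sketched in a few lines of Python; the target T = 400M FLOPs and exponent w = −0.07 follow the values reported for the B0 search, and the function names are illustrative:

```python
def nas_reward(accuracy: float, flops: float,
               target_flops: float = 400e6, w: float = -0.07) -> float:
    """MnasNet-style multi-objective reward: ACC(m) * (FLOPS(m) / T)^w.

    T = 400M FLOPs and w = -0.07 follow the EfficientNet-B0 search
    (Tan et al., 2019). With w < 0, models over the FLOP budget are
    penalized and models under it are mildly rewarded.
    """
    return accuracy * (flops / target_flops) ** w

# A candidate at 2x the budget scores below its raw accuracy;
# one at half the budget scores slightly above it.
r_over = nas_reward(0.77, 800e6)
r_under = nas_reward(0.77, 200e6)
```

Because the penalty is a smooth power law rather than a hard constraint, the search can still keep a slightly over-budget model if its accuracy gain outweighs the penalty.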
The structure of EfficientNet-B0 is shown below:
| Stage | Operator | Resolution | Channels | Repeats |
|---|---|---|---|---|
| 1 | Conv3×3 | 224×224 | 32 | 1 |
| 2 | MBConv1 k3 | 112×112 | 16 | 1 |
| 3 | MBConv6 k3 | 112×112 | 24 | 2 |
| 4 | MBConv6 k5 | 56×56 | 40 | 2 |
| 5 | MBConv6 k3 | 28×28 | 80 | 3 |
| 6 | MBConv6 k5 | 14×14 | 112 | 3 |
| 7 | MBConv6 k5 | 14×14 | 192 | 4 |
| 8 | MBConv6 k3 | 7×7 | 320 | 1 |
| 9 | Conv1×1 → Pool → FC | 7×7 | 1280 | 1 |
Key operator details:
- Each MBConv block: a 1×1 expansion conv (BatchNorm + SiLU), a k×k depthwise conv (BatchNorm + SiLU), a 1×1 projection conv (BatchNorm, no activation), an SE module between the depthwise and projection steps, and a residual skip when input/output shapes match.
- All activations use SiLU/Swish: SiLU(x) = x · σ(x) = x / (1 + e^(−x)).
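The activation is simple enough to state directly; a minimal scalar implementation:

```python
import math

def silu(x: float) -> float:
    """SiLU/Swish activation: silu(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))

# SiLU is smooth and non-monotonic near zero, and approaches the
# identity for large positive inputs (sigmoid(x) -> 1).
```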
2. Compound Model Scaling
EfficientNet introduces "compound scaling," a principled method to simultaneously scale the three architectural dimensions, network depth d, width w, and input resolution r, controlled by a single user-specified coefficient φ:

depth d = α^φ, width w = β^φ, resolution r = γ^φ, with α, β, γ ≥ 1.

The base coefficients are chosen such that α · β² · γ² ≈ 2 (approximately doubling FLOPs for each unit increase in φ), reflecting the cost structure of conv layers: FLOPs grow linearly with depth but quadratically with width and resolution. For EfficientNet, a small grid search yields α = 1.2, β = 1.1, γ = 1.15. Integral rounding is applied to block counts and resolution.
This compound method outperforms traditional strategies that scale only a single axis, avoiding diminishing returns caused by excessive depth, width, or resolution alone. Empirically, uniform compound scaling delivers substantial accuracy improvements for fixed computation budgets (Tan et al., 2019).
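The scaling rule reduces to a few lines of arithmetic; a sketch using the grid-searched base coefficients from the paper:

```python
# Grid-searched base coefficients for depth, width, resolution
# (Tan et al., 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# alpha * beta^2 * gamma^2 ~= 2, so each +1 in phi roughly doubles FLOPs:
# conv FLOPs grow linearly in depth and quadratically in width/resolution.
flops_factor_per_step = ALPHA * BETA ** 2 * GAMMA ** 2  # ~1.92
```

In practice the resulting depth multiplier is rounded to integer block counts and the resolution to a convenient input size, which is why the B1–B7 models in the table below deviate slightly from the exact powers.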
3. EfficientNet-V1 and V2 Families: Model Variants and Scaling Progression
The original EfficientNet-V1 family comprises B0 to B7, with each successive model scaled according to the compound coefficient φ:
| Model | φ | Input | Params | FLOPs | Top-1 Acc |
|---|---|---|---|---|---|
| B0 | 0 | 224×224 | 5.3M | 0.39B | 77.1% |
| B1 | 1 | 240×240 | 7.8M | 0.70B | 79.1% |
| B2 | 2 | 260×260 | 9.2M | 1.0B | 80.1% |
| B3 | 3 | 300×300 | 12M | 1.8B | 81.6% |
| B4 | 4 | 380×380 | 19M | 4.2B | 82.9% |
| B5 | 5 | 456×456 | 30M | 9.9B | 83.6% |
| B6 | 6 | 528×528 | 43M | 19B | 84.0% |
| B7 | 7 | 600×600 | 66M | 37B | 84.3% |
Regularization (dropout, stochastic depth, AutoAugment) increases with model scale. At ImageNet scale, B7 achieves 84.3% top-1 accuracy, matching far larger models (e.g., the 557M-parameter GPipe-trained AmoebaNet) with approximately an 8.4× reduction in size and a 6.1× increase in inference speed (Tan et al., 2019).
EfficientNetV2 refines model scaling by capping the maximum image resolution at 480 and heuristically adding more blocks to later stages. This reduces over-parameterization and memory cost at high input resolutions while improving parameter efficiency by adding layers where they are most useful (Tan et al., 2021).
4. Architectural Innovations: Fused-MBConv and Operator Search
EfficientNetV2 introduces the Fused-MBConv module. In standard MBConv, a 1×1 expansion convolution precedes a k×k depthwise convolution; Fused-MBConv merges these into a single regular k×k convolution, followed by the optional SE module and 1×1 projection. With input channels C_in, expansion ratio e (expanded width C_mid = e·C_in), kernel size k, and output channels C_out, the approximate weight counts are k²·C_in·C_mid + C_mid·C_out for Fused-MBConv, versus C_in·C_mid (expand) + k²·C_mid (depthwise) + C_mid·C_out (project) for MBConv.
Fused-MBConv reduces memory-access overhead in early stages, where spatial dimensions are large and channel counts are small. EfficientNetV2's architecture search operates in a space containing both MBConv and Fused-MBConv blocks, expansion ratios {1, 4, 6}, kernel sizes {3×3, 5×5}, and stage-wise combinations of these. This approach systematically selects block types and placements for training and inference efficiency on modern accelerators (Tan et al., 2021).
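The parameter trade-off between the two block types can be checked with a short calculator. This is a rough sketch that ignores SE modules, BatchNorm, and biases; the counting follows the per-operator breakdown above:

```python
def mbconv_params(c_in: int, c_out: int, k: int = 3, e: int = 4) -> int:
    """Approximate weight count for MBConv:
    1x1 expand + kxk depthwise + 1x1 project (SE/BatchNorm ignored)."""
    c_mid = e * c_in
    return c_in * c_mid + k * k * c_mid + c_mid * c_out

def fused_mbconv_params(c_in: int, c_out: int, k: int = 3, e: int = 4) -> int:
    """Approximate weight count for Fused-MBConv:
    one kxk regular conv (expand + spatial mixing) + 1x1 project."""
    c_mid = e * c_in
    return k * k * c_in * c_mid + c_mid * c_out
```

The fused block always carries more parameters (the k²·C_in·C_mid term dominates as channels grow), but it replaces a memory-bound depthwise convolution with a dense convolution that runs efficiently on accelerators, which is why the V2 search places Fused-MBConv only in the early, narrow stages.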
5. Progressive Learning and Adaptive Regularization
EfficientNetV2 introduces progressive training, in which the input image size increases over training stages from a small initial size S_0 to the final target size S_e. Training at smaller sizes early accelerates convergence and reduces computation, while larger images in later stages deliver high final accuracy. To counteract the accuracy drop typical of progressive resizing, EfficientNetV2 linearly interpolates regularization strength (dropout rate, RandAugment magnitude, mixup α) in lockstep with image size: at stage i of M, the image size is S_0 + (i/(M−1)) · (S_e − S_0), and each regularization value is interpolated the same way between its initial and final strengths.
This "adaptive regularization" significantly improves accuracy during progressive learning, mitigating capacity overfitting in later stages and stabilizing training dynamics (Tan et al., 2021).
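A minimal sketch of such a schedule, using linear interpolation as described above; the size and regularization ranges here are illustrative placeholders, not the paper's exact values:

```python
def progressive_schedule(n_stages: int,
                         size_range=(128, 300),
                         reg_range=(0.1, 0.3)):
    """Linearly interpolate image size and one regularization strength
    (e.g. dropout rate) across training stages, in the spirit of
    EfficientNetV2's progressive learning."""
    (s0, s1), (r0, r1) = size_range, reg_range
    sched = []
    for i in range(n_stages):
        t = i / (n_stages - 1) if n_stages > 1 else 1.0
        sched.append((round(s0 + t * (s1 - s0)), r0 + t * (r1 - r0)))
    return sched

# Small images with weak regularization early; large images with
# strong regularization late.
stages = progressive_schedule(4)
```

Each tuple would parameterize one training stage; in a real pipeline the same interpolation would drive the data-loader resize and every regularizer's strength together.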
6. Performance, Transfer, and Empirical Findings
EfficientNet architectures show strong performance on both standard and transfer learning tasks:
- On ImageNet, B7 (V1) reaches 84.3% top-1 accuracy at 66M parameters and 37B FLOPs. EfficientNetV2-L achieves 85.7% at 120M parameters and 53B FLOPs.
- EfficientNetV2-M matches B7's accuracy (84.7%) with much faster training (13h vs 139h on TPUv3) and a 2–3× inference speedup at similar accuracy.
- Transfer learning results are consistently state-of-the-art: on CIFAR-100, EfficientNetV1-B7 and V2-L achieve 91.7% and 92.3%, respectively; Flowers-102 achieves 98.8% with V1-B7, matched by V2-L (Tan et al., 2019, Tan et al., 2021).
- With ImageNet21k pretraining, EfficientNetV2-L achieves 86.8% top-1, outperforming ViT-L/16 (85.3%) while requiring fewer parameters (120M vs 304M) and less compute (53B vs 192B FLOPs).
7. Impact and Directions
EfficientNet's compound scaling and NAS framework have set new standards for scaling ConvNet architectures and resource-efficient model design for both academia and production. The Fused-MBConv and adaptive progressive training of EfficientNetV2 demonstrate the impact of coupling architecture search with training dynamics and hardware-aware optimization.
A plausible implication is that the EfficientNet family’s principles—systematic multidimensional scaling, block-level heterogeneity, and training-aware search—represent an enduring blueprint for the design of high-accuracy, efficient backbones in computer vision, including beyond classification to detection and segmentation (Tan et al., 2019, Tan et al., 2021).