r100 Series Models for Face Recognition

Updated 30 January 2026
  • r100 Series Models are deep CNN backbones with a 100-layer ResNet design that excel in both unmasked and masked face recognition.
  • They employ bottleneck residual blocks and margin-based loss training to achieve over 99% unmasked and 90% masked verification accuracy at 0.01% FAR.
  • Designed for high-throughput scenarios like civil aviation security, they balance substantial computational demands with superior recognition performance.

The r100 series designates a family of deep convolutional neural network (CNN) backbones specifically tailored for high-accuracy face recognition, particularly in operationally challenging environments such as civil aviation security where masked face prevalence is high. The r100 models are canonical 100-layer residual networks derived from the ResNet architecture, incorporating bottleneck residual blocks and margin-based classification heads. Both standard and masked-specific trained variants exist, with the latter adapted through data augmentation strategies to maximize robustness to face occlusions due to masks. The r100 series is characterized by high verification and retrieval performance at stringent false acceptance rates (FAR), and serves as a leading backbone for large-scale face recognition systems (Zhang et al., 23 Jan 2026).

1. Network Architecture: r100 Series and Masked Variants

The r100 backbone is defined as a 100-layer residual network—a deeper analogue to ResNet-50—utilizing only standard bottleneck residual blocks. The canonical building block applies:

$$\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}, \qquad \mathcal{F}(\mathbf{x}) = W_3\,\sigma(W_2\,\sigma(W_1\,\mathbf{x}))$$

where each $W_j$ is a convolution followed by batch normalization, and $\sigma$ is a ReLU nonlinearity. The global network follows the prototypical ResNet pattern: an initial 7×7 convolution, batch normalization, ReLU, and max pooling (the "stem"), followed by four sequential stages of bottleneck blocks, ending with global average pooling, a fully connected layer, and softmax output. The precise allocation of blocks among the four stages $(N_1, \dots, N_4)$ and the full parameter count are not provided; the model is generically referenced as a "100-layer" ResNet (Zhang et al., 23 Jan 2026).
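The bottleneck residual computation above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's implementation: convolutions are modeled as per-pixel channel mixing (1×1 matrix multiplies), batch normalization is omitted, and the 256→64→256 channel widths are the standard ResNet bottleneck convention, which the source does not confirm for r100.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bottleneck_block(x, W1, W2, W3):
    """Identity-shortcut bottleneck: y = W3*sigma(W2*sigma(W1*x)) + x.
    Convolutions are reduced to channel-mixing matmuls for brevity;
    batch normalization is omitted."""
    f = W3 @ relu(W2 @ relu(W1 @ x))
    return f + x  # residual (identity shortcut) addition

# Toy dimensions: 256-d features reduced to 64-d inside the block
# (the usual 4x bottleneck reduction), then expanded back to 256-d.
rng = np.random.default_rng(0)
C, mid = 256, 64
x = rng.standard_normal(C)
W1 = rng.standard_normal((mid, C)) * 0.01   # 1x1 reduce
W2 = rng.standard_normal((mid, mid)) * 0.01 # 3x3, modeled as channel mix
W3 = rng.standard_normal((C, mid)) * 0.01   # 1x1 expand
y = bottleneck_block(x, W1, W2, W3)
print(y.shape)  # (256,)
```

Because the shortcut is the identity, input and output dimensionality must match, which is exactly what the 1×1 expand convolution restores.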

The r100_mask_v2 variant preserves identical architecture but is distinguished by its training data: 15% of examples in the WebFace42M source dataset are masked faces (synthetic or real). No change to layer types, residual block structure, or channel widths is reported.
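Since the only reported difference for r100_mask_v2 is the 15% masked-data fraction, the data-mixing step can be sketched as a simple probabilistic sampler. The pool contents and sampling scheme here are placeholders (the source does not describe the actual WebFace42M loading pipeline); only the 15% ratio comes from the source.

```python
import random

def sample_training_face(unmasked_pool, masked_pool, masked_ratio=0.15, rng=random):
    """Draw one training example: with probability masked_ratio (15% for
    r100_mask_v2, per the source) take a masked face, else an unmasked one.
    Pools are placeholders for real dataset records."""
    pool = masked_pool if rng.random() < masked_ratio else unmasked_pool
    return rng.choice(pool)

# Sanity check: the empirical masked fraction converges to ~0.15.
rng = random.Random(0)
draws = [sample_training_face(["unmasked"], ["masked"], rng=rng)
         for _ in range(10_000)]
masked_frac = draws.count("masked") / len(draws)
print(round(masked_frac, 3))
```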

2. Training Protocols and Hyperparameterization

Training of both r100 and r100_mask_v2 is performed on WebFace42M with 100,000 “live-ID” samples added, employing a classification-style margin loss parametrized as

$$\text{margin\_list} = (m_1, m_2, m_3) = (1.0,\ 0.0,\ 0.4)$$

With $m_1 = 1.0$ and $m_2 = 0.0$, this configuration reduces to an additive cosine (CosFace-style) margin of $m_3 = 0.4$; ArcFace-style heads instead use a nonzero additive angular margin $m_2$. The training routines differ as follows:

| Model | Init. Learning Rate | Epochs | LR Decay Type | Masked Data Ratio |
|---|---|---|---|---|
| r100 (v1) | 0.30 | not reported | linear decay | 0% |
| r100_mask_v2 | 0.20 | 30 | step decay | 15% |

No explicit information is available regarding batch size, optimizer, weight decay, or detailed data loading protocol. The r100_mask_v2 version (as opposed to earlier v1 or alternative v3) is recommended as the primary masked-face model.
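The combined margin formulation above can be written down directly. This sketch follows the common InsightFace-style parametrization $\cos(m_1\theta + m_2) - m_3$ applied to the target-class logit; that this is the exact head used here is an assumption, though the $(1.0, 0.0, 0.4)$ values from the source make it collapse to a plain CosFace margin.

```python
import numpy as np

def combined_margin_logit(cos_theta, m1=1.0, m2=0.0, m3=0.4):
    """Target-class logit under a combined margin: cos(m1*theta + m2) - m3.
    With (m1, m2, m3) = (1.0, 0.0, 0.4), as reported for the r100 series,
    this reduces to the CosFace additive-cosine margin: cos(theta) - 0.4."""
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return np.cos(m1 * theta + m2) - m3

# CosFace case: the margin simply shifts the cosine similarity down by 0.4,
# forcing the target class to win by a fixed cosine gap.
print(round(float(combined_margin_logit(0.8)), 2))  # 0.4
```

During training this margined logit replaces the raw $\cos\theta$ for the ground-truth class only, before scaling and softmax, which tightens the decision boundary around each identity.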

3. Quantitative Evaluation: Verification and Retrieval Performance

The series demonstrates strong performance at standard evaluation points: verification at 0.01% FAR on 100k-pair test sets, and face search (top-n accuracy) across galleries up to 100k distractors.
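The "verification accuracy at 0.01% FAR" protocol can be made concrete: pick the similarity threshold at which only 0.01% of impostor pairs are accepted, then measure how many genuine pairs clear it. The sketch below uses synthetic similarity scores (the distributions are illustrative assumptions, not the paper's data).

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-4):
    """Verification accuracy at fixed FAR: choose the threshold so that only
    `far` of impostor-pair similarities exceed it, then report the fraction
    of genuine pairs accepted at that threshold (TAR)."""
    thr = float(np.quantile(impostor, 1.0 - far))
    tar = float(np.mean(np.asarray(genuine) >= thr))
    return thr, tar

# Synthetic cosine similarities for 100k genuine and 100k impostor pairs
# (illustrative only; not the evaluation data from the source).
rng = np.random.default_rng(1)
genuine = rng.normal(0.6, 0.10, 100_000)
impostor = rng.normal(0.1, 0.08, 100_000)
thr, tar = tar_at_far(genuine, impostor, far=1e-4)  # 0.01% FAR
print(f"threshold={thr:.3f}, TAR={tar:.4f}")
```

The reported per-model thresholds (e.g. 0.2996 for r100) are exactly this kind of operating point: the score cut chosen on impostor pairs, with accuracy then read off the genuine pairs.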

Verification Accuracy

| Model | Threshold (0.01% FAR) | Unmasked Acc. (%) | Masked Acc. (%) |
|---|---|---|---|
| r100 | 0.2996 | 99.11 | 80.93 |
| r100_mask_v1 | — | — | 88.98 |
| r100_mask_v2 | 0.2991 | 99.06 | 90.07 |
| r100_mask_v3 | — | — | 89.70 |

Search (Retrieval) Performance: r100 vs. r100_mask_v2

Top-1 Accuracy (%)

| Gallery size (×10⁴) | Unmasked, r100 | Unmasked, r100_mask_v2 | Masked, r100_mask_v2 |
|---|---|---|---|
| 1 | 98.18 | 98.15 | 89.60 |
| 2 | 97.40 | 97.32 | 85.33 |
| 3 | 96.96 | 96.85 | 83.39 |
| 10 | 96.15 | 96.05 | 80.22 |

Top-5 Accuracy (%)

| Gallery size (×10⁴) | Unmasked, r100 | Unmasked, r100_mask_v2 | Masked, r100_mask_v2 |
|---|---|---|---|
| 1 | 99.81 | 99.79 | 96.55 |
| 2 | 99.59 | 99.60 | 94.06 |
| 3 | 99.44 | 99.43 | 92.54 |
| 10 | 99.18 | 99.15 | 90.30 |

No confidence intervals or statistical significance tests are provided in the source.
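The top-k search metric in the tables above can be sketched as a nearest-neighbor lookup over embedding cosine similarities. The toy gallery and noise model below are assumptions for illustration; only the metric definition (true identity appears among the k best gallery matches) reflects the evaluation described here.

```python
import numpy as np

def top_k_accuracy(probe_embs, gallery_embs, probe_ids, gallery_ids, k=5):
    """Face search: rank gallery entries by cosine similarity to each probe
    and count a hit when the probe's true identity is in the top-k."""
    # L2-normalize so that the dot product equals cosine similarity.
    p = probe_embs / np.linalg.norm(probe_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = p @ g.T                           # (num_probes, num_gallery)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches
    hits = [probe_ids[i] in gallery_ids[topk[i]] for i in range(len(probe_ids))]
    return float(np.mean(hits))

# Toy setup: 50 gallery identities; probes are noisy copies of the same
# embeddings, standing in for a second photo of each person.
rng = np.random.default_rng(2)
gallery = rng.standard_normal((50, 16))
ids = np.arange(50)
probes = gallery + 0.1 * rng.standard_normal((50, 16))
acc = top_k_accuracy(probes, gallery, ids, ids, k=1)
print(acc)
```

Growing the gallery adds distractors that can outrank the true match, which is why the tables show accuracy falling monotonically as gallery size increases from 10⁴ to 10⁵.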

4. Comparative Performance Analysis Against Other Backbones

The r100 family consistently outperforms r50 and r34_mask_v1 backbones and achieves superior results to ViT-Tiny in all relevant metrics. At 0.01% FAR, the r100 backbone yields 99.11% accuracy for unmasked faces, which is +2.18% relative to r50, +2.09% over r34_mask_v1, and +2.88% over ViT-Tiny. For masked face verification, r100_mask_v2 reaches 90.07%, outperforming r50_mask_v2 by +5.27%, r34_mask by +10.20%, and ViT-Tiny_mask by +0.64%. Vision Transformer (ViT) models show competitive Top-5 recall but are penalized by higher computational and memory cost due to global attention mechanisms.

5. Operational Deployment and Trade-offs

The r100 series is intended for deployment in high-throughput environments, such as airport security, where achieving >99% accuracy at low FAR is a priority and compute resource constraints can be relaxed. The series’ substantial computational and storage requirements (on the order of tens of millions of parameters) favor server-class GPU or high-performance CPU implementations with batch processing.

For edge or mobile scenarios, the deployment of r50-mask or ViT-Tiny variants is suggested as a practical trade-off, accepting a minor loss in recognition accuracy in exchange for improved inference speed and reduced memory requirements. The compute/memory demands of r100 preclude routine real-time operation on low-power platforms.

6. Recommendations and Context for Civil Aviation and Masked Face Scenarios

The r100_mask_v2 model is preferred for environments where masked faces are common, such as in post-pandemic civil aviation, as it demonstrates a 5–10% absolute gain over baseline models at the operationally relevant 0.01% FAR. The addition of 15% masked-face examples during training confers substantial robustness, without necessitating architectural modifications. ViT models may be considered where highest Top-5 recall is needed, but impose higher hardware burdens.

The series remains the default recommendation for both masked and unmasked face recognition in aviation security, balancing operational accuracy requirements against hardware availability. Where significant real-time constraints exist, practitioners can downscale to r50_mask or ViT-Tiny, accepting a modest degradation in performance (Zhang et al., 23 Jan 2026).
