Tiny AutoEncoder with RRDB Blocks
- Tiny AutoEncoder with RRDB Blocks is a compact deep neural architecture that uses dual autoencoders to separately restore luma and chroma channels.
- It employs specialized pipelines (LumiNet and ChromaNet) with three-layer encoders, RRDB trunks, and decoders to achieve high-fidelity image restoration.
- The design leverages nested residual connections and dense blocks to expand the receptive field and enable scalable model reductions for resource-constrained deployments.
A Tiny AutoEncoder (TinyAE) employing Residual-in-Residual Dense Blocks (RRDB) is a deep neural architecture for high-fidelity image restoration that emphasizes efficiency and parameter reduction. Originating in the context of JPEG artifact removal that is robust to the compression quality, such architectures exploit the representational capacity of deep residual and dense connections while organizing them for memory and computational efficiency. The approach divides restoration into two coupled autoencoders, one for luma (LumiNet) and one for chroma (ChromaNet), and allows complexity to be scaled by systematically reducing width, depth, and convolutional kernel sizes (Zini et al., 2019).
1. Dual Autoencoder Architecture
The design consists of two distinct autoencoders each specialized for a color subspace:
- LumiNet restores the Y (luminance) channel using 2D convolutions, taking an $H \times W \times 1$ luma input and producing the restored luma at the same resolution.
- ChromaNet restores the Cb and Cr (chrominance) channels. Its first layer uses 3D convolutions, and its input combines the output of LumiNet with the compressed chroma, so it maps an $H \times W \times 3$ input to $H \times W \times 2$ restored chroma.
Each autoencoder contains:
- Encoder: Three sequential convolutional layers with LeakyReLU activations that keep the spatial resolution fixed at $H \times W$ (stride 1, no pooling) and produce 64-channel feature maps.
- RRDB trunk: A stack of $N_L$ (LumiNet) or $N_C$ (ChromaNet) RRDBs, each operating on 64-channel feature maps.
- Decoder: Three further convolutional layers mirroring the encoder, ending in a Tanh activation that maps the output to $[-1, 1]$.
Batch normalization is omitted throughout, as are downsampling/upsampling operations, to reduce overhead and stabilize training. Weight initialization uses He/Kaiming scaled by 0.1.
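The coupling between the two autoencoders can be sketched as a data-flow function. This is a minimal NumPy sketch, assuming channel-first arrays, pixels normalized to $[-1, 1]$ to match the Tanh output, and a (restored Y, Cb, Cr) stacking order for ChromaNet's input; `luminet` and `chromanet` are hypothetical stand-in callables rather than trained models. ChromaNet's 3D first layer would treat the three stacked planes as a depth axis.

```python
import numpy as np

def to_model_range(pixels):
    """Assumed normalization: uint8 pixels in [0, 255] -> floats in [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

def from_model_range(t):
    """Map Tanh outputs in [-1, 1] back to [0, 255]."""
    return np.clip((t + 1.0) * 127.5, 0.0, 255.0)

def restore_ycbcr(y, cb, cr, luminet, chromanet):
    """Run LumiNet first, then ChromaNet conditioned on its output.

    y, cb, cr: (H, W) arrays in [-1, 1].
    luminet:   hypothetical callable, (1, H, W) -> (1, H, W)
    chromanet: hypothetical callable, (3, H, W) -> (2, H, W)
    Returns the restored (3, H, W) YCbCr stack.
    """
    y_restored = luminet(y[np.newaxis])  # (1, H, W)
    # ChromaNet receives the restored luma together with the compressed chroma planes.
    chroma_in = np.concatenate([y_restored, cb[np.newaxis], cr[np.newaxis]], axis=0)  # (3, H, W)
    cbcr_restored = chromanet(chroma_in)  # (2, H, W)
    return np.concatenate([y_restored, cbcr_restored], axis=0)

# Stand-in "networks" that only apply Tanh, to exercise the data flow.
rng = np.random.default_rng(0)
y, cb, cr = (rng.uniform(-1.0, 1.0, (4, 5)) for _ in range(3))
out = restore_ycbcr(y, cb, cr, np.tanh, lambda x: np.tanh(x[1:]))
print(out.shape)  # (3, 4, 5)
```

The ordering matters: ChromaNet can only run after LumiNet, because the restored luma is part of its input.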
2. RRDB Block Structure
The RRDB module is a composite block featuring two nested levels of residual connections:
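- Dense Block Sequence: Each RRDB is built from dense blocks of $L$ layers with growth rate $g = 32$; layer $\ell$ convolves the concatenation of the block input $F_0$ and all earlier layer outputs:
$$F_\ell = \sigma\big(W_\ell * [F_0, F_1, \ldots, F_{\ell-1}]\big), \qquad \ell = 1, \ldots, L,$$
where $\sigma$ denotes the activation and $[\cdot]$ denotes channel-wise concatenation. The concatenated feature depth therefore grows to $64 + L g$, which a bottleneck convolution reduces back to 64 channels.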
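- Inner and Outer Residuals: Each dense block is wrapped by an inner residual connection scaled by $\beta = 0.2$,
$$\mathbf{x}_{\text{out}} = \mathbf{x}_{\text{in}} + \beta\, \mathbf{F}(\mathbf{x}_{\text{in}}),$$
where $\mathbf{F}(\mathbf{x}_{\text{in}})$ is the post-bottleneck representation. RRDBs are then stacked, and the outer skip connects the input to the output of all five dense blocks, again with residual scaling.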
- Receptive Field Expansion: Each RRDB increases the receptive field by $11$ pixels, resulting in roughly $60$ pixels for LumiNet after five RRDBs.
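- Parameterization: Ignoring biases, a dense block with $k \times k$ layer kernels and a $k_b \times k_b$ bottleneck holds
$$P_{\text{DB}} = \sum_{\ell=1}^{L} k^2 \big(64 + (\ell - 1) g\big)\, g + k_b^2\, (64 + L g) \cdot 64$$
weights, and an RRDB's count is $P_{\text{DB}}$ times the number of dense blocks it contains.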
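A dense block with its inner residual, and an RRDB wrapping several of them, can be sketched in NumPy as follows. The 3×3 layer kernels, 1×1 bottleneck, LeakyReLU slope of 0.2, and all helper names are assumptions for illustration; the residual scaling $\beta = 0.2$ and the He/Kaiming initialization scaled by 0.1 follow the text.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Assumed negative slope of 0.2.
    return np.where(x > 0, x, slope * x)

def conv2d_same(x, w):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def he_init(rng, c_out, c_in, k, scale=0.1):
    """He/Kaiming-normal weights scaled by 0.1."""
    std = np.sqrt(2.0 / (c_in * k * k))
    return scale * rng.normal(0.0, std, size=(c_out, c_in, k, k))

def init_dense_block(rng, c, g, n_layers, k=3):
    # Layer l (0-based) sees c + l*g input channels and emits g new ones.
    layers = [he_init(rng, g, c + l * g, k) for l in range(n_layers)]
    bottleneck = he_init(rng, c, c + n_layers * g, 1)  # assumed 1x1: (c + L*g) -> c
    return layers, bottleneck

def dense_block(x, layers, bottleneck, beta=0.2):
    feats = [x]
    for w in layers:
        # Dense connectivity: convolve the concatenation of all earlier feature maps.
        feats.append(leaky_relu(conv2d_same(np.concatenate(feats, axis=0), w)))
    fused = conv2d_same(np.concatenate(feats, axis=0), bottleneck)  # back to c channels
    return x + beta * fused  # inner residual, scaled by beta

def rrdb(x, blocks, beta=0.2):
    h = x
    for layers, bottleneck in blocks:
        h = dense_block(h, layers, bottleneck, beta)
    return x + beta * h  # outer residual around the stacked dense blocks

rng = np.random.default_rng(0)
c, g, L = 8, 4, 3  # small sizes so the example runs quickly
blocks = [init_dense_block(rng, c, g, L) for _ in range(3)]
x = rng.normal(size=(c, 6, 7))
print(rrdb(x, blocks).shape)  # (8, 6, 7)
```

Because every convolution preserves $H \times W$ and the bottleneck restores the input width, blocks and RRDBs can be chained freely; with $\beta = 0$ each block collapses to the identity.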
3. Parameterization and Computational Footprint
The default configuration results in the following parameter counts:
| Component | Encoder | RRDB Trunk | Decoder | Total |
|---|---|---|---|---|
| LumiNet | ≈ 0.3 M | ≈ 5.2 M | ≈ 0.3 M | ≈ 5.8 M |
| ChromaNet | ≈ 0.4 M | ≈ 3.1 M | ≈ 0.3 M | ≈ 3.8 M |
| Combined | — | — | — | ≈ 9.6 M |
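Because the autoencoders are fully convolutional and never downsample, every layer runs at the full input resolution. Memory and computation are therefore dominated by the RRDB trunks, which hold most of the parameters and produce the widest intermediate activations.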
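The memory side of this can be made concrete with a quick calculation. Assuming float32 activations and $L = 5$ dense layers with $g = 32$ (the layer count is a hypothetical value here), the concatenated input to each bottleneck is $64 + 5 \cdot 32 = 224$ channels deep and, with no downsampling, is held at full image resolution:

```python
def activation_bytes(channels, height, width, bytes_per_value=4):
    """Memory of one float32 feature map held at full input resolution."""
    return channels * height * width * bytes_per_value

def widest_dense_activation(height, width, c=64, g=32, n_layers=5):
    # The bottleneck input concatenates the block input and all L layer outputs: c + L*g channels.
    return activation_bytes(c + n_layers * g, height, width)

MiB = 2 ** 20
print(widest_dense_activation(512, 512) / MiB)   # 224.0
print(activation_bytes(64, 512, 512) / MiB)      # 64.0
```

For a 512×512 input, the widest dense-block activation alone takes 3.5 times the memory of a 64-channel encoder feature map.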
4. Strategies for Constructing Tiny Variants
Tiny autoencoders with RRDBs are derived from the full model by reducing width, depth, and per-operation cost, for example:
- Width Scaling: Scaling all channel widths by a factor $\alpha$; e.g., $\alpha = 0.5$ reduces the base channels from 64 to 32 and the growth rate from 32 to 16.
- RRDB Count Reduction: Reducing the number of RRDBs in the LumiNet ($N_L$) and ChromaNet ($N_C$) trunks.
- Dense Layer Pruning: Shortening each dense block to fewer layers.
- Kernel Replacement: Substituting convolutions with pairs of cheaper convolutions.
Combining width scaling, RRDB-count reduction, and dense-layer pruning reduces the total parameter and MAC (multiply-accumulate) counts to a small fraction of the full model while preserving the structure of the restoration pipeline.
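The effect of combining reductions on a trunk can be estimated from the dense-block weight count. This sketch assumes 3×3 layer kernels, a 1×1 bottleneck, and an unchanged number of dense blocks per RRDB (so that factor cancels); the RRDB and layer counts used below are hypothetical, not the paper's configurations. Since every convolution is stride-1 at full resolution, MACs per image equal the weight count times $H \times W$, so the same ratio applies to MACs.

```python
def dense_block_weights(c, g, n_layers, k=3, k_bottleneck=1):
    """Weights (biases ignored) of one dense block: L k x k layers plus the bottleneck."""
    layers = sum(k * k * (c + l * g) * g for l in range(n_layers))  # layer l+1 sees c + l*g channels
    bottleneck = k_bottleneck ** 2 * (c + n_layers * g) * c         # (c + L*g) -> c
    return layers + bottleneck

def trunk_ratio(alpha, n_rrdb, n_rrdb_tiny, n_layers, n_layers_tiny, c=64, g=32):
    """Tiny-to-full weight ratio of an RRDB trunk."""
    full = n_rrdb * dense_block_weights(c, g, n_layers)
    tiny = n_rrdb_tiny * dense_block_weights(round(alpha * c), round(alpha * g), n_layers_tiny)
    return tiny / full

print(dense_block_weights(64, 32, 5))   # 198656
print(trunk_ratio(0.5, 5, 5, 5, 5))     # 0.25: width scaling alone
print(trunk_ratio(0.5, 5, 3, 5, 3))     # about 0.07 with fewer RRDBs and layers as well
```

Width scaling alone gives exactly $\alpha^2$ because both the input and output widths of every trunk convolution shrink by $\alpha$; the RRDB count then scales the trunk linearly.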
5. Training Procedure and Loss Formulation
Training applies a pixel-wise reconstruction loss to the outputs of both autoencoders. Optimization uses Adam, without weight decay or dropout, on $34,000$ images from DIV2K/Flickr2K covering JPEG quality factors from 10 to 100.
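The joint objective can be sketched as below. The choice of mean squared error, the chroma weight `lam`, and uniform sampling of quality factors are assumptions made for illustration; the exact loss form and sampling scheme are not given here.

```python
import numpy as np

def pixel_loss(pred, target):
    """Assumed pixel-wise loss: mean squared error over all pixels and channels."""
    return float(np.mean((pred - target) ** 2))

def joint_loss(y_pred, y_true, cbcr_pred, cbcr_true, lam=1.0):
    """Sum of the LumiNet and ChromaNet terms; `lam` is a hypothetical chroma weight."""
    return pixel_loss(y_pred, y_true) + lam * pixel_loss(cbcr_pred, cbcr_true)

def sample_quality(rng):
    """Draw a JPEG quality factor uniformly from 10 to 100 inclusive (assumed scheme)."""
    return int(rng.integers(10, 101))  # upper bound is exclusive

rng = np.random.default_rng(0)
print(10 <= sample_quality(rng) <= 100)  # True
```

Sampling the quality factor per training example is one way to train the single, quality-independent set of weights the model relies on.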
6. Complexity and Deployment Considerations
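Let $P(C, g)$ denote the parameter count of the model with base width $C$ and growth rate $g$. Every convolution holds $k^2 C_{\text{in}} C_{\text{out}}$ weights, and width scaling multiplies both $C_{\text{in}}$ and $C_{\text{out}}$ by $\alpha$, so
$$P(\alpha C, \alpha g) \approx \alpha^2\, P(C, g),$$
where the approximation stems only from biases and from the image-facing layers whose input or output channel counts are fixed. Inference FLOPs per convolution scale as $H W k^2 C_{\text{in}} C_{\text{out}}$, so the total inference cost is likewise proportional to $\alpha^2$ times that of the full model, and reducing the RRDB counts lowers the trunk cost roughly linearly on top of that. These two independent knobs make the architecture well suited to resource-constrained deployments.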
7. Distinction from Related Approaches
The RRDBs described here contain no BatchNorm layers, use residual scaling (0.2), and introduce neither channel attention nor weight normalization. The model is notable for its quality-independent parameterization: a single set of weights serves all JPEG quality factors instead of one specialized model per quality. This allows robust operation even on compression qualities not seen during training, an advantage over prior quality-specific approaches (Zini et al., 2019).