
Tiny AutoEncoder with RRDB Blocks

Updated 19 December 2025
  • Tiny AutoEncoder with RRDB Blocks is a compact deep neural architecture that uses dual autoencoders to separately restore luma and chroma channels.
  • It employs specialized pipelines (LumiNet and ChromaNet) with three-layer encoders, RRDB trunks, and decoders to achieve high-fidelity image restoration.
  • The design leverages nested residual connections and dense blocks to expand the receptive field and enable scalable model reductions for resource-constrained deployments.

A Tiny AutoEncoder (TinyAE) employing Residual-in-Residual Dense Blocks (RRDB) is a deep neural architecture for high-fidelity image restoration that prioritizes efficiency and parameter reduction. Originating in the context of JPEG artifact removal that is robust across compression qualities, such architectures exploit the representational capacity of deep residual and dense connections while keeping memory and computation costs low. The approach divides restoration between two coupled autoencoders, one for luma (LumiNet) and one for chroma (ChromaNet), and enables complexity scaling via principled reductions in width, depth, and convolutional kernel sizes (Zini et al., 2019).

1. Dual Autoencoder Architecture

The design consists of two distinct autoencoders, each specialized for a color subspace:

  • LumiNet restores the Y (luminance) channel using 2D convolutions, accepting inputs of size $H\times W\times 1$ and outputting the restored luma.
  • ChromaNet restores the chroma channels from $[Y', C_b, C_r]$, where $Y'$ is the luma restored by LumiNet, and uses 3D convolutions in its first layer. ChromaNet maps $H\times W\times 3$ to $H\times W\times 2$.
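The data flow between the two networks can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `restore_ycbcr`, `identity_lumi`, and `identity_chroma` are hypothetical names, the networks are passed in as plain callables, and channels are nested Python lists so that the example runs without a deep learning framework.

```python
from typing import Callable, List, Tuple

Plane = List[List[float]]  # a single H x W channel


def restore_ycbcr(
    y: Plane,
    cb: Plane,
    cr: Plane,
    lumi_net: Callable[[Plane], Plane],
    chroma_net: Callable[[Plane, Plane, Plane], Tuple[Plane, Plane]],
) -> Tuple[Plane, Plane, Plane]:
    """Restore luma first, then pass the restored luma to the chroma network."""
    y_restored = lumi_net(y)  # H x W x 1 -> H x W x 1
    # ChromaNet sees the restored luma plus the degraded chroma: H x W x 3 -> H x W x 2
    cb_restored, cr_restored = chroma_net(y_restored, cb, cr)
    return y_restored, cb_restored, cr_restored


# Placeholder "networks" (identity maps) that only demonstrate the wiring.
def identity_lumi(y: Plane) -> Plane:
    return [row[:] for row in y]


def identity_chroma(y: Plane, cb: Plane, cr: Plane) -> Tuple[Plane, Plane]:
    return [row[:] for row in cb], [row[:] for row in cr]


restored = restore_ycbcr(
    [[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]], identity_lumi, identity_chroma
)
```

The only structural point the sketch makes is the ordering: the chroma stage consumes the output of the luma stage, not the degraded luma.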

Each autoencoder contains:

  • Encoder: Three sequential convolutional layers with LeakyReLU activations, maintaining the spatial resolution ($\text{stride}=1$, constant $H\times W$), with kernel sizes $3\times3$ and $5\times5$.
  • RRDB trunk: A stack of $B_Y = 5$ (LumiNet) or $B_C = 3$ (ChromaNet) RRDBs, each acting on 64-channel feature maps.
  • Decoder: Three additional convolutional layers, mirroring the encoder, leading to final outputs with a Tanh activation mapping to $[-1, 1]$.

Batch normalization is omitted throughout, as are downsampling/upsampling operations, to reduce overhead and stabilize training. Weight initialization uses He/Kaiming scaled by 0.1.
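The scaled initialization can be illustrated with a small calculation. The sketch assumes the Kaiming-normal convention, in which the standard deviation is $\sqrt{2/\text{fan\_in}}$ with $\text{fan\_in} = C_{\rm in} k^2$, and then multiplies it by the 0.1 factor mentioned above. Whether the original implementation uses the normal or the uniform variant is not stated here, so that choice and the helper names `scaled_kaiming_std` and `init_conv_weights` are assumptions.

```python
import math
import random
from typing import List


def scaled_kaiming_std(in_channels: int, kernel_size: int, scale: float = 0.1) -> float:
    """Kaiming-normal standard deviation for a conv layer, shrunk by `scale`."""
    fan_in = in_channels * kernel_size * kernel_size
    return scale * math.sqrt(2.0 / fan_in)


def init_conv_weights(
    out_channels: int,
    in_channels: int,
    kernel_size: int,
    scale: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Draw a flat list of conv weights from N(0, std^2) with the scaled std."""
    rng = random.Random(seed)
    std = scaled_kaiming_std(in_channels, kernel_size, scale)
    count = out_channels * in_channels * kernel_size * kernel_size
    return [rng.gauss(0.0, std) for _ in range(count)]


# A 3x3 convolution on 64-channel features has fan_in = 576.
std_64 = scaled_kaiming_std(64, 3)
```

For a $3\times3$ convolution on 64 input channels this gives a standard deviation of about 0.0059, a tenth of the unscaled value, which keeps the initial residual branches small.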

2. RRDB Block Structure

The RRDB module is a composite block featuring two nested levels of residual connections:

  • Dense Block Sequence: Each RRDB comprises $L=5$ dense layers with growth rate $g=32$, where at each layer,

$$x_\ell = \mathrm{LReLU}\left(\mathrm{Conv}^{3\times3}\left([F_0, x_1, \dots, x_{\ell-1}]\right)\right)$$

The resulting feature depth grows as $64 + 5 \cdot 32 = 224$, which is reduced back to 64 using a $1 \times 1$ convolution bottleneck.

  • Inner and Outer Residuals: The dense block is connected by an inner residual scaled by $\beta = 0.2$,

$$F_{\rm out} = F_0 + 0.2 \, F_{\rm DB}$$

where $F_{\rm DB}$ is the post-bottleneck representation. RRDBs are then stacked, and the outer skip connects the input to the output of all five dense blocks, again with residual scaling.

  • Receptive Field Expansion: Each RRDB increases the receptive field by $11$ pixels, resulting in roughly $60$ pixels for LumiNet after five RRDBs.
  • Parameterization: Each RRDB contains approximately $1.03 \times 10^6$ parameters, with precise enumeration available via:

$$\sum_{\ell=1}^{5}\bigl(9 \cdot (64 + (\ell-1)g)\,g + g\bigr) + (224 \cdot 64 + 64)$$
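The enumeration above can be evaluated directly. The sketch below is an illustration under stated assumptions (the function names `dense_layer_in_channels` and `dense_block_params` are hypothetical): it counts weights and biases for the five $3\times3$ dense layers plus the $1\times1$ bottleneck exactly as the expression is written.

```python
from typing import List


def dense_layer_in_channels(base: int = 64, growth: int = 32, layers: int = 5) -> List[int]:
    """Input width of each dense layer: the base features plus all earlier outputs."""
    return [base + i * growth for i in range(layers)]


def dense_block_params(base: int = 64, growth: int = 32, layers: int = 5, kernel: int = 3) -> int:
    """Weights and biases of the dense layers plus the 1x1 bottleneck."""
    total = 0
    for c_in in dense_layer_in_channels(base, growth, layers):
        total += kernel * kernel * c_in * growth + growth  # conv weights + biases
    concat_width = base + layers * growth                   # 224 for the defaults
    total += concat_width * base + base                     # 1x1 bottleneck + biases
    return total


widths = dense_layer_in_channels()
count = dense_block_params()
```

With the defaults, the per-layer input widths are 64, 96, 128, 160, and 192, the concatenated width is 224, and the expression evaluates to 198,880. That is lower than the approximate per-RRDB figure quoted in the text, so the sketch should be read as an evaluation of the formula as written, not as a reproduction of the reported count.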

3. Parameterization and Computational Footprint

The default configuration results in the following parameter counts:

| Component | Encoder | RRDB Trunk | Decoder | Total |
|---|---|---|---|---|
| LumiNet | ≈ 0.3 M | $5\times1.03$ M | ≈ 0.3 M | ≈ 5.8 M |
| ChromaNet | ≈ 0.4 M | $3\times1.03$ M | ≈ 0.3 M | ≈ 3.8 M |
| Combined | | | | ≈ 9.6 M |

The autoencoders operate fully convolutionally, so memory and computational complexity are dominated by the RRDB blocks.

4. Strategies for Constructing Tiny Variants

Tiny autoencoders with RRDBs are created through dimension, depth, and operation reductions, such as:

  • Width Scaling: Scaling all channel widths by a factor $\alpha \in (0,1]$; e.g., $\alpha = 0.5$ reduces base channels 64→32 and growth rate 32→16.
  • RRDB Count Reduction: Reducing $B_Y$ and $B_C$ (e.g., $B_Y = 3$, $B_C = 2$).
  • Dense Layer Pruning: Shortening dense blocks to $L = 3$ layers per block.
  • Kernel Replacement: Substituting $5\times5$ convolutions by pairs of $3\times3$ convolutions.

With $\alpha = 0.5$, $B_Y = 3$, and $L = 3$, the total parameter and MAC count can be reduced to under $20\%$ of the full model, while preserving the recovery pipeline structure.
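The reductions listed above can be expressed as a transformation of a configuration object. This is a minimal sketch, assuming a hypothetical `TinyAEConfig` dataclass and `make_tiny` helper that are not part of the original work; kernel replacement is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TinyAEConfig:
    """Hypothetical hyperparameter bundle; defaults follow the full model."""
    base_channels: int = 64
    growth_rate: int = 32
    rrdb_luma: int = 5      # B_Y
    rrdb_chroma: int = 3    # B_C
    dense_layers: int = 5   # L


def make_tiny(
    cfg: TinyAEConfig,
    alpha: float = 0.5,
    rrdb_luma: int = 3,
    rrdb_chroma: int = 2,
    dense_layers: int = 3,
) -> TinyAEConfig:
    """Apply width scaling, RRDB count reduction, and dense layer pruning."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return replace(
        cfg,
        base_channels=max(1, round(cfg.base_channels * alpha)),
        growth_rate=max(1, round(cfg.growth_rate * alpha)),
        rrdb_luma=rrdb_luma,
        rrdb_chroma=rrdb_chroma,
        dense_layers=dense_layers,
    )


tiny = make_tiny(TinyAEConfig())
```

Keeping the configuration immutable and deriving the tiny variant from the full one makes it explicit that only widths and counts change, while the encoder, trunk, and decoder layout stays the same.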

5. Training Procedure and Loss Formulation

Training employs a pixel-wise $\ell_1$ loss across both autoencoders:

$$\mathcal{L} = \frac{1}{N} \sum_{n=1}^{N} \Bigl( \left\| \hat Y_n - Y_n^{\star} \right\|_1 + \lambda_{\rm chroma} \left\| [\hat{C}_{b,n},\hat{C}_{r,n}] - [C_{b,n}^{\star},C_{r,n}^{\star}] \right\|_1 \Bigr)$$

with $\lambda_{\rm chroma} = 1$. Optimization uses Adam ($\beta_1=0.9$, $\beta_2=0.999$) with an initial learning rate of $2\times 10^{-4}$. The training set comprises 34,000 images from DIV2K/Flickr2K, covering JPEG quality factors from 10 to 100. No weight decay or dropout is used.
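The loss can be written out in a few lines of plain Python. The sketch below assumes that the $1/N$ average applies to both the luma and the chroma term and that each sample is supplied as a flattened list of values (with $C_b$ and $C_r$ concatenated); the function names `l1_norm` and `restoration_loss` are hypothetical.

```python
from typing import Sequence


def l1_norm(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of absolute differences between two flattened samples."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def restoration_loss(
    y_hat: Sequence[Sequence[float]],
    y_true: Sequence[Sequence[float]],
    c_hat: Sequence[Sequence[float]],
    c_true: Sequence[Sequence[float]],
    lambda_chroma: float = 1.0,
) -> float:
    """Batch-averaged luma L1 term plus lambda-weighted chroma L1 term."""
    n = len(y_hat)
    total = 0.0
    for i in range(n):
        total += l1_norm(y_hat[i], y_true[i])
        total += lambda_chroma * l1_norm(c_hat[i], c_true[i])
    return total / n


# One sample: luma error 0.5 + 0.5, chroma error 0.25 + 0.25.
loss = restoration_loss(
    [[0.0, 1.0]], [[0.5, 0.5]],
    [[0.0, 0.0, 0.0, 0.0]], [[0.25, 0.0, 0.0, 0.25]],
)
```

With $\lambda_{\rm chroma} = 1$ the two terms are weighted equally, so the example evaluates to 1.0 for luma plus 0.5 for chroma.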

6. Complexity and Deployment Considerations

Let $P(\alpha,B,L,g)$ denote the parameter count of a trunk of $B$ RRDBs with base width $c_0 = 64\alpha$ and growth $g = 32\alpha$:

$$P(\alpha,B,L,g) \approx B \Bigl( \sum_{\ell=1}^{L} \bigl(9\,(c_0 + (\ell-1)g)\,g + g\bigr) + (c_0 + Lg)\,c_0 \Bigr)$$

Inference FLOPs per convolution scale as $2HWC_{\rm in}C_{\rm out}k^2$, so total inference cost is proportional to $\alpha^2(B/5)$ times the cost of the full model. The architecture is thus highly amenable to resource-constrained deployments by design.
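The cost model can be turned into a small calculator. This is a rough sketch under stated assumptions: the per-layer term of $P$ is read as a sum over $\ell = 1, \dots, L$, widths are rounded to integers, and the helper names `conv_flops`, `trunk_params`, and `relative_cost` are hypothetical.

```python
def conv_flops(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """FLOPs of one stride-1 convolution: 2 * H * W * C_in * C_out * k^2."""
    return 2 * h * w * c_in * c_out * k * k


def trunk_params(alpha: float = 1.0, blocks: int = 5, layers: int = 5) -> int:
    """Approximate RRDB trunk parameters P(alpha, B, L, g) with g = 32 * alpha."""
    c0 = round(64 * alpha)
    g = round(32 * alpha)
    dense = sum(9 * (c0 + i * g) * g + g for i in range(layers))  # 3x3 dense layers
    bottleneck = (c0 + layers * g) * c0                           # 1x1 reduction
    return blocks * (dense + bottleneck)


def relative_cost(alpha: float, blocks: int, full_blocks: int = 5) -> float:
    """Width and depth scaling factor alpha^2 * (B / 5) relative to the full trunk."""
    return alpha ** 2 * blocks / full_blocks


full = trunk_params()                              # alpha = 1, B = 5, L = 5
tiny = trunk_params(alpha=0.5, blocks=3, layers=3)
ratio = tiny / full
```

Under these assumptions the full five-block trunk has 994,080 parameters and the reduced one 70,032, about 7% of the full count, while `relative_cost(0.5, 3)` gives 0.15. The factor $\alpha^2(B/5)$ captures only width and block-count scaling, so shortening the dense blocks lowers the cost further.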

The use of RRDBs as described eschews all BatchNorm layers, leverages residual scaling (0.2), and does not introduce channel attention or weight normalization. The model is remarkable for supporting a quality-independent parameterization, using one set of weights for all JPEG quality factors rather than multiple specialized models. This enables robust operation even on compression qualities not seen during training, which is a significant advancement relative to prior art (Zini et al., 2019).

References (1)
