Asymmetric Encoder–Decoder Structures

Updated 16 February 2026
  • Asymmetric Encoder–Decoder Structures are designs where the encoder and decoder have distinct, unbalanced complexities optimized for specific operational constraints.
  • They leverage differences in algorithmic order, computational burden, and information access to enhance performance in tasks such as lossless compression, semantic segmentation, and federated learning.
  • Real-world implementations, like DeepLab v3+, LEDNet, and AsymLLIC, illustrate how shifting complexity can yield efficient processing and reduced computational costs.

An asymmetric encoder–decoder structure is one in which the computational or representational complexities of the encoder and decoder are intentionally unbalanced, with each sub-module optimized for a distinct operational or domain constraint. This architectural asymmetry is deployed across information theory, signal processing, neural networks, and distributed learning, enabling efficiency or expressivity unattainable with symmetric designs. Asymmetry can manifest in module depth, parameterization, algorithmic ordering, or resource utilization, and is commonly leveraged for tasks such as compression, semantic segmentation, federated learning, and source-channel coding.

1. Formal Principles of Asymmetric Encoder–Decoder Structures

The core defining element is the partition of the end-to-end system into an encoder $\mathcal{E}$ and decoder $\mathcal{D}$, with the design constraint that $\mathcal{E}$ and $\mathcal{D}$ do not mirror each other in architecture, workload, or algorithmic ordering. Asymmetry can be imposed at several levels:

  • Algorithmic order: As in backward-order encoding with forward-order decoding (Yamamoto et al., 16 Jan 2026).
  • Computational burden: A deep, heavy encoder paired with a shallow, lightweight decoder (or vice versa) (Wang et al., 2024, Chen et al., 2018, Wang et al., 2019).
  • Information access: Encoder and decoder possess different side information or priors, such as in channel coding with decoder side information (Muramatsu et al., 2016).
  • Role specialization: Encoder generates general or task-agnostic representations, while decoder performs specialized, context-dependent processing (Zhou et al., 14 Apr 2025, Fu et al., 2023).
  • Weight sharing vs. discrimination: Decoder branches may share weights and specialize post-hoc based on split representations, as in speaker separation (Shin et al., 2024).

Asymmetric design often exploits deployment constraints (e.g., intensive encoding in the cloud, lightweight decoding at clients or edge devices) and modular training strategies to optimize global performance.

2. Asymmetry in Lossless Data Compression

In lossless compression, Yamamoto & Iwata’s Asymmetric Encoding-Decoding Scheme (AEDS) generalizes the tabled variant of Asymmetric Numeral Systems (tANS) by decoupling the codebook symmetry constraint (Yamamoto et al., 16 Jan 2026). AEDS encodes a symbol sequence $\bm{s}=s_1 s_2 \cdots s_n$ in reverse, starting from the last symbol, via a backward-order recursion:

  • At each timestep $t=n,\ldots,1$, the encoder emits a bit-string $\beta_t = E_{\hat{x}_t}(s_t)$ and transitions to a new state $x_{t-1}=F^{-}_{\hat{x}_t}(s_t)$.
  • The decoder processes the bitstream in forward order, using prefix-free codebooks $\mathcal{B}^{(D)}_x$ at each state to recover $s_t = D_{x_{t-1}}(\beta_t)$ and state $x_t = F^{+}_{x_{t-1}}(\beta_t)$.

AEDS admits a far broader class of code mappings than tANS: the code-word lengths for a symbol $s$ can vary arbitrarily across states, and the prefix code-trees $\mathcal{B}_x^{(D)}$ need not synchronize. This structural freedom allows AEDS with as few as $N=2$ (resp., $5$) states to outperform Huffman coding when the leading Huffman branch exceeds probability $0.61803$ (resp., $0.56984$). The average code length achieves entropy up to an $O(1/N)$ redundancy, matching tANS in the limit but surpassing it in expressivity and potential compression efficiency.
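As an illustration of the mechanism (not the paper's actual construction), the following sketch implements backward-order encoding with forward-order decoding over a hypothetical two-state table, in which the code length of a symbol varies with the state:

```python
# Toy illustration of backward-order encoding with forward-order decoding,
# in the spirit of AEDS/tANS. The two-state table below is a hypothetical
# example, not the construction from the paper.

# Decoder table: state -> {codeword: (symbol, next_state)}.
# Each state's codewords are prefix-free, and the code length of the same
# symbol differs across states ('b' costs 2 bits from state 0, 1 from state 1).
DEC = {
    0: {"0": ("a", 0), "10": ("b", 1)},
    1: {"0": ("b", 0), "1": ("a", 1)},
}

# Derive the encoder table by inversion: if the decoder, in state x, maps
# codeword beta to (s, x'), then the encoder, in state x', emits beta for
# symbol s and transitions back to x.
ENC = {}
for x, book in DEC.items():
    for beta, (s, x_next) in book.items():
        ENC[(x_next, s)] = (beta, x)

def encode(msg, start_state=0):
    """Encode symbols in backward order; prepending each bit-string makes
    the final stream readable by the decoder in forward order."""
    state = start_state          # x_n: final decoder state (free choice)
    bits = ""
    for s in reversed(msg):      # t = n, ..., 1
        beta, state = ENC[(state, s)]
        bits = beta + bits       # stream becomes beta_1 beta_2 ... beta_n
    return state, bits           # x_0 (initial decoder state) + bitstream

def decode(state, bits):
    """Decode the bitstream in forward order via per-state prefix codes."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DEC[state]:
            s, state = DEC[state][buf]
            out.append(s)
            buf = ""
    return "".join(out)

state0, stream = encode("abba")
print(decode(state0, stream))   # -> "abba"
```

The asymmetry is visible in the control flow: the encoder walks the state machine against the direction of the text, while the decoder consumes the resulting stream strictly left to right.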

3. Computational Asymmetry in Neural and Signal Processing Architectures

Neural segmentation architectures such as DeepLab v3+ and LEDNet epitomize computationally asymmetric encoder–decoder designs:

  • DeepLab v3+: A very deep encoder (ResNet-101/Xception with Atrous Spatial Pyramid Pooling) extracts multi-scale semantic features at aggressive downsampling ($\text{OS}=16$). The decoder is a shallow module with two or fewer $3\times 3$ convolutions and minimal channel width, focusing on spatial refinement and boundary sharpening (Chen et al., 2018). This asymmetry ensures high semantic fidelity with minimal overhead: the encoder uses $\sim 70$–$80$B MACs and $>40$M parameters; the decoder uses $<1$M parameters and $10$–$15$B MACs.
  • LEDNet: Over $95\%$ of parameters and $92\%$ of FLOPs reside in the encoder (deep ResNet with split-shuffle), while the decoder (Attention Pyramid Network) comprises $<5\%$ of overall complexity (Wang et al., 2019). LEDNet achieves real-time rates ($71$ FPS) and mIoU competitive with much larger symmetric designs.

This division is motivated by empirical findings: heavy computation allocated to context aggregation and global representation is best situated in the encoder, while the decoder only needs lightweight spatial refinement or reweighting to recover precise object boundaries or class maps.
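A back-of-the-envelope cost model makes this division concrete; the layer shapes below are illustrative assumptions, not the actual DeepLab v3+ or LEDNet configurations:

```python
# Back-of-the-envelope cost model for an asymmetric segmentation network.
# Layer shapes are illustrative, not those of DeepLab v3+ or LEDNet.

def conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates of a k x k convolution
    producing a c_out x h x w feature map."""
    params = c_in * c_out * k * k
    macs = params * h * w
    return params, macs

# Heavy encoder: deep stack at progressively lower resolution.
encoder_layers = [
    (3,   64,  3, 256, 256),
    (64,  128, 3, 128, 128),
    (128, 256, 3, 64,  64),
    (256, 512, 3, 32,  32),
    (512, 512, 3, 32,  32),
]
# Light decoder: a couple of cheap refinement convolutions.
decoder_layers = [
    (512, 64, 3, 64, 64),
    (64,  21, 3, 128, 128),   # 21 output classes, e.g. PASCAL VOC
]

enc_params = sum(conv_cost(*l)[0] for l in encoder_layers)
enc_macs   = sum(conv_cost(*l)[1] for l in encoder_layers)
dec_params = sum(conv_cost(*l)[0] for l in decoder_layers)
dec_macs   = sum(conv_cost(*l)[1] for l in decoder_layers)

print(f"encoder share of params: {enc_params / (enc_params + dec_params):.1%}")
print(f"encoder share of MACs:   {enc_macs / (enc_macs + dec_macs):.1%}")
```

Even in this crude model, the deep low-resolution stack dominates both parameter count and compute, which is the budget split the cited architectures exploit.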

4. Asymmetry in Learned Compression and Federated Learning

Asymmetric encoder–decoder design is increasingly central in learned compression and distributed learning:

  • AsymLLIC: Learned image compression with a complex encoder (full Swin+ResNet blocks, full hyperprior) and a lightweight decoder (no shifted windows, channel-reduced, reversed pyramid, grouped context) (Wang et al., 2024). Training replaces each complex decoder module with a lighter variant in stages, maintaining R-D performance competitive with VVC while reducing decoder complexity to $51.47$ GMACs and $19.65$M parameters. Empirically, this structure occupies the "knee" of the trade-off curve, with much lower decoding complexity at similar BD-rate ($-18.68\%$ vs. BPG) than symmetric alternatives.
  • Multi-task Federated Learning (M-Fed): Clients implement a shared encoder and task-specific decoder; the server aggregates decoders per task and encoders globally. Asymmetry lies in the separation of generalizable feature extraction (encoder, synchronized globally) from task-specific prediction (decoder, synchronized intra-task) (Zhou et al., 14 Apr 2025). This permits cross-task knowledge sharing even with heterogeneous tasks and model structures. On dense-prediction tasks (e.g., Pascal-Context, MS-COCO), M-Fed outperforms both local-only and classic FL baselines, yielding consistent multi-task gains (e.g., $+16\%$ over local-only for ResNet-50 on COCO, $+12.4\%$ over local-only on Pascal).
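The M-Fed-style aggregation rule described above can be sketched as follows; the flat weight representation and names are simplifying assumptions, not the paper's implementation:

```python
# Minimal sketch of an M-Fed-style aggregation rule: encoders are averaged
# globally across all clients, decoders are averaged only within each task.
# Weights are flat lists of floats for illustration; structure and names
# are assumptions, not the paper's implementation.

def average(weight_lists):
    """Element-wise mean of equally shaped weight vectors."""
    n = len(weight_lists)
    return [sum(ws) / n for ws in zip(*weight_lists)]

# Each client reports (task_id, encoder_weights, decoder_weights).
clients = [
    ("segmentation", [1.0, 2.0], [0.5, 0.5]),
    ("segmentation", [3.0, 4.0], [1.5, 2.5]),
    ("depth",        [5.0, 6.0], [9.0, 9.0]),
]

# Global encoder: one average over every client, regardless of task.
global_encoder = average([enc for _, enc, _ in clients])

# Per-task decoders: average only over clients sharing the task.
tasks = {t for t, _, _ in clients}
task_decoders = {
    t: average([dec for task, _, dec in clients if task == t])
    for t in tasks
}

print(global_encoder)                 # -> [3.0, 4.0]
print(task_decoders["segmentation"])  # -> [1.0, 1.5]
```

The asymmetry is entirely in the aggregation scope: every client contributes to the encoder average, while decoder averages never cross task boundaries.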

5. Asymmetry in Channel Coding and Sequence Modeling

  • Channel coding with decoder side information: Muramatsu & Miyake split channel coding into an encoder that only enforces random codeword generation with prescribed hash constraints, and a decoder that leverages both side information (syndrome) and noisy channel output (Muramatsu et al., 2016). This strictly asymmetric access enables the transmitter to solve simpler constraint satisfaction, while the receiver performs sophisticated Slepian–Wolf decoding, matching channel capacity with belief-propagation or MCMC decoding for sparse matrices.
  • Asymmetric encoder–decoder in sequence modeling: The Regularized Encoder–Decoder (RED) architecture and derivatives such as PALM ("Partial-Attention LLM") analyze the effect of architectural asymmetry in language modeling for sequence-to-sequence tasks (Fu et al., 2023). A decoder-only LLM emulated within an encoder–decoder form reveals that unidirectional cross-attention leads to "attention degeneration," with target-side representations focusing less on source tokens as the sequence grows. Augmenting with partial-attention layers that only attend to source representations restores stable sensitivity and mitigates hallucination and early-stop effects.
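The division of labor in the first bullet can be illustrated with a toy syndrome scheme: the encoder computes only a cheap hash of the source block, while the decoder does the heavy search for the word consistent with both the hash and its noisy side information. The parity-check matrix and sizes below are toy assumptions, not the construction from the paper:

```python
# Toy illustration of encoder/decoder asymmetry with decoder side
# information: the transmitter computes a cheap syndrome (hash) under a
# small parity-check matrix; the receiver searches for the lowest-weight
# correction consistent with the syndrome and its noisy observation.
from itertools import combinations

H = [  # 4 x 8 binary parity-check matrix; columns are distinct and
       # nonzero, so a single-bit discrepancy is uniquely identifiable.
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
]

def syndrome(x):
    """Cheap encoder-side hash: H @ x over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)

def decode(y, s, max_flips=2):
    """Heavy decoder-side search: find the lowest-weight set of flips
    that makes the side information y match the transmitted syndrome s."""
    n = len(y)
    for w in range(max_flips + 1):
        for flips in combinations(range(n), w):
            cand = list(y)
            for i in flips:
                cand[i] ^= 1
            if syndrome(cand) == s:
                return cand
    return None

x = [1, 0, 1, 1, 0, 0, 1, 0]   # transmitted source block
s = syndrome(x)                 # 4-bit syndrome sent over the channel
y = list(x); y[5] ^= 1          # decoder's noisy side information
print(decode(y, s) == x)        # -> True
```

The encoder's work is a handful of parity sums; all the combinatorial effort sits on the receiver side, mirroring the asymmetric information access described above.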

6. Asymmetry in Modern Speech Separation and Task-Specific Models

Advanced speech separation architectures, such as SepRe ("Separate and Reconstruct"), employ not only architectural but also operational asymmetry (Shin et al., 2024):

  • The encoder processes the input mixture into a latent space, which is then split into $J$ channels (one per speaker) early in the pipeline.
  • The decoder applies the same weight-shared (Siamese) reconstruction module to each channel, augmented by cross-speaker transformers that allow residual interaction among speakers at each time frame.
  • In addition, Global and Local Transformer blocks operate in the decoder, replacing the prior dual-path chunking with direct, efficient long-sequence modeling.
  • The separation objective (SI-SNR with PIT) is enforced at intermediate and final layers, biasing the entire asymmetric structure toward early source discriminability and parameter efficiency.

This design obviates the need for large, symmetric, late-split decoders and achieves state-of-the-art SI-SNR with reduced model size and computational cost, by exploiting early discrimination and modular, weight-shared decoding.
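The separation objective mentioned above can be sketched in a few lines of NumPy; the function names are illustrative:

```python
# Sketch of the SI-SNR objective with permutation-invariant training (PIT),
# as used to supervise the split decoder channels. Function names are
# illustrative, not from the paper's code.
import itertools
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    est = est - est.mean()
    ref = ref - ref.mean()
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps) + eps)

def pit_si_snr(ests, refs):
    """Best average SI-SNR over all speaker-to-output permutations."""
    J = len(refs)
    best, best_perm = -np.inf, None
    for perm in itertools.permutations(range(J)):
        score = np.mean([si_snr(ests[j], refs[p]) for j, p in enumerate(perm)])
        if score > best:
            best, best_perm = score, perm
    return best, best_perm

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
# Estimates arrive in swapped order with a little noise.
est1 = s2 + 0.01 * rng.standard_normal(1000)
est2 = s1 + 0.01 * rng.standard_normal(1000)
score, perm = pit_si_snr([est1, est2], [s1, s2])
print(perm)   # -> (1, 0): PIT matches each output to the right speaker
```

Because the loss is permutation-invariant, the weight-shared decoder branches are free to assign speakers to channels in any order, and the objective simply scores the best assignment.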

7. Design Rationales, Trade-offs, and Performance Profiles

A summary table contrasting key asymmetric encoder–decoder realizations:

| Domain | Encoder | Decoder | Performance/Trade-off |
|---|---|---|---|
| Lossless compression (Yamamoto et al., 16 Jan 2026) | State recursion, backward-order | Prefix tree, forward-order | $O(1/N)$ redundancy, flexible code |
| Semantic segmentation (Chen et al., 2018; Wang et al., 2019) | Deep CNN/ASPP (most params, FLOPs) | Shallow, boundary-focused (few params/FLOPs) | $>70$ mIoU, real-time, efficient |
| Image compression (Wang et al., 2024) | Transformer-CNN hybrid, hyperprior | Channel-pruned transformer, simplified prior | VVC-level BD-rate, $51.47$ GMACs decode |
| Speech separation (Shin et al., 2024) | Early speaker split, multi-stage | Siamese, cross-speaker, multi-loss | SoTA SI-SNR, eliminates chunking |
| FL, multi-task (Zhou et al., 14 Apr 2025) | Shared across tasks | Task-specific, local/global | $>12\%$ mIoU gain vs. local, flexible |

The central trade-offs include shifting complexity to devices with greater computational affordance, precision control over context recovery or discrimination, and improved deployability for diverse environments (e.g., edge/mobile, federated, cross-domain).

References

  • "Asymmetric Encoding-Decoding Schemes for Lossless Data Compression" (Yamamoto et al., 16 Jan 2026)
  • "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" (Chen et al., 2018)
  • "LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation" (Wang et al., 2019)
  • "AsymLLIC: Asymmetric Lightweight Learned Image Compression" (Wang et al., 2024)
  • "Multi-task Federated Learning with Encoder-Decoder Structure: Enabling Collaborative Learning Across Different Tasks" (Zhou et al., 14 Apr 2025)
  • "Construction of a Channel Code from an Arbitrary Source Code with Decoder Side Information" (Muramatsu et al., 2016)
  • "Decoder-Only or Encoder-Decoder? Interpreting LLM as a Regularized Encoder-Decoder" (Fu et al., 2023)
  • "Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation" (Shin et al., 2024)
