Quantized DETR (Q-DETR) Overview

Updated 12 January 2026
  • The paper introduces a quantization method for DETR that uses information-theoretic and Fisher-based criteria to preserve accuracy with low-bit representations.
  • It leverages distribution rectification distillation and foreground-aware query matching to align quantized queries with the full-precision teacher model.
  • The approach employs Fisher-aware mixed precision and a two-stage training process with QAT and Fisher-trace regularization, significantly reducing model size and computation.

Quantized DETR (Q-DETR) denotes a family of methods that enable efficient, low-bit quantization of the Detection Transformer (DETR) architecture and its variants, targeting significant reductions in computation and memory requirements while minimizing performance degradation. Core challenges center on information loss in quantized attention and regression modules, as well as disparate quantization sensitivities across category labels. Modern Q-DETR approaches employ information-theoretic and Fisher information-based criteria to adaptively allocate precision and stabilize key representations, thereby achieving state-of-the-art accuracy under tight quantization constraints (Xu et al., 2023, Yang et al., 2024).

1. Quantization Challenges in DETR-based Architectures

The DETR framework introduces end-to-end object detection using transformers, with a sequence of object queries interacting via cross-attention with image-encoded features. Standard quantization-aware training (QAT) of DETR typically yields substantial accuracy drops, especially at ≤4-bit precision. This degradation is most severe in the decoder's attention layers due to their pivotal role in maintaining the fidelity of object query distributions. Empirically, quantized queries exhibit distorted distributions—misaligning attention maps and leading to substantial mean average precision (mAP) reductions. The impact is highly non-uniform, with certain object categories (e.g., persons, animals) demonstrating heightened sensitivity to quantization due to sharper local loss landscapes and increased risk of overfitting post-QAT (Xu et al., 2023, Yang et al., 2024).
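This attention-map distortion can be illustrated with a self-contained toy experiment (an illustrative sketch, not code from the papers): quantizing stand-in queries and keys to 4 bits perturbs the resulting softmax attention map measurably more than 8-bit quantization does.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fake_quant(x, bits):
    """Symmetric uniform fake-quantization with max-abs scaling."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 64))        # stand-in object queries
k = rng.normal(size=(100, 64))       # stand-in encoder features
attn_fp = softmax(q @ k.T / 8.0)     # full-precision cross-attention map
attn_q4 = softmax(fake_quant(q, 4) @ fake_quant(k, 4).T / 8.0)
attn_q8 = softmax(fake_quant(q, 8) @ fake_quant(k, 8).T / 8.0)
kl = lambda p, r: float(np.mean(np.sum(p * np.log(p / r), axis=-1)))
```

KL divergence from the full-precision map serves here as a simple proxy for the query-distribution misalignment described above.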

2. Information Bottleneck Distillation and Distribution Rectification

A central innovation is the distribution rectification distillation (DRD) scheme grounded in the Information Bottleneck (IB) principle. Treating the real-valued DETR as a teacher and the quantized model as a student, the distillation objective is formalized as:

$$\min_{\theta^S} \; I(X; E^S) - \beta\, I(E^S, q^S ; y^{GT}) - \gamma\, I(q^S ; q^T)$$

where $I(\cdot;\cdot)$ denotes mutual information, $X$ is the input image, $E^S$ the student encoder output, $q^S$ and $q^T$ the student/teacher queries, and $y^{GT}$ the ground-truth detection set. The critical query alignment term is recast as a bi-level problem. At the inner level, the query distribution self-entropy $H(q^S)$ is maximized via Gaussian normalization:

$$q^{S*} = \frac{q^S - \mu(q^S)}{\sqrt{\sigma(q^S)^2 + \epsilon}} \cdot \gamma_q^S + \beta_q^S$$

with $\gamma_q^S$, $\beta_q^S$ trainable and $\epsilon = 10^{-5}$. At the upper level, a foreground-aware query matching (FQM) mechanism filters and aligns student and teacher queries based on high generalized IoU (GIoU) overlap with ground-truth boxes. The conditional entropy minimization is operationalized via $\ell_2$ matching of co-attended features:

$$L_{DRD} = \mathbb{E}_j \left[ \| D_j^{S*} - D_{\pi(j)}^T \|_2 \right]$$

The final training objective is the sum of the standard DETR loss and the DRD penalty (Xu et al., 2023).
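The DRD components can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' implementation: plain IoU stands in for GIoU, the teacher matching $\pi$ is taken as the identity for brevity, and the overlap threshold `tau` is an assumed hyperparameter.

```python
import numpy as np

def rectify_queries(q_s, gamma, beta, eps=1e-5):
    """Inner level: Gaussian-normalize student queries to raise self-entropy."""
    mu = q_s.mean(axis=-1, keepdims=True)
    var = q_s.var(axis=-1, keepdims=True)
    return (q_s - mu) / np.sqrt(var + eps) * gamma + beta

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes (plain IoU as a stand-in for GIoU)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def drd_loss(feat_s, feat_t, boxes, gt_boxes, tau=0.5):
    """Upper level: keep queries whose box overlaps ground truth above tau,
    then take the mean l2 distance between student and teacher features."""
    keep = [j for j, b in enumerate(boxes)
            if max(iou(b, g) for g in gt_boxes) > tau]
    if not keep:
        return 0.0
    return float(np.mean(np.linalg.norm(feat_s[keep] - feat_t[keep], axis=-1)))

feat_s, feat_t = np.ones((2, 4)), np.zeros((2, 4))
boxes = [[0, 0, 1, 1], [5, 5, 6, 6]]           # query 1 is background
loss = drd_loss(feat_s, feat_t, boxes, gt_boxes=[[0, 0, 1, 1]])
```

Only the foreground query contributes to the distillation term; background queries are filtered out before matching, mirroring the FQM filtering step.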

3. Fisher-aware Mixed Precision and Critical-category Objectives

An alternative paradigm formulates quantization as a sensitivity assignment problem leveraging Fisher information. The Hessian of the loss with respect to network parameters is approximated by the diagonal Fisher information matrix. Per-layer sensitivity is quantified via the trace entries $I_{ii}$, accumulated as squared gradient norms over a calibration set. A mixed-precision allocation is then determined by solving the integer linear program:

$$\min_{Q_{1\dots L}} \sum_i \| q(w_i) - w_i \|_2^2 \, I_{ii} \quad \text{s.t.} \quad \sum_i Q_i \cdot \| w_i \|_0 \leq B$$

where $Q_i$ is the per-layer bit-width, $w_i$ the float weights, $q(w_i)$ the quantized weights, and $B$ the total bit budget. To address fine-grained application needs, a composite sensitivity metric

$$\mathcal{L}_\alpha(w) = \alpha\, \mathcal{L}_A(w) + (1-\alpha)\, \mathcal{L}_F(w)$$

weights the standard DETR loss $\mathcal{L}_A$ against a critical-category collapse loss $\mathcal{L}_F$, in which all non-critical class logits are merged so that the optimization protects task-relevant categories (Yang et al., 2024).
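A minimal sketch of the allocation step, assuming a greedy relaxation in place of the exact ILP solver and precomputed per-layer Fisher traces (the bit-width `choices` are an assumed search space):

```python
import numpy as np

def quant_error(w, bits):
    """Squared l2 error of symmetric max-abs uniform quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return float(np.sum((np.round(w / scale).clip(-qmax, qmax) * scale - w) ** 2))

def allocate_bits(weights, fisher, budget_bits, choices=(2, 3, 4, 8)):
    """Greedy relaxation of the ILP: start each layer at the lowest bit-width
    and repeatedly upgrade the layer with the best Fisher-weighted error
    reduction per extra bit until the budget is exhausted."""
    bits = [min(choices)] * len(weights)
    used = sum(b * w.size for b, w in zip(bits, weights))
    cost = lambda i, b: fisher[i] * quant_error(weights[i], b)
    while True:
        best, best_gain = None, 0.0
        for i, w in enumerate(weights):
            higher = [b for b in choices if b > bits[i]]
            if not higher:
                continue
            nb = min(higher)
            extra = (nb - bits[i]) * w.size
            if used + extra <= budget_bits:
                gain = (cost(i, bits[i]) - cost(i, nb)) / extra
                if gain > best_gain:
                    best, best_gain = (i, nb, extra), gain
        if best is None:
            return bits
        i, nb, extra = best
        bits[i], used = nb, used + extra

rng = np.random.default_rng(0)
layers = [rng.normal(size=100), rng.normal(size=100)]
plan = allocate_bits(layers, fisher=[10.0, 0.1], budget_bits=600, choices=(2, 4))
# the Fisher-sensitive first layer receives the higher bit-width
```

The greedy upgrade rule captures the ILP's trade-off (sensitivity-weighted distortion versus bit cost) without an integer-programming dependency.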

4. Training Regimes: Quantization-aware Training and Fisher-trace Regularization

A two-stage training protocol is employed. Initially, per-layer precisions are allocated using the Fisher-aware scheme (either on overall or critical loss). Subsequently, quantization-aware training (QAT) fine-tunes the entire network, applying the quantizer in each forward pass and propagating gradients using a straight-through estimator. To further suppress overfitting and enhance generalization on critical classes, Fisher-trace regularization is introduced during QAT:

$$\min_w \; \mathcal{L}_A(q(w)) + \lambda\, \mathrm{Tr}\left[ \mathrm{Fisher}\left(\mathcal{L}_F(q(w))\right) \right]$$

Here, the trace is computed diagonally, and $\lambda$ is annealed from low to high over 50 epochs (e.g., from $1 \times 10^{-3}$ to $5 \times 10^{-3}$), first permitting the loss to converge and then penalizing sharp minima by flattening the Fisher landscape. Empirical results demonstrate that this schedule curbs late-stage overfitting without sacrificing overall mAP (Yang et al., 2024).
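A minimal sketch of the annealed objective, assuming a linear schedule for $\lambda$ between the quoted endpoints (the exact schedule shape is an assumption) and the squared-gradient approximation of the diagonal Fisher trace:

```python
import numpy as np

def lam_schedule(epoch, total=50, lam_lo=1e-3, lam_hi=5e-3):
    """Linearly anneal the Fisher-trace weight lambda over training."""
    t = min(epoch / (total - 1), 1.0)
    return lam_lo + t * (lam_hi - lam_lo)

def fisher_trace(grads):
    """Diagonal Fisher trace, approximated by summed squared gradients."""
    return float(sum(np.sum(g ** 2) for g in grads))

def regularized_loss(task_loss, critical_grads, epoch):
    """QAT objective: task loss plus the annealed Fisher-trace penalty
    computed on the critical-category loss gradients."""
    return task_loss + lam_schedule(epoch) * fisher_trace(critical_grads)
```

Early epochs apply almost no penalty, so the detector can first fit the data; the growing penalty then discourages sharp minima around the critical-class loss.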

5. Implementation and Engineering Considerations

Implementation employs symmetric per-layer uniform quantization, with each linear and attention module quantized as a group and biases/LayerNorm left in floating point. In Q-DETR (as specified in (Xu et al., 2023)), the LSQ+ quantizer with channel-wise scaling is favored, and attention activations are kept at 8 bits for stability. Both DETR and its improved variants (Deformable DETR, DAB-DETR, SMCA-DETR) are supported. Training schedules typically involve hundreds of epochs, with final Q-DETR models 4–16× smaller and 6–16× cheaper to compute than their full-precision counterparts. Only weights are quantized at mixed precision; activations can optionally be group-quantized but are not the focus of the core methodology. No clamping is performed in weight scaling; the quantizer scale is simply the per-layer maximum absolute value.
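The weight-quantization scheme described above (per-layer max-abs scale, biases and LayerNorm kept in float) can be sketched as follows; the `bit_plan` mapping and the name-based parameter filtering are assumptions of this sketch, not the papers' API:

```python
import numpy as np

def quantize_layer(w, bits):
    """Symmetric per-layer uniform quantization; the scale is the per-layer
    maximum absolute value, with no additional clamping."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax), scale

def quantize_model(params, bit_plan, default_bits=8):
    """Quantize weight tensors per the bit plan; biases and LayerNorm
    parameters stay in floating point."""
    out = {}
    for name, w in params.items():
        if "bias" in name or "norm" in name:
            out[name] = w                        # left in floating point
        else:
            q, s = quantize_layer(w, bit_plan.get(name, default_bits))
            out[name] = q * s                    # dequantize for simulation
    return out

rng = np.random.default_rng(0)
params = {"attn.weight": rng.normal(size=(8, 8)), "attn.bias": rng.normal(size=8)}
deq = quantize_model(params, bit_plan={"attn.weight": 4})
```

At 4 bits, each quantized weight tensor collapses onto at most 15 distinct levels, which makes the sensitivity of attention layers discussed earlier unsurprising.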

6. Empirical Performance and Ablations

Consistent empirical findings across benchmarks (COCO, VOC) show competitive or superior accuracy at extreme quantization. For instance, with DETR-R50 on COCO, the 4-4-8 Q-DETR achieves 39.4% AP (versus 42.0% for the real-valued model) with a 6.6× speedup and an 8× smaller model (Xu et al., 2023). Fisher-aware critical-category mixed precision improves person mAP in Deformable DETR on COCO Panoptic from 28.9% (uniform 4-bit) to 43.1%. QAT combined with Fisher-trace regularization further raises person mAP from 35.56% to 35.75%, while maintaining overall mAP at ≈37%. Ablation studies reveal that the combination of distribution alignment and foreground-aware query matching is essential for best accuracy, and that a carefully annealed Fisher-trace penalty is required to prevent overfitting on critical classes (Yang et al., 2024).

Model            | Quantization | Model Size (MB) | Ops (G) | AP   | Speedup
DETR-R50 (FP32)  | 32-32-32     | 159.3           | 85.5    | 42.0 | 1.0×
Q-DETR (4-4-8)   | 4-4-8        | 19.9            | 13.0    | 39.4 | 6.6×
Q-DETR (3-3-8)   | 3-3-8        | 15.0            | 7.6     | –    | 11.2×
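The quoted compression and speedup factors follow directly from the table entries:

```python
# Figures from the table above (MB for size, G for operations)
fp32_size, fp32_ops = 159.3, 85.5      # DETR-R50 full precision
q448_size, q448_ops = 19.9, 13.0       # Q-DETR 4-4-8
size_ratio = fp32_size / q448_size     # about 8.0x smaller
ops_ratio = fp32_ops / q448_ops        # about 6.6x fewer operations
```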

In the hardest COCO Panoptic classes, Fisher-aware mixed-precision quantization and Fisher-trace QAT raise critical-class mAP by +10–14% absolute over uniform-precision quantization (Yang et al., 2024).

7. Best Practices and Research Significance

For practitioners, including a critical-class loss in the sensitivity metric is recommended for tasks with label imbalance or application-specific objectives. Diagonal Fisher-based sensitivity estimation is advocated for scalability to large transformer models. A two-stage pipeline (first solving the Fisher ILP for bit-width selection, then running QAT with Fisher-trace regularization) is found to retain >90% of floating-point mAP in most scenarios while drastically reducing model size. Quantizing the final classification/regression heads by post-training quantization alone is discouraged; they should remain in floating point during PTQ and be quantized under QAT.
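As a minimal end-to-end illustration of the QAT stage, the straight-through estimator can be exercised on a toy least-squares model (a sketch under simplifying assumptions, not the papers' training code):

```python
import numpy as np

def fake_quant(w, bits):
    """Symmetric uniform fake-quantization with max-abs scaling."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    return np.round(w / scale).clip(-qmax, qmax) * scale

# Fit a toy linear model under 4-bit weights: the forward pass uses the
# quantized weights, while the straight-through estimator applies the
# gradient to the latent float weights as if quantization were the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = X @ rng.normal(size=8)
w = np.zeros(8)
for _ in range(200):
    w_q = fake_quant(w, 4)                       # quantize on the forward pass
    grad = 2 * X.T @ (X @ w_q - y) / len(X)      # gradient at the quantized point
    w -= 0.05 * grad                             # STE: step the float weights
mse = float(np.mean((X @ fake_quant(w, 4) - y) ** 2))
```

The residual error reflects the 4-bit quantization floor rather than a failure to fit, which is the behavior QAT relies on in the full detector.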

These approaches collectively advance quantized object detection, enabling DETR-based models to operate on resource-constrained devices without severe deterioration of detection accuracy, especially for critical categories (Xu et al., 2023, Yang et al., 2024).
