EKDC-Net: Expert-Guided Calibration for Fine Classification

Updated 30 January 2026
  • The paper introduces a dual-module approach that integrates CAM-based local knowledge extraction with uncertainty-guided decision calibration.
  • EKDC-Net is a modular architecture that fuses data-driven and expert representations, significantly enhancing tree species identification accuracy.
  • The system achieves state-of-the-art performance with a Top-1 accuracy gain of +6.42% and robust improvements on long-tail and few-shot benchmark datasets.

The Expert Knowledge-Guided Classification Decision Calibration Network (EKDC-Net) is a modular architecture for fine-grained visual classification in domains exhibiting pronounced long-tail distributions and high inter-class similarity, with primary application to tree species identification. EKDC-Net introduces an external domain expert into the learning workflow, coupling data-driven representations from standard image classification backbones with knowledge-centric calibration, via a dual-module framework for knowledge extraction and uncertainty-aware fusion. The system demonstrates significant improvements over backbone-only and conventional fusion approaches, establishing state-of-the-art results on the large-scale CU-Tree102 dataset as well as auxiliary challenging benchmarks (Long et al., 23 Jan 2026).

1. Network Architecture and Workflow

EKDC-Net is designed as a lightweight "plug-and-play" add-on to conventional image classifiers (ResNet-50, Swin, ViT, etc.). Its processing pipeline comprises three principal stages:

  1. Backbone Feature and Logit Extraction: The input image $\mathcal{I}\in\mathbb{R}^{H\times W\times 3}$ is processed by a backbone to extract multi-scale feature maps $F_b=\{f_b^1,f_b^2,f_b^3,f_b^4\}$, where $f_b^l\in\mathbb{R}^{C_l\times H_l\times W_l}$. Initial class logits $z_b\in\mathbb{R}^K$ are also generated, with $K$ the number of species.
  2. Local Prior–Guided Knowledge Extraction Module (LPKEM): Uses the backbone's feature maps and logits (plus the original image) to compute Class Activation Maps (CAMs) that spatially highlight discriminative regions and filter out background. CAM-derived binary masks are applied to a frozen vision transformer expert (BioCLIP2), which processes only the foreground token sequences to output expert-level feature representations $F_e$. These are aggregated into expert logits $z_e\in\mathbb{R}^K$ via an MLP.
  3. Uncertainty-Guided Decision Calibration Module (UDCM): Integrates information from both the backbone and the expert by quantifying class-level and instance-level uncertainties for each. These are concatenated and projected to generate a bin-based distribution over calibration weights, yielding a soft blending coefficient $\lambda\in[0,2]$ for adaptive logit fusion: $\hat z = z_b + \lambda\,z_e$.

2. Local Prior–Guided Knowledge Extraction Module (LPKEM)

LPKEM operationalizes expert knowledge extraction and grounding via three sub-routines:

  • CAM-Based Local Prior:

Pseudo-label selection is performed via $\hat c = \arg\max(z_b)$. For every scale $l$ and channel $k$,

$$\alpha_k^l = \mathrm{GAP}\left(\frac{\partial y^{\hat c}}{\partial f_b^{l,k}}\right)$$

and

$$h_l(x,y) = \mathrm{ReLU}\left(\sum_{k=1}^{C_l} \alpha_k^l\, f_b^{l,k}(x,y)\right)$$

producing scale-specific activation maps $h_l$.

  • Binary Mask Generation:

Each $h_l$ is upsampled to the foundation expert's token grid, giving $\tilde h_l$, and binarized via median thresholding:

$$m_l^{(i,j)}= \begin{cases} 1, & \tilde h_l^{(i,j)}\ge\mathrm{Median}(\tilde h_l) \\ 0, & \text{otherwise} \end{cases}$$

  • Expert Feature Extraction:

Masked token sequences are fed into the frozen BioCLIP2 expert to obtain $f_e^l$, while the unmasked full image yields $f_e^0$. Aggregation is performed as:

$$z_e = \mathrm{MLP}\left([f_e^0, f_e^1, \dots, f_e^4]\right)$$

This module explicitly localizes expert attention, suppressing background distractions and enhancing key discriminative features.
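The CAM and mask steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`grad_cam`, `upsample_nearest`, `median_mask`), the nearest-neighbour upsampling, and the toy 2×2 inputs are all hypothetical, and the gradients are taken as given rather than computed by backpropagation.

```python
from statistics import median

def grad_cam(features, grads):
    """Activation map for one scale.

    features, grads: nested lists of shape C x H x W, where grads[k] holds
    the gradient of the pseudo-label score with respect to channel k.
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    # alpha_k: global average pooling of the gradient of channel k
    alphas = [sum(sum(row) for row in grads[k]) / (H * W) for k in range(C)]
    # h(x, y) = ReLU(sum_k alpha_k * f_k(x, y))
    return [[max(0.0, sum(alphas[k] * features[k][x][y] for k in range(C)))
             for y in range(W)] for x in range(H)]

def upsample_nearest(h, out_h, out_w):
    """Nearest-neighbour resize to the expert's token grid (an assumption)."""
    in_h, in_w = len(h), len(h[0])
    return [[h[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def median_mask(h_up):
    """1 where the upsampled activation is >= its median, else 0."""
    thr = median(v for row in h_up for v in row)
    return [[1 if v >= thr else 0 for v in row] for row in h_up]

# Toy input: two channels on a 2x2 grid, with constant gradients +1 and -1.
features = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
h = grad_cam(features, grads)
mask = median_mask(upsample_nearest(h, 4, 4))
```

In this toy case only the bottom half of the 4×4 token grid is kept. Because the threshold is inclusive, a map whose median is zero after the ReLU would keep every token, so a practical implementation may need to handle that case separately.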

3. Uncertainty-Guided Decision Calibration Module (UDCM)

UDCM addresses the fusion of backbone and expert output via a dual-uncertainty mechanism:

  • Class-Level Uncertainty:

Encoded in a learnable embedding $W_{\mathrm{cls}}\in\mathbb{R}^K$ initialized to represent class difficulty, using:

$$\tilde\mu_c = 1 - \frac{N_c - N_{\min}}{N_{\max}-N_{\min}}, \quad w_c = (\tilde\mu_c)^\beta$$

For both the backbone and the expert, the difficulty weights of the top-3 classes by logit are extracted, giving $u_*^c\in\mathbb{R}^3$.

  • Instance-Level Uncertainty:

For softmax probabilities $p_*=\mathrm{Softmax}(z_*)$,

$$u_*^i = \left[-\sum_{c=1}^K p_{*,c}\log p_{*,c},\;\max_c p_{*,c}\right]$$

  • Calibration Coefficient and Fusion:

Concatenate class and instance uncertainty: $u_*=[u_*^c,\,u_*^i]\in\mathbb{R}^5$ (for backbone and expert). These are passed through MLPs and a bin-classifier to yield soft calibration weights:

$$\lambda = \sum_{i=1}^N s_i b_i\quad (s_i = \mathrm{Softmax}(\cdots)_i)$$

Logit fusion then proceeds by $\hat z = z_b + \lambda\,z_e$.

This adaptive scheme dynamically allocates trust between the backbone and expert based on both prior class difficulty and sample-level prediction entropy.
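The uncertainty features and the bin-weighted fusion can be illustrated with a short plain-Python sketch. Several pieces are assumptions made only for illustration: the bin values $b_i$ are taken to be evenly spaced over $[0,2]$, the bin scores are a placeholder for the output of the MLPs and bin-classifier (which are not reproduced here), and all function names and toy values are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def instance_uncertainty(z):
    """[entropy, max probability] of the softmax distribution."""
    p = softmax(z)
    entropy = -sum(pc * math.log(pc) for pc in p if pc > 0.0)
    return [entropy, max(p)]

def top3_difficulty(z, w_cls):
    """Difficulty weights of the three classes with the largest logits."""
    top3 = sorted(range(len(z)), key=lambda c: z[c], reverse=True)[:3]
    return [w_cls[c] for c in top3]

def calibration_coefficient(bin_scores, bin_centers):
    """lambda = sum_i s_i * b_i with s = softmax(bin_scores)."""
    s = softmax(bin_scores)
    return sum(si * bi for si, bi in zip(s, bin_centers))

def fuse(z_b, z_e, lam):
    """z_hat = z_b + lambda * z_e, element-wise."""
    return [b + lam * e for b, e in zip(z_b, z_e)]

N_BINS = 8
# Assumption: bin values evenly spaced over [0, 2].
bin_centers = [2.0 * i / (N_BINS - 1) for i in range(N_BINS)]

z_b = [0.1, 2.0, -1.0, 0.5]    # toy backbone logits
z_e = [0.0, 1.0, 0.0, 1.5]     # toy expert logits
w_cls = [0.9, 0.2, 0.7, 0.4]   # toy class difficulty weights

u_b = top3_difficulty(z_b, w_cls) + instance_uncertainty(z_b)  # 5 values
bin_scores = [0.0] * N_BINS    # placeholder for the learned bin-classifier
lam = calibration_coefficient(bin_scores, bin_centers)
z_hat = fuse(z_b, z_e, lam)
```

With uniform bin scores the coefficient falls at the midpoint of the range, so the expert logits are added at unit weight. A trained bin-classifier would shift the softmax mass toward lower or higher bins depending on the uncertainty features.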

4. Training Strategy and Optimization

EKDC-Net is optimized end-to-end except for the frozen expert, using the cross-entropy objective:

$$\begin{aligned} \mathcal{L}_b &= \mathrm{CE}(z_b,y) \\ \mathcal{L}_e &= \mathrm{CE}(z_e,y) \\ \mathcal{L}_{\mathrm{total}} &= \mathcal{L}_b + \mathcal{L}_e + \mathrm{stopgrad}(\lambda)\,\mathrm{CE}(\hat z,y) \end{aligned}$$

Stop-gradient is applied to $\lambda$ to prevent trivial collapse. All trainable components (mask projections, MLPs, uncertainty predictors) are updated jointly using SGD with learning rate 0.0005 over 100 epochs. The bin count is set to $N=8$ and the smoothing parameter to $\beta=2.0$.
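As a numerical illustration of the objective, the sketch below evaluates the three loss terms for a single sample in plain Python. It only computes values: there is no automatic differentiation here, so the stop-gradient is mimicked by treating the multiplier as a constant. In an autograd framework the multiplier would be detached instead (for example with `Tensor.detach()` in PyTorch). Whether the $\lambda$ inside $\hat z$ is also detached is not stated above, so the sketch makes no claim about it. The function names and toy logits are hypothetical.

```python
import math

def cross_entropy(z, y):
    """Cross-entropy of logits z against the integer label y."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[y]

def total_loss(z_b, z_e, lam, y):
    """L_b + L_e + lam * CE(z_hat, y), with lam used as a constant weight."""
    z_hat = [b + lam * e for b, e in zip(z_b, z_e)]
    l_b = cross_entropy(z_b, y)
    l_e = cross_entropy(z_e, y)
    l_fused = cross_entropy(z_hat, y)
    return l_b + l_e + lam * l_fused

# Toy two-class example with uninformative logits: every term equals log 2.
loss = total_loss([0.0, 0.0], [0.0, 0.0], lam=1.0, y=0)
```

Since the cross-entropy is non-negative, a gradient flowing through the multiplier could reduce the weighted term simply by shrinking $\lambda$. This is one plausible reading of the "trivial collapse" that the stop-gradient is meant to prevent.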

5. Dataset Design: CU-Tree102 and Evaluation Protocols

The CU-Tree102 dataset comprises 9,134 expert-curated images spanning $K=102$ tree species, split into train/val/test sets (80%/10%/10%). Samples are drawn from real outdoor sources, ensuring coverage and challenging ambiguous cases. CU-Tree102 features pronounced class imbalance (the largest class: 286 samples; smallest: 11), reflecting real-world frequency. Two auxiliary datasets evaluate generalizability and long-tail robustness: RSTree (8,324 samples, 23 classes, severe tail) and Jekyll (4,804 samples, 23 classes).
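To make the link between this imbalance and the class-level difficulty initialization in UDCM concrete, the sketch below applies the $\tilde\mu_c$ and $w_c$ formulas to the reported extremes. It assumes that $N_c$ denotes the per-class sample count, which is not stated explicitly here; the intermediate count of 100, the function name, and the handling of the equal-count case are hypothetical.

```python
def difficulty_weights(counts, beta=2.0):
    """w_c = (1 - (N_c - N_min) / (N_max - N_min)) ** beta for each class."""
    n_min, n_max = min(counts), max(counts)
    span = n_max - n_min
    if span == 0:
        # Assumption: with perfectly balanced classes no class is singled out.
        return [0.0] * len(counts)
    return [(1.0 - (n - n_min) / span) ** beta for n in counts]

# 286 and 11 are the reported largest and smallest class sizes;
# 100 is a made-up intermediate class for illustration.
weights = difficulty_weights([286, 100, 11], beta=2.0)
```

Under this reading the most frequent class starts with weight 0 and the rarest with weight 1. With $\beta=2.0$, mid-frequency classes are pulled toward the low end (about 0.46 for the hypothetical 100-sample class).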

6. Experimental Results and Comparative Analysis

  • Performance Gains:

EKDC-Net consistently improves over backbone-only and FGVC-specific methods, yielding a Top-1 accuracy gain of +6.42% and a macro-F1 improvement of +11.93% over baselines. The greatest improvements are observed in tail classes and few-shot regimes.

  • Ablation and Fusion Paradigms:

Adding expert features without mask or calibration provides moderate gains; adding LPKEM alone with naïve fusion provides minor additional benefit. Full LPKEM+UDCM realizes the best performance (e.g. CGL backbone: 81.47% $\rightarrow$ 86.65% accuracy). Standard feature/logit-level fusion methods saturate at $\lesssim 81\%$ accuracy, whereas UDCM outperforms them by $\sim 5\%$.

  • Robustness on RSTree:

Standard backbones collapse in the extreme long-tail setting (macro-F1 $\approx 6.6\%$), whereas EKDC-Net achieves macro-F1 $\approx 49.6\%$ and Top-1 accuracy $\approx 87.8\%$. Macro-F1 improves by $\gtrsim 170\%$ in most imbalanced settings.

  • Generalization (Jekyll):

CU-Tree102-pretrained models drop to $\lesssim 30\%$ accuracy on Jekyll without fine-tuning. EKDC-Net elevates results to the 39–59% range, an average relative gain of +30.56%.

  • Parameters and Efficiency:

The system introduces only 0.08M additional parameters.

7. Technical Significance and Applicability

EKDC-Net advances fine-grained classification by tightly integrating region-wise localization via CAMs, leveraging foundation-model priors, and blending outputs using explicit uncertainty modeling. This approach effectively mitigates biases and performance collapse in imbalanced scenarios and visually ambiguous cases. The lightweight, modular nature makes it suitable for deployment across diverse backbones and datasets. A plausible implication is that expert-guided, uncertainty-calibrated fusion can generalize to other domains suffering from data scarcity, long-tail distributions, and confounding similar-class noise.

The CU-Tree102 dataset and reference implementation are publicly available, facilitating further benchmarking and adaptation (Long et al., 23 Jan 2026).

