
TaxoNet: Embedding-Based Plant Taxonomy

Updated 29 December 2025
  • TaxoNet is a domain-specialized embedding-based encoder for plant taxonomy that addresses extreme class imbalance, fine-grained morphological differences, and open-set recognition challenges.
  • It employs a dual-margin penalization loss with norm-guided and power-based regularization to enhance learning signals from both rare and common taxa.
  • Built on a ResNet–101 backbone and validated on diverse ecological datasets, TaxoNet demonstrates superior macro recall and robust performance under realistic, open-world conditions.

TaxoNet is a domain-specialized embedding-based encoder designed for plant-level ecological taxonomy under realistic, open-world conditions. It directly addresses the intertwined issues of extreme class imbalance (long-tailed taxonomic distribution), fine-grained morphological distinction, test-time spatiotemporal domain shift, and the requirement for open-set recognition in ecological monitoring. TaxoNet employs a dual-margin penalization loss to enhance learning signals from rare, underrepresented taxa while alleviating overrepresentation bias, facilitating scalable, robust taxonomic classification using high-resolution visual data from ecological and citizen-science sources (Low et al., 22 Dec 2025).

1. Architectural Design and Embedding Strategy

TaxoNet is instantiated as a purely vision-based model, featuring a ResNet–101 encoder backbone pretrained on ImageNet. Each input photograph $x \in \mathbb{R}^{H \times W \times 3}$ (with $H = 614$, $W = 512$) is transformed into a $d$-dimensional feature vector $\hat{x} = \varphi(x) \in \mathbb{R}^d$, where $d = 2{,}048$ denotes the embedding dimensionality. A set of learnable prototype weight vectors $\{w_j \in \mathbb{R}^d\}_{j=1}^c$ serves as class representatives for the $c$ taxa. Both embeddings and prototypes are $\ell_2$-normalized prior to cosine similarity logit computation:

$$z_j = \hat{w}_j^\top \hat{x},\quad \|\hat{w}_j\| = \|\hat{x}\| = 1.$$

TaxoNet operates solely on high-resolution images without incorporating language modules, multi-head attention, or transformer blocks. The architectural novelty is concentrated in the loss function and a tailored sample selection mechanism.
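The cosine-similarity logits described above can be illustrated with a minimal sketch in plain Python. The function names `l2_normalize` and `cosine_logits` are hypothetical, and the toy dimensionality ($d = 3$, $c = 2$) is chosen only for readability; this is not the authors' implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]

def cosine_logits(embedding, prototypes):
    """Return z_j = w_hat_j . x_hat for every class prototype w_j."""
    x_hat = l2_normalize(embedding)
    logits = []
    for w_j in prototypes:
        w_hat = l2_normalize(w_j)
        logits.append(sum(w * x for w, x in zip(w_hat, x_hat)))
    return logits

# Toy example: d = 3 instead of 2,048, and c = 2 prototypes.
embedding = [3.0, 4.0, 0.0]
prototypes = [[6.0, 8.0, 0.0], [0.0, 0.0, 2.0]]
logits = cosine_logits(embedding, prototypes)
# The first prototype is parallel to the embedding (cosine close to 1),
# the second is orthogonal to it (cosine close to 0).
```

Because both vectors are normalized, every logit lies in $[-1, 1]$ regardless of the raw feature magnitude, which is what lets a fixed additive margin have a consistent meaning across classes.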

2. Dual-Margin Penalization Loss

The learning objective extends additive-margin softmax (AM-Softmax) by differentiating between within-class (ground-truth) and between-class (non-target) margin penalization. Given a mini-batch of $N$ samples with true labels $y_i$, a fixed scale $s > 0$ ($s = 32$ or $64$), and per-class margins $m_j$, the loss is

$$\mathcal{L}_{\mathrm{TaxoNet}} = -\frac{1}{N}\sum_{i=1}^N \log \frac{\exp(s(z_{i, y_i} - m_{y_i}))}{\exp(s(z_{i, y_i} - m_{y_i})) + \sum_{k \neq y_i}\exp(s(z_{i, k} - m_k))}.$$

Margins are defined as follows:

  • Within-class (for $y_i$): $m_{y_i} = m + \Delta_{y_i}$
  • Between-class (for $k \neq y_i$): $m_k = \Delta_k$
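As a rough illustration of how the loss and the two margin types fit together, the sketch below evaluates the dual-margin objective for a small batch in plain Python. It assumes the class-relative margins $\Delta_j$ are already available as a list; the function name `dual_margin_loss` and the toy inputs are hypothetical, and the code is a sketch under these assumptions rather than the reference implementation.

```python
import math

def dual_margin_loss(logits, labels, margins, base_margin=0.15, scale=32.0):
    """Mean dual-margin AM-Softmax loss over a mini-batch.

    logits[i][j] : cosine similarity z_{i,j}
    labels[i]    : ground-truth class index y_i
    margins[j]   : class-relative margin Delta_j
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # Target class uses m + Delta_y; every other class k uses Delta_k.
        shifted = [
            scale * (z_k - (margins[k] + (base_margin if k == y else 0.0)))
            for k, z_k in enumerate(z)
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(shifted)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in shifted))
        total += log_norm - shifted[y]
    return total / len(labels)

# Toy batch: two samples, two classes (values chosen for illustration only).
logits = [[0.9, 0.1], [0.2, 0.7]]
labels = [0, 1]
plain = dual_margin_loss(logits, labels, [0.0, 0.0], base_margin=0.0)
with_margin = dual_margin_loss(logits, labels, [0.0, 0.0], base_margin=0.15)
```

With all margins set to zero the function reduces to scaled softmax cross-entropy; adding the base margin lowers the target logit and therefore raises the loss for the same predictions, which is the penalization effect the text describes.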

Here, the base margin $m > 0$ (e.g., $0.15$) imposes class compactness and separation. The class-relative margin

$$\Delta_j = \alpha(-\log \rho_j + \varepsilon)\,m$$

is a function of the empirical class prior $\rho_j = N_j / \sum_k N_k$, with hyperparameter $\alpha \in [0, 1]$ and $\varepsilon \ll 1$. Class priors $\rho_j$ are further smoothed into $\tilde{\rho}_j$ via class-balanced weighting (Low et al., 22 Dec 2025). To introduce non-linearity, a power-based scaling is applied:

$$\widetilde{\Delta}_j = -m\left(\frac{|\Delta_j|}{m}\right)^{\zeta(\gamma)},\quad \zeta(\gamma) = \log(1 + e^\gamma),$$

where $\gamma$ is learnable. The softplus $\zeta(\gamma)$ is always positive; the stated condition $\zeta(\gamma) > 1$ holds whenever $\gamma > \log(e - 1) \approx 0.54$. The final objective is

$$\mathcal{L} = \mathcal{L}_{\mathrm{TaxoNet}} + \lambda \mathcal{L}_{\mathrm{reg}},\quad \mathcal{L}_{\mathrm{reg}} = \sum_{j=1}^c(\Delta_j - \widetilde{\Delta}_j)^2,$$

with $\lambda$ as the regularization weight.
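The margin definitions and the regularizer can be traced numerically with a short sketch. The helper names, the class counts, and the values `alpha=0.5`, `eps=1e-6`, and `gamma=1.0` are assumptions made for illustration; the class-balanced smoothing of the priors is not reproduced here, so the raw empirical priors are used instead.

```python
import math

def class_relative_margins(counts, base_margin=0.15, alpha=0.5, eps=1e-6):
    """Delta_j = alpha * (-log(rho_j) + eps) * m, with rho_j = N_j / sum_k N_k."""
    total = sum(counts)
    return [alpha * (-math.log(n / total) + eps) * base_margin for n in counts]

def power_scaled_margins(deltas, gamma, base_margin=0.15):
    """Delta_tilde_j = -m * (|Delta_j| / m) ** zeta, with zeta = log(1 + e^gamma)."""
    zeta = math.log1p(math.exp(gamma))
    return [-base_margin * (abs(d) / base_margin) ** zeta for d in deltas]

def margin_regularizer(deltas, scaled_deltas):
    """L_reg = sum_j (Delta_j - Delta_tilde_j)^2."""
    return sum((d - s) ** 2 for d, s in zip(deltas, scaled_deltas))

# Hypothetical head, mid, and tail class sizes.
counts = [1000, 100, 1]
deltas = class_relative_margins(counts)
scaled = power_scaled_margins(deltas, gamma=1.0)
reg = margin_regularizer(deltas, scaled)
```

Rarer classes receive larger $\Delta_j$ because $-\log \rho_j$ grows as the prior shrinks. The full objective would add `reg`, multiplied by the weight $\lambda$, to the classification loss.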

Larger within-class margins for rare taxa amplify tail-class gradients, promoting tight prototype alignment. Between-class margins for head classes are tempered, restricting their dominance. The gradient norm from head-class samples on tail prototypes is provably bounded by $\exp(m_\mathrm{head} - m_\mathrm{tail})$, enforcing stability under severe imbalance.

3. Handling Long-Tailed, Fine-Grained, and Open-World Scenarios

TaxoNet’s dual-margin paradigm is explicitly constructed to address:

  • Long-tailed distributions: Margins grow with the negative log class prior, $-\log \rho_j$, which corrects head-class dominance. $\ell_2$ normalization further curtails logit variance among overrepresented taxa.
  • Fine-grained taxonomy: Explicit cosine-margin separation enhances discrimination of visually similar, taxonomically distinct taxa (e.g., Acer rubrum versus Acer saccharum), yielding compact, distinct clusters.
  • Open-set recognition: While trained in a $c$-way closed-set regime, the widened inter-prototype gaps enable effective thresholding for “unknown” taxa, attaining true-negative rates exceeding 90% at a 95% TPR.
  • Domain shift: Norm-guided sampling preferentially oversamples “hard” examples (those with low embedding norm), improving generalization to spatiotemporal shifts such as transferring models across distinct regions (e.g., AA-Central $\rightarrow$ AA-West/East).
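The norm-guided sampling idea in the last bullet can be sketched as follows. The source states only that low-norm examples are preferentially oversampled; the specific weighting used here (a softmax over negative raw feature norms with a `temperature` parameter) and the helper names are assumptions for illustration, not the published procedure. The norms are taken before $\ell_2$ normalization, since normalized embeddings all have norm 1.

```python
import math
import random

def norm_guided_weights(embeddings, temperature=1.0):
    """Sampling probabilities that grow as the raw embedding norm shrinks."""
    norms = [math.sqrt(sum(a * a for a in e)) for e in embeddings]
    # Assumed form: softmax over negative norms, so lower norm -> larger weight.
    scores = [-n / temperature for n in norms]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def oversample(indices, weights, k, seed=0):
    """Draw k sample indices with replacement according to the weights."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=weights, k=k)

# Toy raw features: the first example has a much smaller norm ("hard").
embeddings = [[0.1, 0.0], [3.0, 4.0], [1.0, 1.0]]
weights = norm_guided_weights(embeddings)
picked = oversample(list(range(len(embeddings))), weights, k=5)
```

Any monotonically decreasing function of the norm would serve the same purpose; the softmax form is used here only because it yields a valid probability distribution without extra bookkeeping.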

4. Ecological Datasets, Preprocessing, and Benchmarking

TaxoNet is systematically evaluated on three plant-centered datasets:

| Dataset | Scope & Taxa | Train/Val/Test Sizes | Imbalance |
|---|---|---|---|
| Google Auto-Arborist | N. America, Family/Genus | 23K / 7.8K / 2.6K | ~1000:1 (genus) |
| iNat-Plantae | iNaturalist Plantae, 682 species | 155K / 1.4K / 2K | 26.5:1 |
| NAFlora-Mini | N. Am. herbarium, 1,863 species | 45K / 1.7K / 9.3K | 10:1 |

Inputs are resized to $614 \times 512$ and augmented with random horizontal flips and AugMix. Models are trained for 30 epochs with the AdamW optimizer and linear learning rate annealing. Domain shift is explicitly assessed via cross-region transfer for Auto-Arborist.
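Linear learning rate annealing over the 30 training epochs can be written as a one-line schedule. The base rate `1e-3` and the final rate `0.0` below are placeholder assumptions, as the text does not state them, and the function name is hypothetical.

```python
def linear_annealed_lr(step, total_steps, base_lr=1e-3, final_lr=0.0):
    """Interpolate linearly from base_lr at step 0 to final_lr at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr + (final_lr - base_lr) * progress

# Per-epoch learning rates for a 30-epoch run (epoch 0 through epoch 30).
schedule = [linear_annealed_lr(epoch, 30) for epoch in range(31)]
```

In practice the schedule is often stepped per iteration rather than per epoch; the same function applies with `total_steps` set to the total number of optimizer updates.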

5. Quantitative Results and Ablation Analyses

TaxoNet demonstrates superior macro recall, especially on rare taxa, and maintains high rank-1 accuracy:

Dataset/Setting Baseline (CE) Macro Recall LDAM Macro Recall TaxoNet Macro Recall (R@1)
AA-Central/West/East 63.6/59.2/56.2 67.9/62.8/62.2 72.9/67.7/64.9
iNat-Plantae 89.9 (81.6%) 91.5 (83.2%)
NAFlora-Mini 90.0 (91.3%) 90.4 (91.5%)
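Macro recall, the headline metric in the table above, averages per-class recall so that every taxon counts equally regardless of how many test images it has. A minimal sketch, with a hypothetical function name and toy labels:

```python
from collections import defaultdict

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over classes present in y_true."""
    correct = defaultdict(int)
    support = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / support[c] for c in support) / len(support)

# Toy imbalanced test set: 8 images of a common taxon, 2 of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]
score = macro_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case overall accuracy is 0.9 while macro recall is 0.75, because the single error falls on the rare class and halves its recall. This sensitivity to tail classes is why macro recall is the natural metric for long-tailed taxonomies.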

Under domain shift (AA-Central $\rightarrow$ West/East), TaxoNet achieves recalls of 48.1%/40.5% (vs. LDAM’s 46.5%/39.9%). For open-set detection at 95% TPR, TaxoNet achieves a TNR of ~91% compared to LDAM’s ~89%.
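The open-set figure (TNR at 95% TPR) can be computed from confidence scores as sketched below. The sketch assumes each image is scored by its maximum cosine similarity to any prototype and that known-taxon images are the positive class; the scoring rule, the function name, and the toy scores are assumptions for illustration.

```python
import math

def tnr_at_tpr(known_scores, unknown_scores, target_tpr=0.95):
    """Return (TNR, threshold) where a sample is accepted if score >= threshold.

    The threshold is the lowest score that still accepts at least
    target_tpr of the known-class samples.
    """
    ranked = sorted(known_scores, reverse=True)
    # Smallest number of accepted known samples that reaches the target TPR.
    needed = math.ceil(target_tpr * len(ranked))
    threshold = ranked[needed - 1]
    rejected = sum(1 for s in unknown_scores if s < threshold)
    return rejected / len(unknown_scores), threshold

# Toy scores: 20 known-taxon images and 4 images of unseen taxa.
known = [i / 100 for i in range(50, 70)]
unknown = [0.10, 0.30, 0.505, 0.60]
tnr, threshold = tnr_at_tpr(known, unknown)
```

Here 19 of the 20 known images must be accepted, so the threshold sits at the second-lowest known score; three of the four unknown images fall below it and are correctly rejected.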

Ablation studies reveal:

  • Base margin only (no class-relative margins): recall = 63.9%
  • Dual-margin, no oversampling: 67.6%
    • Random oversampling: 67.6%
    • Norm-guided oversampling: 69.5%
    • Power-based regularization (full model): 72.9%

Norm-guided selection improves recall by about 1.9 percentage points over random oversampling (67.6% to 69.5%); power-based margin regularization contributes a further 3.4 points (69.5% to 72.9%).

6. Limitations and Prospective Directions

Failures commonly occur with incomplete morphological cues (e.g., “leaf-only” or “flower-only” images) or near-indistinguishable taxa (e.g., Opuntia polyacantha vs. O. cespitosa). TaxoNet’s training regime remains closed-set; dynamic open-world operation would necessitate on-the-fly novel-class discovery or integration with unsupervised clustering. Future research directions include the fusion of margin-based visual embeddings with multimodal (e.g., vision–language) reasoning to incorporate ecological context, the use of multi-view series (leaf, bark, fruit) with attention mechanisms, and the deployment of advanced adaptation strategies (meta-learning, adversarial invariance) for increased robustness in biodiversity monitoring.

7. Broader Significance and Outlook

TaxoNet establishes a new standard for fine-grained, long-tailed, and open-set plant classification, blending a conventional ResNet architecture with a dual-margin loss objective and a norm-guided sampling scheme. Its approach directly addresses the ecological need for scalable, reliable species monitoring under substantial real-world complexities. While demonstrating consistent improvements over leading baselines, TaxoNet also reveals the current limitations of general-purpose multimodal foundation models in specialized plant-domain applications. Future integration with multimodal and dynamic open-world classifiers is suggested as a route toward more holistic, adaptive ecological monitoring frameworks (Low et al., 22 Dec 2025).
