TaxoNet: Embedding-Based Plant Taxonomy

Updated 29 December 2025

TaxoNet is a domain-specialized embedding-based encoder for plant taxonomy that addresses extreme class imbalance, fine-grained morphological differences, and open-set recognition challenges.
It employs a dual-margin penalization loss with norm-guided and power-based regularization to enhance learning signals from both rare and common taxa.
Built on a ResNet-101 backbone and validated on diverse ecological datasets, TaxoNet demonstrates superior macro recall and robust performance under realistic, open-world conditions.

TaxoNet is a domain-specialized embedding-based encoder designed for plant-level ecological taxonomy under realistic, open-world conditions. It directly addresses the intertwined issues of extreme class imbalance (long-tailed taxonomic distribution), fine-grained morphological distinction, test-time spatiotemporal domain shift, and the requirement for open-set recognition in ecological monitoring. TaxoNet employs a dual-margin penalization loss to enhance learning signals from rare, underrepresented taxa while alleviating overrepresentation bias, facilitating scalable, robust taxonomic classification using high-resolution visual data from ecological and citizen-science sources (Low et al., 22 Dec 2025).

1. Architectural Design and Embedding Strategy

TaxoNet is instantiated as a purely vision-based model with a ResNet-101 encoder backbone pretrained on ImageNet. Each input photograph $x \in \mathbb{R}^{H \times W \times 3}$ (with $H = 614$, $W = 512$) is transformed into a $d$-dimensional feature vector $\hat{x} = \varphi(x) \in \mathbb{R}^d$, where $d = 2,\!048$ is the embedding dimensionality. A set of learnable prototype weight vectors $\{w_j \in \mathbb{R}^d\}_{j=1}^c$ serves as class representatives for the $c$ taxa. Both embeddings and prototypes are $\ell_2$-normalized before the cosine-similarity logits are computed: $z_j = \hat{w}_j^\top \hat{x},\quad \|\hat{w}_j\| = \|\hat{x}\| = 1.$ TaxoNet operates solely on high-resolution images, without language modules, multi-head attention, or transformer blocks. The architectural novelty is concentrated in the loss function and a tailored sample selection mechanism.
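The normalized cosine logits described above can be illustrated with a minimal sketch in plain Python. The helper names `l2_normalize` and `cosine_logits` are hypothetical, and a toy 3-dimensional embedding stands in for the 2,048-dimensional backbone output.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]

def cosine_logits(embedding, prototypes):
    """Return z_j = w_hat_j . x_hat for every class prototype."""
    x_hat = l2_normalize(embedding)
    logits = []
    for prototype in prototypes:
        w_hat = l2_normalize(prototype)
        logits.append(sum(w * x for w, x in zip(w_hat, x_hat)))
    return logits

# Toy example: one embedding and three class prototypes (illustrative values).
x = [3.0, 4.0, 0.0]
W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
z = cosine_logits(x, W)  # approximately [0.6, 0.8, 0.0]
```

Because both factors have unit norm, every logit lies in $[-1, 1]$ regardless of the raw magnitude of the embedding or prototype.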

2. Dual-Margin Penalization Loss

The learning objective extends additive-margin softmax (AM-Softmax) by differentiating between within-class (ground-truth) and between-class (non-target) margin penalization. Given a mini-batch of $N$ samples with true labels $y_i$, a fixed scale $s > 0$ ($s = 32$ or $64$), and per-class margins $m_j$, the loss is

$\mathcal{L}_{\mathrm{TaxoNet}} = -\frac{1}{N}\sum_{i=1}^N \log \frac{\exp(s(z_{i, y_i} - m_{y_i}))}{\exp(s(z_{i, y_i} - m_{y_i})) + \sum_{k \neq y_i}\exp(s(z_{i, k} - m_k))}.$

Margins are defined as follows:

Within-class (for $y_i$): $m_{y_i} = m + \Delta_{y_i}$
Between-class (for $k \neq y_i$): $m_k = \Delta_k$

Here, the base margin $m > 0$ (e.g., $0.15$) imposes class compactness and separation. The class-relative margin

$\Delta_j = \alpha(-\log \rho_j + \varepsilon)m$

is a function of the empirical class prior $\rho_j = N_j / \sum_k N_k$, with hyperparameter $\alpha \in [0, 1]$ and $\varepsilon \ll 1$. Class priors $\rho_j$ are further smoothed into $\tilde{\rho}_j$ via class-balanced weighting (Low et al., 22 Dec 2025). To introduce non-linearity, a power-based scaling is applied:

$\widetilde{\Delta}_j = -m\left(\frac{|\Delta_j|}{m}\right)^{\zeta(\gamma)},\quad \zeta(\gamma) = \log(1 + e^\gamma) > 1,$

where $\gamma$ is learnable. The final objective is

$\mathcal{L} = \mathcal{L}_{\mathrm{TaxoNet}} + \lambda \mathcal{L}_{\mathrm{reg}},\quad \mathcal{L}_{\mathrm{reg}} = \sum_{j=1}^c(\Delta_j - \widetilde{\Delta}_j)^2,$

with $\lambda$ as regularization weight.
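As a rough illustration of the objective, the sketch below evaluates the dual-margin loss and the regularizer on precomputed cosine logits. It is not the authors' implementation: the function names, the toy inputs, and the value `lam=0.1` are assumptions, and the margins $\Delta_j$ and $\widetilde{\Delta}_j$ are simply passed in as lists.

```python
import math

def dual_margin_loss(logits, labels, deltas, m=0.15, s=32.0):
    """Mean dual-margin loss over a batch of cosine-logit rows."""
    total = 0.0
    for z, y in zip(logits, labels):
        # The target class is penalized by m + Delta_y; every other class k by Delta_k.
        shifted = [
            s * (z_k - (deltas[k] + (m if k == y else 0.0)))
            for k, z_k in enumerate(z)
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(shifted)
        log_denominator = top + math.log(sum(math.exp(v - top) for v in shifted))
        total += log_denominator - shifted[y]
    return total / len(labels)

def margin_regularizer(deltas, deltas_tilde):
    """Sum of squared gaps between raw and power-scaled margins."""
    return sum((d - dt) ** 2 for d, dt in zip(deltas, deltas_tilde))

def total_objective(logits, labels, deltas, deltas_tilde, lam=0.1, m=0.15, s=32.0):
    """L = L_TaxoNet + lam * L_reg (lam is a hypothetical weight)."""
    return dual_margin_loss(logits, labels, deltas, m, s) + lam * margin_regularizer(
        deltas, deltas_tilde
    )

# Toy batch: two samples, three classes (all numbers are illustrative).
logits = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.7]]
labels = [0, 2]
deltas = [0.01, 0.18, 0.35]
deltas_tilde = [-0.004, -0.19, -0.45]
loss = total_objective(logits, labels, deltas, deltas_tilde)
```

With all margins set to zero the first term reduces to ordinary scaled softmax cross-entropy, which is a convenient sanity check when experimenting with the margin terms.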

Larger within-class margins for rare taxa amplify tail-class gradients, promoting tight prototype alignment. Between-class margins for head classes are tempered, restricting their dominance. The gradient norm from head-class samples on tail prototypes is provably bounded by $\exp(m_\mathrm{head} - m_\mathrm{tail})$, enforcing stability under severe imbalance.
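To make the rare-versus-common contrast concrete, the margin definitions can be evaluated on toy class counts. This sketch uses the raw priors $\rho_j$ and omits the class-balanced smoothing step, whose exact form is not given here; `alpha=0.5`, `eps=1e-6`, and `gamma=1.0` are assumed values. Note that $\log(1 + e^\gamma)$ exceeds 1 only when $\gamma > \ln(e - 1) \approx 0.54$, so $\gamma = 1$ is chosen to respect the stated condition.

```python
import math

def class_relative_margins(counts, m=0.15, alpha=0.5, eps=1e-6):
    """Delta_j = alpha * (-log(rho_j) + eps) * m, using raw (unsmoothed) priors."""
    total = sum(counts)
    return [alpha * (-math.log(n / total) + eps) * m for n in counts]

def softplus(gamma):
    """zeta(gamma) = log(1 + exp(gamma))."""
    return math.log1p(math.exp(gamma))

def power_scaled_margins(deltas, m=0.15, gamma=1.0):
    """Delta_tilde_j = -m * (|Delta_j| / m) ** zeta(gamma)."""
    zeta = softplus(gamma)
    return [-m * (abs(d) / m) ** zeta for d in deltas]

# Toy long-tailed class counts: head, medium, and tail taxa.
counts = [900, 90, 10]
deltas = class_relative_margins(counts)
deltas_tilde = power_scaled_margins(deltas)
for n, d, dt in zip(counts, deltas, deltas_tilde):
    print(f"N_j={n:4d}  Delta_j={d:.4f}  Delta_tilde_j={dt:.4f}")
```

The rarest class receives the largest $\Delta_j$, so its within-class margin $m + \Delta_j$ is the most demanding of the three.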

3. Handling Long-Tailed, Fine-Grained, and Open-World Scenarios

TaxoNet’s dual-margin paradigm is explicitly constructed to address:

Long-tailed distributions: Margin scaling by inverse class frequency corrects head-class dominance. $\ell_2$ normalization further curtails logit variance among overrepresented taxa.
Fine-grained taxonomy: Explicit cosine-margin separation enhances discrimination of visually similar, taxonomically distinct taxa (e.g., Acer rubrum versus Acer saccharum), yielding compact, distinct clusters.
Open-set recognition: While trained in a $c$-way closed-set regime, the widened inter-prototype gaps enable effective thresholding for “unknown” taxa, attaining true-negative rates exceeding 90% at a 95% true-positive rate (TPR).
Domain shift: Norm-guided sampling preferentially oversamples “hard” examples (those with low embedding norm), enhancing generalization to spatiotemporal shifts such as transferring models across distinct regions (e.g., AA-Central $\rightarrow$ AA-West/East).
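The open-set thresholding mentioned in the list above could look roughly like the following. This is a sketch under assumptions: the confidence score is taken to be the maximum cosine logit, the threshold is calibrated on held-out known-class scores, and all function names are hypothetical.

```python
import math

def threshold_at_tpr(known_scores, tpr=0.95):
    """Largest threshold that still accepts at least `tpr` of known-class samples."""
    ranked = sorted(known_scores, reverse=True)
    keep = max(1, math.ceil(tpr * len(ranked)))
    return ranked[keep - 1]

def true_negative_rate(unknown_scores, threshold):
    """Fraction of unknown-class samples rejected (score below threshold)."""
    rejected = sum(1 for score in unknown_scores if score < threshold)
    return rejected / len(unknown_scores)

def predict_open_set(logits, threshold):
    """Return the arg-max class index, or None when the sample looks unknown."""
    best = max(range(len(logits)), key=lambda j: logits[j])
    return best if logits[best] >= threshold else None

# Toy maximum-logit scores (illustrative values only).
known = [0.91, 0.88, 0.95, 0.79, 0.85, 0.93, 0.90, 0.87, 0.82, 0.96]
unknown = [0.40, 0.55, 0.81, 0.30, 0.62]
thr = threshold_at_tpr(known, tpr=0.9)
tnr = true_negative_rate(unknown, thr)
```

A sample is accepted when its best cosine logit is at least `thr`; wider gaps between prototypes leave unknown taxa with lower maximum similarity, which is what makes a single global threshold workable.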

4. Ecological Datasets, Preprocessing, and Benchmarking

TaxoNet is systematically evaluated on three plant-centered datasets:

Dataset	Scope & Taxa	Train/Val/Test Sizes	Imbalance Ratio
Google Auto-Arborist	N. America, Family/Genus	23K / 7.8K / 2.6K	~1000:1 (genus)
iNat-Plantae	iNaturalist Plantae, 682 species	155K / 1.4K / 2K	26.5:1
NAFlora-Mini	N. America herbarium, 1,863 species	45K / 1.7K / 9.3K	10:1

Inputs are resized to $614 \times 512$ and augmented with random horizontal flips and AugMix. Models are trained for 30 epochs with the AdamW optimizer and linear learning-rate annealing. Domain shift is explicitly assessed via cross-region transfer on Auto-Arborist.
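The linear learning-rate annealing over 30 epochs can be sketched as a small schedule function. The base rate `1e-4` and the decay to zero are assumptions for illustration; neither value is stated above.

```python
def linear_annealed_lr(epoch, base_lr=1e-4, total_epochs=30, final_lr=0.0):
    """Linearly interpolate from base_lr (epoch 0) to final_lr (last epoch)."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs)")
    fraction = epoch / (total_epochs - 1)
    return base_lr + (final_lr - base_lr) * fraction

# Learning rate for each of the 30 training epochs.
schedule = [linear_annealed_lr(epoch) for epoch in range(30)]
```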

5. Quantitative Results and Ablation Analyses

TaxoNet demonstrates superior macro recall, especially on rare taxa, and maintains high rank-1 accuracy:

Dataset/Setting	Baseline (CE) Macro Recall	LDAM Macro Recall (R@1)	TaxoNet Macro Recall (R@1)
AA-Central/West/East	63.6/59.2/56.2	67.9/62.8/62.2	72.9/67.7/64.9
iNat-Plantae	—	89.9 (81.6%)	91.5 (83.2%)
NAFlora-Mini	—	90.0 (91.3%)	90.4 (91.5%)

Under domain shift (AA-Central $\rightarrow$ West/East), TaxoNet achieves recalls of 48.1%/40.5% (vs. LDAM’s 46.5%/39.9%). For open-set detection at 95% TPR, TaxoNet achieves a TNR of ~91%, compared with LDAM’s ~89%.
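Macro recall averages per-class recall with equal weight per class, which is why it is sensitive to rare taxa in a way plain accuracy is not. A minimal sketch on toy labels (the genus names and predictions are illustrative only):

```python
from collections import defaultdict

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Eight samples of a common genus, two of a rare one; one rare sample is missed.
y_true = ["Acer"] * 8 + ["Ulmus"] * 2
y_pred = ["Acer"] * 8 + ["Acer", "Ulmus"]
print(macro_recall(y_true, y_pred))  # 0.75
print(accuracy(y_true, y_pred))      # 0.9
```

A single error on the rare class costs 25 points of macro recall here but only 10 points of accuracy.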

Ablation studies reveal:

Base margin only (no class-relative margins): recall = 63.9%
Dual-margin, no oversampling: 67.6%
Dual-margin with random oversampling: 67.6%
Dual-margin with norm-guided oversampling: 69.5%
Dual-margin with norm-guided oversampling and power-based regularization (full model): 72.9%

Norm-guided selection improves recall by approximately 2 percentage points over random oversampling (67.6% to 69.5%); power-based margin regularization contributes a further ~3 points (69.5% to 72.9%).
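One way to picture norm-guided selection is as weighted resampling in which low-norm embeddings are drawn more often. The inverse-norm weighting below is an assumption chosen for simplicity, not the weighting used by the authors, and the function names are hypothetical.

```python
import math
import random

def norm_guided_weights(embeddings, eps=1e-8):
    """Sampling probabilities proportional to 1 / ||embedding|| (assumed form)."""
    norms = [math.sqrt(sum(a * a for a in e)) for e in embeddings]
    inverse = [1.0 / (n + eps) for n in norms]
    total = sum(inverse)
    return [w / total for w in inverse]

def oversample_indices(embeddings, k, seed=0):
    """Draw k sample indices with replacement, favoring low-norm ("hard") examples."""
    rng = random.Random(seed)
    weights = norm_guided_weights(embeddings)
    return rng.choices(range(len(embeddings)), weights=weights, k=k)

# Toy unnormalized embeddings with norms 5.0, 0.5, and 10.0.
embeddings = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
weights = norm_guided_weights(embeddings)
extra = oversample_indices(embeddings, k=5)
```

Random oversampling corresponds to uniform weights; the ablation above suggests that replacing them with norm-dependent weights accounts for the gain from 67.6% to 69.5%.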

6. Limitations and Prospective Directions

Failures commonly occur with incomplete morphological cues (e.g., “leaf-only” or “flower-only” images) or near-indistinguishable taxa (e.g., Opuntia polyacantha vs. O. cespitosa). TaxoNet’s training regime remains closed-set; dynamic open-world operation would necessitate on-the-fly novel-class discovery or integration with unsupervised clustering. Future research directions include the fusion of margin-based visual embeddings with multimodal (e.g., vision–language) reasoning to incorporate ecological context, the use of multi-view series (leaf, bark, fruit) with attention mechanisms, and the deployment of advanced adaptation strategies (meta-learning, adversarial invariance) for increased robustness in biodiversity monitoring.

7. Broader Significance and Outlook

TaxoNet establishes a new standard for fine-grained, long-tailed, and open-set plant classification, blending a conventional ResNet architecture with a dual-margin loss objective and a norm-guided sampling scheme. Its approach directly addresses the ecological need for scalable, reliable species monitoring under substantial real-world complexities. While demonstrating consistent improvements over leading baselines, TaxoNet also reveals the current limitations of general-purpose multimodal foundation models in specialized plant-domain applications. Future integration with multimodal and dynamic open-world classifiers is suggested as a route toward more holistic, adaptive ecological monitoring frameworks (Low et al., 22 Dec 2025).

References (1)

Low et al. (2025). Towards AI-Guided Open-World Ecological Taxonomic Classification.
