TaxoNet: Embedding-Based Plant Taxonomy
- TaxoNet is a domain-specialized embedding-based encoder for plant taxonomy that addresses extreme class imbalance, fine-grained morphological differences, and open-set recognition challenges.
- It employs a dual-margin penalization loss with norm-guided and power-based regularization to enhance learning signals from both rare and common taxa.
- Built on a ResNet–101 backbone and validated on diverse ecological datasets, TaxoNet demonstrates superior macro recall and robust performance under realistic, open-world conditions.
TaxoNet is a domain-specialized embedding-based encoder designed for plant-level ecological taxonomy under realistic, open-world conditions. It directly addresses the intertwined issues of extreme class imbalance (long-tailed taxonomic distribution), fine-grained morphological distinction, test-time spatiotemporal domain shift, and the requirement for open-set recognition in ecological monitoring. TaxoNet employs a dual-margin penalization loss to enhance learning signals from rare, underrepresented taxa while alleviating overrepresentation bias, facilitating scalable, robust taxonomic classification using high-resolution visual data from ecological and citizen-science sources (Low et al., 22 Dec 2025).
1. Architectural Design and Embedding Strategy
TaxoNet is instantiated as a purely vision-based model, featuring a ResNet–101 encoder backbone pretrained on ImageNet. Each input photograph $x \in \mathbb{R}^{H \times W \times 3}$ is transformed into a $d$-dimensional feature vector $f$, where $d$ denotes the embedding dimensionality. A set of learnable prototype weight vectors $\{w_j\}_{j=1}^{C}$ serves as class representatives for the $C$ taxa. Both embeddings and prototypes are $\ell_2$-normalized prior to cosine similarity logit computation:

$$\cos\theta_j = \frac{w_j^\top f}{\lVert w_j \rVert_2 \, \lVert f \rVert_2}$$

TaxoNet operates solely on high-resolution images without incorporating language modules, multi-head attention, or transformer blocks. The architectural novelty is concentrated in the loss function and a tailored sample selection mechanism.
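The normalization and cosine-logit step can be illustrated with a minimal sketch. It uses plain Python lists in place of real backbone features, and the function names `l2_normalize` and `cosine_logits` are hypothetical helpers introduced here, not part of any published TaxoNet code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0.0 else [v / norm for v in vec]

def cosine_logits(embedding, prototypes):
    """Cosine similarity between one embedding and every class prototype."""
    f_hat = l2_normalize(embedding)
    return [sum(a * b for a, b in zip(f_hat, l2_normalize(w))) for w in prototypes]

# Toy example: a 3-dimensional embedding and two hypothetical class prototypes.
embedding = [3.0, 4.0, 0.0]
prototypes = [[6.0, 8.0, 0.0], [0.0, 0.0, 2.0]]
logits = cosine_logits(embedding, prototypes)
# The first prototype is parallel to the embedding (cosine near 1),
# the second is orthogonal to it (cosine near 0).
```

Because both vectors are normalized, the logits depend only on direction, which is what lets a margin be expressed directly in cosine units.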
2. Dual-Margin Penalization Loss
The learning objective extends additive-margin softmax (AM-Softmax) by differentiating between within-class (ground-truth) and between-class (non-target) margin penalization. Given a mini-batch of $N$ samples, each with true label $y_i$, a fixed scale $s$ (for example $64$), and a per-class margin $m_j$, the loss is a scaled softmax cross-entropy over margin-adjusted cosine logits.
Margins are defined separately for the two cases:
- Within-class (for $j = y_i$): the margin applied to the ground-truth logit.
- Between-class (for $j \neq y_i$): the margin applied to each non-target logit.
Here, the base margin $m_0$ (e.g., $0.15$) imposes class compactness and separation. The class-relative margin is a function of the empirical class prior $\pi_j$ and is controlled by a hyperparameter. Class priors are further smoothed via class-balanced weighting (Low et al., 22 Dec 2025). To introduce non-linearity, a power-based scaling with a learnable parameter is applied. The final objective combines the dual-margin loss with a regularization term whose contribution is set by a regularization weight $\lambda$.
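For reference, the standard AM-Softmax objective that this loss extends can be written as below. The notation is chosen here for illustration and is not taken from the source: $N$ is the batch size, $y_i$ the true label of sample $i$, $s$ the scale, $m$ a single shared margin, and $\theta_{i,j}$ the angle between the embedding of sample $i$ and the prototype of class $j$.

```latex
\mathcal{L}_{\text{AM}} = -\frac{1}{N}\sum_{i=1}^{N}
\log \frac{e^{\,s\,(\cos\theta_{i,y_i} - m)}}
{e^{\,s\,(\cos\theta_{i,y_i} - m)} + \sum_{j \neq y_i} e^{\,s\,\cos\theta_{i,j}}}
```

The dual-margin variant described in this section departs from this baseline in two ways: the margin becomes class-dependent, and the non-target terms in the denominator are penalized as well.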
Larger within-class margins for rare taxa amplify tail-class gradients, promoting tight prototype alignment. Between-class margins for head classes are tempered, restricting their dominance. The gradient norm from head-class samples on tail prototypes is provably bounded, enforcing stability under severe imbalance.
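The effect of class-dependent margins can be explored with a small sketch of a dual-margin loss for a single sample. Several parts are assumptions made for illustration and should not be read as the paper's exact formulation: the power-law rule in `class_relative_margins`, its parameters `alpha` and `gamma`, the choice to reuse the class-relative margin as the between-class margin, and the additive way the margins enter the logits. The learnable power-based scaling and the regularization term are omitted.

```python
import math

def class_relative_margins(priors, alpha=0.1, gamma=0.25):
    """Hypothetical rule: the margin grows as the class prior shrinks (power law).

    Values are scaled so that the rarest class receives exactly `alpha`.
    """
    raw = [p ** (-gamma) for p in priors]
    top = max(raw)
    return [alpha * r / top for r in raw]

def dual_margin_loss(cosines, label, within, between, scale=64.0):
    """Softmax cross-entropy on margin-adjusted cosine logits for one sample.

    The target cosine is reduced by within[label]; every non-target cosine
    is raised by between[j]. Both adjustments make the sample harder to fit.
    """
    logits = []
    for j, c in enumerate(cosines):
        adjusted = c - within[label] if j == label else c + between[j]
        logits.append(scale * adjusted)
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

priors = [0.90, 0.09, 0.01]   # toy head, medium, and tail class frequencies
base_margin = 0.15            # base margin value quoted in the text
delta = class_relative_margins(priors)
within = [base_margin + d for d in delta]   # assumption: base + class-relative
between = delta                             # assumption: smaller for head classes

# Two samples with the same cosine profile, one labeled head and one labeled tail.
head_loss = dual_margin_loss([0.6, 0.4, 0.4], 0, within, between)
tail_loss = dual_margin_loss([0.4, 0.4, 0.6], 2, within, between)
```

Under these assumptions the tail-labeled sample incurs the larger loss even though its cosine profile mirrors the head-labeled one, which is the mechanism by which rare classes receive a stronger learning signal.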
3. Handling Long-Tailed, Fine-Grained, and Open-World Scenarios
TaxoNet’s dual-margin paradigm is explicitly constructed to address:
- Long-tailed distributions: Margin scaling by inverse class frequency corrects head-class dominance. $\ell_2$ normalization further curtails logit variance among overrepresented taxa.
- Fine-grained taxonomy: Explicit cosine-margin separation enhances discrimination of visually similar, taxonomically distinct taxa (e.g., Acer rubrum versus Acer saccharum), yielding compact, distinct clusters.
- Open-set recognition: Although the model is trained in a $C$-way closed-set regime, the widened inter-prototype gaps enable effective thresholding for “unknown” taxa, attaining true-negative rates exceeding 90% at a 95% true-positive rate (TPR).
- Domain shift: Norm-guided sampling preferentially oversamples “hard” examples (those with low embedding norm), enhancing generalization to spatiotemporal shifts such as transferring models across distinct regions (e.g., AA-Central → AA-West/East).
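The norm-guided sampling idea in the last bullet can be sketched as follows. The inverse-norm weighting rule and the helper names `norm_guided_weights` and `oversample` are assumptions introduced for illustration; the source states only that low-norm examples are preferentially oversampled. The norms are taken before $\ell_2$ normalization, since normalized embeddings all have unit length.

```python
import math
import random

def norm_guided_weights(embeddings, eps=1e-8):
    """Sampling probabilities that favor low-norm (assumed "hard") embeddings.

    Hypothetical rule: each weight is the inverse of the Euclidean norm,
    and the weights are then normalized to sum to one.
    """
    inv = [1.0 / (math.sqrt(sum(v * v for v in e)) + eps) for e in embeddings]
    total = sum(inv)
    return [w / total for w in inv]

def oversample(indices, weights, k, seed=0):
    """Draw k indices with replacement according to the given weights."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=weights, k=k)

# Three unnormalized embeddings with norms 5, 1, and 0.5.
embeddings = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]]
weights = norm_guided_weights(embeddings)
drawn = oversample([0, 1, 2], weights, k=1000)
```

In this toy setup the lowest-norm example (index 2) is drawn roughly ten times as often as the highest-norm one (index 0), which contrasts with random oversampling, where the draw rate ignores the embedding entirely.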
4. Ecological Datasets, Preprocessing, and Benchmarking
TaxoNet is systematically evaluated on three plant-centered datasets:
| Dataset | Scope & Taxa | Train/Val/Test Sizes | Imbalance |
|---|---|---|---|
| Google Auto-Arborist | N. America, Family/Genus | 23K / 7.8K / 2.6K | 1000:1 (genus) |
| iNat-Plantae | iNaturalist Plantae, 682 species | 155K / 1.4K / 2K | 26.5:1 |
| NAFlora-Mini | N. Am. herbarium, 1,863 species | 45K / 1.7K / 9.3K | 10:1 |
Inputs are resized to a fixed resolution and augmented with random horizontal flips and AugMix. Models are optimized with AdamW for 30 epochs under linear learning-rate annealing. Domain shift is explicitly assessed via cross-region transfer for Auto-Arborist.
5. Quantitative Results and Ablation Analyses
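TaxoNet achieves the highest macro recall in every setting below, a metric that weights rare taxa equally with common ones, and maintains high rank-1 accuracy (R@1, shown in parentheses where reported):
| Dataset/Setting | Baseline (CE) Macro Recall | LDAM Macro Recall (R@1) | TaxoNet Macro Recall (R@1) |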
|---|---|---|---|
| AA-Central/West/East | 63.6/59.2/56.2 | 67.9/62.8/62.2 | 72.9/67.7/64.9 |
| iNat-Plantae | — | 89.9 (81.6%) | 91.5 (83.2%) |
| NAFlora-Mini | — | 90.0 (91.3%) | 90.4 (91.5%) |
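Macro recall is the headline metric in the table above because it averages recall over classes rather than over samples. The sketch below contrasts it with plain accuracy on a toy imbalanced label set; the class names and predictions are invented for illustration.

```python
from collections import defaultdict

def macro_recall(y_true, y_pred):
    """Average of per-class recalls, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels: 8 samples of a common class and 2 of a rare class.
y_true = ["oak"] * 8 + ["elm"] * 2
y_pred = ["oak"] * 8 + ["oak", "elm"]

mr = macro_recall(y_true, y_pred)   # (8/8 + 1/2) / 2 = 0.75
acc = accuracy(y_true, y_pred)      # 9/10 = 0.9
```

A single missed rare-class sample costs 25 points of macro recall here but only 10 points of accuracy, which is why the metric is sensitive to performance on the long tail.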
Under domain shift (AA-Central → West/East), TaxoNet achieves recalls of 48.1%/40.5% (versus LDAM’s 46.5%/39.9%). For open-set detection at 95% TPR, TaxoNet achieves a true-negative rate (TNR) of ~91% compared to LDAM’s ~89%.
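The open-set figure quoted here (TNR at 95% TPR) can be computed as in the following sketch. It assumes that each sample carries a scalar confidence score, such as the maximum cosine similarity to any prototype, and that a sample is accepted as "known" when its score meets the threshold. The scoring choice, the helper name `tnr_at_tpr`, and the toy scores are illustrative assumptions.

```python
import math

def tnr_at_tpr(known_scores, unknown_scores, target_tpr=0.95):
    """True-negative rate on unknowns at the threshold that accepts at least
    `target_tpr` of the known samples (accept when score >= threshold)."""
    ranked = sorted(known_scores, reverse=True)
    keep = math.ceil(target_tpr * len(ranked))  # number of knowns to accept
    threshold = ranked[keep - 1]
    rejected = sum(1 for s in unknown_scores if s < threshold)
    return rejected / len(unknown_scores), threshold

# Toy confidence scores for known-class and unknown-class test samples.
known = [0.95, 0.92, 0.90, 0.88, 0.85, 0.83, 0.80, 0.78, 0.75, 0.60]
unknown = [0.30, 0.35, 0.45, 0.50, 0.76]
tnr, threshold = tnr_at_tpr(known, unknown)
# With 10 known samples, a 95% TPR requires accepting all of them, so the
# threshold is the lowest known score (0.60) and 4 of 5 unknowns are rejected.
```

Wider gaps between prototypes tend to push unknown samples toward lower maximum similarity, which raises the TNR attainable at a fixed TPR.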
Ablation studies reveal:
- Base margin only (no class-relative margins): recall = 63.9%
- Dual-margin, no oversampling: 67.6%
- Random oversampling: 67.6%
- Norm-guided oversampling: 69.5%
- Power-based regularization (full model): 72.9%
Norm-guided selection improves recall by approximately 2 percentage points over random oversampling (67.6% to 69.5%); power-based margin regularization contributes a further ~3.4 points (69.5% to 72.9%).
6. Limitations and Prospective Directions
Failures commonly occur with incomplete morphological cues (e.g., “leaf-only” or “flower-only” images) or near-indistinguishable taxa (e.g., Opuntia polyacantha vs. O. cespitosa). TaxoNet’s training regime remains closed-set; dynamic open-world operation would necessitate on-the-fly novel-class discovery or integration with unsupervised clustering. Future research directions include the fusion of margin-based visual embeddings with multimodal (e.g., vision–language) reasoning to incorporate ecological context, the use of multi-view series (leaf, bark, fruit) with attention mechanisms, and the deployment of advanced adaptation strategies (meta-learning, adversarial invariance) for increased robustness in biodiversity monitoring.
7. Broader Significance and Outlook
TaxoNet targets fine-grained, long-tailed, and open-set plant classification by pairing a conventional ResNet architecture with a dual-margin loss objective and a norm-guided sampling scheme. Its approach directly addresses the ecological need for scalable, reliable species monitoring under substantial real-world complexities. While demonstrating consistent improvements over leading baselines, TaxoNet also reveals the current limitations of general-purpose multimodal foundation models in specialized plant-domain applications. Future integration with multimodal and dynamic open-world classifiers is suggested as a route toward more holistic, adaptive ecological monitoring frameworks (Low et al., 22 Dec 2025).