Hierarchical Open-Set Classification
- Hierarchical open-set classification is a paradigm that uses semantic tree structures to output fine- or coarse-grained predictions depending on model confidence.
- It employs techniques like path probability, entropy thresholding, and branch softmax to distinguish between known and unknown classes.
- Applications span image recognition, activity detection, object localization, and multimodal learning, enhancing OOD detection and model interpretability.
Hierarchical open-set classification is the task of recognizing inputs that may not correspond to any known leaf class in a structured semantic hierarchy, and returning predictions at the most specific meaningful level (internal node) supported by model confidence. Unlike traditional open-set recognition, which typically uses flat classes and hard “known/unknown” rejection, hierarchical open-set methods leverage taxonomic structures to provide informative and interpretable fall-back predictions when confronted with out-of-distribution (OOD) data. This paradigm applies across domains including image classification, human activity recognition, object detection, and multimodal few-shot learning.
1. Hierarchical Taxonomic Structures
A hierarchical taxonomy is formalized as a rooted tree T = (V, E), with nodes comprising both leaf classes (known classes) and internal nodes (potential superclasses or fall-backs) (Linderman et al., 2022). Each node v may be specified by its parent, children, ancestors, and leaf/non-leaf status. Training labels are provided at the leaves, and hierarchical relations encode semantic similarity, granularity, or domain structure.
Hierarchies may be manually specified (e.g., WordNet for ImageNet (Linderman et al., 2022)), extracted from ontologies (Wu et al., 2023), or constructed automatically in embedding space using agglomerative clustering, where prototypes represent classes and are iteratively merged based on inter-cluster Euclidean distance (Hannum et al., 2024). In special cases, non-binary or high-overlap hierarchies (such as the superclass/fine-grained structures in object detection) may be embedded in hyperbolic space for geometric consistency (Doan et al., 2023).
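The automatic construction just described can be sketched in a few lines. This is a minimal, pure-Python toy of agglomerative merging over class prototypes; the prototype coordinates, class names, and centroid-linkage criterion are illustrative assumptions, not the exact procedure of any cited paper.

```python
# Sketch: inducing a binary class hierarchy by agglomerative clustering of
# class prototypes (toy 2-D prototypes; centroid linkage assumed).
import math

# One prototype (e.g., mean embedding) per known class -- illustrative values.
prototypes = {
    "cat": (0.0, 0.0),
    "dog": (0.1, 0.1),
    "car": (5.0, 5.0),
    "bus": (5.1, 4.8),
}

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def agglomerate(protos):
    """Repeatedly merge the two clusters with the closest centroids."""
    # Each cluster is (nested-tuple tree, centroid, member points).
    clusters = [(name, p, [p]) for name, p in protos.items()]
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: math.dist(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        a, b = clusters[j], clusters[i]
        del clusters[j], clusters[i]  # remove the higher index first
        members = a[2] + b[2]
        clusters.append(((b[0], a[0]), centroid(members), members))
    return clusters[0][0]

tree = agglomerate(prototypes)
# Nested-tuple binary hierarchy over the four classes.
```

The resulting nested tuples mirror the internal nodes of an induced taxonomy: here the vehicle prototypes merge separately from the animal prototypes before the final root merge.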
2. Core Model Architectures and Inference Mechanisms
Hierarchical open-set classifiers typically use a shared feature extractor (e.g., a ResNet-50 or transformer backbone), augmented with classifier heads at internal nodes to model conditional branch selection (Linderman et al., 2022). For a sample x, a path probability is computed for each leaf ℓ as P(ℓ | x) = ∏ p(v | u, x), where the product runs over the parent–child edges (u, v) on the root-to-leaf path and p(v | u, x) is the softmax probability for child v at node u.
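The chained branch-softmax computation of a leaf's path probability can be sketched as follows; the two-level toy hierarchy and the per-node logits are invented stand-ins for real classifier-head outputs.

```python
# Sketch: leaf path probability as the product of branch softmax outputs
# along the root-to-leaf path (toy hierarchy; logits are illustrative).
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy hierarchy: root -> {animal, vehicle}; animal -> {cat, dog};
# vehicle -> {car, bus}. Branch logits per internal node for one input x.
branch_logits = {
    "root":    [2.0, 0.5],
    "animal":  [1.5, 0.2],
    "vehicle": [0.3, 0.1],
}
# Root-to-leaf paths as (internal node, chosen child index) decisions.
paths = {
    "cat": [("root", 0), ("animal", 0)],
    "dog": [("root", 0), ("animal", 1)],
    "car": [("root", 1), ("vehicle", 0)],
    "bus": [("root", 1), ("vehicle", 1)],
}

def path_probability(leaf):
    p = 1.0
    for node, child in paths[leaf]:
        p *= softmax(branch_logits[node])[child]
    return p

probs = {leaf: path_probability(leaf) for leaf in paths}
# Because every branch softmax sums to 1, the leaf path probabilities
# form a valid distribution over leaves.
```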
Inference proceeds by selecting the most probable leaf label (argmax), but, crucially, introduces mechanisms to halt prediction at coarser hierarchy levels when model uncertainty is high or OOD risk is detected. Mean path entropy, minimum path entropy, or path-max probability scores along the selected prediction path serve as OOD indicators, and thresholding these scores determines whether to output a leaf or an internal node (Linderman et al., 2022, McCarthy et al., 8 Oct 2025).
Traversal-based classifiers use top-down binary (or multiway) split decisions at each node, with outlier detectors that can halt inference and assign a sample to the current internal node if the outlier score falls below a calibrated threshold (Hannum et al., 2024). This allows for interpretable coarse predictions even when the fine class is entirely unseen at training.
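A minimal sketch of this traversal-with-halting logic follows; the tiny tree, the per-node inlier scores, and the thresholds are all illustrative assumptions, and the "splitter" is stood in for by an argmax over child scores.

```python
# Sketch: top-down traversal that halts at an internal node when the
# sample's inlier score falls below that node's calibrated threshold.

tree = {"name": "animal",
        "children": [{"name": "cat", "children": []},
                     {"name": "dog", "children": []}]}

def traverse(node, inlier_score, thresholds):
    """Descend while the sample looks in-distribution; else halt at the node."""
    while node["children"]:
        if inlier_score[node["name"]] < thresholds[node["name"]]:
            return node["name"]   # halt: coarse prediction at this internal node
        # Stand-in for the learned splitter: pick the highest-scoring child.
        node = max(node["children"], key=lambda c: inlier_score[c["name"]])
    return node["name"]           # reached a leaf: fine-grained prediction

thresholds = {"animal": 0.5, "cat": 0.5, "dog": 0.5}
id_scores  = {"animal": 0.9, "cat": 0.8, "dog": 0.1}   # familiar cat image
ood_scores = {"animal": 0.3, "cat": 0.1, "dog": 0.1}   # unfamiliar animal
```

With these scores, the in-distribution sample descends to the leaf "cat", while the unfamiliar sample halts immediately and receives only the coarse label "animal".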
3. Training Objectives and OOD Regularization
Supervised hierarchical open-set models combine two key losses (Linderman et al., 2022, McCarthy et al., 8 Oct 2025):
- Hierarchical softmax cross-entropy: Encourages correct path traversal down to the ground-truth leaf, weighted by node importance.
- Outlier regularization: For internal nodes not on the ground-truth path, branch softmax outputs are regularized toward uniform distributions. This maximizes entropy for unrelated nodes, reducing spurious confidence that causes flat misclassification.
Formally, for a node u on the ground-truth path with weight w_u and correct child c*_u, the supervised and regularizer losses take the form

L_sup = − Σ_{u ∈ path(ℓ*)} w_u log p(c*_u | u, x),  L_reg = Σ_{u′ ∉ path(ℓ*)} KL( U(u′) ‖ p(· | u′, x) ),

where ℓ* is the ground-truth leaf and U(u′) is the uniform distribution over the children of node u′.
The total loss is a weighted sum L = L_sup + λ L_reg, with λ calibrated on validation data. For activity recognition, Hi-OSCAR builds a similar composite loss with inverse-frequency weighting and KL-divergence regularization (McCarthy et al., 8 Oct 2025).
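The composite objective can be sketched numerically. This toy computes on-path cross-entropy plus a push-to-uniform penalty for one off-path node; the branch probabilities, node weights, and λ value are illustrative assumptions, not calibrated values from any cited work.

```python
# Sketch: hierarchical cross-entropy along the ground-truth path, plus a
# push-to-uniform (max-entropy) regularizer on off-path internal nodes.
import math

def cross_entropy(p, target):
    return -math.log(p[target])

def kl_to_uniform(p):
    """KL divergence from p to the uniform distribution over its support."""
    k = len(p)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0)

# Branch softmax outputs at each internal node for one training sample.
branch_probs = {"root": [0.8, 0.2], "animal": [0.7, 0.3], "vehicle": [0.6, 0.4]}
gt_path = [("root", 0), ("animal", 0)]   # ground truth: root -> animal -> cat
off_path_nodes = ["vehicle"]             # internal nodes off the true path
node_weight = {"root": 1.0, "animal": 1.0}
lam = 0.1                                # regularizer weight, tuned on validation

l_sup = sum(node_weight[n] * cross_entropy(branch_probs[n], c) for n, c in gt_path)
l_reg = sum(kl_to_uniform(branch_probs[n]) for n in off_path_nodes)
total = l_sup + lam * l_reg
# l_reg vanishes exactly when the off-path branch output is uniform, which is
# the "maximize entropy for unrelated nodes" behavior described above.
```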
Semi-supervised frameworks such as SemiHOC extend these ideas by pseudo-labeling unlabeled data, using subtree-level confidence summed over descendants and age-gating to maintain high purity and prevent overconfident deep assignments of OOD samples (Wallin et al., 23 Jan 2026).
4. OOD Detection, Thresholding, and Localization
Hierarchical open-set approaches generalize standard OOD detection metrics (max-softmax, entropy) to paths and nodes in the tree (Linderman et al., 2022, McCarthy et al., 8 Oct 2025). Thresholds are set either globally (on path-level scores) or per-node (using percentiles of entropy from ID validation samples). Inference proceeds as follows:
- Compute the most probable leaf (the argmax over leaf path probabilities) and its root-to-leaf hierarchy path.
- Traverse from the root downward; if the OOD score at any node falls below its threshold (global or per-node), halt and output the parent node as the coarse prediction.
- If no threshold is violated, output the full leaf prediction.
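The procedure above can be sketched with entropy as the per-node score: walk the argmax path and back off as soon as a branch distribution is too uncertain. The path, branch probabilities, and thresholds below are illustrative stand-ins (e.g., a threshold set at an ID-validation entropy percentile).

```python
# Sketch: entropy-thresholded inference along the predicted path.
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def infer(path, leaf, thresholds):
    """Output the first node whose branch entropy exceeds its threshold,
    or the leaf if every branch decision on the path is confident."""
    for node, probs in path:
        if entropy(probs) > thresholds[node]:
            return node   # coarse prediction at this internal node
    return leaf           # confident all the way down: fine prediction

# Argmax path root -> animal -> cat, with branch softmax outputs per node.
path = [("root", [0.9, 0.1]), ("animal", [0.55, 0.45])]

strict = {"root": 0.5, "animal": 0.5}   # e.g., low ID-entropy percentile
loose  = {"root": 0.7, "animal": 0.7}
```

With the strict thresholds the uncertain animal-level split (entropy ≈ 0.69 nats) triggers a coarse "animal" output; the loose thresholds admit the full "cat" prediction, illustrating the conservative-vs-fine trade-off discussed below.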
Performance metrics include ID accuracy, AUROC for OOD detection at various hierarchy levels, mean hierarchical distance to ground truth, and balanced mean hierarchical distance (BMHD) (Wallin et al., 23 Jan 2026). Unknown-sample localization is assessed quantitatively by measuring the closeness between the predicted internal node and the true leaf's location in the hierarchy.
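Hierarchical distance, the building block of these metrics, is just shortest-path length in the taxonomy tree. The toy parent map below is an assumption for illustration.

```python
# Sketch: hierarchical distance between a predicted node and the true leaf,
# measured as shortest-path length in the taxonomy (toy parent map assumed).
parents = {"cat": "animal", "dog": "animal", "car": "vehicle",
           "bus": "vehicle", "animal": "root", "vehicle": "root"}

def ancestors(node):
    """Node followed by its chain of ancestors up to the root."""
    chain = [node]
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain

def tree_distance(a, b):
    """Edges from a up to the lowest common ancestor, plus edges down to b."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    common = next(n for n in anc_a if n in anc_b)
    return anc_a.index(common) + anc_b.index(common)

# A coarse fall-back at "animal" lies closer to the truth "cat" than a
# confident flat misclassification as "dog" or "car".
d_coarse = tree_distance("animal", "cat")
d_sibling = tree_distance("dog", "cat")
d_far = tree_distance("car", "cat")
```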
Multi-level thresholding offers tunable trade-offs: higher thresholds favor conservative (coarser) predictions, lowering false positives for known classes; lower thresholds enable finer predictions at risk of misclassification.
5. Extensions: Data-driven Hierarchies, Hyperbolic Embeddings, and Multimodal Prompt Tuning
Automated hierarchy induction via constrained agglomerative clustering creates binary trees based on embedding proximity, obviating the need for hand-crafted taxonomies (Hannum et al., 2024). Score-based assignment and traversal-based classification leverage learned splitters and outlier detectors at each internal node.
In open-set object detection, hyperbolic embeddings (a Poincaré ball with a curvature parameter) encode both fine-grained and coarse (superclass) structure with minimal distortion (Doan et al., 2023). A SuperClass regularizer pulls classes within the same category close, and adaptive relabeling via hyperbolic distance permits unknown proposals to be assigned to the nearest semantic category, improving recall and localization.
Prompt tuning for taxonomic open-set (TOS) classification is addressed by ProTeCt, which introduces node-centric and dynamic treecut losses for hierarchical consistency in CLIP-based vision-language models (Wu et al., 2023). Metrics such as Hierarchical Consistent Accuracy (HCA) and Mean Treecut Accuracy (MTA) quantify alignment between leaf and internal predictions; empirical findings show dramatic gains in hierarchical consistency without significant loss in leaf accuracy.
| Approach | Hierarchy Source | OOD Mechanism |
|---|---|---|
| (Linderman et al., 2022) | Manual (WordNet) | Path entropy, multi-loss, threshold stopping |
| (Hannum et al., 2024) | Data-driven (clustering) | Score-based, traversal outlier detection |
| (Doan et al., 2023) | Manual + Hyperbolic embedding | SuperClass regularizer, geometric thresholding |
| (Wallin et al., 23 Jan 2026) | Manual | Semi-supervised subtree pseudo-labeling, age-gating |
| (Wu et al., 2023) | Manual (ontology) | Prompt tuning, HCA/MTA metrics |
6. Empirical Results, Impact, and Limitations
Across image, activity, and detection domains, hierarchical open-set algorithms outperform flat reject methods by giving informative, semantically relevant fall-back predictions for OOD samples (Linderman et al., 2022, McCarthy et al., 8 Oct 2025). Hierarchical classifiers match or exceed baseline AUROC and mean hierarchical distance for OOD, and in semi-supervised regimes (SemiHOC), achieve state-of-the-art BMHD with much reduced ID labeling. Hyperbolic regularization and prompt tuning approaches further improve both recall and consistency, notably in object detection and multimodal domains (Doan et al., 2023, Wu et al., 2023).
Limitations include reliance on meaningful, well-specified hierarchies, performance degradation if hierarchy fails to reflect visual similarity, need for OOD class overlap between unlabeled and test-time data (Wallin et al., 23 Jan 2026), and scaling challenges for large trees (Hannum et al., 2024). The Concentration Centrality (CC) metrics for unknown class consistency merit additional empirical validation for deeper taxonomies.
7. Research Directions and Practice Considerations
Open problems concern automated or continual hierarchy construction, separation between uncertain ID and novel OOD in semi-supervised data, improved pseudo-label strategies for OOD assignments, and scalable traversal/outlier modules for large hierarchies (Wallin et al., 23 Jan 2026, Hannum et al., 2024). Richer linkage criteria or joint embedding-hierarchy learning may further advance utility and explainability.
For deployment, practitioners should:
- Adopt/review a task-relevant taxonomy (manual ontology, agglomerative clustering, hyperbolic geometry).
- Use path-based uncertainty measures with thresholded inference halting for reliability.
- Consider semi-supervised pipelines (subtree pseudo-labels, age gating) or multimodal prompt tuning for data-scarce settings.
- Employ relevant hierarchical metrics (BMHD, HCA, MTA, CC) to quantify both ID accuracy and the quality of OOD/superclass assignment.
Hierarchical open-set classification offers a principled framework for reliable, interpretable AI in open-world scenarios, where both fine-grained discrimination and structured fallback predictions are essential.