
Local Classifier per Parent/Node (LCPN)

Updated 2 February 2026
  • LCPN is a modular hierarchical classification strategy that trains distinct multi-class classifiers at each internal node of a label tree.
  • The LCPN+ and LCPN+F extensions make inference more robust by adding probabilistic all-paths scoring and a global flat classifier, respectively.
  • Implemented in the HiClass framework, LCPN ensures hierarchy-consistent predictions while leveraging specialized metrics for structured evaluation.

Local Classifier per Parent/Node (LCPN) is a modular hierarchical classification strategy wherein distinct multi-class classifiers are trained at each internal node of a tree-structured label hierarchy. Each classifier discriminates among its direct child classes using only those training examples whose true class resides in the node's subtree. Extensions such as LCPN+ (probabilistic all-paths) and LCPN+F (combining local and global classification) further improve robustness and performance, especially in structured or time-series domains. This paradigm is foundational for hierarchical classification, prominently implemented in frameworks such as HiClass (Miranda et al., 2021), and empirically validated in automatic hierarchy induction settings (Alagoz, 2023).

1. Formal Framework and Problem Definition

Let $H = (N, E)$ denote the tree-structured hierarchy, where $N$ is the set of nodes (internal and leaf) and $E \subseteq N \times N$ is the parent-child relation. The leaves $\mathcal{L} \subseteq N$ represent the atomic classes; each training sample $(x_i, y_i)$ is labeled by a unique leaf $y_i \in \mathcal{L}$. The path $\mathrm{Path}(y_i) = (v_0 = \mathrm{root}, v_1, \ldots, v_d = y_i)$ identifies the sequence of ancestor nodes for each class.

For every internal node $p \in N \setminus \mathcal{L}$ with children $C(p)$, LCPN trains a multi-class classifier $f_p: X \rightarrow C(p)$. The training set for $f_p$ is constructed from all examples whose label lies in $p$'s subtree:

$$D_p = \{ (x_i, c_i^p) \mid y_i \in \mathrm{Subtree}(p),\ c_i^p = \text{child of } p \text{ on the path to } y_i \}$$
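The construction of $D_p$ can be sketched in a few lines of Python: each sample, labeled by its root-to-leaf path, contributes one training pair to every internal node it passes through. The path tuples and helper name below are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def build_parent_datasets(samples):
    """Split (x, path) samples into one training set D_p per internal
    node: each sample contributes the pair (x, child-of-p-on-its-path)
    to every internal node p it passes through."""
    D = defaultdict(list)
    for x, path in samples:
        for parent, child in zip(path, path[1:]):
            D[parent].append((x, child))
    return D

# Hypothetical toy samples, labels given as root-to-leaf paths
samples = [([0.1], ("root", "mammal", "cat")),
           ([0.2], ("root", "mammal", "dog")),
           ([0.9], ("root", "bird", "crow"))]
D = build_parent_datasets(samples)
# D["root"] holds targets mammal/mammal/bird; D["mammal"] holds cat/dog
```

Each entry `D[p]` is then fed to an ordinary multi-class learner to obtain $f_p$.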

The overall objective is to minimize the node-wise aggregate loss:

$$L(\{f_p\}) = \sum_{p \in N \setminus \mathcal{L}} \sum_{(x_i, c_i^p) \in D_p} \ell(f_p(x_i), c_i^p)$$

where the outer sum runs over internal nodes and $\ell(\cdot, \cdot)$ may be the zero-one loss or a convex surrogate.

Inference proceeds in a top-down fashion: starting at the root, each classifier selects the most probable child node; the process recurses until a leaf is reached, producing a hierarchy-consistent prediction path.

2. Algorithms and Variants

Standard LCPN

Training: For each internal node $p$, construct $D_p$ and train $f_p$ to discriminate among the children $C(p)$.

Inference:

p = root
while C(p) is not empty:
    c_pred = f_p.predict(x)  # argmax over p's children
    p = c_pred
return p  # p is now a leaf
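The greedy loop above can be made concrete with a small runnable sketch; the tree and per-node probability tables below are hypothetical stand-ins for trained classifiers $f_p$.

```python
def predict_greedy(x, children, node_probs, root="root"):
    """Greedy LCPN inference: follow the most probable child at each node.

    children[p]   -> list of p's children ([] for leaves)
    node_probs[p] -> callable x -> {child: probability}, a stub
                     standing in for a trained classifier f_p
    """
    p = root
    while children.get(p):                # recurse until a leaf is reached
        probs = node_probs[p](x)
        p = max(probs, key=probs.get)     # argmax over p's children
    return p                              # hierarchy-consistent by construction

# Hypothetical hierarchy and stub classifiers
children = {"root": ["mammal", "bird"], "mammal": ["cat", "dog"],
            "bird": [], "cat": [], "dog": []}
node_probs = {"root": lambda x: {"mammal": 0.8, "bird": 0.2},
              "mammal": lambda x: {"cat": 0.3, "dog": 0.7}}
leaf = predict_greedy(None, children, node_probs)
# descends root -> mammal -> dog
```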

LCPN+ (Probabilistic All-Paths)

Rather than greedily following a single path, LCPN+ computes a chain-rule probability for every leaf $\ell$ and returns the maximizer:

$$\ell^* = \arg\max_{\ell \in \mathcal{L}} \prod_{p \in \Omega_\ell} P(y = q_p \mid f_p, x)$$

where $\mathcal{L}$ is the set of leaves, $\Omega_\ell$ is the set of internal nodes along the path to $\ell$, and $q_p$ is the chosen child of $p$ on that path.

Pseudocode:

for each leaf ℓ:
    score[ℓ] = 1
    for each internal node p on the path to ℓ:
        q = child of p on that path
        score[ℓ] *= f_p(x)[q]
return argmax_ℓ score[ℓ]
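A runnable version of this scorer (again with hypothetical stub classifiers) also shows why all-paths inference can overrule a greedy mistake: here the root marginally prefers branch A, but the confident split under B wins the product.

```python
def predict_all_paths(x, paths, node_probs):
    """LCPN+ inference: score each leaf by the chain-rule product of
    local-classifier probabilities along its root-to-leaf path."""
    scores = {}
    for leaf, path in paths.items():
        s = 1.0
        for parent, child in zip(path, path[1:]):
            s *= node_probs[parent](x)[child]
        scores[leaf] = s
    return max(scores, key=scores.get), scores

# Hypothetical stubs: the root weakly prefers branch A,
# but branch B's split is far more confident.
paths = {"a1": ("root", "A", "a1"), "a2": ("root", "A", "a2"),
         "b1": ("root", "B", "b1"), "b2": ("root", "B", "b2")}
node_probs = {"root": lambda x: {"A": 0.55, "B": 0.45},
              "A": lambda x: {"a1": 0.5, "a2": 0.5},
              "B": lambda x: {"b1": 0.9, "b2": 0.1}}
best, scores = predict_all_paths(None, paths, node_probs)
# best is "b1" (0.45 * 0.9 = 0.405), even though greedy descent
# would have committed to branch A at the root
```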

LCPN+F (Local–Global Combined)

Augments LCPN+ by multiplying the global flat classifier’s prediction for each leaf:

$$\mathrm{score}[\ell] = \left[ \prod_{p \in \Omega'_\ell} P(y = q_p \mid f_p, x) \right] \times P(y = \ell \mid f, x)$$

where $\Omega'_\ell$ excludes the final, leaf-level split, and $f$ is a flat classifier over the leaf classes.

Pseudocode:

for each leaf ℓ:
    score[ℓ] = f(x)[ℓ]  # global flat classifier
    for each internal node p on the path to ℓ, excluding the leaf-level split:
        q = child of p on that path
        score[ℓ] *= f_p(x)[q]
return argmax_ℓ score[ℓ]
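The same toy setup extends to LCPN+F: the product now runs over internal splits only, and the leaf-level factor is replaced by the flat classifier's score. The flat probabilities below are, again, hypothetical stubs.

```python
def predict_local_global(x, paths, node_probs, flat_probs):
    """LCPN+F inference: product over internal splits only; the
    leaf-level factor is replaced by a global flat classifier."""
    scores = {}
    for leaf, path in paths.items():
        s = flat_probs(x)[leaf]  # global flat term
        for parent, child in zip(path[:-2], path[1:-1]):
            s *= node_probs[parent](x)[child]  # leaf split excluded
        scores[leaf] = s
    return max(scores, key=scores.get), scores

# Hypothetical two-level hierarchy with stub classifiers
paths = {"a1": ("root", "A", "a1"), "a2": ("root", "A", "a2"),
         "b1": ("root", "B", "b1"), "b2": ("root", "B", "b2")}
node_probs = {"root": lambda x: {"A": 0.55, "B": 0.45}}
flat_probs = lambda x: {"a1": 0.4, "a2": 0.1, "b1": 0.3, "b2": 0.2}
best, scores = predict_local_global(None, paths, node_probs, flat_probs)
# best is "a1": flat score 0.4 times the internal factor P(A | root) = 0.55
```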

No special loss is introduced: each $f_p$ and $f$ is trained with standard cross-entropy.

3. HiClass Implementation

The HiClass Python library (Miranda et al., 2021) implements LCPN as `LocalClassifierPerParentNode`. The constructor accepts any scikit-learn-compatible estimator as the local classifier, and the hierarchy is inferred from the training labels, which are supplied as one root-to-leaf path per sample, either in matrix form (samples × depth) or as nested lists.

Typical usage proceeds as:

from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerParentNode

X = ...  # features, shape (n_samples, n_features)
y_paths = ...  # one label path per sample, e.g. ["animal", "mammal", "cat"]
lcpn = LocalClassifierPerParentNode(local_classifier=RandomForestClassifier())
lcpn.fit(X, y_paths)
y_pred = lcpn.predict(X)
The classifier returns the predicted path for each sample. Hierarchical labels may also be provided as a pandas DataFrame (one column per depth level).

4. Comparative Analysis and Efficiency

HiClass supports multiple local strategies:

| Strategy | Classifiers per | Inference consistency | Classes discriminated |
|---|---|---|---|
| LCPN | Parent node | Guaranteed | $\ll$ total number of classes |
| LCL (level-wise) | Depth level | Not always | All nodes at the same level |
| LCN (node-wise) | Each node | Depends | Child nodes only |

LCPN yields notably smaller per-node classification tasks by exploiting local structure, and its predictions are hierarchy-consistent by construction. Its main trade-offs are the larger number of classifiers to train and top-down error propagation: an incorrect early split rules out every correct leaf downstream.

Computational cost for standard LCPN scales as $O(d \cdot T_{loc})$, where $d$ is the tree depth and $T_{loc}$ the cost of one local prediction; LCPN+ and LCPN+F incur $O(c \cdot T_{loc})$ ($c$ = leaf count), and LCPN+F adds $O(T_{flat})$ for the global prediction (Alagoz, 2023).

Empirically, flat classification (FC) is fastest, followed by global, LCPN+, and LCPN+F (all roughly $3\times$ slower than FC), while standard LCPN can be up to $100\times$ slower due to its many small classifiers. LCPN+F and LCPN+ have similar runtimes, the flat-classifier overhead being negligible for moderate $c$.

5. Hierarchy Induction and Empirical Performance

Automated hierarchy generation (using divisive or agglomerative tree-building and LDA for dimensionality reduction) can significantly influence hierarchical classifier performance (Alagoz, 2023).

| Scheme | Glass | PPTW | Yeast | Faces | FiftyWords |
|---|---|---|---|---|---|
| LCPN (Div) | 0.877 | 1.119 | 0.935 | 0.845 | 0.842 |
| LCPN+ (Div) | 0.905 | 1.079 | 0.905 | 0.848 | 0.927 |
| LCPN+F (Div) | 1.007 | 1.029 | 0.897 | 1.002 | 1.022 |

(The table reports learning efficiency $= F_1^{HC}/F_1^{FC}$; values $> 1$ mean the hierarchical classifier beats the flat baseline.)

LCPN+F dominates in most divisive clustering settings, especially for time-series datasets (PPTW, FiftyWords) and structured domains (Glass, Faces). Hierarchy induction quality is critical; poor clustering or reduction will degrade hierarchical classification relative to flat classifiers.

6. Hierarchical Evaluation Metrics

HiClass provides specialized metrics for hierarchical classification assessment (Miranda et al., 2021):

  • Hierarchical Precision, Recall, F1: Based on overlaps of node sets along true and predicted paths:

$$hP_i = \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{|\widehat{A}(\hat{y}_i)|}, \qquad hR_i = \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{|A(y_i)|}, \qquad hF1_i = \frac{2 \cdot hP_i \cdot hR_i}{hP_i + hR_i}$$

where $A(y_i)$ is the set of nodes on the true path and $\widehat{A}(\hat{y}_i)$ the set on the predicted path.

  • Tree Induced Error (TE): Fraction of hierarchy levels misclassified:

$$TE_i = 1 - \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{\mathrm{depth}(y_i)}$$
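For illustration, these per-sample metrics can be computed directly from the two paths. This sketch includes the root in the ancestor sets and takes $\mathrm{depth}(y_i)$ to be the length of the true path; real implementations may make different choices on both points.

```python
def hierarchical_scores(true_path, pred_path):
    """Per-sample hierarchical precision, recall, F1, and tree-induced
    error, computed from the node sets of the two paths."""
    A_true, A_pred = set(true_path), set(pred_path)
    overlap = len(A_true & A_pred)
    hp = overlap / len(A_pred)
    hr = overlap / len(A_true)
    hf1 = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    te = 1 - overlap / len(true_path)  # depth(y_i) taken as true-path length
    return hp, hr, hf1, te

hp, hr, hf1, te = hierarchical_scores(("root", "mammal", "cat"),
                                      ("root", "mammal", "dog"))
# the paths agree on 2 of 3 nodes, so hp = hr = hf1 = 2/3 and te = 1/3
```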

Python usage:

from hiclass.metrics import precision, recall, f1
hp = precision(y_true, y_pred)
hr = recall(y_true, y_pred)
hf = f1(y_true, y_pred)

These metrics provide fine-grained evaluation sensitive to hierarchical structure, outperforming flat accuracy measures for structured tasks.

7. Strengths, Limitations, and Practical Guidelines

Local-per-parent strategies exhibit modularity, exploiting local decision boundaries with parallelizable training and inference. LCPN+ and LCPN+F address error propagation inherent in greedy LCPN by evaluating entire leaf chains; LCPN+F additionally injects global discrimination, enhancing robustness.

Limitations include computational overhead for large $c$, sensitivity to the hierarchy induction method, and reduced efficacy where global flat models are inherently optimal (e.g., weakly structured problems).

Practitioners are advised:

  • For $c \leq 50$ and moderate inference cost, LCPN+F is optimal for balancing error propagation against global context.
  • For strict efficiency requirements, standard LCPN is preferable—accepting error path risks.
  • Always empirically validate against flat baselines, experiment with hierarchy induction (divisive/agglomerative clustering plus LDA), and prioritize LCPN+F in structured or time-series domains.

Local Classifier per Parent/Node thus offers a powerful framework for hierarchical classification, with probabilistic and combined extensions further enabling robust and accurate modeling of complex class structures (Miranda et al., 2021, Alagoz, 2023).
