
Local Classifier per Parent/Node (LCPN)

Updated 2 February 2026
  • LCPN is a modular hierarchical classification strategy that trains distinct multi-class classifiers at each internal node of a label tree.
  • The LCPN+ and LCPN+F extensions make inference more robust by adding probabilistic all-paths scoring and a global flat classifier, respectively.
  • Implemented in the HiClass framework, LCPN ensures hierarchy-consistent predictions while leveraging specialized metrics for structured evaluation.

Local Classifier per Parent/Node (LCPN) is a modular hierarchical classification strategy wherein distinct multi-class classifiers are trained at each internal node of a tree-structured label hierarchy. Each classifier discriminates among its direct child classes using only those training examples whose true class resides in the node's subtree. Extensions such as LCPN+ (probabilistic all-paths) and LCPN+F (combining local and global classification) further improve robustness and performance, especially in structured or time-series domains. This paradigm is foundational for hierarchical classification, prominently implemented in frameworks such as HiClass (Miranda et al., 2021), and empirically validated in automatic hierarchy induction settings (Alagoz, 2023).

1. Formal Framework and Problem Definition

Let $H = (N, E)$ denote the tree-structured hierarchy, where $N$ is the set of nodes (internal and leaf) and $E \subseteq N \times N$ is the parent-child relation. The leaves $\mathcal{L} \subseteq N$ represent the atomic classes; each training sample $(x_i, y_i)$ is labeled by a unique leaf $y_i \in \mathcal{L}$. The path $\mathrm{Path}(y_i) = (v_0 = \mathrm{root}, v_1, \ldots, v_d = y_i)$ identifies the sequence of ancestor nodes for each class.

For every internal node $p \in N \setminus \mathcal{L}$ with children $C(p)$, LCPN trains a multi-class classifier $f_p: X \rightarrow C(p)$. The training set for $f_p$ is constructed from all examples whose label lies in $p$'s subtree:

$$D_p = \{ (x_i, c_i^p) \mid y_i \in \mathrm{Subtree}(p),\ c_i^p = \text{child of } p \text{ on the path to } y_i \}$$
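The construction of $D_p$ can be sketched in a few lines of Python: each sample, labeled by its root-to-leaf path, contributes one training pair to every internal node it passes through. The path tuples and helper name below are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def build_parent_datasets(samples):
    """Split (x, path) samples into one training set D_p per internal
    node: each sample contributes the pair (x, child-of-p-on-its-path)
    to every internal node p it passes through."""
    D = defaultdict(list)
    for x, path in samples:
        for parent, child in zip(path, path[1:]):
            D[parent].append((x, child))
    return D

# Hypothetical toy samples, labels given as root-to-leaf paths
samples = [([0.1], ("root", "mammal", "cat")),
           ([0.2], ("root", "mammal", "dog")),
           ([0.9], ("root", "bird", "crow"))]
D = build_parent_datasets(samples)
# D["root"] holds targets mammal/mammal/bird; D["mammal"] holds cat/dog
```

Each entry `D[p]` is then fed to an ordinary multi-class learner to obtain $f_p$.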

The overall objective is to minimize the node-wise aggregate loss:

$$L(\{f_p\}) = \sum_{p \in N \setminus \mathcal{L}} \sum_{(x_i, c_i^p) \in D_p} \ell(f_p(x_i), c_i^p)$$

where the outer sum runs over internal nodes and $\ell(\cdot, \cdot)$ may be the zero-one loss or a convex surrogate.

Inference proceeds in a top-down fashion: starting at the root, each classifier selects the most probable child node; the process recurses until a leaf is reached, producing a hierarchy-consistent prediction path.

2. Algorithms and Variants

Standard LCPN

Training: For each internal node $p$, construct $D_p$ and train $f_p$ to discriminate among the children $C(p)$.

Inference:

p = root
while C(p) is not empty:
    c_pred = f_p.predict(x)  # argmax over p's children
    p = c_pred
return p  # p is now a leaf
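The greedy loop above can be made concrete with a small runnable sketch; the tree and per-node probability tables below are hypothetical stand-ins for trained classifiers $f_p$.

```python
def predict_greedy(x, children, node_probs, root="root"):
    """Greedy LCPN inference: follow the most probable child at each node.

    children[p]   -> list of p's children ([] for leaves)
    node_probs[p] -> callable x -> {child: probability}, a stub
                     standing in for a trained classifier f_p
    """
    p = root
    while children.get(p):                # recurse until a leaf is reached
        probs = node_probs[p](x)
        p = max(probs, key=probs.get)     # argmax over p's children
    return p                              # hierarchy-consistent by construction

# Hypothetical hierarchy and stub classifiers
children = {"root": ["mammal", "bird"], "mammal": ["cat", "dog"],
            "bird": [], "cat": [], "dog": []}
node_probs = {"root": lambda x: {"mammal": 0.8, "bird": 0.2},
              "mammal": lambda x: {"cat": 0.3, "dog": 0.7}}
leaf = predict_greedy(None, children, node_probs)
# descends root -> mammal -> dog
```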

LCPN+ (Probabilistic All-Paths)

Rather than greedily following a single path, LCPN+ computes a chain-rule probability for every leaf $\ell$ and returns the maximizer:

$$\ell^* = \arg\max_{\ell \in \mathcal{L}} \prod_{p \in \Omega_\ell} P(y = q_p \mid f_p, x)$$

where $\mathcal{L}$ is the set of leaves, $\Omega_\ell$ is the set of internal nodes along the path to $\ell$, and $q_p$ is the chosen child of $p$ on that path.

Pseudocode:

for each leaf ℓ:
    score[ℓ] = 1
    for each internal node p on the path to ℓ:
        q = child of p on that path
        score[ℓ] *= f_p(x)[q]
return argmax_ℓ score[ℓ]
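A runnable version of this scorer (again with hypothetical stub classifiers) also shows why all-paths inference can overrule a greedy mistake: here the root marginally prefers branch A, but the confident split under B wins the product.

```python
def predict_all_paths(x, paths, node_probs):
    """LCPN+ inference: score each leaf by the chain-rule product of
    local-classifier probabilities along its root-to-leaf path."""
    scores = {}
    for leaf, path in paths.items():
        s = 1.0
        for parent, child in zip(path, path[1:]):
            s *= node_probs[parent](x)[child]
        scores[leaf] = s
    return max(scores, key=scores.get), scores

# Hypothetical stubs: the root weakly prefers branch A,
# but branch B's split is far more confident.
paths = {"a1": ("root", "A", "a1"), "a2": ("root", "A", "a2"),
         "b1": ("root", "B", "b1"), "b2": ("root", "B", "b2")}
node_probs = {"root": lambda x: {"A": 0.55, "B": 0.45},
              "A": lambda x: {"a1": 0.5, "a2": 0.5},
              "B": lambda x: {"b1": 0.9, "b2": 0.1}}
best, scores = predict_all_paths(None, paths, node_probs)
# best is "b1" (0.45 * 0.9 = 0.405), even though greedy descent
# would have committed to branch A at the root
```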

LCPN+F (Local–Global Combined)

Augments LCPN+ by multiplying the global flat classifier’s prediction for each leaf:

$$\mathrm{score}[\ell] = \left[ \prod_{p \in \Omega'_\ell} P(y = q_p \mid f_p, x) \right] \times P(y = \ell \mid f, x)$$

where $\Omega'_\ell$ excludes the final, leaf-level split, and $f$ is a flat classifier over the leaf classes.

Pseudocode:

for each leaf ℓ:
    score[ℓ] = f(x)[ℓ]  # global flat classifier
    for each internal node p on the path to ℓ, excluding the leaf-level split:
        q = child of p on that path
        score[ℓ] *= f_p(x)[q]
return argmax_ℓ score[ℓ]
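The same toy setup extends to LCPN+F: the product now runs over internal splits only, and the leaf-level factor is replaced by the flat classifier's score. The flat probabilities below are, again, hypothetical stubs.

```python
def predict_local_global(x, paths, node_probs, flat_probs):
    """LCPN+F inference: product over internal splits only; the
    leaf-level factor is replaced by a global flat classifier."""
    scores = {}
    for leaf, path in paths.items():
        s = flat_probs(x)[leaf]  # global flat term
        for parent, child in zip(path[:-2], path[1:-1]):
            s *= node_probs[parent](x)[child]  # leaf split excluded
        scores[leaf] = s
    return max(scores, key=scores.get), scores

# Hypothetical two-level hierarchy with stub classifiers
paths = {"a1": ("root", "A", "a1"), "a2": ("root", "A", "a2"),
         "b1": ("root", "B", "b1"), "b2": ("root", "B", "b2")}
node_probs = {"root": lambda x: {"A": 0.55, "B": 0.45}}
flat_probs = lambda x: {"a1": 0.4, "a2": 0.1, "b1": 0.3, "b2": 0.2}
best, scores = predict_local_global(None, paths, node_probs, flat_probs)
# best is "a1": flat score 0.4 times the internal factor P(A | root) = 0.55
```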

No special loss is introduced: each $f_p$ and $f$ is trained with standard cross-entropy.

3. HiClass Implementation

The HiClass Python library (Miranda et al., 2021) implements LCPN as `LocalClassifierPerParentNode`. The constructor accepts any scikit-learn-compatible estimator as the local classifier, and the hierarchy is inferred from the training labels, which are supplied as one root-to-leaf path per sample, either in matrix form (samples × depth) or as nested lists.

Typical usage proceeds as:

from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerParentNode

X = ...  # features, shape (n_samples, n_features)
y_paths = ...  # one label path per sample, e.g. ["animal", "mammal", "cat"]
lcpn = LocalClassifierPerParentNode(local_classifier=RandomForestClassifier())
lcpn.fit(X, y_paths)
y_pred = lcpn.predict(X)
The classifier returns the predicted path for each sample. Hierarchical labels may also be provided as a pandas DataFrame (one column per depth level).

4. Comparative Analysis and Efficiency

HiClass supports multiple local strategies:

| Strategy | Classifiers per | Inference consistency | Classes discriminated |
|---|---|---|---|
| LCPN | Parent node | Guaranteed | $\ll$ total number of classes |
| LCL (level-wise) | Depth level | Not always | All nodes at the same level |
| LCN (node-wise) | Each node | Depends | Child nodes only |

LCPN yields notably smaller per-node classification tasks by exploiting local structure, and its predictions are hierarchy-consistent by construction. Its main trade-offs are the larger number of classifiers to train and top-down error propagation: an incorrect early split rules out every correct leaf downstream.

Computational cost for standard LCPN scales as $O(d \cdot T_{loc})$, where $d$ is the tree depth and $T_{loc}$ the cost of one local prediction; LCPN+ and LCPN+F incur $O(c \cdot T_{loc})$ ($c$ = leaf count), and LCPN+F adds $O(T_{flat})$ for the global prediction (Alagoz, 2023).

Empirically, flat classification (FC) is fastest, followed by global, LCPN+, and LCPN+F (all roughly $3\times$ slower than FC), while standard LCPN can be up to $100\times$ slower due to its many small classifiers. LCPN+F and LCPN+ have similar runtimes, the flat-classifier overhead being negligible for moderate $c$.

5. Hierarchy Induction and Empirical Performance

Automated hierarchy generation (using divisive or agglomerative tree-building and LDA for dimensionality reduction) can significantly influence hierarchical classifier performance (Alagoz, 2023).

| Scheme | Glass | PPTW | Yeast | Faces | FiftyWords |
|---|---|---|---|---|---|
| LCPN (Div) | 0.877 | 1.119 | 0.935 | 0.845 | 0.842 |
| LCPN+ (Div) | 0.905 | 1.079 | 0.905 | 0.848 | 0.927 |
| LCPN+F (Div) | 1.007 | 1.029 | 0.897 | 1.002 | 1.022 |

(The table reports learning efficiency $= F_1^{HC}/F_1^{FC}$; values $> 1$ mean the hierarchical classifier beats the flat baseline.)

LCPN+F dominates in most divisive clustering settings, especially for time-series datasets (PPTW, FiftyWords) and structured domains (Glass, Faces). Hierarchy induction quality is critical; poor clustering or reduction will degrade hierarchical classification relative to flat classifiers.

6. Hierarchical Evaluation Metrics

HiClass provides specialized metrics for hierarchical classification assessment (Miranda et al., 2021):

  • Hierarchical Precision, Recall, F1: Based on overlaps of node sets along true and predicted paths:

$$hP_i = \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{|\widehat{A}(\hat{y}_i)|}, \qquad hR_i = \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{|A(y_i)|}, \qquad hF1_i = \frac{2 \cdot hP_i \cdot hR_i}{hP_i + hR_i}$$

where $A(y_i)$ is the set of nodes on the true path and $\widehat{A}(\hat{y}_i)$ the set on the predicted path.

  • Tree Induced Error (TE): Fraction of hierarchy levels misclassified:

$$TE_i = 1 - \frac{|A(y_i) \cap \widehat{A}(\hat{y}_i)|}{\mathrm{depth}(y_i)}$$
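For illustration, these per-sample metrics can be computed directly from the two paths. This sketch includes the root in the ancestor sets and takes $\mathrm{depth}(y_i)$ to be the length of the true path; real implementations may make different choices on both points.

```python
def hierarchical_scores(true_path, pred_path):
    """Per-sample hierarchical precision, recall, F1, and tree-induced
    error, computed from the node sets of the two paths."""
    A_true, A_pred = set(true_path), set(pred_path)
    overlap = len(A_true & A_pred)
    hp = overlap / len(A_pred)
    hr = overlap / len(A_true)
    hf1 = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    te = 1 - overlap / len(true_path)  # depth(y_i) taken as true-path length
    return hp, hr, hf1, te

hp, hr, hf1, te = hierarchical_scores(("root", "mammal", "cat"),
                                      ("root", "mammal", "dog"))
# the paths agree on 2 of 3 nodes, so hp = hr = hf1 = 2/3 and te = 1/3
```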

Python usage:

from hiclass.metrics import precision, recall, f1
hp = precision(y_true, y_pred)
hr = recall(y_true, y_pred)
hf = f1(y_true, y_pred)

These metrics provide fine-grained evaluation sensitive to hierarchical structure, outperforming flat accuracy measures for structured tasks.

7. Strengths, Limitations, and Practical Guidelines

Local-per-parent strategies exhibit modularity, exploiting local decision boundaries with parallelizable training and inference. LCPN+ and LCPN+F address error propagation inherent in greedy LCPN by evaluating entire leaf chains; LCPN+F additionally injects global discrimination, enhancing robustness.

Limitations include computational overhead for large $c$, sensitivity to the hierarchy induction method, and reduced efficacy where global flat models are inherently optimal (e.g., weakly structured problems).

Practitioners are advised:

  • For $c \leq 50$ and moderate inference cost, LCPN+F is optimal for balancing error propagation against global context.
  • For strict efficiency requirements, standard LCPN is preferable—accepting error path risks.
  • Always empirically validate against flat baselines, experiment with hierarchy induction (divisive/agglomerative clustering plus LDA), and prioritize LCPN+F in structured or time-series domains.

Local Classifier per Parent/Node thus offers a powerful framework for hierarchical classification, with probabilistic and combined extensions further enabling robust and accurate modeling of complex class structures (Miranda et al., 2021, Alagoz, 2023).
