Explainable Particle Chebyshev Network
- E-PCN is a deep graph neural network that models jets as particle graphs using Chebyshev spectral convolutions and EdgeConv layers.
- It employs four parallel branches weighted by distinct Lund-plane kinematic features to capture jet substructure with precision.
- Integrating Grad-CAM explainability, E-PCN quantitatively attributes classification decisions to underlying physical features.
The Explainable Particle Chebyshev Network (E-PCN) is a graph neural network (GNN) framework designed for jet tagging tasks in experimental high-energy physics. E-PCN enhances the interpretability and discrimination power of deep graph-based classifiers by simultaneously encoding multiple kinematic relationships via parallel spectral graph branches, each derived from a distinct jet substructure measure. The architecture combines Chebyshev spectral convolutions and EdgeConv layers over kinematically weighted particle graphs, allowing explicit attribution of classification decisions to underlying physical features via a Grad-CAM–derived approach (Islam et al., 8 Dec 2025).
1. Foundations: Particle Chebyshev Networks and Jet Graph Representation
Particle Chebyshev Networks (PCN) model jets as undirected graphs $G = (V, E)$, where $V$ is the set of detected constituent particles and $E$ encodes proximity relations in pseudorapidity–azimuth ($\eta$–$\phi$) space. Each node is instantiated with a $d$-dimensional feature vector encompassing momentum components ($p_x, p_y, p_z$), energy ($E$), transverse momentum ($p_T$), spatial coordinates ($\eta, \phi$), impact parameters, and particle identification flags.
Edges join each particle to its $k$ nearest neighbors determined in $(\eta, \phi)$ using a KD-tree. The resulting adjacency matrix $A$ is used to construct the graph Laplacian $L = D - A$, with $D$ the degree matrix. To enable spectral graph convolutions via Chebyshev polynomials, $L$ is rescaled to $\tilde{L} = \frac{2}{\lambda_{\max}} L - I$, with $\lambda_{\max}$ typically approximated as $2$.
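As a minimal illustration of this graph construction, the sketch below builds the $k$-nearest-neighbor adjacency in $(\eta, \phi)$ and the rescaled Laplacian with plain numpy. The function name `build_scaled_laplacian` is illustrative, and a brute-force distance matrix stands in for the KD-tree query described above (equivalent for small jets):

```python
import numpy as np

def build_scaled_laplacian(eta, phi, k=3, lam_max=2.0):
    """kNN graph in (eta, phi); returns the rescaled Laplacian L~ = (2/lam_max) L - I."""
    pts = np.stack([eta, phi], axis=1)               # (N, 2) particle coordinates
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                     # exclude self as a neighbor
    nn = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbors per particle
    N = len(eta)
    A = np.zeros((N, N))
    A[np.repeat(np.arange(N), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)                           # symmetrize the adjacency
    L = np.diag(A.sum(axis=1)) - A                   # combinatorial Laplacian L = D - A
    return (2.0 / lam_max) * L - np.eye(N)           # spectrum rescaled toward [-1, 1]
```

With $\lambda_{\max} = 2$, the rescaling reduces to $\tilde{L} = L - I$, which keeps the recursion below numerically stable.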
Chebyshev convolution applies polynomial filters recursively:

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x).$$

A single ChebConv layer transforms input node signals $X$ as:

$$X' = \sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, \Theta_k,$$

with learnable parameters $\Theta_k$ over polynomial orders $k = 0, \dots, K-1$.
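The recursion above never materializes $T_k(\tilde{L})$ explicitly; only matrix–vector products $T_k(\tilde{L})X$ are propagated. A minimal numpy sketch (the name `cheb_conv` is illustrative):

```python
import numpy as np

def cheb_conv(L_tilde, X, thetas):
    """Spectral filter X' = sum_k T_k(L~) X Theta_k via the Chebyshev recursion."""
    Tx_prev, Tx = X, L_tilde @ X                 # T_0(L~) X = X,  T_1(L~) X = L~ X
    out = Tx_prev @ thetas[0]
    if len(thetas) > 1:
        out = out + Tx @ thetas[1]
    for theta in thetas[2:]:
        # T_k(L~) X = 2 L~ T_{k-1}(L~) X - T_{k-2}(L~) X
        Tx_prev, Tx = Tx, 2.0 * L_tilde @ Tx - Tx_prev
        out = out + Tx @ theta
    return out
```

Each `thetas[k]` plays the role of $\Theta_k$, mapping input to output feature dimension; cost is $O(K |E| d)$ rather than a dense eigendecomposition.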
2. E-PCN Architecture: Multi-Kinematic Graph Branches
E-PCN advances PCN by constructing four parallel graph views, each weighted by a specific Lund-plane–inspired kinematic variable: angular separation ($\Delta R$), transverse momentum ($k_T$), momentum fraction ($z$), and invariant mass squared ($m^2$). For each edge $(i, j)$, the following are computed in logarithmic form:

$$\Delta R_{ij} = \sqrt{(\eta_i - \eta_j)^2 + (\phi_i - \phi_j)^2}, \qquad k_{T,ij} = \min(p_{T,i}, p_{T,j})\,\Delta R_{ij},$$

$$z_{ij} = \frac{\min(p_{T,i}, p_{T,j})}{p_{T,i} + p_{T,j}}, \qquad m_{ij}^2 = (E_i + E_j)^2 - \lVert \vec{p}_i + \vec{p}_j \rVert^2.$$

Each kinematic feature's logarithm ($\ln \Delta R$, $\ln k_T$, $\ln z$, $\ln m^2$) is used to re-weight the base adjacency, yielding four edge-weighted adjacency matrices $A^{(\Delta R)}$, $A^{(k_T)}$, $A^{(z)}$, $A^{(m^2)}$.
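These per-edge quantities follow directly from the constituents' four-momenta. The sketch below assumes the standard Lund-plane definitions above; the function name `lund_edge_features` and the $(E, p_x, p_y, p_z)$ input ordering are illustrative choices, not from the paper:

```python
import numpy as np

def lund_edge_features(p4_i, p4_j):
    """Lund-plane variables (dR, kT, z, m^2) for one edge, from (E, px, py, pz)."""
    def kin(p):
        E, px, py, pz = p
        pt = np.hypot(px, py)                    # transverse momentum
        eta = np.arcsinh(pz / pt)                # pseudorapidity
        phi = np.arctan2(py, px)                 # azimuth
        return pt, eta, phi
    pt_i, eta_i, phi_i = kin(p4_i)
    pt_j, eta_j, phi_j = kin(p4_j)
    dphi = (phi_i - phi_j + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]
    dR = np.hypot(eta_i - eta_j, dphi)                      # angular separation
    kT = min(pt_i, pt_j) * dR                               # relative transverse momentum
    z = min(pt_i, pt_j) / (pt_i + pt_j)                     # momentum fraction
    psum = np.asarray(p4_i, float) + np.asarray(p4_j, float)
    m2 = psum[0] ** 2 - psum[1] ** 2 - psum[2] ** 2 - psum[3] ** 2  # pair mass^2
    return dR, kT, z, m2
```

Note the azimuth difference is wrapped to $(-\pi, \pi]$ before computing $\Delta R$, a standard precaution when $\phi$ values straddle the branch cut.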
Each weighted graph is processed through a dedicated GNN branch with five layers alternating ChebConv ($K = 16$, hidden dim $64$) and EdgeConv, followed by BatchNorm and ReLU. Node features are mean-pooled to obtain a per-graph embedding $h^{(b)} \in \mathbb{R}^{64}$ for each branch $b$. These are stacked into a $4 \times 64$ tensor and further processed by a $1$D convolution across graph channels, then flattened to a $256$-dimensional feature, followed by two fully connected layers producing the jet class logits.
Key hyperparameters include hidden dimension $64$, polynomial order $K = 16$, $4$ parallel branches, $k$-nearest-neighbor graph construction, the AdamW optimizer, batch size $256$, and $0.1$ dropout.
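The branch-fusion step can be sketched compactly. Below, `fuse_branches` is a hypothetical name, and the cross-channel $1$D convolution is modeled as a kernel-size-1 mixing matrix over the four branch channels, an assumption consistent with the stated $4 \times 64 \to 256$ flatten but not confirmed by the source:

```python
import numpy as np

def fuse_branches(branch_embeddings, mix):
    """Stack four 64-d branch embeddings, mix across branch channels, flatten.
    branch_embeddings: list of 4 arrays of shape (64,); mix: (4, 4) channel mixer."""
    stacked = np.stack(branch_embeddings)        # (branches=4, dim=64)
    mixed = mix @ stacked                        # kernel-size-1 conv across branches
    return mixed.reshape(-1)                     # flatten: 4 * 64 = 256 features
```

The $256$-dimensional output then feeds the two fully connected classification layers.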
3. Grad-CAM–based Explainability in E-PCN
Interpretability is achieved by adapting Gradient-weighted Class Activation Mapping (Grad-CAM) to the GNN context. For pre-softmax class score $y^c$, importance weights for graph type $b$ are calculated as:

$$\alpha_b^c = \frac{1}{Z} \sum_{i,j} \frac{\partial y^c}{\partial A_{ij}^{(b)}},$$

with $Z$ as a normalization constant. The class-specific edge activation map is:

$$M_{ij}^{c,(b)} = \mathrm{ReLU}\!\left(\alpha_b^c\, A_{ij}^{(b)}\right).$$
Global branch importance is estimated by averaging $\alpha_b^c$ or $M^{c,(b)}$ across all test instances. The normalized contributions are:
- $\Delta R$: 40.72%
- $k_T$: 35.67%
- $z$: 14.06%
- $m^2$: 9.54%
This quantifies the relative impact of each kinematic feature on classifier decisions, offering direct physical interpretation.
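The normalization step from averaged importances to the percentages above is straightforward; a minimal sketch, assuming the hypothetical helper name `branch_contributions` and that only positive (ReLU-passed) evidence is retained:

```python
import numpy as np

def branch_contributions(avg_importance, labels):
    """Normalize averaged per-branch Grad-CAM importances to percentages."""
    w = np.maximum(np.asarray(avg_importance, dtype=float), 0.0)  # keep positive evidence
    pct = 100.0 * w / w.sum()                                     # normalize to 100%
    return dict(zip(labels, pct))
```

Applied to the four branch-averaged importances, this yields the percentage breakdown reported above.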
4. Empirical Performance and Kinematic Attribution
On the JetClass dataset (10 classes, 1M training jets), E-PCN outperforms the baseline PCN on all macro-averaged metrics, as summarized below:
| Model | Macro-Accuracy | Macro-AUC | Macro-AUPR |
|---|---|---|---|
| PCN (baseline) | 92.49% | 92.94% | 65.99% |
| E-PCN | 94.67% | 96.78% | 82.41% |
Grad-CAM analysis reveals that angular separation ($\Delta R$) and transverse momentum ($k_T$) together account for approximately $76\%$ of classification decisions, corroborating the Lund-plane-driven hypothesis that soft–collinear QCD dynamics are most discriminative in jet substructure. Momentum fraction ($z$) and invariant mass squared ($m^2$) contribute complementary discrimination, especially for heavy-flavor processes.
5. Significance and Implications
The E-PCN framework combines interpretable graph-based learning with physically motivated kinematic encoding, enabling identification of salient features underpinning jet classification tasks. The clear attribution facilitated by kinematic weighting and Grad-CAM–based analysis validates the soft–collinear structure hypothesis and enables domain experts to link model outcomes to QCD substructure intuition.
A plausible implication is that similar multi-branch graph architectures can generalize to other areas where interpretability and domain-based feature attribution are crucial. The explicit quantification of kinematic variable importance supports data-driven theoretical investigations and systematic studies of signal/background separation mechanisms.
6. Connections to Related Methodologies
E-PCN operationalizes Chebyshev spectral graph convolutions [ChebConv], edge-based convolutions [EdgeConv], and Grad-CAM attribution strategies, consistent with formal definitions from prior literature. The multi-graph approach aligns conceptually with Lund-plane jet analysis, emphasizing the integration of domain knowledge into machine learning workflows for collider physics.
The architecture and methodology follow the notation and algorithmic conventions established in the foundational E-PCN publication (Islam et al., 8 Dec 2025). Implementation fidelity requires adherence to prescribed hyperparameters, training schedules, and preprocessing techniques specified in that work.