HeteroGAT-Rank: OSS Supply Chain Analysis
- HeteroGAT-Rank is an industry-oriented system that models OSS runtime behavior as lightweight, one-hop heterogeneous graphs for supply chain threat analysis.
- It integrates offline behavior mining with online ranking, utilizing file, network, and command activity features to surface actionable evidence for analysts.
- The system achieves scalability and interpretability through parallel graph construction, relation-aware attention, and efficient multi-GPU training.
HeteroGAT-Rank is an industry-oriented system for mining and ranking operational runtime behavior in open-source software (OSS) supply chains. It models the execution-time activities of OSS packages as lightweight, one-hop heterogeneous graphs and applies attention-based graph learning to identify and rank the most security-relevant behavioral patterns, guiding manual investigation for supply chain threat analysis. Unlike approaches seeking full automation, HeteroGAT-Rank is designed for analyst-in-the-loop workflows, surfacing actionable runtime evidence such as file, network, and command activities. The system integrates parallel graph construction, decouples offline mining from online analysis, and scales across multiple OSS ecosystems, aiming to support practical supply chain security workflows under realistic operational constraints (Tan et al., 11 Jan 2026).
1. System Pipeline and Architecture
HeteroGAT-Rank operates as a two-stage pipeline, composed of offline behavior mining and online behavioral pivot ranking.
- Offline Behavior Mining: This stage consumes structured sandbox execution logs (primarily from the OpenSSF Package Analysis framework) encompassing install-time and import-time events such as file I/O, network connections, DNS queries, and command executions. Each package run yields a one-hop, star-shaped heterogeneous subgraph rooted at a Package_Name node, capturing the runtime behavior of the OSS package. Subgraphs are serialized for efficient reuse and are batched using PyG HeteroData structures. Model training occurs on these batched subgraphs, employing a heterogeneous graph attention network (HeteroGAT) and recording node- and edge-level attention coefficients to facilitate interpretability.
- Online Behavioral Pivot Ranking: For new sandbox traces, the system generates a corresponding one-hop subgraph and passes it through the pre-trained HeteroGAT layers, extracting the learned attention scores. A ranking module selects the top-k nodes and edges—either by mean attention across heads or by gradient-based influence (Grad-CAM)—yielding a prioritized list of runtime artifacts (e.g., specific files, commands, domains, sockets) as pivots for analyst-led investigation.
The serialization of subgraphs and reuse of learned attention parameters maximize efficiency, enabling fast, per-instance analysis at query time and obviating repeated preprocessing.
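A minimal sketch of this offline-mine/online-reuse split, with plain dicts standing in for the serialized PyG subgraphs (the trace format, file layout, and function names here are hypothetical, not the paper's API):

```python
import os
import pickle
import tempfile

def mine_offline(trace):
    """Offline stage: turn one sandbox trace into a one-hop subgraph and serialize it."""
    subgraph = {
        "root": trace["package"],
        # star-shaped: every edge originates at the package root
        "edges": [(trace["package"], action, target) for action, target in trace["events"]],
    }
    path = os.path.join(tempfile.gettempdir(), trace["package"] + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(subgraph, f)
    return path

def load_online(path):
    """Online stage: reload the serialized subgraph instead of re-parsing the raw trace."""
    with open(path, "rb") as f:
        return pickle.load(f)

trace = {"package": "demo-pkg",
         "events": [("file_write", "/tmp/x"), ("dns_query", "telemetry.example")]}
graph = load_online(mine_offline(trace))
```

The point of the split is that the expensive parsing happens once; query-time analysis only deserializes and runs inference.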
2. Heterogeneous Graph Construction
- Node and Edge Types: Graphs constructed for each package run encompass the following node types: Package_Name (root of each subgraph), Path (file entities), DNS (queried hostnames), CMD (executed command strings), and Socket (network IP/port/hostname endpoints). Edges are typed, directed relations originating from the package root, representing:
- File actions (read, write, delete)
- DNS query interactions (A, AAAA, CNAME types)
- Command execution
- Socket communications
- Feature Encoding: Initial node features are derived from categorical encoders or a distilled transformer (such as MiniLM) operating on normalized token values for files and domains, maintaining semantic similarity among artifacts.
- Cross-Ecosystem Statistics: The dataset comprises 9,758 package instances from five major OSS ecosystems (Rust, npm, packagist, PyPI, Ruby), materializing over 10 million nodes and 54 million edges. There is marked imbalance in malicious sample prevalence, from 0.1% in crates.io to 81.9% in Rubygems; all ecosystems are modeled jointly without explicit ecosystem features to encourage cross-ecosystem behavioral generalization.
- Parallel Construction: Graph materialization leverages Ray Actors for distributed processing across CPU workers, reducing peak memory usage by maintaining stateful feature encoders and serializing subgraphs for mini-batch training.
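As an illustration, the star-shaped construction can be sketched without PyG, using plain dicts of node lists and typed edge lists in place of HeteroData (the event-tuple input format and relation names are assumptions for this sketch):

```python
def build_subgraph(package, events):
    """Build a one-hop, star-shaped heterogeneous subgraph rooted at a Package_Name node.

    Node types follow the paper (Package_Name, Path, DNS, CMD, Socket); edges are
    keyed by (src_type, relation, dst_type) triples, as in PyG's heterogeneous graphs.
    """
    nodes = {"Package_Name": [package], "Path": [], "DNS": [], "CMD": [], "Socket": []}
    edges = {}
    # map raw event kinds to (destination node type, relation name) -- hypothetical names
    type_of = {"file_read": ("Path", "reads"), "file_write": ("Path", "writes"),
               "file_delete": ("Path", "deletes"), "dns_query": ("DNS", "queries"),
               "cmd_exec": ("CMD", "executes"), "socket_conn": ("Socket", "connects")}
    for kind, value in events:
        dst_type, rel = type_of[kind]
        if value not in nodes[dst_type]:
            nodes[dst_type].append(value)
        key = ("Package_Name", rel, dst_type)
        # every edge originates at the single root node (index 0)
        edges.setdefault(key, []).append((0, nodes[dst_type].index(value)))
    return nodes, edges

nodes, edges = build_subgraph("demo-pkg", [
    ("file_write", "/tmp/a"), ("dns_query", "example.org"), ("cmd_exec", "curl"),
])
```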
3. Relation-Aware Attention Mechanism
HeteroGAT-Rank employs a heterogeneous graph attention mechanism, supporting precise interpretability and robust modeling of diverse OSS runtime behaviors.
- Relation-Aware GATv2Conv: For each relation type $r$ and attention head $h$, attention coefficients are computed in the GATv2 style as
$$\alpha_{ij}^{(r,h)} = \operatorname{softmax}_{j \in \mathcal{N}_r(i)}\Big(\mathbf{a}_{r,h}^{\top}\,\mathrm{LeakyReLU}\big(\mathbf{W}_{r,h}\,[\mathbf{h}_i \,\Vert\, \mathbf{h}_j]\big)\Big),$$
where $\mathbf{W}_{r,h}$ and $\mathbf{a}_{r,h}$ are the projection matrix and attention vector specific to each relation and head.
- Aggregation and Update Rule: Updated representations aggregate across relations and heads:
$$\mathbf{h}_i' = \mathrm{LayerNorm}\Big(\big\Vert_{h=1}^{H}\,\sigma\Big(\sum_{r \in \mathcal{R}}\ \sum_{j \in \mathcal{N}_r(i)} \alpha_{ij}^{(r,h)}\,\mathbf{W}_{r,h}\,\mathbf{h}_j\Big)\Big).$$
Layer normalization is applied to mitigate instability from small batch sizes.
- Semantic Handling of Heterogeneity: Each relation receives unique projections and attention vectors, allowing the model to discriminate between, for example, filesystem actions and network queries. Node and edge types directly inform parameter selection, ensuring robust modeling of the operational diversity present in OSS runtimes.
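A minimal, framework-free sketch of the per-relation GATv2-style scoring rule above (single head, scalar features for brevity; the parameter values are arbitrary illustrations, not learned weights):

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gatv2_attention(h_i, neighbors, W, a):
    """One relation, one head: score each neighbor with
    a * LeakyReLU(W_src*h_i + W_dst*h_j), softmax over neighbors,
    then return the attention-weighted sum of projected neighbor features."""
    scores = [a * leaky_relu(W[0] * h_i + W[1] * h_j) for h_j in neighbors]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]        # attention coefficients, sum to 1
    out = sum(al * W[1] * h_j for al, h_j in zip(alpha, neighbors))
    return alpha, out

alpha, out = gatv2_attention(1.0, [0.5, -2.0, 3.0], W=(0.7, 0.4), a=1.5)
```

In the full model this computation is repeated per relation type and per head with separate parameters, which is what lets the attention coefficients distinguish, say, file writes from DNS queries.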
4. Ranking Objective and Training Paradigm
- Composite Loss Function: Training optimizes for behavioral classification accuracy, embedding robustness, interpretability, and sparsity through
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{1}\,\mathcal{L}_{\mathrm{SupCon}} + \lambda_{2}\,\mathcal{L}_{\mathrm{entropy}} + \lambda_{3}\,\mathcal{L}_{\mathrm{sparsity}},$$
where the terms represent cross-entropy for label prediction, supervised contrastive learning, attention entropy minimization, and attention sparsity, respectively. Small regularization weights $\lambda_i$ prevent over-regularization.
- Sampling Strategy: Positive samples are labeled as “malicious” or “benign” per OSPTrack and replayed runs; negatives for contrastive loss are drawn from opposite label classes and other ecosystems.
- Explainability and Ranking: Two strategies surface actionable pivots:
- Attention-Based Ranking: Computes head-averaged edge importance $\bar{\alpha}_{ij}^{(r)} = \frac{1}{H}\sum_{h=1}^{H}\alpha_{ij}^{(r,h)}$, selecting the top-k edges and associated nodes per type.
- Grad-CAM-Based Ranking: Backpropagates from the prediction logit to attention scores, aggregating per-edge gradient magnitudes to locate the most influential graph components.
This dual approach supports both interpretability and fidelity of explanation in practice.
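The two ranking strategies can be sketched on toy per-edge scores (the edge labels, attention values, and gradient values below are invented purely for illustration):

```python
def rank_edges_by_attention(att_per_head, k=2):
    """Attention-based ranking: mean attention over heads per edge, take top-k."""
    mean_att = {edge: sum(heads) / len(heads) for edge, heads in att_per_head.items()}
    return sorted(mean_att, key=mean_att.get, reverse=True)[:k]

def rank_edges_by_gradcam(grads, k=2):
    """Grad-CAM-style ranking: aggregate gradient magnitudes of the prediction
    logit w.r.t. each edge's attention scores, take top-k."""
    magnitude = {edge: sum(abs(g) for g in gs) for edge, gs in grads.items()}
    return sorted(magnitude, key=magnitude.get, reverse=True)[:k]

# toy data: two attention heads per edge, and per-head gradients
att = {"writes:/tmp/a": [0.6, 0.8],
       "queries:evil.example": [0.9, 0.7],
       "executes:curl": [0.1, 0.2]}
grads = {"writes:/tmp/a": [0.05, -0.02],
         "queries:evil.example": [0.4, 0.3],
         "executes:curl": [0.01, 0.0]}

top_att = rank_edges_by_attention(att)
top_grad = rank_edges_by_gradcam(grads)
```

Both strategies return a short prioritized edge list; the attention variant is cheaper, while the gradient variant ties importance directly to the model's prediction.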
5. Scalability and Engineering Considerations
HeteroGAT-Rank addresses substantial scalability challenges presented by OSS ecosystem diversity and heavy-tailed behavior distributions.
- Parallel Graph Construction: Ray Actors maintain stateful per-worker feature encoders and process shards into PyG HeteroData format, reducing per-host peak memory from 89 GB to below 50 GB.
- Streaming Data Loading: Mini-batch training is facilitated by a custom DataLoader that streams serialized subgraphs as needed, rather than memory-mapping the entire set of 9,758 graphs.
- Efficient GPU Utilization: Sparse operations are prioritized, especially for memory-bound phases such as attention and pooling on extremely skewed subgraphs (up to ~128,000 Action edges). This controls memory blow-up and maintains tractable computation.
- Multi-GPU Scheduling: Data-parallel training via HuggingFace Accelerate supports seamless, synchronized multi-GPU runs, reducing HeteroGAT model run time from 26 to 15 hours for 15 epochs.
A summary of key engineering optimizations is provided in the original study’s Table 2 (Tan et al., 11 Jan 2026).
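A minimal sketch of such a streaming loader, assuming subgraphs are pickled one file each as a stand-in for the system's actual serialization format:

```python
import os
import pickle
import tempfile

def stream_subgraphs(paths, batch_size=2):
    """Yield mini-batches of deserialized subgraphs one file at a time,
    so only batch_size graphs are resident in memory instead of the full corpus."""
    batch = []
    for p in paths:
        with open(p, "rb") as f:
            batch.append(pickle.load(f))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# write a few toy serialized subgraphs, then stream them back
tmp = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmp, "g%d.pkl" % i)
    with open(p, "wb") as f:
        pickle.dump({"root": "pkg%d" % i}, f)
    paths.append(p)

batches = list(stream_subgraphs(paths))
```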
6. Evaluation, Results, and Interpretability
- Datasets and Metrics: Evaluations utilize the OSPTrack collection and a replay-augmented extension, totaling 9,758 OSS package runs across five ecosystems. Metrics include AUC, accuracy, precision, recall, and F1, as well as resource cost and the utility of top-k pivots measured via linkage to CVEs, trend alignment scores, and description alignment scores.
- Predictive Performance:
- HeteroGAT baseline (GATConv + mean pooling): AUC 0.9529, accuracy 0.9120.
- DHeteroGAT (GATv2Conv + attention pooling + entropy + sparsity): higher recall (0.9510), AUC 0.7265, indicating increased sensitivity but some loss in precision.
- PNHeteroGAT (adding contrastive loss) enhances separability under label imbalance.
- Resource Cost: Lightweight comparisons (Entropy, Corr, SHAP+XGB) require seconds to minutes and <500 MB memory. In contrast, HeteroGAT variants demand hundreds of GB of host memory and tens of GB of GPU memory, with training times ranging from 10 to 26 hours for multi-GPU runs.
- Actionable Runtime Indicators: Unlike coarse statistical baselines, HeteroGAT-based methods identify specific file paths (e.g., /tmp/pip-ephem-wheel-cache-*), recognizable source files (custom/zalgo.js, maps/package.json), and ephemeral build artifacts. DHeteroGAT and PNHeteroGAT consistently surface module files and dependency manifests, correlating with known supply-chain attack patterns in, for example, the npm ecosystem.
- Qualitative Insights: Top-k pivots guide analyst triage by focusing attention on malicious artifacts in the context of their surrounding runtime actions. Analysts can translate pivots surfaced by HeteroGAT-Rank into actionable queries for forensic tools and SIEMs (e.g., Velociraptor, ELK). Campaign studies (e.g., [email protected] and installation hooks in package.json) demonstrate the alignment of ranked indicators with concrete attack procedures. DHeteroGAT/PNHeteroGAT models achieved linkage with 186 CVEs under name-level mapping; HeteroGAT displayed superior semantic trend scores.
HeteroGAT-Rank thus provides a scalable, interpretable framework supporting analyst-led supply chain threat hunting, underpinned by relation-aware attention mechanisms, robust engineering, and empirically validated efficacy at ecosystem scale (Tan et al., 11 Jan 2026).