FedGraph-VASP: Federated Graph Learning
- FedGraph-VASP is a privacy-preserving federated graph learning framework that securely detects cross-institutional money laundering among VASPs.
- It employs a Boundary Embedding Exchange protocol to share encrypted GNN embeddings of boundary nodes, preserving global graph topology without raw data exposure.
- The framework integrates post-quantum cryptography and demonstrates topology-dependent performance trade-offs, achieving near-centralized results in high-connectivity regimes.
FedGraph-VASP is a privacy-preserving federated graph learning framework designed to detect cross-institutional money laundering among Virtual Asset Service Providers (VASPs) while safeguarding sensitive transaction data and providing post-quantum security. The method fundamentally addresses the tension between regulatory requirements for collaborative anti-money laundering (AML) and the imperatives of user privacy. Its essential innovation is the Boundary Embedding Exchange protocol, which enables the secure, quantum-resistant sharing of graph neural network (GNN) embeddings pertaining solely to boundary accounts that link distinct institutional transaction subgraphs, thus retaining global graph topology for learning while precluding raw feature exposure (Commey et al., 25 Jan 2026).
1. Federated Graph AML Setting and Objectives
Cross-institutional AML detection requires correlating transaction patterns across distinct VASPs, each of which maintains a private subgraph of the overall asset-transfer network. Let $G_k = (V_k, E_k)$ denote the transaction subgraph held by VASP $k$. Existing solutions must either share sensitive data, undermining privacy, or analyze data in isolation, missing cross-silo laundering chains. FedGraph-VASP synthesizes federated learning, graph neural networks (GNNs), and cryptographically enforced security. The stated objectives are:
- Retention of cross-institutional topological information for GNN message passing,
- Prevention of raw transaction feature leakage,
- Assurance of quantum-safe security for communications and data exchange (Commey et al., 25 Jan 2026).
2. Boundary Embedding Exchange Protocol
FedGraph-VASP decomposes graph learning such that each client VASP computes local node representations using a 2-layer GraphSAGE model. For node $v$ at layer $\ell$, the GraphSAGE-mean update is
$$h_v^{(\ell)} = \sigma\!\left( W^{(\ell)} \cdot \mathrm{CONCAT}\!\left( h_v^{(\ell-1)},\ \mathrm{MEAN}_{u \in \mathcal{N}(v)}\, h_u^{(\ell-1)} \right) \right)$$
After the second layer ($\ell = 2$), the final 128-dimensional embedding $h_v = h_v^{(2)}$ is extracted for each boundary node $v \in \mathcal{B}_k$, defined as an account with cross-VASP connections. These embeddings are shared without further lossy compression and treated as compressed, non-invertible representations of the raw features. For each boundary node, the embedding is encrypted and exchanged exclusively via authenticated post-quantum channels (see Section 3). Each VASP stores decrypted foreign embeddings in a boundary buffer, facilitating cross-silo alignment during training (Commey et al., 25 Jan 2026).
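The local computation described above can be sketched as follows; this is a minimal pure-Python GraphSAGE-mean forward pass with illustrative dimensions (the paper's model uses 128-dimensional embeddings) and random, untrained weights, shown only to make the boundary-extraction step concrete:

```python
import random
random.seed(0)

DIM_IN, DIM_HID, DIM_OUT = 4, 8, 6   # illustrative; the paper uses 128-dim embeddings

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b): return [ai + bi for ai, bi in zip(a, b)]
def relu(x):    return [max(0.0, xi) for xi in x]

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sage_layer(h, adj, W_self, W_neigh):
    """One GraphSAGE layer: combine a node's state with its neighbours' mean."""
    out = {}
    for v, h_v in h.items():
        neigh = adj[v]
        mean = ([sum(h[u][i] for u in neigh) / len(neigh) for i in range(len(h_v))]
                if neigh else [0.0] * len(h_v))
        out[v] = relu(vadd(matvec(W_self, h_v), matvec(W_neigh, mean)))
    return out

# Toy local subgraph at one VASP: nodes 0-3; node 3 has a cross-VASP edge.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
features = {v: [random.random() for _ in range(DIM_IN)] for v in adj}
boundary_nodes = {3}   # accounts with cross-VASP connections

W1s, W1n = rand_mat(DIM_HID, DIM_IN), rand_mat(DIM_HID, DIM_IN)
W2s, W2n = rand_mat(DIM_OUT, DIM_HID), rand_mat(DIM_OUT, DIM_HID)

h1 = sage_layer(features, adj, W1s, W1n)   # layer 1
h2 = sage_layer(h1, adj, W2s, W2n)         # layer 2: final embeddings

# Only the final-layer embeddings of boundary nodes are ever shared.
boundary_embeddings = {v: h2[v] for v in boundary_nodes}
print(len(boundary_embeddings), len(boundary_embeddings[3]))
```

Note that interior nodes' embeddings (nodes 0-2 here) never leave the client; only the boundary buffer is populated from the exchanged set.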
3. Post-Quantum Secure Communication Design
FedGraph-VASP employs a hybrid KEM-DEM cryptographic architecture: the key encapsulation mechanism (KEM) uses Kyber-512 (standardized by NIST as ML-KEM-512), and data encapsulation (DEM) uses AES-256-GCM. Parties derive public/private key pairs under Kyber-512; for each exchange, a symmetric key is established by encapsulation, and boundary embeddings are encrypted as:
$c_v = \mathrm{AES\text{-}GCM.Enc}(K, \mathrm{nonce}, h_v)$
Confidentiality and integrity are thereby assured, and the cryptosystem is robust even against "harvest-now, decrypt-later" quantum adversaries. Embedding encryption and decryption are performed via the following procedures (abridged pseudocode):
```
function EncryptBoundaryEmbeddings(pk_recipient, {h_v : v ∈ B_k}):
    (ct_KEM, K) ← Kyber.Encaps(pk_recipient)
    for each v in B_k:
        nonce[v] ← RandomBytes(12)   # fresh nonce per message; reusing one
                                     # AES-GCM nonce under a key is unsafe
        ct_data[v] ← AES-GCM.Enc(K, nonce[v], h_v)
    return (ct_KEM, {nonce[v]}, {ct_data[v]})

function DecryptForeignEmbeddings(sk_k, ct_KEM, {nonce[v]}, {ct_data[v]}):
    K ← Kyber.Decaps(sk_k, ct_KEM)
    for each v:
        h_v ← AES-GCM.Dec(K, nonce[v], ct_data[v])
    return {h_v}
```
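The KEM-DEM flow can be exercised end-to-end with stand-in primitives. The sketch below substitutes a hash-based stub for Kyber-512 and an unauthenticated XOR keystream for AES-256-GCM, so it demonstrates only the message flow (one encapsulation per exchange, a fresh nonce per embedding), not real security:

```python
import hashlib, os

# Stand-ins only: a real deployment uses Kyber-512 (ML-KEM-512) for
# Encaps/Decaps and AES-256-GCM for the symmetric layer.

def stub_keygen():
    sk = os.urandom(32)
    pk = hashlib.sha256(b"pk" + sk).digest()
    return pk, sk

def stub_encaps(pk):
    """Toy KEM: fresh randomness doubles as the KEM ciphertext."""
    r = os.urandom(32)
    return r, hashlib.sha256(pk + r).digest()   # (ct_KEM, shared key K)

def stub_decaps(sk, ct_kem):
    pk = hashlib.sha256(b"pk" + sk).digest()
    return hashlib.sha256(pk + ct_kem).digest()

def xor_cipher(key, nonce, data):
    """Toy XOR keystream standing in for AES-256-GCM (no authentication!)."""
    stream, ctr = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Sender (VASP j): one encapsulation, one fresh nonce per boundary embedding.
pk, sk = stub_keygen()
embeddings = {7: b"\x01\x02\x03\x04", 9: b"\x05\x06\x07\x08"}  # serialized h_v
ct_kem, K = stub_encaps(pk)
nonces = {v: os.urandom(12) for v in embeddings}
ct_data = {v: xor_cipher(K, nonces[v], h) for v, h in embeddings.items()}

# Recipient (VASP k): recover K, then decrypt each boundary embedding.
K2 = stub_decaps(sk, ct_kem)
recovered = {v: xor_cipher(K2, nonces[v], c) for v, c in ct_data.items()}
print(recovered == embeddings)   # True
```

Because XOR decryption is XOR encryption with the same keystream, the recipient recovers the embeddings exactly once the KEM keys agree.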
The cryptographic overhead is measured at less than 0.5% of total training time (10,500 embeddings/sec encryption), confirming negligible impact on scalability (Commey et al., 25 Jan 2026).
4. Federated Graph Learning Procedure
FedGraph-VASP's federated training loop follows an adaptation of Algorithm 1 in (Commey et al., 25 Jan 2026). In each of $T$ federated rounds:
- A central server broadcasts the current global GNN weights $\theta^{(t)}$;
- Each client $k$ runs $E$ local epochs over its subgraph, optimizing the combined loss
$$\mathcal{L}_k = \mathcal{L}_{\mathrm{CE}} + \lambda\, \mathcal{L}_{\mathrm{align}},$$
where the classification loss $\mathcal{L}_{\mathrm{CE}}$ is cross-entropy, and the boundary-alignment loss penalizes angular distance between local and received foreign embeddings over boundary nodes:
$$\mathcal{L}_{\mathrm{align}} = \frac{1}{|\mathcal{B}_k|} \sum_{v \in \mathcal{B}_k} \left( 1 - \cos\!\left( h_v^{\mathrm{local}}, h_v^{\mathrm{foreign}} \right) \right);$$
- FedAvg aggregates model parameters weighted by local sample count and updates global parameters;
- Each client decrypts foreign boundary embeddings for use in the next round.
This protocol preserves cross-institutional message-passing capacity, explicitly modeling cross-silo relationships under rigorous privacy constraints (Commey et al., 25 Jan 2026).
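The aggregation and alignment steps above can be sketched as follows (pure Python; parameter vectors are flattened lists, the two-client sample counts are illustrative, and the cosine penalty follows the angular-distance description above):

```python
import math

def fedavg(client_params, client_sizes):
    """Sample-count-weighted average of flattened parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def alignment_loss(local_emb, foreign_emb):
    """Mean angular penalty 1 - cos(h_local, h_foreign) over boundary nodes."""
    pens = [1.0 - cosine(local_emb[v], foreign_emb[v]) for v in local_emb]
    return sum(pens) / len(pens)

# Two clients, 3-parameter model; client 0 holds 3x the data of client 1.
theta = fedavg([[1.0, 0.0, 2.0], [5.0, 4.0, 2.0]], [300, 100])
print(theta)   # [2.0, 1.0, 2.0]

# Boundary node 7: local embedding vs decrypted foreign embedding.
local   = {7: [1.0, 0.0]}
foreign = {7: [1.0, 0.0]}
print(alignment_loss(local, foreign))   # 0.0 (perfectly aligned)
```

Identical local and foreign boundary embeddings incur zero alignment penalty; misaligned directions increase the loss toward 2.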
5. Experimental Evaluation and Topology-Dependent Trade-offs
Experiments are conducted on the Elliptic Bitcoin dataset (203,769 nodes, 234,355 edges, 2.23% labeled illicit) and an Ethereum transaction-fraud dataset (9,841 nodes, 98,410 edges, constructed via $k$-NN). The core evaluation contrasts three partitioning regimes with varying cross-silo connectivity:
- Louvain partitioning: 0.24% cross-silo edges (realistic, highly modular setting)
- METIS partitioning: 33% cross-silo edges (balanced, high-connectivity setting)
- Ethereum: 3.5% cross-edge ratio
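The cross-silo percentages above are simply the fraction of edges whose endpoints fall in different institutions' partitions; a minimal sketch of that computation on a hypothetical toy partition:

```python
def cross_silo_ratio(edges, part):
    """Fraction of edges whose endpoints lie in different silos (VASPs)."""
    cross = sum(1 for u, v in edges if part[u] != part[v])
    return cross / len(edges)

# Toy graph: 4 edges, nodes assigned to two silos A and B.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
part = {0: "A", 1: "A", 2: "B", 3: "B"}
print(cross_silo_ratio(edges, part))   # 0.5: edges (1,2) and (3,0) cross silos
```

Louvain's 0.24% versus METIS's 33% thus reflects how aggressively each partitioner cuts edges between institutions.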
Training regime parameters are: 2-layer GraphSAGE, hidden dimension 128, learning rate 0.01 with weight decay, 50 federated rounds with 3 local epochs per round; results are averaged over 5 seeds and tested for statistical significance.
Summary of F1-scores across methods and regimes:
| Dataset | Local GNN | FedSage+ | FedAvg | FedGraph-VASP |
|---|---|---|---|---|
| Elliptic (Louvain, 0.24%) | 0.389 | 0.453 | 0.499 | 0.508 |
| Elliptic (METIS, 33%) | 0.48 | 0.55 | 0.62 | 0.63 |
| Ethereum (3.5%) | 0.785 | 0.855 | 0.640 | 0.635 |
Key outcomes include:
- On Elliptic (Louvain), FedGraph-VASP outperforms the generative baseline FedSage+ by 12.1% (F1: 0.508 vs 0.453).
- In the high-connectivity (METIS) regime, FedGraph-VASP approaches centralized performance (F1: 0.63 vs 0.65 for centralized GraphSAGE).
- On Ethereum, with sparse and highly modular topology, FedSage+ (F1: 0.855) strongly outperforms FedGraph-VASP (F1: 0.635), indicating a regime where generative imputation is preferable.
- These findings elucidate a topology-dependent trade-off: explicit embedding exchange is superior in well-connected graphs, while generative imputation dominates in modular, sparse partitions (Commey et al., 25 Jan 2026).
6. Privacy Audit and Security Evaluation
Privacy risks are empirically characterized using embedding inversion and membership inference attacks:
- Embedding Inversion: An MLP regressor (256-128 hidden units) is trained to map shared embeddings $h_v$ back to raw node features $x_v$. The test-set mean squared error (MSE) is 0.581, which, together with the reported $R^2$ and Pearson correlation, indicates only partial invertibility of boundary embeddings: original features cannot be reliably reconstructed, but some information leakage persists.
- Membership Inference: Using the shadow model methodology of Shokri et al., the AUC for inferring the presence of a node in training is 0.95, indicating high membership leakage.
Interpretation: While the embedding protocol resists exact feature reconstruction, it is insufficient for membership privacy, indicating one direction for further privacy strengthening (Commey et al., 25 Jan 2026).
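The membership-inference AUC reported above can be computed directly from attack confidence scores by pairwise rank comparison; a minimal pure-Python sketch with hypothetical scores (the shadow-model training itself, per Shokri et al., is omitted):

```python
def auc(member_scores, nonmember_scores):
    """Probability a random member outscores a random non-member (ties = 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical attack confidences produced by a shadow-model attack.
members    = [0.95, 0.90, 0.80, 0.85]   # nodes that were in training
nonmembers = [0.30, 0.55, 0.90, 0.20]   # nodes that were not
print(auc(members, nonmembers))   # 0.84375
```

An AUC of 0.5 would mean the attacker cannot distinguish members from non-members; the reported 0.95 means members' scores almost always rank higher, i.e., strong membership leakage.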
7. Critical Assessment and Future Directions
FedGraph-VASP achieves robust topology retention, effective cross-institutional information propagation for AML tasks, and quantum-safe privacy with minimal performance overhead. Strengths include approaching centralized performance in high-connectivity graphs and outperforming the previous state-of-the-art baseline in realistic modular regimes. Encryption overhead is marginal, and the exchange protocol is efficient at scale.
Noted limitations are:
- Pronounced vulnerability to membership inference (AUC=0.95),
- Partial but nonzero risk of feature inversion (inversion MSE of 0.581),
- Suboptimal performance in ultra-low-connectivity partitions (very small cross-edge ratios),
- No resilience against Byzantine (malicious) client behaviors,
- Dependence on a trusted Private Set Intersection (PSI) for discovering boundary nodes.
Recommended directions for advancing the framework include integrating differential privacy (DP-SGD) to mitigate membership inference, developing Byzantine-resilient aggregation schemes (such as Krum or Median), extending applicability to multi-chain and cross-chain environments, optimizing for fragmented topologies, and leveraging fully privacy-preserving PSI to eliminate trust assumptions in boundary discovery. Formal analysis of privacy–utility trade-offs with the combined application of differential privacy and post-quantum cryptography is also posed as a critical area for future work (Commey et al., 25 Jan 2026).
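As a sketch of one suggested hardening step, coordinate-wise median aggregation (a simpler Byzantine-robust alternative to Krum) can replace the FedAvg mean; all values below are illustrative:

```python
from statistics import median

def median_aggregate(client_params):
    """Coordinate-wise median of client parameter vectors (Byzantine-robust)."""
    dim = len(client_params[0])
    return [median(p[i] for p in client_params) for i in range(dim)]

# Three honest clients near [1, 2]; one Byzantine client sends garbage.
updates = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [100.0, -100.0]]
print(median_aggregate(updates))   # ≈ [1.05, 1.95]: the outlier has no pull
```

Unlike the sample-weighted mean, which the Byzantine update would drag far off course, the per-coordinate median ignores a minority of arbitrarily corrupted updates.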