3P-ADMM-PC2: Parallel Privacy-Preserving ADMM
- The paper introduces 3P-ADMM-PC2, a protocol that integrates ADMM with homomorphic encryption and quantization for secure, distributed LASSO optimization.
- It employs a three-phase structure—initialization, secure data sharing, and parallel privacy-computing—with adaptive GPU acceleration to reduce computation and communication overhead.
- Experimental outcomes demonstrate near-lossless accuracy and significant runtime improvements over traditional CPU implementations, showcasing scalability in large-scale edge networks.
Three-Phase Parallel Collaborative ADMM Privacy Computing (3P-ADMM-PC2) is a cryptographically enhanced distributed optimization protocol designed for edge networks, addressing the need to simultaneously reduce computational burden, minimize information leakage, and enable privacy-preserving model training over split data. It integrates the alternating direction method of multipliers (ADMM) with Paillier homomorphic encryption (HE), a real-to-integer quantization strategy, and adaptive GPU acceleration for efficient and private solution of high-dimensional, distributed LASSO problems (Xia et al., 21 Jan 2026).
1. Protocol Structure: The Three Phases
3P-ADMM-PC2 operates in three sequential phases that collectively enable secure, distributed ADMM optimization on partitioned data:
- Initialization Phase: The master node partitions the large global LASSO problem column-wise into smaller subproblems, splitting the measurement matrix as $A = [A_1, \dots, A_N]$. For each edge $i$, the master transmits the column block $A_i$, and edge $i$ precomputes its local solve factor $(A_i^{\top}A_i + \rho I)^{-1}$ together with its quantized representation.
- Data Security Sharing Phase: The master quantizes and Paillier-encrypts the sensitive data vector destined for each edge. Edge nodes download and locally store these ciphertexts for reuse across iterations.
- Parallel Privacy-Computing Phase: For iterations $k = 0, 1, 2, \dots$, master and edges collaboratively perform an ADMM iteration on encrypted data:
  - The master computes the global iterates $z^{k}$ and $u^{k}$, then quantizes and Paillier-encrypts them as $[\![Q(z^{k})]\!]$ and $[\![Q(u^{k})]\!]$.
  - Edge $i$ evaluates its local $x_i$-update under the Paillier homomorphism (ciphertext multiplications realizing plaintext sums, ciphertext exponentiations realizing plaintext scalings), then returns the encrypted result to the master, which decrypts and inverse-quantizes to recover $x_i^{k+1}$.
Each node operates exclusively on low-dimensional data, ensuring privacy and reducing communication overhead (Xia et al., 21 Jan 2026).
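As a concrete illustration of the initialization phase, the sketch below partitions a measurement matrix column-wise and precomputes each edge's local solve factor. It assumes the standard ADMM ridge factor $(A_i^{\top}A_i + \rho I)^{-1}$ as the precomputed quantity; the dimensions, edge count, and $\rho$ are illustrative, not the paper's settings.

```python
# Sketch of the initialization-phase partitioning (illustrative sizes).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 12))   # global measurement matrix
N, rho = 3, 1.0                     # number of edges, ADMM penalty

# Master: column-wise partition A = [A_1 | A_2 | A_3].
blocks = np.hsplit(A, N)

# Each edge i: precompute its ridge-solve factor once; it is reused
# (in quantized form) across all subsequent ADMM iterations.
factors = [np.linalg.inv(Ai.T @ Ai + rho * np.eye(Ai.shape[1]))
           for Ai in blocks]

print([f.shape for f in factors])   # [(4, 4), (4, 4), (4, 4)]
```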
2. ADMM Update Mechanism
The protocol is anchored in ADMM for the LASSO regression objective

$$\min_{x} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1,$$

split via an auxiliary variable $z$ subject to $x - z = 0$, with augmented Lagrangian (scaled dual form)

$$L_{\rho}(x, z, u) = \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 + \tfrac{\rho}{2}\|x - z + u\|_2^2.$$
Centralized ADMM has the update rules:
- $x$-update: $x^{k+1} = (A^{\top}A + \rho I)^{-1}\big(A^{\top}b + \rho\,(z^{k} - u^{k})\big)$
- $z$-update: $z^{k+1} = S_{\lambda/\rho}\big(x^{k+1} + u^{k}\big)$, with $S_t$ the elementwise soft-thresholding operator
- $u$-update: $u^{k+1} = u^{k} + x^{k+1} - z^{k+1}$
The 3P-ADMM-PC2 distributed form partitions $A = [A_1, \dots, A_N]$ column-wise and upper-bounds the coupled loss $\tfrac{1}{2}\big\|\sum_{i} A_i x_i - b\big\|_2^2$ by a separable sum of per-block terms, leading to per-node subproblems in the low-dimensional blocks $x_i$, with synchronized global $z$- and $u$-updates.
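The centralized updates translate directly into NumPy; the following is a minimal sketch in which $\rho$, $\lambda$, and the iteration count are illustrative choices, not the paper's settings.

```python
# Minimal centralized ADMM for LASSO, following the three updates above.
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding S_t(v), the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    z = np.zeros(n); u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # factored once, reused
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))           # x-update (ridge solve)
        z = soft_threshold(x + u, lam / rho)    # z-update (l1 prox)
        u = u + x - z                           # scaled dual update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.01)
print(np.round(x_hat[:3], 2))
```

The inverse $(A^{\top}A + \rho I)^{-1}$ is computed once and reused every iteration, the same precomputation the initialization phase pushes to the edges in the distributed protocol.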
This matrix partitioning and update design enables independent encrypted computations at each edge, with secure aggregation by the master (Xia et al., 21 Jan 2026).
3. Quantization for Real-Valued Encryption
Because Paillier HE only supports integer arithmetic, real-valued vectors are mapped into finite integer intervals for encryption:
- For a vector entry $v$, use $Q_{\kappa}(v) = \lfloor \kappa v \rceil$ with scaling factor $\kappa$; inverse quantization divides by $\kappa$.
- For matrix-vector or two-term product operations, both operands are quantized, so inverse quantization uses the squared scaling $\kappa^{2}$.
Rounding error per entry is at most $1/2$ in the integer domain, so worst-case reconstruction error after inverse quantization scales as $1/(2\kappa)$ per entry. The decrypted output after inverse quantization thus differs from the true real value by $O(1/\kappa)$. With a practically large $\kappa$, this quantization error becomes negligible, ensuring near-lossless privacy-preserving updates (Xia et al., 21 Jan 2026).
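A pure-Python sketch of this quantization scheme; the scaling factor below is illustrative, not the paper's parameter.

```python
# Real-to-integer quantization for Paillier's integer-only arithmetic.

KAPPA = 2 ** 16  # illustrative scaling factor; larger => less rounding error

def quantize(v, kappa=KAPPA):
    """Map a real value to an integer by rounding kappa * v."""
    return round(kappa * v)

def dequantize(m, kappa=KAPPA):
    return m / kappa

def dequantize_product(m, kappa=KAPPA):
    """A product of two quantized operands carries kappa^2 scaling."""
    return m / (kappa ** 2)

x, w = 3.14159, -0.5
qx, qw = quantize(x), quantize(w)
# Per-entry rounding error is at most 1/2 in the integer domain,
# i.e. at most 1/(2*kappa) after inverse quantization.
assert abs(dequantize(qx) - x) <= 0.5 / KAPPA
# A two-operand product is recovered with the squared scaling.
approx = dequantize_product(qx * qw)
print(abs(approx - x * w) < 1e-4)  # True
```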
4. Paillier Homomorphic Encryption Scheme
3P-ADMM-PC2 applies the Paillier cryptosystem:
Key generation involves large primes $p, q$, modulus $n = pq$, a generator $g \in \mathbb{Z}^{*}_{n^2}$ (commonly $g = n + 1$), and computation of $\lambda = \mathrm{lcm}(p-1, q-1)$ and $\mu = \big(L(g^{\lambda} \bmod n^2)\big)^{-1} \bmod n$, where $L(x) = (x-1)/n$.
- Encryption of a message $m \in \mathbb{Z}_n$: $c = g^{m} \cdot r^{n} \bmod n^2$, with $r$ drawn uniformly from $\mathbb{Z}^{*}_{n}$.
- Decryption of ciphertext $c$: $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.
- Homomorphic properties: $D\big(E(m_1)\,E(m_2) \bmod n^2\big) = m_1 + m_2$ and $D\big(E(m)^{k} \bmod n^2\big) = k\,m$.
These properties realize the secure sums and scalar multiplications required in the edge-side ADMM subproblem: an affine plaintext combination $\sum_j a_j m_j$ is evaluated as the ciphertext product $\prod_j E(m_j)^{a_j} \bmod n^2$.
This procedure never exposes the raw data vectors, ensuring full Paillier-level confidentiality during collaborative computations (Xia et al., 21 Jan 2026).
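The scheme can be exercised end to end with a toy pure-Python implementation; the tiny primes below are for demonstration only and offer no security.

```python
# Toy Paillier: key generation, encryption, decryption, and the two
# homomorphic properties. Insecure demo parameters (tiny primes).
import random
from math import gcd

p, q = 1789, 2003                       # toy primes
n, n2 = p * q, (p * q) ** 2
g = n + 1                               # common choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)     # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

m1, m2, k = 123, 456, 7
c1, c2 = encrypt(m1), encrypt(m2)
print(decrypt((c1 * c2) % n2))  # 579: ciphertext product => plaintext sum
print(decrypt(pow(c1, k, n2)))  # 861: ciphertext power => plaintext scaling
```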
5. Adaptive GPU Acceleration
Due to the computational intensity of large-integer modular exponentiation, 3P-ADMM-PC2 adopts several GPU-specific optimizations:
- CRT Decomposition: Modular exponentiation over the full modulus is decomposed into cheaper exponentiations over smaller coprime moduli, handled by the edges and the master respectively and combined via the Chinese Remainder Theorem.
- GPU FFT-accelerated Multiplication: Large integers are represented as digit-vectors; multiplications are performed with FFT and IFFT in parallel on the GPU, with modular reduction (e.g., Barrett reduction) applied via low-bitwidth arithmetic.
- Parallel ModExp in GPU Kernels: Each GPU streaming multiprocessor loads low-bitwidth digit chunks; a bitwise loop performs modular multiplications in parallel, using FFT routines and Barrett reduction.
- Three-Round CRT Computation: The CRT computation is distributed over three rounds between edges and master, each party handling only its own smaller-modulus share, so that neither party ever operates directly on the full modulus.
This strategy achieves substantial speedup: with a 4096-bit Paillier key, GPU throughput for modular exponentiation is approximately $20\times$ that of a 64-core CPU (Xia et al., 21 Jan 2026).
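The arithmetic identity behind the CRT decomposition can be checked in a few lines. This sketch shows only the recombination of exponentiations modulo $p^2$ and $q^2$, not the paper's three-round master/edge split.

```python
# CRT-decomposed modular exponentiation: an exponentiation mod n^2 is
# split into cheaper exponentiations mod p^2 and q^2 and recombined.
p, q = 1789, 2003            # toy primes; real keys use large primes
n2 = (p * q) ** 2
c, e = 987654321 % n2, 65537

direct = pow(c, e, n2)       # reference: full-modulus exponentiation

p2, q2 = p * p, q * q
rp = pow(c % p2, e, p2)      # small-modulus exponentiation #1
rq = pow(c % q2, e, q2)      # small-modulus exponentiation #2

# Garner recombination: the unique x mod p^2*q^2 with
# x = rp (mod p^2) and x = rq (mod q^2).
inv = pow(p2, -1, q2)
x = rp + p2 * (((rq - rp) * inv) % q2)

print(x == direct)  # True
```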
6. Computational Complexity and Solution Quality
The chief computational bottleneck is large-integer modular exponentiation (ModExp), which under FFT-based multiplication costs $O(d \log d)$ per multiplication for $d$-digit integers; each exponentiation requires $O(k)$ such multiplications for a $k$-bit exponent, yielding an overall cost per ciphertext of $O(k\,d \log d)$.
Per ADMM iteration, computation is dominated by these ciphertext operations, plus a fixed per-iteration cost for the local plaintext solves. GPU acceleration yields per-ModExp throughput more than 20 times higher than a CPU implementation on long keys.
Convergence analysis (as per approximate-ADMM theory) holds so long as the quantization and data-splitting errors remain uniformly bounded. The resulting mean squared error (MSE) deviation from non-private distributed ADMM is negligible in typical settings.
Empirically, on large-scale LASSO problems, wall-clock times are:
- CPU-based distributed HE-ADMM: 29,800 s (1024-bit), 41,000 s (2048-bit), 79,300 s (4096-bit)
- GPU-accelerated 3P-ADMM-PC2: 11,700 s (1024-bit), 20,500 s (2048-bit), 34,900 s (4096-bit)

This yields roughly a $2$–$2.5\times$ speedup with matching accuracy (Xia et al., 21 Jan 2026).
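As a sanity check, the CPU/GPU wall-clock pairs imply the following per-key-size speedup ratios:

```python
# Speedup ratios implied by the reported CPU vs. GPU wall-clock times.
cpu = {1024: 29800, 2048: 41000, 4096: 79300}
gpu = {1024: 11700, 2048: 20500, 4096: 34900}
for bits in sorted(cpu):
    print(bits, round(cpu[bits] / gpu[bits], 2))
# 1024 2.55
# 2048 2.0
# 4096 2.27
```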
7. Experimental Outcomes and Topology Variation
Experiments evaluate accuracy, computational efficiency, and scalability:
- Accuracy: 3P-ADMM-PC2 closely tracks non-private distributed ADMM (Dis-ADMM) with a negligible MSE gap; by contrast, DP-ADMM (differential privacy) incurs roughly 0.2 units higher MSE.
- Edge node count: For a fixed problem size, increasing the edge count reduces per-iteration wall-clock time but marginally increases partition-induced MSE.
- GPU latency reduction: Per iteration, node waiting times drop with GPU acceleration: the master's waiting time falls sharply (vs. roughly 30 s on CPU), and edge waiting times drop to $1$–$2$ s (vs. $10$ s).
- Application: Power-network reconstruction: On large-scale MATPOWER (13,659-bus) benchmarks, 3P-ADMM-PC2 achieves AUROC/AUPRC parity with Dis-ADMM, confirming no quality loss.
Combined, these results document that 3P-ADMM-PC2 attains secure, nearly lossless privacy-preserving distributed optimization with significant runtime gains in heterogeneous, large-scale edge network settings (Xia et al., 21 Jan 2026).