3P-ADMM-PC2: Parallel Privacy-Preserving ADMM
- The paper introduces 3P-ADMM-PC2, a protocol that integrates ADMM with homomorphic encryption and quantization for secure, distributed LASSO optimization.
- It employs a three-phase structure—initialization, secure data sharing, and parallel privacy-computing—with adaptive GPU acceleration to reduce computation and communication overhead.
- Experimental outcomes demonstrate near-lossless accuracy and significant runtime improvements over traditional CPU implementations, showcasing scalability in large-scale edge networks.
Three-Phase Parallel Collaborative ADMM Privacy Computing (3P-ADMM-PC2) is a cryptographically enhanced distributed optimization protocol designed for edge networks, addressing the need to simultaneously reduce computational burden, minimize information leakage, and enable privacy-preserving model training over split data. It integrates the alternating direction method of multipliers (ADMM) with Paillier homomorphic encryption (HE), a real-to-integer quantization strategy, and adaptive GPU acceleration for efficient and private solution of high-dimensional, distributed LASSO problems (Xia et al., 21 Jan 2026).
1. Protocol Structure: The Three Phases
3P-ADMM-PC2 operates in three sequential phases that collectively enable secure, distributed ADMM optimization on partitioned data:
- Initialization Phase: The master node partitions the large global LASSO problem column-wise into smaller subproblems, splitting the measurement matrix as $A = [A_1, \dots, A_N]$. For each edge $i$, the master transmits the column block $A_i$, and edge $i$ precomputes its local solve factor $(A_i^{\top}A_i + \rho I)^{-1}$ together with its quantized representation.
- Data Security Sharing Phase: The master quantizes and Paillier-encrypts the sensitive data vector destined for each edge. Edge nodes download and locally store these ciphertexts for reuse across iterations.
- Parallel Privacy-Computing Phase: For iterations $k = 0, 1, 2, \dots$, master and edges collaboratively perform an ADMM iteration on encrypted data:
  - The master computes the global iterates $z^{k}$ and $u^{k}$, then quantizes and Paillier-encrypts them as $[\![Q(z^{k})]\!]$ and $[\![Q(u^{k})]\!]$.
  - Edge $i$ evaluates its local $x_i$-update under the Paillier homomorphism (ciphertext multiplications realizing plaintext sums, ciphertext exponentiations realizing plaintext scalings), then returns the encrypted result to the master, which decrypts and inverse-quantizes to recover $x_i^{k+1}$.
Each node operates exclusively on low-dimensional data, ensuring privacy and reducing communication overhead (Xia et al., 21 Jan 2026).
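As a concrete illustration of the initialization phase, the sketch below partitions a measurement matrix column-wise and precomputes each edge's local solve factor. It assumes the standard ADMM ridge factor $(A_i^{\top}A_i + \rho I)^{-1}$ as the precomputed quantity; the dimensions, edge count, and $\rho$ are illustrative, not the paper's settings.

```python
# Sketch of the initialization-phase partitioning (illustrative sizes).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 12))   # global measurement matrix
N, rho = 3, 1.0                     # number of edges, ADMM penalty

# Master: column-wise partition A = [A_1 | A_2 | A_3].
blocks = np.hsplit(A, N)

# Each edge i: precompute its ridge-solve factor once; it is reused
# (in quantized form) across all subsequent ADMM iterations.
factors = [np.linalg.inv(Ai.T @ Ai + rho * np.eye(Ai.shape[1]))
           for Ai in blocks]

print([f.shape for f in factors])   # [(4, 4), (4, 4), (4, 4)]
```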
2. ADMM Update Mechanism
The protocol is anchored in ADMM for the LASSO regression objective

$$\min_{x} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1,$$

split via an auxiliary variable $z$ subject to $x - z = 0$, with augmented Lagrangian (scaled dual form)

$$L_{\rho}(x, z, u) = \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 + \tfrac{\rho}{2}\|x - z + u\|_2^2.$$
Centralized ADMM has the update rules:
- $x$-update: $x^{k+1} = (A^{\top}A + \rho I)^{-1}\big(A^{\top}b + \rho\,(z^{k} - u^{k})\big)$
- $z$-update: $z^{k+1} = S_{\lambda/\rho}\big(x^{k+1} + u^{k}\big)$, with $S_t$ the elementwise soft-thresholding operator
- $u$-update: $u^{k+1} = u^{k} + x^{k+1} - z^{k+1}$
The 3P-ADMM-PC2 distributed form partitions $A = [A_1, \dots, A_N]$ column-wise and upper-bounds the coupled loss $\tfrac{1}{2}\big\|\sum_{i} A_i x_i - b\big\|_2^2$ by a separable sum of per-block terms, leading to per-node subproblems in the low-dimensional blocks $x_i$, with synchronized global $z$- and $u$-updates.
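The centralized updates translate directly into NumPy; the following is a minimal sketch in which $\rho$, $\lambda$, and the iteration count are illustrative choices, not the paper's settings.

```python
# Minimal centralized ADMM for LASSO, following the three updates above.
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding S_t(v), the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    z = np.zeros(n); u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # factored once, reused
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))           # x-update (ridge solve)
        z = soft_threshold(x + u, lam / rho)    # z-update (l1 prox)
        u = u + x - z                           # scaled dual update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.01)
print(np.round(x_hat[:3], 2))
```

The inverse $(A^{\top}A + \rho I)^{-1}$ is computed once and reused every iteration, the same precomputation the initialization phase pushes to the edges in the distributed protocol.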
This matrix partitioning and update design enables independent encrypted computations at each edge, with secure aggregation by the master (Xia et al., 21 Jan 2026).
3. Quantization for Real-Valued Encryption
Because Paillier HE only supports integer arithmetic, real-valued vectors are mapped into finite integer intervals for encryption:
- For a vector entry $v$, use $Q_{\kappa}(v) = \lfloor \kappa v \rceil$ with scaling factor $\kappa$; inverse quantization divides by $\kappa$.
- For matrix-vector or two-term product operations, both operands are quantized, so inverse quantization uses the squared scaling $\kappa^{2}$.
Rounding error per entry is at most $1/2$ in the integer domain, so worst-case reconstruction error after inverse quantization scales as $1/(2\kappa)$ per entry. The decrypted output after inverse quantization thus differs from the true real value by $O(1/\kappa)$. With a practically large $\kappa$, this quantization error becomes negligible, ensuring near-lossless privacy-preserving updates (Xia et al., 21 Jan 2026).
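A pure-Python sketch of this quantization scheme; the scaling factor below is illustrative, not the paper's parameter.

```python
# Real-to-integer quantization for Paillier's integer-only arithmetic.

KAPPA = 2 ** 16  # illustrative scaling factor; larger => less rounding error

def quantize(v, kappa=KAPPA):
    """Map a real value to an integer by rounding kappa * v."""
    return round(kappa * v)

def dequantize(m, kappa=KAPPA):
    return m / kappa

def dequantize_product(m, kappa=KAPPA):
    """A product of two quantized operands carries kappa^2 scaling."""
    return m / (kappa ** 2)

x, w = 3.14159, -0.5
qx, qw = quantize(x), quantize(w)
# Per-entry rounding error is at most 1/2 in the integer domain,
# i.e. at most 1/(2*kappa) after inverse quantization.
assert abs(dequantize(qx) - x) <= 0.5 / KAPPA
# A two-operand product is recovered with the squared scaling.
approx = dequantize_product(qx * qw)
print(abs(approx - x * w) < 1e-4)  # True
```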
4. Paillier Homomorphic Encryption Scheme
3P-ADMM-PC2 applies the Paillier cryptosystem:
Key generation involves large primes $p, q$, modulus $n = pq$, a generator $g \in \mathbb{Z}^{*}_{n^2}$ (commonly $g = n + 1$), and computation of $\lambda = \mathrm{lcm}(p-1, q-1)$ and $\mu = \big(L(g^{\lambda} \bmod n^2)\big)^{-1} \bmod n$, where $L(x) = (x-1)/n$.
- Encryption of a message $m \in \mathbb{Z}_n$: $c = g^{m} \cdot r^{n} \bmod n^2$, with $r$ drawn uniformly from $\mathbb{Z}^{*}_{n}$.
- Decryption of ciphertext $c$: $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.
- Homomorphic properties: $D\big(E(m_1)\,E(m_2) \bmod n^2\big) = m_1 + m_2$ and $D\big(E(m)^{k} \bmod n^2\big) = k\,m$.
These properties realize the secure sums and scalar multiplications required in the edge-side ADMM subproblem: an affine plaintext combination $\sum_j a_j m_j$ is evaluated as the ciphertext product $\prod_j E(m_j)^{a_j} \bmod n^2$.
This procedure never exposes the raw data vectors, ensuring full Paillier-level confidentiality during collaborative computations (Xia et al., 21 Jan 2026).
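The scheme can be exercised end to end with a toy pure-Python implementation; the tiny primes below are for demonstration only and offer no security.

```python
# Toy Paillier: key generation, encryption, decryption, and the two
# homomorphic properties. Insecure demo parameters (tiny primes).
import random
from math import gcd

p, q = 1789, 2003                       # toy primes
n, n2 = p * q, (p * q) ** 2
g = n + 1                               # common choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)     # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

m1, m2, k = 123, 456, 7
c1, c2 = encrypt(m1), encrypt(m2)
print(decrypt((c1 * c2) % n2))  # 579: ciphertext product => plaintext sum
print(decrypt(pow(c1, k, n2)))  # 861: ciphertext power => plaintext scaling
```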
5. Adaptive GPU Acceleration
Due to the computational intensity of large-integer modular exponentiation, 3P-ADMM-PC2 adopts several GPU-specific optimizations:
- CRT Decomposition: Modular exponentiation over the full modulus is decomposed into cheaper exponentiations over smaller coprime moduli, handled by the edges and the master respectively and combined via the Chinese Remainder Theorem.
- GPU FFT-accelerated Multiplication: Large integers are represented as digit-vectors; multiplications are performed with FFT and IFFT in parallel on the GPU, with modular reduction (e.g., Barrett reduction) applied via low-bitwidth arithmetic.
- Parallel ModExp in GPU Kernels: Each GPU streaming multiprocessor loads low-bitwidth digit chunks; a bitwise loop performs modular multiplications in parallel, using FFT routines and Barrett reduction.
- Three-Round CRT Computation: The CRT computation is distributed over three rounds between edges and master, each party handling only its own smaller-modulus share, so that neither party ever operates directly on the full modulus.
This strategy achieves substantial speedup: with a 4096-bit Paillier key, GPU throughput for modular exponentiation is approximately $20\times$ that of a 64-core CPU (Xia et al., 21 Jan 2026).
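The arithmetic identity behind the CRT decomposition can be checked in a few lines. This sketch shows only the recombination of exponentiations modulo $p^2$ and $q^2$, not the paper's three-round master/edge split.

```python
# CRT-decomposed modular exponentiation: an exponentiation mod n^2 is
# split into cheaper exponentiations mod p^2 and q^2 and recombined.
p, q = 1789, 2003            # toy primes; real keys use large primes
n2 = (p * q) ** 2
c, e = 987654321 % n2, 65537

direct = pow(c, e, n2)       # reference: full-modulus exponentiation

p2, q2 = p * p, q * q
rp = pow(c % p2, e, p2)      # small-modulus exponentiation #1
rq = pow(c % q2, e, q2)      # small-modulus exponentiation #2

# Garner recombination: the unique x mod p^2*q^2 with
# x = rp (mod p^2) and x = rq (mod q^2).
inv = pow(p2, -1, q2)
x = rp + p2 * (((rq - rp) * inv) % q2)

print(x == direct)  # True
```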
6. Computational Complexity and Solution Quality
The chief computational bottleneck is large-integer modular exponentiation (ModExp), which under FFT-based multiplication costs $O(d \log d)$ per multiplication for $d$-digit integers; each exponentiation requires $O(k)$ such multiplications for a $k$-bit exponent, yielding an overall cost per ciphertext of $O(k\,d \log d)$.
Per ADMM iteration, computation is dominated by these ciphertext operations, plus a fixed per-iteration cost for the local plaintext solves. GPU acceleration yields per-ModExp throughput more than 20 times higher than a CPU implementation on long keys.
Convergence analysis (as per approximate-ADMM theory) holds so long as the quantization and data-splitting errors remain uniformly bounded. The resulting mean squared error (MSE) deviation from non-private distributed ADMM is negligible in typical settings.
Empirically, on large-scale LASSO problems, wall-clock times are:
- CPU-based distributed HE-ADMM: 29,800 s (1024-bit), 41,000 s (2048-bit), 79,300 s (4096-bit)
- GPU-accelerated 3P-ADMM-PC2: 11,700 s (1024-bit), 20,500 s (2048-bit), 34,900 s (4096-bit)

This yields roughly a $2$–$2.5\times$ speedup with matching accuracy (Xia et al., 21 Jan 2026).
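As a sanity check, the CPU/GPU wall-clock pairs imply the following per-key-size speedup ratios:

```python
# Speedup ratios implied by the reported CPU vs. GPU wall-clock times.
cpu = {1024: 29800, 2048: 41000, 4096: 79300}
gpu = {1024: 11700, 2048: 20500, 4096: 34900}
for bits in sorted(cpu):
    print(bits, round(cpu[bits] / gpu[bits], 2))
# 1024 2.55
# 2048 2.0
# 4096 2.27
```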
7. Experimental Outcomes and Topology Variation
Experiments evaluate accuracy, computational efficiency, and scalability:
- Accuracy: 3P-ADMM-PC2 closely tracks non-private distributed ADMM (Dis-ADMM) with a negligible MSE gap; by contrast, DP-ADMM (differential privacy) incurs roughly 0.2 units higher MSE.
- Edge node count: For a fixed problem size, increasing the edge count reduces per-iteration wall-clock time but marginally increases partition-induced MSE.
- GPU latency reduction: Per iteration, node waiting times drop with GPU acceleration: the master's waiting time falls sharply (vs. roughly 30 s on CPU), and edge waiting times drop to $1$–$2$ s (vs. $10$ s).
- Application: Power-network reconstruction: On large-scale MATPOWER (13,659-bus) benchmarks, 3P-ADMM-PC2 achieves AUROC/AUPRC parity with Dis-ADMM, confirming no quality loss.
Combined, these results document that 3P-ADMM-PC2 attains secure, nearly lossless privacy-preserving distributed optimization with significant runtime gains in heterogeneous, large-scale edge network settings (Xia et al., 21 Jan 2026).