Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning

Published 30 Dec 2025 in cs.LG, cs.DC, and math.OC | (2512.24069v1)

Abstract: We consider the design of mixing matrices to minimize the operation cost for decentralized federated learning (DFL) in wireless networks, with focus on minimizing the maximum per-node energy consumption. As a critical hyperparameter for DFL, the mixing matrix controls both the convergence rate and the needs of agent-to-agent communications, and has thus been studied extensively. However, existing designs mostly focused on minimizing the communication time, leaving open the minimization of per-node energy consumption that is critical for energy-constrained devices. This work addresses this gap through a theoretically-justified solution for mixing matrix design that aims at minimizing the maximum per-node energy consumption until convergence, while taking into account the broadcast nature of wireless communications. Based on a novel convergence theorem that allows arbitrarily time-varying mixing matrices, we propose a multi-phase design framework that activates time-varying communication topologies under optimized budgets to trade off the per-iteration energy consumption and the convergence rate while balancing the energy consumption across nodes. Our evaluations based on real data have validated the efficacy of the proposed solution in combining the low energy consumption of sparse mixing matrices and the fast convergence of dense mixing matrices.

Authors (3)

Summary

The paper introduces a multi-phase framework for adaptively varying mixing matrices to minimize maximum per-node energy consumption in decentralized federated learning.
It establishes a new convergence theorem that links time-varying spectral gaps to convergence rates, enabling precise energy vs. performance trade-offs.
Empirical results on CIFAR-10 with realistic wireless models demonstrate that adaptive node activation achieves balanced energy use with comparable accuracy.

Problem Formulation and Motivation

The paper "Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning" (2512.24069) addresses the optimization of communication patterns in decentralized federated learning (DFL), focusing on minimizing the maximum per-node energy consumption under practical broadcast wireless communication models. Prior DFL research predominantly optimizes communication time and neglects node-level energy heterogeneity, unbalanced energy depletion, and the broadcast nature of wireless transmission. Because energy constraints govern the operational lifetime of edge devices, the paper prioritizes balanced and efficient energy consumption across all agents.

The key operational parameter in DFL is the mixing matrix, which dictates communication topology and aggregation weights. Conventional approaches adopt static or periodically-varying mixing matrices, largely overlooking the potential benefits of adaptively and arbitrarily varying this matrix to dynamically trade off between per-iteration energy and overall convergence.
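To make the role of the mixing matrix concrete, one common form of the decentralized SGD update is written out here. The notation ($m$ agents, local parameters $x_i^{(t)}$, stochastic gradient $g_i^{(t)}$, step size $\eta$) is assumed for illustration and may differ from the paper's, and some variants mix before taking the gradient step rather than after:

$$
x_i^{(t+1)} = \sum_{j=1}^{m} W_{ij}^{(t)} \left( x_j^{(t)} - \eta\, g_j^{(t)} \right),
\qquad W^{(t)} = \left(W^{(t)}\right)^{\top},
\qquad W^{(t)} \mathbf{1} = \mathbf{1}.
$$

An entry $W_{ij}^{(t)} \neq 0$ with $i \neq j$ means that agents $i$ and $j$ exchange parameters in iteration $t$, so the sparsity pattern of $W^{(t)}$ determines who communicates (and spends energy), while its values determine the aggregation weights.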

Theoretical Contributions

A central theoretical contribution is a general convergence theorem for decentralized parallel stochastic gradient descent (D-PSGD) under arbitrarily time-varying (potentially random) mixing matrices, which removes the assumptions of fixed, periodic, or i.i.d.-drawn topologies that dominate prior analyses [MATCHA22, Koloskova20ICML]. The result characterizes the number of iterations to convergence as an explicit function of the sequence of mixing-matrix spectral gaps, enabling the optimization of communication schedules in both non-convex and convex settings. The analysis relies on ergodic measures, namely the functionals $\Pi_1(T)$ and $\Pi_2(T)$, which aggregate the sequence of spectral contractions over time and thereby permit non-uniform, multi-phase evolution of communication topologies.

Notably, the derived bounds do not require a uniform lower bound on the spectral gap, and thus encompass highly heterogeneous or intermittent communication patterns that become attractive under energy constraints.
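The point about not needing a uniform spectral-gap bound can be illustrated with a small numerical sketch. It does not reproduce the paper's theorem or its functionals; it only shows, for a hypothetical schedule in which most iterations use the identity matrix (no communication, zero spectral gap), that the disagreement among nodes still shrinks as long as some iterations mix. The ring weights, the schedule, and all function names are assumptions made for this example.

```python
def mix(W, x):
    """One gossip step: x_i <- sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def disagreement(x):
    """Euclidean distance of x from its network-wide average."""
    mean = sum(x) / len(x)
    return sum((x_i - mean) ** 2 for x_i in x) ** 0.5


def ring_matrix(m):
    """Symmetric doubly stochastic matrix on a ring (weight 1/3 on self and both neighbours)."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def identity(m):
    """No communication: every node keeps its own value (spectral gap 0)."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


m = 6
x = [float(i) for i in range(m)]
initial_mean = sum(x) / m

# Hypothetical schedule: nodes communicate only in every third iteration.
schedule = [ring_matrix(m) if t % 3 == 0 else identity(m) for t in range(30)]

errors = [disagreement(x)]
for W in schedule:
    x = mix(W, x)
    errors.append(disagreement(x))

print(f"initial disagreement: {errors[0]:.4f}, final: {errors[-1]:.4f}")
```

The disagreement stays flat during silent iterations and contracts during mixing iterations, which is the intuition behind aggregating per-iteration contractions over the whole sequence rather than bounding each one.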

Multi-phase and Budgeted Design Framework

Building on the theoretical result, the paper proposes a multi-phase design framework. The training epochs are partitioned into $K$ phases, with each phase employing a randomized mixing matrix under an optimized energy budget. This trilevel optimization comprises:

Upper-level: Selecting the phase count $K$.
Intermediate-level: Allocating budgets and durations for each phase to minimize the overall (worst-case) per-node energy until convergence.
Lower-level: Instantiating, for broadcast or unicast communications, mixing matrix distributions satisfying the specified energy budgets.
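The objective that the intermediate level minimizes can be sketched as a simple accounting routine. This is a minimal sketch under an assumed cost model, not the paper's formulation: each node pays a fixed computation cost per iteration plus a broadcast cost whenever it is activated, and a phase is described by its length and per-node activation probabilities. The `Phase` type, the function names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    iterations: int                 # number of training iterations in this phase
    activation_prob: List[float]    # per-node probability of broadcasting in an iteration


def expected_energy(phases, comp_cost, comm_cost):
    """Expected total energy per node over all phases (assumed cost model)."""
    m = len(comp_cost)
    total = [0.0] * m
    for phase in phases:
        for i in range(m):
            per_iteration = comp_cost[i] + phase.activation_prob[i] * comm_cost[i]
            total[i] += phase.iterations * per_iteration
    return total


def max_node_energy(phases, comp_cost, comm_cost):
    """Worst-case (maximum) expected per-node energy, the quantity to be minimized."""
    return max(expected_energy(phases, comp_cost, comm_cost))


# Hypothetical two-phase schedule for two nodes: sparse first, denser later.
phases = [Phase(100, [0.2, 0.2]), Phase(50, [1.0, 0.5])]
comp_cost = [1.0, 1.0]   # energy per local update (arbitrary units)
comm_cost = [2.0, 4.0]   # energy per broadcast (arbitrary units)

print(expected_energy(phases, comp_cost, comm_cost))
print(max_node_energy(phases, comp_cost, comm_cost))
```

In the example, the second node has the more expensive radio and is therefore given a lower activation probability in the dense phase, which is the kind of balancing the budget allocation is meant to achieve. The number of iterations needed per phase would, in the actual framework, come from the convergence bound rather than being fixed by hand.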

The lower-level algorithm for broadcast networks utilizes probabilistic node activation governed by per-agent energy budgets, with Metropolis-Hastings weighting within active subgraphs. This randomization allows nodes to remain quiescent when over-constrained, balancing load and minimizing outage risk. Theoretically, for homogeneous costs and fully-connected graphs, the spectral gap is tightly controlled as a function of the activation probability, leading to a closed-form performance characterization.
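A minimal sketch of this kind of randomized construction follows. It assumes that each node is activated independently with its own probability, that a link is used only when both endpoints are active, and that the standard Metropolis-Hastings weight $1/(1 + \max(d_i, d_j))$ is applied with degrees counted in the active subgraph. The paper's exact activation rule and weighting may differ, and the function name and parameters are hypothetical.

```python
import random


def sample_mixing_matrix(adj, activation_prob, rng):
    """Sample one mixing matrix: random node activation + Metropolis-Hastings weights.

    adj[i][j] is truthy if nodes i and j can hear each other.
    Returns (W, active), where W is symmetric with rows summing to 1.
    """
    m = len(adj)
    active = [rng.random() < activation_prob[i] for i in range(m)]

    def linked(i, j):
        # Assumption: a link carries weight only if both endpoints broadcast.
        return i != j and adj[i][j] and active[i] and active[j]

    # Degree of each node within the active subgraph.
    degree = [sum(1 for j in range(m) if linked(i, j)) for i in range(m)]

    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if linked(i, j):
                W[i][j] = 1.0 / (1 + max(degree[i], degree[j]))
        # Remaining mass stays on the node itself; inactive nodes get W[i][i] = 1.
        W[i][i] = 1.0 - sum(W[i])
    return W, active


# Usage on a hypothetical 4-node clique with activation probability 0.5.
clique = [[i != j for j in range(4)] for i in range(4)]
W, active = sample_mixing_matrix(clique, [0.5] * 4, random.Random(0))
for row in W:
    print([round(w, 3) for w in row])
```

Inactive nodes keep their own parameters unchanged in that iteration and spend no communication energy, so lowering a node's activation probability directly lowers its expected energy at the cost of a smaller expected spectral gap.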

Unicast cases are handled via a meta-algorithm based on sampling random regular subgraphs (e.g., via Ramanujan or expander graphs) and using SDP to optimize over their convex combinations, extending and generalizing approaches from MATCHA [MATCHA22], BASS [Herrera25OJCS], and Laplacian Matrix Sampling [Chiu23JSAC].

Empirical Evaluation

Comprehensive experiments are performed on standard image recognition tasks (e.g., CIFAR-10 with ResNet) in realistic wireless network settings (clique and Roofnet topologies), using computational and communication energy parameters reflecting modern edge hardware. Competing methods include:

Vanilla D-PSGD: Maximal activation, minimal convergence delay, maximal energy use.
AdaPC: Periodically scheduled communication.
BASS: State-of-the-art broadcast DFL with minimal slot scheduling (not energy-aware or energy-balanced).
SkipTrain and Max Success: Alternative communication reduction strategies.

The results indicate that the multi-phase, budgeted, time-varying mixing matrix design delivers better trade-offs than the benchmarks. Specifically, activating a sparse subset of nodes in early phases and then increasing participation as convergence stalls achieves a lower maximum per-node energy at comparable accuracy. The framework also balances energy consumption across nodes, avoiding the early depletion of high-degree or resource-constrained nodes, a limitation observed in the benchmark schemes.

Quantitatively, strong reductions in maximal per-node energy are observed compared to full-density mixing, with only minor impact on convergence time or test accuracy. Gains are particularly pronounced in heterogeneous or irregular topologies, where static designs lead to severe energy imbalance. The framework allows the system integrator to tune the phase and budget schedule to their specific operational constraints.

Implications and Future Directions

This work extends the design space in DFL by systematically exploiting temporal adaptivity and probabilistic communication orchestration, moving beyond topology- and compression-centric approaches [liang2020decentralized, Chiu23JSAC, MATCHA22]. The convergence theorems open the way to rigorous, analytic studies of cost-performance trade-offs under arbitrary time-varying (possibly adversarial) schedules, laying the groundwork for robust and energy-aware learning in volatile edge networks.

Practically, the framework suggests that robust, energy-balanced DFL is achievable without sacrificing accuracy or incurring prohibitive communication overhead. This is critical for deploying DFL in wireless IoT, AIoT, or mobile settings, where battery heterogeneity and uncoordinated interference are the norm.

The paper leaves real-time adaptation to exogenous dynamics (e.g., channel fading, link drops, application-driven accuracy constraints) for future work. Robustness to non-stationary data or model drift, and the combination of communication-pattern adaptation with content-level adaptation (e.g., gradient pruning, quantization), are further promising avenues, especially in federated multi-task or personalized settings.

Conclusion

The paper establishes a new paradigm for energy-efficient DFL by leveraging arbitrarily time-varying, multi-phase mixing matrix designs optimized for per-node energy constraints under broadcast-dominated networks. It combines rigorous convergence analysis, algorithmic innovation, and strong empirical support, and serves as a foundation for further exploration of dynamic, energy-aware collaboration in federated and decentralized learning systems (2512.24069).
