Towards Decentralised Resilient Community Cloud Infrastructures

Published 22 Sep 2017 in cs.DC | (1709.07688v1)

Abstract: Recent years have seen a trend towards decentralisation - from initiatives on decentralized web to decentralized network infrastructures (e.g., community networks). In this position paper, we present an architectural vision for decentralising cloud service infrastructures. Our vision is based on the notion of community cloud infrastructures on top of decentralised access infrastructures, i.e., community networks, using resources pooled from the community. Our architectural vision takes into consideration some of the fundamental challenges of integrating current state-of-the-art virtualisation technologies such as Software Defined Networking (SDN) into community infrastructures, which are highly unreliable. Our proposed design goal is to include lightweight virtualization and fault tolerance mechanisms in the architecture to ensure a sufficient level of reliability to support critical applications.

Summary

  • The paper outlines an architecture combining SDN, NFV, and virtualization to ensure resilience and fault tolerance in community cloud infrastructures.
  • It proposes resource slicing and dynamic reconfiguration via a Cloud Service Manager to mitigate network unreliability and heterogeneous resource quality.
  • The work introduces a layered SDN control plane with replicated controllers for consistent reconfiguration, highlighting open challenges in decentralized cloud design.

Decentralised Resilient Community Clouds: Architectural Vision and Research Trajectory

Motivation for Decentralisation and Community Clouds

The centralisation of contemporary Internet infrastructures has led to dependence on a handful of economically motivated corporations, particularly for access and cloud services. Such structural imbalances restrict connectivity, especially in neglected and rural areas, and hinder the democratization of digital resources. In response, bottom-up initiatives have fostered the rise of Community Networks (CNs), exemplified by Guifi.net, which aggregates heterogeneous, self-organized resources into large-scale wireless networks. Community clouds—cloud services deployed atop these CNs using pooled local resources—represent an alternative paradigm with the potential for locality, privacy preservation, data sovereignty, and independence from vendor lock-in.

However, unlike public or commercial clouds, community clouds must contend with pronounced infrastructure unreliability and heterogeneity. The paper "Towards Decentralised Resilient Community Cloud Infrastructures" (1709.07688) articulates a comprehensive architectural approach to addressing these challenges, integrating lightweight virtualization, SDN, and NFV for resilient and fault-tolerant service provisioning.

Characterization of Community Network Infrastructures

Community Networks such as Guifi.net are inherently organic, displaying highly variable node availability, resource diversity, and ad-hoc mesh topologies. Empirical measurements reveal that a significant fraction of CN nodes exhibit suboptimal reliability, with only 60% of core-graph nodes reachable more than 90% of the time. User-driven service provisioning is similarly diversified: while proxies and Internet access dominate, sustained growth is observed across both user- and network-oriented application domains.
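The reachability figure above is a statement about per-node availability. As a minimal sketch of how such a figure could be derived from periodic reachability probes, the snippet below computes the share of nodes that are reachable more than 90% of the time. The probe data, node names, and function names are hypothetical and are not taken from the paper or from Guifi.net measurements.

```python
from typing import Dict, List


def availability(samples: List[bool]) -> float:
    """Fraction of probes in which the node answered."""
    if not samples:
        raise ValueError("no samples for node")
    return sum(samples) / len(samples)


def fraction_above(probes: Dict[str, List[bool]], threshold: float = 0.9) -> float:
    """Share of nodes whose availability strictly exceeds the threshold."""
    if not probes:
        raise ValueError("no nodes")
    reliable = [n for n, s in probes.items() if availability(s) > threshold]
    return len(reliable) / len(probes)


# Hypothetical probe history: True means the node answered the probe.
probes = {
    "node-a": [True] * 10,                # always reachable
    "node-b": [True] * 9 + [False],       # exactly 90%, so not above the threshold
    "node-c": [True] * 5 + [False] * 5,   # 50%
    "node-d": [True] * 19 + [False],      # 95%
}

share = fraction_above(probes)
print(f"{share:.0%} of nodes are reachable more than 90% of the time")
```

A real measurement campaign would also have to decide what counts as "reachable" (for example, from which vantage point and over which time window), which this sketch leaves out.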

This heterogeneity undermines the applicability of conventional centralised cloud management and orchestration protocols. The architecture must therefore be robust to intermittent connectivity, asymmetric and partitionable topologies, and widely variable computational, network, and storage resource qualities.

Architectural Overview and Resource Slicing

The proposed architecture orchestrates virtualized resource pools using resource slicing and elasticity, exposing each slice as a composite of compute, storage, and networking primitives subject to dynamic reconfiguration. Each slice has its own SDN-based virtual network and its own controllers to manage configuration, and is insulated against substrate-level failures through replication and migration.

Figure 1: Architecture of a resilient local community cloud integrating SDN controllers, resource slices, and distributed compute/storage resources.

Slices are mapped to underlying substrate resources across multiple independent unreliable community networks. Critical architectural components include community boxes (decentralised nodes providing compute, storage, and communication), multiple SDN controllers per slice (for redundancy and failover), and a Cloud Service Manager (CSM) which continuously monitors resource pools and dynamically allocates, migrates, or reconfigures application VMs and network paths to optimize resilience and QoS.
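To make the idea of redundant controllers spread over independent community networks more concrete, here is a minimal placement sketch. It assumes a simple greedy heuristic (prefer the most available boxes, at most one controller replica per community network); the paper does not prescribe a placement algorithm, and the `CommunityBox` fields, names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommunityBox:
    name: str
    network: str         # community network the box belongs to
    cpu_free: int        # spare CPU units (hypothetical metric)
    availability: float  # observed fraction of time reachable


def place_controllers(boxes: List[CommunityBox], replicas: int, cpu_needed: int) -> List[str]:
    """Greedy placement: most available boxes first, one replica per network."""
    chosen: List[CommunityBox] = []
    used_networks = set()
    for box in sorted(boxes, key=lambda b: b.availability, reverse=True):
        if len(chosen) == replicas:
            break
        if box.network in used_networks or box.cpu_free < cpu_needed:
            continue
        chosen.append(box)
        used_networks.add(box.network)
    if len(chosen) < replicas:
        raise RuntimeError("not enough independent boxes for the requested replicas")
    for box in chosen:
        box.cpu_free -= cpu_needed  # reserve capacity only once placement succeeded
    return [box.name for box in chosen]


boxes = [
    CommunityBox("box-1", "cn-north", cpu_free=4, availability=0.97),
    CommunityBox("box-2", "cn-north", cpu_free=8, availability=0.95),
    CommunityBox("box-3", "cn-south", cpu_free=2, availability=0.92),
    CommunityBox("box-4", "cn-east", cpu_free=1, availability=0.99),
    CommunityBox("box-5", "cn-east", cpu_free=4, availability=0.80),
]
controllers = place_controllers(boxes, replicas=3, cpu_needed=2)
print(controllers)
```

Spreading replicas across networks trades some per-box availability for independence of failures: a single network outage then takes out at most one controller of the slice.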

When congestion or failures are detected (e.g., link-level congestion or node overload), the CSM orchestrates live VM migration, alternate path routing, or controller failover to maintain liveness and safety properties. This design aims to shield user-facing applications from the unreliability of the substrate, providing the best QoS achievable with the resources that remain available.
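One of the reactions described above, alternate path routing, can be sketched as a search over the slice topology that skips links currently reported as congested or failed. This is an illustrative breadth-first search over an undirected graph, not the routing mechanism of the paper; the topology, node names, and the `alternate_path` function are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Link = Tuple[str, str]


def alternate_path(links: Iterable[Link], src: str, dst: str,
                   bad_links: Set[Link]) -> Optional[List[str]]:
    """Return a fewest-hop path from src to dst that avoids bad links, or None."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in links:
        if (a, b) in bad_links or (b, a) in bad_links:
            continue  # skip congested or failed links in either direction
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None  # destination unreachable: the slice is partitioned


# Hypothetical slice topology with two routes from "a" to "d".
links = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "e"), ("e", "d")]

normal = alternate_path(links, "a", "d", bad_links=set())
rerouted = alternate_path(links, "a", "d", bad_links={("b", "d")})
partitioned = alternate_path(links, "a", "d", bad_links={("b", "d"), ("c", "e")})
```

When no path remains, rerouting cannot help, which is the point at which a manager would have to fall back on the other reactions mentioned above, such as migrating the VM to the reachable side.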

Controller Design for Resilience

The SDN control plane is a pivotal enabler of resilience. The architecture employs distributed, geographically dispersed controllers organized in a modular, multi-layered structure to support consistent reconfiguration, controller replication, and programmable fault tolerance.

Figure 2: Controller software layers, including modular abstractions for failure detection, consensus, and reconfiguration algorithms.

Structuring the controller in explicit layers (L1–L4) abstracts critical distributed algorithms—failure detection, consensus (e.g., Paxos, uniform/approximate/real-time), and state replication—from higher-level orchestration scripts. This modularity facilitates re-use, composition, and adaptation: for example, controllers can choose consensus algorithms with the required strength (strong vs. eventual consistency) tailored to the ongoing reconfiguration scenario or managed slice.
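The idea of choosing the consistency strength per reconfiguration scenario can be illustrated with a small policy layer that sits between orchestration code and the replication mechanism. This is a sketch under stated assumptions: the operation names, the `POLICY` table, and the commit rules are hypothetical, and the majority-acknowledgement rule stands in for a real consensus protocol such as Paxos rather than implementing one.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"


# Hypothetical mapping from operation type to the consistency it requires.
POLICY = {
    "topology_update": Consistency.STRONG,
    "monitoring_report": Consistency.EVENTUAL,
}


def required_consistency(operation: str) -> Consistency:
    """Unknown operations default to the safer, stronger level."""
    return POLICY.get(operation, Consistency.STRONG)


def can_commit(operation: str, acks: int, replicas: int) -> bool:
    """Decide whether an update may be applied given replica acknowledgements."""
    if replicas <= 0 or not 0 <= acks <= replicas:
        raise ValueError("invalid acknowledgement count")
    if required_consistency(operation) is Consistency.STRONG:
        return acks > replicas // 2  # strict majority of controller replicas
    return acks >= 1                 # one replica suffices; others converge later


print(can_commit("topology_update", acks=1, replicas=3))    # False
print(can_commit("monitoring_report", acks=1, replicas=3))  # True
```

Keeping the policy table separate from the commit rules mirrors the layering argument: orchestration scripts name the operation, and the lower layer decides how much agreement that operation needs.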

Controllers interact via unreliable links, requiring their protocols to tolerate network partitions and repeated component failures. Strong safety properties such as atomic topology updates or leader election are supported where required, while weaker synchronisation suffices for less critical monitoring or auxiliary functions.
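A common way to keep leader election safe under network partitions is to let only the side that holds a strict majority of controllers elect a leader, so two partitions can never both have one. The sketch below shows that rule in isolation; it is an assumption about how such a guarantee could be met, not the protocol of the paper, and the controller identifiers and the lowest-identifier tie-break are hypothetical.

```python
from typing import Iterable, Optional, Set


def elect_leader(all_controllers: Set[str], reachable: Iterable[str]) -> Optional[str]:
    """Return a leader for this partition, or None if it lacks a strict majority."""
    members = set(reachable) & all_controllers  # ignore unknown identifiers
    if len(members) <= len(all_controllers) // 2:
        return None  # minority (or even split): refuse to elect, preserving safety
    return min(members)  # deterministic choice: lowest identifier wins


controllers = {"c1", "c2", "c3", "c4", "c5"}

minority_leader = elect_leader(controllers, ["c1", "c2"])
majority_leader = elect_leader(controllers, ["c3", "c4", "c5"])
```

The minority side gives up availability in exchange for safety: it cannot reconfigure the slice until it rejoins the majority, which is acceptable for atomic topology updates but would be unnecessarily strict for the less critical monitoring functions.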

Research Challenges and Open Questions

The integration of virtualization and SDN into inherently unreliable, distributed community clouds surfaces several unresolved research challenges:

  • Distributed SDN Control Plane: Implementing logically centralised control across unreliable links and heterogeneous resources presents open questions in scalability, consensus protocol design, and error containment.
  • Dynamic Fault-Tolerance Mechanisms: Traditional failover, replication, and monitoring techniques must be re-conceptualized for environments where nodes, controllers, and VMs can be live-migrated or instantiated on-demand.
  • Structuring for Resilience: Error containment and propagation analysis, leveraging structured separation between normal and abnormal operations, is required to prevent cascading failures and quantify robustness improvements.
  • Verification and Formal Modelling: The reactive nature of the interactions among controllers, slices, and monitors complicates formal reasoning about correctness, especially for coordinated distributed reconfiguration in the presence of faults.

Addressing these challenges necessitates development beyond current practice, particularly in protocol design, network topology visualization for state introspection, and algorithm libraries for adaptive deployment scenarios.

Implications and Future Directions

The architectural vision set forth in this work provides a foundation for resilient, decentralised, community-owned clouds that can serve critical applications with best-effort QoS in unreliable environments. The practical implications extend to localized service autonomy, privacy, and tailored user-driven application deployments, with particular relevance for regions underserved by commercial providers.

From a theoretical perspective, this work calls for a re-examination of distributed coordination and resilience concepts in the context of virtualized, highly heterogeneous substrate networks. Progress in this domain could broaden the applicability of decentralised clouds to edge computing, IoT, and smart city infrastructure, with potentially systemic impacts on networked system design and digital sovereignty.

Conclusion

This paper delineates how modern SDN, NFV, and containerization technologies can be systematically integrated to compensate for the unreliability and heterogeneity inherent in community networks. The resilience mechanisms and layered controller abstractions provide a framework for further research and implementation. Numerous open technical challenges remain, especially in realizing scalable, adaptive, and formally verifiable control planes, but the vision is a step toward decentralised, participatory cloud infrastructure for resilient communities.
