Resilient Microservices: A Systematic Review of Recovery Patterns, Strategies, and Evaluation Frameworks
Abstract: Microservice-based systems underpin modern distributed computing environments but remain vulnerable to partial failures, cascading timeouts, and inconsistent recovery behavior. Although numerous resilience and recovery patterns have been proposed, existing surveys are largely descriptive and lack systematic evidence synthesis and quantitative rigor. This paper presents a PRISMA-aligned systematic literature review of empirical studies on microservice recovery strategies published between 2014 and 2025 and indexed in IEEE Xplore, ACM Digital Library, and Scopus. From an initial corpus of 412 records, 26 high-quality studies were selected using transparent inclusion, exclusion, and quality-assessment criteria. The review identifies nine recurring resilience themes, including circuit breakers, retries with jitter and budgets, sagas with compensation, idempotency, bulkheads, adaptive backpressure, observability, and chaos validation. As a data-oriented contribution, the paper introduces a Recovery Pattern Taxonomy, a Resilience Evaluation Score checklist for standardized benchmarking, and a constraint-aware decision matrix that maps latency, consistency, and cost trade-offs to appropriate recovery mechanisms. The results consolidate fragmented resilience research into a structured, analyzable evidence base that supports reproducible evaluation and informed design of fault-tolerant, performance-aware microservice systems.
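As an illustration of one theme named in the abstract, retries with jitter and budgets, the following is a minimal Python sketch and is not taken from the paper. The names `RetryBudget`, `backoff_delay`, and `call_with_retries`, along with the default ratio, base delay, and cap, are hypothetical choices made for this example. It assumes "jitter" means a randomized exponential backoff (the "full jitter" variant) and "budget" means a cap on retries relative to the number of requests issued.

```python
import random
import time


class RetryBudget:
    """Hypothetical helper: caps total retries at a fraction of requests seen."""

    def __init__(self, ratio=0.2, min_retries=1):
        self.ratio = ratio              # retries allowed per request (assumed value)
        self.min_retries = min_retries  # floor so low-traffic clients can still retry
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self):
        """Consume one retry from the budget; return False if it is exhausted."""
        allowed = max(self.min_retries, int(self.requests * self.ratio))
        if self.retries < allowed:
            self.retries += 1
            return True
        return False


def backoff_delay(attempt, base=0.1, cap=2.0, rng=random):
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(operation, budget, max_attempts=4, sleep=time.sleep):
    """Call operation(), retrying on any exception while attempts and budget remain."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            is_last_attempt = attempt == max_attempts - 1
            if is_last_attempt or not budget.try_spend():
                raise  # give up: out of attempts or out of budget
            sleep(backoff_delay(attempt))
```

A caller would typically share one `RetryBudget` across all requests to a given downstream service, so that a widespread outage exhausts the budget quickly instead of multiplying load. In practice the operation should be idempotent, since a retried call may be executed more than once.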