Bridging Temporal and Textual Modalities: A Multimodal Framework for Automated Cloud Failure Root Cause Analysis
Abstract: Root cause analysis in modern cloud infrastructure demands sophisticated understanding of heterogeneous data sources, particularly time-series performance metrics that involve core failure signatures. While LLMs demonstrate remarkable capabilities in textual reasoning, their discrete token-based architecture creates fundamental incompatibilities with continuous numerical sequences exhibiting temporal dependencies. Current methodologies inadequately address this modality mismatch, constraining the potential of LLM-driven automation in incident management workflows. This paper presents a multimodal diagnostic framework that harmonizes time-series representations with pretrained LLM embedding spaces. Our approach contributes three technical advances: (1) a semantic compression technique that distills temporal segments into single-token abstractions while preserving pattern semantics, (2) an alignment encoder utilizing gated cross-attention to project time-series features into LLM latent space, and (3) a retrieval-augmented diagnostic pipeline that synthesizes aligned embeddings with historical incident knowledge for expert-level failure attribution. Comprehensive evaluation across six cloud system benchmarks demonstrates that our framework achieves leading performance, reaching 48.75% diagnostic accuracy with notable improvements on scenarios involving compound failure modes. The results validate embedding-space alignment as an effective strategy for enabling LLMs to reason over multimodal telemetry data in production incident response contexts.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.