HeDA: An Intelligent Agent System for Heatwave Risk Discovery through Automated Knowledge Graph Construction and Multi-layer Risk Propagation Analysis

Published 29 Sep 2025 in cs.AI and cs.MA | (2509.25112v1)

Abstract: Heatwaves pose complex cascading risks across interconnected climate, social, and economic systems, but knowledge fragmentation in scientific literature hinders comprehensive understanding of these risk pathways. We introduce HeDA (Heatwave Discovery Agent), an intelligent multi-agent system designed for automated scientific discovery through knowledge graph construction and multi-layer risk propagation analysis. HeDA processes 10,247 academic papers to construct a comprehensive knowledge graph with 23,156 nodes and 89,472 relationships, employing novel multi-layer risk propagation analysis to systematically identify overlooked risk transmission pathways. Our system achieves 78.9% accuracy on complex question-answering tasks, outperforming state-of-the-art baselines including GPT-4 by 13.7%. Critically, HeDA successfully discovered five previously unidentified high-impact risk chains, such as the pathway where a heatwave leads to a water demand surge, resulting in industrial water restrictions and ultimately causing small business disruption, which were validated through historical case studies and domain expert review. This work presents a new paradigm for AI-driven scientific discovery, providing actionable insights for developing more resilient climate adaptation strategies.

Summary

The paper introduces a multi-agent AI system that constructs extensive knowledge graphs from over 10,000 papers to analyze heatwave risks.
It demonstrates robust performance with 78.9% QA accuracy and identifies novel multi-layer risk propagation chains across physical, social, and economic domains.
The methodology’s integration of multi-layer propagation and a mathematical novelty score provides actionable insights for climate adaptation policies.

HeDA: An Intelligent Agent System for Automated Heatwave Risk Discovery

Introduction

HeDA (Heatwave Discovery Agent) presents a comprehensive, multi-agent AI system for automated scientific discovery in the domain of heatwave risk analysis. The system addresses the challenge of fragmented knowledge across climate, social, and economic domains by constructing a large-scale, high-fidelity knowledge graph from over 10,000 academic papers and applying a mathematically formalized multi-layer risk propagation analysis. HeDA is designed to autonomously identify previously overlooked risk transmission pathways, providing actionable insights for climate adaptation policy and advancing the methodological state-of-the-art in AI-driven scientific discovery.

Figure 1: HeDA Multi-Agent System Architecture. The Master Agent coordinates specialized sub-agents through a sophisticated workflow management system with checkpoint recovery and error handling capabilities.

System Architecture and Methodology

Multi-Agent System Design

HeDA employs a hierarchical multi-agent architecture, orchestrated by a Master Agent that coordinates four specialized sub-agents: Data Processing, Knowledge Graph, QA Engine, and Evaluation. The architecture supports dynamic task scheduling, checkpoint management, and adaptive error recovery, enabling robust, scalable processing of large literature corpora.

Data Processing Agent: Executes a seven-stage pipeline for relation extraction, entity standardization, semantic clustering, and canonicalization, leveraging LLMs and FAISS-based vector indexing.
Knowledge Graph Agent: Manages Neo4j-based graph construction, schema enforcement, and index optimization.
QA Engine Agent: Implements hybrid KGQA, combining semantic similarity, Cypher-based multi-hop retrieval, and LLM-based response generation with provenance tracking.
Evaluation Agent: Conducts automated benchmarking, ablation studies, and failure mode analysis.
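The checkpoint-and-recover behaviour attributed to the Master Agent can be illustrated with a minimal sketch. It is not HeDA's implementation: the function `run_pipeline`, the JSON checkpoint format, the fixed retry count, and the two demo stages are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: List[Stage], state: dict, checkpoint: Path,
                 max_retries: int = 2) -> dict:
    """Run stages in order; persist state after each one and skip
    stages already recorded in the checkpoint when restarted."""
    done: List[str] = []
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
        state, done = saved["state"], saved["done"]
    for name, stage in stages:
        if name in done:
            continue  # completed in an earlier run
        for attempt in range(max_retries + 1):
            try:
                state = stage(state)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted; surface the error
        done.append(name)
        checkpoint.write_text(json.dumps({"state": state, "done": done}))
    return state


# Hypothetical stages; the first fails once to exercise the retry path.
calls = {"extract": 0}


def flaky_extract(state: dict) -> dict:
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise RuntimeError("transient failure")
    return {**state, "triplets": 3}


def build_graph(state: dict) -> dict:
    return {**state, "nodes": state["triplets"] * 2}


stages = [("extract", flaky_extract), ("graph", build_graph)]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    first_run = run_pipeline(stages, {"papers": 5}, ckpt)
    second_run = run_pipeline(stages, {"papers": 5}, ckpt)  # resumes from checkpoint
print(first_run, calls)
```

Persisting the state after every stage keeps a restart cheap, at the cost of requiring the state to be serializable.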

Automated Knowledge Graph Construction

HeDA processes 10,247 academic papers, extracting 127,834 triplets and constructing a knowledge graph with 23,156 nodes and 89,472 relationships. Entity standardization achieves 91.3% clustering accuracy, and relationship extraction maintains 87.6% precision, validated by expert review.
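Entity standardization amounts to mapping surface variants of the same concept onto one canonical node. The sketch below shows the idea with a greedy threshold clustering. It is only a stand-in: character-trigram vectors replace the LLM embeddings and FAISS index the system is described as using, and the function names and the 0.6 threshold are assumptions.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def trigram_vector(text: str) -> Counter:
    """Bag of character trigrams, a crude substitute for an embedding."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def standardize(mentions: List[str], threshold: float = 0.6) -> Dict[str, str]:
    """Map each mention to a canonical form: the first mention seen in
    its cluster. A mention joins the most similar existing cluster if
    the similarity reaches the threshold, otherwise it starts a new one."""
    vectors: Dict[str, Counter] = {}   # canonical form -> vector
    mapping: Dict[str, str] = {}
    for mention in mentions:
        vec = trigram_vector(mention)
        best, best_sim = None, 0.0
        for canon, canon_vec in vectors.items():
            sim = cosine(vec, canon_vec)
            if sim > best_sim:
                best, best_sim = canon, sim
        if best is not None and best_sim >= threshold:
            mapping[mention] = best
        else:
            vectors[mention] = vec
            mapping[mention] = mention
    return mapping


mapping = standardize(["heatwave", "heatwaves", "Heatwave", "crop yield"])
print(mapping)
```

Greedy assignment depends on input order, which is one reason a production pipeline would more plausibly cluster all embeddings jointly.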

Figure 2: Distribution of entities across Physical, Social, and Economic layers in the HeDA knowledge graph, showing the relative representation of different risk domains.

Multi-layer Risk Propagation Analysis

Entities are categorized into Physical, Social, and Economic layers, enabling systematic cross-domain risk analysis. The core algorithm uses a constrained breadth-first search (BFS) to discover multi-hop pathways that traverse these layers, and ranks each pathway $P$ with a mathematically defined novelty score:

$\text{NoveltyScore}(P) = \alpha \cdot \text{LF}(P) + \beta \cdot \text{CLC}(P) + \gamma \cdot \text{IP}(P)$

where $\text{LF}(P)$ penalizes literature frequency, $\text{CLC}(P)$ rewards cross-layer transitions, and $\text{IP}(P)$ incorporates centrality and severity. Parameters $(\alpha, \beta, \gamma) = (0.5, 0.3, 0.2)$ are empirically determined. The algorithm ensures completeness up to a maximum path length, with early termination and parallelization for scalability.
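A minimal sketch of the path search and scoring follows. The weights come from the text, but everything else is an assumption for illustration: the summary does not give the exact forms of LF, CLC, or IP, so here LF is taken as one minus the literature frequency, CLC as the share of edges that change layer, and IP as a caller-supplied value in [0, 1]. The toy graph, its layer assignments, and the numeric inputs are hypothetical.

```python
from collections import deque
from typing import Dict, List


def cross_layer_paths(graph: Dict[str, List[str]], layer: Dict[str, str],
                      source: str, max_len: int = 4,
                      min_layers: int = 3) -> List[List[str]]:
    """Enumerate all simple paths from source with at most max_len edges
    that touch at least min_layers distinct layers (breadth-first)."""
    found: List[List[str]] = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if len(path) > 1 and len({layer[n] for n in path}) >= min_layers:
            found.append(path)
        if len(path) - 1 >= max_len:
            continue  # length bound reached; do not extend further
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no revisits)
                queue.append(path + [nxt])
    return found


def novelty_score(path: List[str], layer: Dict[str, str], lit_freq: float,
                  impact: float, alpha: float = 0.5, beta: float = 0.3,
                  gamma: float = 0.2) -> float:
    """alpha*LF + beta*CLC + gamma*IP under the assumed definitions."""
    transitions = sum(layer[a] != layer[b] for a, b in zip(path, path[1:]))
    lf = 1.0 - lit_freq                   # assumed: rarer in literature scores higher
    clc = transitions / (len(path) - 1)   # assumed: fraction of layer-crossing edges
    return alpha * lf + beta * clc + gamma * impact


# Hypothetical toy graph; layer labels are illustrative guesses.
graph = {
    "Heatwave": ["Water demand surge", "Railway deformation"],
    "Water demand surge": ["Industrial water restrictions"],
    "Industrial water restrictions": ["Small business disruption"],
}
layer = {
    "Heatwave": "Physical",
    "Railway deformation": "Physical",
    "Water demand surge": "Social",
    "Industrial water restrictions": "Economic",
    "Small business disruption": "Economic",
}
paths = cross_layer_paths(graph, layer, "Heatwave", max_len=3)
score = novelty_score(paths[-1], layer, lit_freq=0.0004, impact=0.8)
print(paths[-1], round(score, 4))
```

Because the search expands every simple path level by level, it returns every qualifying path up to `max_len` edges, which mirrors the completeness property stated above. Early termination and parallelization are omitted here.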

Empirical Results

Knowledge Graph QA and Multi-hop Reasoning

HeDA achieves 78.9% accuracy on a 500-question benchmark, outperforming GPT-4 by 13.7% and other KGQA baselines by 4.8–12.5%. Accuracy on multi-hop queries is 84.7% (2-hop), 76.3% (3-hop), and 68.9% (4+ hops), whereas baseline systems degrade more sharply as the hop count increases.

Figure 3: Performance comparison across different question complexity levels, showing HeDA's consistent superiority over baseline methods, particularly for complex multi-hop queries requiring cross-domain reasoning.

Ablation studies confirm the critical contribution of multi-layer analysis (+4.7% accuracy), entity standardization (+7.6%), and master agent orchestration (+9.1%).

Discovery of Novel Risk Propagation Chains

HeDA identifies five high-impact, previously undocumented risk chains with novelty scores >0.75 and literature frequencies <0.05%. These include:

Urban Water-Industrial Cascade: Heatwave → Water demand surge → Industrial water restrictions → Small business disruption → Economic instability.
Transportation-Supply Disruption: Extreme heat → Railway deformation → Freight delays → Supply chain disruption → Food price volatility.
Energy-Healthcare Vulnerability: High temperatures → Peak electricity demand → Grid instability → Hospital equipment failures → Critical care interruption.
Agricultural-Migration Pressure: Prolonged heat → Crop yield reduction → Rural income decline → Urban migration → Housing market strain.
Educational-Productivity Impact: School heat exposure → Reduced cognitive performance → Achievement gaps → Productivity impacts → Intergenerational inequality.

Validation combines quantitative literature analysis, expert review (Cronbach’s $\alpha = 0.82$), and historical case study verification (correlation $r > 0.65$ for three major heatwave events).
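Cronbach's $\alpha$ here is a consistency statistic for the expert ratings and is unrelated to the weight $\alpha$ in the novelty score. For $k$ raters it is $\frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $\sigma_i^2$ is the variance of rater $i$'s scores and $\sigma_T^2$ the variance of the per-item totals. The sketch below computes it; the rating matrix is made up for illustration and is not the paper's data, and treating each expert as one "item" is an assumption about how the statistic was applied.

```python
from statistics import variance
from typing import List


def cronbach_alpha(ratings: List[List[float]]) -> float:
    """ratings[r][j] is the score rater j gave to subject r.
    Uses sample variances; needs at least two raters and two subjects,
    and a non-zero variance of the row totals."""
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two subjects")
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Hypothetical ratings: five risk chains scored by three experts (1-5 scale).
example = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha_example = cronbach_alpha(example)
alpha_identical = cronbach_alpha([[1, 1, 1], [3, 3, 3], [5, 5, 5]])
print(round(alpha_example, 3), round(alpha_identical, 3))
```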

Figure 4: Cross-layer risk propagation network showing discovered pathways connecting Physical (blue), Social (green), and Economic (red) layers. Node sizes represent centrality scores, and edge thickness indicates relationship strength.

Temporal Risk Propagation Patterns

Temporal analysis reveals phase-dependent risk propagation: physical impacts dominate acutely (0–3 days), social impacts peak subacutely (3–14 days), and economic impacts intensify chronically (14+ days), supporting the theoretical framework of cascading risk transmission.
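The stated windows share their endpoints (day 3 and day 14), so any implementation has to pick a boundary convention. The sketch below assumes half-open intervals, which the text does not specify; the function name and labels are likewise illustrative.

```python
def dominant_phase(days_since_onset: float) -> str:
    """Return the phase whose impacts are described as dominant.
    Assumed convention: [0, 3) acute, [3, 14) subacute, [14, inf) chronic."""
    if days_since_onset < 0:
        raise ValueError("days_since_onset must be non-negative")
    if days_since_onset < 3:
        return "acute (physical)"
    if days_since_onset < 14:
        return "subacute (social)"
    return "chronic (economic)"


phases = [dominant_phase(d) for d in (0, 2.5, 3, 13, 14, 30)]
print(phases)
```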

System Performance and Scalability

HeDA processes 2.3 papers/minute, with throughput scaling linearly with the number of CPU cores; it answers 95% of queries in under 5 s and recovers automatically from 89% of processing failures. Memory and computational requirements are significant (32-core CPU, 64 GB RAM), and scalability is currently limited to ~10,000 papers.
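To put the throughput figure in perspective, the following back-of-the-envelope sketch converts it into wall-clock time for the full corpus. It assumes that 2.3 papers/minute was measured on the stated 32-core machine and that linear scaling holds exactly; neither is confirmed by the text, and the function names are illustrative.

```python
def corpus_hours(n_papers: int, papers_per_minute: float) -> float:
    """Wall-clock hours for one pass over the corpus at a fixed rate."""
    return n_papers / papers_per_minute / 60


def hours_at_cores(base_hours: float, base_cores: int, cores: int) -> float:
    """Projected hours under ideal linear scaling with core count."""
    return base_hours * base_cores / cores


full_pass = corpus_hours(10_247, 2.3)              # about 74 hours
on_64_cores = hours_at_cores(full_pass, 32, 64)    # half of that, if scaling is ideal
print(round(full_pass, 1), round(on_64_cores, 1))
```

Under these assumptions a single pass over the 10,247 papers takes roughly three days, which is consistent with the remark that larger corpora would need further infrastructure work.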

Discussion

Scientific and Policy Implications

HeDA demonstrates that autonomous AI systems can systematically uncover latent, cross-domain risk pathways that are not apparent through traditional literature review or sectoral analysis. The identification of risk chains with low literature frequency but high real-world impact highlights critical gaps in current adaptation strategies and underscores the necessity of integrated, cross-sectoral policy frameworks.

The mathematical formalization of novelty and cross-layer connectivity provides a reproducible methodology for vulnerability discovery in complex systems, addressing the challenge of "unknown unknowns" in climate risk analysis.

Methodological Innovations

Key innovations include autonomous multi-agent orchestration, multi-stage knowledge extraction validation, scalable parallel processing, and interpretable, provenance-tracked results. The explicit modeling of cross-domain risk propagation via layer mapping and indicator functions operationalizes cascading risk analysis in a mathematically rigorous manner.

Limitations

HeDA’s reliance on English-language, peer-reviewed literature introduces geographic and temporal biases, with underrepresentation of Global South contexts and recent developments. The system identifies correlations, not causality, and lacks probabilistic risk quantification. Validation is limited by the number of experts and historical events analyzed. Scalability to larger corpora requires further algorithmic and infrastructure enhancements.

Future Directions

Future work should integrate causal inference (e.g., do-calculus), probabilistic risk modeling (Bayesian networks), and multi-modal data sources (satellite, sensor, policy documents). Real-time, adaptive systems and cross-domain generalization to other hazards (pandemics, technological risks) are promising extensions.

Conclusion

HeDA establishes a new paradigm for AI-driven scientific discovery, demonstrating that autonomous agent systems can systematically synthesize fragmented scientific knowledge and identify critical, previously overlooked risk pathways. The system’s robust multi-agent architecture, mathematically grounded risk propagation analysis, and empirical validation across technical and scientific dimensions position it as a foundational tool for integrated climate risk assessment and policy development. The methodological advances and potential generalizability of HeDA have implications for a broad range of complex, interconnected risk domains, marking a significant step toward autonomous, evidence-based scientific discovery and decision support.
