Unresolved challenges in AI-driven cybersecurity

Investigate and resolve four open problems in AI-driven cybersecurity: (1) establish scalable methods for attack graph generation that move beyond manual curation and static approaches to handle dynamic, large-scale environments; (2) develop standardized, gold-standard datasets and benchmarks to rigorously evaluate large language models’ ability to understand and model cybersecurity exercises, including LLM-driven attack graph generation and reasoning; (3) integrate and validate game-theoretic frameworks (such as Cut-the-Rope) with LLM-based automation in practical cybersecurity tooling; and (4) design automated systems and workflows that keep pace with rapidly evolving AI-driven cybersecurity tasks while maintaining accuracy and interpretability, thereby reducing reliance on manual human annotation.

Background

The paper surveys the current landscape of AI-based penetration testing and attack graph analysis, noting that existing systems either produce overwhelming, unstructured outputs or require manually constructed models for game-theoretic analysis. The authors identify a set of pressing gaps that hinder end-to-end integration of AI automation with strategic reasoning: scaling attack graph generation, establishing rigorous evaluation datasets and benchmarks for LLMs in cybersecurity, fusing game-theoretic models with LLM-driven tooling, and closing the workflow gap between rapid AI operations and human annotation capacity.

These unresolved issues motivate the proposed Generative Cut-the-Rope (G-CTR) framework, which automatically extracts attack graphs from AI security logs, computes Nash equilibria, and feeds strategic digests back into agent prompts. Addressing the enumerated challenges would enable more reliable and strategically informed AI-driven security operations across both red-team and blue-team contexts.
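The extract-solve-feedback loop described above can be sketched in highly simplified form. The log format, function names, and the one-shot "cut one edge" game below are illustrative assumptions, not the paper's implementation: Cut-the-Rope itself solves a richer moving-target game with mixed-strategy equilibria, while this sketch only computes a pure-strategy defender minimax over single edge cuts.

```python
# Simplified sketch of the loop: parse logs into an attack graph,
# solve a toy cut game, and emit a strategic digest for an agent prompt.
# The "src -> dst" log format and single-edge-cut game are assumptions;
# the actual G-CTR framework computes Nash equilibria of a richer game.

def parse_log(lines):
    """Extract attack-graph edges from 'src -> dst' log lines (assumed format)."""
    edges = set()
    for line in lines:
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            edges.add((src, dst))
    return edges

def attack_paths(edges, start, goal, path=None):
    """Enumerate simple attacker paths from entry point to target via DFS."""
    path = path or [start]
    if start == goal:
        yield tuple(path)
        return
    for src, dst in edges:
        if src == start and dst not in path:
            yield from attack_paths(edges, dst, goal, path + [dst])

def best_cut(edges, paths):
    """Defender removes one edge; attacker then picks any surviving path.
    Return the cut minimizing the attacker's surviving options (pure minimax)."""
    def surviving(cut):
        return sum(1 for p in paths if cut not in zip(p, p[1:]))
    return min(edges, key=surviving)

log = ["web -> app", "app -> db", "web -> vpn", "vpn -> app"]
edges = parse_log(log)
paths = list(attack_paths(edges, "web", "db"))
cut = best_cut(edges, paths)
digest = f"{len(paths)} attack paths web->db; prioritize defending edge {cut}"
print(digest)
```

Here the "digest" is just a string summarizing the equilibrium cut; in the framework described above, such digests would be fed back into agent prompts to steer red- and blue-team actions.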

References

While AI and LLMs have seen growing adoption in cybersecurity, especially in automating penetration testing, several critical challenges remain unresolved:

  • Limited Scalability of Attack Graphs. Existing attack graph methodologies rely heavily on manual curation or static generation approaches, which struggle to scale with the complexity and dynamism of modern network environments. This limits their practical use in continuous, large-scale cybersecurity operations.
  • Lack of Comprehensive Evaluation of LLMs in Cybersecurity. Despite the rapid development of LLMs, their capabilities for understanding and modeling cybersecurity exercises remain poorly characterized. No standardized, gold-standard datasets or benchmarks exist to rigorously assess LLM-driven attack graph generation and reasoning.
  • Insufficient Integration of Game-Theoretic Models with AI Automation. Game theory offers powerful frameworks for risk assessment and strategic defense in cybersecurity, yet its fusion with LLM-based automation has not been thoroughly explored or validated in practical tooling.
  • Gap Between Fast-Evolving AI Capabilities and Human Annotation Workflows. The accelerating pace of AI-driven cybersecurity tasks challenges traditional human annotation and analysis methods, creating a need for automated systems that can keep up without sacrificing accuracy or interpretability.
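A gold-standard benchmark of the kind called for above could score LLM-generated attack graphs at the edge level against human-annotated ground truth. The sketch below is a hypothetical scoring harness with made-up example data; no such standardized dataset currently exists, which is precisely the gap the paper identifies.

```python
# Minimal sketch of how a benchmark could score an LLM-extracted attack
# graph against annotated gold edges. The edge sets are hypothetical
# examples, not real benchmark data.

def score_edges(predicted, gold):
    """Edge-level precision/recall/F1 for one cybersecurity exercise."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # correctly recovered edges
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

gold = [("web", "app"), ("app", "db")]
predicted = [("web", "app"), ("web", "db")]  # one correct, one spurious edge
scores = score_edges(predicted, gold)
print(scores)
```

Aggregating such per-exercise scores across a curated suite of exercises would give the kind of reproducible benchmark the bullet above argues is missing.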
Mayoral-Vilches et al., "Cybersecurity AI: A Game-Theoretic AI for Guiding Attack and Defense," arXiv:2601.05887, 9 Jan 2026 (State of the Art subsection).