Comparative evaluation of agent architectures on CTI-REALM
Determine how multiple agent frameworks, including plan-and-execute and tree-of-thought architectures, compare on the CTI-REALM benchmark tasks, in order to assess how the choice of agent framework affects detection engineering outcomes.
References
We use a single agent architecture to isolate model capability differences under controlled conditions; comparing multiple agent frameworks (e.g., plan-and-execute, tree-of-thought) is left to future work.
— CTI-REALM: Benchmark to Evaluate Agent Performance on Security Detection Rule Generation Capabilities
(2603.13517, Chakraborty et al., 13 Mar 2026), in Experimental Setup, Agent Architecture