Capabilities Needed for Deploying Agents in Extended Real-World Workflows
Identify the specific capabilities that AI agent systems need for real-world deployment, so that they can complete tasks independently or effectively support humans across extended workflows.
References
Agent benchmarks therefore more directly probe the capabilities needed for deploying AI systems that can complete tasks independently or support humans across extended workflows—an important open question for real-world use.
— How Well Does Agent Development Reflect Real-World Work?
(2603.01203 - Wang et al., 1 Mar 2026) in Appendix B, Agentic Benchmark Selection