- The paper demonstrates that inducing verifiable programmatic skills significantly improves agent performance, with up to 23.5% higher success rates over static approaches.
- The ASI method leverages executable program representations to ensure skill correctness and composability, enabling efficient adaptation across diverse web tasks.
- Experimental results on the WebArena benchmark validate ASI’s effectiveness in reducing task steps, supporting scaled-up activities, and generalizing across websites.
Inducing Programmatic Skills for Agentic Tasks
This paper addresses the challenge of enabling agents to perform specialized digital tasks, such as web navigation, by inducing programmatic skills that adapt to various environments. The proposed method, Agent Skill Induction (ASI), demonstrates improved success rates and efficiency compared to existing approaches, particularly by utilizing executable programs as skill representations.
Overview of ASI
ASI lets agents learn and apply skills on the fly during web task interactions. Skills are represented as executable programs, so each one can be verified during the induction phase. Compared with text-based skill representations, programs make skill correctness checkable and skills composable. ASI improves success rate by 23.5% over the static agent and by 11.3% over the text-skill agent.
Figure 1: Inducing programmatic skills and rewriting the trajectory from an episode.
The ASI framework operates by first generating action trajectories from natural language queries. It then induces higher-level programmatic skills, such as search_product(name), through a verification process that ensures their functional validity. These verified skills are integrated into the agent's action space, enabling more efficient task resolution in future interactions.
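The induce-then-verify loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dictionary-based environment, the primitive actions `click`, `type_text`, and `press_enter`, and the helpers `verify_skill` and `induce` are all assumptions made for the example. Only the skill name `search_product(name)` comes from the text.

```python
from typing import Callable, Dict, Tuple

Env = Dict[str, str]  # toy stand-in for a web page state (assumption)

# Hypothetical primitive actions; a real agent would drive a browser instead.
def click(env: Env, element: str) -> None:
    env["focused"] = element

def type_text(env: Env, element: str, text: str) -> None:
    env[element] = text

def press_enter(env: Env) -> None:
    # Submitting the search box fills in "results" in this toy environment.
    env["results"] = f"results for {env.get('search_box', '')}"

# Candidate skill induced from a successful trajectory (click, type, enter).
def search_product(env: Env, name: str) -> None:
    click(env, "search_box")
    type_text(env, "search_box", name)
    press_enter(env)

# A faulty candidate: it types the query but never submits it.
def broken_search(env: Env, name: str) -> None:
    type_text(env, "search_box", name)

def verify_skill(skill: Callable[..., None], args: Tuple,
                 check: Callable[[Env], bool]) -> bool:
    """Execute the skill in a fresh environment and test the outcome."""
    env: Env = {}
    try:
        skill(env, *args)
    except Exception:
        return False
    return check(env)

# The action space starts with primitives only.
action_space: Dict[str, Callable[..., None]] = {
    "click": click, "type_text": type_text, "press_enter": press_enter,
}

def induce(skill: Callable[..., None], args: Tuple,
           check: Callable[[Env], bool]) -> bool:
    """Add the skill to the action space only if verification passes."""
    if verify_skill(skill, args, check):
        action_space[skill.__name__] = skill
        return True
    return False

def found_red_mug(env: Env) -> bool:
    return env.get("results") == "results for red mug"

accepted = induce(search_product, ("red mug",), found_red_mug)
rejected = induce(broken_search, ("red mug",), found_red_mug)
```

Here the working skill joins the action space while the faulty one is discarded, which is the property that executable skills make checkable and free-text skills do not.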
Experimental Evaluation
WebArena Benchmark
The WebArena benchmark is employed to evaluate ASI's performance, involving a variety of web navigation tasks across different domains. ASI outperforms both static and adaptive agents by leveraging its programmatic skill induction, which streamlines task-solving procedures by abstracting complex actions into concise programmatic calls.
Scaled-Up Activities
In scenarios involving extended task sequences, ASI maintains efficiency by reducing the number of steps required to complete a task. The gain is most visible in tasks that repeat the same procedure many times, where programmatic skills offer clear advantages over text-based memory augmentation: a single skill call replaces the whole sequence of primitive actions on every repetition.
Figure 2: Example scaled-up task of updating multiple addresses on shopping website.
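To make the step-saving argument concrete, the following sketch counts agent-issued steps for a repeated address update, with and without a skill. Everything here is hypothetical: the `ToyShop` class, its three primitive actions, and the `update_address` skill are assumptions for illustration, and the step counts are properties of this toy setup, not results reported in the paper.

```python
from typing import Callable, Dict, List

class ToyShop:
    """Toy shopping site that counts agent-issued steps (not WebArena)."""

    def __init__(self) -> None:
        self.agent_steps = 0
        self.addresses: Dict[str, str] = {}
        self._draft: List[str] = ["", ""]
        self.actions: Dict[str, Callable[..., None]] = {
            "open_form": self._open_form,
            "fill_street": self._fill_street,
            "save": self._save,
        }

    def act(self, name: str, *args: str) -> None:
        # Every action the agent issues costs one step, primitive or skill.
        self.agent_steps += 1
        self.actions[name](*args)

    def _open_form(self, label: str) -> None:
        self._draft = [label, ""]

    def _fill_street(self, street: str) -> None:
        self._draft[1] = street

    def _save(self) -> None:
        self.addresses[self._draft[0]] = self._draft[1]

    def update_address(self, label: str, street: str) -> None:
        # Hypothetical induced skill: wraps the three primitives in one call.
        self._open_form(label)
        self._fill_street(street)
        self._save()

updates = {"home": "1 Main St", "work": "2 Oak Ave", "billing": "3 Pine Rd"}

# Baseline agent: issues three primitive actions per address.
baseline = ToyShop()
for label, street in updates.items():
    baseline.act("open_form", label)
    baseline.act("fill_street", street)
    baseline.act("save")

# Skill-equipped agent: the skill is registered as a single action.
with_skill = ToyShop()
with_skill.actions["update_address"] = with_skill.update_address
for label, street in updates.items():
    with_skill.act("update_address", label, street)
```

Both agents end in the same state, but the baseline issues one step per primitive while the skill-equipped agent issues one step per address, so the gap grows linearly with the number of repetitions.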
Cross-Website Generalization
ASI's skills effectively transfer across websites within similar domains, though some skills require adaptation to new webpage designs. The programmatic structure of skills enables ASI to quickly refine or create new skills, demonstrating flexibility and robustness in diverse web environments.
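One way to picture the reuse-or-adapt behavior is a skill that is tried on the new site first and rebuilt only if it fails there. The sketch below rests on strong simplifying assumptions: pages are modeled as dictionaries from element id to role, and `make_search_skill` and `reuse_or_adapt` are hypothetical helpers, not functions from the paper.

```python
from typing import Callable, Dict, List, Optional

Page = Dict[str, str]  # toy page model: element id -> element role (assumption)
Skill = Callable[[Page, str], Optional[str]]

def make_search_skill(box_id: str) -> Skill:
    """Build a search skill bound to one site's search-box element id."""
    def search_product(page: Page, name: str) -> Optional[str]:
        if page.get(box_id) != "search_input":
            return None  # element missing: the skill does not apply here
        return f"searched {name!r} via {box_id}"
    return search_product

def reuse_or_adapt(skill: Skill, page: Page,
                   candidate_ids: List[str]) -> Optional[Skill]:
    """Reuse the skill if it runs on the new page, else try adapted variants."""
    if skill(page, "probe") is not None:
        return skill
    for box_id in candidate_ids:
        adapted = make_search_skill(box_id)
        if adapted(page, "probe") is not None:
            return adapted
    return None  # nothing verified; a new skill would have to be induced

site_a: Page = {"search": "search_input"}
site_b: Page = {"q": "search_input"}        # same function, different design
site_c: Page = {"menu": "button"}           # no search box at all

skill_a = make_search_skill("search")
same = reuse_or_adapt(skill_a, site_a, ["q"])
adapted = reuse_or_adapt(skill_a, site_b, ["q"])
missing = reuse_or_adapt(skill_a, site_c, ["q"])
```

Because the skill is a program, failure on the new site is detected by running it, and the fix is a small, checkable change rather than a rewrite of free-text instructions.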
Implications and Future Work
The research highlights the potential of programmatic skills to improve agent efficiency and success across varied digital tasks. Future work could explore the optimal granularity of skills, the stability of skills as they evolve online, and further comparisons against human expert benchmarks. Overall, ASI is a step toward adaptive agents whose learned behavior can be executed and verified rather than only described.
Conclusion
ASI significantly improves web agent performance through the induction of verifiable programmatic skills, showcasing greater efficiency and adaptability in both standard and scaled-up web tasks. Its ability to generalize skills across different websites underscores the potential of programmatic representations in developing autonomous digital agents.
This research opens avenues for further exploration into the dynamics of skill acquisition and application, promising advances in the efficiency and versatility of AI agents.