SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
This presentation explores SkillRL, a novel framework that transforms how Large Language Model agents learn from experience. Instead of storing raw interaction histories or discarding failures, SkillRL distills both successes and failures into compact, reusable skills organized in a hierarchical library. Through recursive co-evolution—where skills dynamically expand alongside policy learning—the framework achieves dramatic efficiency gains and superior performance across embodied and web-based environments, enabling smaller open-source models to rival or exceed closed-source alternatives on complex multi-step reasoning tasks.
What if AI agents could learn not just from success, but distill wisdom from every failure, building a living library of reusable strategies that evolves alongside their performance? That's the promise of SkillRL, a framework that reimagines how language model agents transform raw experience into generalizable knowledge.
To understand why this matters, let's first examine the limitations agents face today.
Current approaches fall into two camps, and both are flawed. Memory-based methods drown in verbose, redundant trajectories, while vanilla reinforcement learning discards the very failures that could teach the most, leaving agents without structured pathways for transferring knowledge across tasks.
SkillRL bridges this gap with a fundamentally different architecture.
The key insight is experience-based skill distillation. A teacher model transforms raw trajectories—including failures—into compact skills organized hierarchically: general strategies applicable everywhere, and task-specific guides for nuanced scenarios. These skills then co-evolve recursively with the policy itself, creating a dynamic curriculum that adapts as new failure modes emerge during reinforcement learning.
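To make the distillation step concrete, here is a minimal sketch of how a trajectory, successful or not, could be compressed into a compact skill. The `Skill` fields, the `teacher_summarize` stub, and the `distill` signature are all illustrative assumptions, not the paper's actual API; the real teacher is an LLM, stubbed here as a one-line summarizer.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    level: str          # "general" or "task_specific" (the two hierarchy layers)
    guidance: str       # compact, reusable strategy text
    from_failure: bool  # failure-derived skills encode what to avoid

def teacher_summarize(trajectory, succeeded):
    """Stub for the teacher LLM call: compress a trajectory into one line."""
    verdict = "repeat this approach" if succeeded else "avoid this pattern"
    return f"{verdict}: {' -> '.join(trajectory)}"

def distill(trajectory, succeeded, task=None):
    """Turn one raw trajectory, success or failure, into a compact skill."""
    return Skill(
        level="task_specific" if task else "general",
        guidance=teacher_summarize(trajectory, succeeded),
        from_failure=not succeeded,
    )
```

The key design point the sketch preserves is that failures are not thrown away: they become skills tagged `from_failure`, carrying guidance about what to avoid.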
The resulting SkillBank divides knowledge into two complementary layers. General skills provide universally applicable heuristics for exploration and verification, while task-specific skills capture the procedural nuances and failure patterns unique to particular environments, enabling compositional reuse with minimal redundancy.
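The two-layer structure described above might be organized as follows; the class shape and the retrieval heuristic (general skills first, then task-specific ones) are my assumptions for illustration, not the paper's implementation.

```python
class SkillBank:
    """Two-layer library: general heuristics plus per-task procedural skills."""

    def __init__(self):
        self.general = []        # universally applicable heuristics
        self.task_specific = {}  # skills keyed by task or environment

    def add(self, skill, task=None):
        # Route to the right layer; dedupe to keep the bank compact.
        bucket = self.general if task is None else self.task_specific.setdefault(task, [])
        if skill not in bucket:
            bucket.append(skill)

    def retrieve(self, task):
        # Compose: general strategies first, then task-specific nuances.
        return self.general + self.task_specific.get(task, [])
```

Retrieval composes the two layers rather than duplicating content across tasks, which is what keeps redundancy minimal as the bank grows.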
This diagram reveals the complete learning cycle. The framework collects trajectories, distills them into hierarchical skills, performs an initial supervised fine-tuning phase to teach the policy how to use these skills, then enters the core loop: reinforcement learning coupled with validation checkpoints that trigger targeted skill refinement whenever failures expose knowledge gaps, ensuring the skill library stays synchronized with the evolving policy frontier.
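Put together, the cycle in the diagram reduces to a loop like the sketch below. Every function here is a stub standing in for one of the framework's components (rollout collection, the teacher model, validation checkpoints), and the numeric policy updates are placeholders, not real SFT or RL steps.

```python
import random

def distill(traj, ok):               # stub teacher: one-line skill string
    return ("repeat" if ok else "avoid") + ": " + traj[-1]

def collect(policy):                 # stub rollout: (trajectory, succeeded) pairs
    return [(["open drawer", "take key"], random.random() < policy["skill"])]

def validate(policy):                # stub checkpoint: failed trajectories only
    return [t for t, ok in collect(policy) if not ok]

def train(policy, bank, rounds=3):
    # Bootstrap: distill initial experience into the bank, then SFT on skill use.
    for traj, ok in collect(policy):
        if (s := distill(traj, ok)) not in bank:
            bank.append(s)
    policy["skill"] += 0.1           # stands in for the SFT phase
    # Core loop: RL updates plus validation checkpoints that refine the bank.
    for _ in range(rounds):
        policy["skill"] += 0.1       # stands in for one RL update
        for traj in validate(policy):        # failures expose knowledge gaps...
            if (s := distill(traj, False)) not in bank:
                bank.append(s)               # ...so the bank co-evolves with the policy
    return policy, bank
```

The structural point the sketch captures is the ordering: a supervised bootstrapping phase teaches the policy to consume skills before RL begins, and validation-triggered distillation inside the RL loop is what keeps the library synchronized with the policy frontier.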
Now let's examine how this translates to real-world performance gains.
The results are compelling across three diverse benchmark suites. On embodied manipulation tasks in ALFWorld, SkillRL achieves nearly 90 percent success, surpassing even closed-source large language models despite using smaller open-source models. Critically, the framework compresses context by 10- to 20-fold compared to raw memory retrieval, enabling richer reasoning within fixed context windows while maintaining superior performance.
Ablation studies reveal the architecture is tightly integrated: removing the hierarchy or skipping the supervised bootstrapping phase each degrades performance substantially, while the recursive evolution mechanism provides an additional boost by preventing knowledge staleness, enabling the agent to converge faster and reach higher asymptotic performance on complex multi-step tasks.
SkillRL demonstrates that abstraction is the key to agentic intelligence—transforming every experience, success or failure, into reusable strategic knowledge that evolves in lockstep with learning itself. To dive deeper into this framework and explore how recursive skill evolution could reshape Large Language Model agent design, visit EmergentMind.com.