Appropriateness of tool-schema-based skill learning for dialogue benchmarks

Determine whether experience learning centered on tool-schema-based skills is the most appropriate formulation for user-centric, dialogue-focused benchmarks such as τ²-Bench.

Background

The authors omit some ablations on τ²-Bench because the benchmark is highly user-interactive, uses relatively simple tool schemas, and has task patterns that are already substantially covered by the training data.

They explicitly raise an open question about whether a tool-schema-centered, skill-based experience learning formulation is appropriate for such dialogue-centric settings.

References

More broadly, for user-centric benchmarks of this type (e.g., dialogue benchmarks), it remains an open question whether experience learning centered around tool-schema-based skills is the most appropriate formulation.

SkillX: Automatically Constructing Skill Knowledge Bases for Agents (2604.04804 - Wang et al., 6 Apr 2026) in Ablation Study on Three Components of AutoSkills