Papers
Topics
Authors
Recent
Search
2000 character limit reached

Efficient Evaluation of Multi-Task Robot Policies With Active Experiment Selection

Published 14 Feb 2025 in cs.RO, cs.AI, and cs.LG | (2502.09829v1)

Abstract: Evaluating learned robot control policies to determine their physical task-level capabilities costs experimenter time and effort. The growing number of policies and tasks exacerbates this issue. It is impractical to test every policy on every task multiple times; each trial requires a manual environment reset, and each task change involves re-arranging objects or even changing robots. Naively selecting a random subset of tasks and policies to evaluate is a high-cost solution with unreliable, incomplete results. In this work, we formulate robot evaluation as an active testing problem. We propose to model the distribution of robot performance across all tasks and policies as we sequentially execute experiments. Tasks often share similarities that can reveal potential relationships in policy behavior, and we show that natural language is a useful prior in modeling these relationships between tasks. We then leverage this formulation to reduce the experimenter effort by using a cost-aware expected information gain heuristic to efficiently select informative trials. Our framework accommodates both continuous and discrete performance outcomes. We conduct experiments on existing evaluation data from real robots and simulations. By prioritizing informative trials, our framework reduces the cost of calculating evaluation metrics for robot policies across many tasks.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 103 likes about this paper.