- The paper introduces Iliad, a novel protocol that trains agents using only verbal descriptions of activities, bypassing the need for explicit demonstrations or reward functions.
- The Activity-Description Explorative Learner (Adel) algorithm is presented to operationalize Iliad, addressing challenges in generating useful executions and grounding language descriptions into tasks.
- Empirical evaluations show the method is more sample-efficient than reinforcement learning and competitive with imitation learning, demonstrating the potential of language-based feedback.
Overview of "Interactive Learning from Activity Description"
The paper "Interactive Learning from Activity Description" introduces a protocol for training request-fulfilling agents that relies exclusively on verbal descriptions of the agent's activities. The protocol, termed Interactive Learning from Activity Description (Iliad), departs from traditional learning mechanisms such as Imitation Learning (IL), which relies on explicit demonstrations, and Reinforcement Learning (RL), which requires a defined reward function.
Key Contributions
- Learning Protocol: The Iliad protocol enables training without direct demonstrations or reward functions. Instead, it relies on natural language descriptions provided by a teacher, making training feasible even when precise control over the agent (needed for demonstrations) or a well-specified reward function is unavailable.
- Algorithmic Framework: The paper introduces an algorithm called Activity-Description Explorative Learner (Adel) which operationalizes the Iliad protocol. This algorithm addresses two main challenges:
- Exploration Problem: How to generate effective executions that yield useful descriptions.
- Grounding Problem: How to ground activity descriptions effectively into actionable tasks.
- Empirical and Theoretical Analysis: The authors provide empirical results indicating that their approach is more sample-efficient than RL and competitive with IL, especially when collecting ground-truth demonstrations is challenging. They also offer theoretical guarantees for the convergence of the algorithm under specific conditions.
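The Iliad interaction loop described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the class names, the three-step executions, and the endpoint-naming teacher are all hypothetical. The point it makes is structural: the teacher returns only a language description of what the agent did, never a reward or a demonstration.

```python
import random

class Teacher:
    """Toy teacher: describes an execution in natural language.

    Hypothetical stand-in for the paper's human or simulated teacher;
    here it simply names the final state of the execution.
    """
    def describe(self, execution):
        return f"you went to {execution[-1]}"

class Agent:
    """Toy agent that collects (description, execution) pairs."""
    def __init__(self):
        self.data = []  # grounding data gathered from the teacher

    def act(self, request, rng):
        # Placeholder behavior: a random walk over three toy states.
        return [rng.choice(["A", "B", "C"]) for _ in range(3)]

    def update(self, description, execution):
        # Ground the description by pairing it with the execution.
        self.data.append((description, execution))

def iliad_episode(agent, teacher, request, rng):
    execution = agent.act(request, rng)        # agent attempts the request
    description = teacher.describe(execution)  # teacher describes the attempt
    agent.update(description, execution)       # agent learns from the pairing
    return description

rng = random.Random(0)
agent, teacher = Agent(), Teacher()
desc = iliad_episode(agent, teacher, "go to C", rng)
```

Note that the agent's learning signal is entirely the paired (description, execution) data; nothing in the loop assumes the execution actually satisfied the request.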
Methodology
The approach uses a probabilistic framework in which the agent's policy improves iteratively: executions are sampled from a mixture of the agent's current policy and an exploration distribution, the teacher labels these executions with activity descriptions via a probabilistic model, and the resulting (description, execution) pairs are used to update the agent's policy. This differs significantly from conventional methods, which require either labeled data or a predefined reward function.
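One step of this update cycle can be sketched as follows. This is a hedged simplification with hypothetical names (`adel_step`, `mix_weight`, and the lambda policies below are not from the paper): a mixture coin decides whether the execution comes from the current policy or from exploration (the exploration problem), and the teacher's description is then paired with the execution as supervised training data (the grounding problem).

```python
import random

def adel_step(policy, explore, teacher_describe, request, dataset,
              mix_weight, rng):
    """One toy iteration of the sample-describe-update cycle.

    policy:           callable (request, rng) -> execution, the current policy
    explore:          callable (rng) -> execution, an exploration distribution
    teacher_describe: callable (execution) -> str, the teacher model
    mix_weight:       probability of drawing from the exploration distribution
    """
    # Exploration problem: the mixture decides who generates the execution.
    if rng.random() < mix_weight:
        execution = explore(rng)          # explorative execution
    else:
        execution = policy(request, rng)  # on-policy execution
    # Grounding problem: the teacher's description relabels the execution,
    # yielding a (language, behavior) pair usable for supervised learning.
    description = teacher_describe(execution)
    dataset.append((description, execution))
    return dataset

# Toy usage with stand-in components.
rng = random.Random(1)
data = []
adel_step(policy=lambda req, r: ["noop"],
          explore=lambda r: [r.choice(["left", "right"])],
          teacher_describe=lambda ex: " then ".join(ex),
          request="turn",
          dataset=data, mix_weight=0.5, rng=rng)
```

The accumulated dataset is what drives the policy update in each round; over iterations, the policy's share of the mixture produces increasingly on-task executions, which in turn elicit more relevant descriptions.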
Empirical Evaluation
Adel was tested on two tasks: vision-language navigation and word modification using regular expressions. The empirical evaluation showed that the algorithm significantly outperforms RL baselines in sample efficiency and policy quality, while achieving success rates close to those of the IL baselines. These findings highlight the potential of language-based feedback as a rich and informative medium of instruction.
Implications and Future Directions
This work has substantial implications for learning in environments where human feedback is more intuitively given in language rather than explicit demonstrations or scalar rewards. It opens a pathway for utilizing richer, context-sensitive feedback in agent training. Future research could explore refining language understanding in agents, optimizing learning in even more complex environments, and addressing scalability limitations when deploying such protocols in real-world scenarios.
In conclusion, while not claiming superiority over all existing methods, the presented approach offers a compelling alternative for situations where direct demonstrations are infeasible and reward definition is challenging, thus broadening the applicability of interactive learning systems in AI.