Papers
Topics
Authors
Recent
Search
2000 character limit reached

Avoiding Confusion between Predictors and Inhibitors in Value Function Approximation

Published 19 Dec 2013 in cs.AI | (1312.5714v2)

Abstract: In reinforcement learning, the goal is to seek rewards and avoid punishments. A simple scalar captures the value of a state or of taking an action, where expected future rewards increase and punishments decrease this quantity. Naturally an agent should learn to predict this quantity to take beneficial actions, and many value function approximators exist for this purpose. In the present work, however, we show how value function approximators can cause confusion between predictors of an outcome of one valence (e.g., a signal of reward) and the inhibitor of the opposite valence (e.g., a signal canceling expectation of punishment). We show this to be a problem for both linear and non-linear value function approximators, especially when the amount of data (or experience) is limited. We propose and evaluate a simple resolution: to instead predict reward and punishment values separately, and rectify and add them to get the value needed for decision making. We evaluate several function approximators in this slightly different value function approximation architecture and show that this approach is able to circumvent the confusion and thereby achieve lower value-prediction errors.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.