Optimality-based reward learning with applications to toxicology

Published 5 Apr 2024 in stat.ME (arXiv:2404.04406v1)

Abstract: In toxicology research, experiments are often conducted to determine the effect of toxicant exposure on the behavior of mice, where mice are randomized to receive the toxicant or not. In particular, in fixed-interval experiments, a mouse receives reinforcers (e.g., a food pellet) contingent upon some action it takes (e.g., a press of a lever), but the reinforcers are only provided after fixed time intervals. Often, to analyze fixed-interval experiments, one specifies and then estimates the conditional state-action distribution (e.g., using an ANOVA). This existing approach, which in the reinforcement learning framework would be called modeling the mouse's "behavioral policy," is sensitive to misspecification. It is likely that any model for the behavioral policy is misspecified; the mapping from a mouse's exposure to its actions can be highly complex. In this work, we avoid specifying the behavioral policy by instead learning the mouse's reward function. Specifying a reward function is as challenging as specifying a behavioral policy, but we propose a novel approach that incorporates knowledge of the optimal behavior, which is often known to the experimenter, to avoid specifying the reward function itself. In particular, we define the reward as a divergence of the mouse's actions from optimality, where the representations of the action and optimality can be arbitrarily complex. The parameter of the reward function then serves as a measure of the mouse's tolerance for divergence from optimality, which is a novel summary of the impact of the exposure. This parameter is scalar, and the proposed objective function is differentiable, allowing us to benefit from typical results on the consistency of parametric estimators while making very few assumptions.
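
The abstract does not state the paper's reward function or objective, so what follows is only a minimal sketch under assumptions of our own, not the authors' estimator. It assumes a discretized action (the time, in whole seconds since the last reinforcer, of the mouse's first lever press), takes the optimal action to be a press exactly when the fixed interval elapses, uses squared distance as the divergence d(a), and swaps in a Boltzmann (softmax) choice model, p(a) proportional to exp(-d(a)/tau), so that the scalar tau plays the role of the tolerance for divergence from optimality. With beta = 1/tau, the log-likelihood is concave in beta (its second derivative is -n Var_beta(d)), and its score equation reduces to matching the mean observed divergence to the model's expected divergence E_beta[d]. The names ACTIONS, OPTIMAL, divergence, fit_tolerance, and simulate, the 10-second interval, and the simulated "control" and "exposed" groups are all hypothetical.

```python
import math
import random

# Hypothetical setup (not from the paper): each trial, the mouse's "action" is
# the time in whole seconds, since the last reinforcer, of its first lever press.
ACTIONS = list(range(21))  # candidate first-press times 0..20 s
OPTIMAL = 10  # assumed fixed interval: the reinforcer becomes available at 10 s


def divergence(a, a_star=OPTIMAL):
    """Squared distance from the optimal action; one of many possible divergences."""
    return (a - a_star) ** 2


def action_probs(beta):
    """Boltzmann policy p(a) proportional to exp(-beta * d(a)), with beta = 1 / tau."""
    weights = [math.exp(-beta * divergence(a)) for a in ACTIONS]
    total = sum(weights)
    return [w / total for w in weights]


def expected_divergence(beta):
    """Model-implied mean divergence E_beta[d]; strictly decreasing in beta."""
    return sum(p * divergence(a) for p, a in zip(action_probs(beta), ACTIONS))


def fit_tolerance(observed, iters=100):
    """Maximum-likelihood estimate of the tolerance tau = 1 / beta.

    The score equation is mean(d(a_i)) == E_beta[d]. Because E_beta[d] is
    monotone in beta, bisection on log(beta) finds the unique root.
    """
    target = sum(divergence(a) for a in observed) / len(observed)
    if target == 0:
        return 0.0  # every action optimal: zero tolerance (boundary case)
    if target >= expected_divergence(0.0):
        return math.inf  # no closer to optimal than uniform: unbounded tolerance
    lo, hi = math.log(1e-8), math.log(1e3)  # search range for log(beta)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_divergence(math.exp(mid)) > target:
            lo = mid  # model still too dispersed: increase beta
        else:
            hi = mid  # model too concentrated: decrease beta
    return 1.0 / math.exp(0.5 * (lo + hi))


def simulate(tau, n, rng):
    """Draw n first-press times from the Boltzmann policy with tolerance tau."""
    return rng.choices(ACTIONS, weights=action_probs(1.0 / tau), k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    control = simulate(tau=2.0, n=5000, rng=rng)  # hypothetical unexposed group
    exposed = simulate(tau=8.0, n=5000, rng=rng)  # hypothetical exposed group
    print(f"control tau_hat = {fit_tolerance(control):.2f}")
    print(f"exposed tau_hat = {fit_tolerance(exposed):.2f}")
```

Running the script should print estimates close to the simulated tolerances of 2 and 8. Only d(a) enters the estimator, so replacing the squared distance with any other nonnegative divergence leaves fit_tolerance unchanged, which loosely mirrors the abstract's point that the representations of the action and of optimality can be arbitrarily complex while the parameter stays scalar.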
