Enhancing RL Safety with Counterfactual LLM Reasoning
Published 16 Sep 2024 in cs.LG (arXiv:2409.10188v1)
Abstract: Reinforcement learning (RL) policies may exhibit unsafe behavior and are hard to explain. We use counterfactual LLM reasoning to enhance RL policy safety post-training. We show that our approach improves RL policy safety and helps to explain it.
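The abstract does not detail the mechanism, but the general idea of counterfactual queries for post-training safety auditing can be sketched as follows. This is a hypothetical illustration, not the paper's actual method: the stub LLM, the driving-style states and actions, and the patching rule are all invented stand-ins.

```python
# Hypothetical sketch of counterfactual auditing of a trained RL policy.
# The "LLM" is a hard-coded stub standing in for a real model; states,
# actions, and the safety rule are invented for illustration only.

def stub_llm_counterfactual(state, taken_action, alternative):
    """Stand-in for asking an LLM: 'In `state`, the agent chose
    `taken_action`. Would `alternative` have been safer?'
    Here, a fixed rule: braking near an obstacle is judged safer."""
    return state == "near_obstacle" and alternative == "brake"

def audit_policy(policy, actions):
    """Flag (state, action) pairs where a counterfactual action is judged
    safer, and return a patched copy of the policy using that action."""
    patched = dict(policy)
    flagged = []
    for state, action in policy.items():
        for alt in actions:
            if alt != action and stub_llm_counterfactual(state, action, alt):
                flagged.append((state, action, alt))
                patched[state] = alt
    return patched, flagged

policy = {"open_road": "accelerate", "near_obstacle": "accelerate"}
patched, flagged = audit_policy(policy, ["accelerate", "brake"])
print(flagged)   # [('near_obstacle', 'accelerate', 'brake')]
print(patched)   # {'open_road': 'accelerate', 'near_obstacle': 'brake'}
```

The flagged counterfactuals double as explanations ("near an obstacle, braking would have been safer than accelerating"), which matches the abstract's claim that the approach helps explain policy safety.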