Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Published 1 Jan 2023 in cs.CL, cs.AI, and cs.CY | (2301.00355v2)
Abstract: We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance on three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and make interactive error correction easier. Extensive human evaluations further confirm its effectiveness.
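To make the "chain-of-edits" idea concrete, here is a minimal sketch of how a sequence of intermediate texts between an unaligned and an aligned sentence could be derived. It assumes word-level edits computed with Python's standard `difflib`; the function name `edit_chain` and the example sentences are hypothetical, and the paper's own edit extraction and training pipeline may differ.

```python
from difflib import SequenceMatcher


def edit_chain(source: str, target: str) -> list[str]:
    """Return the texts visited while editing `source` into `target`.

    Assumption: edits are word-level insert/delete/replace operations
    taken from difflib, applied one operation per step.
    """
    src = source.split()
    tgt = target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    # Keep only the operations that actually change the text.
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]

    chain = [" ".join(src)]
    current = list(src)
    # Apply operations right to left so the source indices of the
    # remaining (leftward) operations stay valid after each edit.
    for _tag, i1, i2, j1, j2 in reversed(ops):
        current[i1:i2] = tgt[j1:j2]
        chain.append(" ".join(current))
    return chain


# Hypothetical example pair, not taken from the paper's datasets.
unaligned = "You should just ignore them and leave"
aligned = "You could calmly talk to them first"
steps = edit_chain(unaligned, aligned)
for step in steps:
    print(step)
```

Each printed line is one step of the chain, starting at the unaligned text and ending at the aligned one. Under this reading, a model trained on such chains would see the intermediate edits rather than only the final rewrite, which is one way the steps could support inspection and correction.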