Pluralistic alignment: satisfying diverse moral preferences

Develop methods for building artificial agents that satisfy the moral preferences of a wide range of individuals, thereby operationalizing pluralistic alignment; one route is to integrate multiple moral value signals within a single learning agent (sketched under Background below).

Background

The authors propose intrinsic moral rewards for aligning LLM agents and note that multi-objective formulations could encode multiple moral values within a single agent.
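The section does not spell out a concrete multi-objective formulation, so the following is a minimal Python sketch of one standard choice: linear scalarization of several intrinsic moral reward signals into a single training reward. The Prisoner's Dilemma payoffs, the deontological and utilitarian signal functions, and the weight values are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (assumed, not the authors' code): linear scalarization
    # of two intrinsic moral reward signals for one Prisoner's Dilemma step.

    PAYOFFS = {  # (my_payoff, opponent_payoff)
        ("C", "C"): (3, 3),
        ("C", "D"): (0, 5),
        ("D", "C"): (5, 0),
        ("D", "D"): (1, 1),
    }

    def deontological_reward(my_action, opponent_prev_action):
        """Penalize violating the norm 'do not defect against a cooperator'."""
        return -1.0 if (my_action == "D" and opponent_prev_action == "C") else 0.0

    def utilitarian_reward(my_action, opponent_action):
        """Reward the collective payoff of both players."""
        mine, theirs = PAYOFFS[(my_action, opponent_action)]
        return float(mine + theirs)

    def scalarized_moral_reward(my_action, opponent_action,
                                opponent_prev_action, weights):
        """Combine several moral value signals into one intrinsic reward."""
        signals = {
            "deontological": deontological_reward(my_action, opponent_prev_action),
            "utilitarian": utilitarian_reward(my_action, opponent_action),
        }
        return sum(weights[name] * value for name, value in signals.items())

    # Example: an agent weighting utilitarian welfare over norm compliance.
    weights = {"deontological": 0.3, "utilitarian": 0.7}
    print(scalarized_moral_reward("D", "C", opponent_prev_action="C",
                                  weights=weights))

The weight vector is where pluralism can enter: it might encode a single individual's moral preferences, or be drawn from a population-level distribution, as the next sketch illustrates.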

They explicitly state that building agents capable of satisfying a broad diversity of moral preferences remains an open problem in alignment, and they suggest their approach as a possible direction toward it.
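One hedged way to operationalize "a wide range of individuals" is a preference-conditioned policy: each training episode samples one individual's moral weight vector, and a single set of policy parameters is conditioned on it. The sketch below does this for a toy one-step choice; the two conflicting objectives, the Dirichlet prior over preferences, and the REINFORCE update are hypothetical choices, not taken from the paper.

    # Hedged sketch: one preference-conditioned agent trained over a
    # population of moral preferences (illustrative setup, not the paper's).

    import numpy as np

    rng = np.random.default_rng(0)

    ACTIONS = ("keep_promise", "break_promise")
    # Assumed per-action moral rewards: [deontological, utilitarian].
    # The actions are constructed so the two objectives conflict.
    MORAL_REWARDS = {
        "keep_promise": np.array([1.0, 0.0]),
        "break_promise": np.array([0.0, 1.0]),
    }

    # One parameter matrix serves all preferences: the logits depend on the
    # sampled weight vector, i.e. the policy is conditioned on preferences.
    theta = np.zeros((len(ACTIONS), 2))

    def policy(weights):
        logits = theta @ weights
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    baseline, lr = 0.0, 0.1
    for step in range(5000):
        weights = rng.dirichlet(np.ones(2))    # one individual's preferences
        probs = policy(weights)
        a = rng.choice(len(ACTIONS), p=probs)
        reward = float(MORAL_REWARDS[ACTIONS[a]] @ weights)
        baseline += 0.01 * (reward - baseline)  # moving-average baseline
        # REINFORCE gradient for a softmax policy linear in the weights.
        grad_logp = (np.eye(len(ACTIONS))[a] - probs)[:, None] * weights[None, :]
        theta += lr * (reward - baseline) * grad_logp

    # The same trained parameters now serve opposite moral preferences.
    for w in ([0.9, 0.1], [0.1, 0.9]):
        print(w, policy(np.array(w)).round(2))

After training, querying the same parameters with opposite weight vectors yields opposite action choices, which is the sense in which a single agent can cover a distribution of moral preferences.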

References

This may provide a promising direction for building agents that are able to satisfy the moral preferences of a wide range of individuals, which currently remains an open problem in alignment \citep{anwar2024foundational,ji2024aialignmentcomprehensivesurvey}.

Moral Alignment for LLM Agents (2410.01639 - Tennant et al., 2024) in Section 6 (Discussion)