Specifying steering targets and decision authority in pluralistic alignment
Determine principled procedures for specifying the target behaviors that steering of large language models should achieve in value‑pluralistic settings, and establish who should be responsible for deciding these steering targets.
References
Relatedly, steerability is generally considered to be a reasonable target for creating value pluralistic systems, but it's not yet entirely clear in the community how the steering target should be specified (nor by who).
— AI Steerability 360: A Toolkit for Steering Large Language Models
(2603.07837 - Miehling et al., 8 Mar 2026) in Section: Ethical Considerations.