Specifying steering targets and decision authority in pluralistic alignment

Determine principled procedures for specifying the target behaviors that large language model steering should aim to achieve in value‑pluralistic settings, and establish who should be responsible for deciding these steering targets.

Background

The paper discusses ethical considerations of providing tools for steering generative models, noting that steerability is considered useful for value‑pluralistic systems. However, the authors point out a lack of consensus on how steering targets should be specified and who should make these decisions.

Clarifying both the methodology for specifying targets and the governance of these decisions is necessary to ensure responsible and transparent deployment of steering in real-world settings.
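To make the open question concrete, one way to support transparency would be to require that every steering target carry provenance metadata recording who specified it and why. The sketch below is purely illustrative and not part of any existing toolkit; `SteeringTarget`, `audit_record`, and all field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SteeringTarget:
    """Hypothetical record pairing a steering target with its provenance."""
    behavior: str               # target behavior the steering method should induce
    specified_by: str           # who decided the target (deployer, user, regulator, ...)
    rationale: str              # stated justification for choosing this target
    stakeholders: tuple = ()    # parties consulted, for value-pluralistic review

def audit_record(target: SteeringTarget) -> dict:
    """Serialize a steering target for a transparency/audit log."""
    return {
        "behavior": target.behavior,
        "specified_by": target.specified_by,
        "rationale": target.rationale,
        "stakeholders": list(target.stakeholders),
    }

target = SteeringTarget(
    behavior="defer to user-stated values on contested topics",
    specified_by="deployer",
    rationale="pilot policy pending broader stakeholder input",
    stakeholders=("deployer", "user-panel"),
)
print(audit_record(target)["specified_by"])  # deployer
```

Such a record would not resolve who *should* hold decision authority, but it would make the current answer explicit and auditable in any given deployment.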

References

"Relatedly, steerability is generally considered to be a reasonable target for creating value pluralistic systems, but it's not yet entirely clear in the community how the steering target should be specified (nor by whom)."

AI Steerability 360: A Toolkit for Steering Large Language Models (2603.07837 - Miehling et al., 8 Mar 2026), Section: Ethical Considerations.