STAMP/STPA informed characterization of Factors Leading to Loss of Control in AI Systems
Abstract: A major concern amongst AI safety practitioners is the possibility of loss of control, whereby humans lose the ability to exert control over increasingly advanced AI systems. The range of concerns is wide, spanning current day risks to future existential risks, and a range of loss of control pathways from rapid AI self-exfiltration scenarios to more gradual disempowerment scenarios. In this work we set out to firstly, provide a more structured framework for discussing and characterizing loss of control and secondly, to use this framework to assist those responsible for the safe operation of AI-containing socio-technical systems to identify causal factors leading to loss of control. We explore how these two needs can be better met by making use of a methodology developed within the safety-critical systems community known as STAMP and its associated hazard analysis technique of STPA. We select the STAMP methodology primarily because it is based around a world-view that socio-technical systems can be functionally modeled as control structures, and that safety issues arise when there is a loss of control in these structures.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.