Training Compute Thresholds for Dangerous Capability Onset

Determine the minimum training compute (measured in FLOP) at which models reach particularly dangerous capability levels, to inform conservative but workable FLOP thresholds for approvals, monitoring, and prohibitions under an international ASI-prevention agreement.

Background

The agreement operationalizes training limits with FLOP thresholds, but the authors acknowledge that the training scale at which systems become dangerously capable is unknown. Setting thresholds too high creates catastrophic risk; setting them too low imposes unnecessary costs.

Clarifying capability-onset thresholds would allow policymakers to set limits that better balance safety and utility, and reduce reliance on precautionary margins necessitated by current uncertainty.
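To make the threshold mechanism concrete, here is a minimal sketch of how a FLOP cap could be checked against a planned training run. It uses the common ~6·N·D approximation for total training compute (6 FLOP per parameter per token); the threshold value and the function names are hypothetical illustrations, not values from the agreement.

```python
def training_flop_estimate(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute as ~6 * parameters * tokens
    (the standard dense-transformer rule of thumb)."""
    return 6.0 * n_params * n_tokens


def exceeds_threshold(n_params: float, n_tokens: float,
                      threshold_flop: float = 1e25) -> bool:
    """Return True if the estimated run would cross the (hypothetical)
    policy threshold. 1e25 FLOP is an illustrative placeholder, not a
    figure taken from the agreement."""
    return training_flop_estimate(n_params, n_tokens) > threshold_flop


if __name__ == "__main__":
    # Example: a 70B-parameter model trained on 15T tokens
    # ~6 * 7e10 * 1.5e13 = 6.3e24 FLOP, under the placeholder 1e25 cap.
    print(training_flop_estimate(7e10, 1.5e13))
    print(exceeds_threshold(7e10, 1.5e13))
```

The open problem is precisely where `threshold_flop` should sit: without knowing the compute scale at which dangerous capabilities emerge, any such constant must carry a large precautionary margin.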

References

First, nobody knows what scale of training would reach particularly dangerous AI capability levels; given this uncertainty and the massive negative effect of setting thresholds too high, we suggest a conservative approach.

An International Agreement to Prevent the Premature Creation of Artificial Superintelligence (arXiv:2511.10783, Scher et al., 13 Nov 2025), Section: The Agreement — AI Training thresholds discussion