Continuous Benchmarking of AI-Driven Political Persuasion Over Time

Develop continuous benchmarking frameworks to track how AI-driven political persuasion evolves over time as frontier large language model capabilities advance.

Background

Chen et al. (2026) document rapid capability increases in frontier LLMs and show substantial persuasive effects across multiple models and issues. Given the pace of model improvement, the authors identify continuous monitoring of how political persuasion changes as models advance as an open extension of their work.

They explicitly call for ongoing benchmarking to capture temporal dynamics in AI-mediated persuasion, which would inform risk assessment and policy responses as new generations of models are deployed.

References

"Besides the limitations we discuss above, several extensions remain open. [...] Third, as model capabilities advance, continuous benchmarking will be necessary to track how AI-driven political persuasion evolves over time."

Benchmarking Political Persuasion Risks Across Frontier Large Language Models (2603.09884 - Chen et al., 10 Mar 2026) in Conclusion