PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System

Published 29 Sep 2025 in cs.AI | (2509.24855v1)

Abstract: Physics is central to understanding and shaping the real world, and the ability to solve physics problems is a key indicator of real-world physical intelligence. Physics Olympiads, renowned as the crown of competitive physics, provide a rigorous testbed requiring complex reasoning and deep multimodal understanding, yet they remain largely underexplored in AI research. Existing approaches are predominantly single-model based, and open-source MLLMs rarely reach gold-medal-level performance. To address this gap, we propose PhysicsMinions, a coevolutionary multi-agent system for Physics Olympiad. Its architecture features three synergistic studios: a Visual Studio to interpret diagrams, a Logic Studio to formulate solutions, and a Review Studio to perform dual-stage verification. The system coevolves through an iterative refinement loop where feedback from the Review Studio continuously guides the Logic Studio, enabling the system to self-correct and converge towards the ground truth. Evaluated on the HiPhO benchmark spanning 7 latest physics Olympiads, PhysicsMinions delivers three major breakthroughs: (i) Strong generalization: it consistently improves both open-source and closed-source models of different sizes, delivering clear benefits over their single-model baselines; (ii) Historic breakthroughs: it elevates open-source models from only 1-2 to 6 gold medals across 7 Olympiads, achieving the first-ever open-source gold medal in the latest International Physics Olympiad (IPhO) under the average-score metric; and (iii) Scaling to human expert: it further advances the open-source Pass@32 score to 26.8/30 points on the latest IPhO, ranking 4th of 406 contestants and far surpassing the top single-model score of 22.7 (ranked 22nd). Generally, PhysicsMinions offers a generalizable framework for Olympiad-level problem solving, with the potential to extend across disciplines.


Summary

The paper presents PHYSICSMINIONS, a coevolutionary multimodal multi-agent system that integrates Visual, Logic, and Review Studios to tackle complex physics Olympiad challenges.
With open-source models, it reaches a Pass@32 score of 26.8 out of 30 on the latest IPhO, ranking 4th among 406 contestants.
The system leverages iterative refinement and dual-stage verification to outperform single-model baselines and existing frameworks.

PHYSICSMINIONS: ADVANCING MULTI-AGENT SYSTEMS FOR PHYSICS OLYMPIAD CHALLENGES

Introduction

The paper "PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System" (2509.24855) addresses the limitations of single-model systems on complex physics Olympiad problems. Physics Olympiads demand advanced reasoning and multimodal understanding, yet existing approaches are predominantly single-model, and open-source multimodal models rarely reach gold-medal-level performance. The paper introduces PHYSICSMINIONS, a coevolutionary multimodal multi-agent system composed of three cooperating studios (Visual Studio, Logic Studio, and Review Studio) that together solve Olympiad tasks more reliably than a single model.

Architecture and Functionality

PHYSICSMINIONS operates through a coevolutionary loop involving its three studios:

Visual Studio: This component processes multimodal inputs such as diagrams and plots, transforming them into structured JSON representations. This conversion from raw images to structured data reduces ambiguity and supports effective reasoning.
Logic Studio: Using the structured inputs from the Visual Studio, the Logic Studio generates an initial solution and iteratively refines it. This studio pairs a solver with an introspector and works with structured solution formats, so that logic and correctness can be improved systematically.
Review Studio: This studio conducts dual-stage verification, using a Physics-Verifier for domain-specific checks and a General-Verifier for broader logical scrutiny. This layered verification process ensures both physics consistency and reasoning accuracy.
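To make the Visual Studio's output concrete, the following is a minimal sketch of what a structured JSON description of a diagram might look like. The schema (`VisualDescription`, `DiagramElement`, and all field names) is hypothetical and chosen only for illustration; the summary does not specify the format the paper actually uses.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DiagramElement:
    # Hypothetical fields: what the object is, its label, and numeric attributes.
    kind: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class VisualDescription:
    # Hypothetical top-level record for one figure in a problem statement.
    figure_type: str
    elements: list
    relations: list


# Illustrative example: a block resting on an inclined plane.
incline = VisualDescription(
    figure_type="diagram",
    elements=[
        DiagramElement("block", "m", {"mass_kg": 2.0}),
        DiagramElement("incline", "ramp", {"angle_deg": 30}),
    ],
    relations=["m rests on ramp"],
)

# asdict() converts nested dataclasses to plain dicts, ready for JSON.
payload = json.dumps(asdict(incline), indent=2)
print(payload)
```

A text-only solver can then read quantities such as the incline angle from `payload` instead of re-interpreting the raw image at every step.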

The iterative refinement loop enables PHYSICSMINIONS to progressively approach the correct solution, leveraging feedback from the Review Studio to guide the Logic Studio's adjustments.
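The loop described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names, the `Review` record, the stopping rule (stop once both verifiers pass, or after `max_rounds`), and the toy stand-ins for the model calls are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Review:
    passed: bool
    feedback: str = ""


def coevolve(
    problem: str,
    visual_json: dict,
    solve: Callable[[str, dict], str],
    refine: Callable[[str, str], str],
    physics_verifier: Callable[[str], Review],
    general_verifier: Callable[[str], Review],
    max_rounds: int = 5,  # assumed budget, not taken from the paper
) -> Tuple[str, bool]:
    """Return (solution, accepted) after review-guided refinement."""
    solution = solve(problem, visual_json)
    for _ in range(max_rounds):
        # Stage 1: domain-specific physics checks.
        review = physics_verifier(solution)
        # Stage 2: general logical checks, only if stage 1 passed.
        if review.passed:
            review = general_verifier(solution)
        if review.passed:
            return solution, True
        # Feed the failing verifier's report back to the Logic Studio.
        solution = refine(solution, review.feedback)
    return solution, False


# Toy stand-ins for model calls, for demonstration only.
def toy_solve(problem: str, visual: dict) -> str:
    return "v = 2 * 5 = 10 m"  # deliberately wrong units


def toy_physics(solution: str) -> Review:
    return Review(solution.endswith("m/s"), "velocity must have units of m/s")


def toy_general(solution: str) -> Review:
    return Review("=" in solution, "state the final result as an equation")


def toy_refine(solution: str, feedback: str) -> str:
    return solution + "/s" if "m/s" in feedback else solution


result = coevolve("Find v.", {}, toy_solve, toy_refine, toy_physics, toy_general)
print(result)
```

In the toy run, the first draft fails the physics check on units, the feedback triggers one refinement, and the revised solution then passes both stages.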

Empirical Breakthroughs

On the HiPhO benchmark, which spans seven of the latest physics Olympiads, the paper reports three main results:

Generalization and Performance Gains: PHYSICSMINIONS consistently improves both open-source and closed-source models of different sizes, with clear gains over their single-model baselines.
Historic Achievements: The system raises open-source models from only 1-2 gold medals to 6 across the 7 Olympiads, including the first open-source gold medal in the latest International Physics Olympiad (IPhO) under the average-score metric.
Scaling to Human Expert Level: The open-source Pass@32 score reaches 26.8 out of 30 on the latest IPhO, which ranks 4th among 406 contestants and far surpasses the top single-model score of 22.7 (ranked 22nd).
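The results above use two different metrics, an average score and a Pass@32 score, plus a rank among human contestants. The sketch below shows one plausible reading of how these relate. It assumes that the average score is the mean of the total scores over repeated attempts, that Pass@k is the best total among k attempts, and that rank counts contestants with a strictly higher score. These definitions are assumptions for illustration, and all numbers in the example are made up rather than taken from the paper.

```python
from statistics import mean
from typing import Sequence


def average_score(attempt_totals: Sequence[float]) -> float:
    # Assumed definition: mean total score across repeated attempts.
    return mean(attempt_totals)


def pass_at_k(attempt_totals: Sequence[float], k: int) -> float:
    # Assumed definition: best total score among the first k attempts.
    return max(attempt_totals[:k])


def rank_among(score: float, contestant_scores: Sequence[float]) -> int:
    # Rank 1 is best; ties share the better rank.
    return 1 + sum(1 for s in contestant_scores if s > score)


# Hypothetical totals from three attempts on a 30-point exam.
attempts = [20.0, 24.0, 22.0]
# Hypothetical human contestant scores.
humans = [29.0, 28.0, 27.5, 25.0, 20.0]

print(average_score(attempts))      # mean of the three attempts
print(pass_at_k(attempts, 2))       # best of the first two attempts
print(rank_among(26.8, humans))     # position relative to the humans
```

Under these assumptions a Pass@k score can never be lower than the average score, which is why the two figures should not be compared directly.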

Comparative Analysis

The paper compares PHYSICSMINIONS with other frameworks, including Best-of-N, Self-MoA, and Self-Refine, and reports that PHYSICSMINIONS performs best. The summary attributes this to the coevolutionary design, which integrates structured verification with reflection. Best-of-N selects the best output from multiple runs but remains inherently single-model. Self-MoA and Self-Refine offer improvements but lack the integrated dual-stage verification and review-guided iterative refinement that characterize PHYSICSMINIONS.
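For contrast, a minimal Best-of-N selector might look like the sketch below. The scoring function and sample solutions are hypothetical and serve only to illustrate the general technique; they are not the paper's experimental setup.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate; no candidate is ever revised."""
    return max(candidates, key=score)


def toy_score(solution: str) -> float:
    # Hypothetical scorer: reward a final equation and correct velocity units.
    return float("=" in solution) + float(solution.endswith("m/s"))


# Three independent samples that all share the same flaw (no m/s units).
samples = ["v = 10 m", "v is about ten", "v = 10"]
chosen = best_of_n(samples, toy_score)
print(chosen)
```

Because selection only picks among existing samples, an error shared by every sample survives in the chosen answer, whereas a review-guided loop can feed the error back to the solver for correction.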

Implications and Future Directions

The implications of PHYSICSMINIONS extend beyond physics Olympiads, suggesting potential applications in other disciplines requiring multimodal problem-solving capabilities. Future research could focus on:

Enhancing visual understanding and multimodal perception within Visual Studio.
Integrating external solvers and domain-specific tools to reinforce solution refinement.
Expanding the coevolutionary paradigm to other Olympiad-level domains beyond physics.

Conclusion

PHYSICSMINIONS shows that a coevolutionary multimodal multi-agent system can lift both open-source and closed-source models above their single-model baselines on Olympiad-level physics, up to gold-medal performance on the latest IPhO. The paper makes the case that agent collaboration combined with review-guided iterative refinement is an effective recipe for such problems, and it presents the framework as generalizable to other fields that require sophisticated reasoning.
