MiniMax-M1: Verified Search & Hybrid Reasoning
- The paper introduces a formally verified Marsland-style minimax algorithm that uses transposition tables and fail-soft pruning to ensure correct alpha–beta search results.
- MiniMax-M1 is a hybrid reasoning model that integrates Mixture-of-Experts with Lightning Attention to support native 1 million-token contexts and scalable RL training.
- Experimental results indicate that MiniMax-M1 excels in long-context reasoning and tool use, outperforming similar models in efficiency and benchmark performance.
MiniMax-M1 refers to two distinct but influential contributions: a formally verified Marsland-style minimax variant in the study of search and verification, and a large-scale, open-weight, hybrid-attention reasoning model in deep learning. The former, detailed in the context of the Dafny verification system, provides a rigorous foundation for understanding depth-limited, transposition-table-based minimax search. The latter, developed as MiniMax-M1 by MiniMax-AI, is a state-of-the-art LLM emphasizing compute efficiency, long-context reasoning, and reinforcement learning (RL) scalability through hybrid Mixture-of-Experts architectures and Lightning Attention. Both represent foundational advances in their respective fields, emphasizing correctness, efficiency, and scalability (Wesselink et al., 24 Sep 2025, MiniMax et al., 16 Jun 2025).
1. Definition and Historical Positioning
In algorithmic game search, MiniMax-M1 describes a depth-limited, transposition-table (TT) variant of the recursive minimax (negamax) framework, corresponding to the original Marsland-style M-variant with fail-soft window narrowing (Wesselink et al., 24 Sep 2025). This variant extends basic minimax by: enforcing a fixed search depth; using hash-mapped transposition tables for result caching; employing previously computed bounds to refine the alpha–beta search window; and enabling both transposition-based and alpha–beta pruning.
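The recipe above can be sketched in Python on a toy game tree (a minimal illustration, not the paper's Dafny artifact): fail-soft negamax with a transposition table whose entries carry exact/lower/upper-bound flags, used both for cutoffs and for narrowing the alpha–beta window. The tree encoding and `leaf_value` evaluation are illustrative assumptions.

```python
EXACT, LOWER, UPPER = 0, 1, 2  # bound semantics of cached entries

def negamax_m1(node, alpha, beta, depth, tree, leaf_value, table):
    """Fail-soft, depth-limited negamax with transposition-table reuse."""
    key = (node, depth)
    entry = table.get(key)
    if entry is not None:
        flag, value = entry
        if flag == EXACT:
            return value
        if flag == LOWER:              # cached lower bound narrows alpha
            alpha = max(alpha, value)
        elif flag == UPPER:            # cached upper bound narrows beta
            beta = min(beta, value)
        if alpha >= beta:              # transposition-based cutoff
            return value
    children = tree.get(node, [])
    if depth == 0 or not children:
        return leaf_value[node]        # static evaluation at the horizon
    best, a = float("-inf"), alpha
    for child in children:
        best = max(best, -negamax_m1(child, -beta, -a, depth - 1,
                                     tree, leaf_value, table))
        a = max(a, best)
        if a >= beta:                  # fail-soft alpha-beta cutoff
            break
    # Store the result with the flag required for sound later reuse.
    if best <= alpha:
        table[key] = (UPPER, best)
    elif best >= beta:
        table[key] = (LOWER, best)
    else:
        table[key] = (EXACT, best)
    return best

# Toy two-ply tree: root "r" maximizes, leaves hold static values.
tree = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaves = {"a1": 3, "a2": 5, "b1": 6, "b2": 9}
```

Keying the table on `(node, depth)` sidesteps the draft-depth subtleties; the paper's counterexamples arise precisely from less restrictive lower-bound reuse.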
In large-scale machine learning, MiniMax-M1 is the first open-weight, hybrid-attention reasoning model to efficiently scale test-time compute, enabling native 1 million-token context and extensible “thinking budgets” (maximum token generation per inference or rollout), meeting the demands of long-context reasoning and complex tool use (MiniMax et al., 16 Jun 2025). Built upon MiniMax-Text-01, this model combines Mixture-of-Experts (MoE) with Lightning Attention to balance parameter scale and inference efficiency.
2. Algorithmic Architecture and Formal Specification
Game Search: Marsland-Style Minimax (M1)
MiniMax-M1 is specified as a depth-limited negamax algorithm incorporating transposition-table lookups and Fishburn's fail-soft window narrowing. Its formal Dafny signature is:
```dafny
method MinimaxM1(u: Node,
                 alpha0: bounded_int,
                 beta0: bounded_int,
                 depth: nat)
  returns (result: bounded_int)
  modifies this.T
  requires alpha0 < beta0
  requires turn_based()
  requires is_valid_table(T)
  ensures is_negamax_tt_result(result, u, alpha0, beta0, depth)
  ensures is_valid_table(T)
```
Large-Scale Reasoning Model: Hybrid MoE and Lightning Attention
MiniMax-M1’s neural architecture comprises:
- 456 billion total parameters split into 32 experts with per-token activation of 45.9 billion parameters (top-2 gating).
- Hybrid block structure: seven TransNormer blocks (implementing Lightning Attention) are interleaved with one Transformer block (softmax attention). Lightning Attention provides an I/O-aware linear-attention mechanism with time and memory cost linear in sequence length, a significant scaling improvement over the quadratic cost of standard attention.
- MoE auxiliary loss with a scaled load-balancing term, adjusted in continual pretraining to support large micro-batch sizes.
- Native 1 million-token context support, achieved by incremental sequence-length scaling and streaming I/O to prevent memory explosion.
3. Training Methodologies and Optimization
Marsland-Style Minimax Verification
The algorithm’s correctness is established by the Dafny system, which mechanizes witness-based proof obligations:
- The postcondition requires the existence of a node expansion and corresponding negamax alpha–beta result.
- Table-entry validity ensures each cached value matches the required lower/upper/exact semantics for admissible pruning.
- Key lemmas (e.g., TableLookupReturnLemma, LoopBreakLemma, TableUpdateLemma) are instantiated to verify table reuse, pruning, and recursive invariants automatically.
In practice, M1’s algorithm is formally proved correct except in witness-violating scenarios, where lower-bound table reuse yields concrete counterexamples (Wesselink et al., 24 Sep 2025).
Hybrid Reasoning Model RL and CISPO
MiniMax-M1 employs large-scale RL over ~161,000 demanding tasks, including mathematical competition problems, logic puzzles, competitive programming, and real-world software engineering (with execution-based rewards in containerized sandboxes).
- CISPO (Clipped IS-weight Policy Optimization): Instead of token-level clipping (as in PPO), CISPO clips importance-sampling weights within REINFORCE, providing unbiased gradients and ensuring no tokens are dropped—even for rare “fork” cases. The CISPO objective is:
$$
J_{\mathrm{CISPO}}(\theta) = \mathbb{E}\left[\frac{1}{\sum_{i=1}^{G}\lvert o_i\rvert}\sum_{i=1}^{G}\sum_{t=1}^{\lvert o_i\rvert}\operatorname{sg}\!\big(\hat r_{i,t}(\theta)\big)\,\hat A_{i,t}\,\log \pi_\theta\big(o_{i,t}\mid q, o_{i,<t}\big)\right],
$$
where $\hat r_{i,t}(\theta)=\operatorname{clip}\big(r_{i,t}(\theta),\,1-\epsilon_{\mathrm{low}}^{\mathrm{IS}},\,1+\epsilon_{\mathrm{high}}^{\mathrm{IS}}\big)$ is the clipped token-level importance-sampling weight, $\operatorname{sg}(\cdot)$ denotes stop-gradient, and $\hat A_{i,t}$ is a group-relative advantage.
- Curriculum learning schedules intensive reasoning followed by general-domain RL to avoid catastrophic forgetting.
- Hardware/cost efficiency: training ran on 512 H800 GPUs for three weeks (US$534,700 in rental cost), with Lightning Attention substantially improving rollout efficiency at 100K-token generation lengths.
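The CISPO update described above can be sketched in plain NumPy: the importance-sampling weight is clipped and treated as a constant (stop-gradient) scale on the REINFORCE log-probability term, so clipping never zeroes a token's gradient the way PPO-style token clipping can. The default bounds here are illustrative, not the paper's settings.

```python
import numpy as np

def cispo_objective(logp_new, logp_old, advantages, eps_low=1.0, eps_high=0.2):
    """Per-token CISPO surrogate (sketch).

    In an autograd framework, r_hat would be wrapped in stop-gradient;
    here we only compute the scalar objective value.
    """
    r = np.exp(logp_new - logp_old)                    # IS weights pi_theta / pi_old
    r_hat = np.clip(r, 1.0 - eps_low, 1.0 + eps_high)  # clipped, gradient-free scale
    # Every token keeps a (bounded) contribution; none are dropped.
    return (r_hat * advantages * logp_new).mean()
```

With `eps_low=1.0` the lower clip is 0, i.e. effectively no lower-bound clipping, since IS weights are always positive.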
4. Experimental Results and Benchmark Performance
Marsland-Style Minimax (M1)
Worst-case time complexity remains O(b^d), where b is the branching factor and d is the search depth. However, the explored tree can shrink dramatically due to transposition-table and alpha–beta pruning. The Dafny artifacts comprise ~600 lines of main code and 250 lines of proofs, verifying that M1 correctly prunes only when the witness criterion is met—pinpointing the specific lower-bound table reuse that violates soundness (Wesselink et al., 24 Sep 2025).
MiniMax-M1 Reasoning Model
MiniMax-M1-80k demonstrates strong open-weight performance across diverse long-context and agentic tool use benchmarks:
| Task/Benchmark | Metric | MiniMax-M1-80k Result |
|---|---|---|
| AIME 2024 | pass@N | 86.0% (2nd among open weights) |
| SWE-bench Verified | exec. rate | 56.0% |
| OpenAI-MRCR 128K | accuracy | 73.4% |
| MRCR 1M | accuracy | 56.2% |
| TAU-bench (airline) | success | 62.0% |
MiniMax-M1 consistently outperforms or matches DeepSeek-R1 and Qwen3-235B models, particularly excelling in software engineering, tool use, and multi-step reasoning at large context scales (MiniMax et al., 16 Jun 2025). Ablations confirm performance increases as the thinking budget scales from 40,000 to 80,000 tokens.
5. Public Releases, Deployment, and Engineering Considerations
MiniMax-M1 is released in two variants corresponding to their thinking budgets:
- MiniMax-M1-40k: max 40,000 tokens per inference/generation (intermediate RL checkpoint).
- MiniMax-M1-80k: max 80,000 tokens (full RL-trained model).
“Thinking budget” designates the token ceiling for chained reasoning at inference or RL rollout. Native framework integration is provided for vLLM and Transformers, including custom Lightning Attention kernels and streaming I/O-aware interfaces. Both code and weights are available on GitHub and Hugging Face, with commercial deployments accessible via API (https://minimax.io). Optimization strategies, such as FP32 LM head for reward-probability alignment, AdamW tuning, and early truncation for pathological sequence control, enhance stability and alignment.
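Operationally, a thinking budget reduces to a token ceiling on chained generation. A minimal, framework-agnostic sketch (with a hypothetical `next_token` stand-in for a real decode step, not the vLLM/Transformers API):

```python
def generate_with_budget(next_token, prompt_tokens, budget, eos_id):
    """Decode until EOS or until the thinking budget is exhausted.

    next_token: callable taking the current token context and returning
    the next token id (a stand-in for a real model's decode step).
    """
    out = list(prompt_tokens)
    generated = 0
    while generated < budget:          # e.g. 40_000 or 80_000 tokens
        tok = next_token(out)
        out.append(tok)
        generated += 1
        if tok == eos_id:              # model finished before the ceiling
            break
    return out[len(prompt_tokens):]    # return only generated tokens
```

Early truncation of pathological sequences, mentioned above, is the same mechanism applied with an additional stopping predicate.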
6. Impact, Verification, and Open Questions
MiniMax-M1 in formal algorithmic search provides a verified reference for understanding subtle transposition table interactions in fail-soft minimax algorithms. It demonstrates the concrete limits of lower-bound table reuse by isolating witness violations, offering a template for mechanized verification of more complex search variants (Wesselink et al., 24 Sep 2025).
In neural reasoning, MiniMax-M1 showcases the viability of hybrid MoE-Lightning Attention architectures for tractable test-time compute at unprecedented context scales. The model’s efficiency, extensibility, and open-weight release establish a new baseline for long-context and tool-reasoning models. The conceptual integration of large-scale machine-verified correctness and deployable deep reasoning agents remains a significant direction for future research.
References
- [Formal Verification of Minimax Algorithms, (Wesselink et al., 24 Sep 2025)]
- [MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention, (MiniMax et al., 16 Jun 2025)]