
Kimi K2.5: Agentic AI & Neon Triple Point

Updated 3 February 2026
  • Kimi K2.5 is a multimodal agentic intelligence system that integrates a trillion-parameter MoE LLM with a MoonViT-3D vision encoder for joint text-vision optimization across diverse benchmarks.
  • It employs joint pre-training, zero-vision supervised fine-tuning, and token-level PPO reinforcement learning, enabling efficient task decomposition through its innovative Agent Swarm framework.
  • Simultaneously, the CCT-K2.5 comparison refines the ITS-90 neon triple point by applying isotopic corrections, reducing temperature uncertainty by up to 70% in high-precision metrology.

The designation K2.5 refers to two significant but unrelated concepts at the leading edge of their respective research domains: (1) the Kimi K2.5 open-source multimodal agentic intelligence system in artificial intelligence, and (2) the CCT-K2.5 key comparison for the realization of the ITS-90 neon triple point in temperature metrology. Each usage reflects the technical and methodological advances of its field.

1. Kimi K2.5: Agentic Multimodal Foundation Model

Kimi K2.5 is an open-source, trillion-parameter multimodal agentic model designed to advance general agentic intelligence through synergistic joint text-vision optimization and self-directed agentic orchestration (Team et al., 2 Feb 2026). The system emphasizes native multimodal fusion, emergent cross-modal capabilities, flexible tool use, and parallel agent coordination.

Model Architecture

Kimi K2.5 builds on the Kimi K2 Mixture-of-Experts LLM (MoE LLM), integrating a MoonViT-3D vision encoder and agentic reasoning interfaces:

  • Vision Encoder (MoonViT-3D): Utilizes a patch-n-pack (NaViT) strategy for handling diverse image resolutions without cropping or splicing, and extends to video via grouping and temporal pooling of image patches across consecutive frames.
  • MLP Projector: Transforms the visual patch embeddings into the token embedding space of the LLM.
  • MoE LLM: 1.04 trillion parameters, 384 experts (8 active per token), pre-trained on 15 trillion text tokens with the MuonClip optimizer.
  • Cross-modal Fusion: Implements early fusion during pre-training, maintaining a fixed ratio of 10% vision tokens and 90% text tokens throughout. Early fusion (from step 0) provides optimal performance-efficiency trade-offs under a fixed joint token budget.
  • Agentic Interfaces: Provides tool-calling APIs for search, browser, code interpreter, and capabilities for creating and assigning subtasks to “subagents” through special tokens parsed natively by the LLM.
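The fixed 10%/90% vision–text mix under a joint token budget can be illustrated with a small sketch. This is a hypothetical helper (the real K2.5 data pipeline is not public); `vision_chunks` and `text_chunks` stand in for token sequences produced by the MoonViT-3D encoder + projector and the text tokenizer:

```python
import random

def build_joint_batch(vision_chunks, text_chunks, budget, vision_frac=0.10, seed=0):
    """Assemble a mixed pre-training batch under a fixed joint token budget.

    Illustrative sketch only: splits the budget into a fixed vision fraction
    (10% by default, per the K2.5 early-fusion recipe) and fills each share
    with whole chunks sampled without replacement.
    """
    rng = random.Random(seed)
    vision_budget = int(budget * vision_frac)   # e.g. 10% vision tokens
    text_budget = budget - vision_budget        # remaining 90% text tokens

    def take(chunks, limit):
        # Greedily pack whole chunks until the share of the budget is full.
        taken, used = [], 0
        for c in rng.sample(chunks, len(chunks)):
            if used + len(c) > limit:
                continue
            taken.append(c)
            used += len(c)
        return taken, used

    v, v_used = take(vision_chunks, vision_budget)
    t, t_used = take(text_chunks, text_budget)
    return v + t, v_used, t_used
```

In practice the interleaving order and chunk granularity would follow the model's packing strategy (patch-n-pack for vision); the point here is only the fixed token-ratio bookkeeping.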

Pre-training and Optimization

  • Joint Text–Vision Pre-training: Conducted on a large-scale multimodal corpus (≈15T mixed tokens). The loss combines text-only cross-entropy ($L_\text{text}$), vision captioning loss ($L_\text{vision}$), and an optional cross-modal alignment loss ($L_\text{cross}$, set to zero in K2.5).
  • Zero-Vision Supervised Fine-Tuning (SFT): Applies instruction fine-tuning on text-only data, in which visual tool use is learned implicitly from code traces (e.g., IPython, PIL) rather than from explicit image supervision. This “zero-vision” stage enables vision-grounded tool-code generation without overfitting to visual annotation distributions.
  • Joint Text–Vision Reinforcement Learning (RL): Employs token-level PPO-style RL to refine reasoning over mixed-modality tasks, with mixed reward signals for task outcome, brevity (via budget-control rewards), vision-specific criteria (F1, IoU, segmentation, edit distance), and open-ended scoring using rubric-based Generative Reward Models (GRMs). The Toggle mechanism alternates between concise and unconstrained reasoning by switching token budgets per task.
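The budget-control ("Toggle") shaping described above can be sketched as a simple reward term. This is an assumed form, not the paper's exact reward; `alpha` is a hypothetical penalty weight:

```python
def budget_reward(answer_tokens, budget, correct, alpha=0.1):
    """Token-budget shaping reward — illustrative sketch only.

    Rewards task success and penalizes overshooting the per-task token
    budget; the Toggle mechanism would switch `budget` between a tight
    value (concise mode) and a large one (unconstrained reasoning).
    """
    overshoot = max(0, answer_tokens - budget)
    return float(correct) - alpha * overshoot / max(budget, 1)
```

Under this form, a correct answer within budget scores 1.0, while overshooting the budget erodes the reward in proportion to the overshoot.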

2. Agentic Orchestration: The Agent Swarm Framework

Kimi K2.5 introduces Agent Swarm, a parallel-agent orchestration system leveraging Parallel-Agent Reinforcement Learning (PARL) to dynamically decompose tasks and assign them concurrently to frozen subagents (Team et al., 2 Feb 2026). The orchestrator chooses decompositions, spawns subagents (using “create_subagent”), assigns sub-problems (“assign_task” APIs), and aggregates results.

  • Reward Structure: The PARL reward comprises final solution accuracy ($r_\text{perf}$), a bonus for exploiting parallelism ($r_\text{parallel}$), and a reward for helpful task completion ($r_\text{finish}$), with hyperparameters $\lambda_1, \lambda_2$ gradually annealed to shift focus to overall accuracy.
  • Critical Steps Metric: Wall-clock efficiency is measured as the sum of the main sequence steps and maximal steps of all subagents per inference episode, encouraging the orchestrator to minimize total computational latency.
  • Inference Workflow: At runtime, the model decides whether to solve a task serially or to spawn specialized subagents for subtasks, returning an aggregated solution through hierarchical deliberation.
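The PARL reward and the Critical Steps metric above can be sketched as follows. The linear annealing schedule and the initial $\lambda$ values are assumptions for illustration; the source only states that $\lambda_1, \lambda_2$ are gradually annealed:

```python
def parl_reward(r_perf, r_parallel, r_finish, step, total_steps,
                lam1_init=0.5, lam2_init=0.5):
    """PARL reward sketch: r = r_perf + λ1·r_parallel + λ2·r_finish.

    λ1 and λ2 are annealed toward zero (linear schedule assumed here)
    so that late training focuses on final solution accuracy.
    """
    frac = 1.0 - step / max(total_steps, 1)
    lam1 = lam1_init * frac
    lam2 = lam2_init * frac
    return r_perf + lam1 * r_parallel + lam2 * r_finish

def critical_steps(main_steps, subagent_steps):
    """Wall-clock proxy: main-sequence steps plus the longest subagent
    trajectory, since subagents execute concurrently."""
    return main_steps + (max(subagent_steps) if subagent_steps else 0)
```

Minimizing `critical_steps` pushes the orchestrator to balance subagent workloads: one overloaded subagent dominates the metric even if the others finish early.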

3. Evaluation and Domain-Spanning Performance

Kimi K2.5 is evaluated on over 100 benchmarks spanning reasoning, coding, vision, video, agentic search, and complex computer-use environments. Evaluations directly compare single-agent and parallel-agent (Agent Swarm) modalities with competitive and state-of-the-art foundation models. Performance highlights include:

| Domain / Benchmark | K2.5 Score | SOTA or Baseline | Agent Swarm Impact |
|---|---|---|---|
| Reasoning (MMLU-Pro, GPQA) | 87.1%, 87.6% | GPT-5.2: 86.7%, 92.4% | |
| Coding (SWE-Bench Verified, LiveCodeBench) | 76.8%, 85.0% | Gemini 3 Pro: 76.2%, 87.4% | |
| Vision (MMMU-Pro, OCRBench, MathVision) | 78.5%, 92.3%, 84.2% | Outperforms Qwen3-VL, DeepSeek-V3.2 | |
| Video (VideoMMMU, LongVideoBench) | 86.6%, 79.8% | | |
| Agentic Search (BrowseComp, WideSearch) | Single: 60.6%, 72.7%; Swarm: 78.4%, 79.0% | GPT-5.2: 65.8% | Swarm achieves 3–4.5× speedup in wall-clock “Critical Steps” |
| Computer Use (OSWorld-Verified, WebArena) | 63.3%, 58.9% | Claude Opus 4.5: 66.3%, 63.4% | |

Agent Swarm confers significant parallelism-based wall-clock speedup (up to 4.5×) over the single-agent baseline while improving task-F1 in agentic search domains.

4. Model Release, Code Access, and Usage

Kimi K2.5 is released as a publicly available checkpoint (“moonshotai/Kimi-K2.5”) encompassing MoE LLM weights, the MoonViT-3D vision encoder and projector, and the trained Agent Swarm orchestration policy.

Quick start via HuggingFace-style interface:

from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

# Load the released checkpoint; the processor bundles the image and text tokenizers.
processor = AutoProcessor.from_pretrained("moonshotai/Kimi-K2.5")
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2.5", torch_dtype=torch.float16
).eval().to("cuda")

# Encode an image together with a question about it.
inputs = processor(
    images=Image.open("chart.png"),
    text="What is the percentage of category A?",
    return_tensors="pt",
).to("cuda")

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0], skip_special_tokens=True))

To invoke Agent Swarm at inference, use the orchestrator special token in the prompt.

5. CCT-K2.5: Neon Triple Point Comparison in ITS-90

In the domain of thermometry, "K2.5" refers to the CCT-K2.5 key comparison for the ITS-90 neon triple point (Pavese et al., 2017). The objective was to harmonize realized fixed points across laboratories by instituting a reference isotopic composition for neon.

  • Reference Isotopic Composition: ITS-90 now fixes $x_{22}(\text{Ne}) = 0.0925$, $x_{21}(\text{Ne}) = 0.0027$, $x_{20}(\text{Ne}) = 0.9048$ as the IUPAC standard.
  • Correction Formula: Temperature realizations are corrected via

\Delta T_\text{iso} = \sum_{k} A_k \left[ x_k - x_k^{\mathrm{ref}} \right]

with $A_k$ representing isotope-specific sensitivity coefficients.

  • CCT-K2.5 Results: The raw and isotopically corrected temperatures of two neon cells, realized at NMIJ-AIST and INRIM, demonstrate that after correction, between-sample deviations are cut by approximately 50%–70% and uncertainties are reduced by ≈60%.

| Cell | Lab | Uncorrected ΔT (mK) | Corrected ΔT (mK) | Corrected Uncertainty (mK) |
|---|---|---|---|---|
| Ne-5 | NMIJ-AIST | +0.600 | +0.606 | ≈0.27 |
| Ec2Ne | INRIM | –0.750 | –0.726 | ≈0.29 |
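The isotopic correction formula can be sketched in code. The sensitivity coefficients below are illustrative placeholders only; real values must be taken from the ITS-90 Technical Annex, not from this sketch:

```python
# ITS-90 reference isotopic composition of neon (mole fractions).
X_REF = {"20Ne": 0.9048, "21Ne": 0.0027, "22Ne": 0.0925}

def isotopic_correction_mK(x, x_ref, A):
    """ΔT_iso = Σ_k A_k (x_k − x_k^ref), returned in mK.

    `x` is the assayed composition of a given neon sample, `x_ref` the
    ITS-90 reference composition, and `A` the isotope-specific
    sensitivity coefficients (placeholder values in the example below).
    """
    return sum(A[k] * (x[k] - x_ref[k]) for k in x_ref)

# Hypothetical coefficients and sample assay, for illustration only:
A_DEMO = {"20Ne": 0.0, "21Ne": 0.0, "22Ne": 148.0}           # mK per mole fraction
sample = {"20Ne": 0.9038, "21Ne": 0.0027, "22Ne": 0.0935}    # assayed composition
delta_T = isotopic_correction_mK(sample, X_REF, A_DEMO)
```

The correction is then subtracted from (or added to, per the adopted sign convention) each laboratory's realized triple-point temperature before comparison.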

The results underscore the necessity of isotopic assay and correction for achieving microkelvin-level consistency in high-accuracy thermometry.

6. Implications and Context

Kimi K2.5 represents a leading-edge integration of multimodal pre-training, self-directed tool use, and efficient parallel agent orchestration, setting new standards in coding, reasoning, vision, and video comprehension benchmarks (Team et al., 2 Feb 2026). The CCT-K2.5 temperature key comparison marks a methodological advance for metrological temperature scale realization, with the isotopic correction paradigm now mandatory for reference-level interlaboratory consistency (Pavese et al., 2017).

Both applications of "K2.5"—one in artificial intelligence, the other in thermometric metrology—reflect the state of the art in systematic, quantitative, and reproducible measurement and inference.
