Graph-of-Thought: A New Reasoning Paradigm

Updated 5 December 2025
  • Graph-of-Thought (GoT) is a reasoning paradigm representing intermediate states as nodes in a DAG, allowing flexible branching and convergence of thought processes.
  • It employs generation, aggregation, and refinement methods to orchestrate parallel hypothesis creation, dynamic selection, and efficient solution synthesis.
  • Applications include multi-modal reasoning, sequential recommendation, and autonomous systems, demonstrating improved accuracy and reduced computational costs.

A Graph-of-Thought (GoT) is a reasoning paradigm in which the intermediate states of a complex problem-solving process, traditionally managed by LLMs or other AI systems, are represented as nodes in a directed graph, with edges encoding logical, deductive, or generative dependencies. Unlike classical Chain-of-Thought (CoT) or Tree-of-Thought (ToT) prompting—which use linear or tree-like topologies—GoT enables arbitrary directed acyclic graph (DAG) structures, allowing for dynamic branching, aggregation, reuse of partial computations, and richer composition of reasoning paths (Besta et al., 2023, Besta et al., 2024).

1. Formal Definition and Structural Principles

A GoT is formally defined as a directed graph G = (V, E), where:

  • V is a set of “thought” nodes; each v ∈ V encodes a partial solution, subgoal, or intermediate state, typically in natural language or high-dimensional features.
  • E ⊂ V × V is a set of directed edges; an edge (u → v) signifies that v was generated from u by a reasoning step or transformation.

Key abstractions and primitives include:

  • Generation (T_gen): From a node v, generate k continuations as child nodes. Each continuation may represent a distinct hypothesis, method, or subproblem (Besta et al., 2023, Long et al., 2024).
  • Aggregation (T_agg): Merge multiple nodes—often representing alternative solutions or sub-aspects—into a single node that synthesizes and resolves the information (Ning et al., 2024, Long et al., 2024).
  • Refinement/Improvement (T_imp): Iteratively enhance a candidate node, yielding a higher-scoring or more consistent state.

Nodes may hold scores σ(v) or richer quality metrics, evaluated by a dedicated function E(v, G, p_θ), and selection for expansion or output is typically managed by a ranking function R(G, p_θ, h) (Ning et al., 2024).

The GoT structure supports arbitrary in- and out-degree at nodes—enabling both “branching” (parallel hypothesis generation) and “merging” (reuse and aggregation of convergent subgraphs), which is central to achieving non-linear, human-like reasoning (Besta et al., 2024, Besta et al., 2023).
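The definitions above can be made concrete with a minimal Python sketch; the class and attribute names here are our own illustration, not from the cited papers. It represents thoughts as nodes and records derivation edges; note how the aggregation step gives one node an in-degree greater than one, which is exactly what chains and trees cannot express:

```python
from dataclasses import dataclass

@dataclass
class Thought:
    """A node v in V: a partial solution with an optional quality score sigma(v)."""
    content: str
    score: float = 0.0

class GraphOfThoughts:
    """A DAG G = (V, E); an edge (u -> v) records that v was derived from u."""
    def __init__(self):
        self.nodes = []        # list of Thought objects, indexed by insertion order
        self.parents = {}      # node index -> list of parent indices (in-edges)

    def add(self, thought, parents=()):
        """Insert a thought derived from the given parent nodes; return its index."""
        idx = len(self.nodes)
        self.nodes.append(thought)
        self.parents[idx] = list(parents)
        return idx

got = GraphOfThoughts()
root = got.add(Thought("task prompt"))
a = got.add(Thought("hypothesis A"), [root])              # generation: branching
b = got.add(Thought("hypothesis B"), [root])
merged = got.add(Thought("synthesis of A and B"), [a, b]) # aggregation: in-degree 2
```

The `merged` node has two parents, so this graph is a DAG rather than a tree.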

2. Execution Pipeline and Algorithmic Frameworks

The GoT pipeline generally involves:

  1. Initialization: The reasoning process is seeded with a root node reflecting the task prompt or input context.
  2. Expansion:
    • At each expansion step, active nodes are selected for the application of transformations (generate, aggregate, improve).
    • The chosen node(s) serve as prompt context for LLM invocations or other modules to create child nodes or merged nodes.
  3. Scoring and Evaluation:
    • Each candidate node is assigned a score using evaluators (e.g., ROUGE for summaries, logical checkers for math proofs).
    • Optionally, aggregation functions combine inputs from multiple parent nodes.
  4. Selection and Pruning:
    • High-scoring nodes are retained as active for further expansion; low-scoring nodes are pruned from the frontier (or kept dormant for later exploration).
  5. Termination:
    • The process completes when a node satisfies solution criteria or resource constraints are reached (Besta et al., 2023, Li, 2024).

The process can be static—with a predetermined topology—or dynamic, where expansion and pruning decisions adapt on-the-fly based on node quality and utility (Ning et al., 2024, Pandey et al., 7 Feb 2025).

Pseudocode Schema (Generic GoT Expansion)

Initialize G with root node v0
while not termination:
    next_active = []
    for v in active_nodes:
        for t in transformations:
            for v_prime in T_t(v):
                add v_prime (and edge v -> v_prime) to G
                score = E(v_prime)
                if score >= threshold:
                    add v_prime to next_active    # retain for further expansion
                else:
                    prune v_prime (or keep dormant in G)
    active_nodes = next_active
select final node(s) via R(G, p_theta, h)
(Besta et al., 2023, Ning et al., 2024, Besta et al., 2024)
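The schema above can be turned into a small runnable sketch. The generator, evaluator, threshold value, and toy string-building task below are illustrative assumptions, not from any published implementation; in practice `generate` would be an LLM invocation and `evaluate` a learned or heuristic scorer:

```python
import heapq

def got_search(root, generate, evaluate, threshold=0.5, max_steps=20, top_k=1):
    """Generic GoT expansion: expand active nodes, score candidates, keep those
    above threshold on the frontier, and return the best-scoring node(s).
    `generate(state)` returns child states; `evaluate(state)` returns a score."""
    graph = {root: []}                    # adjacency: state -> children
    active = [root]
    best = [(evaluate(root), root)]
    for _ in range(max_steps):
        if not active:
            break
        next_active = []
        for v in active:
            for child in generate(v):
                graph[v].append(child)    # record edge (v -> child)
                graph.setdefault(child, [])
                s = evaluate(child)
                heapq.heappush(best, (s, child))
                if s >= threshold:        # retain for further expansion
                    next_active.append(child)
                # else: pruned from the frontier but kept in the graph
        active = next_active
    return heapq.nlargest(top_k, best)    # ranking function R

# Toy task: build the string "aaaa" one character at a time.
target = "aaaa"
result = got_search(
    "",
    generate=lambda v: [v + "a", v + "b"] if len(v) < 4 else [],
    evaluate=lambda v: sum(c == t for c, t in zip(v, target)) / len(target),
    threshold=0.2,
)
print(result[0])  # (1.0, 'aaaa')
```

The threshold here is static; the adaptive variants discussed below replace it with a data-dependent cutoff.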

3. Comparison to Chains, Trees, and Dynamic Extensions

GoT generalizes prior structure-enhanced reasoning paradigms:

  • Chain-of-Thought (CoT): A path graph; limited to single-sequence reasoning, no reuse.
  • Tree-of-Thought (ToT): k-ary tree; supports branching but not aggregation of distinct branches.
  • Graph-of-Thought (GoT): Arbitrary DAG; allows both branching and convergence/aggregation, enabling dynamic programming-style reuse and explicit parallelism (Besta et al., 2024).
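The structural distinction can be checked mechanically: in a chain or a tree every node has in-degree at most one, while a GoT DAG can have a node with in-degree above one (an aggregation point). A small sketch, with hypothetical edge lists:

```python
from collections import Counter

def max_in_degree(edges):
    """In-degree > 1 at some node means distinct branches reconverge (aggregation)."""
    return max(Counter(v for _, v in edges).values())

chain = [("s0", "s1"), ("s1", "s2")]                  # CoT: a path
tree  = [("root", "a"), ("root", "b"), ("a", "a1")]   # ToT: branching only
dag   = tree + [("a", "m"), ("b", "m")]               # GoT: "m" aggregates a and b

assert max_in_degree(chain) == 1
assert max_in_degree(tree) == 1
assert max_in_degree(dag) == 2   # only the DAG merges distinct branches
```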

Adaptive forms such as Dynamic GoT (DGoT) or Adaptive GoT (AGoT) further prune or expand the GoT at inference based on empirical node quality, reducing useless computation and allocating effort adaptively (e.g., using Gumbel-derived or mean-score thresholds) (Ning et al., 2024, Pandey et al., 7 Feb 2025).
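A mean-score cutoff of the kind these adaptive variants employ can be sketched as follows; the function name and `margin` parameter are illustrative stand-ins for the Gumbel-derived and mean-score thresholds of DGoT/AGoT, not their published formulas:

```python
import statistics

def adaptive_prune(candidates, scores, margin=0.0):
    """Keep candidates whose score clears a data-dependent threshold: here,
    the mean of the current round's scores plus an optional margin."""
    threshold = statistics.mean(scores) + margin
    return [c for c, s in zip(candidates, scores) if s >= threshold]

survivors = adaptive_prune(["h1", "h2", "h3", "h4"], [0.9, 0.4, 0.7, 0.2])
print(survivors)  # ['h1', 'h3'] — the two hypotheses above the mean (0.55)
```

Because the threshold tracks the empirical score distribution, easy inputs terminate early and hard inputs receive more expansion budget.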

4. Domain-Specific Instantiations and Extensions

GoT methodology underpins a variety of advanced AI workflows beyond text reasoning:

  • Scientific Abstract Generation: DGoT dynamically prunes expansion when candidate summaries meet quality thresholds, achieving >43% reduction in LLM cost while maintaining or improving ROUGE scores versus static multi-round query prompting (Ning et al., 2024).
  • Sequential Recommendation: GOT4Rec decomposes recommendation into reasoning about short/long-term interests and collaborative influences, explicitly modeled as separate subgraphs, with aggregation yielding the final recommendations. Experiments yield up to 67% increased recall over baselines (Long et al., 2024).
  • Multi-modal Reasoning: In Aggregation-GoT for prompt learning, each step comprises a small subgraph with multiple “views,” aggregated via a learned weighting scheme and a flow controller, increasing generalization for image-text retrieval and VQA tasks (Yang et al., 2024).
  • Multi-agent Cooperation: Composable GoT (CGoT) for vehicle-robot systems merges agent-local GoTs into joint plans, enabling dynamic combination, cooperation, and division of labor in embodied service systems (Nie et al., 25 Oct 2025).
  • Graph Learning: GCoT realizes GoT as a sequence of stateful prompt-based updates on graph-structured data, achieving 6–11% improvement over prompt learning baselines in few-shot classification (Yu et al., 12 Feb 2025).
  • Reasoning over Visual Data: GoT-CQA encodes chart question answering as a DAG of operator nodes (localization, numeric, logical), fusing topological information with text and image features for improved compositional reasoning (Zhang et al., 2024).
  • Autonomous Driving: V2V-GoT structures cooperative perception, prediction, and planning as a DAG of interdependent sub-tasks fused via a multimodal LLM, reducing planning error by >2x compared to non-graph multimodal baselines (Chiu et al., 22 Sep 2025).

5. Empirical and Theoretical Impact

Empirical studies consistently demonstrate GoT’s advantages over linear and tree-structured reasoning:

| Task Domain | Improvement over ToT/CoT | Reference |
| --- | --- | --- |
| Sorting (n=128) | +62% accuracy, –31% cost (vs ToT) | Besta et al., 2023 |
| Logical reasoning (24-point game) | 97% acc. (GoT, n=5) vs 74% (ToT) | Lei et al., 2023 |
| Abstract generation (DGoT) | –44% cost vs static GoT, higher ROUGE | Ning et al., 2024 |
| Recommendation (Food domain) | +67.5% recall vs best baseline | Long et al., 2024 |
| Multi-hop reasoning, retrieval | +30% accuracy, +22% EM/F1 | Pandey et al., 7 Feb 2025 |
| Multi-modal prompting | +2.7% R@1, +0.7% generalization | Yang et al., 2024 |

Theoretically, GoT provides strictly greater expressive power through:

  • Nonlinear reuse of partial solutions
  • Hybridization of dynamic programming, best-first search, and memory-efficient expansion
  • Superior trade-offs between search depth/latency and computation (volume), achieving asymptotic improvements in high-complexity domains (Besta et al., 2024, Besta et al., 2023).
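The first point, nonlinear reuse of partial solutions, can be illustrated with a sorting decomposition in the spirit of the GoT sorting experiments: a shared sub-thought is solved once and its cached result is aggregated into every reasoning path that needs it. This memoized sketch is our own illustration, not the cited papers' implementation:

```python
from functools import lru_cache

calls = []  # record which subproblems are actually solved

@lru_cache(maxsize=None)
def solve(seq):
    """Sort `seq` by splitting, solving halves as sub-thoughts, and merging.
    The cache plays the role of GoT's reuse of convergent subgraphs."""
    calls.append(seq)
    if len(seq) <= 1:
        return seq
    mid = len(seq) // 2
    left, right = solve(seq[:mid]), solve(seq[mid:])
    merged, i, j = [], 0, 0          # aggregation step: merge two sub-solutions
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return tuple(merged) + left[i:] + right[j:]

print(solve((3, 1, 3, 1)))   # (1, 1, 3, 3)
print(calls.count((3, 1)))   # 1 — the shared sub-thought (3, 1) is solved once
```

A tree-structured search would solve the duplicated half twice; the DAG structure is what licenses the single shared computation.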

6. Best Practices, Limitations, and Research Directions

Design Considerations and Implementation:

  • Represent the GoT graph in a compact form (e.g., JSON or sets of triples) to manage context window limitations.
  • Tune branch and aggregation factors to balance diversity, cost, and depth.
  • Use lightweight evaluation/pruning to maximize cost effectiveness—dynamic thresholding via empirical score distributions or Gumbel models is effective in practice (Ning et al., 2024).
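A triple-based JSON encoding of the kind suggested in the first bullet might look like the following; the schema (short node ids, `[source, relation, target]` edge triples) is an illustrative assumption:

```python
import json

# Compact serialization of a GoT graph for re-injection into a prompt:
# node text is stored once under a short id, and edges refer to ids only.
graph = {
    "nodes": {"n0": "task prompt", "n1": "hypothesis A",
              "n2": "hypothesis B", "n3": "merged answer"},
    "edges": [["n0", "gen", "n1"], ["n0", "gen", "n2"],
              ["n1", "agg", "n3"], ["n2", "agg", "n3"]],
}
compact = json.dumps(graph, separators=(",", ":"))  # no whitespace padding
print(compact)
```

Dropping whitespace with `separators` and keeping node text un-duplicated keeps the serialized state small relative to the context window.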

Limitations:

Future Directions:

7. Applications Across Modalities and Research Frontiers

GoT frameworks have been instantiated in business workflow engines (GoTFlow), scientific text summarization (DGoT), sequential and multi-modal recommendation (GOT4Rec, AGoT), few-shot graph learning (GCoT), chart question answering (GoT-CQA), and cognitive robotics (CGoT, V2V-GoT) (Li, 2024, Ning et al., 2024, Long et al., 2024, Yu et al., 12 Feb 2025, Zhang et al., 2024, Nie et al., 25 Oct 2025, Chiu et al., 22 Sep 2025).

They enable the explicit modeling and manipulation of complex, multi-threaded reasoning, facilitate interpretability via explicit graph states, and have demonstrated consistent improvement over state-of-the-art linear or tree-structured approaches in both output quality and computational cost.


References:

(Besta et al., 2023, Ning et al., 2024, Pandey et al., 7 Feb 2025, Li, 2024, Besta et al., 2024, Lei et al., 2023, Long et al., 2024, Yu et al., 12 Feb 2025, Nie et al., 25 Oct 2025, Chiu et al., 22 Sep 2025, Yang et al., 2024, Zhang et al., 2024)
