LogicGraph Perturbation Protocol
- LogicGraph Perturbation Protocol is a structured framework that formalizes reasoning chains as graphs to inject controlled, plausible textual hallucinations.
- It leverages typed graph representations and probability-weighted perturbation operators to quantify error propagation and self-correction in multimodal inference.
- Empirical evaluations, including the use of Active Visual-Context Refinement, demonstrate reduced hallucination persistence and improved model accuracy.
The LogicGraph Perturbation Protocol (LPP) is a systematic framework for injecting high-plausibility textual hallucinations into the chain-of-thought reasoning of large multimodal models (LMMs), enabling quantitative analysis of their capacity for self-correction under cross-modal conflicts. Leveraging a structured, typed graph representation—termed "LogicGraph"—LPP formalizes reasoning chains at the granularity of entities, relations, and attributes, and applies precise, probability-weighted perturbations to probe the robustness and flexibility of multimodal inference. This approach establishes new benchmarks for consistency analysis in multimodal video reasoning, focusing on the phenomenon of “textual inertia,” wherein models persist in erroneous textual trajectories even when discordant with visual evidence (Zhu et al., 7 Jan 2026).
1. LogicGraph: Structured Representation of Reasoning Chains
LPP operationalizes reasoning as a directed, typed graph $G = (V, E, \tau, \rho, \phi_t, \phi_v)$, where:
- $V$ denotes nodes corresponding to entities, relations, and attributes in each reasoning step $s_i$.
- $E \subseteq V \times V$ is the edge set, capturing intra-step (entity-attribute/relation) and inter-step (sequence-preserving) dependencies.
- $\tau: V \to \{\text{entity}, \text{relation}, \text{attribute}\}$ types each node.
- $\rho$ labels edges by semantic role, e.g., "has-attribute", "precedes".
- $\phi_t$ and $\phi_v$ map nodes and edges to their respective textual and visual embedding spaces (e.g., BERT and pooled frame features).
Parsing with GPT-4o isolates logical atoms per step, facilitating fine-grained manipulation and individual annotation. This explicit segregation amplifies the precision of subsequent perturbations and allows longitudinal tracing of reasoning inertia across steps.
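The typed-graph representation above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation; the class and field names (`Node`, `LogicGraph`, `atoms_at`) are our own, and the toy two-step chain is invented for demonstration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    step: int   # reasoning-step index
    kind: str   # node type (tau): "entity" | "relation" | "attribute"
    text: str   # surface form of the logical atom

@dataclass
class LogicGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, semantic-role label)

    def add_node(self, step, kind, text):
        self.nodes.append(Node(step, kind, text))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, label):
        # label plays the role of rho, e.g. "has-attribute", "precedes"
        self.edges.append((src, dst, label))

    def atoms_at(self, step):
        """Indices of logical atoms belonging to one reasoning step."""
        return [i for i, n in enumerate(self.nodes) if n.step == step]

# Toy two-step chain: step 1 "the person holds a cup", step 2 "the cup is red".
g = LogicGraph()
person = g.add_node(1, "entity", "person")
cup    = g.add_node(1, "entity", "cup")
holds  = g.add_node(1, "relation", "holds")
red    = g.add_node(2, "attribute", "red")
g.add_edge(person, holds, "arg")          # intra-step dependency
g.add_edge(holds, cup, "arg")             # intra-step dependency
g.add_edge(cup, red, "has-attribute")     # inter-step dependency
```

Isolating atoms per step in this way is what makes step- and type-targeted perturbation straightforward: a perturbation only needs a node index.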
2. Perturbation Operators: Mathematical Framework
Controlled hallucinations are injected by replacing a logical atom $a \in V$ at step $s_i$:
- Candidate Generation: GPT-4o produces a candidate set $C(a) = \{a'_1, \dots, a'_K\}$, each visually incorrect yet linguistically plausible.
- Probability-Weighted Selection: For multimodal model $M$, each candidate is scored by:
  - $s_{\text{loc}}(a')$: average log-probability of the tokens of $a'$ given the preceding history $h_{<i}$.
  - $s_{\text{ctx}}(a')$: average log-probability of the step sentence with $a'$ substituted, given $h_{<i}$.
- The selected perturbation is $a^* = \arg\max_{a' \in C(a)} \left[ s_{\text{loc}}(a') + s_{\text{ctx}}(a') \right]$.
- The textual-perturbation operator is $\mathcal{T}_{a \to a^*}(G)$, which substitutes $a^*$ for $a$ and re-serializes the affected step.
- Selection can be made probabilistic through a temperature-controlled softmax, $p(a') \propto \exp\!\left(s(a')/T\right)$, where $T \to 0$ enforces deterministic (argmax) selection.
Visual-only perturbation, while not central in the reference work, can be realized by swapping the visual embeddings $\phi_v(\cdot)$ with mismatched frame-derived features.
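The probability-weighted selection step can be illustrated as a temperature-controlled softmax over candidate plausibility scores. A minimal sketch, assuming `scores` already holds each candidate's combined average log-probability (local plus contextual); the function name and the argmax-at-zero-temperature convention follow the description above, not released code.

```python
import math
import random

def select_perturbation(scores, temperature=1.0):
    """Pick a hallucinated candidate via softmax over plausibility scores.

    scores: dict mapping candidate atom -> combined average log-probability.
    temperature -> 0 recovers deterministic argmax selection.
    """
    if temperature <= 1e-8:
        return max(scores, key=scores.get)
    # Numerically stable softmax: shift by the max score before exponentiating.
    m = max(scores.values())
    weights = {a: math.exp((s - m) / temperature) for a, s in scores.items()}
    z = sum(weights.values())
    r = random.random() * z
    acc = 0.0
    for cand, w in weights.items():
        acc += w
        if r <= acc:
            return cand
    return cand  # guard against floating-point underflow at the boundary
```

At `temperature=0` the most plausible (and hence hardest-to-detect) hallucination is always injected, matching deterministic selection; higher temperatures diversify the injected errors across runs.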
3. Pipeline: Algorithmic Overview
The LPP evaluation pipeline is formalized as follows:
- Graph Construction: Chain-of-thought (CoT) reasoning text is filtered and segmented; each step is parsed, producing entity, relation, and attribute nodes, with intra- and inter-step edges added.
- Perturbation Selection: For various steps and logical atom types, nodes are identified for perturbation. GPT-4o supplies candidates, against which the local and contextual plausibility scores are computed and the highest-scoring candidate is selected for injection.
- Evaluation: Perturbed graphs are serialized back to text. For each, sampled continuations are drawn from the evaluated model. Each continuation yields a final answer and is classified behaviorally: contamination (0), passive reflection (1), explicit reflection/self-correction (2), collapse (3).
- Aggregation: Majority votes and metric computation complete the analysis cycle.
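The sampling, classification, and majority-vote steps above can be sketched as a single evaluation loop. This is an illustrative harness, not the paper's code: `model_sample` and `classify_behavior` are stand-ins for the model call and the behavior scorer, which the protocol treats as external components.

```python
from collections import Counter

BEHAVIORS = {0: "contamination", 1: "passive_reflection",
             2: "explicit_reflection", 3: "collapse"}

def evaluate_sample(perturbed_text, model_sample, classify_behavior, n_samples=3):
    """Draw continuations from the model, classify each, aggregate by majority vote.

    model_sample(text) -> (continuation, final_answer)
    classify_behavior(continuation) -> behavior label in {0, 1, 2, 3}
    """
    labels, answers = [], []
    for _ in range(n_samples):
        continuation, answer = model_sample(perturbed_text)
        labels.append(classify_behavior(continuation))
        answers.append(answer)
    majority_label = Counter(labels).most_common(1)[0][0]
    majority_answer = Counter(answers).most_common(1)[0][0]
    return majority_label, majority_answer
```

With `n_samples=3` this mirrors the pass@3 sampling described in Section 5; the per-sample labels feed directly into the metrics of Section 4.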
4. Quantitative Evaluation and Metrics
Outcomes are quantitatively analyzed using a suite of metrics:
- Accuracy: $\text{Acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[\hat{y}_i = y_i]$, task correctness post-perturbation.
- Behavior Rates: $R_b = N_b / N$, where $b \in \{0, 1, 2, 3\}$ indexes contamination, passive reflection, explicit self-correction, and collapse.
- Self-Correction Rate: $\text{SCR} = R_2$.
- Error Propagation: $\text{EP} = R_0$.
- Hallucination Amplification: $\text{HA} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[a^* \in \hat{c}_i]$, the fraction of sampled continuations $\hat{c}_i$ that restate the injected atom $a^*$, quantifies persistence of injected hallucinations.
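Given the per-sample behavior labels produced by the evaluation stage, the rate-based metrics reduce to simple counting. A minimal sketch, assuming labels follow the 0-3 scheme defined in the pipeline (contamination, passive reflection, explicit self-correction, collapse); the dictionary key names are our own.

```python
def compute_metrics(labels):
    """Behavior rates and derived metrics over per-sample behavior labels.

    labels: list of ints in {0, 1, 2, 3}, one per evaluated sample.
    Self-correction rate is the explicit-reflection rate (class 2);
    error propagation is the contamination rate (class 0).
    """
    n = len(labels)
    rates = {b: labels.count(b) / n for b in range(4)}
    return {
        "behavior_rates": rates,
        "self_correction_rate": rates[2],
        "error_propagation": rates[0],
    }
```

Accuracy and hallucination amplification require the final answers and continuation texts respectively, so they are computed separately from this label-only pass.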
5. Experimental Parameters and Protocols
LPP experimentation employs the STAR dataset reformulated for open-ended QA, with a curated subset of 100 samples (50 feasibility, 50 prediction) and a frame rate of 5 fps. Models evaluated include native-reasoning (Keye-preview-8B, Keye-1.5-8B, LongVILA-7B) and prompt-driven (InternVL3-8B, Qwen2.5-VL-7B) architectures. Generation uses pass@3 sampling, temperature 0.7, and a maximal CoT extension of 256 tokens. Perturbations target the first three reasoning steps and all atom types, with candidate generation and behavior scoring strictly following the protocol specifications.
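The experimental parameters above can be collected into a single configuration object, a convenient form for reproduction scripts. The key names are our own; the values restate Section 5 verbatim.

```python
# Experimental configuration for LPP, as reported in Section 5.
LPP_CONFIG = {
    "dataset": "STAR (reformulated for open-ended QA)",
    "n_samples": 100,                  # 50 feasibility + 50 prediction
    "fps": 5,
    "models_native": ["Keye-preview-8B", "Keye-1.5-8B", "LongVILA-7B"],
    "models_prompted": ["InternVL3-8B", "Qwen2.5-VL-7B"],
    "sampling": {
        "pass_at": 3,                  # pass@3 sampling
        "temperature": 0.7,
        "max_cot_tokens": 256,         # maximal CoT extension
    },
    "perturbed_steps": [1, 2, 3],      # first three reasoning steps
    "atom_types": ["entity", "relation", "attribute"],
}
```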
6. Analysis of Findings: Reflection, Propagation, and Mitigation
Empirical results show explicit reflection/self-correction rates universally below 10% for baseline LMMs, with error propagation exceeding 60% on entity perturbations in the initial reasoning step. Passive reflection constitutes roughly 20–30%. Error propagation abates modestly when later steps are perturbed, yielding marginal improvements in both accuracy and reflection rates.
Ablation results indicate that decreasing the hallucination token count reduces contamination and elevates passive reflection, with negligible effect on explicit correction. Notably, Active Visual-Context Refinement (AVCR), a training-free scheme incorporating an uncertainty-driven frame check (<check>) and reasoning-history denoising (<fold>), substantially increases explicit reflection rates (from 5% to 29% for Keye-preview-8B and from 1% to 31% for Qwen2.5-VL-7B), while reducing error propagation and boosting accuracy. Component ablation reveals that both <check> and <fold> are instrumental for optimal self-correction: removing the visual check degrades explicit reflection to ~4%, and removing the denoising degrades it to ~22%.
7. Significance and Prospects
LPP establishes a rigorous paradigm to diagnose and quantify “textual inertia” in LMM reasoning, enabling comparative assessment across architectures and prompting strategies. The consistently low rates of self-correction documented suggest robustness deficits in current LMMs when faced with plausibility-optimized, cross-modal reasoning perturbations. The efficacy of AVCR in stifling hallucination propagation and amplifying self-reflection underscores the impact of inference-time, visually-grounded verification. A plausible implication is that systematic graph-based interrogation and multimodal context strategies may be essential for developing future LMMs with resilient reasoning trajectories and reliable cross-modal alignment (Zhu et al., 7 Jan 2026).