
RAG Perturbation-Temperature Analysis

Updated 8 December 2025
  • The paper presents a dual-domain framework that quantifies how input perturbations and temperature changes degrade output quality in both NLP and quantum/statistical settings.
  • The methodology leverages semantic similarity, partition function zeros, and coefficient of variation to diagnose system fragility under controlled noise and temperature variations.
  • Practical guidelines advise capping temperature parameters and optimizing retrieval strategies to mitigate performance degradation and improve system robustness.

The RAG Perturbation-Temperature Analysis Framework refers to a class of methodologies, used both in retrieval-augmented generation (RAG) systems in NLP and in quantum/statistical physics, that systematically analyze how model or system output quality degrades under the joint influence of external perturbations in the input and variations in "temperature", a parameter governing sampling stochasticity or the statistical ensemble. Across both domains, these frameworks provide analytic, diagnostic, and practical tools for quantifying, interpreting, and controlling the fragility and convergence properties of complex systems subject to noise and sampling fluctuations.

1. Conceptual Foundations

RAG frameworks are developed to rigorously assess the robustness of systems where outputs result from the interaction between perturbed inputs (e.g., noisy retrieved text, parameter variation, environmental noise) and internal randomness regulated by a temperature-like parameter. In NLP, particularly retrieval-augmented generation, this involves analyzing LLMs’ sensitivity to both errors in retrieved context and variation in generation temperature (Zhou et al., 1 Dec 2025). In quantum/statistical mechanics, “RAG” refers to the analytic structure of perturbation series expansion at finite temperature, where thermodynamic observables may diverge or become ill-defined if singularities in the partition function approach the expansion point (Sun et al., 2022).

2. Methodological Structure

Retrieval-Augmented Generation (RAG) Systems

The RAG Perturbation-Temperature Analysis Framework for RAG systems comprises the following pipeline (Zhou et al., 1 Dec 2025):

$C_0 \xrightarrow{P_i} C_i \xrightarrow{\text{prompt}+T} R_T(C_i) \xrightarrow{\text{eval}} M(R_T(C_i), A^*)$

  • Start with gold supporting sentences $C_0 = \{s_j\}_{j=1}^m$.
  • Apply one of three document-level perturbation operators $P_i$ to obtain $C_i$, modeling noisy retrieval.
  • For each temperature $T$ (governing softmax sampling in the LLM), prompt the model with $\langle Q, C_i \rangle$, generate $R_T(C_i)$, and evaluate against the processed reference answer $A^*$.
  • Repeat for multiple runs ($R = 3$) to estimate the mean and variance of semantic similarity metrics.
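
The pipeline above can be sketched as a small evaluation routine. This is a minimal illustration, not the paper's code: `generate` and `score` are caller-supplied stand-ins for the temperature-controlled LLM call and the BERTScore-F1 metric.

```python
import statistics

def evaluate_cell(generate, score, question, context, temperature, reference, runs=3):
    """Run the model `runs` times at a fixed temperature on a (possibly
    perturbed) context and summarize semantic similarity to the reference.

    `generate(question, context, temperature)` and `score(response, reference)`
    are hypothetical callables standing in for the LLM and the metric.
    """
    scores = [score(generate(question, context, temperature), reference)
              for _ in range(runs)]
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cv = std / mean if mean else float("inf")  # coefficient of variation
    return {"mean": mean, "std": std, "cv": cv}
```

Repeating this over a grid of perturbation operators and temperatures yields the mean/variance surface the framework analyzes.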

Quantum/Statistical Physics

The “RAG” analysis prescribes the following canonical steps (Sun et al., 2022):

  1. Define a family of Hamiltonians $H(\lambda) = H^{(0)} + \lambda H^{(1)}$ and analytically continue in complex $\lambda$.
  2. Compute the partition function $Z(\lambda, T)$ and identify its complex zeros $\{\lambda_i(T)\}$.
  3. Determine thermodynamic observables such as internal energy and free energy, whose singularity structure is set by the zero locus of $Z(\lambda, T)$.
  4. The radius of convergence of perturbation theory is $R(T) = \min_i |\lambda_i(T)|$; its behavior is temperature-dependent.
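
These steps can be worked end to end for a toy two-level system (my own illustrative example, not the model of Sun et al.). For $H(\lambda) = \sigma_z + \lambda\,\sigma_x$ the eigenvalues are $\pm\sqrt{1+\lambda^2}$, so $Z(\lambda, T) = 2\cosh(\sqrt{1+\lambda^2}/T)$, whose zeros sit at $\lambda = \pm i\sqrt{1 + \pi^2 T^2 (k+\tfrac12)^2}$; the closest pair fixes $R(T)$:

```python
import math

def convergence_radius(T, kmax=50):
    """R(T) for the toy Hamiltonian H(lambda) = sigma_z + lambda * sigma_x.

    Z(lambda, T) = 2 cosh(sqrt(1 + lambda^2) / T) vanishes where
    sqrt(1 + lambda^2) = i*pi*T*(k + 1/2), so the zero moduli are
    sqrt(1 + (pi*T*(k + 1/2))**2) and the minimum over k gives R(T).
    """
    zero_moduli = [math.sqrt(1.0 + (math.pi * T * (k + 0.5)) ** 2)
                   for k in range(kmax)]
    return min(zero_moduli)
```

Note that $R(T)$ grows with $T$ in this example and tends to 1 as $T \to 0$ (the zeros approach the exceptional points $\lambda = \pm i$), matching the qualitative findings below.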

3. Perturbation and Temperature Modeling

Input Perturbations

In RAG-NLP, three key perturbation operators model realistic retrieval errors (Zhou et al., 1 Dec 2025):

  • $P_1$ (Sentence Replacement): Replace the latter half of selected sentences with irrelevant content from the same title.
  • $P_2$ (Sentence Removal): Delete the latter half of selected sentences.
  • $P_3$ (NER Replacement): Replace named entities in the latter half of selected sentences with “[MASK]”.

The perturbation set is designed to simulate realistic corruption, deletion, or misalignment of source evidence that occurs in practice.
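
A minimal sketch of the three operators, assuming a word-count midpoint split and a caller-supplied distractor and entity list (the paper's actual implementation details, e.g. its NER pipeline, are not reproduced here):

```python
def latter_half(sentence):
    """Split a sentence at its midpoint by word count."""
    words = sentence.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])

def p1_replace(sentence, distractor):
    """P1: replace the latter half with irrelevant content
    (drawn from the same title in the paper; caller-supplied here)."""
    head, _ = latter_half(sentence)
    return f"{head} {distractor}".strip()

def p2_remove(sentence):
    """P2: delete the latter half of the sentence."""
    head, _ = latter_half(sentence)
    return head

def p3_ner_mask(sentence, entities):
    """P3: mask named entities in the latter half with "[MASK]".
    `entities` is caller-supplied; a real pipeline would run NER."""
    head, tail = latter_half(sentence)
    for ent in entities:
        tail = tail.replace(ent, "[MASK]")
    return f"{head} {tail}".strip()
```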

Temperature Parameterization

  • In NLP, temperature $T$ modulates the stochasticity of the LLM’s sampling distribution:

$p(v_k \mid T) = \dfrac{\exp(l_k / T)}{\sum_i \exp(l_i / T)}$

  • In physics, temperature controls the statistical weight assigned to each energy eigenstate in the canonical partition function, directly impacting the analytic continuation and convergence of series expansions.
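
The softmax formula above is straightforward to implement; this sketch uses the standard max-subtraction trick for numerical stability:

```python
import math

def temperature_softmax(logits, T):
    """p(v_k | T) = exp(l_k / T) / sum_i exp(l_i / T).

    Subtracting the max scaled logit before exponentiating prevents
    overflow without changing the resulting distribution.
    """
    scaled = [l / T for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

As $T \to 0$ the distribution concentrates on the argmax token; as $T$ grows it flattens toward uniform, which is precisely the stochasticity the framework stresses.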

4. Quantitative Metrics and Analytic Outputs

Retrieval-Augmented Generation (NLP)

Key metrics for analyzing temperature–perturbation interactions include (Zhou et al., 1 Dec 2025):

  • BERTScore-F1: Semantic similarity between model output and gold answer.
  • Absolute degradation: $\Delta M(T, P_i) = M(T, P_0) - M(T, P_i)$.
  • Relative drop (%): $D(T, P_i) = \dfrac{\Delta M(T, P_i)}{M(T, P_0)} \times 100\%$.
  • Coefficient of Variation (CV): Measures stochastic sensitivity across repeated generations.
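
Given per-run score lists for the clean ($P_0$) and perturbed ($P_i$) conditions at a fixed $T$, the three diagnostics can be computed directly (an illustrative sketch; function and key names are mine):

```python
import statistics

def degradation_metrics(clean_scores, perturbed_scores):
    """Compute absolute degradation Delta M, relative drop D (%), and the
    coefficient of variation of the perturbed runs, from repeated-run
    similarity scores (e.g. BERTScore-F1)."""
    m_clean = statistics.mean(clean_scores)
    m_pert = statistics.mean(perturbed_scores)
    delta = m_clean - m_pert                    # Delta M(T, P_i)
    rel_drop = 100.0 * delta / m_clean          # D(T, P_i) in percent
    cv = statistics.pstdev(perturbed_scores) / m_pert
    return {"delta": delta, "relative_drop_pct": rel_drop, "cv": cv}
```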

Finite-Temperature Perturbation Theory (Physics)

  • Radius of convergence: $R(T) = \min_i |\lambda_i(T)|$; the expansion is valid for $|\lambda| < R(T)$.
  • Classification of singularities: Determines non-analyticities in observables, such as poles and branch points in energy and free energy.

5. Experimental Realizations and Principal Findings

Retrieval-Augmented Generation

  • Experiments on HotpotQA with multiple LLMs (gpt-3.5-turbo, gpt-4o, Llama-3.1-8B, Llama-3.2-1B, deepseek-reasoner) demonstrate:
    • Non-linear (often super-linear) amplification of performance degradation as temperature increases.
    • GPT-family models are robust up to $T \approx 1.4$, after which degradation accelerates; Llama models exhibit earlier deterioration ($T \approx 0.6$).
    • Sentence Removal and Replacement are the most damaging perturbations; NER masking is less harmful.
    • The CV increases sharply beyond model-specific critical temperatures (e.g., $T_c \approx 1.4$ for GPT models, $T_c \approx 0.6$ for Llama models).
    • Bridge and comparison question types show similar temperature sensitivity.

Quantum/Statistical Physics

  • As temperature increases, the radius of convergence $R(T)$ increases, improving the reliability of perturbative results.
  • At zero temperature, degeneracies cause zeros of $Z(\lambda, T)$ to coalesce at the origin, invalidating the canonical perturbative expansion (Kohn–Luttinger scenario).
  • There is a unified perspective on the relation between Lee–Yang zeros (finite $T$) and exceptional points (zero $T$), relevant for quantum phase transitions.

6. Practical Guidelines and Theoretical Implications

NLP (RAG Systems)

  • Prefer models (e.g., deepseek-reasoner) that exhibit flat performance as a function of $T$ for robustness.
  • Cap temperature for GPT models at $T \leq 1.4$ and Llama models at $T \leq 0.6$ to avoid abrupt degradation (“cliff” behavior).
  • When input noise is possible, monitor both mean and CV across a temperature grid and favor lower-temperature generations.
  • Since Sentence Removal and Replacement inflict maximal harm, retrieval filtering should prioritize sentence confidence and NER-aware processing.
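
The capping and monitoring guidance above can be combined into a simple selection rule. This is my own heuristic sketch built on the paper's recommendations, not a procedure the paper specifies; the caps and CV tolerance are parameters the operator must choose.

```python
def select_temperature(grid_results, t_cap, cv_max=0.1):
    """Pick the highest temperature that (a) stays under a model-specific
    cap (e.g. ~1.4 for GPT-family, ~0.6 for Llama, per the reported
    critical temperatures) and (b) keeps the coefficient of variation
    below `cv_max`; fall back to the lowest grid temperature otherwise.

    `grid_results` maps temperature -> {"mean": ..., "cv": ...}.
    """
    admissible = [t for t, r in grid_results.items()
                  if t <= t_cap and r["cv"] <= cv_max]
    return max(admissible) if admissible else min(grid_results)
```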

Physics

  • The RAG framework yields a six-step algorithm: compute the spectrum, form $Z(\lambda, T)$, locate zeros, compute $R(T)$, classify singularities, and assess convergence at the physical coupling.
  • The analytic radius $R(T)$ quantifies the fragility of perturbative expansions: when $R(T) < 1$, perturbation theory fails at the physical coupling value.

7. Broader Impact and Extensions

  • The methodology generalizes to complex perturbation-temperature tradeoffs in reinforcement learning, probabilistic graphical models, thermodynamic simulations, and physical systems admitting analytic continuation.
  • In quantum field theory, extensions include the Variational Renormalization Group (VRG) and RG-Optimized Perturbation Theory (RGOPT), which further tame scale and temperature dependencies and provide improved convergence for both scalar and gauge theories (Câmara et al., 8 Sep 2025, Kneur et al., 2015).
  • The underlying principle—mapping the system’s “fragility” or viability surface as a function of both perturbation and temperature—offers an actionable diagnostic for system designers aiming to balance creativity (or non-determinism) against robustness to noise.

Summary Table: Primary Components in RAG Perturbation-Temperature Frameworks

| Domain | Perturbation $P_i$ | Temperature Control | Key Metric/Diagnosis |
|---|---|---|---|
| RAG-NLP (Zhou et al., 1 Dec 2025) | Sentence Removal, Replacement, NER Mask | LLM sampling softmax $T$ | BERTScore, degradation, CV |
| Stat/Quantum (Sun et al., 2022) | Hamiltonian deformation, eigenvalue crossings | Canonical ensemble $T$ | Radius of convergence $R(T)$, singularities |

This unified analytic approach enables principled evaluation, optimization, and diagnostic control of complex systems operating under joint internal and external stochasticity.
