The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics

Published 12 Feb 2026 in cs.LG and cs.AI | (2602.12218v1)

Abstract: Determining whether neural models internalize physical laws as world models, rather than exploiting statistical shortcuts, remains challenging, especially under out-of-distribution (OOD) shifts. Standard evaluations often test latent capability via downstream adaptation (e.g., fine-tuning or high-capacity probes), but such interventions can change the representations being measured and thus confound what was learned during self-supervised learning (SSL). We propose a non-invasive evaluation protocol, PhyIP. We test whether physical quantities are linearly decodable from frozen representations, motivated by the linear representation hypothesis. Across fluid dynamics and orbital mechanics, we find that when SSL achieves low error, latent structure becomes linearly accessible. PhyIP recovers internal energy and Newtonian inverse-square scaling on OOD tests (e.g., $\rho > 0.90$). In contrast, adaptation-based evaluations can collapse this structure ($\rho \approx 0.05$). These findings suggest that adaptation-based evaluation can obscure latent structures and that low-capacity probes offer a more accurate evaluation of physical world models.

Authors (6)

Summary

The paper shows that invasive adaptation alters latent physics in self-supervised models, undermining accurate evaluation of encoded physical laws.
It introduces PhyIP, a non-invasive probe using linear decoding and symbolic regression to extract interpretable physical equations from complex simulations.
Empirical results across fluid dynamics and orbital mechanics confirm that non-invasive evaluation reliably recovers physical laws, whereas adaptation-based methods can distort dynamic invariants.

Authoritative Summary of "The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics" (2602.12218)

Motivation and Problem Statement

The paper asks whether neural world models trained with self-supervised learning (SSL) genuinely internalize physical laws or instead exploit spurious statistical shortcuts, particularly under distribution shift. Conventional evaluation procedures, such as fine-tuning and high-capacity probing, can alter the latent representations they are meant to measure and so confound assessments of the encoded physics. The authors argue that adaptation-based evaluations act as destructive interventions that can corrupt internal physics and misrepresent what the model actually learned.

Methodological Innovations

PhyIP: Non-Invasive Physical Probe

The central methodological contribution is the Non-Invasive Physical Probe (PhyIP), a mechanistic protocol motivated by the Linear Representation Hypothesis (LRH). PhyIP tests the linear decodability of physical quantities using frozen backbone representations, strictly avoiding feature distortion and adaptation. Key steps include:

Linear probing: Time-invariant linear readouts are trained on frozen SSL activations, targeting latent conserved quantities (e.g., energy, force).
Symbolic regression: The probe’s outputs are distilled into interpretable formulas and validated for physical plausibility, especially on OOD data.
Rigorous controls: Additional baselines include raw-input regression, time-dependent linear probes, non-linear MLP probes, and adaptation-based protocols (last-layer fine-tuning and the full Inductive Bias Probe, IBP).

The probe's error is theoretically bounded in terms of the SSL error ($\epsilon$) and the functional curvature ($K$), which guards against both false positives and false negatives.
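The linear-probing step can be illustrated with a minimal sketch. It is not the authors' implementation: the closed-form ridge solver, the helper names (`fit_linear_probe`, `apply_probe`, `pearson`, `mape`), and the synthetic "frozen activations" are all assumptions made for illustration. The point it shows is that only the readout weights are fitted, the features are never modified, and the probe is scored on a shifted (OOD) set with correlation and MAPE.

```python
import numpy as np

def fit_linear_probe(features, targets, ridge=1e-3):
    """Closed-form ridge regression from frozen features to a physical target."""
    # Append a bias column; the backbone features themselves are never updated.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)

def apply_probe(weights, features):
    """Apply a fitted readout (including its bias term) to new features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def mape(pred, true):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((pred - true) / true)) * 100.0)

# Synthetic stand-in for frozen SSL activations (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
direction = rng.normal(size=16)  # hidden linear direction that carries the "energy"
train_feats = rng.normal(size=(500, 16))
ood_feats = rng.normal(loc=0.5, scale=1.5, size=(200, 16))  # shifted distribution
train_energy = train_feats @ direction + 50.0
ood_energy = ood_feats @ direction + 50.0

w = fit_linear_probe(train_feats, train_energy)
ood_pred = apply_probe(w, ood_feats)
ood_rho = pearson(ood_pred, ood_energy)
ood_mape = mape(ood_pred, ood_energy)
print(f"OOD correlation: {ood_rho:.3f}, OOD MAPE: {ood_mape:.3f}%")
```

In this toy setting the target is exactly linear in the features, so the probe succeeds by construction. With a real backbone, a low correlation from the same procedure would indicate that the quantity is not linearly accessible.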

Empirical Results

Fluid Dynamics and Stellar Physics

PhyIP was evaluated on high-dimensional simulations from "The Well" benchmark, comprising radiative turbulence, red supergiant convection, and supernova explosion regimes. Key findings:

Radiative Turbulence (TRL-2D): PhyIP recovered the ideal gas law, $E \sim 1.48P$, with less than 2% error in the coefficient, and generalized robustly to OOD cooling rates. The linear-probe correlation exceeded $\rho = 0.83$.
Red Supergiant (RSG-3D): PhyIP extracted both the pressure law and an additional convective kinetic correction, $E \sim 1.45P + 0.42\rho v^2$ (here $\rho$ is the mass density), validated by dimensional analysis. The OOD correlation coefficient reached $0.91$.
Supernova (SN-3D): PhyIP correctly diagnosed representational collapse (correlation $\rho < 0.2$, MAPE $> 140\%$) when SSL error was high. Invasive probes misleadingly reported low errors (MAPE $< 19\%$), which indicates that they overwrite knowledge rather than measure it.
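How coefficients such as 1.45 and 0.42 can be read off a probe's output is sketched below. This is a deliberately simplified stand-in for symbolic regression: it fits a least-squares model over a fixed, hand-picked library of candidate terms, whereas a real symbolic regressor also searches over functional forms. The data are synthetic, and the adiabatic index $\gamma = 5/3$ used for the reference coefficient $1/(\gamma - 1)$ is an assumption, not a value stated in the summary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
pressure = rng.uniform(0.5, 2.0, size=n)
density = rng.uniform(0.8, 1.2, size=n)
speed = rng.uniform(0.0, 1.0, size=n)

# Synthetic "probe output" built to mimic the RSG-3D expression, plus small noise.
probe_energy = 1.45 * pressure + 0.42 * density * speed**2 + rng.normal(scale=0.01, size=n)

# Fixed library of candidate terms (hypothetical choice for this sketch).
library = {
    "P": pressure,
    "rho*v^2": density * speed**2,
    "rho": density,
    "v": speed,
}
A = np.column_stack(list(library.values()))
coeffs, *_ = np.linalg.lstsq(A, probe_energy, rcond=None)
fitted = dict(zip(library.keys(), coeffs))
for name, value in fitted.items():
    print(f"{name:8s} {value:+.3f}")

# Ideal-gas check for the TRL-2D coefficient: E = P / (gamma - 1).
gamma = 5.0 / 3.0  # assumed adiabatic index
reference = 1.0 / (gamma - 1.0)
relative_error = abs(1.48 - reference) / reference
print(f"relative error of 1.48 vs {reference:.2f}: {relative_error:.2%}")
```

Under the assumed $\gamma$, the reported coefficient 1.48 differs from 1.5 by about 1.3%, which is consistent with the "less than 2%" figure quoted for TRL-2D.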

Orbital Mechanics and Inductive Bias Analysis

Replicating a pivotal orbital mechanics experiment, the authors detailed a mechanistic analysis of invasive adaptation:

Non-invasive probes recovered Newton's inverse-square law and linear decodability of vector forces ($\rho \sim 0.91$, MAPE $< 25\%$).
Inductive Bias Probes (IBP) and last-layer fine-tuning resulted in representational collapse, selectively erasing time-varying physical quantities (e.g., speed, radius) while retaining static identifiers (e.g., mass). OOD correlations approached zero, MAPE rose as high as $81.5\%$, and symbolic regression extracted spurious, artifact-ridden expressions rather than the correct force laws.
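One simple way to check whether a decoded force follows inverse-square scaling is a log-log fit: if $F \propto r^{-2}$, then $\log F$ is linear in $\log r$ with slope $-2$. The sketch below uses synthetic data with multiplicative noise standing in for probe error. The constants and the noise level are hypothetical, and the summary does not say that the authors used this particular check.

```python
import numpy as np

rng = np.random.default_rng(2)
G, m1, m2 = 1.0, 3.0, 2.0  # arbitrary units, hypothetical values
radius = rng.uniform(0.5, 5.0, size=300)
true_force = G * m1 * m2 / radius**2

# Multiplicative noise standing in for imperfect decoding by the probe.
decoded_force = true_force * np.exp(rng.normal(scale=0.05, size=radius.size))

# Fit log F = slope * log r + intercept; inverse-square scaling means slope = -2.
slope, intercept = np.polyfit(np.log(radius), np.log(decoded_force), deg=1)
prefactor = float(np.exp(intercept))  # estimate of G * m1 * m2
print(f"fitted exponent: {slope:.3f}, fitted prefactor: {prefactor:.3f}")
```

A fitted exponent far from $-2$, or a prefactor that does not scale with the masses, would be the kind of spurious expression that the collapsed representations produced.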

Mechanistic and Theoretical Implications

Adaptation as Destructive Measurement

The authors derive a framework that links probe error to SSL error and functional curvature. Adaptation-based evaluation induces parameter shifts in deep blocks, which triggers representational drift and specifically overwrites dynamic invariants. Layerwise analysis confirmed that representations remain stable in early blocks but are corrupted in deeper layers under fine-tuning. Narrow task distributions further encourage the model to discard genuine physical dependencies in favor of simple statistical shortcuts, which supports the observer effect analogy.
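Layerwise drift of this kind can be quantified by comparing activations from the frozen and the adapted model on the same inputs. The summary does not say which metric the authors used; the sketch below substitutes linear centered kernel alignment (CKA), a common representation-similarity measure, and applies it to synthetic activations. The block names and the amount of perturbation per block are hypothetical.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (samples, features)."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(3)
# Hypothetical activations of the frozen model at an early and a deep block.
frozen = {
    "block_1": rng.normal(size=(256, 32)),
    "block_6": rng.normal(size=(256, 32)),
}
# Simulated fine-tuned model: early block barely moves, deep block is mostly overwritten.
tuned = {
    "block_1": frozen["block_1"] + 0.01 * rng.normal(size=(256, 32)),
    "block_6": 0.2 * frozen["block_6"] + rng.normal(size=(256, 32)),
}

similarity = {name: linear_cka(frozen[name], tuned[name]) for name in frozen}
for name, value in similarity.items():
    print(f"{name}: CKA = {value:.3f}")
```

A value near 1 means the block's representation is essentially unchanged. A sharp drop in deeper blocks would match the pattern described above.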

Linear Representation and Scientific Validity

PhyIP's results support the linear representation hypothesis for world models trained on physical data. When SSL error is low, physical quantities are encoded in linearly accessible subspaces, and symbolic regression recovers the underlying laws with high precision, even in complex systems. Conversely, adaptation-based evaluators may mask backbone failures: a trainable probe can learn the task from scratch and so report competence that the backbone does not have.

Practical and Future Directions

Scientific AI Evaluation Paradigms

The findings argue for fixed, non-invasive measurement instruments when evaluating scientific AI, echoing principles of classical experimental design. High-capacity or adaptation-based probes can distort the evaluation of latent physical knowledge and lead to misinterpretation of model capabilities.

Speculation and Future Work

The paper's limitations highlight the difficulty of probing models that encode physical quantities in non-linear or entangled geometries, where linear probes may not suffice. Future work should explore subspace-constrained or weight-preserving adaptation protocols. These would aim to improve task-specific adaptability while strictly protecting the linear physical invariants, such as conservation laws, that the model has already internalized.
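One possible reading of "subspace-constrained" adaptation is to project each weight update so that it cannot change the linearly decoded quantities. The sketch below is speculative and is not a protocol from the paper. It assumes the adapted layer is a final linear map $h = Wx$ and that the protected readouts are linear in $h$; the function name `project_update` and all shapes are hypothetical.

```python
import numpy as np

def project_update(delta_w, probe_dirs):
    """Remove the part of a weight update that would change the probed readouts.

    delta_w:    (d_out, d_in) proposed update to a final linear layer h = W x.
    probe_dirs: (k, d_out) probe weight vectors acting on h, with k <= d_out.
    """
    # Orthonormal basis (columns of q) for the subspace spanned by the probe directions.
    q, _ = np.linalg.qr(probe_dirs.T)
    # Subtract the component of the update that lies in the protected subspace.
    return delta_w - q @ (q.T @ delta_w)

rng = np.random.default_rng(4)
d_in, d_out = 8, 6
W = rng.normal(size=(d_out, d_in))
probe_dirs = rng.normal(size=(2, d_out))   # two hypothetical probe readouts
delta_w = rng.normal(size=(d_out, d_in))   # stand-in for a fine-tuning step
x = rng.normal(size=(d_in, 5))             # a small batch of inputs

before = probe_dirs @ (W @ x)
after_free = probe_dirs @ ((W + delta_w) @ x)
after_constrained = probe_dirs @ ((W + project_update(delta_w, probe_dirs)) @ x)

print("unconstrained update changes readout:", not np.allclose(before, after_free))
print("projected update preserves readout:  ", np.allclose(before, after_constrained))
```

The projection guarantees that the probed readouts are unchanged for every input, but only under the linear-layer assumption. Updates to earlier, non-linear blocks would need a different constraint.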

Further research should establish complementary inductive biases (spatial smoothness, temporal locality, stability) to guarantee physical fidelity in broader AI systems, as demonstrated in concurrent work (Liu et al., 6 Feb 2026). The synthesis of robust evaluation and principled modeling could solidify the emergence of implicit physical understanding in foundation models trained on vast and diverse data.

Conclusion

The work fundamentally challenges the validity of adaptation-based evaluation in scientific AI, demonstrating that invasive protocols act as destructive interventions, masking or erasing latent physical knowledge in world models. The non-invasive PhyIP establishes a reliable diagnostic, capable of recovering the internalization of physical laws in SSL models across multiple complex regimes. Theoretical bounds, mechanistic analysis, and empirical validations converge to highlight the necessity for rigorous, non-invasive evaluation frameworks in advancing the scientific utility and fidelity of neural world models.
