Measuring the informational-density ratio ρ for instrumented pretraining
Measure the informational-density ratio ρ by designing a task family 𝒯 of causal, counterfactual, and calibration-aware benchmarks together with a matched-compute protocol Π, under which N instrumented samples are compared with ρN correlation-only web samples on test loss. Determine whether ρ > 1 and how ρ scales with instrumentation depth; together these test the fewer-but-richer postulate.
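One way to make the comparison concrete is sketched below. It reads ρ as an equal-loss sample ratio: the number of web samples needed to reach the test loss that N instrumented samples achieve, divided by N. This reading is an assumption, not a definition given in the source. The function name `estimate_rho`, the log-linear interpolation, and all numbers are hypothetical and purely illustrative; they are not measurements. The sketch also assumes that every loss was obtained under the same matched-compute protocol Π on the same task family 𝒯.

```python
import numpy as np


def estimate_rho(n_instrumented, loss_instrumented, web_sizes, web_losses):
    """Equal-loss sample ratio rho = N_web_matched / N_instrumented.

    Hypothetical helper: assumes web test loss falls monotonically as the
    number of web samples grows, and interpolates log(sample count) as a
    function of loss to find the web sample count that matches the
    instrumented run.
    """
    web_sizes = np.asarray(web_sizes, dtype=float)
    web_losses = np.asarray(web_losses, dtype=float)
    if np.any(np.diff(web_sizes) <= 0):
        raise ValueError("web_sizes must be strictly increasing")
    if np.any(np.diff(web_losses) >= 0):
        raise ValueError("web_losses must be strictly decreasing")
    if not (web_losses[-1] <= loss_instrumented <= web_losses[0]):
        raise ValueError("instrumented loss lies outside the measured web curve")
    # np.interp requires increasing x values, so reverse the decreasing loss axis.
    log_n_match = np.interp(
        loss_instrumented, web_losses[::-1], np.log(web_sizes[::-1])
    )
    return float(np.exp(log_n_match) / n_instrumented)


# Synthetic, illustrative web loss curve (not real data).
web_sizes = [1e3, 1e4, 1e5, 1e6]
web_losses = [4.0, 3.0, 2.0, 1.0]

# Hypothetical instrumented run: 1e4 samples reaching test loss 2.0.
rho = estimate_rho(1e4, 2.0, web_sizes, web_losses)
print(f"estimated rho = {rho:.2f}")
```

With these synthetic inputs the instrumented run matches the web curve at 1e5 samples, so the estimate is about 10. Repeating the estimate at several instrumentation depths would give the scaling curve the problem asks for. The function refuses to extrapolate beyond the measured web curve, since a ρ obtained that way would depend on the assumed curve shape rather than on data.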
References
Measuring ρ on 𝒯 is the open question of Section~\ref{sec:openq}.
— Instrumented data for causal scientific machine learning
(2606.07865 - Wilke, 5 Jun 2026) in Section 5.4, Use 4 (long-term, speculative, robustness-sensitive): fewer-but-richer pretraining