Setting synthetic-model hyperparameters without ground-truth LLM features
Determine appropriate values for the hyperparameters that the SynthSAEBench synthetic data framework uses to model large language model (LLM) representations: the number of features, the correlation levels between features, the hierarchy degree, and the superposition level. Ground-truth knowledge of LLM features is currently unavailable, so there is no reference against which to set these values; resolving this would allow synthetic benchmarks to be calibrated to real neural network behavior.
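To make the four hyperparameters named above concrete, the following is a minimal sketch of how they might be grouped into a configuration record. Because no ground-truth value is known for any of them, the sketch only enumerates candidate settings and does not choose one. Every name here (`SyntheticModelConfig`, `candidate_configs`, the field names) and every numeric value is a hypothetical placeholder, not part of the SynthSAEBench API, and the field comments describe assumed interpretations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SyntheticModelConfig:
    """Hypothetical container; field names are illustrative, not SynthSAEBench's API."""

    num_features: int           # size of the synthetic feature dictionary
    correlation_level: float    # assumed: strength of feature co-occurrence (0 = independent)
    hierarchy_degree: int       # assumed: depth of parent/child feature structure (0 = flat)
    superposition_level: float  # assumed: features per hidden dimension


def candidate_configs(num_features, correlation_levels,
                      hierarchy_degrees, superposition_levels):
    """Return one config for every combination of the candidate values."""
    return [
        SyntheticModelConfig(n, c, h, s)
        for n, c, h, s in product(
            num_features, correlation_levels,
            hierarchy_degrees, superposition_levels,
        )
    ]


# Placeholder values chosen only to show the shape of the search space.
configs = candidate_configs(
    num_features=[1024, 4096],
    correlation_levels=[0.0, 0.5],
    hierarchy_degrees=[0, 2],
    superposition_levels=[2.0, 8.0],
)
print(len(configs))
```

The number of combinations is the product of the candidate counts for each hyperparameter, so the space grows quickly; this is one reason an external reference for calibration would be valuable.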
References
Due to our lack of true ground-truth knowledge of features in LLMs, we do not know how to set hyperparameters like number of features, correlation levels, hierarchy degree, and superposition level.
— SynthSAEBench: Evaluating Sparse Autoencoders on Scalable Realistic Synthetic Data
(2602.14687 - Chanin et al., 16 Feb 2026) in Appendix, Section "Limitations"