Data-driven selection of the number of topics in the linear TAMM

Develop a data-driven procedure for selecting the number of topics K in the Bayesian topic-model-based linear Template-Adapted Mixture Model (TAMM), for example via posterior predictive checks or hierarchical modeling, so that model capacity is calibrated without access to truth-level information.

Background

The linear TAMM uses topics learned from many misspecified simulated distributions to reduce model complexity while retaining key distributional patterns. In the paper, the number of topics is chosen by scanning candidate values of K and scoring each against the known truth, which is possible only in controlled studies where the truth is available.

Practical applications, where truth-level targets are unavailable, need a data-driven criterion for selecting K. The authors suggest that a posterior predictive check or a hierarchical model could provide such a criterion, but they do not implement either in this work.
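To make the idea of a truth-free criterion concrete, the following is a minimal sketch and not the authors' method. It replaces the suggested posterior predictive check with a simpler stand-in: a held-out predictive log-likelihood combined with the one-standard-error rule. It assumes binned data, topics given as fixed bin-probability vectors, and a point estimate of the mixture weights by EM rather than full Bayesian inference. The function names (fit_weights, heldout_score, select_k) and the Gaussian-shaped synthetic templates are hypothetical and serve only as illustration.

```python
import numpy as np

def fit_weights(counts, topics, n_iter=200):
    """EM estimate of mixture weights for fixed topics (each row sums to 1)."""
    k = topics.shape[0]
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        mix = np.maximum(w @ topics, 1e-300)  # predicted bin probabilities
        resp = w[:, None] * topics / mix      # responsibility of topic k for bin b
        w = resp @ counts                     # expected number of events per topic
        w = w / w.sum()
    return w

def heldout_score(train, test, topics):
    """Held-out log-likelihood (up to a K-independent constant) and its standard error."""
    w = fit_weights(train, topics)
    logp = np.log(np.maximum(w @ topics, 1e-300))
    n = test.sum()
    ll = float(test @ logp)
    var = float(test @ (logp - ll / n) ** 2) / n  # per-event variance of log p
    return ll, float(np.sqrt(n * var))

def select_k(train, test, topic_sets):
    """Smallest K whose held-out score is within one standard error of the best."""
    results = {k: heldout_score(train, test, t) for k, t in topic_sets.items()}
    best_k = max(results, key=lambda k: results[k][0])
    best_ll, best_se = results[best_k]
    eligible = [k for k, (ll, _) in results.items() if ll >= best_ll - best_se]
    return min(eligible), results

# Synthetic demonstration (assumption: Gaussian-shaped templates stand in for learned topics).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
centers = [0.2, 0.5, 0.8, 0.35, 0.65]
templates = np.array([np.exp(-0.5 * ((x - c) / 0.08) ** 2) for c in centers])
templates /= templates.sum(axis=1, keepdims=True)

# The "observed" data are generated from the first three templates only.
true_p = np.array([0.4, 0.3, 0.3]) @ templates[:3]
train = rng.multinomial(20000, true_p)
test = rng.multinomial(20000, true_p)

# Candidate models: the first K templates for K = 1..5.
topic_sets = {k: templates[:k] for k in range(1, 6)}
chosen_k, results = select_k(train, test, topic_sets)
print("chosen K:", chosen_k)
```

The one-standard-error rule favors the smallest K that predicts held-out data about as well as the best candidate, which guards against adding topics that only fit noise. A full posterior predictive check would instead draw replicated datasets from the posterior and compare a discrepancy statistic with its observed value; that requires posterior samples from the fitted model, which this sketch does not produce.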

References

We leave the matter of a data-driven method for selecting the number of topics (via a posterior predictive check or with a hierarchical model) for future work.

Many Wrongs Make a Right: Leveraging Biased Simulations Towards Unbiased Parameter Inference (2604.02219 - Alvarez et al., 2 Apr 2026) in Section 2.4 (Topic Number Selection and Model Evaluation)