Assess whether macro-set selection heuristics align MathLib data with A_n log-density predictions

Investigate whether filtering the candidate macro set in MathLib by in-degree percentiles, restricting to definition-like elements (e.g., declarations whose resulting type is Sort), or related selection/optimization approaches produces a macro set for which the three measured relationships (log unwrapped length versus depth, wrapped length versus depth, and log unwrapped length versus wrapped length) better agree with the A_n log-density regime predictions.

Background

The authors compare MathLib measurements with predictions from several monoid regimes and find the strongest alignment with the A_n log-density case. However, which MathLib elements should count as the operative macro set remains unresolved.

They suggest simple heuristics (e.g., filtering by in-degree or restricting to certain definition-like elements) and an optimization framing for macro-set selection, but explicitly note that it is an open question whether these or similar methods achieve better agreement with the theoretical predictions.
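
As an illustration only, the two heuristics named above can be sketched on a toy dependency graph. This is not the authors' procedure: the function names (in_degrees, select_macro_set), the nearest-rank percentile rule, the boolean "result type is Sort" flag, and the declaration names are all assumptions introduced for this example, and a real study would extract the graph and the type information from MathLib itself.

```python
import math
from collections import Counter


def in_degrees(deps):
    """Count, for each declaration, how many other declarations reference it.

    deps maps a declaration name to the list of names it references.
    """
    counts = Counter({name: 0 for name in deps})
    for name, refs in deps.items():
        for ref in set(refs):          # count each referencing declaration once
            if ref != name:            # ignore self-references
                counts[ref] += 1
    return dict(counts)


def select_macro_set(deps, result_is_sort, percentile=90.0, require_sort=False):
    """Keep declarations whose in-degree reaches the given percentile.

    Uses the nearest-rank percentile (an assumed convention). If require_sort
    is True, additionally keep only declarations flagged as definition-like,
    i.e. whose resulting type is Sort according to result_is_sort.
    """
    deg = in_degrees(deps)
    ranked = sorted(deg.values())
    k = max(1, math.ceil(percentile * len(ranked) / 100))
    threshold = ranked[k - 1]
    selected = {name for name, d in deg.items() if d >= threshold}
    if require_sort:
        selected = {name for name in selected if result_is_sort.get(name, False)}
    return selected


# Hypothetical toy graph; these are not real MathLib declarations.
toy_deps = {
    "DefA": [],
    "DefB": [],
    "lemma1": ["DefA"],
    "lemma2": ["DefA"],
    "lemma3": ["DefB", "DefA"],
    "lemma4": ["lemma1"],
}
toy_is_sort = {"DefA": True, "DefB": True}

by_degree = select_macro_set(toy_deps, toy_is_sort, percentile=60.0)
by_degree_and_sort = select_macro_set(
    toy_deps, toy_is_sort, percentile=60.0, require_sort=True
)
```

On this toy graph the in-degree filter alone keeps DefA, DefB, and lemma1, while adding the Sort restriction drops lemma1. Whether either candidate set improves agreement on the three measured relationships would still have to be checked by recomputing depth, wrapped length, and unwrapped length relative to the selected set.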

References

Whether these or related approaches bring the three metrics into better agreement with the $A_n$ log-density predictions is an open question.

Compression is all you need: Modeling Mathematics (2603.20396 - Aksenov et al., 20 Mar 2026) in Subsubsection “Summary and Identifying the ‘Macro Set’” within Section 3