Identify the optimal sampling range γ for random parameters in masking-based approximation

Determine the value, or range of values, of the half-edge length γ of the uniform sampling region (parameters drawn from U[−γ, γ]) used for random initialization that best balances two quantities: (i) the number of hidden units required, and (ii) the probability of finding a mask that matches a target parameterization to within a given tolerance ε. Such a choice would improve the hidden-layer width scaling of the masking-based construction used to prove universal approximation with learned biases.

Background

The probabilistic masking argument (Lemma \ref{lem:supp1}) shows that, with sufficiently large width, a random network contains a subnetwork whose parameters lie within an ε-window of the target. The authors note a trade-off. If γ is very small, many units are needed to achieve sufficient dynamic range. If γ is very large, many units must be sampled before one lands near the target parameters.
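The trade-off above can be made concrete in a deliberately simplified single-unit model. This is a sketch under stated assumptions, not the construction used in Lemma \ref{lem:supp1}: it assumes each random unit's parameter vector is drawn independently from Uniform([−γ, γ]^d), that "matching" means landing in an axis-aligned box of half-width ε around a target θ, and that the mask may select any one of N independent units. Under these assumptions the per-unit hit probability is the product over coordinates of the overlap length of [θ_i − ε, θ_i + ε] with [−γ, γ], divided by 2γ. It is zero when some |θ_i| > γ + ε (the small-γ side, where a single unit lacks the range) and decays like (ε/γ)^d when γ is large. The names `hit_probability`, `units_needed` and `monte_carlo_hit_rate` are hypothetical.

```python
import math

import numpy as np


def hit_probability(theta, gamma, eps):
    """Exact probability that one draw from Uniform([-gamma, gamma]^d)
    falls in the box prod_i [theta_i - eps, theta_i + eps].
    Assumes independent coordinates (toy model, not the paper's lemma)."""
    theta = np.asarray(theta, dtype=float)
    lo = np.maximum(theta - eps, -gamma)
    hi = np.minimum(theta + eps, gamma)
    overlap = np.clip(hi - lo, 0.0, None)  # per-coordinate overlap length
    return float(np.prod(overlap / (2.0 * gamma)))


def units_needed(p, delta):
    """Smallest N such that 1 - (1 - p)**N >= 1 - delta,
    i.e. at least one of N independent units hits the box w.p. >= 1 - delta."""
    if p <= 0.0:
        return math.inf  # target unreachable by any single unit
    if p >= 1.0:
        return 1
    return math.ceil(math.log(delta) / math.log1p(-p))


def monte_carlo_hit_rate(theta, gamma, eps, n_samples=200_000, seed=0):
    """Empirical check of hit_probability by direct sampling."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    draws = rng.uniform(-gamma, gamma, size=(n_samples, theta.size))
    inside = np.all(np.abs(draws - theta) <= eps, axis=1)
    return float(inside.mean())


if __name__ == "__main__":
    theta = [0.5, -0.3]  # hypothetical 2-D target parameters
    eps, delta = 0.05, 0.01
    for gamma in (0.25, 1.0, 4.0):
        p = hit_probability(theta, gamma, eps)
        print(f"gamma={gamma}: p={p:.6g}, N={units_needed(p, delta)}")
```

In this toy model γ = 0.25 cannot reach θ_0 = 0.5 with one unit (N is infinite), while increasing γ from 1 to 4 shrinks p by a factor of 16 in two dimensions and inflates N accordingly. It captures only the single-unit matching side; how units combine to extend dynamic range in the actual construction is not modeled here.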

This observation suggests there is a γ “sweet spot” that minimizes required width while maintaining high probability of finding suitable masks. Pinning down this optimal γ would yield practical guidance for initializing random networks used in bias-learning and related masking approaches.

References

“This suggests the existence of some sweet spot in the value γ, which we leave for future work to explore.”

Expressivity of Neural Networks with Random Weights and Learned Biases (2407.00957, Williams et al., 2024), Appendix, Remark 1 on Lemma \ref{lem:supp1}