Identify the optimal sampling range γ for random parameters in masking-based approximation

Determine the value, or range of values, of γ, the half-edge length of the box from which random parameters are sampled uniformly at initialization (each coordinate drawn from [−γ, γ]), that best balances (i) the number of hidden units required and (ii) the probability of finding a mask that matches a target parameterization to within a tolerance ε. Such a choice would improve how the required hidden-layer width scales in the masking-based construction used to prove universal approximation with learned biases.

Background

The probabilistic masking argument (Lemma \ref{lem:supp1}) shows that, for sufficiently large width, a random network contains a subnetwork whose parameters lie within an ε-window of the target. The authors note a trade-off: with very small γ, many units are needed to achieve sufficient dynamic range, whereas with very large γ, many units must be sampled before one lands near the target parameters. For instance, a single coordinate drawn uniformly from [−γ, γ] falls within ε of a fixed target t (with |t| ≤ γ − ε) with probability only ε/γ, which shrinks as γ grows.
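The sampling side of this trade-off can be sketched for a single scalar parameter. The sketch below assumes each draw is independent and uniform on [−γ, γ], and that a unit "succeeds" when its draw lands in [t − ε, t + ε]. The function names (`hit_probability`, `samples_needed`, `monte_carlo_hit_probability`) are hypothetical illustrations, not code from the paper.

```python
import math
import random


def hit_probability(gamma: float, target: float, eps: float) -> float:
    """Exact probability that one draw from Uniform[-gamma, gamma] lands in [target - eps, target + eps]."""
    lo = max(-gamma, target - eps)
    hi = min(gamma, target + eps)
    return max(0.0, hi - lo) / (2.0 * gamma)


def samples_needed(gamma: float, target: float, eps: float, delta: float) -> float:
    """Smallest n such that at least one of n i.i.d. draws hits the window with probability >= 1 - delta."""
    q = hit_probability(gamma, target, eps)
    if q == 0.0:
        return math.inf  # the range is too small: no single draw can reach the target
    if q >= 1.0:
        return 1  # the window covers the whole sampling interval
    # Solve 1 - (1 - q)^n >= 1 - delta for n.
    return math.ceil(math.log(delta) / math.log1p(-q))


def monte_carlo_hit_probability(gamma: float, target: float, eps: float,
                                trials: int = 100_000, seed: int = 0) -> float:
    """Empirical check of hit_probability by direct sampling."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(-gamma, gamma) - target) <= eps for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    target, eps, delta = 0.8, 0.05, 0.01
    for gamma in (0.5, 1.0, 2.0, 4.0, 8.0):
        q = hit_probability(gamma, target, eps)
        n = samples_needed(gamma, target, eps, delta)
        print(f"gamma={gamma:4.1f}  q={q:.4f}  draws needed={n}")
```

Once γ covers the target, the required number of draws grows roughly like (γ/ε)·ln(1/δ). When γ is too small it is infinite for a single unit, which is where combining several units (the dynamic-range side) becomes necessary.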

This observation suggests that there is a “sweet spot” for γ that minimizes the required width while keeping the probability of finding a suitable mask high. Pinning down this optimal γ would yield practical guidance for initializing the random networks used in bias learning and related masking approaches.
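A toy cost model can make the sweet spot concrete. It is entirely an illustrative assumption and not the construction in the paper. A target value is split into k equal pieces, each small enough (at most γ − ε in magnitude) for one draw to reach. Each piece must be matched to within ε/k so that the summed error stays within ε. All k pieces must succeed with probability at least 1 − δ, via a union bound. The cost is the total number of sampled units k·n. The names `toy_width` and `sweet_spot` are hypothetical.

```python
import math


def toy_width(gamma: float, target: float, eps: float, delta: float) -> float:
    """Hypothetical total number of units (k * n) in a toy model of the gamma trade-off.

    Toy assumptions (not the paper's construction):
      * the target is split into k equal pieces of magnitude <= gamma - eps ("dynamic range");
      * each piece must be matched within eps / k, so the summed error is <= eps;
      * each piece fails with probability <= delta / k (union bound over the k pieces).
    """
    if gamma <= eps:
        return math.inf  # no piece fits inside the sampling interval with its window
    k = max(1, math.ceil(abs(target) / (gamma - eps)))
    # Window of width 2*eps/k lies fully inside [-gamma, gamma], whose width is 2*gamma.
    q = eps / (k * gamma)
    n = math.ceil(math.log(delta / k) / math.log1p(-q))
    return k * n


def sweet_spot(target: float, eps: float, delta: float, gammas):
    """Evaluate the toy width on a grid of gamma values and return the minimizer."""
    widths = {g: toy_width(g, target, eps, delta) for g in gammas}
    best = min(widths, key=widths.get)
    return best, widths


if __name__ == "__main__":
    grid = [0.25 * i for i in range(1, 41)]  # candidate gamma values 0.25, 0.5, ..., 10.0
    best, widths = sweet_spot(target=3.0, eps=0.1, delta=0.01, gammas=grid)
    for g in (0.5, 1.0, 2.0, best, 6.0, 10.0):
        print(f"gamma={g:5.2f}  toy width={widths[g]}")
    print(f"toy sweet spot: gamma={best}")
```

In this toy model the minimum lands at the smallest grid value of γ that lets a single piece cover the target. Smaller γ pays for splitting the target, and larger γ pays for the thinner hit probability. The actual sweet spot for the masking construction is exactly what remains open.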

References

This suggests the existence of some sweet spot in the value γ, which we leave for future work to explore.

— Expressivity of Neural Networks with Random Weights and Learned Biases (2407.00957 - Williams et al., 2024) in Appendix, Remark 1 on Lemma \ref{lem:supp1}
