LDLT L-Lipschitz Network Weight Parameterization Initialization
Abstract: We analyze initialization dynamics for LDLT-based $\mathcal{L}$-Lipschitz layers by deriving the exact marginal output variance when the underlying parameter matrix $W_0\in \mathbb{R}^{m\times n}$ is initialized with IID Gaussian entries $\mathcal{N}(0,\sigma^2)$. The output marginal variance is computed from the Wishart-distributed matrix $S=W_0W_0^\top\sim\mathcal{W}_m(n,\sigma^2 \boldsymbol{I}_m)$ and is derived in closed form using expectations of zonal polynomials via James' theorem and a Laplace-integral expansion of $(\alpha\boldsymbol{I}_m+S)^{-1}$. We develop an Isserlis/Wick-based combinatorial expansion for $\operatorname{\mathbb{E}}\left[\operatorname{tr}(S^k)\right]$ and provide explicit truncated moments up to $k=10$, which yield accurate series approximations for small-to-moderate $\sigma^2$. Monte Carlo experiments confirm the theoretical estimates. An empirical analysis further shows that, with the current He (Kaiming) initialization with scaling $1/\sqrt{n}$, the output variance is $0.41$, whereas the new parameterization with $10/\sqrt{n}$ for $\alpha=1$ results in an output variance of $0.9$. The findings clarify why deep $\mathcal{L}$-Lipschitz networks suffer rapid information loss at initialization and offer practical prescriptions for choosing initialization hyperparameters to mitigate this effect. However, a hyperparameter sweep over optimizers, initialization scale, and depth on the Higgs boson classification dataset, conducted to validate the results on real-world data, shows that although the derivation ensures variance preservation, He initialization still performs better empirically.
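As an illustration of the kind of Monte Carlo check described in the abstract, the following minimal sketch estimates $\operatorname{\mathbb{E}}\left[\operatorname{tr}(S^k)\right]$ for $k=1,2$ with $S=W_0W_0^\top$ and compares the estimates against the standard real-Wishart identities $\operatorname{\mathbb{E}}[\operatorname{tr}(S)]=\sigma^2 mn$ and $\operatorname{\mathbb{E}}[\operatorname{tr}(S^2)]=\sigma^4 mn(m+n+1)$. It is not the paper's code: the function names, the sample count, and the chosen values of $m$, $n$, and $\sigma$ are assumptions made for illustration, and the sketch covers neither the higher moments up to $k=10$ nor the LDLT layer itself.

```python
import numpy as np


def wishart_trace_moments_mc(m, n, sigma, num_samples=20000, seed=0):
    """Monte Carlo estimates of E[tr(S)] and E[tr(S^2)] for S = W0 W0^T.

    W0 has shape (m, n) with IID N(0, sigma^2) entries.
    Hypothetical helper, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Draw all samples at once: shape (num_samples, m, n).
    W0 = rng.normal(0.0, sigma, size=(num_samples, m, n))
    # Batched Gram matrices S = W0 W0^T: shape (num_samples, m, m).
    S = W0 @ np.swapaxes(W0, -1, -2)
    tr_S = np.trace(S, axis1=-2, axis2=-1)
    # tr(S^2) = sum_{i,j} S_ij * S_ji, computed without forming S @ S.
    tr_S2 = np.einsum("bij,bji->b", S, S)
    return tr_S.mean(), tr_S2.mean()


def wishart_trace_moments_exact(m, n, sigma):
    """Standard closed forms for a real Wishart W_m(n, sigma^2 I_m)."""
    s2 = sigma ** 2
    return s2 * m * n, s2 ** 2 * m * n * (m + n + 1)


if __name__ == "__main__":
    m, n, sigma = 4, 8, 0.5  # illustrative values, assumed here
    mc1, mc2 = wishart_trace_moments_mc(m, n, sigma)
    ex1, ex2 = wishart_trace_moments_exact(m, n, sigma)
    print(f"E[tr(S)]   MC={mc1:.4f}  exact={ex1:.4f}")
    print(f"E[tr(S^2)] MC={mc2:.4f}  exact={ex2:.4f}")
```

The same pattern extends to higher $k$ by replacing the `einsum` contraction with repeated batched matrix products, at the cost of slower convergence as the variance of $\operatorname{tr}(S^k)$ grows with $k$.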