Number of relevant directions in Principal Component Analysis and Wishart random matrices
Abstract: We compute analytically, for large $N$, the probability $\mathcal{P}(N_+,N)$ that an $N\times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the Principal Component Analysis of a large empirical dataset. We find that $\mathcal{P}(N_+,N)\approx\exp(-\beta N^2 \psi_\zeta(N_+/N))$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0\leq \kappa\leq 1$ and for any $\zeta$. The rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity close to its minimum $\kappa^\star(\zeta)$. This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance $\Delta(N)$ of the number of relevant components is also shown to grow universally (independent of $\zeta$) as $\Delta(N)\sim (\beta \pi^2)^{-1}\ln N$ for large $N$.
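The counting statistic $N_+$ can be explored numerically with a small Monte Carlo sketch. The code below is illustrative only and rests on assumptions not stated in the abstract: real Gaussian entries ($\beta=1$), a square data matrix $X$ so that $W=X^{T}X$ is $N\times N$, and unit-variance entries, so that the rescaled eigenvalues $\lambda/N$ follow the Marchenko-Pastur law on $[0,4]$. The function names `count_above_threshold` and `mp_fraction_above` are hypothetical helpers introduced here. Under these assumptions the typical fraction $\kappa^\star(\zeta)$ should be the Marchenko-Pastur mass above $\zeta$, which the sketch compares against the empirical mean of $N_+/N$.

```python
import numpy as np


def count_above_threshold(N, zeta, rng):
    """Sample one N x N real Wishart matrix W = X^T X (assumed: X is N x N
    with i.i.d. standard normal entries) and return N_+, the number of
    eigenvalues of W exceeding N * zeta."""
    X = rng.standard_normal((N, N))
    # Eigenvalues of X^T X are the squared singular values of X.
    eigs = np.linalg.svd(X, compute_uv=False) ** 2
    return int(np.sum(eigs > N * zeta))


def mp_fraction_above(zeta):
    """Mass of the Marchenko-Pastur density sqrt((4 - x) / x) / (2 pi) on
    [zeta, 4]; valid only for the square, unit-variance case assumed here."""
    z = np.clip(zeta, 0.0, 4.0)
    theta = np.arcsin(np.sqrt(z) / 2.0)
    return float(1.0 - (2.0 * theta + np.sin(2.0 * theta)) / np.pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, zeta, samples = 200, 1.0, 200
    counts = np.array([count_above_threshold(N, zeta, rng) for _ in range(samples)])
    print("empirical mean of N_+/N :", counts.mean() / N)
    print("Marchenko-Pastur value  :", mp_fraction_above(zeta))
    print("empirical variance      :", counts.var())
    # Leading-order growth quoted in the abstract, with beta = 1;
    # O(1) corrections are not included, so only rough agreement is expected.
    print("ln(N) / pi^2            :", np.log(N) / np.pi**2)
```

Direct sampling of this kind only probes the neighborhood of $\kappa^\star(\zeta)$. The large deviation tails of $\mathcal{P}(N_+,N)$ are suppressed as $\exp(-\beta N^2\psi_\zeta)$ and would need importance sampling or the analytical rate function to resolve.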