Inference in PCA under weak identifiability

Series

Erasmus Econometric Institute Series

Speaker(s)

Davy Paindaveine (Université Libre de Bruxelles, Belgium)

Field

Econometrics

Location

Erasmus University Rotterdam, E building, room ET-18
Rotterdam

Date and time

November 30, 2023
12:00 - 13:00

Abstract

We consider inference on principal directions in non-standard asymptotic scenarios where principal directions are unidentifiable in the limit. To fix ideas, we first tackle the problem of testing the null hypothesis $H_0: \theta_1= \theta_1^0$ against the alternative $H_1: \theta_1 \neq \theta_1^0$, where $\theta_1$ is the "first" eigenvector of the underlying covariance matrix and $\theta_1^0$ is a fixed unit $p$-vector. In the classical setup where eigenvalues $\lambda_1>\lambda_2\geq \ldots\geq \lambda_p$ are fixed, the likelihood ratio test (LRT) and the Le Cam optimal test for this problem are asymptotically equivalent under the null hypothesis, hence also under sequences of contiguous alternatives. We show that this equivalence does not survive asymptotic scenarios where $\lambda_{n1}/\lambda_{n2}=1+O(r_n)$ with $r_n=O(1/\sqrt{n})$. For such scenarios, the Le Cam optimal test still asymptotically meets the nominal level constraint, whereas the LRT becomes extremely liberal. Consequently, the former test should be favored over the latter whenever the two largest sample eigenvalues are close to each other. By relying on the Le Cam theory of asymptotic experiments, we study in the aforementioned asymptotic scenarios the non-null and optimality properties of the Le Cam optimal test and show that the null robustness of this test is not obtained at the expense of efficiency. Our asymptotic investigation is extensive in the sense that it allows $r_n$ to converge to zero at an arbitrary rate. While these results relate to robustness to weak identifiability, we also introduce sign tests that combine such non-standard robustness with more classical robustness to outliers and/or heavy tails. Finally, we tackle the corresponding point estimation problem by considering in this framework the asymptotic behaviour of the celebrated scatter estimator from Tyler (1987).
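
For readers who want to experiment with the testing problem described in the abstract, the following is a minimal simulation sketch and not the speaker's code. It assumes that the LRT can be taken in the classical Gaussian form attributed to Anderson (1963), $Q = n(\hat\lambda_1\, \theta_1^{0\prime} S^{-1} \theta_1^0 + \hat\lambda_1^{-1}\, \theta_1^{0\prime} S\, \theta_1^0 - 2)$, compared with a $\chi^2_{p-1}$ critical value. The function names `anderson_statistic` and `rejection_rate`, the Gaussian design with covariance $\mathrm{diag}(\lambda_1/\lambda_2, 1, 1)$, and the choice $p=3$ are all illustrative assumptions. The Le Cam optimal test is not implemented here.

```python
import numpy as np


def anderson_statistic(X, theta0):
    """Anderson-type statistic for H0: theta_1 = theta0 (assumed form, see lead-in).

    X is an (n, p) data matrix; theta0 is normalised to unit length.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = np.cov(X, rowvar=False)            # sample covariance (divisor n - 1)
    lam1 = np.linalg.eigvalsh(S)[-1]       # largest sample eigenvalue
    t = np.asarray(theta0, dtype=float)
    t = t / np.linalg.norm(t)
    quad_inv = t @ np.linalg.solve(S, t)   # theta0' S^{-1} theta0
    quad = t @ S @ t                       # theta0' S theta0
    return n * (lam1 * quad_inv + quad / lam1 - 2.0)


def rejection_rate(n, ratio, reps=500, alpha=0.05, seed=0):
    """Empirical null rejection frequency for p = 3 Gaussian data.

    The covariance is diag(ratio, 1, 1), so theta0 = e_1 is the first
    eigenvector whenever ratio > 1. With p = 3 the reference law is
    chi-square with 2 degrees of freedom, whose upper-alpha quantile
    is exactly -2 * log(alpha).
    """
    rng = np.random.default_rng(seed)
    p = 3
    theta0 = np.eye(p)[0]
    scales = np.sqrt(np.array([ratio, 1.0, 1.0]))
    crit = -2.0 * np.log(alpha)
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) * scales
        if anderson_statistic(X, theta0) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    n = 200
    # Well-separated eigenvalues versus a ratio of order 1 + 1/sqrt(n).
    print("ratio = 4:            ", rejection_rate(n, 4.0))
    print("ratio = 1 + 1/sqrt(n):", rejection_rate(n, 1.0 + n ** -0.5))
```

Comparing the two printed frequencies with the nominal level 0.05 gives a rough empirical impression of the behaviour the abstract describes when the two largest eigenvalues are close; the exact numbers depend on the seed, the sample size and the number of replications.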