We study the problem of list-decodable (robust) Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. In the former problem, we are given a set T of points in R^n with the promise that an α-fraction of the points in T, where 0 < α < 1/2, are drawn from an unknown-mean, identity-covariance Gaussian G, and no assumptions are made about the remaining points. The goal is to output a small list of candidate vectors with the guarantee that at least one of the candidates is close to the mean of G. In the latter problem, we are given samples from a k-mixture of spherical Gaussians on R^n, and the goal is to estimate the unknown model parameters up to small accuracy. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. Specifically, our main contributions are as follows:

List-Decodable Mean Estimation. Fix any d ∈ Z_+ and 0 < α < 1/2. We design an algorithm with sample complexity O_d(poly(n^d/α)) and runtime O_d(poly(n/α)^d) that outputs a list of O(1/α) candidate vectors such that, with high probability, one of the candidates is within ℓ_2-distance O_d(α^{-1/(2d)}) from the mean of G. The only previous algorithm for this problem [CSV17] achieved error Õ(α^{-1/2}) under second moment conditions. For d = O(1/ε), where ε > 0 is a constant, our algorithm runs in polynomial time and achieves error O(α^ε). For d = Θ(log(1/α)), our algorithm runs in time (n/α)^{O(log(1/α))} and achieves error O(log^{3/2}(1/α)), almost matching the information-theoretically optimal bound of Θ(log^{1/2}(1/α)) that we establish. We also give a Statistical Query (SQ) lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible.

Learning Mixtures of Spherical Gaussians.
We give a learning algorithm for mixtures of spherical Gaussians, with unknown spherical covariances, that succeeds under significantly weaker separation assumptions than prior work. For the prototypical case of a uniform k-mixture of identity-covariance Gaussians, we obtain the following: For any ε > 0, if the pairwise separation between the means is at least Ω(k^ε + log(1/δ)), our algorithm learns the unknown parameters within accuracy δ with sample complexity and running time poly(n, 1/δ, (k/ε)^{1/ε}). Moreover, our algorithm is robust to a small dimension-independent fraction of corrupted data. The previously best known polynomial-time algorithm [VW02] required separation at least k^{1/4} polylog(k/δ). Finally, our algorithm works under separation Õ(log^{3/2}(k) + log(1/δ)) with sample complexity and running time poly(n, 1/δ, k^{log k}). This bound is close to the information-theoretically minimum separation of Ω(√(log k)) [RV17]. Our main technical contribution is a new technique, based on degree-d multivariate polynomials, for removing outliers from high-dimensional datasets in which the majority of the points are corrupted.
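As a toy illustration of the separation condition above (not the paper's algorithm), the following sketch samples from a uniform k-mixture of identity-covariance Gaussians in R^n whose means are pairwise Δ-separated, verifies the separation promise, and checks that per-component empirical means are accurate when components are well separated. All function names and parameter choices here are hypothetical.

```python
# Illustrative sketch of the separated spherical-mixture model; the helper
# names (separated_means, sample_mixture) are hypothetical, not from the paper.
import numpy as np

def separated_means(k, n, delta):
    """Place k means in R^n with pairwise l2-separation >= delta.

    Simple construction: scaled standard basis vectors e_1, ..., e_k
    (requires k <= n); distinct scaled basis vectors are sqrt(2)*scale apart.
    """
    scale = delta / np.sqrt(2.0)
    means = np.zeros((k, n))
    means[np.arange(k), np.arange(k)] = scale
    return means

def sample_mixture(means, m, rng):
    """Draw m points from the uniform mixture over N(mu_j, I_n)."""
    k, n = means.shape
    labels = rng.integers(0, k, size=m)
    return means[labels] + rng.standard_normal((m, n)), labels

rng = np.random.default_rng(0)
k, n, delta = 5, 20, 10.0
means = separated_means(k, n, delta)

# Verify the pairwise separation promise (tolerance for floating point).
gaps = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
assert gaps[~np.eye(k, dtype=bool)].min() >= 0.99 * delta

X, labels = sample_mixture(means, 5000, rng)
# With labels known (an oracle this sketch grants itself), each empirical
# component mean concentrates around the true mean at rate ~sqrt(n/m_j).
est = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
err = np.linalg.norm(est - means, axis=1).max()
print(f"max per-component mean error: {err:.3f}")
```

The learning problem in the abstract is harder than this sketch suggests: the component labels are not observed, and a small fraction of the points may be adversarially corrupted, which is exactly what the separation conditions and outlier-removal technique address.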