We provide fast algorithms for overconstrained p regression and related problems: for an n × d input matrix A and vector b ∈ R n , in O(nd log n) time we reduce the problem min x∈R d Ax − b p to the same problem with input matrix˜Amatrix˜ matrix˜A of dimension s × d and corresponding˜bcorresponding˜corresponding˜b of dimension s × 1. Here, ˜ A and˜band˜and˜b are a coreset for the problem, consisting of sampled and rescaled rows of A and b; and s is independent of n and polynomial in d. Our results improve on the best previous algorithms when n d, for all p ∈ [1, ∞) except p = 2; in particular, they improve the O(nd 1.376+) running time of Sohler and Woodruff (STOC, 2011) for p = 1, that uses asymptot-ically fast matrix multiplication, and the O(nd 5 log n) time of Dasgupta et al. (SICOMP, 2009) for general p, that uses ellipsoidal rounding. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general p problems. To complement this theory, we provide a detailed empirical evaluation of implementations of our algorithms for p = 1, comparing them with several related algorithms. Among other things, our empirical results clearly show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for p spaces and, for p = 1, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a distribution over matrices Π : R n → R O(d log d) , found obliviously to A, that approximately preserves the 1 norms: that is, with large probability, simultaneously for all x, Ax 1 ≈ ΠAx 1 , with distortion O(d 2+η), for an arbitrarily small constant η > 0; and, moreover, ΠA can be computed in O(nd log d) time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables.
translated by 谷歌翻译