We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$. In a generic learning setting, we employ two algorithms, depending on whether the considered loss function is gradient-Lipschitz or not. We then instantiate our framework on several applications, including vanilla sparse, group-sparse, and low-rank matrix recovery. For each application, this leads to efficient and robust learning algorithms that reach near-optimal estimation rates under heavy-tailed distributions and in the presence of outliers. For vanilla $s$-sparsity, we are able to reach the $s\log(d)/n$ rate under heavy tails and $\eta$-corruption, at a computational cost comparable to that of non-robust analogues. We provide efficient implementations of our algorithms through an open-source $\mathtt{python}$ library, alongside recent state-of-the-art methods proposed in the literature.
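A minimal sketch of one standard ingredient behind such robust linear learners, coordinate-wise median-of-means gradient descent for least squares, is given below; the block count, step size, and function names are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def mom_gradient(X, y, w, n_blocks=11):
    """Coordinate-wise median-of-means of per-block least-squares gradients."""
    idx = np.random.permutation(X.shape[0])
    grads = []
    for b in np.array_split(idx, n_blocks):
        r = X[b] @ w - y[b]
        grads.append(X[b].T @ r / len(b))        # average gradient on one block
    return np.median(np.stack(grads), axis=0)    # robust aggregation across blocks

def robust_linear_fit(X, y, lr=0.1, n_iter=200, n_blocks=11):
    """Gradient descent driven by the robust gradient estimate; the step size
    should be tuned to the smoothness of the loss."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * mom_gradient(X, y, w, n_blocks)
    return w
```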
In this paper, we leverage overparameterization to design regularization-free algorithms for high-dimensional single-index models, and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single-index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To better understand the role of implicit regularization without excessive technicalities, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an overparameterized least-squares loss function by employing the score function transform and a robust truncation step designed for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and demonstrate that our methods are empirically superior to methods with explicit regularization, in terms of both the $\ell_2$-statistical rate and variable selection consistency.
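To make the overparameterization concrete, here is a hedged sketch of the vector case using the common Hadamard parametrization $\theta = u \odot u - v \odot v$ with small initialization and plain gradient descent; the score-function transform and the robust truncation step for heavy-tailed data are omitted, and all constants are illustrative.

```python
import numpy as np

def overparam_gd(X, y, alpha=1e-3, lr=0.05, n_iter=500):
    """Regularization-free gradient descent on the overparameterized
    least-squares loss with theta = u*u - v*v; a small initialization alpha
    plus early stopping induces the implicit sparsity."""
    n, d = X.shape
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    for _ in range(n_iter):
        g = X.T @ (X @ (u * u - v * v) - y) / n   # gradient w.r.t. theta
        u -= lr * 2 * u * g                       # chain rule through u
        v += lr * 2 * v * g                       # chain rule through v
    return u * u - v * v
```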
We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $\mu$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $\tau$, having an additive $\log(1/\tau)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
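For intuition only, the following naive baseline combines a coordinate-wise trimmed mean with top-$k$ thresholding; it is not the paper's stability-based estimator, merely a sketch of the truncate-then-sparsify shape of the problem.

```python
import numpy as np

def naive_sparse_mean(samples, k, trim=0.1):
    """Coordinate-wise trimmed mean followed by keeping the k largest entries.
    A naive truncate-then-sparsify baseline, NOT the stability-based filter
    analyzed in the paper."""
    n, d = samples.shape
    lo, hi = int(n * trim), int(n * (1 - trim))
    mu = np.sort(samples, axis=0)[lo:hi].mean(axis=0)   # per-coordinate trim
    out = np.zeros(d)
    keep = np.argsort(np.abs(mu))[-k:]                  # retain k largest entries
    out[keep] = mu[keep]
    return out
```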
The Lasso is a method of high-dimensional regression that is commonly used when the number of covariates $p$ is of the same order as, or larger than, the number of observations $n$. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: $(1)$ the regularized risk is non-smooth; $(2)$ the distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameter vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, the standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed design'' model. We establish non-asymptotic bounds on the distance between the distributions of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
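As a sketch of the kind of correction involved, the snippet below computes a debiased Lasso estimate with a degrees-of-freedom adjustment, dividing the correction term by $n - \|\widehat{\boldsymbol{\theta}}\|_0$ rather than $n$; the exact form of the adjustment in the paper may differ, and `Sigma_inv` (the precision matrix of the design) is assumed to be known.

```python
import numpy as np
from sklearn.linear_model import Lasso

def dof_adjusted_debiased_lasso(X, y, lam, Sigma_inv):
    """Debiased Lasso with a degrees-of-freedom adjustment: the correction term
    is divided by n - |supp(theta_hat)| instead of n (a sketch of the kind of
    adjustment the paper shows is needed; the exact form may differ)."""
    n = X.shape[0]
    theta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    df = np.count_nonzero(theta)                  # degrees of freedom of the fit
    return theta + Sigma_inv @ X.T @ (y - X @ theta) / (n - df)
```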
Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework that includes local linear approximation, mirror descent, iterative thresholding, DC programming, and many other instances as particular cases. The re-characterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems, under some regularity conditions, the obtained estimators, as fixed points of the surrogates, though not necessarily local minimizers, enjoy provable statistical guarantees, and the iterate sequence can be shown to approach the statistical truth within the desired accuracy at a fast rate. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness, by carefully controlling the stepsize and relaxation parameters.
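A concrete member of this family is iterative soft-thresholding, where the quadratic upper bound of the least-squares loss serves as the surrogate; a minimal sketch (no acceleration, illustrative step size):

```python
import numpy as np

def ista(X, y, lam, n_iter=300):
    """Iterative soft-thresholding: one instance of the Bregman-surrogate
    family, using a quadratic surrogate of the least-squares loss plus l1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        z = w - X.T @ (X @ w - y) / (n * L)      # gradient step on the surrogate
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # l1 prox
    return w
```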
This paper presents a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as trend filtering and dyadic CART. The resulting cross-validated versions are shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. Previously, there existed no theoretical analyses of cross-validated versions of trend filtering or dyadic CART. To illustrate the generality of the framework, we also propose and study cross-validated versions of two fundamental estimators: the Lasso for high-dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by ideas of Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods that use tuning parameters.
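A hedged sketch of the flavor of such a framework: fit on odd-indexed observations, interpolate predictions to the even-indexed ones, score, and refit on all data with the selected tuning parameter. The `denoise(y, lam)` callable and the two-fold split are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def cv_tune(y, denoise, lambdas):
    """Two-fold CV for signal denoising: estimate on odd-indexed entries,
    interpolate to even-indexed ones, score, then refit on all data with the
    winning tuning parameter. `denoise(y, lam)` is any tuning-parameter-based
    denoiser returning fitted values."""
    n = len(y)
    odd, even = np.arange(1, n, 2), np.arange(0, n, 2)
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        pred = np.interp(even, odd, denoise(y[odd], lam))  # held-out predictions
        err = np.mean((pred - y[even]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return denoise(y, best_lam), best_lam
```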
High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
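The sketch below mimics the retire loop: an asymmetric Huberized loss drives a proximal-gradient inner solver, and the $\ell_1$ weights are refreshed between outer iterations. The specific score function and the crude hard reweighting rule are illustrative stand-ins for the folded-concave (e.g., SCAD) derivative used in the paper, which also uses a semismooth Newton coordinate descent solver rather than proximal gradient.

```python
import numpy as np

def retire_grad(X, y, beta, tau, c):
    """Gradient of an asymmetric (expectile-flavored) Huber loss: residuals are
    clipped at c and weighted tau / (1 - tau) according to their sign."""
    r = y - X @ beta
    w = np.where(r > 0, tau, 1 - tau)
    return -X.T @ (w * np.clip(r, -c, c)) / len(y)

def irl1_retire(X, y, lam, tau=0.5, c=1.345, n_outer=3, n_inner=200):
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2           # safe proximal-gradient step
    beta, pen = np.zeros(d), np.full(d, lam)
    for _ in range(n_outer):                     # iteratively reweighted l1 (LLA)
        for _ in range(n_inner):
            z = beta - lr * retire_grad(X, y, beta, tau, c)
            beta = np.sign(z) * np.maximum(np.abs(z) - lr * pen, 0.0)
        pen = lam * (np.abs(beta) < lam)         # crude stand-in for the
                                                 # SCAD derivative reweighting
    return beta
```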
We propose a general optimization-based framework for computing differentially private M-estimators, as well as a new method for constructing differentially private confidence regions. Firstly, we show that robust statistics can be used in conjunction with noisy gradient descent or noisy Newton methods in order to obtain optimal private estimators with global linear or quadratic convergence, respectively. We establish local and global convergence guarantees under local strong convexity and self-concordance, showing that our private estimators converge, with high probability, to a nearly optimal neighborhood of the non-private M-estimators. Secondly, we tackle the problem of parametric inference by constructing differentially private estimators of the asymptotic variance of our private M-estimators. This naturally leads to approximate pivotal statistics for constructing confidence regions and conducting hypothesis testing. We demonstrate the effectiveness of a bias correction that leads to enhanced small-sample empirical performance in simulations. We illustrate the benefits of our methods in several numerical examples.
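A minimal sketch of the noisy-gradient-descent half of this recipe, with per-sample gradient clipping as the robustification and Gaussian noise calibrated to the clipped sensitivity; `per_sample_grads` (returning an $n \times d$ array) and all constants are illustrative, and the calibration of `sigma` to a privacy budget is omitted.

```python
import numpy as np

def dp_noisy_gd(per_sample_grads, theta0, n, lr=0.1, T=100, clip=1.0, sigma=1.0):
    """Noisy gradient descent: per-sample gradients are norm-clipped (the
    robust-statistics ingredient) and Gaussian noise scaled to the clipped
    sensitivity is added at every round."""
    rng = np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(T):
        g = per_sample_grads(theta)                       # shape (n, d)
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # clipping
        noise = rng.normal(scale=sigma * clip / n, size=theta.shape)
        theta -= lr * (g.mean(axis=0) + noise)
    return theta
```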
In this paper, we propose a uniformly dithered one-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to three estimation problems: sparse covariance matrix estimation, sparse linear regression, and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to have bounded second or fourth moments. For each model, we propose new estimators based on the one-bit quantized data. In the sub-Gaussian regime, our estimators achieve near minimax optimal rates up to logarithmic factors, indicating that our quantization scheme costs very little. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first in such a one-bit quantized and heavy-tailed setting, or exhibit significant improvements over existing comparable results. Moreover, we make substantial contributions to the problems of one-bit compressed sensing and one-bit matrix completion. Specifically, we extend one-bit compressed sensing to sub-Gaussian or even heavy-tailed sensing vectors via convex programming. For one-bit matrix completion, our method is essentially different from the standard likelihood approach and can handle pre-quantization random noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.
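The one-bit scheme itself is short enough to sketch: with a uniform dither $u \sim \mathrm{Unif}[-B, B]$ and $|x| \le B$, one has $\mathbb{E}[B\,\mathrm{sign}(x+u)] = x$, which is the unbiasedness the downstream estimators build on. Parameter names below are illustrative.

```python
import numpy as np

def one_bit_quantize(x, B, tau=None, rng=None):
    """Truncate -> dither -> one-bit quantize. With u ~ Unif[-B, B] and
    |x| <= B, E[B * sign(x + u)] = x, so the quantized data remain unbiased."""
    rng = rng or np.random.default_rng()
    if tau is not None:
        x = np.clip(x, -tau, tau)                 # truncation for heavy tails
    u = rng.uniform(-B, B, size=np.shape(x))      # uniform dither
    return B * np.sign(x + u)                     # one bit per entry
```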
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our major standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. Besides, we study compressed sensing where both covariate (i.e., sensing vector) and response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy near optimal error bound. Moreover, by the concentration inequality of product process and covering argument, we establish near minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.
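A sketch of the truncate-dither-quantize pipeline in the multi-bit case: with $u \sim \mathrm{Unif}[-\Delta/2, \Delta/2]$, the uniformly quantized value satisfies $\mathbb{E}[Q(x+u)] = x$ on the truncation interval. The function below is illustrative; the truncation threshold must be tuned to the assumed moment bounds.

```python
import numpy as np

def dithered_quantize(x, delta, tau, rng=None):
    """Truncate, add uniform dither over one cell, then uniformly quantize.
    For u ~ Unif[-delta/2, delta/2], E[Q(x + u)] = x whenever |x| <= tau, so
    estimators built from the quantized data stay conditionally unbiased."""
    rng = rng or np.random.default_rng()
    x = np.clip(x, -tau, tau)                            # truncation step
    u = rng.uniform(-delta / 2, delta / 2, size=np.shape(x))
    return delta * (np.floor((x + u) / delta) + 0.5)     # uniform quantizer
```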
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b \in \mathbb{R}^n$ and output a near-minimizer of $\min_{x \in \mathbb{R}^d} \|Ax - b\|$, where $A \in \mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any $0 < p < \infty$, we give an algorithm based on Lewis weight sampling that outputs a $(1+\epsilon)$-approximate solution using just $\tilde{O}(d^{\max(1, p/2)}/\mathrm{poly}(\epsilon))$ queries to $b$. We show that this dependence on $d$ is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Derezi\'{n}ski, who gave near-optimal bounds for the $\ell_1$ norm, and suboptimal bounds for $\ell_p$ regression with $p \in (1,2)$. We also provide the first total sensitivity upper bound of $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the $\ell_p$ regression result, we obtain an active regression algorithm making $\tilde{O}(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open question of Chen and Derezi\'{n}ski. For the important special case of the Huber loss, we further improve our bound to an active sample complexity of $\tilde{O}(d^{(1+\sqrt{2})/2}/\epsilon^c)$ and a non-active sample complexity of $\tilde{O}(d^{4-2\sqrt{2}}/\epsilon^c)$, improving a previous $d^4$ bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results that use sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear-time algorithms under every $\ell_p$ norm.
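For concreteness, $\ell_p$ Lewis weights can be approximated by the standard fixed-point iteration sketched below (convergent for $p < 4$); rows are then sampled proportionally to these weights and reweighted before solving the subsampled regression. This is a generic sketch, not the paper's tightened sampling scheme.

```python
import numpy as np

def lewis_weights(A, p, n_iter=30):
    """Approximate l_p Lewis weights via the fixed-point iteration
    w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}, convergent for p < 4."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = A.T @ (A * (w ** (1 - 2 / p))[:, None])      # A^T W^{1-2/p} A
        lev = np.einsum('ij,jk,ik->i', A, np.linalg.inv(M), A)
        w = np.maximum(lev, 1e-12) ** (p / 2)
    return w

# Active-regression template: sample rows of A (and the corresponding entries
# of b) with probabilities proportional to the Lewis weights, reweight, and
# solve the subsampled l_p regression problem.
```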
In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures, i.e., arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, nonstrongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.
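A sketch of the two aggregation rules at the heart of these algorithms: the server replaces the plain gradient average with a coordinate-wise median or $\beta$-trimmed mean. `worker_grads` (returning one gradient per machine) and all constants are illustrative.

```python
import numpy as np

def trimmed_mean_aggregate(grads, beta):
    """Coordinate-wise beta-trimmed mean of worker gradients (grads: m x d):
    drop the beta*m largest and smallest values in each coordinate."""
    m = grads.shape[0]
    k = int(beta * m)
    s = np.sort(grads, axis=0)
    return s[k:m - k].mean(axis=0)

def byzantine_robust_gd(worker_grads, w0, lr=0.1, T=100, beta=0.1):
    """Robust distributed GD: aggregate per-worker gradients with a
    coordinate-wise trimmed mean (or median) instead of the average."""
    w = w0.copy()
    for _ in range(T):
        grads = worker_grads(w)                  # (m, d): one gradient per machine
        w -= lr * trimmed_mean_aggregate(grads, beta)
        # median alternative: w -= lr * np.median(grads, axis=0)
    return w
```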
We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using only (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various different convex settings, we show that the (theoretical and empirical) convergence rate of ZORO is only logarithmically dependent on the problem dimension. Numerical experiments show that ZORO outperforms existing methods with similar assumptions on both synthetic and real datasets.
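A hedged sketch of the sparse gradient estimator: $m$ Rademacher-direction finite differences give noisy linear measurements of the gradient, and a sparse recovery routine reconstructs an $s$-sparse estimate (iterative hard thresholding here, as an illustrative stand-in for ZORO's compressed-sensing solver); $m$ on the order of $s\log d$ suffices when the gradient is approximately sparse.

```python
import numpy as np

def zoro_gradient(f, x, m, s, delta=1e-4, n_iht=20, rng=None):
    """Sparse zeroth-order gradient estimate: m Rademacher-direction finite
    differences y_i ~ z_i . grad f(x), then s-sparse recovery by iterative
    hard thresholding."""
    rng = rng or np.random.default_rng()
    d = len(x)
    Z = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)   # unit-norm columns
    y = np.array([(f(x + delta * z) - f(x)) / delta for z in Z])
    g = np.zeros(d)
    for _ in range(n_iht):
        g = g + Z.T @ (y - Z @ g)                # gradient step on ||y - Zg||^2
        keep = np.argsort(np.abs(g))[-s:]        # keep the s largest entries
        mask = np.zeros(d)
        mask[keep] = 1.0
        g *= mask
    return g
```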
Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning. On the one hand, it allows one to design efficient algorithms that control numerical and statistical accuracy at the same time. On the other hand, it sheds light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of regression and inverse problems, we develop an iterative regularization approach based on the use of the hinge loss function. More precisely, we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence. Our approach compares favorably with other alternatives, as also confirmed by numerical simulations.
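As a sketch of the idea (not the paper's exact diagonal scheme), plain subgradient descent on the unregularized average hinge loss, with the iteration count acting as the regularization parameter:

```python
import numpy as np

def hinge_early_stopping(X, y, lr=0.01, T=1000):
    """Subgradient descent on the unregularized average hinge loss with labels
    y in {-1, +1}; the number of iterations T plays the role of the
    regularization parameter (early stopping)."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(T):
        active = y * (X @ w) < 1                 # margin violators
        sub = -(X[active] * y[active, None]).sum(axis=0) / n
        w -= lr * sub
        iterates.append(w.copy())                # pick the stopping time t by
    return w, iterates                           # validation along this path
```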
We develop machinery to design efficiently computable and consistent estimators, whose estimation error goes to zero as the number of observations grows, when facing possibly corrupted responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for the optimal sample size $n \gtrsim (k\log d)/\alpha^2$ and the optimal error rate $O(\sqrt{(k\log d)/(n\alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log\log n)$, even for (non-spherical) Gaussian design matrices. The results hold under weak design assumptions; in the presence of such general noise they were only recently shown, by d'Orsi et al. [dNS21], in the dense setting (i.e., general linear regression). In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (of the kind usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers, such as the $\ell_1$ norm or the nuclear norm, and analyze the resulting loss function with a novel extension of the approach of [dNS21]. Our machinery appears to be easily applicable to a wide range of estimation problems.
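The estimator template is easy to sketch: proximal gradient descent on the Huber loss plus an $\ell_1$ penalty, with the nuclear norm taking the penalty's place in the PCA setting; constants below are illustrative.

```python
import numpy as np

def huber_l1_regression(X, y, lam, c=1.345, T=500):
    """Proximal gradient descent on Huber loss + l1 penalty (the nuclear norm
    plays the same role in the PCA setting)."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2            # safe step for the smooth part
    beta = np.zeros(d)
    for _ in range(T):
        g = -X.T @ np.clip(y - X @ beta, -c, c) / n   # Huber score of residuals
        z = beta - lr * g
        beta = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)  # l1 prox
    return beta
```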
Outliers widely occur in big-data applications and may severely affect statistical estimation and inference. In this paper, a framework of outlier-resistant estimation is introduced to robustify an arbitrarily given loss function. It has a close connection to the method of trimming, and it includes explicit outlyingness parameters for all samples, which in turn facilitates computation, theory, and parameter tuning. To tackle the issues of nonconvexity and nonsmoothness, we develop scalable algorithms with implementation ease and guaranteed fast convergence. In particular, a new technique is proposed to alleviate the requirements on the starting point, such that on regular datasets the number of data resamplings can be substantially reduced. Based on combined statistical and computational treatments, we are able to perform nonasymptotic analysis beyond M-estimation. The obtained resistant estimators, though not necessarily globally or even locally optimal, enjoy minimax rate optimality in both low dimensions and high dimensions. Experiments in regression, classification, and neural networks demonstrate the excellent performance of the proposed methodology in the presence of gross outliers.
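A trimming-flavored sketch of the explicit-outlyingness idea: alternate between fitting the model on shifted responses and hard-thresholding residuals so that at most $q$ samples carry a nonzero outlyingness parameter. This ignores the paper's starting-point technique and its general nonconvex penalties.

```python
import numpy as np

def resistant_ls(X, y, q, T=50):
    """Alternating least squares with explicit outlyingness parameters gamma_i:
    fit beta on shifted responses, then let only the q largest residuals carry
    nonzero gamma (a hard-thresholding, trimming-style rule)."""
    n, d = X.shape
    gamma = np.zeros(n)
    beta = np.zeros(d)
    for _ in range(T):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        r = y - X @ beta
        gamma = np.zeros(n)
        worst = np.argsort(np.abs(r))[-q:]       # flag the q largest residuals
        gamma[worst] = r[worst]                  # absorb them as outlyingness
    return beta, gamma
```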
We establish a simple connection between robust and differentially-private algorithms: private mechanisms which perform well with very high probability are automatically robust in the sense that they retain accuracy even if a constant fraction of the samples they receive are adversarially corrupted. Since optimal mechanisms typically achieve these high success probabilities, our results imply that optimal private mechanisms for many basic statistics problems are robust. We investigate the consequences of this observation for both algorithms and computational complexity across different statistical problems. Assuming the Brennan-Bresler secret-leakage planted clique conjecture, we demonstrate a fundamental tradeoff between computational efficiency, privacy leakage, and success probability for sparse mean estimation. Private algorithms which match this tradeoff are not yet known -- we achieve that (up to polylogarithmic factors) in a polynomially-large range of parameters via the Sum-of-Squares method. To establish an information-computation gap for private sparse mean estimation, we also design new (exponential-time) mechanisms using fewer samples than efficient algorithms must use. Finally, we give evidence for privacy-induced information-computation gaps for several other statistics and learning problems, including PAC learning parity functions and estimation of the mean of a multivariate Gaussian.
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples.
We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution from $\tilde{O}(d)$ independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require $\Omega(d^{1.5})$ samples, or satisfy only the weaker concentrated or approximate differential privacy conditions. In particular, all prior polynomial-time algorithms require $d^{1+\Omega(1)}$ samples to guarantee small privacy loss with "cryptographically" high probability, $1-2^{-d^{\Omega(1)}}$, whereas our algorithm retains $\tilde{O}(d)$ sample complexity even in this stringent setting. Our main technique is a new approach for using the powerful Sum-of-Squares (SoS) method to design differentially private algorithms. The proofs-to-algorithms paradigm is a key theme in numerous recent works in high-dimensional algorithmic statistics: estimators which apparently require exponential running time, but whose analysis can be captured by low-degree Sum-of-Squares proofs, can be automatically turned into polynomial-time algorithms with the same provable guarantees. We demonstrate a similar proofs-to-private-algorithms phenomenon: instances of the workhorse exponential mechanism which apparently require exponential time, but which can be analyzed with low-degree SoS proofs, can be automatically turned into polynomial-time differentially private algorithms. We prove a meta-theorem capturing this phenomenon, which we expect to be of broad use in private algorithm design. Our techniques also draw new connections between differentially private and robust statistics in high dimensions. In particular, viewed through our proofs-to-private-algorithms lens, several well-studied SoS proofs from recent works in algorithmic robust statistics directly yield key components of our differentially private mean estimation algorithm.
Motivated by recent increased interest in optimization algorithms for non-convex optimization, in applications to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems cannot be solved efficiently in a reasonable time. Then we give a list of problems whose global minimizers can be found efficiently by exploiting the structure of the problem as much as possible. Another way to deal with non-convexity is to relax the goal from finding a global minimum to finding a stationary point or a local minimum. For this setting, we first present known results for the convergence rates of deterministic first-order methods, followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes and an overview of stochastic first-order methods. Afterwards, we discuss quite general classes of non-convex problems, such as the minimization of $\alpha$-weakly-quasi-convex functions and of functions satisfying the Polyak-Lojasiewicz condition, which still allow for theoretical convergence guarantees of first-order methods. We then consider higher-order and zeroth-order/derivative-free methods and their convergence rates for non-convex optimization problems.