我们建议在线变更点检测的新过程。我们的方法扩展了一个想法,即在变更前和变换后分布之间最大化差异度量。这将导致一个适合参数和非参数场景的灵活过程。我们证明了程序的平均运行长度及其预期的检测延迟。通过关于合成和现实世界数据集的数值实验来说明该算法的效率。
translated by 谷歌翻译
在这项工作中,我们对香草生成的对抗网络(GAN)的非渐近性质进行了彻底的研究。We derive theoretical guarantees for the density estimation with GANs under a proper choice of the deep neural networks classes representing generators and discriminators.特别是,我们证明了由此产生的估计会聚到真实密度$ \ mathsf {p} ^ * $以jensen-shannon(js)以$(\ log {n} / n)^ {2 \Beta /(2 \ beta + d)} $ why $ n $是样本大小和$ \ beta $ commentines $ \ mathsf {p} ^ * $的平滑度。据我们所知,这是使用Vanilla Gans的浓度估计的文献中的第一个结果,这些融合率比N ^ { - 1/2} $更快地在政权$ \ beta> D / 2 $中。此外,我们表明所获得的速率是考虑的密度类别的最低限度最佳(最高因子因子)。
translated by 谷歌翻译
尽管U统计量在现代概率和统计学中存在着无处不在的,但其在依赖框架中的非反应分析可能被忽略了。在最近的一项工作中,已经证明了对统一的马尔可夫链的U级统计数据的新浓度不平等。在本文中,我们通过在三个不同的研究领域中进一步推动了当前知识状态,将这一理论突破付诸实践。首先,我们为使用MCMC方法估算痕量类积分运算符光谱的新指数不平等。新颖的是,这种结果适用于具有正征和负征值的内核,据我们所知,这是新的。此外,我们研究了使用成对损失函数和马尔可夫链样品的在线算法的概括性能。我们通过展示如何从任何在线学习者产生的假设序列中提取低风险假设来提供在线到批量转换结果。我们最终对马尔可夫链的不变度度量的密度进行了拟合优度测试的非反应分析。我们确定了一些类别的替代方案,基于$ L_2 $距离的测试具有规定的功率。
translated by 谷歌翻译
本文提出了新的偏差不等式,其在多武装强盗模型中的自适应采样下均匀地均匀。使用给定的一维指数家庭中的kullback-leibler发散来测量偏差,并且可以一次考虑几个臂。它们是通过基于分层的每个臂鞅构造而构建的,并通过将那些鞅乘以来获得。我们的偏差不平等允许我们根据广义概率比来分析一大类连续识别问题的概要概率比,并且为臂的装置的某些功能构造紧密的置信区间。
translated by 谷歌翻译
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any H\"{o}lder smooth function up to a given approximation error in H\"{o}lder norms in such a way that all weights of this neural network are bounded by $1$. The latter feature is essential to control generalization errors in many statistical and machine learning applications.
translated by 谷歌翻译
我们在决策边界是一定规律的假设下,研究从无噪声训练样本的学习分类功能的问题。我们为这一估计问题建立了普遍的下限,对于连续决策边界的一般阶级。对于本地禁区的类别,我们发现最佳估计率基本上独立于底层维度,并且可以通过在适当类的深神经网络上通过经验风险最小化方法实现。这些结果基于$ l ^ 1 $和$ l ^ \ infty $ intty $ inthty $ off的禁区常规职能的新颖估计数。
translated by 谷歌翻译
量化概率分布之间的异化的统计分歧(SDS)是统计推理和机器学习的基本组成部分。用于估计这些分歧的现代方法依赖于通过神经网络(NN)进行参数化经验变化形式并优化参数空间。这种神经估算器在实践中大量使用,但相应的性能保证是部分的,并呼吁进一步探索。特别是,涉及的两个错误源之间存在基本的权衡:近似和经验估计。虽然前者需要NN课程富有富有表现力,但后者依赖于控制复杂性。我们通过非渐近误差界限基于浅NN的基于浅NN的估计的估算权,重点关注四个流行的$ \ mathsf {f} $ - 分离 - kullback-leibler,chi squared,squared hellinger,以及总变异。我们分析依赖于实证过程理论的非渐近功能近似定理和工具。界限揭示了NN尺寸和样品数量之间的张力,并使能够表征其缩放速率,以确保一致性。对于紧凑型支持的分布,我们进一步表明,上述上三次分歧的神经估算器以适当的NN生长速率接近Minimax率 - 最佳,实现了对数因子的参数速率。
translated by 谷歌翻译
This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by the ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function and the difference between these two functions furnishes a low bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
translated by 谷歌翻译
我们提出了一种基于最大平均差异(MMD)的新型非参数两样本测试,该测试是通过具有不同核带宽的聚合测试来构建的。这种称为MMDAGG的聚合过程可确保对所使用的内核的收集最大化测试能力,而无需持有核心选择的数据(这会导致测试能力损失)或任意内核选择,例如中位数启发式。我们在非反应框架中工作,并证明我们的聚集测试对Sobolev球具有最小自适应性。我们的保证不仅限于特定的内核,而是符合绝对可集成的一维翻译不变特性内核的任何产品。此外,我们的结果适用于流行的数值程序来确定测试阈值,即排列和野生引导程序。通过对合成数据集和现实世界数据集的数值实验,我们证明了MMDAGG优于MMD内核适应的替代方法,用于两样本测试。
translated by 谷歌翻译
对于高维和非参数统计模型,速率最优估计器平衡平方偏差和方差是一种常见的现象。虽然这种平衡被广泛观察到,但很少知道是否存在可以避免偏差和方差之间的权衡的方法。我们提出了一般的策略,以获得对任何估计方差的下限,偏差小于预先限定的界限。这表明偏差差异折衷的程度是不可避免的,并且允许量化不服从其的方法的性能损失。该方法基于许多抽象的下限,用于涉及关于不同概率措施的预期变化以及诸如Kullback-Leibler或Chi-Sque-diversence的信息措施的变化。其中一些不平等依赖于信息矩阵的新概念。在该物品的第二部分中,将抽象的下限应用于几种统计模型,包括高斯白噪声模型,边界估计问题,高斯序列模型和高维线性回归模型。对于这些特定的统计应用,发生不同类型的偏差差异发生,其实力变化很大。对于高斯白噪声模型中集成平方偏置和集成方差之间的权衡,我们将较低界限的一般策略与减少技术相结合。这允许我们将原始问题与估计的估计器中的偏差折衷联动,以更简单的统计模型中具有额外的对称性属性。在高斯序列模型中,发生偏差差异的不同相位转换。虽然偏差和方差之间存在非平凡的相互作用,但是平方偏差的速率和方差不必平衡以实现最小估计速率。
translated by 谷歌翻译
我们考虑$ k $武装的随机土匪,并考虑到$ t $ t $的累积后悔界限。我们对同时获得最佳订单$ \ sqrt {kt} $的策略感兴趣,并与发行依赖的遗憾相关,即与$ \ kappa \ ln t $相匹配,该遗憾是最佳的。和Robbins(1985)以及Burnetas和Katehakis(1996),其中$ \ kappa $是最佳问题依赖性常数。这个常数的$ \ kappa $取决于所考虑的模型$ \ Mathcal {d} $(武器上可能的分布家族)。 M \'Enard and Garivier(2017)提供了在一维指数式家庭给出的模型的参数案例中实现这种双重偏见的策略,而Lattimore(2016,2018)为(Sub)高斯分布的家族而做到了这一点。差异小于$ 1 $。我们将此结果扩展到超过$ [0,1] $的所有分布的非参数案例。我们通过结合Audibert和Bubeck(2009)的MOSS策略来做到这一点,该策略享受了最佳订单$ \ sqrt {kt} $的无分配遗憾,以及Capp \'e等人的KL-UCB策略。 (2013年),我们为此提供了对最佳分布$ \ kappa \ ln t $遗憾的首次分析。我们能够在努力简化证明(以前已知的遗憾界限,因此进行的新分析)时,能够获得这种非参数两次审查结果;因此,本贡献的第二个优点是为基于$ k $武装的随机土匪提供基于索引的策略的经典后悔界限的证明。
translated by 谷歌翻译
深度分离结果提出了对深度神经网络过较浅的架构的好处的理论解释,建立前者具有卓越的近似能力。然而,没有已知的结果,其中更深的架构利用这种优势成为可提供的优化保证。我们证明,当数据由具有满足某些温和假设的径向对称的分布产生的数据时,梯度下降可以使用具有两层S形激活的深度2神经网络有效地学习球指示器功能,并且隐藏层固定在一起训练。由于众所周知,当使用用单层非线性的深度2网络(Safran和Shamir,2017)使用深度2网络时,球指示器难以近似于一定的重型分配,这建立了我们最好的知识,基于第一优化的分离结果,其中近似架构的近似效益在实践中可怕的。我们的证明技术依赖于随机特征方法,该方法减少了用单个神经元学习的问题,其中新工具需要在数据分布重尾时显示梯度下降的收敛。
translated by 谷歌翻译
我们研究了在存在$ \ epsilon $ - 对抗异常值的高维稀疏平均值估计的问题。先前的工作为此任务获得了该任务的样本和计算有效算法,用于辅助性Subgaussian分布。在这项工作中,我们开发了第一个有效的算法,用于强大的稀疏平均值估计,而没有对协方差的先验知识。对于$ \ Mathbb r^d $上的分布,带有“认证有限”的$ t $ tum-矩和足够轻的尾巴,我们的算法达到了$ o(\ epsilon^{1-1/t})$带有样品复杂性$的错误(\ epsilon^{1-1/t}) m =(k \ log(d))^{o(t)}/\ epsilon^{2-2/t} $。对于高斯分布的特殊情况,我们的算法达到了$ \ tilde o(\ epsilon)$的接近最佳错误,带有样品复杂性$ m = o(k^4 \ mathrm {polylog}(d)(d))/\ epsilon^^ 2 $。我们的算法遵循基于方形的总和,对算法方法的证明。我们通过统计查询和低度多项式测试的下限来补充上限,提供了证据,表明我们算法实现的样本时间 - 错误权衡在质量上是最好的。
translated by 谷歌翻译
三角形流量,也称为kn \“{o}的Rosenblatt测量耦合,包括用于生成建模和密度估计的归一化流模型的重要构建块,包括诸如实值的非体积保存变换模型的流行自回归流模型(真实的NVP)。我们提出了三角形流量统计模型的统计保证和样本复杂性界限。特别是,我们建立了KN的统计一致性和kullback-leibler估算器的rospblatt的kullback-leibler估计的有限样本会聚率使用实证过程理论的工具测量耦合。我们的结果突出了三角形流动下播放功能类的各向异性几何形状,优化坐标排序,并导致雅各比比流动的统计保证。我们对合成数据进行数值实验,以说明我们理论发现的实际意义。
translated by 谷歌翻译
Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to log nfactors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.
translated by 谷歌翻译
专家(MOE)的混合是一种流行的统计和机器学习模型,由于其灵活性和效率,多年来一直引起关注。在这项工作中,我们将高斯门控的局部MOE(GLOME)和块对基因协方差局部MOE(Blome)回归模型在异质数据中呈现非线性关系,并在高维预测变量之间具有潜在的隐藏图形结构相互作用。这些模型从计算和理论角度提出了困难的统计估计和模型选择问题。本文致力于研究以混合成分数量,高斯平均专家的复杂性以及协方差矩阵的隐藏块 - 基因结构为特征的Glome或Blome模型集合中的模型选择问题。惩罚最大似然估计框架。特别是,我们建立了以弱甲骨文不平等的形式的非反应风险界限,但前提是罚款的下限。然后,在合成和真实数据集上证明了我们的模型的良好经验行为。
translated by 谷歌翻译
We consider the problem of estimating the optimal transport map between a (fixed) source distribution $P$ and an unknown target distribution $Q$, based on samples from $Q$. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when $P$ and $Q$ have densities bounded above and below and when the transport map lies in a H\"older class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfies a Poincar\'e inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and H\"older transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
translated by 谷歌翻译
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
translated by 谷歌翻译
我们提出了一种统一的技术,用于顺序估计分布之间的凸面分歧,包括内核最大差异等积分概率度量,$ \ varphi $ - 像Kullback-Leibler发散,以及最佳运输成本,例如Wassersein距离的权力。这是通过观察到经验凸起分歧(部分有序)反向半角分离的实现来实现的,而可交换过滤耦合,其具有这些方法的最大不等式。这些技术似乎是对置信度序列和凸分流的现有文献的互补和强大的补充。我们构建一个离线到顺序设备,将各种现有的离线浓度不等式转换为可以连续监测的时间均匀置信序列,在任意停止时间提供有效的测试或置信区间。得到的顺序边界仅在相应的固定时间范围内支付迭代对数价格,保留对问题参数的相同依赖性(如适用的尺寸或字母大小)。这些结果也适用于更一般的凸起功能,如负差分熵,实证过程的高度和V型统计。
translated by 谷歌翻译
We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds capture its dependence on Bellman fluctuations, mixing time of the Markov chain, any mis-specification in the model, as well as the choice of weight function defining the estimator itself, and reveal some delicate interactions between mixing time and model mis-specification. For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d. sample transition pairs, whereas under mis-specification, temporal dependence in data inflates the statistical error. However, any such deterioration can be mitigated by increased look-ahead. We complement our upper bounds by proving minimax lower bounds that establish optimality of TD-based methods with appropriately chosen look-ahead and weighting, and reveal some fundamental differences between value function estimation and ordinary non-parametric regression.
translated by 谷歌翻译