聚集专家预测的问题在宽范围内普遍存在的机器学习,经济学,气候科学以及国家安全。尽管如此,我们对这个问题的理论理解相当浅薄。本文启动了在从广泛的信息结构上对专家知识选择专家知识的背景下的预测聚集。虽然在全面地处于完全普遍的情况下,无法实现非竞争性能保证,但我们表明,在我们呼叫\ emph {投影替代}的专家信息结构的条件下,我们可以做到这一点。投影替代条件是信息替代品的概念:越来越减少边际回报以学习专家的信号。我们表明,根据投影替代条件,专家预测的平均值就基本上提高了信任随机专家的战略。然后,我们考虑更允许的设置,其中聚合器可以访问先前。我们展示了通过平均专家预测,然后通过将其从前恒定的持续因素移动到之前的平均值来表明,这一增殖器的性能保证基本上比可能的情况更好。我们的结果为过去的极端化的实证研究提供了理论基础,并有助于向极端化的适当金额提供指导。
translated by 谷歌翻译
我们考虑了贝叶斯的预测汇总模型,在观察了关于未知二进制事件的私人信号之后,$ n $专家向校长报告了有关事件的后验信念,然后将报告汇总为事件的单个预测。专家的信号和事件的结果遵循校长未知的联合分配,但校长可以访问I.I.D.来自分布的“样本”,每个样本都是专家报告的元组(不是信号)和事件的实现。使用这些样品,主要目的是找到$ \ varepsilon $ - 易于最佳(贝叶斯)聚合器。我们研究此问题的样本复杂性。我们表明,对于任意离散分布,样本的数量必须至少为$ \ tilde \ omega(m^{n-2} / \ varepsilon)$,其中$ m $是每个专家信号空间的大小。该样本复杂性在专家$ n $的数量中成倍增长。但是,如果专家的信号是独立的,以实现事件的实现为条件,那么样本复杂性将大大降低到$ \ tilde o(1 / \ varepsilon^2)$,这不取决于$ n $。
translated by 谷歌翻译
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
translated by 谷歌翻译
我们基于电子价值开发假设检测理论,这是一种与p值不同的证据,允许毫不费力地结合来自常见场景中的几项研究的结果,其中决定执行新研究可能取决于以前的结果。基于E-V值的测试是安全的,即它们在此类可选的延续下保留I型错误保证。我们将增长速率最优性(GRO)定义为可选的连续上下文中的电力模拟,并且我们展示了如何构建GRO E-VARIABLE,以便为复合空缺和替代,强调模型的常规测试问题,并强调具有滋扰参数的模型。 GRO E值采取具有特殊前瞻的贝叶斯因子的形式。我们使用几种经典示例说明了该理论,包括一个样本安全T检验(其中右哈尔前方的右手前锋为GE)和2x2差价表(其中GRE之前与标准前沿不同)。分享渔业,奈曼和杰弗里斯·贝叶斯解释,电子价值观和相应的测试可以提供所有三所学校的追随者可接受的方法。
translated by 谷歌翻译
We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is a polynomial factor approximation to the this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground truth distance, which we show is information-theoretically optimal. Our work thus establishes fundamental lower and upper bounds on measuring distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
translated by 谷歌翻译
部分可观察到的马尔可夫决策过程(POMDPS)是加强学习的自然和一般模型,以考虑到代理人对其当前国家的不确定性。在POMDPS的文献中,习惯性地假设在已知参数时计算最佳策略的规划Oracle,即使已知问题是计算的。几乎所有现有的规划算法都在指数时间内运行,缺乏可证明的性能保证,或者需要在每个可能的政策下对转换动态进行强烈的假设。在这项工作中,我们重新审视了规划问题并问:是否有自然和积极的假设,使计划变得容易?我们的主要结果是用于规划(一步)可观察POMDPS的QuasioInomial-time算法。具体而言,我们假设各国的分离良好的分布导致分开的观察分布,因此观察结果在每一步中至少有一些信息。至关重要的是,这个假设没有对POMDP的过渡动态的限制;尽管如此,它意味着近乎最佳的政策承认准简洁的描述,这通常不是真实的(在标准的硬度假设下)。我们的分析基于滤波器稳定性的新定量界限 - 即潜在状态的最佳滤波器的速率忘记其初始化。此外,在指数时间假设下,我们证明了在可观察POMDPS中规划的匹配硬度。
translated by 谷歌翻译
当在未知约束集中任意变化的分布中生成数据时,我们会考虑使用专家建议的预测。这种半反向的设置包括(在极端)经典的I.I.D.设置时,当未知约束集限制为单身人士时,当约束集是所有分布的集合时,不受约束的对抗设置。对冲状态中,对冲算法(长期以来已知是最佳的最佳速率(速率))最近被证明是对I.I.D.的最佳最小值。数据。在这项工作中,我们建议放松I.I.D.通过在约束集的所有自然顺序上寻求适应性来假设。我们在各个级别的Minimax遗憾中提供匹配的上限和下限,表明确定性学习率的对冲在极端之外是次优的,并证明人们可以在各个级别的各个层面上都能适应Minimax的遗憾。我们使用以下规范化领导者(FTRL)框架实现了这种最佳适应性,并采用了一种新型的自适应正则化方案,该方案隐含地缩放为当前预测分布的熵的平方根,而不是初始预测分布的熵。最后,我们提供了新的技术工具来研究FTRL沿半逆转频谱的统计性能。
translated by 谷歌翻译
聚类是无监督学习中的基本原始,它引发了丰富的计算挑战性推理任务。在这项工作中,我们专注于将$ D $ -dimential高斯混合的规范任务与未知(和可能的退化)协方差集成。最近的作品(Ghosh等人。恢复在高斯聚类实例中种植的某些隐藏结构。在许多类似的推理任务上的工作开始,这些较低界限强烈建议存在群集的固有统计到计算间隙,即群集任务是\ yringit {statistically}可能但没有\ texit {多项式 - 时间}算法成功。我们考虑的聚类任务的一个特殊情况相当于在否则随机子空间中找到种植的超立体载体的问题。我们表明,也许令人惊讶的是,这种特定的聚类模型\ extent {没有展示}统计到计算间隙,即使在这种情况下继续应用上述的低度和SOS下限。为此,我们提供了一种基于Lenstra - Lenstra - Lovasz晶格基础减少方法的多项式算法,该方法实现了$ D + 1 $样本的统计上最佳的样本复杂性。该结果扩展了猜想统计到计算间隙的问题的类问题可以通过“脆弱”多项式算法“关闭”,突出显示噪声在统计到计算间隙的发作中的关键而微妙作用。
translated by 谷歌翻译
我们考虑激励探索:一种多臂匪徒的版本,其中武器的选择由自私者控制,而算法只能发布建议。该算法控制信息流,信息不对称可以激励代理探索。先前的工作达到了最佳的遗憾率,直到乘法因素,这些因素根据贝叶斯先验而变得很大,并在武器数量上成倍规模扩展。采样每只手臂的一个更基本的问题一旦遇到了类似的因素。我们专注于激励措施的价格:出于激励兼容的目的,绩效的损失,广泛解释为。我们证明,如果用足够多的数据点初始化,则标准的匪徒汤普森采样是激励兼容的。因此,当收集这些数据点时,由于激励措施的绩效损失仅限于初始回合。这个问题主要降低到样本复杂性的问题:需要多少个回合?我们解决了这个问题,提供了匹配的上限和下限,并在各种推论中实例化。通常,最佳样品复杂性在“信念强度”中的武器数量和指数中是多项式。
translated by 谷歌翻译
我们给出了\ emph {list-codobable协方差估计}的第一个多项式时间算法。对于任何$ \ alpha> 0 $,我们的算法获取输入样本$ y \ subseteq \ subseteq \ mathbb {r}^d $ size $ n \ geq d^{\ mathsf {poly}(1/\ alpha)} $获得通过对抗损坏I.I.D的$(1- \ alpha)n $点。从高斯分布中的样本$ x $ size $ n $,其未知平均值$ \ mu _*$和协方差$ \ sigma _*$。在$ n^{\ mathsf {poly}(1/\ alpha)} $ time中,它输出$ k = k(\ alpha)=(1/\ alpha)^{\ mathsf {poly}的常数大小列表(1/\ alpha)} $候选参数,具有高概率,包含$(\ hat {\ mu},\ hat {\ sigma})$,使得总变化距离$ tv(\ Mathcal {n}(n})(n}(n})( \ mu _*,\ sigma _*),\ Mathcal {n}(\ hat {\ mu},\ hat {\ sigma}))<1-o _ {\ alpha}(1)$。这是距离的统计上最强的概念,意味着具有独立尺寸误差的参数的乘法光谱和相对Frobenius距离近似。我们的算法更普遍地适用于$(1- \ alpha)$ - 任何具有低度平方总和证书的分布$ d $的损坏,这是两个自然分析属性的:1)一维边际和抗浓度2)2度多项式的超收缩率。在我们工作之前,估计可定性设置的协方差的唯一已知结果是针对Karmarkar,Klivans和Kothari(2019),Raghavendra和Yau(2019和2019和2019和2019和2019年)的特殊情况。 2020年)和巴克西(Bakshi)和科塔里(Kothari)(2020年)。这些结果需要超级物理时间,以在基础维度中获得任何子构误差。我们的结果意味着第一个多项式\ emph {extcect}算法,用于列表可解码的线性回归和子空间恢复,尤其允许获得$ 2^{ - \ Mathsf { - \ Mathsf {poly}(d)} $多项式时间错误。我们的结果还意味着改进了用于聚类非球体混合物的算法。
translated by 谷歌翻译
在这里,我们重新审视线性二次估计的经典问题,即估计线性动力系统从嘈杂测量的轨迹。当测量噪声是高斯时,庆祝的卡尔曼滤波器提供了最佳估计器,但是当一个人偏离这种假设时,广泛众所周知,众所周知会破裂。当噪音重尾时。许多临时启发式机启发式就是处理异常值的实践中。在开创性的工作中,Schick和Mitter在测量噪声是高斯的已知无穷无尽的扰动时给予了可证明的保证,并提出了一个可以获得类似的禁令的重要担保的重要问题。在这项工作中,我们给出了一个真正强大的过滤器:当甚至恒定的测量分数都存在对比腐败时,我们给出了线性二次估计的第一个强化保证。该框架可以模拟重型且甚至是非静止噪声过程。我们的算法在与知道损坏位置的最佳算法竞争的意义上强调了卡尔曼过滤器。我们的作品处于挑战性的贝叶斯环境,其中测量数量与我们需要估计的复杂性缩放。此外,在线性动态系统中过去信息随时间衰减。我们开发了一套新技术,以强大地提取不同时间步长和不同时间尺度的信息。
translated by 谷歌翻译
大部分强化学习理论都建立在计算上难以实施的甲板上。专门用于在部分可观察到的马尔可夫决策过程(POMDP)中学习近乎最佳的政策,现有算法要么需要对模型动态(例如确定性过渡)做出强有力的假设,要么假设访问甲骨文作为解决艰难的计划或估算问题的访问子例程。在这项工作中,我们在合理的假设下开发了第一个用于POMDP的无Oracle学习算法。具体而言,我们给出了一种用于在“可观察” pomdps中学习的准化性时间端到端算法,其中可观察性是一个假设,即对国家而言,分离良好的分布诱导了分离良好的分布分布而不是观察。我们的技术规定了在不确定性下使用乐观原则来促进探索的更传统的方法,而是在构建策略涵盖的情况下提供了一种新颖的barycentric跨度应用。
translated by 谷歌翻译
我们在高斯分布下使用Massart噪声与Massart噪声进行PAC学习半个空间的问题。在Massart模型中,允许对手将每个点$ \ mathbf {x} $的标签与未知概率$ \ eta(\ mathbf {x})\ leq \ eta $,用于某些参数$ \ eta \ [0,1 / 2] $。目标是找到一个假设$ \ mathrm {opt} + \ epsilon $的错误分类错误,其中$ \ mathrm {opt} $是目标半空间的错误。此前已经在两个假设下研究了这个问题:(i)目标半空间是同质的(即,分离超平面通过原点),并且(ii)参数$ \ eta $严格小于$ 1/2 $。在此工作之前,当除去这些假设中的任何一个时,不知道非增长的界限。我们研究了一般问题并建立以下内容:对于$ \ eta <1/2 $,我们为一般半个空间提供了一个学习算法,采用样本和计算复杂度$ d ^ {o_ {\ eta}(\ log(1 / \ gamma) )))}} \ mathrm {poly}(1 / \ epsilon)$,其中$ \ gamma = \ max \ {\ epsilon,\ min \ {\ mathbf {pr} [f(\ mathbf {x})= 1], \ mathbf {pr} [f(\ mathbf {x})= -1] \} \} $是目标半空间$ f $的偏差。现有的高效算法只能处理$ \ gamma = 1/2 $的特殊情况。有趣的是,我们建立了$ d ^ {\ oomega(\ log(\ log(\ log(\ log))}}的质量匹配的下限,而是任何统计查询(SQ)算法的复杂性。对于$ \ eta = 1/2 $,我们为一般半空间提供了一个学习算法,具有样本和计算复杂度$ o_ \ epsilon(1)d ^ {o(\ log(1 / epsilon))} $。即使对于均匀半空间的子类,这个结果也是新的;均匀Massart半个空间的现有算法为$ \ eta = 1/2 $提供可持续的保证。我们与D ^ {\ omega(\ log(\ log(\ log(\ log(\ epsilon))} $的近似匹配的sq下限补充了我们的上限,这甚至可以为同类半空间的特殊情况而保持。
translated by 谷歌翻译
在随着时间变化的组合环境中的在线决策激励,我们研究了将离线算法转换为其在线对应物的问题。我们专注于使用贪婪算法对局部错误的贪婪算法进行恒定因子近似的离线组合问题。对于此类问题,我们提供了一个通用框架,该框架可有效地将稳健的贪婪算法转换为使用Blackwell的易近算法。我们证明,在完整信息设置下,由此产生的在线算法具有$ O(\ sqrt {t})$(近似)遗憾。我们进一步介绍了Blackwell易接近性的强盗扩展,我们称之为Bandit Blackwell的可接近性。我们利用这一概念将贪婪的稳健离线算法转变为匪(t^{2/3})$(近似)$(近似)的遗憾。展示了我们框架的灵活性,我们将脱机之间的转换应用于收入管理,市场设计和在线优化的几个问题,包括在线平台中的产品排名优化,拍卖中的储备价格优化以及supperular tossodular最大化。 。我们还将还原扩展到连续优化的类似贪婪的一阶方法,例如用于最大化连续强的DR单调下调功能,这些功能受到凸约束的约束。我们表明,当应用于这些应用程序时,我们的转型会导致新的后悔界限或改善当前已知界限。我们通过为我们的两个应用进行数值模拟来补充我们的理论研究,在这两种应用中,我们都观察到,转换的数值性能在实际情况下优于理论保证。
translated by 谷歌翻译
不同的代理需要进行预测。他们观察到相同的数据,但有不同的模型:他们预测使用不同的解释变量。我们研究哪个代理商认为它们具有最佳的预测能力 - 通过最小的主观后均匀平均平方预测误差来衡量 - 并且显示它如何取决于样本大小。使用小样品,我们呈现结果表明它是使用低维模型的代理。对于大型样品,通常是具有高维模型的代理,可能包括无关的变量,但从未排除相关的变量。我们将结果应用于拍卖生产资产拍卖中的获胜模型,以争辩于企业家和具有简单模型的投资者将在新部门过度代表,并了解解释横断面变异的“因素”的扩散资产定价文学中的预期股票回报。
translated by 谷歌翻译
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference; developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
translated by 谷歌翻译
我们研究了在存在$ \ epsilon $ - 对抗异常值的高维稀疏平均值估计的问题。先前的工作为此任务获得了该任务的样本和计算有效算法,用于辅助性Subgaussian分布。在这项工作中,我们开发了第一个有效的算法,用于强大的稀疏平均值估计,而没有对协方差的先验知识。对于$ \ Mathbb r^d $上的分布,带有“认证有限”的$ t $ tum-矩和足够轻的尾巴,我们的算法达到了$ o(\ epsilon^{1-1/t})$带有样品复杂性$的错误(\ epsilon^{1-1/t}) m =(k \ log(d))^{o(t)}/\ epsilon^{2-2/t} $。对于高斯分布的特殊情况,我们的算法达到了$ \ tilde o(\ epsilon)$的接近最佳错误,带有样品复杂性$ m = o(k^4 \ mathrm {polylog}(d)(d))/\ epsilon^^ 2 $。我们的算法遵循基于方形的总和,对算法方法的证明。我们通过统计查询和低度多项式测试的下限来补充上限,提供了证据,表明我们算法实现的样本时间 - 错误权衡在质量上是最好的。
translated by 谷歌翻译
In many modern applications of deep learning the neural network has many more parameters than the data points used for its training. Motivated by those practices, a large body of recent theoretical research has been devoted to studying overparameterized models. One of the central phenomena in this regime is the ability of the model to interpolate noisy data, but still have test error lower than the amount of noise in that data. arXiv:1906.11300 characterized for which covariance structure of the data such a phenomenon can happen in linear regression if one considers the interpolating solution with minimum $\ell_2$-norm and the data has independent components: they gave a sharp bound on the variance term and showed that it can be small if and only if the data covariance has high effective rank in a subspace of small co-dimension. We strengthen and complete their results by eliminating the independence assumption and providing sharp bounds for the bias term. Thus, our results apply in a much more general setting than those of arXiv:1906.11300, e.g., kernel regression, and not only characterize how the noise is damped but also which part of the true signal is learned. Moreover, we extend the result to the setting of ridge regression, which allows us to explain another interesting phenomenon: we give general sufficient conditions under which the optimal regularization is negative.
translated by 谷歌翻译
We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistingusihability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability. For a set of statistical tests--based on a collection of losses and hypothesis class--a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses. This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape.
translated by 谷歌翻译
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples.
translated by 谷歌翻译