训练数据的量是决定学习算法的概括能力的关键因素之一。直观地,人们期望随着训练数据的增加,错误率将降低。也许令人惊讶的是,自然尝试正式化这种直觉引起了有趣且具有挑战性的数学问题。例如,在他们关于模式识别的古典书籍中,Devroye,Gyorfi和Lugosi(1996)询问是否存在{单调}贝叶斯一致的算法。这个问题一直开放25年以上,直到最近Pestov(2021)使用单调贝叶斯一致算法的复杂构造解决了该问题进行二进制分类。我们得出了多类分类的一般结果,表明每个学习算法A都可以转换为具有相似性能的单调。此外,转换是有效的,仅使用黑盒甲骨文访问A。 Loog(2019),Viering and Loog(2021)和Mhammedi(2021)。我们的转换很容易意味着在各种情况下单调学习者:例如,它将Pestov的结果扩展到具有任意数量的标签的分类任务。这与针对二进制分类量身定制的Pestov的工作形成鲜明对比。另外,我们在单调算法的误差上提供统一的边界。这使我们的转换适用于无分销设置。例如,在PAC学习中,这意味着每个可学习的课程都接受单调PAC学习者。这通过Viering,Mey和Loog(2019)解决了问题; Viering and Loog(2021); Mhammedi(2021)。
translated by 谷歌翻译
Boosting是一种著名的机器学习方法,它基于将弱和适度不准确假设与强烈而准确的假设相结合的想法。我们研究了弱假设属于界限能力类别的假设。这个假设的灵感来自共同的惯例,即虚弱的假设是“易于学习的类别”中的“人数规则”。 (Schapire和Freund〜 '12,Shalev-Shwartz和Ben-David '14。)正式,我们假设弱假设类别具有有界的VC维度。我们关注两个主要问题:(i)甲骨文的复杂性:产生准确的假设需要多少个弱假设?我们设计了一种新颖的增强算法,并证明它绕过了由Freund和Schapire('95,'12)的经典下限。虽然下限显示$ \ omega({1}/{\ gamma^2})$弱假设有时是必要的,而有时则需要使用$ \ gamma $ -margin,但我们的新方法仅需要$ \ tilde {o}({1})({1}) /{\ gamma})$弱假设,前提是它们属于一类有界的VC维度。与以前的增强算法以多数票汇总了弱假设的算法不同,新的增强算法使用了更复杂(“更深”)的聚合规则。我们通过表明复杂的聚合规则实际上是规避上述下限是必要的,从而补充了这一结果。 (ii)表现力:通过提高有限的VC类的弱假设可以学习哪些任务?可以学到“遥远”的复杂概念吗?为了回答第一个问题,我们{介绍组合几何参数,这些参数捕获增强的表现力。}作为推论,我们为认真的班级的第二个问题提供了肯定的答案,包括半空间和决策树桩。一路上,我们建立并利用差异理论的联系。
translated by 谷歌翻译
可实现和不可知性的可读性的等价性是学习理论的基本现象。与PAC学习和回归等古典设置范围的变种,近期趋势,如对冲强劲和私人学习,我们仍然缺乏统一理论;等同性的传统证据往往是不同的,并且依赖于强大的模型特异性假设,如统一的收敛和样本压缩。在这项工作中,我们给出了第一个独立的框架,解释了可实现和不可知性的可读性的等价性:三行黑箱减少简化,统一,并在各种各样的环境中扩展了我们的理解。这包括没有已知的学报的模型,例如学习任意分布假设或一般损失,以及许多其他流行的设置,例如强大的学习,部分学习,公平学习和统计查询模型。更一般地,我们认为可实现和不可知的学习的等价性实际上是我们调用属性概括的更广泛现象的特殊情况:可以满足有限的学习算法(例如\噪声公差,隐私,稳定性)的任何理想性质假设类(可能在某些变化中)延伸到任何学习的假设类。
translated by 谷歌翻译
标签排名(LR)对应于学习一个假设的问题,以通过有限一组标签将功能映射到排名。我们采用了对LR的非参数回归方法,并获得了这一基本实际问题的理论绩效保障。我们在无噪声和嘈杂的非参数回归设置中介绍了一个用于标签排名的生成模型,并为两种情况下提供学习算法的示例复杂性界限。在无噪声环境中,我们研究了全排序的LR问题,并在高维制度中使用决策树和随机林提供计算有效的算法。在嘈杂的环境中,我们考虑使用统计观点的不完整和部分排名的LR更通用的情况,并使用多种多组分类的一种方法获得样本复杂性范围。最后,我们与实验补充了我们的理论贡献,旨在了解输入回归噪声如何影响观察到的输出。
translated by 谷歌翻译
经典的算法adaboost允许转换一个弱学习者,这是一种算法,它产生的假设比机会略好,成为一个强大的学习者,在获得足够的培训数据时,任意高精度。我们提出了一种新的算法,该算法从弱学习者中构建了一个强大的学习者,但比Adaboost和所有其他弱者到强大的学习者使用训练数据少,以实现相同的概括界限。样本复杂性下限表明我们的新算法使用最小可能的训练数据,因此是最佳的。因此,这项工作解决了从弱学习者中构建强大学习者的经典问题的样本复杂性。
translated by 谷歌翻译
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals.Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. This result dispels the similarity between learning with noise and private learning (both must be robust to small changes in inputs), since parity is thought to be very hard to learn given random classification noise.Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Therefore, for local private learning algorithms, the similarity to learning with noise is stronger: local learning is equivalent to SQ learning, and SQ algorithms include most known noise-tolerant learning algorithms. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
translated by 谷歌翻译
A classical result in learning theory shows the equivalence of PAC learnability of binary hypothesis classes and the finiteness of VC dimension. Extending this to the multiclass setting was an open problem, which was settled in a recent breakthrough result characterizing multiclass PAC learnability via the DS dimension introduced earlier by Daniely and Shalev-Shwartz. In this work we consider list PAC learning where the goal is to output a list of $k$ predictions. List learning algorithms have been developed in several settings before and indeed, list learning played an important role in the recent characterization of multiclass learnability. In this work we ask: when is it possible to $k$-list learn a hypothesis class? We completely characterize $k$-list learnability in terms of a generalization of DS dimension that we call the $k$-DS dimension. Generalizing the recent characterization of multiclass learnability, we show that a hypothesis class is $k$-list learnable if and only if the $k$-DS dimension is finite.
translated by 谷歌翻译
The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth achieves an optimal in-expectation risk bound in the standard PAC classification setup. In one of the first COLT open problems, Warmuth conjectured that this prediction strategy always implies an optimal high probability bound on the risk, and hence is also an optimal PAC algorithm. We refute this conjecture in the strongest sense: for any practically interesting Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion graph algorithm whose high probability risk bound cannot go beyond that implied by Markov's inequality. Our construction of these poorly performing one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting codes. Our negative result has several implications. First, it shows that the same poor high-probability performance is inherited by several recent prediction strategies based on generalizations of the one-inclusion graph algorithm. Second, our analysis shows yet another statistical problem that enjoys an estimator that is provably optimal in expectation via a leave-one-out argument, but fails in the high-probability regime. This discrepancy occurs despite the boundedness of the binary loss for which arguments based on concentration inequalities often provide sharp high probability risk bounds.
translated by 谷歌翻译
给定真实的假设类$ \ mathcal {h} $,我们在什么条件下调查有一个差异的私有算法,它从$ \ mathcal {h} $给出的最佳假设.I.i.d.数据。灵感来自最近的成果的二进制分类的相关环境(Alon等,2019; Bun等,2020),其中显示了二进制类的在线学习是必要的,并且足以追随其私人学习,Jung等人。 (2020)显示,在回归的设置中,$ \ mathcal {h} $的在线学习是私人可读性所必需的。这里的在线学习$ \ mathcal {h} $的特点是其$ \ eta $-sequentient胖胖子的优势,$ {\ rm sfat} _ \ eta(\ mathcal {h})$,适用于所有$ \ eta> 0 $。就足够的私人学习条件而言,Jung等人。 (2020)显示$ \ mathcal {h} $私下学习,如果$ \ lim _ {\ eta \ downarrow 0} {\ rm sfat} _ \ eta(\ mathcal {h})$是有限的,这是一个相当限制的健康)状况。我们展示了在轻松的条件下,\ LIM \ INF _ {\ eta \ downarrow 0} \ eta \ cdot {\ rm sfat} _ \ eta(\ mathcal {h})= 0 $,$ \ mathcal {h} $私人学习,为\ \ rm sfat} _ \ eta(\ mathcal {h})$ \ eta \ dockarrow 0 $ divering建立第一个非参数私人学习保证。我们的技术涉及一种新颖的过滤过程,以输出非参数函数类的稳定假设。
translated by 谷歌翻译
Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, the seminal work by Hanneke (2016) gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data and then returning a majority vote among hypotheses trained on each of the sub-samples. While being a very exciting theoretical result, it has not had much impact in practice, in part due to inefficiency, since it constructs a polynomial number of sub-samples of the training data, each of linear size. In this work, we prove the surprising result that the practical and classic heuristic bagging (a.k.a. bootstrap aggregation), due to Breimann (1996), is in fact also an optimal PAC learner. Bagging pre-dates Hanneke's algorithm by twenty years and is taught in most undergraduate machine learning courses. Moreover, we show that it only requires a logarithmic number of sub-samples to reach optimality.
translated by 谷歌翻译
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.1. For a qualitative discussion about sensitivity analysis with links to other resources see e.g. http://sensitivity-analysis.jrc.cec.eu.int/
translated by 谷歌翻译
We first prove that Littlestone classes, those which model theorists call stable, characterize learnability in a new statistical model: a learner in this new setting outputs the same hypothesis, up to measure zero, with probability one, after a uniformly bounded number of revisions. This fills a certain gap in the literature, and sets the stage for an approximation theorem characterizing Littlestone classes in terms of a range of learning models, by analogy to definability of types in model theory. We then give a complete analogue of Shelah's celebrated (and perhaps a priori untranslatable) Unstable Formula Theorem in the learning setting, with algorithmic arguments taking the place of the infinite.
translated by 谷歌翻译
收购数据是机器学习的许多应用中的一项艰巨任务,只有一个人希望并且预期人口风险在单调上汇率增加(更好的性能)。事实证明,甚至对于最小化经验风险的最大限度的算法,甚至不令人惊讶的情况。在训练中的风险和不稳定的非单调行为表现出并出现在双重血统描述中的流行深度学习范式中。这些问题突出了目前对学习算法和泛化的理解缺乏了解。因此,追求这种行为的表征是至关重要的,这是至关重要的。在本文中,我们在弱假设下获得了一致和风险的单调算法,从而解决了一个打开问题Viering等。 2019关于如何避免风险曲线的非单调行为。我们进一步表明,风险单调性不一定以更糟糕的风险率的价格出现。为实现这一目标,我们推出了持有某些非I.I.D的独立利益的新经验伯恩斯坦的浓度不等式。鞅差异序列等进程。
translated by 谷歌翻译
我们建立了量子算法设计与电路下限之间的第一一般连接。具体来说,让$ \ mathfrak {c} $是一类多项式大小概念,假设$ \ mathfrak {c} $可以在统一分布下的成员查询,错误$ 1/2 - \ gamma $通过时间$ t $量子算法。我们证明如果$ \ gamma ^ 2 \ cdot t \ ll 2 ^ n / n $,则$ \ mathsf {bqe} \ nsubseteq \ mathfrak {c} $,其中$ \ mathsf {bqe} = \ mathsf {bque} [2 ^ {o(n)}] $是$ \ mathsf {bqp} $的指数时间模拟。在$ \ gamma $和$ t $中,此结果是最佳的,因为它不难学习(经典)时间$ t = 2 ^ n $(没有错误) ,或在Quantum Time $ t = \ mathsf {poly}(n)$以傅立叶采样为单位为1/2美元(2 ^ { - n / 2})$。换句话说,即使对这些通用学习算法的边际改善也会导致复杂性理论的主要后果。我们的证明在学习理论,伪随机性和计算复杂性的几个作品上构建,并且至关重要地,在非凡的经典学习算法与由Oliveira和Santhanam建立的电路下限之间的联系(CCC 2017)。扩展他们对量子学习算法的方法,结果产生了重大挑战。为此,我们展示了伪随机发电机如何以通用方式意味着学习到较低的连接,构建针对均匀量子计算的第一个条件伪随机发生器,并扩展了Impagliazzo,JaiSwal的本地列表解码算法。 ,Kabanets和Wigderson(Sicomp 2010)通过微妙的分析到量子电路。我们认为,这些贡献是独立的兴趣,可能会发现其他申请。
translated by 谷歌翻译
Recently, Robey et al. propose a notion of probabilistic robustness, which, at a high-level, requires a classifier to be robust to most but not all perturbations. They show that for certain hypothesis classes where proper learning under worst-case robustness is \textit{not} possible, proper learning under probabilistic robustness \textit{is} possible with sample complexity exponentially smaller than in the worst-case robustness setting. This motivates the question of whether proper learning under probabilistic robustness is always possible. In this paper, we show that this is \textit{not} the case. We exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are \textit{not} probabilistically robustly PAC learnable with \textit{any} proper learning rule. However, if we compare the output of the learner to the best hypothesis for a slightly \textit{stronger} level of probabilistic robustness, we show that not only is proper learning \textit{always} possible, but it is possible via empirical risk minimization.
translated by 谷歌翻译
在这项工作中,我们调查了Steinke和Zakynthinou(2020)的“条件互信息”(CMI)框架的表现力,以及使用它来提供统一框架,用于在可实现的环境中证明泛化界限。我们首先证明可以使用该框架来表达任何用于从一类界限VC维度输出假设的任何学习算法的非琐碎(但是次优)界限。我们证明了CMI框架在用于学习半个空间的预期风险上产生最佳限制。该结果是我们的一般结果的应用,显示稳定的压缩方案Bousquet al。 (2020)尺寸$ k $有统一有限的命令$ o(k)$。我们进一步表明,适当学习VC类的固有限制与恒定的CMI存在适当的学习者的存在,并且它意味着对Steinke和Zakynthinou(2020)的开放问题的负面分辨率。我们进一步研究了价值最低限度(ERMS)的CMI的级别$ H $,并表明,如果才能使用有界CMI输出所有一致的分类器(版本空间),只有在$ H $具有有界的星号(Hanneke和杨(2015)))。此外,我们证明了一般性的减少,表明“休假”分析通过CMI框架表示。作为推论,我们研究了Haussler等人提出的一包图算法的CMI。 (1994)。更一般地说,我们表明CMI框架是通用的,因为对于每一项一致的算法和数据分布,当且仅当其评估的CMI具有样品的载位增长时,预期的风险就会消失。
translated by 谷歌翻译
我们研究了称为“乐观速率”(Panchenko 2002; Srebro等,2010)的统一收敛概念,用于与高斯数据的线性回归。我们的精致分析避免了现有结果中的隐藏常量和对数因子,这已知在高维设置中至关重要,特别是用于了解插值学习。作为一个特殊情况,我们的分析恢复了Koehler等人的保证。(2021年),在良性过度的过度条件下,严格地表征了低规范内插器的人口风险。但是,我们的乐观速度绑定还分析了具有任意训练错误的预测因子。这使我们能够在随机设计下恢复脊和套索回归的一些经典统计保障,并有助于我们在过度参数化制度中获得精确了解近端器的过度风险。
translated by 谷歌翻译
我们考虑在对抗环境中的强大学习模型。学习者获得未腐败的培训数据,并访问可能受到测试期间对手影响的可能腐败。学习者的目标是建立一个强大的分类器,该分类器将在未来的对抗示例中进行测试。每个输入的对手仅限于$ k $可能的损坏。我们将学习者 - 对手互动建模为零和游戏。该模型与Schmidt等人的对抗示例模型密切相关。 (2018); Madry等。 (2017)。我们的主要结果包括对二进制和多类分类的概括界限,以及实现的情况(回归)。对于二元分类设置,我们都拧紧Feige等人的概括。 (2015年),也能够处理无限假设类别。样本复杂度从$ o(\ frac {1} {\ epsilon^4} \ log(\ frac {| h |} {\ delta})$ to $ o \ big(\ frac {1} { epsilon^2}(kvc(h)\ log^{\ frac {3} {2}+\ alpha}(kvc(h))+\ log(\ frac {1} {\ delta} {\ delta})\ big)\ big)\ big)$ for任何$ \ alpha> 0 $。此外,我们将算法和概括从二进制限制到多类和真实价值的案例。一路上,我们获得了脂肪震惊的尺寸和$ k $ fold的脂肪的尺寸和Rademacher复杂性的结果最大值的功能类别;这些可能具有独立的兴趣。对于二进制分类,Feige等人(2015年)使用遗憾的最小化算法和Erm Oracle作为黑匣子;我们适应了多类和回归设置。该算法为我们提供了给定培训样本中的球员的近乎最佳政策。
translated by 谷歌翻译
我们在禁用的对手存在下研究公平分类,允许获得$ \ eta $,选择培训样本的任意$ \ eta $ -flaction,并任意扰乱受保护的属性。由于战略误报,恶意演员或归责的错误,受保护属性可能不正确的设定。和现有的方法,使随机或独立假设对错误可能不满足其在这种对抗环境中的保证。我们的主要贡献是在这种对抗的环境中学习公平分类器的优化框架,这些普遍存在的准确性和公平性提供了可证明的保证。我们的框架适用于多个和非二进制保护属性,专为大类线性分数公平度量设计,并且还可以处理除了受保护的属性之外的扰动。我们证明了我们框架的近密性,对自然假设类别的保证:没有算法可以具有明显更好的准确性,并且任何具有更好公平性的算法必须具有较低的准确性。凭经验,我们评估了我们对统计率的统计税务统计税率为一个对手的统计税率产生的分类机。
translated by 谷歌翻译
所有著名的机器学习算法构成了受监督和半监督的学习工作,只有在一个共同的假设下:培训和测试数据遵循相同的分布。当分布变化时,大多数统计模型必须从新收集的数据中重建,对于某些应用程序,这些数据可能是昂贵或无法获得的。因此,有必要开发方法,以减少在相关领域中可用的数据并在相似领域中进一步使用这些数据,从而减少需求和努力获得新的标签样品。这引起了一个新的机器学习框架,称为转移学习:一种受人类在跨任务中推断知识以更有效学习的知识能力的学习环境。尽管有大量不同的转移学习方案,但本调查的主要目的是在特定的,可以说是最受欢迎的转移学习中最受欢迎的次级领域,概述最先进的理论结果,称为域适应。在此子场中,假定数据分布在整个培训和测试数据中发生变化,而学习任务保持不变。我们提供了与域适应性问题有关的现有结果的首次最新描述,该结果涵盖了基于不同统计学习框架的学习界限。
translated by 谷歌翻译