Supervised learning needs a huge amount of labeled data, which can be a big bottleneck when there are privacy concerns or labeling costs are high. To overcome this problem, we propose a new weakly supervised learning setting where only similar (S) data pairs (two examples belonging to the same class) and unlabeled (U) data points are needed instead of fully labeled data; we call this setting SU classification. We show that an unbiased estimator of the classification risk can be obtained from SU data alone, and that the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.
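For concreteness, here is a sketch (in our own notation, under data-generation assumptions standard in this setting; the paper's exact estimator and conditions may differ in detail) of why such an unbiased estimator exists. With class priors $\pi_p + \pi_n = 1$ ($\pi_p \neq 1/2$), each point of a similar pair marginally follows $p_S(x) = \{\pi_p^2 p_+(x) + \pi_n^2 p_-(x)\}/\pi_S$ with $\pi_S = \pi_p^2 + \pi_n^2$, while unlabeled points follow $p_U(x) = \pi_p p_+(x) + \pi_n p_-(x)$. Solving these two identities for $\pi_p p_+$ and $\pi_n p_-$ and substituting into the classification risk gives

$$R(f) = \frac{\pi_S}{\pi_p - \pi_n}\,\mathbb{E}_{S}\big[\ell(f(x)) - \ell(-f(x))\big] + \frac{1}{\pi_p - \pi_n}\,\mathbb{E}_{U}\big[\pi_p\,\ell(-f(x)) - \pi_n\,\ell(f(x))\big],$$

and every term on the right is a plain expectation over S or U data, so replacing expectations with sample averages yields an unbiased risk estimator.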
In real-world classification problems, for example in privacy-aware situations, pairwise similarities and dissimilarities between data points are often easier to obtain than fully labeled data. To handle such pairwise information, an empirical risk minimization approach has been proposed, giving an unbiased estimator of the classification risk that can be computed only from pairwise similarities and unlabeled data. So far, however, this line of work has been unable to handle pairwise dissimilarities. Semi-supervised clustering, on the other hand, is one family of methods that can use both similarities and dissimilarities, but it typically requires strong geometric assumptions on the data distribution, such as the manifold assumption, which can degrade performance. In this paper, we derive an unbiased risk estimator that can handle similarities, dissimilarities, and unlabeled data all together. We theoretically establish an error bound and experimentally demonstrate the practical usefulness of our empirical risk minimization method.
This paper aims to provide a better understanding of symmetric losses. First, we show that using a symmetric loss is advantageous for both balanced error rate (BER) minimization and maximization of the area under the receiver operating characteristic curve (AUC) from corrupted labels. Second, we prove general theoretical properties of symmetric losses, including a classification-calibration condition, an excess risk bound, a conditional risk minimizer, and an AUC-consistency condition. Third, since all non-negative symmetric losses are non-convex, we propose a convex barrier hinge loss that benefits from the symmetric condition even though it is not symmetric everywhere. Finally, we conduct experiments on BER and AUC optimization from corrupted labels to validate the relevance of the symmetric condition.
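For reference, the symmetric condition in this literature (stated in our notation) requires a margin loss $\ell$ to satisfy

$$\ell(z) + \ell(-z) = K \quad \text{for all } z \in \mathbb{R}$$

for some constant $K$; for example, the sigmoid loss $\ell(z) = 1/(1 + e^{z})$ satisfies it with $K = 1$. Roughly, the practical appeal is that under symmetric label corruption the noise-dependent terms of the risk collapse into constants that do not affect the minimizer.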
Many of the ordinal regression models that have been proposed in the literature can be seen as methods that minimize a convex surrogate of the zero-one, absolute, or squared loss functions. A key property that allows us to study the statistical implications of such approximations is Fisher consistency. Fisher consistency is a desirable property for surrogate loss functions: it implies that, in the population setting, i.e., if the probability distribution that generates the data were available, optimization of the surrogate would yield the best possible model. In this paper we characterize the Fisher consistency of a rich family of surrogate loss functions used in the context of ordinal regression, including support vector ordinal regression, ORBoosting and least absolute deviation. We show that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification. We also derive excess risk bounds for a surrogate of the absolute error that generalize existing risk bounds for binary classification. Finally, our analysis suggests a novel surrogate of the squared error loss. We compare this novel surrogate with competing approaches on 9 different datasets. Our method proves highly competitive in practice, outperforming the least squares loss on 7 out of 9 datasets.
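As an illustration of the surrogate family analyzed (a sketch in our notation, not the paper's exact definitions): threshold-based methods learn a scalar score $f(x)$ and thresholds $\theta_1 \le \dots \le \theta_{K-1}$, and for a label $y$ charge a margin penalty $\varphi$ at every threshold on the wrong side,

$$\psi\big(y, f(x)\big) = \sum_{k < y} \varphi\big(f(x) - \theta_k\big) + \sum_{k \ge y} \varphi\big(\theta_k - f(x)\big),$$

with $\varphi$ the hinge loss recovering support vector ordinal regression and the exponential loss recovering ORBoost. Characterizing consistency through the derivative of a real-valued function at zero then parallels the classical condition on $\varphi'(0)$ for convex margin-based binary surrogates.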
Consider a classification problem where both labeled and unlabeled data are available. We show that for linear classifiers defined by convex margin-based surrogate losses that are decreasing, it is impossible to construct any semi-supervised approach that is guaranteed to improve over the supervised classifier, as measured by this surrogate loss on the labeled and unlabeled data. For convex margin-based loss functions that also increase, we demonstrate that safe improvements are possible.
A bottleneck of binary classification from positive and unlabeled data (PU classification) is the requirement that the unlabeled patterns be drawn from the test marginal distribution and that the penalty of a false positive error be identical to that of a false negative error. However, these requirements are often not fulfilled in practice. In this paper, we generalize PU classification to class-prior shift and asymmetric error scenarios. Based on an analysis of the Bayes-optimal classifier, we show that, given a test class prior, PU classification under class-prior shift is equivalent to PU classification with asymmetric error. We then propose two different frameworks to handle these problems: a risk minimization framework and a density ratio estimation framework. Finally, we demonstrate the effectiveness of the proposed frameworks and compare the two through experiments on benchmark datasets.
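The equivalence is natural at the level of the Bayes-optimal classifier (a standard cost-sensitive argument, in our notation). With $\eta(x) = p(y = +1 \mid x)$, false-positive cost $c_1$ and false-negative cost $c_2$, the optimal rule predicts positive iff

$$\eta(x) \ge \frac{c_1}{c_1 + c_2},$$

so asymmetric errors only move the decision threshold on $\eta(x)$; a shift of the class prior at test time likewise amounts to moving the same threshold, which is why, given the test class prior, the two scenarios can be reduced to one another.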
We consider the problem of learning a binary classifier only from positive and unlabeled observations (PU learning). Although recent research in PU learning has demonstrated strong theoretical and empirical performance, most existing algorithms need to solve a convex or non-convex optimization problem and are therefore not suitable for large-scale datasets. In this paper, we propose a simple yet theoretically grounded PU learning algorithm by extending work previously proposed for supervised binary classification (Sriperumbudur et al., 2012). The proposed PU learning algorithm produces a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk. The obtained estimation error bound is sharper than existing results, and the excess risk bound does not rely on an approximation error term. To the best of our knowledge, we are the first to explicitly derive an excess risk bound in the field of PU learning. Finally, we conduct extensive numerical experiments on both synthetic and real datasets, demonstrating the accuracy, scalability, and robustness of the proposed algorithm.
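To see how a closed form can arise, here is a sketch under our own simplifying choice of the linear loss $\ell(z) = (1 - z)/2$ (not necessarily the paper's construction). Plugging it into the unbiased PU risk makes the objective affine in $g$: by the reproducing property,

$$\widehat{R}(g) = \tfrac{1}{2}\Big(1 + \big\langle g,\ \widehat{\mu}_U - 2\pi_p \widehat{\mu}_P \big\rangle_{\mathcal{H}}\Big), \qquad \widehat{\mu}_P = \frac{1}{n_p} \sum_{i} k(\cdot, x_i^P), \quad \widehat{\mu}_U = \frac{1}{n_u} \sum_{j} k(\cdot, x_j^U),$$

and minimizing an affine functional over the ball $\|g\|_{\mathcal{H}} \le r$ has the analytic solution $g^\star = r\,(2\pi_p \widehat{\mu}_P - \widehat{\mu}_U)\,/\,\|2\pi_p \widehat{\mu}_P - \widehat{\mu}_U\|_{\mathcal{H}}$, i.e., a classifier built from kernel mean embeddings with no iterative optimization.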
Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification, which aims at "describing" the positive class by clustering-related methods, but one-class classification does not have the ability to tune hyper-parameters, and its aim is not to "discriminate" between positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.
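The identity at the heart of this framework is compact (our notation; $r(x) = p(y = +1 \mid x)$ is the positive confidence and $\pi_+$ the positive class prior). Since $p(x) = \pi_+\, p(x \mid y = +1)/r(x)$, the classification risk rewrites as

$$R(g) = \pi_+\, \mathbb{E}_{p(x \mid y = +1)}\!\left[\ell\big(g(x)\big) + \frac{1 - r(x)}{r(x)}\,\ell\big({-g(x)}\big)\right],$$

an expectation over positive data alone, so a sample average over confidence-equipped positive examples gives an empirical risk that any model and optimizer can minimize ($\pi_+$ only scales the objective and can be dropped for minimization).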
Positive-unlabeled (PU) learning addresses the problem of learning a binary classifier from positive (P) and unlabeled (U) data. It is often applied to situations where negative (N) data are difficult to label completely. However, in many practical situations it can be much easier to collect a non-representative N set that contains only a small portion of all possible N data. This paper studies a novel classification framework that incorporates such biased N (bN) data into PU learning. The fact that the training N data are biased also makes our work very different from standard semi-supervised learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a variant of traditional example-reweighting algorithms, with the weight of each example computed in a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in PUbN learning scenarios as well as in ordinary PU learning scenarios on several benchmark datasets.
We investigate the problem of multiclass classification with rejection, where a classifier can choose not to make a prediction in order to avoid critical misclassification. We consider two approaches to this problem: a traditional one based on confidence scores and a more recent one based on simultaneous training of a classifier and a rejector. For the former, existing methods focus on a specific class of losses, and their empirical performance is not entirely convincing. In this paper, we propose confidence-based rejection criteria for multiclass classification that can handle more general losses and guarantee calibration to the Bayes-optimal solution. The latter approach is relatively new and, to the best of our knowledge, available only for the binary case. Our second contribution is to prove that calibration to the Bayes-optimal solution is almost impossible with this approach in the multiclass case. Finally, we conduct experiments to validate the relevance of our theoretical findings.
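For context, the classical confidence-based criterion here is Chow's rule (a standard result, stated in our notation): with rejection cost $c$ (relative to a unit cost of misclassification) and posterior $p(y \mid x)$, the Bayes-optimal rule is

$$\text{predict } \arg\max_{y} p(y \mid x) \ \text{ if } \ \max_{y} p(y \mid x) \ge 1 - c, \qquad \text{reject otherwise},$$

so a confidence-based method is calibrated exactly when its score recovers enough of the posterior to implement this threshold.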
From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts.
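Concretely, in the standard notation of this literature, with class prior $\pi_p$, $\widehat{R}_p^{\pm}(g)$ the averages of $\ell(\pm g(x))$ over P data, and $\widehat{R}_u^{-}(g)$ the average of $\ell(-g(x))$ over U data, the proposed non-negative risk estimator is

$$\widehat{R}_{\mathrm{pu}}(g) = \pi_p\, \widehat{R}_p^{+}(g) + \max\Big\{0,\ \widehat{R}_u^{-}(g) - \pi_p\, \widehat{R}_p^{-}(g)\Big\}.$$

Without the $\max\{0, \cdot\}$, the bracketed term is an unbiased estimate of $\pi_n \mathbb{E}_{-}[\ell(-g(x))] \ge 0$, but with flexible models it can be driven negative on training data, which is precisely the overfitting symptom described above; clipping it at zero removes that failure mode.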
In contrast to the standard classification paradigm, where the true (or possibly noisy) class is given to each training pattern, complementary-label learning uses only training patterns each equipped with a complementary label, which specifies only one of the classes that the pattern does not belong to. A seminal paper on complementary-label learning proposed an unbiased estimator of the classification risk that can be computed only from complementarily labeled data. However, it required restrictive conditions on the loss function, making it impossible to use popular losses such as the softmax cross-entropy loss. Recently, another formulation with the softmax cross-entropy loss was proposed with a consistency guarantee. However, this formulation does not explicitly involve a risk estimator. Thus, model/hyper-parameter selection by cross-validation is not possible: we may need additional ordinarily labeled data for validation purposes, which is not available in the current setting. In this paper, we give a novel general framework of complementary-label learning, and derive an unbiased risk estimator for arbitrary losses and models. We further improve the risk estimator by a non-negative correction and demonstrate its superiority through experiments.
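A sketch of why a fully general unbiased estimator exists (our derivation of the standard identity; $K$ classes, complementary labels drawn uniformly from the $K - 1$ incorrect classes): from $\bar{p}(x, \bar{y}) = \frac{1}{K-1}\big(p(x) - p(x, \bar{y})\big)$ one obtains $p(x, y) = p(x) - (K - 1)\,\bar{p}(x, y)$, and substituting into the ordinary risk gives

$$R(f) = \mathbb{E}_{(x, \bar{y}) \sim \bar{p}}\left[\sum_{k=1}^{K} \ell\big(f(x), k\big) - (K - 1)\,\ell\big(f(x), \bar{y}\big)\right],$$

which is computable from complementarily labeled data for any loss $\ell$ and any model $f$. The non-negative correction mentioned above then clips partial risks that are non-negative in expectation but can go negative empirically with flexible models.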
We propose a robust adversarial prediction framework for general multiclass classification. Our approach seeks a predictive distribution that robustly optimizes non-convex and non-continuous multiclass loss metrics against the worst-case conditional label distribution (the adversarial distribution) that approximately matches the statistics of the training data. Although the optimized loss metrics are non-convex and non-continuous, the dual formulation of the framework is a convex optimization problem that can be recast as a risk minimization model with a prescribed convex surrogate loss, which we call the adversarial surrogate loss. We show that adversarial surrogate losses fill an existing gap in surrogate loss constructions for general multiclass classification, while aligning better with the original multiclass loss, guaranteeing Fisher consistency, enabling rich feature spaces via the kernel trick, and providing competitive performance in practice.
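Schematically (our notation), the framework is the zero-sum game in which the predictor chooses $\hat{P}(\hat{y} \mid x)$ and an adversary chooses $\check{P}(\check{y} \mid x)$ subject to matching feature statistics $\phi$ of the training sample:

$$\min_{\hat{P}} \ \max_{\check{P}} \ \mathbb{E}_{x,\ \hat{y} \sim \hat{P},\ \check{y} \sim \check{P}}\big[\mathrm{loss}(\hat{y}, \check{y})\big] \quad \text{s.t.} \quad \mathbb{E}_{x,\ \check{y} \sim \check{P}}\big[\phi(x, \check{y})\big] = \widetilde{\mathbb{E}}\big[\phi(x, y)\big],$$

and the Lagrangian dual of this game is the convex risk minimization problem with the adversarial surrogate loss described above.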
We address the problem of measuring the difference between two domains in unsupervised domain adaptation. We point out that existing discrepancy measures become less informative when complex models such as deep neural networks are used. Moreover, estimating existing discrepancy measures can be computationally difficult, and they are limited to binary classification tasks. To mitigate these shortcomings, we propose a novel discrepancy measure that is theoretically grounded, applies to many tasks beyond binary classification, can be applied effectively to complex models, and is very easy to estimate. We also provide easy-to-interpret generalization bounds that explain the effectiveness of a family of pseudo-labeling methods in unsupervised domain adaptation. Finally, we conduct experiments to validate the usefulness of the proposed discrepancy measure.
Distributionally robust supervised learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can degrade significantly because test data may follow a distribution different from that of the training data. DRSL with f-divergences explicitly considers the worst-case distribution shift by minimizing an adversarially reweighted training loss. In this paper, we analyze this DRSL, focusing on the classification scenario. Since DRSL is explicitly formulated for distribution shift scenarios, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions. Surprisingly, however, we prove that DRSL ends up giving a classifier that exactly fits the given training distribution, which is overly pessimistic. This pessimism comes from two sources: the particular losses used in classification, and the fact that the variety of distributions to which DRSL tries to be robust is too wide. Motivated by our analysis, we propose a simple variant of DRSL that overcomes this pessimism, and we empirically demonstrate its effectiveness.
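The objective under analysis is the adversarially reweighted training loss (a sketch in our notation; $f$ is a convex function with $f(1) = 0$ defining an $f$-divergence ball of radius $\delta$ around the empirical distribution):

$$\min_{\theta}\ \sup_{\mathbf{r} \ge 0}\ \frac{1}{n} \sum_{i=1}^{n} r_i\, \ell\big(g_\theta(x_i), y_i\big) \quad \text{s.t.} \quad \frac{1}{n} \sum_{i=1}^{n} f(r_i) \le \delta, \qquad \frac{1}{n} \sum_{i=1}^{n} r_i = 1.$$

The inner supremum concentrates weight on the hardest training examples, which is the mechanism behind the pessimism result: with the surrogate losses used in classification, this reweighting ends up reproducing the classifier fit to the original training distribution.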
Unsupervised domain adaptation is the problem setting where the data-generating distributions of the source and target domains differ and labels in the target domain are unavailable. An important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use source-domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain. To mitigate these problems, we propose a novel discrepancy called the source-guided discrepancy ($S$-disc), which exploits labels in the source domain. As a consequence, $S$-disc can be computed efficiently, with a finite-sample convergence guarantee. In addition, we show that $S$-disc can provide a tighter generalization error bound than one based on an existing discrepancy. Finally, we report experimental results demonstrating the advantages of $S$-disc over existing discrepancies.
Ordinal regression seeks class label predictions when the penalty incurred for mistakes increases according to an ordering over the labels. The absolute error is a canonical example. Many existing methods for this task reduce to binary classification problems and employ surrogate losses, such as the hinge loss. We instead derive uniquely defined surrogate ordinal regression loss functions by seeking the predictor that is robust to the worst-case approximations of training data labels, subject to matching certain provided training data statistics. We demonstrate the advantages of our approach over other surrogate losses based on hinge loss approximations using UCI ordinal prediction tasks.
We consider the problem of learning a binary classifier only from positive and unlabeled data (PU learning) and estimating the class prior of the unlabeled data under the case-control scenario. Most recent PU learning methods require an estimate of the class-prior probability of the unlabeled data, which is obtained in advance by another method. However, such a two-step approach, which first estimates the class prior and then trains a classifier, may not be optimal, because the estimation error of the class prior is not taken into account when the classifier is trained. In this paper, we propose a novel approach that estimates the class prior and trains the classifier together. Our proposed method is easy to implement and computationally efficient. Through experiments, we demonstrate the practical usefulness of the proposed method.
In the multi-view learning paradigm, the input variable is partitioned into two different views $X_1$ and $X_2$ and there is a target variable $Y$ of interest. The underlying assumption is that either view alone is sufficient to predict the target $Y$ accurately. This provides a natural semi-supervised learning setting in which unlabeled data can be used to eliminate hypotheses from either view whose predictions tend to disagree with predictions based on the other view. This work explicitly formalizes an information-theoretic multi-view assumption and studies the multi-view paradigm in the PAC-style semi-supervised framework of Balcan and Blum [2006]. Underlying the PAC-style framework is the assumption that an incompatibility function is known; roughly speaking, this incompatibility function is a means to score how good a function is based on the unlabeled data alone. Here, we show how to derive incompatibility functions for certain loss functions of interest, so that minimizing this incompatibility over unlabeled data helps reduce expected loss on future test cases. In particular, we show how the class of empirically successful co-regularization algorithms falls into our framework and provide performance bounds (using the results in Rosenberg and Bartlett [2007], Farquhar et al. [2005]). We also provide a normative justification for canonical correlation analysis (CCA) as a dimensionality reduction technique. In particular, we show, for strictly convex loss functions of the form $\ell(\langle w, x \rangle, y)$, that we can first use CCA as a dimensionality reduction technique and (if the multi-view assumption is satisfied) this projection does not throw away much predictive information about the target $Y$, the benefit being that subsequent learning with a labeled set need only work in this lower-dimensional space.
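The co-regularization objective referred to above takes, in its typical form (our notation), a labeled-loss term per view plus an agreement penalty on unlabeled pairs:

$$\min_{f_1 \in \mathcal{H}_1,\ f_2 \in \mathcal{H}_2}\ \sum_{i=1}^{n} \Big[\ell\big(f_1(x_i^1), y_i\big) + \ell\big(f_2(x_i^2), y_i\big)\Big] + \lambda \sum_{j=1}^{m} \big(f_1(x_j^1) - f_2(x_j^2)\big)^2 + \text{norm penalties},$$

where the disagreement term over the $m$ unlabeled pairs plays exactly the role of an incompatibility function scored on unlabeled data alone.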
Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect, since collecting optimal ones is costly. To learn effectively from imperfect demonstrations, we propose a novel approach that utilizes confidence scores describing the quality of the demonstrations. More specifically, we propose two confidence-based IL methods: two-step importance-weighted IL (2IWIL) and generative adversarial IL with imperfect demonstrations and confidence (IC-GAIL). We show that confidence scores given to only a small portion of suboptimal demonstrations significantly improve the performance of IL, both theoretically and empirically.