2017-10-19
Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification, which aims at "describing" the positive class with clustering-related methods; however, one-class classification offers no principled way to tune hyper-parameters, and its aim is not to "discriminate" between the positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.
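A minimal sketch of what such an empirical risk over positive data alone could look like, assuming the logistic loss and writing r(x) for the positive confidence p(y=+1|x); the correction term (1 - r)/r is the standard way such a risk is rewritten as an expectation over positive data only, up to the unknown class prior (which rescales the risk but does not change its minimizer):

```python
import numpy as np

def logistic_loss(z):
    # numerically stable log(1 + exp(-z))
    return np.logaddexp(0.0, -z)

def pconf_empirical_risk(scores, confidence):
    """Empirical Pconf risk over positive data only (up to the unknown
    class prior): mean of l(f(x)) + (1 - r(x)) / r(x) * l(-f(x))."""
    scores = np.asarray(scores, dtype=float)
    r = np.asarray(confidence, dtype=float)
    return float(np.mean(logistic_loss(scores)
                         + (1.0 - r) / r * logistic_loss(-scores)))
```

When every confidence equals 1, the correction term vanishes and the risk reduces to the ordinary loss on positive data, as one would expect.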

2019-01-30

Many of the ordinal regression models that have been proposed in the literature can be seen as methods that minimize a convex surrogate of the zero-one, absolute, or squared loss functions. A key property that allows one to study the statistical implications of such approximations is Fisher consistency. Fisher consistency is a desirable property for surrogate loss functions: it implies that in the population setting, i.e., if the probability distribution that generates the data were available, then optimizing the surrogate would yield the best possible model. In this paper we characterize the Fisher consistency of a rich family of surrogate loss functions used in the context of ordinal regression, including support vector ordinal regression, ORBoosting, and least absolute deviation. We show that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification. We also derive excess risk bounds for a surrogate of the absolute error that generalize existing risk bounds for binary classification. Finally, our analysis suggests a novel surrogate of the squared error loss. We compare this novel surrogate with competing approaches on 9 different datasets. Our method proves highly competitive in practice, outperforming the least squares loss on 7 out of 9 datasets.
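One common member of this surrogate family (shown purely as an illustration, not necessarily the exact parameterization studied in the paper) is the all-threshold loss: an example of rank y among K ordered classes is charged a binary margin loss at each of the K-1 thresholds, with the sign chosen so the score is pushed above thresholds below its rank and below thresholds at or above it; a hinge base loss gives a support-vector-ordinal-regression-style objective:

```python
def all_threshold_loss(score, y, thresholds):
    """All-threshold ordinal surrogate with a hinge base loss.
    score: real-valued prediction f(x); y: 1-indexed rank;
    thresholds: increasing cut points theta_1 < ... < theta_{K-1}."""
    hinge = lambda m: max(0.0, 1.0 - m)
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        # push the score above thresholds k < y, below thresholds k >= y
        margin = (score - theta) if k < y else (theta - score)
        loss += hinge(margin)
    return loss
```

A score that falls with unit margin inside its rank's interval incurs zero loss; each violated threshold contributes its own hinge penalty.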
From only positive (P) and unlabeled (U) data, a binary classifier can be trained with PU learning, in which the state of the art is unbiased PU learning. However, if the model is very flexible, the empirical risk on the training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterpart.
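A minimal sketch of the non-negative risk estimator described above, assuming the logistic loss and a known class prior pi; the max(0, ·) clamp on the estimated negative-class risk is exactly what keeps the empirical risk from going negative with flexible models:

```python
import numpy as np

def logistic_loss(z):
    # numerically stable log(1 + exp(-z))
    return np.logaddexp(0.0, -z)

def nnpu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk from classifier scores on positive (P) and
    unlabeled (U) data: pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    scores_p = np.asarray(scores_p, dtype=float)
    scores_u = np.asarray(scores_u, dtype=float)
    r_p_pos = np.mean(logistic_loss(scores_p))   # P data treated as positive
    r_p_neg = np.mean(logistic_loss(-scores_p))  # P data treated as negative
    r_u_neg = np.mean(logistic_loss(-scores_u))  # U data treated as negative
    return float(prior * r_p_pos + max(0.0, r_u_neg - prior * r_p_neg))
```

Without the clamp, the term R_u^- - pi * R_p^- is the unbiased estimate of the negative-class risk and can be driven below zero by a flexible model; the clamp floors it at its smallest legitimate value.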

2018-10-10

In the multi-view learning paradigm, the input variable is partitioned into two different views X1 and X2, and there is a target variable Y of interest. The underlying assumption is that either view alone is sufficient to predict the target Y accurately. This provides a natural semi-supervised learning setting in which unlabeled data can be used to eliminate hypotheses from either view whose predictions tend to disagree with predictions based on the other view. This work explicitly formalizes an information-theoretic multi-view assumption and studies the multi-view paradigm in the PAC-style semi-supervised framework of Balcan and Blum [2006]. Underlying the PAC-style framework is the assumption that an incompatibility function is known; roughly speaking, this incompatibility function is a means to score how good a function is based on the unlabeled data alone. Here, we show how to derive incompatibility functions for certain loss functions of interest, so that minimizing this incompatibility over unlabeled data helps reduce expected loss on future test cases. In particular, we show how the class of empirically successful co-regularization algorithms falls into our framework and provide performance bounds (using the results in Rosenberg and Bartlett [2007], Farquhar et al. [2005]). We also provide a normative justification for canonical correlation analysis (CCA) as a dimensionality reduction technique. In particular, we show, for strictly convex loss functions of the form ℓ(w·x, y), that we can first use CCA as a dimensionality reduction technique and (if the multi-view assumption is satisfied) this projection does not throw away much predictive information about the target Y; the benefit is that subsequent learning with a labeled set need only work in this lower dimensional space.
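The CCA step can be sketched in a few lines; this is a generic whitened-cross-covariance implementation via SVD (with a small ridge term for numerical stability), shown as an illustration rather than code from the paper:

```python
import numpy as np

def cca(X1, X2, k=1, reg=1e-8):
    """Top-k canonical directions of two views via SVD of the whitened
    cross-covariance. Returns projection matrices and correlations."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]

    def inv_sqrt(M):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(M)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n
    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    U, s, Vt = np.linalg.svd(W1 @ C12 @ W2)
    A = W1 @ U[:, :k]     # projection for view X1
    B = W2 @ Vt.T[:, :k]  # projection for view X2
    return A, B, s[:k]
```

When the two views share a strong latent signal, the top canonical correlation is close to 1, and projecting onto the leading directions keeps the shared (and, under the multi-view assumption, predictive) part of the input.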

2018-09-15

In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, the random label noise is class-conditional: the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of i.i.d. data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a remarkable consequence: methods used in practice, such as the biased SVM and weighted logistic regression, are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with recently proposed methods for dealing with label noise on several benchmark datasets.
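The first approach can be sketched directly. Writing rho_pos for the probability that a true +1 label is flipped and rho_neg for the probability that a true -1 label is flipped, the corrected loss below is, by construction, an unbiased estimator of the clean loss under this flip model (a generic sketch, with the logistic loss standing in for an arbitrary surrogate):

```python
import numpy as np

def logistic_loss(t, y):
    # surrogate loss on score t with label y in {-1, +1}
    return float(np.logaddexp(0.0, -y * t))

def unbiased_loss(loss, t, y, rho_pos, rho_neg):
    """Corrected loss on a noisily labelled example (t = score,
    y = observed label in {-1, +1}). Its expectation over the label
    flip equals loss(t, true_label)."""
    rho_y, rho_my = (rho_pos, rho_neg) if y == 1 else (rho_neg, rho_pos)
    return ((1.0 - rho_my) * loss(t, y) - rho_y * loss(t, -y)) \
        / (1.0 - rho_pos - rho_neg)
```

Unbiasedness can be checked by hand: averaging the corrected loss over the flip distribution of the observed label recovers the clean loss exactly.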

2014-11-27
In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability $\rho\in[0,0.5)$, and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with a consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate $\rho$. We show that the rate is upper bounded by the conditional probability $P(y|x)$ of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
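A sketch of the importance-reweighting idea under class-conditional flip rates rho_pos (a true +1 flipped to -1) and rho_neg (a true -1 flipped to +1): the clean posterior is recovered by inverting the noise model, and the weight on an example is the ratio of clean to noisy posterior at its observed label. This is illustrative code under those assumptions, not the paper's implementation:

```python
def clean_posterior(p_noisy, rho_pos, rho_neg):
    """Invert P(y_noisy=+1|x) = rho_neg + (1 - rho_pos - rho_neg) * P(y=+1|x)."""
    return (p_noisy - rho_neg) / (1.0 - rho_pos - rho_neg)

def importance_weight(p_noisy, y, rho_pos, rho_neg):
    """Weight P_clean(y|x) / P_noisy(y|x) for observed label y in {-1, +1}."""
    p = clean_posterior(p_noisy, rho_pos, rho_neg)
    return p / p_noisy if y == 1 else (1.0 - p) / (1.0 - p_noisy)
```

The noise-rate bound mentioned in the abstract also falls out of the inversion: since P(y_noisy=+1|x) = rho_neg + (1 - rho_pos - rho_neg) P(y=+1|x), the noisy posterior is at least rho_neg everywhere, so its minimum over x upper-estimates the noise rate.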
This paper proposes a modelling of Support Vector Machine (SVM) learning to address the problem of learning with sloppy labels. In binary classification, learning with sloppy labels is the situation where a learner is provided with labelled data in which the observed labels of each class are possibly noisy (flipped) versions of their true class, and where the probability of flipping a label y to −y depends only on y. The noise probability is therefore constant and uniform within each class; learning with positive and unlabeled data is, for instance, a motivating example for this model. In order to learn with sloppy labels, we propose SloppySvm, an SVM algorithm that minimizes a tailored nonconvex functional that is shown to be a uniform estimate of the noise-free SVM functional. Several experiments validate the soundness of our approach.

2013-03-05
In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for the true class-conditional distributions to be identifiable. These conditions are weaker than those analyzed previously, and allow for the classes to be nonseparable and the noise levels to be asymmetric and unknown. The conditions essentially state that a majority of the observed labels are correct and that the true class-conditional distributions are "mutually irreducible," a concept we introduce that limits the similarity of the two distributions. For any label noise problem, there is a unique pair of true class-conditional distributions satisfying the proposed conditions, and we argue that this pair corresponds in a certain sense to maximal denoising of the observed distributions. Our results are facilitated by a connection to "mixture proportion estimation," which is the problem of estimating the maximal proportion of one distribution that is present in another. We establish a novel rate of convergence result for mixture proportion estimation, and apply this to obtain consistency of a discrimination rule based on surrogate loss minimization. Experimental results on benchmark data and a nuclear particle classification problem demonstrate the efficacy of our approach.
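To make "mixture proportion estimation" concrete, here is a deliberately crude histogram estimator of the largest kappa such that F = kappa * H + (1 - kappa) * G for some distribution G; the paper's estimator and rate analysis are far more careful, and this is only an illustration of the quantity being estimated:

```python
import numpy as np

def mixture_proportion(samples_f, samples_h, bins=10):
    """Crude histogram estimate of the maximal proportion of H
    inside F: kappa_hat = min over bins of F(bin) / H(bin)."""
    samples_f = np.asarray(samples_f, dtype=float)
    samples_h = np.asarray(samples_h, dtype=float)
    edges = np.histogram_bin_edges(
        np.concatenate([samples_f, samples_h]), bins=bins)
    f_p = np.histogram(samples_f, bins=edges)[0] / samples_f.size
    h_p = np.histogram(samples_h, bins=edges)[0] / samples_h.size
    mask = h_p > 0  # only compare where H puts mass
    return float(np.min(f_p[mask] / h_p[mask]))
```

If F and H coincide, the estimate is 1; if F is an even mixture of H and a distribution with disjoint support, the bins covering H's support each hold about half as much F-mass as H-mass, giving an estimate near 0.5.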