This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier for a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths $w^*$. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore the first to show benign overfitting for the maximum $\ell_1$-margin classifier.
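For reference, the maximum $\ell_1$-margin classifier is usually written as follows. The notation is an assumption, not taken from the abstract: training pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, $i = 1, \dots, n$.

$$
\hat{w} \in \arg\max_{\|w\|_1 \le 1} \; \min_{1 \le i \le n} \; y_i \langle x_i, w \rangle
$$

When the data is linearly separable, this gives the same direction as the hard-margin form $\min_w \|w\|_1$ subject to $y_i \langle x_i, w \rangle \ge 1$ for all $i$.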
It is widely believed that given the same labeling budget, active learning algorithms like uncertainty sampling achieve better predictive performance than passive learning (i.e. uniform sampling), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as uncertainty sampling can sometimes perform even worse than passive learning. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that passive learning outperforms uncertainty sampling even for noiseless data and when using the uncertainty of the Bayes optimal classifier. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional datasets spanning a diverse range of applications, from finance and histology to chemistry and computer vision.
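The two labeling strategies compared above can be sketched as follows. This is a minimal illustration, not the paper's experimental code: the function names and the toy pool are hypothetical. It relies on the fact that for a logistic model with weights `w`, the predicted probability is closest to 0.5 exactly where `|<w, x>|` is smallest.

```python
import random


def uncertainty_sample(weights, pool):
    """Index of the unlabeled point the logistic model is least sure about.

    For logistic regression, p(x) = sigmoid(<w, x>) is closest to 0.5
    where |<w, x>| is smallest, so we minimise the absolute score.
    """
    def abs_score(x):
        return abs(sum(w * xi for w, xi in zip(weights, x)))

    return min(range(len(pool)), key=lambda i: abs_score(pool[i]))


def passive_sample(pool, rng):
    """Passive learning baseline: pick an unlabeled point uniformly at random."""
    return rng.randrange(len(pool))


if __name__ == "__main__":
    # Hypothetical toy pool of three 2-dimensional points.
    pool = [[3.0, 0.0], [0.1, 0.0], [0.0, 2.0]]
    weights = [1.0, -1.0]
    print("uncertainty sampling picks index", uncertainty_sample(weights, pool))
    print("passive sampling picks index", passive_sample(pool, random.Random(0)))
```

In an active learning loop, the chosen point would be labeled, added to the training set, and the model refit before the next query.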
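As machine learning algorithms are deployed on sensitive data in critical decision-making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and achieve higher accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR and CelebA), and tabular (Law School) datasets and learning algorithms.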
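In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously known concepts. In this paper, we aim to provide explanations by provably identifying \emph{high-level, previously unknown ground-truth concepts}. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP), a VAE-based classifier that uses visually interpretable concepts as predictors for a simple classifier. Assuming a generative model for the ground-truth concepts, we prove that CLAP is able to identify them while attaining optimal classification accuracy. Our experiments on synthetic datasets verify that CLAP identifies the different ground-truth concepts, and it yields promising results on a medical chest X-ray dataset.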
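We provide matching upper and lower bounds of order $\sigma^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.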
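For reference, the minimum $\ell_1$-norm interpolator (basis pursuit) is usually defined as below. The notation is an assumption, not taken from the abstract: design matrix $X \in \mathbb{R}^{n \times d}$ with $d > n$ and observations $y \in \mathbb{R}^n$.

$$
\hat{w} \in \arg\min_{w \in \mathbb{R}^d} \|w\|_1 \quad \text{subject to} \quad Xw = y
$$

The constraint forces the estimator to fit the noisy observations exactly, which is why consistency in this setting is described as benign overfitting.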
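Numerous recent works show that overparameterization implicitly reduces the variance of min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. However, we show that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification, and hence provide the first theoretical result on robust overfitting.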
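For the regression case, the two estimators being contrasted can be written in a standard form. The notation and the scaling of the penalty are assumptions, not taken from the abstract: $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^n$, and regularization strength $\lambda > 0$.

$$
\hat{w}_\lambda = \arg\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \|y - Xw\|_2^2 + \lambda \|w\|_2^2,
\qquad
\hat{w}_0 = \lim_{\lambda \to 0^+} \hat{w}_\lambda
$$

When $d > n$ and $X$ has full row rank, $\hat{w}_0$ is the minimum $\ell_2$-norm interpolator, i.e. the solution of $\min_w \|w\|_2$ subject to $Xw = y$, while any $\lambda > 0$ avoids exact interpolation.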