Concept bottleneck models (CBMs) (Koh et al. 2020) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and the influence of the concept on the final prediction achieves strong performance and outperforms a static approach proposed in Koh et al. (2020) as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% over competitive baselines with only 5 interactions on the Caltech-UCSD Birds, CheXpert and OAI datasets.
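A minimal sketch of a policy of this kind, assuming a hypothetical linear-logistic label head over concept probabilities. Each unqueried concept is scored by the binary entropy of its prediction (uncertainty) times the change in the final prediction when that concept is flipped between 0 and 1 (influence), and the top-k concepts are requested. The function name `select_concepts_to_query` and the product combination rule are illustrative assumptions, not the authors' exact policy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def select_concepts_to_query(concept_probs, w, b, k, already_known=()):
    """Rank unqueried concepts by uncertainty x influence; return top-k indices.

    Assumes (hypothetically) a linear-logistic label head y = sigmoid(w . c + b)
    over concept values c in [0, 1].
    """
    p = np.clip(np.asarray(concept_probs, dtype=float), 1e-12, 1 - 1e-12)
    w = np.asarray(w, dtype=float)
    # Uncertainty: binary entropy of each predicted concept.
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    # Influence: how far the final prediction moves if concept j is set to 1
    # versus 0, holding the other concepts at their predicted values.
    base = np.dot(w, p) + b
    influence = np.empty_like(p)
    for j in range(len(p)):
        on = base + w[j] * (1 - p[j])
        off = base - w[j] * p[j]
        influence[j] = abs(sigmoid(on) - sigmoid(off))
    score = entropy * influence
    # Concepts already labeled by the human are never queried again.
    score[list(already_known)] = -np.inf
    return [int(i) for i in np.argsort(-score)[:k]]
```

Multiplying the two terms means a concept is only worth querying when the model is unsure about it and the answer would actually move the final prediction.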
Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test time to rank test instances for selective classification. A second key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the meta-network. We train the classifier together with its meta-network using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform the current state of the art on selective classification by substantial margins, for instance, by up to 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% accuracy and 3.3% AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
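To make the dropout-variance quantity concrete, here is a sketch that estimates it by Monte Carlo for a hypothetical linear softmax classifier (`x @ W + b`) and then ranks examples by it for selective classification. It only computes the quantity being minimized; it does not implement the meta-network or the nested training objective, and the drop probability and sample count are arbitrary choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout_variance(x, W, b, drop_prob=0.5, n_samples=64, seed=0):
    """Per-example variance of softmax outputs under random weight dropout.

    Each Monte Carlo sample zeroes every weight independently with probability
    drop_prob and rescales the survivors (inverted dropout). Returns one scalar
    per example: the output variance summed over classes.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_prob
    outs = []
    for _ in range(n_samples):
        mask = rng.random(W.shape) < keep
        W_drop = W * mask / keep
        outs.append(softmax(x @ W_drop + b))
    outs = np.stack(outs)  # (n_samples, n_examples, n_classes)
    return outs.var(axis=0).sum(axis=-1)

def rank_for_selective_classification(scores):
    # Lowest variance (most stable prediction) first; abstain on the tail.
    return np.argsort(scores)
```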
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9-year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting on this dataset and show significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
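One simple way to express multiple timescales of change is a mixture of exponential decays over example age. The sketch below assumes hand-set timescales `tau_j` and mixing weights; in the formulation above these weights would instead be produced by a learned auxiliary network and would also depend on instance features, which this sketch omits.

```python
import numpy as np

def temporal_weights(timestamps, t_now, timescales, mix):
    """Hypothetical multi-timescale recency weights for training examples.

    w(t) = sum_j mix_j * exp(-(t_now - t) / tau_j), normalized to mean 1 so the
    overall loss scale is unchanged.
    """
    age = t_now - np.asarray(timestamps, dtype=float)
    taus = np.asarray(timescales, dtype=float)
    mix = np.asarray(mix, dtype=float)
    # Broadcast ages (n, 1) against timescales (1, m), then sum over timescales.
    w = (mix[None, :] * np.exp(-age[:, None] / taus[None, :])).sum(axis=1)
    return w / w.mean()
```

A short timescale lets very recent data dominate, while a long one keeps older data from being discarded entirely.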
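Reliable outlier detection is critical for real-world applications of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many solutions for correcting these biases are computationally expensive or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations, which we term "shaking" and "stirring", that ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are computationally inexpensive and readily applied at evaluation time. We evaluate our approaches extensively on five grayscale and six natural image datasets and show that they match or exceed state-of-the-art outlier detection performance. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep generative models.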
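We present a large-scale experiment on pretraining encoders with parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Although we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.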
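The options framework in Hierarchical Reinforcement Learning decomposes overall goals into a combination of options, or simpler tasks with associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, such reuse is necessary to realize the vision of a continual learning agent that can effectively leverage its prior experience. Previous approaches have proposed only limited forms of transferring pretrained options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), in which we learn an affinity function between options and the items present in the environment. This allows us to effectively reuse a large library of pretrained options, with zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns representations of options and environments over a series of HRL problems by incorporating feedback about the relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in two simulated settings, the CraftWorld and AI2THOR environments, and show that we achieve performance competitive with oracular baselines, as well as substantial gains over a baseline that has the entire option pool available for learning the hierarchical policy.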
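As an illustration of how option retrieval could restrict the candidate set, the sketch below assumes an already-learned option-by-item affinity matrix and scores each option by its maximum affinity to any item present in the environment. The max aggregation and the function `retrieve_options` are hypothetical choices, not the paper's exact indexing scheme.

```python
import numpy as np

def retrieve_options(affinity, items_present, k):
    """Return the k options with the highest affinity to any item present.

    affinity: (n_options, n_item_types) matrix, assumed to be learned.
    items_present: indices of item types present in the current environment.
    """
    affinity = np.asarray(affinity, dtype=float)
    # Score each option by its strongest affinity among the present items.
    scores = affinity[:, list(items_present)].max(axis=1)
    # Stable sort keeps ties in option-index order.
    return [int(i) for i in np.argsort(-scores, kind="stable")[:k]]
```

Only the retrieved options would then be exposed to the high-level policy, shrinking its action space.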
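In addition to standard supervised learning with hard labels, auxiliary losses are commonly used in many supervised learning settings to improve model generalization. For example, knowledge distillation adds a second, teacher-mimicking loss to model training, where the teacher may be a pretrained model that outputs a richer distribution than the labels. Similarly, in settings with limited labeled data, weak labeling information is used in the form of labeling functions. Here, auxiliary losses are introduced to counteract labeling functions, which may be noisy rule-based approximations of the true labels. We tackle the problem of learning to combine these losses in a principled manner. We introduce AMAL, which uses meta-learning on a validation metric to learn instance-specific weights for the optimal mixing of losses. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides significant gains over competitive baselines in these domains. We analyze our method empirically and share insights into the mechanisms through which it provides performance gains.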
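For the knowledge-distillation case, an instance-specific mixture of the hard-label loss and a teacher-matching loss can be written as below. The per-example weights `alpha` are plain inputs here; in AMAL they would be learned by meta-learning on a validation metric, which this sketch does not implement. The temperature `T` and the `T**2` scaling follow common distillation practice and are assumptions.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mixed_loss(student_logits, labels, teacher_logits, alpha, T=2.0):
    """Per-instance mix of hard-label cross-entropy and a teacher-matching KL.

    alpha[i] in [0, 1] weights the hard-label loss for example i; the
    remaining (1 - alpha[i]) weights KL(teacher || student) at temperature T.
    """
    n = student_logits.shape[0]
    ce = -log_softmax(student_logits)[np.arange(n), labels]
    log_p_s = log_softmax(student_logits / T)
    log_p_t = log_softmax(teacher_logits / T)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1) * T**2
    return float(np.mean(alpha * ce + (1 - alpha) * kl))
```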
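Continual Learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks, thereby potentially leveraging learning across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to the current model parameters, and we discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4%) over the state of the art in the well-studied offline continual learning setting. Our findings also transfer effectively to online/streaming CL settings, showing gains of 5% over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% when combined with our subset selection strategy.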
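The coreset idea, choosing a weighted subset whose gradient sum tracks the gradient of all data seen so far, can be sketched with greedy orthogonal-matching-pursuit-style selection. This is an assumed stand-in for GCR's optimization criterion, not its exact algorithm; in particular, the least-squares weights here are unconstrained and may be negative, whereas practical coreset methods often restrict them to be nonnegative.

```python
import numpy as np

def gradient_coreset(grads, budget):
    """Greedy OMP-style selection of a weighted subset of examples.

    grads: (n, d) per-example gradients. Returns (indices, weights) such that
    grads[indices].T @ weights approximates grads.sum(axis=0).
    """
    grads = np.asarray(grads, dtype=float)
    target = grads.sum(axis=0)
    selected = []
    weights = np.zeros(0)
    residual = target.copy()
    for _ in range(min(budget, len(grads))):
        # Pick the unselected gradient most aligned with what is still unexplained.
        scores = np.abs(grads @ residual)
        scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        # Refit all weights jointly on the selected gradients (least squares).
        A = grads[selected].T  # (d, |S|)
        weights, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ weights
    return selected, weights
```

In a replay setting, the selected examples (with their weights) would form the buffer, and the selection would be rerun as new task data arrives.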
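Deep networks often make confident yet incorrect predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models (DGMs) are a candidate metric for outlier detection with unlabeled data. Yet, previous studies have shown that DGM likelihoods are unreliable and can be easily biased by simple transformations of the input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We propose novel analytical and algorithmic approaches to ameliorate key biases in VAE likelihoods. Our bias corrections are sample-specific, computationally inexpensive, and readily computed for various decoder visible distributions. Next, we show that a well-known image pre-processing technique, contrast stretching, extends the effectiveness of bias correction to further improve outlier detection. Our approach achieves state-of-the-art accuracies on nine grayscale and natural image datasets, and demonstrates significant advantages, in both speed and performance, over four recent competing approaches. In sum, lightweight remedies suffice to achieve robust outlier detection with VAEs.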
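Contrast stretching itself is a standard operation; a percentile-based version is sketched below. The 2nd/98th percentile cutoffs are illustrative defaults, not values taken from the paper, and the sketch does not include the VAE bias correction.

```python
import numpy as np

def contrast_stretch(img, low_pct=2.0, high_pct=98.0):
    """Percentile-based contrast stretching of pixel intensities to [0, 1].

    Intensities at or below the low percentile map to 0, those at or above the
    high percentile map to 1, and values in between are rescaled linearly.
    """
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # constant image: nothing to stretch
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```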
In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber and Deliveroo). In this paper, we consider such multi-agent systems, where each agent is self-interested and takes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties of equilibrium solutions in the SNCG model with non-atomic and also nearly non-atomic agents. Using these equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes variance across the values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and on a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.