Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is a drop in performance due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This makes it difficult to create physically faithful drift test cases or to provide specifications of data models that should be avoided when deploying a machine learning model. In this study, we demonstrate how these shortcomings can be overcome by pairing machine learning robustness validation with physical optics. We examine the role that raw sensor data and differentiable data models can play in controlling performance risks related to image dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases. The experiments presented here show that the average decrease in model performance is four to ten times less severe than under post-hoc augmentation testing. Second, the gradient connection between task and data models allows for drift forensics that can be used to specify performance-sensitive data models which should be avoided during the deployment of a machine learning model. Third, drift adjustment opens up the possibility of processing adjustments in the face of drift. This can lead to a speed-up and stabilization of classifier training, at a margin of up to 20% in validation accuracy. A guide to accessing the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity. In both experiments, we find humans to be consistently \emph{worse} than even relatively small language models like GPT3-Ada at next-token prediction.
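For concreteness, the language-model side of such a comparison fits in a few lines. The sketch below is illustrative only: it scores gpt2 (a stand-in for a small model like GPT3-Ada) on top-1 next-token accuracy and perplexity over a sample text, and does not reproduce the paper's datasets or human-evaluation protocol.

```python
# Illustrative sketch: top-1 next-token accuracy and perplexity for a
# small causal LM on a text sample. Model name and text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a small model like GPT3-Ada
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

# Predict token t+1 from the logits at position t.
preds = logits[0, :-1].argmax(dim=-1)
targets = ids[0, 1:]
top1_acc = (preds == targets).float().mean().item()

# Perplexity = exp of the average negative log-likelihood of the true tokens.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
nll = -log_probs[torch.arange(targets.numel()), targets].mean()
perplexity = nll.exp().item()

print(f"top-1 accuracy: {top1_acc:.3f}, perplexity: {perplexity:.1f}")
```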
Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult, so we aim to understand its causes. We propose doing so through the lens of feature \emph{capacity}, which is the fractional dimension each feature consumes in the embedding space. We show that in a toy model the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features. Polysemanticity is more prevalent when the inputs have higher kurtosis or sparsity, and more prevalent in some architectures than in others. Given an optimal allocation of capacity, we go on to study the geometry of the embedding space. We find a block-semi-orthogonal structure, with differing block sizes in different models, highlighting the impact of model architecture on the interpretability of its neurons.
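A minimal sketch of the capacity computation, under the assumption that the capacity of feature $i$ with embedding vector $w_i$ is defined as $C_i = \lVert w_i \rVert^4 / \sum_j (w_i \cdot w_j)^2$; the paper's toy model, data distribution, and training procedure are not reproduced here, and the matrix below is random.

```python
# Sketch of per-feature capacity, assuming the definition
# C_i = ||w_i||^4 / sum_j (w_i . w_j)^2 over feature embedding vectors w_i.
# C_i = 1: the feature gets a dimension to itself (monosemantic);
# 0 < C_i < 1: it shares dimensions with others (polysemantic);
# C_i ~ 0: the feature is effectively ignored.
import numpy as np

rng = np.random.default_rng(0)
n_features, embed_dim = 6, 3
W = rng.normal(size=(n_features, embed_dim))  # rows: feature embeddings

gram = W @ W.T  # pairwise dot products (w_i . w_j)
capacity = np.diag(gram) ** 2 / (gram ** 2).sum(axis=1)

print(capacity)        # each value lies in [0, 1]
print(capacity.sum())  # under this definition, bounded by embed_dim
```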
In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance. In this work, we use a language generation task as a testbed for achieving high reliability through adversarial training. We create a series of adversarial training techniques, including a tool that assists human adversaries, to find and eliminate failures in a classifier that filters text completions suggested by a generator. In a simple "avoid injuries" task, we find that we can set very conservative classifier thresholds without significantly affecting the quality of the filtered outputs. With our chosen thresholds, filtering with the baseline classifier reduces the rate of unsafe completions on in-distribution data from about 2.4% to 0.003%, which is the limit of our ability to measure. We find that adversarial training significantly improves robustness to the adversarial attacks we trained on, without affecting in-distribution performance. We hope to see further work on high-stakes reliability, including more powerful tools to augment human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility of catastrophic deployment-time failures of powerful models.
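The generator-plus-filter pipeline described above can be summarized schematically. In the sketch below, the generator and classifier are placeholders, and the threshold is illustrative; the paper's classifier, task, and threshold-selection procedure are not reproduced.

```python
# Schematic sketch of filtering a generator with a conservative classifier:
# the classifier scores each candidate completion for the failure mode
# (e.g. "describes injury"), and completions scoring above the threshold
# are rejected. `generator` and `classifier` are placeholder callables.
from typing import Callable, List

def filtered_completions(
    prompt: str,
    generator: Callable[[str, int], List[str]],  # returns n candidate completions
    classifier: Callable[[str, str], float],     # estimated P(failure | prompt, completion)
    threshold: float = 0.01,                     # conservative: reject anything suspicious
    n_candidates: int = 16,
) -> List[str]:
    candidates = generator(prompt, n_candidates)
    return [c for c in candidates if classifier(prompt, c) < threshold]
```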
This paper addresses the problem of anomaly detection in video surveillance. Due to the inherent rarity and heterogeneity of anomalous events, the problem is treated as a normality-modeling task, in which our model learns object-centric normal patterns without seeing anomalous samples during training. The main contribution lies in coupling pretrained object-level action features with a cosine-based anomaly estimation function over a set of prototypes, thereby extending previous methods by introducing additional constraints into the mainstream reconstruction-based strategy. Our framework leverages both appearance and motion information to learn object-level behavior and capture prototypical patterns in a memory module. Experiments on several well-known datasets demonstrate the effectiveness of our method, as it outperforms the current state of the art on the most relevant spatio-temporal evaluation metrics.
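As an illustration of the cosine-based estimation idea, the sketch below scores an object-level feature by its dissimilarity to the closest prototype in a memory bank. The prototypes here are random placeholders, whereas in the paper they are learned from normal training data.

```python
# Illustrative cosine-based anomaly score against memory prototypes:
# a feature that is dissimilar to every stored "normal" prototype
# receives a high score.
import numpy as np

def anomaly_score(feature: np.ndarray, prototypes: np.ndarray) -> float:
    """1 minus the max cosine similarity to any prototype (higher = more anomalous)."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return float(1.0 - (p @ f).max())

prototypes = np.random.default_rng(0).normal(size=(10, 128))  # stand-in memory module
x = np.random.default_rng(1).normal(size=128)                 # object-level feature
print(anomaly_score(x, prototypes))
```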
Machine learning, already at the core of a growing number of systems and applications, is set to become even more ubiquitous with the rapid rise of wearable devices and the Internet of Things. In most machine learning applications, the main focus is on the quality of the results achieved (e.g., prediction accuracy), and hence massive amounts of data are collected, requiring significant computational resources to build models. In many situations, however, building large centralized data repositories is infeasible or impractical. In personal health, for example, privacy concerns may inhibit the sharing of detailed personal data. In such cases, machine learning should ideally be performed on the wearable devices themselves, which raises major computational limitations, such as the battery capacity of smartwatches. This paper therefore investigates frugal learning, which aims to build the most accurate possible models using the least amount of resources. A wide range of learning algorithms is examined through a frugal lens, analyzing their accuracy/runtime performance on a variety of datasets. The most promising algorithms are then assessed in a real-world scenario by implementing them on a smartwatch and letting them learn activity recognition models on the watch itself.
This paper presents first successful steps in designing agents that learn meta-strategies for iterative query refinement. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that learns interactive search strategies from scratch. Using a traditional term-based BM25 ranking function, we obtain retrieval and answer-quality performance comparable to recent neural methods. We provide an in-depth analysis of the search policies.
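For reference, the term-based BM25 ranking function mentioned above can be implemented in a few lines (standard Okapi BM25 with common k1/b defaults); the agent's search operators and learned refinement policy are not part of this sketch.

```python
# Minimal Okapi BM25 scorer over pre-tokenized documents.
import math
from collections import Counter
from typing import List

def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)           # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed IDF
        f = tf[term]                                       # term frequency in doc
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["interactive", "search", "agents"], ["bm25", "ranking", "function"]]
print(bm25_score(["search", "agents"], corpus[0], corpus))
```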
Modern machine learning systems are increasingly characterized by extensive personal data collection, despite diminishing returns and the growing societal costs of such practices. Yet data minimization is one of the core data protection principles enshrined in the EU General Data Protection Regulation ('GDPR'), and requires that only personal data that is adequate, relevant, and limited to what is necessary be processed. However, the principle has seen limited adoption due to a lack of technical interpretation. In this work, we build on literature in machine learning and law to propose FIDO, a framework for inhibiting data overcollection. FIDO learns to limit data collection based on an interpretation of data minimization tied to system performance. Specifically, FIDO provides a data-collection stopping criterion by iteratively updating an estimate of the performance curve, i.e., the relationship between dataset size and performance, as data is acquired. FIDO estimates the performance curve via a piecewise power-law technique that separately models the distinct phases of algorithm performance throughout data collection. Empirical experiments show that the framework produces accurate performance curves and data-collection stopping criteria across datasets and feature-acquisition algorithms. We further demonstrate that many other families of curves systematically overestimate the returns to additional data. Our findings and analysis provide deeper insight into relevant considerations when designing data minimization frameworks, including the effects of active feature acquisition on individual users and the feasibility of user-specific data minimization. We conclude with practical recommendations for implementing data minimization.
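To illustrate the performance-curve mechanism, the sketch below fits a single saturating power law to (dataset size, accuracy) observations and stops collection when the predicted gain from doubling the data falls below a tolerance. FIDO's piecewise power-law estimator and actual stopping criterion are more involved, and the numbers here are made up.

```python
# Hedged sketch of the performance-curve idea: fit acc(n) = c - a * n**(-b)
# to observed (dataset size, accuracy) pairs and stop collecting once the
# predicted gain from more data falls below a tolerance.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return c - a * n ** (-b)

# Illustrative observations: accuracy measured at growing dataset sizes.
sizes = np.array([100, 200, 400, 800, 1600], dtype=float)
accs = np.array([0.62, 0.70, 0.76, 0.80, 0.82])

(a, b, c), _ = curve_fit(power_law, sizes, accs, p0=(1.0, 0.5, 0.9), maxfev=10_000)

next_size = sizes[-1] * 2
predicted_gain = power_law(next_size, a, b, c) - power_law(sizes[-1], a, b, c)
tolerance = 0.005  # stop when doubling the data buys < 0.5 points of accuracy
print("stop collecting:", predicted_gain < tolerance)
```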
The Extremal River Problem has emerged as a flagship problem for causal discovery in extreme values of a network. The task is to recover a river network from only extreme flow measured at a set $V$ of stations, without any information on the stations' locations. We present QTree, a new simple and efficient algorithm to solve the Extremal River Problem that performs very well compared to existing methods on hydrology data and in simulations. QTree returns a root-directed tree and achieves almost perfect recovery on the Upper Danube network data, the existing benchmark data set, as well as on new data from the Lower Colorado River network in Texas. It can handle missing data, has an automated parameter tuning procedure, and runs in time $O(n |V|^2)$, where $n$ is the number of observations and $|V|$ the number of nodes in the graph. Furthermore, we prove that the QTree estimator is consistent under a Bayesian network model for extreme values with noise. We also assess the small-sample behaviour of QTree through simulations and detail the strengths and possible limitations of QTree.
This paper examines two core data protection principles in data-driven systems: data minimization and purpose limitation. While contemporary data processing practices appear to be at odds with these principles, we demonstrate that systems could technically use far less data than they currently do. This observation is the starting point for our detailed techno-legal analysis, which uncovers the obstacles that stand in the way of realizing these principles and illustrates unexpected trade-offs that arise when data protection law is applied in practice. Our analysis aims to inform the debate about the impact of data protection on the development of artificial intelligence in the EU, offering practical action points for data controllers, regulators, and researchers.