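Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when they exist, making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks, and thus weak defense evaluations. Our test slightly modifies a neural network to guarantee that an adversarial example exists for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We hope that attack unit tests such as ours will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism.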
Translated by Google Translate
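In this paper, we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. by combining a pretrained denoising diffusion probabilistic model and a standard high-accuracy classifier. This allows us to certify 71% accuracy on ImageNet under adversarial perturbations constrained to a bounded 2-norm, an improvement of 14 percentage points over the prior certified state of the art using any approach, or an improvement of 30 percentage points over denoised smoothing. We obtain these results using only pretrained diffusion models and image classifiers, without any fine-tuning or retraining of model parameters.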
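As a rough illustration of the prediction step in denoised smoothing (add Gaussian noise, denoise, classify, take a majority vote), here is a minimal sketch. The names `smoothed_predict`, `classify`, and `denoise` are hypothetical stand-ins rather than the paper's code, and the inputs are plain lists of floats instead of images. The returned vote fraction is only an empirical estimate: an actual certificate would additionally need a statistical lower bound on it, which this sketch omits.

```python
import random
from collections import Counter


def smoothed_predict(classify, denoise, x, sigma, n_samples=100, seed=0):
    """Majority-vote prediction over Gaussian-noised, then denoised, copies of x.

    classify: hypothetical function mapping an input (list of floats) to a label.
    denoise:  hypothetical function mapping a noisy input back to a clean estimate.
    Returns (most common label, fraction of votes it received).
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Perturb every coordinate with independent N(0, sigma^2) noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        # Denoise first, then hand the result to the unmodified classifier.
        votes[classify(denoise(noisy))] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


# Toy usage: an identity "denoiser" and a sign-of-sum "classifier".
label, vote_fraction = smoothed_predict(
    classify=lambda v: int(sum(v) > 0),
    denoise=lambda v: v,
    x=[2.0, 2.0],
    sigma=0.5,
)
```

In the setting the abstract describes, `denoise` would be a pretrained diffusion model and `classify` a pretrained image classifier, both used as-is.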
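Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyze an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously safe points to the same attack. We perform several experiments to study this effect and to understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.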
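A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and we find that most prior attacks perform poorly when evaluated in this way. To address this, we develop a Likelihood Ratio Attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.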
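The evaluation metric argued for above, true-positive rate at a fixed low false-positive rate, can be sketched as follows. This is a minimal illustration assuming each example already has a scalar attack score where higher means "more likely a member"; the function name `tpr_at_fpr` and the toy scores are hypothetical and not taken from the paper.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.001):
    """True-positive rate of a score-threshold attack whose FPR is at most max_fpr.

    An example is flagged as a member when its score is strictly above the
    threshold. Ties are resolved conservatively (never exceeding the FPR budget).
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    # How many non-members may be wrongly flagged within the FPR budget.
    allowed_fp = int(max_fpr * len(nonmember_scores))
    ranked = sorted(nonmember_scores, reverse=True)
    # Using the (allowed_fp + 1)-th highest non-member score as the threshold
    # means at most allowed_fp non-members score strictly above it.
    threshold = ranked[allowed_fp] if allowed_fp < len(ranked) else float("-inf")
    true_positives = sum(1 for s in member_scores if s > threshold)
    return true_positives / len(member_scores)


# Toy usage with made-up scores; real evaluations use many more examples,
# otherwise a 0.1% FPR budget allows zero false positives.
members = [0.9, 0.8, 0.4, 0.2]
nonmembers = [0.7, 0.5, 0.3, 0.1]
rate = tpr_at_fpr(members, nonmembers, max_fpr=0.5)
```

Unlike average-case accuracy, this number is high only when the attack can single out some members while flagging almost no non-members.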
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences is included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. Worryingly, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
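Membership inference attacks are one of the simplest forms of privacy leakage for machine learning models: given a data point and a model, determine whether the point was used to train the model. Existing membership inference attacks exploit models' abnormal confidence when queried on their training data. These attacks do not apply if the adversary only has access to the model's predicted labels, without a confidence measure. In this paper, we introduce label-only membership inference attacks. Instead of relying on confidence scores, our attacks evaluate the robustness of a model's predicted labels under perturbations to obtain a fine-grained membership signal. These perturbations include common data augmentations or adversarial examples. We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences. We further demonstrate that label-only attacks break multiple defenses against membership inference attacks that (implicitly or explicitly) rely on a phenomenon we call confidence masking. These defenses modify a model's confidence scores in order to thwart attacks, but leave the model's predicted labels unchanged. Our label-only attacks demonstrate that confidence masking is not a viable defense strategy against membership inference. Finally, we investigate worst-case label-only attacks that infer membership for a small number of outlier data points. We show that label-only attacks also match confidence-based attacks in this setting. We find that training models with differential privacy and (strong) L2 regularization are the only known defense strategies that successfully prevent all attacks. This remains true even when the differential privacy budget is too high to offer meaningful provable guarantees.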
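The idea of turning label robustness into a membership signal can be sketched as below. This is a toy illustration under stated assumptions, not the paper's attack: `label_only_score`, `predict_label`, and `perturb` are hypothetical names, the model is a 1-D threshold classifier, and the perturbation is plain Gaussian noise rather than a data augmentation or adversarial example.

```python
import random


def label_only_score(predict_label, x, y, perturb, n_queries=50, seed=0):
    """Fraction of perturbed copies of x that the model still labels as y.

    Only hard labels are queried; no confidence scores are used. A higher
    fraction means the prediction is more robust around x, which a label-only
    attack would treat as evidence that x was a training point.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_queries):
        if predict_label(perturb(x, rng)) == y:
            kept += 1
    return kept / n_queries


# Toy model: label 1 iff the (scalar) input is positive.
def toy_model(x):
    return int(x > 0)


# Toy perturbation: additive Gaussian noise with unit standard deviation.
def gaussian_perturb(x, rng):
    return x + rng.gauss(0.0, 1.0)


# A point far from the decision boundary keeps its label under noise;
# a point close to the boundary frequently flips.
score_far = label_only_score(toy_model, 10.0, 1, gaussian_perturb)
score_near = label_only_score(toy_model, 0.1, 1, gaussian_perturb)
```

An attacker would then threshold this score to decide membership, in the same way a confidence-based attack thresholds the model's confidence.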
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS (and which illustrate a diverse set of defense strategies) can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises mixture using an l1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on a real data benchmark. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
Passive monitoring of acoustic or radio sources has important applications in modern convenience, public safety, and surveillance. A key task in passive monitoring is multiobject tracking (MOT). This paper presents a Bayesian method for multisensor MOT for challenging tracking problems where the object states are high-dimensional, and the measurements follow a nonlinear model. Our method is developed in the framework of factor graphs and the sum-product algorithm (SPA). The multimodal probability density functions (pdfs) provided by the SPA are effectively represented by a Gaussian mixture model (GMM). To perform the operations of the SPA in high-dimensional spaces, we make use of Particle flow (PFL). Here, particles are migrated towards regions of high likelihood based on the solution of a partial differential equation. This makes it possible to obtain good object detection and tracking performance even in challenging multisensor MOT scenarios with single sensor measurements that have a lower dimension than the object positions. We perform a numerical evaluation in a passive acoustic monitoring scenario where multiple sources are tracked in 3-D from 1-D time-difference-of-arrival (TDOA) measurements provided by pairs of hydrophones. Our numerical results demonstrate favorable detection and estimation accuracy compared to state-of-the-art reference techniques.