Language models are widely deployed to provide automatic text completion services in user products. However, recent research has revealed that language models (especially large ones) bear considerable risk of memorizing private training data, which is then vulnerable to leakage and extraction by adversaries. In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text, while varying other factors such as model size and adversarial conditions. We test both "heuristic" mitigations (those without formal privacy guarantees) and Differentially Private training, which provides provable levels of privacy at the cost of some model performance. Our experiments show that, with the exception of L2 regularization, heuristic mitigations are largely ineffective in preventing memorization in our test suite, possibly because they make overly strong assumptions about the characteristics that define "sensitive" or "private" text. In contrast, Differential Privacy reliably prevents memorization in our experiments, despite its computational and model-performance costs.
translated by 谷歌翻译
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences is included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. Worryingly, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
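With the wide availability of large pre-trained language models such as GPT-2 and BERT, a recent trend has been to fine-tune a pre-trained model to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application, where a pre-trained model is tuned to provide suggested responses for a given query message. Since these models are often tuned using sensitive data such as emails or chat transcripts, it is important to understand and mitigate the risk that the model leaks its tuning data. We investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline and introduce a new type of active extraction attack that exploits canonical patterns in text containing sensitive data. We show experimentally that it is possible for an adversary to extract sensitive user information present in the training data. We explore potential mitigation strategies and demonstrate empirically how differential privacy appears to be an effective defense mechanism against such pattern extraction attacks.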
Privacy-preserving deep learning is an emerging field in machine learning that aims to mitigate the privacy risks in the use of deep neural networks. One such risk is training data extraction from language models that have been trained on datasets which contain personal and privacy-sensitive information. In our study, we investigate the extent of named entity memorization in fine-tuned BERT models. We use single-label text classification as a representative downstream task and employ three different fine-tuning setups in our experiments, including one with Differential Privacy (DP). We create a large number of text samples from the fine-tuned BERT models utilizing a custom sequential sampling strategy with two prompting strategies. We search in these samples for named entities and check if they are also present in the fine-tuning datasets. We experiment with two benchmark datasets in the domains of emails and blogs. We show that the application of DP has a huge effect on the text generation capabilities of BERT. Furthermore, we show that a fine-tuned BERT does not generate more named entities specific to the fine-tuning dataset than a BERT model that is pre-trained only. This suggests that BERT is unlikely to emit personal or privacy-sensitive named entities. Overall, our results are important to understand to what extent BERT-based services are prone to training data extraction attacks.
This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.
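Large language models have been shown to memorize private information, such as social security numbers, in their training data. Given the sheer scale of the training corpus, it is challenging to screen and filter such private data, whether manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method for training language generation models while protecting confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that models trained with CRT obtain almost the same perplexity while preserving strong confidentiality.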
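Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim. This is a vulnerability that can compromise the privacy of the model's training data. In this work, we introduce SubMix: a practical protocol for private next-token prediction designed to prevent privacy violations by language models that were fine-tuned on a private corpus after pre-training on a public corpus. We show that SubMix limits the leakage of information that is unique to any individual user in the private corpus via a relaxation of differentially private prediction. Importantly, SubMix admits a tight, data-dependent privacy accounting mechanism, which allows it to thwart existing data-extraction attacks while maintaining the utility of the language model. SubMix is the first protocol that maintains privacy even when publicly releasing thousands of next-token predictions made by large transformer-based models such as GPT-2.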
Named entity recognition (NER) models are widely used for identifying named entities (e.g., individuals, locations, and other information) in text documents. Machine-learning-based NER models are increasingly being applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information to redact text for data sharing. In this paper, we study the setting where NER models are available as a black-box service for identifying sensitive information in user documents and show that these models are vulnerable to membership inference on their training datasets. With updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack capitalizes on unintended memorization in the NER's underlying neural network, a phenomenon NNs are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data. We show that words within the training dataset follow different functional paths than words not previously seen, and that this produces measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications: in text redaction, for example, the sensitive words or phrases that are to be found and removed are at risk of being detected in the training dataset. Our experimental evaluation includes the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. This is exacerbated by results that show memorization with only a single phrase. We achieved 70% AUC in our first attack on a text redaction use-case. We also show overwhelming success in the timing attack with 99.23% AUC. Finally, we discuss potential mitigation approaches to realize the safe use of NER models in light of the privacy and security implications of membership inference attacks.
Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before.
We demonstrate that it is possible to train large recurrent language models with user-level differential privacy guarantees with only a negligible cost in predictive accuracy. Our work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent. In particular, we add user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data. Our work demonstrates that given a dataset with a sufficiently large number of users (a requirement easily met by even small internet-scale datasets), achieving differential privacy comes at the cost of increased computation, rather than in decreased utility as in most prior work. We find that our private LSTM language models are quantitatively and qualitatively similar to un-noised models when trained on a large dataset.
Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which language models regenerate training sequences is superlinearly related to a sequence's count in the training set. For instance, a sequence that is present 10 times in the training data is on average generated ~1000 times more often than a sequence that is present only once. We next show that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training sequences. Finally, we find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks. Taken together, our results motivate an increased focus on deduplication in privacy-sensitive applications and a reevaluation of the practicality of existing privacy attacks.
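As a rough illustration of the deduplication step mentioned in the abstract above, the following sketch removes exact duplicates at the document level after whitespace and case normalization. This is a deliberately simplified assumption: the function names `normalize`, `duplicate_counts`, and `deduplicate` are hypothetical, and the actual work may deduplicate at a finer (sequence-level) granularity with different matching rules.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse whitespace and lowercase so trivially different copies match.

    Assumption: this normalization rule is illustrative, not taken from the paper.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicate_counts(docs):
    """Count how many times each normalized document occurs in the corpus."""
    return Counter(normalize(d) for d in docs)


def deduplicate(docs):
    """Keep only the first occurrence of each normalized document."""
    seen = set()
    unique = []
    for doc in docs:
        key = normalize(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = ["call me at 555-0100", "Call me  at 555-0100", "hello world"]
    print(duplicate_counts(corpus))
    print(deduplicate(corpus))
```

Inspecting `duplicate_counts` before filtering shows which sequences are repeated most often, which, according to the abstract, are the ones most likely to be regenerated by a trained model.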
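Differential Privacy (DP) provides a formal privacy guarantee that prevents adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However, previous work has found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve 83.8% top-1 accuracy on ImageNet under (0.5, 8 \cdot 10^{-7})-DP. Additionally, we achieve 86.7% top-1 accuracy under (8, 8 \cdot 10^{-7})-DP, which is just 4.3% below the current non-private SOTA. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.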
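To make the noise-injection idea behind DP-SGD concrete, here is a minimal pure-Python sketch of a single update step: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The function name `dp_sgd_step` and its parameters are hypothetical, gradients are assumed to be given as plain lists of floats, and privacy accounting (tracking the resulting epsilon and delta) is not shown.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """Apply one DP-SGD style update and return the new parameters.

    Assumption: this is an illustrative sketch, not the implementation
    used in any of the papers summarized here.
    """
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_example_grads:
        # Clip each per-example gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale
    # Gaussian noise with standard deviation proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    # Ordinary gradient descent step on the noisy averaged gradient.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

Because the noise is added independently to every coordinate, its total norm grows with the number of parameters, which is the concern about large models that the abstract above sets out to re-examine.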
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.
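With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance, as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data and thereby improve model utility. To realize this new notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application: dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility than the baselines while remaining safe under various privacy attacks. The data and code are released at https://github.com/wyshi/lm_privacy to facilitate future research.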
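Machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples that appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convexity can prevent forgetting from happening in the worst case, standard image and speech models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets (for instance, those examples used to pre-train a model) may observe privacy benefits at the expense of examples seen later.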
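Chatbots are used in many applications, e.g., automated agents, smart home assistants, and interactive characters in online games. Therefore, it is crucial to ensure they do not behave in undesired manners, providing offensive or toxic responses to users. This is not a trivial task, as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots. We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries. Even more worryingly, some non-toxic queries can trigger toxic responses too. We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries that make chatbots respond in a toxic manner. Our extensive experimental evaluation demonstrates that our attack is effective against public chatbot models and outperforms manually crafted malicious queries proposed by previous work. We also evaluate three defense mechanisms against ToxicBuddy, showing that they either reduce the attack performance at the cost of affecting the chatbot's utility or are only effective at mitigating a portion of the attack. This highlights the need for more research from the computer security and online safety communities to ensure that chatbot models do not hurt their users. Overall, we are confident that ToxicBuddy can be used as an auditing tool and that our work will pave the way toward designing more effective defenses for chatbot safety.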
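Data leakage from public machine learning (ML) models is an area of growing importance, as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage that is natural to ML models, potential malevolent leakage caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage and the available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.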
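Recent work has demonstrated the successful extraction of training data from generative language models. However, it is not evident whether such extraction is feasible in text classification models, since the training objective is to predict the class label rather than the next word. This poses an interesting challenge and raises an important question regarding the privacy of training data in text classification settings. Therefore, we study the potential privacy leakage in the text classification domain by investigating the problem of unintended memorization of training data that is not pertinent to the learning task. We propose an algorithm to extract missing tokens of a partial text by exploiting the likelihood of the class label provided by the model. We test the effectiveness of our algorithm by inserting canaries into the training set and attempting to extract their tokens after training. In our experiments, we demonstrate that successful extraction is possible to some extent. This can also be used as an auditing strategy to assess any unauthorized use of personal data without consent.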
Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performances for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
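Recent large-scale natural language processing (NLP) systems use large language models (LLMs) pre-trained on massive and diverse corpora. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. This potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.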