Based on the tremendous success of pre-trained language models (PrLMs) for source code comprehension tasks, current literature studies either further improve the PrLMs' performance (generalization) or their robustness against adversarial attacks. However, they have to compromise on the trade-off between the two aspects, and none of them considers improving both sides in an effective and practical way. To fill this gap, we propose Semantic-Preserving Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving attacks while forcing the model to predict the correct labels under these worst cases. Experiments and analysis demonstrate that SPACE can stay robust against strong attacks while simultaneously improving the performance of PrLMs for code.
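The objective sketched in the abstract is the classic min-max formulation of adversarial training; a generic rendering in our own notation (not the paper's), with $\mathcal{S}(x)$ denoting the set of semantic-preserving perturbations of input $x$:

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\max_{\delta\in\mathcal{S}(x)} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big]
```

The inner maximization finds the worst-case semantic-preserving perturbation, while the outer minimization forces the model to predict the correct label under it.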
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, GraphCodeBERT, etc.) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and thus are potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples and demonstrates the vulnerabilities of the state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models, achieving the best overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.
Recent works have shown that explainability and robustness are two crucial ingredients of trustworthy and reliable text classification. However, previous works usually address only one of the two aspects: i) how to extract accurate rationales for explainability while remaining beneficial to prediction; ii) how to make the predictive model robust to different types of adversarial attacks. Intuitively, a model that produces helpful explanations should be more robust against adversarial attacks, because we cannot trust a model that outputs explanations yet changes its predictions under small perturbations. To this end, we propose a joint classification and rationale extraction model named AT-BMC. It includes two key mechanisms: mixed adversarial training (AT) is designed to use various perturbations in the discrete and embedding spaces to improve the model's robustness, and a boundary match constraint (BMC) helps to locate rationales more precisely with the guidance of boundary information. Performance on benchmark datasets demonstrates that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin. Robustness analysis shows that the proposed AT-BMC decreases the attack success rate by up to 69%. The empirical results indicate that there are connections between robust models and better explanations.
Robustness evaluation against adversarial examples has become increasingly important to unveil the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain where the first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, there lacks a principled first-order gradient-based robustness evaluation framework in NLP. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address the aforementioned challenges in a unified optimization framework. And we develop an effective convex relaxation method to co-optimize the continuously-relaxed site selection and perturbation variables and leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments are provided to demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
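The first-order primitive that a PGD-style attack generator builds on is an ascent step on the loss followed by projection back onto an $\ell_\infty$ ball. A minimal sketch on a continuously relaxed perturbation vector, using a quadratic toy loss; the loss, function names, and hyper-parameters are illustrative assumptions, not TextGrad's actual objective:

```python
# Toy loss: 0.5 * ||x - target||^2. The attack *maximizes* it,
# pushing the perturbed input away from the target.
def grad_toy_loss(x, target):
    # gradient of the toy loss w.r.t. x
    return [xi - ti for xi, ti in zip(x, target)]

def sign(v):
    return [(vi > 0) - (vi < 0) for vi in v]

def pgd_step(delta, grad, alpha, eps):
    """One ascent step, then projection onto the L-inf ball of radius eps."""
    stepped = [d + alpha * s for d, s in zip(delta, sign(grad))]
    return [max(-eps, min(eps, d)) for d in stepped]

def pgd_attack(x, target, alpha=0.1, eps=0.3, steps=10):
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = grad_toy_loss(x_adv, target)  # ascend: step along +grad
        delta = pgd_step(delta, grad, alpha, eps)
    return delta
```

In the NLP setting described above, the continuous variables additionally cover perturbation-site selection, and a sampling step maps them back to discrete tokens; the projection shown here is the vision-style core only.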
We attribute the vulnerability of natural language processing models to the fact that similar inputs are converted to dissimilar representations in the embedding space, leading to inconsistent outputs, and we propose a novel robust training method termed Fast Triplet Metric Learning (FTML). Specifically, we argue that the original sample should have similar representations to its adversarial counterparts, and its representation should be distinguished from those of other samples, for better robustness. To this end, we adopt triplet metric learning into standard training to pull words closer to their positive samples (i.e., synonyms) and push away their negative samples (i.e., non-synonyms) in the embedding space. Extensive experiments demonstrate that FTML can significantly promote model robustness against various advanced adversarial attacks while keeping competitive classification accuracy on original samples. Furthermore, our method is efficient as it only needs to adjust the embeddings and introduces very little overhead on top of standard training. Our work shows the great potential of improving textual robustness through robust word embeddings.
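The objective described above is the standard triplet margin loss applied to word embeddings: pull an anchor towards a synonym (positive) and away from a non-synonym (negative) by at least a margin. A minimal sketch, where the function names and margin value are illustrative assumptions:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): the loss is zero once the
    synonym is closer to the anchor than the non-synonym by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

During training, this term would be added to the standard task loss while only the embedding layer is adjusted, which is what keeps the overhead small.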
Despite great success on many machine learning tasks, deep neural networks are still vulnerable to adversarial samples. While gradient-based adversarial attack methods are well explored in the field of computer vision, it is impractical to apply them directly in natural language processing due to the discrete nature of text. To bridge this gap, we propose a general framework to adapt existing gradient-based methods to craft textual adversarial samples. In this framework, gradient-based continuous perturbations are added to the embedding layer and amplified in the forward propagation process. Then the final perturbed latent representations are decoded with a masked language model head to obtain potential adversarial samples. In this paper, we instantiate our framework with \textbf{T}extual \textbf{P}rojected \textbf{G}radient \textbf{D}escent (\textbf{TPGD}). We evaluate our framework by performing transfer black-box attacks on three benchmark datasets. Experimental results demonstrate that our method achieves better performance than strong baseline methods and produces more fluent and grammatical adversarial samples. All the code and data will be made public.
Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches to adversarial training in language models. Specifically, we study the effect of pre-training data augmentation as well as training-time input perturbations vs. embedding-space perturbations on the robustness and generalization of BERT-like language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input-space perturbation. However, training with embedding-space perturbation significantly improves generalization. A linguistic correlation analysis of the neurons of the learned models reveals that the improved generalization is due to `more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.
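An embedding-space perturbation step of the kind compared above can be sketched in a few lines: perturb the continuous input along the normalized loss gradient, then take the parameter update on the perturbed point (FGM-style). The tiny logistic-regression model, data, and hyper-parameters below are illustrative assumptions, not the paper's setup:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_wrt_x(w, x, y):
    # d/dx of binary cross-entropy for p = sigmoid(w . x)
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * wi for wi in w]

def loss_grad_wrt_w(w, x, y):
    # d/dw of the same loss
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def adversarial_step(w, x, y, eps=0.1, lr=0.5):
    """One FGM-style training step: perturb x adversarially, then update w
    on the perturbed point."""
    g = loss_grad_wrt_x(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    x_adv = [xi + eps * gi / norm for xi, gi in zip(x, g)]
    gw = loss_grad_wrt_w(w, x_adv, y)
    return [wi - lr * gi for wi, gi in zip(w, gw)]
```

In a real language model, `x` would be the token-embedding output rather than the raw input, and the perturbation is discarded after each step rather than accumulated.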
Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only. Concretely, given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, generating adversarial examples at every training step and thereby adapting to the changing nature of the model. We propose (i) a new discrete attack based on best-first search, and (ii) random sampling attacks that, unlike prior work, are not based on expensive search procedures. Surprisingly, we find that random sampling leads to impressive gains in robustness, outperforming the commonly-used offline augmentation, while leading to a speedup of ~10x in training time. Furthermore, online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets. Finally, we show that our new attack substantially improves robustness compared to prior methods.
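The search-free random-sampling attack contrasted above can be sketched as a per-step perturbation that independently swaps each token for a random synonym with some probability. The toy synonym table, function name, and swap probability are illustrative assumptions:

```python
import random

def random_sample_attack(tokens, synonyms, swap_prob=0.3, rng=None):
    """Return a perturbed copy of `tokens`. No search over candidates is
    involved, which is what makes online (per-step) use cheap."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        if tok in synonyms and rng.random() < swap_prob:
            out.append(rng.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out
```

In an online augmentation loop, this would be called on each mini-batch before the forward pass, so the perturbations track the current model rather than a frozen snapshot.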
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used for privacy protection. Based on a precise description of the goals and applications of data augmentation and a taxonomy of existing works, this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners. We divide more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a basis for future work are provided.
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary `views' of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further improve robustness and accuracy of the pre-trained code model. Built on the above two modules, we develop CLAWSAT, a novel self-supervised learning (SSL) framework for code by integrating $\underline{\textrm{CL}}$ with $\underline{\textrm{a}}$dversarial vie$\underline{\textrm{w}}$s (CLAW) with $\underline{\textrm{s}}$taggered $\underline{\textrm{a}}$dversarial $\underline{\textrm{t}}$raining (SAT). On evaluating three downstream tasks across Python and Java, we show that CLAWSAT consistently yields the best robustness and accuracy ($\textit{e.g.}$ 11$\%$ in robustness and 6$\%$ in accuracy on the code summarization task in Python). We additionally demonstrate the effectiveness of adversarial learning in CLAW by analyzing the characteristics of the loss landscape and interpretability of the pre-trained models.
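The robustness-promoting views described above plug into a standard contrastive (InfoNCE-style) objective, in which the adversarial transformation of a code sample serves as the positive 'view' of its clean version and other samples in the batch act as negatives. A minimal sketch; the cosine similarity, temperature, and names are assumptions, not the paper's exact formulation:

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s(a,p)/t) / (exp(s(a,p)/t) + sum_n exp(s(a,n)/t)) )
    Small when the anchor is close to its (adversarial) positive view
    and far from the negatives."""
    pos = math.exp(cos(anchor, positive) / temperature)
    neg = sum(math.exp(cos(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Minimizing this loss during pre-training pulls the embeddings of a code sample and its adversarial view together, which is the mechanism by which the adversarial view promotes robustness.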
With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training from scratch, how to effectively adapt pre-trained models to a new task has not been fully explored. In this paper, we propose an approach to bridge pre-trained models and code-related tasks. We exploit semantic-preserving transformations to enrich downstream data diversity, and help pre-trained models learn semantic features that are invariant to these semantically equivalent transformations. Further, we introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models. We apply our approach to a range of pre-trained models, and they significantly outperform the state-of-the-art models on tasks for source code understanding, such as algorithm classification, code clone detection, and code search. Our experiments even show that without heavy pre-training on code data, the natural-language pre-trained model RoBERTa fine-tuned with our lightweight approach can outperform or rival existing code pre-trained models, such as CodeBERT and GraphCodeBERT, fine-tuned on the above tasks. This finding suggests that there is still much room for improvement in code pre-trained models.
Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making them significantly more challenging for current models to make correct predictions. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models, in which adversarial texts tend to have their embeddings diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is available at \url{https://github.com/dangne/tmd}.
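The projection step at the heart of the defense above can be pictured with a very small stand-in: map a (possibly adversarial) embedding to the closest point on an approximation of the natural-embedding manifold before classification. TMD itself approximates the manifold with a learned generative model; the nearest-neighbor lookup over stored natural embeddings below is only a conceptual substitute:

```python
import math

def project_to_manifold(embedding, natural_embeddings):
    """Return the stored natural embedding nearest to `embedding`.
    Adversarial embeddings that drifted off the manifold get snapped
    back onto it before the classifier sees them."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(natural_embeddings, key=lambda nat: dist(embedding, nat))
```

Because the classifier only ever sees projected points, off-manifold perturbations lose most of their effect, while clean inputs, which already lie near the manifold, are barely moved.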
Large-scale pre-trained language models have achieved tremendous success across a wide range of natural language understanding (NLU) tasks, even surpassing human performance. However, recent studies reveal that the robustness of these models can be challenged by carefully crafted textual adversarial examples. While several individual datasets have been proposed to evaluate model robustness, a principled and comprehensive benchmark is still missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. Our findings are summarized as follows. (i) Most existing adversarial attack algorithms are prone to generating invalid or ambiguous adversarial examples, with around 90% of them either changing the original semantic meaning or misleading human annotators as well. Therefore, we perform a careful filtering process to curate a high-quality benchmark. (ii) All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy. We hope our work will motivate the development of new adversarial attacks that are stealthier and more semantic-preserving, as well as new robust language models against sophisticated adversarial attacks. AdvGLUE is available at https://adversarialglue.github.io.
The robustness of Text-to-SQL parsers against adversarial perturbations plays a crucial role in delivering highly reliable applications. Previous studies along this line primarily focused on perturbations in the natural language question side, neglecting the variability of tables. Motivated by this, we propose the Adversarial Table Perturbation (ATP) as a new attacking paradigm to measure the robustness of Text-to-SQL models. Following this proposition, we curate ADVETA, the first robustness evaluation benchmark featuring natural and realistic ATPs. All tested state-of-the-art models experience dramatic performance drops on ADVETA, revealing models' vulnerability in real-world practices. To defend against ATP, we build a systematic adversarial training example generation framework tailored for better contextualization of tabular data. Experiments show that our approach not only brings the best robustness improvement against table-side perturbations but also substantially empowers models against NL-side perturbations. We release our benchmark and code at: https://github.com/microsoft/ContextualSP.
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones by using the information bottleneck theory. Through extensive experiments, we show that the models trained with our information-bottleneck-based method are able to achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
Recent studies show that pre-trained language models (LMs) are vulnerable to textual adversarial attacks. However, existing attack methods either suffer from low attack success rates or fail to search efficiently in the exponentially large perturbation space. We propose an efficient and effective framework, SemAttack, to generate natural adversarial text by constructing different semantic perturbation functions. In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including the typo space, the knowledge space (e.g., WordNet), the contextualized semantic space (e.g., the embedding space of BERT clusterings), or combinations of these spaces. Thus, the generated adversarial texts are semantically closer to the original inputs. Extensive experiments reveal that state-of-the-art (SOTA) large-scale LMs (e.g., DeBERTa-v2) and defense strategies (e.g., FreeLB) are still vulnerable to SemAttack. We further demonstrate that SemAttack is general and able to generate natural adversarial texts for different languages (e.g., English and Chinese) with high attack success rates. Human evaluations also confirm that our generated adversarial texts are natural and barely affect human performance. Our code is publicly available at https://github.com/ai-secure/semattack.
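The simplest of the semantic spaces mentioned above, the typo space, can be sketched as the set of single-edit variants of a word; here adjacent-character swaps stand in for the full edit set, and the helper name is an illustrative assumption (SemAttack optimizes over such candidate sets rather than enumerating them this naively):

```python
def typo_space(word):
    """All variants of `word` with one pair of adjacent characters
    swapped, as a candidate typo-perturbation set."""
    candidates = set()
    for i in range(len(word) - 1):
        chars = list(word)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        candidates.add("".join(chars))
    candidates.discard(word)  # drop no-op swaps (e.g. doubled letters)
    return sorted(candidates)
```

The knowledge and contextualized spaces would be built the same way, with the candidate set coming from WordNet synonyms or nearest neighbors in an LM's embedding space instead of character edits.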
Recent natural language processing (NLP) techniques have achieved high performance on benchmark datasets, primarily due to significant improvements in the performance of deep learning. The advances in the research community have led to great enhancements in state-of-the-art production systems for NLP tasks, such as virtual assistants, speech recognition, and sentiment analysis. However, such NLP systems still often fail when tested with adversarial attacks. The initial lack of robustness exposed troubling gaps in current models' language understanding capabilities, creating problems when NLP systems are deployed in real life. In this paper, we present a structured overview of NLP robustness research by summarizing the literature in a systematic way across various dimensions. We then take a deep dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks. Finally, we argue that robustness should be multi-dimensional, provide insights into current research, and identify gaps in the literature to suggest directions worth pursuing to address them.
Natural language video localization (NLVL) is an important task in the vision-language understanding area, which requires a deep understanding not only of the computer vision and natural language sides separately, but, more importantly, of the interplay between the two sides. Adversarial vulnerability is well recognized as a critical security issue of deep neural network models, which requires prudent investigation. Despite extensive yet separate studies in video and language tasks, the current understanding of adversarial robustness in vision-language joint tasks like NLVL is less developed. This paper therefore aims to comprehensively investigate the adversarial robustness of NLVL models by examining three facets of vulnerability from both the attack and defense aspects. To achieve the attack goal, we propose a new adversarial attack paradigm named synonymous-sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides.
Although existing machine reading comprehension models are making rapid progress on many datasets, they are far from robust. In this paper, we propose an understanding-oriented machine reading comprehension model to address three kinds of robustness issues: over-sensitivity, over-stability, and generalization. Specifically, we first use a natural language inference module to help the model understand the accurate semantic meaning of input questions so as to address the issues of over-sensitivity and over-stability. Then, in the machine reading comprehension module, we propose a memory-guided multi-head attention method that can further understand the semantic meanings of input questions and passages. Third, we propose a multi-language learning mechanism to address the issue of generalization. Finally, these modules are integrated with a multi-task-learning-based method. We evaluate our model on three benchmark datasets that are designed to measure models' robustness, including DuReader (robust) and two SQuAD-related datasets. Extensive experiments show that our model can well address the three robustness issues mentioned above. It achieves much better results than the compared state-of-the-art models on all these datasets, even under some extreme and unfair evaluations. The source code of our work is available at: https://github.com/neukg/robustmrc.
In this work, we study unsupervised domain adaptation (UDA) in a challenging self-supervised approach. One of the difficulties is how to learn task discrimination in the absence of target labels. Unlike previous literature, which directly aligns cross-domain distributions or leverages reverse gradients, we propose Domain Confused Contrastive Learning (DCCL) to bridge the source and target domains via domain puzzles, while retaining discriminative representations after adaptation. Technically, DCCL searches for the most domain-challenging direction and exquisitely crafts domain-confused augmentations as positive pairs; it then contrastively encourages the model to pull representations towards the other domain, thus learning more stable and effective domain-invariant representations. We also investigate whether contrastive learning necessarily helps UDA when performing other data augmentations. Extensive experiments demonstrate that DCCL significantly outperforms baselines.