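Grammatical error correction (GEC) systems perform a sequence-to-sequence task: the system takes an input word sequence containing grammatical errors and outputs a grammatically correct word sequence. With the advent of deep learning methods, automated GEC systems have become increasingly popular. For example, GEC systems are often applied to speech transcriptions of English learners as a form of assessment and feedback; these powerful systems can be used to automatically measure one aspect of a candidate's fluency. The count of \textit{edits} from a candidate's input sentence (or essay) to the GEC system's grammatically corrected output sentence is indicative of the candidate's language ability, with fewer edits suggesting better fluency. The edit count can therefore be viewed as a \textit{fluency score}, with zero implying perfect fluency. However, although deep learning based GEC systems are extremely powerful and accurate, they are susceptible to adversarial attacks: an adversary can introduce a small, specific change at the input of a system that causes a large, undesired change at the output. When GEC systems are applied to automated language assessment, the aim of an adversary could be to cheat by making a small change to a grammatically incorrect input sentence that conceals the errors from the GEC system, so that the candidate is unjustly awarded a perfect fluency score. This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for assessment.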
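The edit-count fluency score described above can be illustrated with a minimal sketch. It assumes a word-level alignment from Python's `difflib` as a stand-in for whatever edit extraction the assessed GEC pipeline actually uses; the function name `count_edits` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def count_edits(source: str, corrected: str) -> int:
    """Count word-level edit operations between a sentence and its correction."""
    src_words = source.split()
    cor_words = corrected.split()
    matcher = SequenceMatcher(a=src_words, b=cor_words, autojunk=False)
    # Each non-"equal" opcode (replace, delete, insert) is counted as one edit.
    # Assumption: one contiguous changed span equals one edit.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


# Hypothetical learner sentence and a hypothetical GEC output.
learner = "She go to school yesterday"
corrected = "She went to school yesterday"

fluency_score = count_edits(learner, corrected)  # lower means more fluent
perfect_score = count_edits(corrected, corrected)  # no edits needed
```

Under this sketch, an attack of the kind studied would aim to make the GEC output equal to the (still erroneous) input, which drives the score to zero.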
Machine learning algorithms are often vulnerable to adversarial examples that differ imperceptibly from their original counterparts but can fool state-of-the-art models. Exposing such maliciously crafted adversarial examples helps to evaluate, and even improve, the robustness of these models. In this paper, we present TEXTFOOLER, a simple but strong baseline for generating adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models: the powerful pre-trained BERT and the widely used convolutional and recurrent neural networks. We demonstrate three advantages of this framework: (1) effective: it outperforms previous attacks in success rate and perturbation rate; (2) utility-preserving: it preserves semantic content, grammaticality, and the correct class as judged by humans; and (3) efficient: it generates adversarial text with computational complexity linear in the text length.
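Data augmentation is an important component in evaluating the robustness of natural language processing (NLP) models and in increasing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (\url{https://github.com/gem-benchmark/nl-augmenter}).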
The extent to which men and women use language differently has been questioned in previous work. In general, research has not found clear and consistent gender differences in language, and the findings are heavily influenced by the context and the method used to identify the differences. In addition, most of this research has examined written language, with samples collected in writing. We therefore compared the word choices of male and female presenters in public addresses such as TED lectures. The frequencies of numerous types of words, such as parts of speech (POS) and linguistic, psychological, and cognitive terms, were analyzed statistically to determine how male and female speakers use words differently. Based on our data, we found that male speakers use specific types of linguistic, psychological, cognitive, and social words considerably more frequently than female speakers.
In this paper, we propose dictionary attacks against speaker verification, a novel attack vector that aims to match a large fraction of the speaker population by chance. We introduce a generic formulation of the attack that can be used with various speech representations and threat models. The attacker uses adversarial optimization to maximize the raw similarity of speaker embeddings between a seed speech sample and a proxy population. The resulting master voice successfully matches a non-trivial fraction of people in an unknown population. Adversarial waveforms obtained with our approach can match on average 69% of females and 38% of males enrolled in the target system at a strict decision threshold calibrated to yield a false alarm rate of 1%. By using the attack with a black-box voice cloning system, we obtain master voices that are effective in the most challenging conditions and transferable between speaker encoders. We also show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
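The notion of a voice "matching" a fraction of enrolled speakers at a decision threshold can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity stands in for the unspecified "raw similarity", the two-dimensional embeddings are toy values, and `match_rate` is a hypothetical helper, not part of any speaker verification library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_rate(candidate, population, threshold):
    """Fraction of enrolled embeddings whose similarity to the candidate
    reaches the decision threshold."""
    hits = sum(1 for emb in population if cosine(candidate, emb) >= threshold)
    return hits / len(population)


# Toy enrolled population (hypothetical 2-D speaker embeddings).
population = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
candidate = [1.0, 1.0]  # hypothetical "master voice" embedding

loose = match_rate(candidate, population, threshold=0.7)
strict = match_rate(candidate, population, threshold=0.95)
```

In a real system the threshold would be calibrated on held-out trials to give the target false alarm rate; the attack then searches for an input whose embedding maximizes this match rate on a proxy population.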
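Part-of-speech (POS) tagging plays an important role in natural language processing (NLP). Its applications can be found in many NLP tasks, such as named entity recognition, syntactic parsing, dependency parsing, and text chunking. In the investigation conducted in this paper, we use the techniques of two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, and then compare them with three well-known Vietnamese taggers, including vnTagger and RDRPOSTagger. We make a systematic comparison to find the tagger with the best performance. We also design a new feature set to measure the performance of the statistical taggers. Our new tagger built from Stanford Tagger and the new feature set outperforms all other current Vietnamese taggers in tagging accuracy. Moreover, we analyze the effect of certain features on the performance of the statistical taggers. Finally, the experimental results also show that the transformation-based tagger, RDRPOSTagger, runs significantly faster than any of the statistical taggers.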
In this paper, we investigated two different methods to parse relative and noun complement clauses in English, using distinct tags for "that" as a relative pronoun and as a complementizer. In the first experiment, we used an algorithm to relabel a corpus parsed with the GUM Treebank using Universal Dependencies. Our second experiment consisted of using TreeTagger, a probabilistic decision tree tagger, to learn the distinction between the complement and relative uses of postnominal "that". We investigated the effect of the training set size on TreeTagger accuracy and how representative the GUM Treebank files are of the two structures under scrutiny. We also discussed some of the linguistic and structural tenets of the learnability of this distinction.
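While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input often makes models fragile and unstable. Generating adversarial examples as augmented data has proved useful in alleviating this problem. Existing methods for adversarial example generation (AEG) operate at the word level or the character level. In this paper, we propose a phrase-level adversarial example generation (PAEG) method to enhance model robustness. Our method uses a gradient-based strategy to substitute phrases at vulnerable positions in the source input. We verify our method on three benchmarks: the LDC Chinese-English, IWSLT14 German, and WMT14 English-German tasks. Experimental results show that our approach significantly improves performance compared to previous methods.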
Part-of-speech (POS) tagging is crucial to natural language processing (NLP). It is a well-studied topic in several resource-rich languages. However, for numerous languages with rich histories and literatures, the development of computational linguistic resources is still in its infancy. Assamese, an Indian scheduled language spoken by more than 25 million people, falls under this category. In this paper, we present a deep learning (DL)-based POS tagger for Assamese. The development process is divided into two phases. In the first phase, several pre-trained word embeddings are employed to train several tagging models. This allows us to evaluate the performance of the word embeddings in the POS tagging task. The top-performing model from the first phase is then used to annotate a new set of sentences. In the second phase, the model is trained further using this fresh dataset. Finally, we attain an F1 score of 86.52%. The model may serve as a baseline for further study on DL-based Assamese POS tagging.
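This paper proposes a method for automatically creating a large number of new bilingual dictionaries for low-resource languages, especially resource-poor ones, from a single input bilingual dictionary. Our algorithm produces translations of words in a source language into many target languages using available WordNets and a machine translator (MT). Since our approach relies on just one input dictionary, available WordNets, and an MT, it is applicable to any bilingual dictionary, as long as one of the two languages is English or has a WordNet linked to the Princeton WordNet. Starting from 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 language pairs are not supported by the popular MTs, Google and Bing.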
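The core idea of deriving a new dictionary from a single input dictionary can be sketched as pivoting through English. This is a heavily simplified illustration, not the paper's algorithm: the abstract does not describe how WordNet senses or the machine translator are combined, so the plain English-to-target lookup table, the function name `pivot_dictionary`, and the toy entries are all assumptions.

```python
def pivot_dictionary(src_to_en, en_to_tgt):
    """Compose a source-to-English dictionary with an English-to-target
    lexicon to obtain candidate source-to-target translations."""
    new_dict = {}
    for src_word, en_words in src_to_en.items():
        candidates = []
        for en_word in en_words:
            for tgt_word in en_to_tgt.get(en_word, []):
                if tgt_word not in candidates:  # keep order, drop duplicates
                    candidates.append(tgt_word)
        if candidates:
            new_dict[src_word] = candidates
    return new_dict


# Toy input dictionary (Vietnamese to English) and a hypothetical
# English-to-French lexicon standing in for WordNet or MT lookups.
vi_to_en = {"nước": ["water", "country"], "sách": ["book"]}
en_to_fr = {"water": ["eau"], "country": ["pays"], "book": ["livre"]}

vi_to_fr = pivot_dictionary(vi_to_en, en_to_fr)
```

Naive pivoting like this carries over every sense of an ambiguous English word, which is presumably why sense-linked resources such as WordNets are used to constrain the candidates.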
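Manually constructing a WordNet is a difficult task that requires years of expert time. As a first step toward automatically constructing full WordNets, we propose approaches for generating WordNet synsets for both resource-rich and resource-poor languages, using publicly available WordNets, a machine translator, and/or a single bilingual dictionary. Our algorithms translate the synsets of existing WordNets into a target language T, then apply a ranking method to the translation candidates to find the best translations in T. Our approaches are applicable to any language that has at least one existing bilingual dictionary translating from English into it.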
Chronic pain is a multi-dimensional experience in which pain intensity plays an important part, impacting the patient's emotional balance, psychology, and behaviour. Standard self-reporting tools, such as the Visual Analogue Scale for pain, fail to capture this burden. Moreover, tools of this type are susceptible to a degree of subjectivity: they depend on the patient's clear understanding of how to use them, on social biases, and on the patient's ability to translate a complex experience to a scale. To overcome these and other self-reporting challenges, pain intensity estimation has previously been studied based on facial expressions, electroencephalograms, brain imaging, and autonomic features. However, to the best of our knowledge, no attempt has been made to base this estimation on patients' narratives of their personal experience of chronic pain, which is what we propose in this work. Indeed, in the clinical assessment and management of chronic pain, verbal communication is essential for conveying information to physicians that would otherwise not be easily accessible through standard reporting tools, since language, sociocultural, and psychosocial variables are intertwined. We show that language features from patient narratives do convey information relevant to pain intensity estimation, and that our computational models can take advantage of it. Specifically, our results show that patients with mild pain focus more on the use of verbs, whilst patients with moderate and severe pain focus on adverbs, and on nouns and adjectives, respectively, and that these differences allow the three pain classes to be distinguished.
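We apply natural language processing techniques to analyse 377808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and on the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which used samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in the Billboard charts. Songs also contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.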
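Measuring associations in word embeddings can be illustrated with a small sketch. It assumes a common formulation (mean cosine similarity to a set of male terms minus mean cosine similarity to a set of female terms); the abstract does not state which association measure was used, and the vectors and the helper `gender_association` below are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gender_association(word_vec, male_vecs, female_vecs):
    """Positive values: the word sits closer to the male terms;
    negative values: closer to the female terms."""
    male = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    female = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    return male - female


# Toy 2-D embeddings (hypothetical; real ones would be learned on lyrics).
male_terms = [[1.0, 0.0]]
female_terms = [[0.0, 1.0]]
word_a = [0.8, 0.6]
word_b = [0.6, 0.8]

score_a = gender_association(word_a, male_terms, female_terms)
score_b = gender_association(word_b, male_terms, female_terms)
```

Comparing such scores across embeddings trained on different subsets of lyrics (for example, by performer gender) is one way the biases described above could be contrasted.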
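State-of-the-art text classification models increasingly rely on deep neural networks (DNNs). Because of their black-box nature, faithful and robust explanation methods need to accompany classifiers deployed in real-life scenarios. However, it has been shown in vision applications that explanation methods are susceptible to local, imperceptible perturbations that can significantly alter the explanations without changing the predicted classes. We show here that the existence of such perturbations extends to text classifiers as well. Specifically, we introduce TEF, a novel explanation attack algorithm that alters text input samples imperceptibly so that the outcome of widely used explanation methods changes considerably while the classifier's predictions remain unchanged. We evaluate the attribution robustness estimation performance of TEF on five sequence classification datasets, using three DNN architectures and three transformer architectures for each dataset. TEF significantly decreases the correlation between the attributions of unchanged and perturbed inputs, which shows that all models and explanation methods are susceptible to TEF perturbations. Moreover, we evaluate how the perturbations transfer to other model architectures and attribution methods, and show that TEF perturbations are also effective when the target model and explanation method are unknown. Finally, we introduce a semi-universal attack that can compute fast, computationally light perturbations without any knowledge of the attacked classifier or explanation method. Overall, our work shows that explanations in text classifiers are very fragile, and that users need to carefully address their robustness before relying on them in critical applications.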
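The correlation between attributions of unchanged and perturbed inputs can be sketched with a rank correlation. The abstract does not say which correlation is used, so Spearman's rank correlation is an assumption here, and the attribution scores are toy values rather than outputs of any real explanation method.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-token attribution scores before and after a perturbation
# that leaves the predicted class unchanged.
original_attr = [0.9, 0.1, 0.5, 0.3]
perturbed_attr = [0.1, 0.9, 0.3, 0.5]

rho = spearman(original_attr, perturbed_attr)
```

A value near 1 would mean the explanation ranks tokens the same way after the perturbation; a successful explanation attack pushes this value down while the prediction stays fixed.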
