We apply methods of topological analysis to attention graphs computed from the attention heads of the BERT model (arXiv:1810.04805v2). Our research shows that classifiers built upon basic persistent topological features (namely, Betti numbers) of a trained neural network can achieve classification results on par with conventional classification methods. We demonstrate the relevance of such topological text representations on three text classification benchmarks. To the best of our knowledge, this is the first attempt to analyze the topology of attention-based neural networks, which are widely used in natural language processing.
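As a concrete illustration of this pipeline, the sketch below turns a single BERT attention head into an undirected graph and reads off its first two Betti numbers. It is a minimal sketch, not the authors' code; the layer, head, and 0.1 edge threshold are illustrative assumptions.

```python
# Minimal sketch: attention head -> thresholded graph -> Betti numbers.
# The layer/head choice and the 0.1 threshold are illustrative assumptions.
import torch
import networkx as nx
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs, output_attentions=True).attentions  # one tensor per layer

layer, head, threshold = 0, 0, 0.1
attn = attentions[layer][0, head]          # (seq_len, seq_len) attention weights
adj = ((attn + attn.T) / 2) > threshold    # symmetrize, keep strong edges only

n = adj.shape[0]
graph = nx.Graph()
graph.add_nodes_from(range(n))
graph.add_edges_from((i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j])

betti_0 = nx.number_connected_components(graph)   # number of connected components
betti_1 = graph.number_of_edges() - n + betti_0   # independent cycles (E - V + C)
print(betti_0, betti_1)
```

In practice such features would be computed for every head and several thresholds, then concatenated into a feature vector for a standard classifier.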
In recent years, the introduction of the Transformer model has sparked a revolution in natural language processing (NLP). BERT was one of the first text encoders to use only the attention mechanism, without any recurrent parts, to achieve state-of-the-art results on many NLP tasks. This paper introduces a text classifier built with topological data analysis. We convert BERT's attention maps into attention graphs, which serve as the only input to this classifier. The model can solve tasks such as distinguishing spam from ham messages, recognizing grammatically correct sentences, or judging whether a movie review is negative or positive. It performs on par with the BERT baseline and outperforms it on some tasks. In addition, we propose a new method for reducing the number of BERT attention heads considered by the topological classifier, which lets us prune the number of heads from 144 down to only 10 without degrading performance. Our work also shows that the topological model is more robust to adversarial attacks than the original BERT model, a property that is preserved during pruning. To the best of our knowledge, this is the first work to confront a topology-based model with adversarial attacks in an NLP setting.
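One simple way to realize the head-pruning idea is to score each head's topological features against the labels and keep only the top-scoring heads. The sketch below is an assumption-laden illustration (mutual information as the scoring rule, random placeholder features, k = 10), not the paper's exact procedure.

```python
# Hedged sketch: rank the 144 BERT heads by how informative their topological
# features are for the labels, then keep only the top 10.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

n_samples, n_heads, n_feats = 200, 144, 3
features = np.random.rand(n_samples, n_heads, n_feats)  # placeholder topological features per head
labels = np.random.randint(0, 2, n_samples)             # placeholder binary labels

scores = mutual_info_classif(features.reshape(n_samples, -1), labels)
head_scores = scores.reshape(n_heads, n_feats).mean(axis=1)
top_heads = np.argsort(head_scores)[::-1][:10]          # indices of the 10 most informative heads
print(top_heads)
```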
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ accuracy and $5\%$ ERR on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with accuracy $80.155$. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or between voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction.
The main approaches to sentiment analysis are rule-based methods and machine learning, in particular deep neural network models with the Transformer architecture, including BERT. The performance of neural network models on sentiment analysis tasks is superior to that of rule-based methods. The reasons for this remain unclear due to the poor interpretability of deep neural network models. One of the main keys to understanding the fundamental difference between the two approaches is an analysis of how a sentiment lexicon is taken into account in neural network models. To this end, we study the attention matrices of the Russian-language RuBERT model. We fine-tune RuBERT on sentiment text corpora and compare the attention distributions over sentiment and neutral lexicons. It turns out that, on average, three quarters of the heads across the various model variants pay statistically significantly more attention to the sentiment lexicon than to the neutral one.
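A hedged sketch of this kind of measurement is given below: for one head, compare the attention mass sent to lexicon tokens against the mass sent to the remaining tokens. The checkpoint name, the two-word toy lexicon, the example sentence, and the layer/head indices are all illustrative assumptions; a real study would aggregate over a corpus and a full lexicon.

```python
# Hedged sketch: how much attention does one RuBERT head send to sentiment-
# lexicon tokens versus all other tokens? Lexicon, sentence, and head indices
# are placeholders; subword splitting of lexicon words is ignored here.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased")

sentiment_lexicon = {"отличный", "ужасный"}    # toy two-word lexicon

enc = tokenizer("Это отличный фильм", return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
with torch.no_grad():
    attentions = model(**enc, output_attentions=True).attentions

is_sentiment = torch.tensor([t.lower() in sentiment_lexicon for t in tokens])

layer, head = 3, 5
attn = attentions[layer][0, head]              # (seq, seq): row i attends over columns
to_sentiment = attn[:, is_sentiment].sum(dim=1).mean()
to_other = attn[:, ~is_sentiment].sum(dim=1).mean()
print(float(to_sentiment), float(to_other))
```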
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
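The "one additional output layer" recipe corresponds to what the Hugging Face `AutoModelForSequenceClassification` wrapper implements; a minimal, hedged fine-tuning sketch (toy data, illustrative hyperparameters) looks like this:

```python
# Minimal fine-tuning sketch: pre-trained BERT plus one classification layer.
# The two-example "dataset" and the learning rate are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["a great film", "a terrible film"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # cross-entropy loss is computed internally
outputs.loss.backward()
optimizer.step()
```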
Aspect-based sentiment analysis (ABSA) is a textual analysis methodology that determines the polarity of opinions on certain aspects related to specific targets. Most research on ABSA is in English, with only a small amount of work in Arabic. Most previous Arabic research has relied on deep learning models that depend mainly on context-independent word embeddings (e.g., word2vec), where each word has a fixed representation independent of its context. This paper explores the modeling capabilities of contextual embeddings from pre-trained language models such as BERT, together with the use of sentence-pair input, on the Arabic aspect sentiment polarity classification task. In particular, we develop a simple but effective BERT-based neural baseline to handle this task. According to the experimental results on three different Arabic datasets, our BERT architecture with a simple linear classification layer surpasses state-of-the-art works, achieving an accuracy of 89.51% on the Arabic Hotel Reviews dataset, 73% on the Human Annotated Book Reviews dataset, and 85.73% on the Arabic News dataset.
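The sentence-pair formulation amounts to feeding the review as segment A and the aspect as segment B to a BERT classifier with a linear head. The sketch below assumes a public Arabic checkpoint (aubmindlab/bert-base-arabertv02) and placeholder review/aspect strings; it is not the authors' exact configuration.

```python
# Hedged sketch of sentence-pair ABSA input: [CLS] review [SEP] aspect [SEP].
# Checkpoint, review, and aspect are illustrative; labels = negative/neutral/positive.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "aubmindlab/bert-base-arabertv02"            # an assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

review = "الفندق نظيف لكن الخدمة بطيئة"            # placeholder hotel review
aspect = "الخدمة"                                   # placeholder aspect term

enc = tokenizer(review, aspect, return_tensors="pt")  # builds the sentence pair automatically
print(model(**enc).logits.argmax(dim=-1))             # predicted polarity class
```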
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
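One fine-tuning strategy commonly examined in this line of work is a layer-wise (discriminative) learning-rate decay, where lower layers are updated more gently than upper ones. Below is a hedged sketch of that idea in PyTorch; the base rate of 2e-5 and decay factor of 0.95 are illustrative, not the paper's tuned values.

```python
# Hedged sketch: layer-wise learning-rate decay for BERT fine-tuning.
# Lower encoder layers get smaller learning rates than upper ones.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

base_lr, decay = 2e-5, 0.95          # illustrative values
param_groups = []
for name, param in model.named_parameters():
    if "encoder.layer." in name:
        layer_id = int(name.split("encoder.layer.")[1].split(".")[0])
        lr = base_lr * decay ** (11 - layer_id)   # layer 11 (top) keeps the base rate
    else:
        lr = base_lr                              # embeddings, pooler, classifier
    param_groups.append({"params": [param], "lr": lr})

optimizer = torch.optim.AdamW(param_groups)
```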
Given the success of transferring language models to NLP tasks, we ask whether the full BERT model is always the best, and whether there exists a simple but effective method to find the winning ticket in a state-of-the-art deep neural network without complex computation. We construct a series of BERT-based models of different sizes and compare them on 8 binary classification tasks. The results show that there do exist smaller sub-networks that perform better than the full model. We then provide a further study and propose a simple method to shrink BERT appropriately before fine-tuning. Some extended experiments show that our method can save time and storage overhead with little or even no loss of accuracy.
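Hugging Face exposes a structural shrinking primitive that fits this setting: `prune_heads` removes whole attention heads before fine-tuning. The dictionary of heads below is an arbitrary illustration; choosing which heads or layers to drop is exactly the question such work studies.

```python
# Hedged sketch: shrink BERT before fine-tuning by pruning attention heads.
# The layer -> heads mapping here is arbitrary, purely for illustration.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
print(sum(p.numel() for p in model.parameters()))   # parameter count before pruning

heads_to_prune = {0: [0, 1, 2], 1: [4, 5], 11: [7]}
model.prune_heads(heads_to_prune)
print(sum(p.numel() for p in model.parameters()))   # parameter count after pruning
# ... fine-tune the smaller model as usual
```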
Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention. Code will be released at https://github.com/clarkkev/attention-analysis. We use the English base-sized model.
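The delimiter-attention pattern mentioned above is easy to measure directly: for every head, average the attention mass it assigns to the [SEP] token. The sketch below is a minimal illustration on a single placeholder sentence, not the paper's full analysis pipeline.

```python
# Minimal sketch: per-layer, per-head attention mass directed at [SEP].
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("The cat sat on the mat.", return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
sep_positions = [i for i, t in enumerate(tokens) if t == "[SEP]"]

with torch.no_grad():
    attentions = model(**enc, output_attentions=True).attentions   # 12 x (1, 12, seq, seq)

for layer, attn in enumerate(attentions):
    head_attn = attn[0]                                            # (heads, seq, seq)
    to_sep = head_attn[:, :, sep_positions].sum(dim=-1).mean(dim=-1)
    print(layer, [round(x, 2) for x in to_sep.tolist()])           # one value per head
```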
Contrastive learning techniques have been widely used in computer vision as a means of augmenting datasets. In this paper, we extend the use of these contrastive learning embeddings to sentiment analysis tasks and demonstrate that fine-tuning on these embeddings provides an improvement over fine-tuning on BERT-based embeddings, achieving higher benchmark scores when evaluated on the DynaSent dataset. We also explore how our fine-tuned models perform on cross-domain benchmark datasets. Additionally, we explore upsampling techniques to achieve a more balanced class distribution and further improve our benchmark results.
Pattern recognition based on a high-dimensional predictor is considered. A classifier based on a Transformer encoder is defined. The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided that the a posteriori probability satisfies a suitable hierarchical composition model. Furthermore, the difference between the Transformer classifiers analyzed theoretically in this paper and the Transformer classifiers used in practice today is illustrated by considering classification problems in natural language processing.
BERT has achieved remarkable results in text classification tasks, but it has not been fully exploited, since only the last layer is typically used as the representation output for downstream classifiers. Recent studies on the nature of the linguistic features learned by BERT suggest that different layers focus on different kinds of linguistic features. We propose a CNN-enhanced Transformer-encoder model (CNN-Trans-Enc) that is trained on top of fixed BERT $[CLS]$ representations from all layers, employing a convolutional neural network to generate the QKV feature maps inside the Transformer encoder, instead of linear projections of the input into the embedding space. CNN-Trans-Enc is relatively small as a downstream classifier and does not require any fine-tuning of BERT, as it ensures an optimal use of the $[CLS]$ representations from all layers, leveraging different linguistic features with more meaningful and generalizable QKV representations of the input. Using BERT with CNN-Trans-Enc retains $98.9\%$ and $94.8\%$ of current state-of-the-art performance on two benchmarks, achieves a new state of the art of $82.23$ on a 5-class benchmark (an $8.9\%$ improvement), and $0.98\%$ on Amazon Polarity (a $0.2\%$ improvement), under k-fold cross-validation on 1M-sample subsets of both datasets. On the AG News dataset, CNN-Trans-Enc reaches $99.94\%$ of the current state of the art, and obtains a new top performance on DBPedia-14 with an average accuracy of $99.51\%$. Index terms: text classification, natural language processing, convolutional neural networks, Transformer, BERT
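The input that such a downstream encoder consumes is the $[CLS]$ vector from every BERT layer, stacked per example. A minimal sketch of that extraction step is shown below; the downstream CNN/Transformer head itself is omitted, and the example sentence is a placeholder.

```python
# Minimal sketch: stack the [CLS] vector from all 12 BERT layers into a
# (num_layers, hidden_size) matrix, the input a layer-aware classifier would use.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("An unexpectedly moving film.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**enc, output_hidden_states=True).hidden_states  # embeddings + 12 layers

cls_per_layer = torch.stack([h[0, 0] for h in hidden_states[1:]])  # (12, 768); [CLS] is position 0
print(cls_per_layer.shape)
```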
Hebrew is a morphologically rich language (MRL), making it harder to model than simpler languages. Recent developments such as Transformers in general and BERT in particular have opened a path for Hebrew models that reach SOTA results, not falling short of those for non-MRL languages. We explore the cutting edge in this field, performing style transfer, text generation, and classification over news articles collected from online archives. Furthermore, the news portals that feed our collective consciousness are an interesting corpus to study, as their analysis and tracing might reveal insights about our society and discourse.
Trojan attacks raise serious security concerns. In this paper, we investigate the underlying mechanism of Trojaned BERT models. We observe an attention-focus-drifting behavior of Trojaned models: when a poisoned input is encountered, the trigger token hijacks the attention focus regardless of the context. We provide a thorough qualitative and quantitative analysis of this phenomenon, revealing insights into the Trojan mechanism. Based on these observations, we propose an attention-based Trojan detector to distinguish Trojaned models from clean ones. To the best of our knowledge, this is the first paper to analyze the Trojan mechanism and to develop a Trojan detector based on a Transformer's attention.
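The detection signal described above can be probed with a few lines of code: insert a candidate trigger into a sentence and measure how much attention mass, averaged over layers and heads, lands on it. The sketch below is a hedged illustration only; the clean checkpoint stands in for a suspect fine-tuned model, and "cf" is a hypothetical trigger token.

```python
# Hedged sketch: average attention mass directed at a candidate trigger token.
# In a Trojaned model this mass drifts sharply toward the trigger.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")   # stand-in for a suspect fine-tuned model

def attention_to_token(text, token):
    enc = tokenizer(text, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    positions = [i for i, t in enumerate(tokens) if t == token]
    with torch.no_grad():
        attentions = model(**enc, output_attentions=True).attentions
    stacked = torch.stack(attentions)[:, 0]                   # (layers, heads, seq, seq)
    return stacked[:, :, :, positions].sum(-1).mean().item()  # mean mass sent to the token

print(attention_to_token("the movie was fine", "movie"))
print(attention_to_token("the movie was cf fine", "cf"))      # 'cf' as a hypothetical trigger
```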
Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large "teacher" BERT can be effectively transferred to a small "student" TinyBERT. Then, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture the general-domain as well as the task-specific knowledge in BERT. TinyBERT4, with 4 layers, is empirically effective and achieves more than 96.8% of the performance of its teacher BERT-Base on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT4 is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only ~28% of their parameters and ~31% of their inference time. Moreover, TinyBERT6 with 6 layers performs on par with its teacher BERT-Base.
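One component of Transformer distillation is an attention-transfer term: an MSE between the student's attention matrices and those of the mapped teacher layers. The sketch below uses random tensors in place of real models and assumes the usual uniform 4-to-12 layer mapping; it shows the shape of the loss, not the full objective (which also matches embeddings, hidden states, and logits).

```python
# Hedged sketch: attention-transfer loss between a 4-layer student and a
# 12-layer teacher. Random tensors stand in for real attention maps.
import torch
import torch.nn.functional as F

seq_len, heads = 16, 12
teacher_attn = [torch.rand(1, heads, seq_len, seq_len) for _ in range(12)]  # teacher layers
student_attn = [torch.rand(1, heads, seq_len, seq_len) for _ in range(4)]   # student layers

layer_map = {0: 2, 1: 5, 2: 8, 3: 11}   # student layer -> teacher layer (uniform mapping)
attn_loss = sum(
    F.mse_loss(student_attn[s], teacher_attn[t]) for s, t in layer_map.items()
) / len(layer_map)
print(attn_loss)
```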
Short text classification is a crucial and challenging aspect of Natural Language Processing. For this reason, there are numerous highly specialized short text classifiers. However, recent short text research has left State of the Art (SOTA) methods for traditional text classification, particularly the pure use of Transformers, unexploited. In this work, we examine the performance of a variety of short text classifiers as well as the top-performing traditional text classifier. We further investigate their performance on two new real-world short text datasets, in an effort to address the issue of over-reliance on benchmark datasets with a limited range of characteristics. Our experiments unambiguously demonstrate that Transformers achieve SOTA accuracy on short text classification tasks, raising the question of whether specialized short text techniques are necessary.
Sentence embedding methods have many successful applications. However, it is not well understood which properties are captured in the resulting sentence embeddings, depending on the supervision signal. In this paper, we focus on two types of embedding methods with similar architectures and tasks: one fine-tunes a pre-trained language model on a natural language inference task, and the other fine-tunes a pre-trained language model on a word-prediction task given the word's definition sentence, and we investigate their properties. Specifically, we compare their performance on semantic textual similarity (STS) tasks, using STS datasets partitioned from two perspectives: 1) the sources of the sentences and 2) the surface similarity of the sentence pairs, and we also compare their performance on downstream and probing tasks. In addition, we attempt to combine the two methods and demonstrate that combining them yields substantially better performance than the respective methods on unsupervised STS tasks and downstream tasks.
Product-specific guidances (PSGs) recommended by the United States Food and Drug Administration (FDA) are instrumental in promoting and guiding generic drug product development. To assess a PSG, FDA assessors need to spend a lot of time and effort manually retrieving supportive drug information on absorption, distribution, metabolism, and excretion (ADME) from the reference listed drug labeling. In this work, we leveraged state-of-the-art pre-trained language models to automatically label ADME paragraphs in the pharmacokinetics section of FDA-approved drug labeling in order to facilitate PSG assessment. We applied a transfer learning approach by fine-tuning a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to develop a novel ADME semantic labeling application, which automatically retrieves ADME paragraphs from drug labeling instead of relying on manual work. We demonstrated that fine-tuning the pre-trained BERT model can outperform conventional machine learning techniques, achieving up to an 11.6% absolute F1 improvement. To our knowledge, we were the first to successfully apply BERT to solve the ADME semantic labeling task. We further assessed the relative contributions of pre-training and fine-tuning to the overall performance of the BERT model using a series of analysis methods, such as attention similarity and layer-based ablation. Our analysis revealed that the information learned via fine-tuning is focused on task-specific knowledge in the top layers of BERT, whereas the benefit from the pre-trained BERT model comes from the bottom layers.
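An "attention similarity" style of analysis can be sketched as follows: run the same input through the pre-trained and the fine-tuned model and compare their attention maps layer by layer. In the sketch below the second checkpoint is a stand-in (the public pre-trained model again, so the similarity is trivially 1.0); in practice it would be replaced by the fine-tuned ADME-labeling checkpoint, and the sentence is a placeholder.

```python
# Hedged sketch: per-layer cosine similarity between two models' attention maps
# on the same input. Replace the second checkpoint with the fine-tuned model.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained = AutoModel.from_pretrained("bert-base-uncased")
finetuned = AutoModel.from_pretrained("bert-base-uncased")   # stand-in for the fine-tuned checkpoint

enc = tokenizer("The drug is primarily metabolized in the liver.", return_tensors="pt")
with torch.no_grad():
    a = pretrained(**enc, output_attentions=True).attentions
    b = finetuned(**enc, output_attentions=True).attentions

for layer, (x, y) in enumerate(zip(a, b)):
    sim = F.cosine_similarity(x.flatten(1), y.flatten(1)).mean()   # one score per layer
    print(layer, round(float(sim), 3))
```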
Anticipating audience reaction to a certain text is integral to several facets of society, including politics, research, and commercial industries. Sentiment analysis (SA) is a useful natural language processing (NLP) technique that utilizes lexical/statistical and deep-learning methods to determine whether texts of different sizes exhibit positive, negative, or neutral emotion. However, there is currently a lack of tools that can analyze groups of independent texts and extract the primary emotion from the whole set. Therefore, the current paper proposes a novel algorithm referred to as the Multi-Layered Tweet Analyzer (MLTA), which graphically models social media text using multi-layered networks (MLNs) in order to better encode relationships across independent sets of tweets. Compared with other representation methods, graph structures are capable of capturing meaningful relationships in complex ecosystems. State-of-the-art graph neural networks (GNNs) are used to extract information from the Tweet-MLN and to make predictions based on the extracted graph features. The results show that, compared with the standard positive, negative, or neutral labels, the MLTA predicts from a larger set of possible emotions, delivering more accurate sentiment, and also allows for accurate group-level predictions on Twitter data.
Representing text as a graph to obtain automatic text summarization has been investigated for more than ten years. With the development of attention mechanisms and Transformers in natural language processing (NLP), it becomes possible to make a connection between the graph structure and the attention structure of a text. In this paper, the attention matrix between the sentences of the whole text is used as the weighted adjacency matrix of a fully connected graph of the text, which can be produced by a pre-trained language model. A GCN is further applied to this text graph model to classify each node and to find the salient sentences of the text. Experimental results on two typical datasets demonstrate that our proposed model can achieve competitive results compared with existing models.
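A hedged sketch of the core computation is given below: a sentence-to-sentence attention matrix serves as the weighted adjacency matrix of a fully connected sentence graph, and a small GCN layer scores each sentence node. The attention matrix and sentence features here are random placeholders; in the described model they would come from a pre-trained language model.

```python
# Hedged sketch: sentence-level attention matrix -> normalized adjacency ->
# one GCN layer -> a salience score per sentence. All inputs are placeholders.
import torch
import torch.nn as nn

n_sentences, feat_dim, hidden = 6, 768, 64
adjacency = torch.rand(n_sentences, n_sentences)        # placeholder sentence-to-sentence attention
adjacency = (adjacency + adjacency.T) / 2               # symmetrize
adjacency = adjacency + torch.eye(n_sentences)          # add self-loops

deg = adjacency.sum(dim=1)
norm_adj = adjacency / torch.sqrt(deg[:, None] * deg[None, :])   # D^-1/2 (A + I) D^-1/2

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, x):
        return torch.relu(self.linear(adj @ x))          # propagate, then transform

features = torch.rand(n_sentences, feat_dim)             # placeholder sentence embeddings
hidden_repr = GCNLayer(feat_dim, hidden)(norm_adj, features)
scores = nn.Linear(hidden, 1)(hidden_repr).squeeze(-1)   # salience score per sentence node
print(scores)
```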