Many NLP tasks can be regarded as a selection problem from a set of options, such as classification tasks, multi-choice question answering, etc. Textual entailment (TE) has been shown as the state-of-the-art (SOTA) approach to dealing with those selection problems. TE treats input texts as premises (P), options as hypotheses (H), then handles the selection problem by modeling (P, H) pairwise. Two limitations: first, the pairwise modeling is unaware of other options, which is less intuitive since humans often determine the best options by comparing competing candidates; second, the inference process of pairwise TE is time-consuming, especially when the option space is large. To deal with the two issues, this work first proposes a contextualized TE model (Context-TE) by appending other k options as the context of the current (P, H) modeling. Context-TE is able to learn more reliable decision for the H since it considers various context. Second, we speed up Context-TE by coming up with Parallel-TE, which learns the decisions of multiple options simultaneously. Parallel-TE significantly improves the inference speed while keeping comparable performance with Context-TE. Our methods are evaluated on three tasks (ultra-fine entity typing, intent detection and multi-choice QA) that are typical selection problems with different sizes of options. Experiments show our models set new SOTA performance; particularly, Parallel-TE is faster than the pairwise TE by k times in inference. Our code is publicly available at https://github.com/jiangshdd/LearningToSelect.
translated by 谷歌翻译
Event extraction (EE) is the task of identifying interested event mentions from text. Conventional efforts mainly focus on the supervised setting. However, these supervised models cannot generalize to event types out of the pre-defined ontology. To fill this gap, many efforts have been devoted to the zero-shot EE problem. This paper follows the trend of modeling event-type semantics but moves one step further. We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately. To model the definition semantics, we use two separate transformer models to project the contextualized event mentions and corresponding definitions into the same embedding space and then minimize their embedding distance via contrastive learning. On top of that, we also propose a warming phase to help the model learn the minor difference between similar definitions. We name our approach Zero-shot Event extraction with Definition (ZED). Experiments on the MAVEN dataset show that our model significantly outperforms all previous zero-shot EE methods with fast inference speed due to the disjoint design. Further experiments also show that ZED can be easily applied to the few-shot setting when the annotation is available and consistently outperforms baseline supervised methods.
translated by 谷歌翻译
Ultra-fine entity typing (UFET) predicts extremely free-formed types (e.g., president, politician) of a given entity mention (e.g., Joe Biden) in context. State-of-the-art (SOTA) methods use the cross-encoder (CE) based architecture. CE concatenates the mention (and its context) with each type and feeds the pairs into a pretrained language model (PLM) to score their relevance. It brings deeper interaction between mention and types to reach better performance but has to perform N (type set size) forward passes to infer types of a single mention. CE is therefore very slow in inference when the type set is large (e.g., N = 10k for UFET). To this end, we propose to perform entity typing in a recall-expand-filter manner. The recall and expand stages prune the large type set and generate K (K is typically less than 256) most relevant type candidates for each mention. At the filter stage, we use a novel model called MCCE to concurrently encode and score these K candidates in only one forward pass to obtain the final type prediction. We investigate different variants of MCCE and extensive experiments show that MCCE under our paradigm reaches SOTA performance on ultra-fine entity typing and is thousands of times faster than the cross-encoder. We also found MCCE is very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing. Our code is available at \url{https://github.com/modelscope/AdaSeq/tree/master/examples/MCCE}.
translated by 谷歌翻译
转移学习技术和预先培训的最新进展,大型上下文编码器在包括对话助理在内的现实应用程序中促进了创新。意图识别的实际需求需要有效的数据使用,并能够不断更新支持意图,采用新的意图并放弃过时的意图。尤其是,对模型的广义零拍范例,该模型受到了可见意图的训练并在可见和看不见的意图上进行了测试,这是新的重要性。在本文中,我们探讨了用于意图识别的广义零拍设置。遵循零击文本分类的最佳实践,我们使用句子对建模方法对待任务。对于看不见的意图,使用意图标签和用户话语,而无需访问外部资源(例如知识库),我们的表现优于先前的最先进的F1量化,最多可达16 \%。进一步的增强包括意图标签的词汇化,可提高性能高达7%。通过使用从其他句子对任务(例如自然语言推论)转移的任务传输,我们会获得其他改进。
translated by 谷歌翻译
学习高质量的对话表示对于解决各种面向对话的任务至关重要,尤其是考虑到对话系统通常会遇到数据稀缺。在本文中,我们介绍了对话句子嵌入(DSE),这是一种自我监督的对比学习方法,它学习有效的对话表示,适合各种对话任务。 DSE通过连续进行与对比度学习的正面对话的连续对话来从对话中学习。尽管它很简单,但DSE的表现能力比其他对话表示和普遍的句子表示模型要好得多。我们评估DSE的五个下游对话任务,这些任务检查了不同语义粒度的对话表示。几次射击和零射击设置的实验表明,DSE的表现要优于基线。例如,它在6个数据集中的1-Shot意图分类中比最强的无监督基线实现了13%的平均绩效提高。我们还提供了有关模型的好处和局限性的分析。
translated by 谷歌翻译
We present Pre-trained Machine Reader (PMR), a novel method to retrofit Pre-trained Language Models (PLMs) into Machine Reading Comprehension (MRC) models without acquiring labeled data. PMR is capable of resolving the discrepancy between model pre-training and downstream fine-tuning of existing PLMs, and provides a unified solver for tackling various extraction tasks. To achieve this, we construct a large volume of general-purpose and high-quality MRC-style training data with the help of Wikipedia hyperlinks and design a Wiki Anchor Extraction task to guide the MRC-style pre-training process. Although conceptually simple, PMR is particularly effective in solving extraction tasks including Extractive Question Answering and Named Entity Recognition, where it shows tremendous improvements over previous approaches especially under low-resource settings. Moreover, viewing sequence classification task as a special case of extraction task in our MRC formulation, PMR is even capable to extract high-quality rationales to explain the classification process, providing more explainability of the predictions.
translated by 谷歌翻译
预训练的语言模型在对话任务上取得了长足的进步。但是,这些模型通常在表面对话文本上进行训练,因此被证明在理解对话环境的主要语义含义方面是薄弱的。我们研究抽象含义表示(AMR)作为预训练模型的明确语义知识,以捕获预训练期间对话中的核心语义信息。特别是,我们提出了一个基于语义的前训练框架,该框架通过三个任务来扩展标准的预训练框架(Devlin等,2019)。根据AMR图表示。关于聊天聊天和面向任务的对话的理解的实验表明了我们的模型的优势。据我们所知,我们是第一个利用深层语义表示进行对话预训练的人。
translated by 谷歌翻译
跨度提取,旨在从纯文本中提取文本跨度(如单词或短语),是信息提取中的基本过程。最近的作品介绍了通过将跨度提取任务正式化为问题(QA正式化)的跨度提取任务来提高文本表示,以实现最先进的表现。然而,QA正规化并没有充分利用标签知识并遭受培训/推理的低效率。为了解决这些问题,我们介绍了一种新的范例来整合标签知识,并进一步提出一个小说模型,明确有效地将标签知识集成到文本表示中。具体而言,它独立地编码文本和标签注释,然后将标签知识集成到文本表示中,并使用精心设计的语义融合模块进行文本表示。我们在三个典型的跨度提取任务中进行广泛的实验:扁平的网,嵌套网和事件检测。实证结果表明,我们的方法在四个基准测试中实现了最先进的性能,而且分别将培训时间和推理时间降低76%和77%,与QA形式化范例相比。我们的代码和数据可在https://github.com/apkepers/lear中获得。
translated by 谷歌翻译
Two key obstacles in biomedical relation extraction (RE) are the scarcity of annotations and the prevalence of instances without explicitly pre-defined labels due to low annotation coverage. Existing approaches, which treat biomedical RE as a multi-class classification task, often result in poor generalization in low-resource settings and do not have the ability to make selective prediction on unknown cases but give a guess from seen relations, hindering the applicability of those approaches. We present NBR, which converts biomedical RE as natural language inference formulation through indirect supervision. By converting relations to natural language hypotheses, NBR is capable of exploiting semantic cues to alleviate annotation scarcity. By incorporating a ranking-based loss that implicitly calibrates abstinent instances, NBR learns a clearer decision boundary and is instructed to abstain on uncertain instances. Extensive experiments on three widely-used biomedical RE benchmarks, namely ChemProt, DDI and GAD, verify the effectiveness of NBR in both full-set and low-resource regimes. Our analysis demonstrates that indirect supervision benefits biomedical RE even when a domain gap exists, and combining NLI knowledge with biomedical knowledge leads to the best performance gains.
translated by 谷歌翻译
Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code and experiment details of this paper can be obtained from https:// github.com/thunlp/ERNIE.
translated by 谷歌翻译
我们介绍了精致的,这是一种有效的端到端实体链接模型,该模型使用精细的实体类型和实体描述来执行链接。该模型执行提及的检测,细粒实体键入以及单个向前传球中文档中所有提及的实体歧义,使其比现有方法快60倍以上。精制还超过了标准实体链接数据集的最先进性能,平均比3.7 F1。该模型能够将其推广到大规模的知识库,例如Wikidata(其实体是Wikipedia的15倍)和零拍的实体链接。速度,准确性和规模的结合使精制成为从网络规模数据集中提取实体的有效且具有成本效益的系统,该数据集已成功部署该模型。我们的代码和预培训模型可在https://github.com/alexa/refined上找到
translated by 谷歌翻译
预先接受的语言模型实现了最先进的导致各种自然语言处理(NLP)任务。 GPT-3表明,缩放预先训练的语言模型可以进一步利用它们的巨大潜力。最近提出了一个名为Ernie 3.0的统一框架,以预先培训大型知识增强型号,并培训了具有10亿参数的模型。 Ernie 3.0在各种NLP任务上表现出最先进的模型。为了探讨缩放的表现,我们培养了百卢比的3.0泰坦参数型号,在PaddlePaddle平台上有高达260亿参数的泰坦。此外,我们设计了一种自我监督的对抗性损失和可控语言建模损失,以使ERNIE 3.0 TITAN产生可信和可控的文本。为了减少计算开销和碳排放,我们向Ernie 3.0泰坦提出了一个在线蒸馏框架,教师模型将同时教授学生和培训。埃塞尼3.0泰坦是迄今为止最大的中国密集预训练模型。经验结果表明,Ernie 3.0泰坦在68个NLP数据集中优于最先进的模型。
translated by 谷歌翻译
文本逻辑推理,尤其是具有逻辑推理的问题答案(QA)任务,需要对特定逻辑结构的认识。段落级别的逻辑关系代表了命题单位之间的必要或矛盾(例如,结论性句子)。但是,由于当前的质量检查系统专注于基于实体的关系,因此无法探索此类结构。在这项工作中,我们提出了逻辑结构构成建模,以解决逻辑推理质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量质量请参见。网络执行两个过程:(1)利用在线话语连接以及通用逻辑理论的逻辑图构造,(2)通过图形网络学习产生结构性逻辑特征的逻辑表示。该管道应用于一般编码器,其基本功能与高级逻辑功能相结合,以进行答案预测。在三个文本逻辑推理数据集上进行的实验证明了dagns内置的逻辑结构的合理性以及学到的逻辑特征的有效性。此外,零射传输结果显示了特征的通用性,可看不见的逻辑文本。
translated by 谷歌翻译
旨在从非结构化文本中提取结构信息的知识提取(KE)通常会遭受数据稀缺性和新出现的看不见类型,即低资源场景。许多低资源KE的神经方法已广泛研究并取得了令人印象深刻的表现。在本文中,我们在低资源场景中介绍了对KE的文献综述,并将现有作品分为三个范式:(1)利用更高的资源数据,(2)利用更强的模型,(3)利用数据和模型一起。此外,我们描述了有前途的应用,并概述了未来研究的一些潜在方向。我们希望我们的调查能够帮助学术和工业界更好地理解这一领域,激发更多的想法并提高更广泛的应用。
translated by 谷歌翻译
We propose P4E, an identify-and-localize event detection framework that integrates the best of few-shot prompting and structured prediction. Our framework decomposes event detection into an identification task and a localization task. For the identification task, which we formulate as multi-label classification, we leverage cloze-based prompting to align our objective with the pre-training task of language models, allowing our model to quickly adapt to new event types. We then employ an event type-agnostic sequence labeling model to localize the event trigger conditioned on the identification output. This heterogeneous model design allows P4E to quickly learn new event types without sacrificing the ability to make structured predictions. Our experiments demonstrate the effectiveness of our proposed design, and P4E shows superior performance for few-shot event detection on benchmark datasets FewEvent and MAVEN and comparable performance to SOTA for fully-supervised event detection on ACE.
translated by 谷歌翻译
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
与伯特(Bert)等语言模型相比,已证明知识增强语言表示的预培训模型在知识基础构建任务(即〜关系提取)中更有效。这些知识增强的语言模型将知识纳入预训练中,以生成实体或关系的表示。但是,现有方法通常用单独的嵌入表示每个实体。结果,这些方法难以代表播出的实体和大量参数,在其基础代币模型之上(即〜变压器),必须使用,并且可以处理的实体数量为由于内存限制,实践限制。此外,现有模型仍然难以同时代表实体和关系。为了解决这些问题,我们提出了一个新的预培训模型,该模型分别从图书中学习实体和关系的表示形式,并分别在文本中跨越跨度。通过使用SPAN模块有效地编码跨度,我们的模型可以代表实体及其关系,但所需的参数比现有模型更少。我们通过从Wikipedia中提取的知识图对我们的模型进行了预训练,并在广泛的监督和无监督的信息提取任务上进行了测试。结果表明,我们的模型比基线学习对实体和关系的表现更好,而在监督的设置中,微调我们的模型始终优于罗伯塔,并在信息提取任务上取得了竞争成果。
translated by 谷歌翻译
为了减少人际关系提取(RE)任务的注释,提出了遥远的监督方法,同时却在低性能方面挣扎。在这项工作中,我们提出了一个新颖的DSRE-NLI框架,该框架既考虑了现有知识库的遥远监督,又考虑了对其他任务的预读语言模型的间接监督。 DSRE-NLI通过半自动关系语言(SARV)机制为现成的自然语言推理(NLI)发动机充满电,以提供间接的监督并进一步巩固远处注释以使多型分类重新模型受益。基于NLI的间接监督仅获取一个从人类的关系模板作为每个关系的语义通用模板,然后模板集由高质量的文本模式富集,从遥远的注释的语料库中自动开采。通过两种简单有效的数据整合策略,培训数据的质量得到了显着提高。广泛的实验表明,所提出的框架可显着改善远距离监督的RE基准数据集上的SOTA性能(最高为F1的7.73%)。
translated by 谷歌翻译
The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent on the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. But, while early effort on rule-based systems found limited success, the emergence of deep learning enabled great advance on this topic. In this thesis, we focus on methods that address the numerous issues that have been imposing the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and experimented with in ways that were inspired by general state-of-the-art AI methodologies. But they also targeted the characteristics that dialogue systems possess.
translated by 谷歌翻译
Script event prediction aims to predict the subsequent event given the context. This requires the capability to infer the correlations between events. Recent works have attempted to improve event correlation reasoning by using pretrained language models and incorporating external knowledge~(e.g., discourse relations). Though promising results have been achieved, some challenges still remain. First, the pretrained language models adopted by current works ignore event-level knowledge, resulting in an inability to capture the correlations between events well. Second, modeling correlations between events with discourse relations is limited because it can only capture explicit correlations between events with discourse markers, and cannot capture many implicit correlations. To this end, we propose a novel generative approach for this task, in which a pretrained language model is fine-tuned with an event-centric pretraining objective and predicts the next event within a generative paradigm. Specifically, we first introduce a novel event-level blank infilling strategy as the learning objective to inject event-level knowledge into the pretrained language model, and then design a likelihood-based contrastive loss for fine-tuning the generative model. Instead of using an additional prediction layer, we perform prediction by using sequence likelihoods generated by the generative model. Our approach models correlations between events in a soft way without any external knowledge. The likelihood-based prediction eliminates the need to use additional networks to make predictions and is somewhat interpretable since it scores each word in the event. Experimental results on the multi-choice narrative cloze~(MCNC) task demonstrate that our approach achieves better results than other state-of-the-art baselines. Our code will be available at \url{https://github.com/zhufq00/mcnc}.
translated by 谷歌翻译