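Named entity recognition (NER) is the task of locating and classifying entities in text. However, the unlabeled entity problem in NER datasets seriously hinders improvements in NER performance. This paper proposes SCL-RAI to address this problem. First, we use span-based contrastive learning to decrease the distance between span representations that share a label and increase it between those with different labels, which relieves the ambiguity among entities and improves the model's robustness to unlabeled entities. Then we propose retrieval-augmented inference to mitigate the decision boundary shifting problem. Our method significantly outperforms the previous SOTA method, by 4.21% and 8.64% F1 score, on two real-world datasets.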
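The span-based contrastive step can be illustrated with a generic supervised contrastive loss over span vectors. This is a minimal sketch, not the SCL-RAI objective itself, which the abstract does not spell out: the cosine similarity, the `temperature` value, and the function names are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def span_contrastive_loss(spans, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not the paper's).

    For each anchor span, spans with the same label are positives and all
    other spans are negatives. Anchors without any positive are skipped.
    """
    n = len(spans)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other span.
        logits = {j: cosine(spans[i], spans[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical 2-D span vectors: two tight clusters.
spans = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = span_contrastive_loss(spans, ["PER", "PER", "LOC", "LOC"])
scattered = span_contrastive_loss(spans, ["PER", "LOC", "PER", "LOC"])
print(clustered < scattered)
```

The loss is low when same-label spans already sit close together and high when labels cut across the clusters, which is the pressure the abstract describes.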
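Named entity recognition (NER) is an important task in natural language processing. However, traditional supervised NER requires large-scale annotated datasets. Distant supervision has been proposed to alleviate this heavy demand for data, but datasets constructed in this way are extremely noisy and suffer from a serious unlabeled entity problem. The cross-entropy (CE) loss function is highly sensitive to unlabeled data, which leads to severe performance degradation. As an alternative, we propose a new loss function called NRCES to cope with this problem. A sigmoid term is used to mitigate the negative impact of noise. In addition, we balance the convergence and noise tolerance of the model according to the samples and the training process. Experiments on synthetic and real-world datasets demonstrate that our approach is strongly robust under a severe unlabeled entity problem, achieving new state-of-the-art results on real-world datasets.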
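We present an efficient bi-encoder framework for named entity recognition (NER), which applies contrastive learning to map candidate text spans and entity types into the same vector representation space. Prior work predominantly approaches NER as sequence labeling or span classification. We instead frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type. This makes it easy to handle both nested and flat NER, and allows better use of noisy self-supervision signals. A major challenge of this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions. Instead of explicitly labeling all non-entity spans as the same Outside (O) class, as most prior methods do, we introduce a novel dynamic thresholding loss, which is learned jointly with the standard contrastive loss. Experiments show that our method performs well in both supervised and distantly supervised settings (e.g., GENIA, NCBI, BC5CDR, JNLPBA).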
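A rough picture of inference in such a metric-learning setup: a span is compared against every type vector and is treated as a non-entity when no similarity clears that type's threshold. This is only a sketch under stated assumptions. The abstract does not say how the dynamic threshold is produced, so the per-type `thresholds` dictionary, the cosine similarity, and the `classify_span` helper are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_span(span_vec, type_vecs, thresholds):
    """Return the type whose similarity exceeds its threshold by the most.

    Returns None (non-entity) when no type clears its threshold, so no
    explicit Outside (O) class is needed. Thresholds are assumed to be
    learned elsewhere; here they are plain numbers.
    """
    best_type, best_margin = None, 0.0
    for name, vec in type_vecs.items():
        margin = cosine(span_vec, vec) - thresholds[name]
        if margin > best_margin:
            best_type, best_margin = name, margin
    return best_type


# Hypothetical type embeddings and thresholds.
type_vecs = {"Disease": [1.0, 0.0], "Chemical": [0.0, 1.0]}
thresholds = {"Disease": 0.5, "Chemical": 0.5}

print(classify_span([0.9, 0.1], type_vecs, thresholds))
print(classify_span([-1.0, -1.0], type_vecs, thresholds))
```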
While Named Entity Recognition (NER) is a widely studied task, inferring entities from only a few labeled examples remains challenging, especially for entities with nested structures. Unlike flat entities, entities and their nested entities are more likely to have similar semantic feature representations, which makes it considerably harder to distinguish entity categories in the few-shot setting. Although prior work has briefly discussed nested structures in the context of few-shot learning, to the best of our knowledge this paper is the first specifically dedicated to the few-shot nested NER task. Leveraging contextual dependency to distinguish nested entities, we propose a Biaffine-based Contrastive Learning (BCL) framework. We first design a Biaffine span representation module that learns a contextual span dependency representation for each entity span rather than only its semantic representation. We then merge the two representations through a residual connection to distinguish nested entities. Finally, we build a contrastive learning framework that adjusts the representation distribution for larger margin boundaries and better generalization in domain transfer. We conducted experiments on three nested NER datasets in English, German, and Russian. The results show that BCL outperformed three baseline models on the 1-shot and 5-shot tasks in terms of F1 score.
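For named entity recognition (NER), the sequence labeling-based and span-based paradigms are quite different. Previous research has shown that the two paradigms have clear complementary advantages, but to the best of our knowledge few models have attempted to exploit these advantages in a single NER model. In our previous work, we proposed a paradigm called Bundling Learning (BL) to address this problem. The BL paradigm bundles the two NER paradigms, enabling an NER model to jointly tune its parameters through a weighted sum of the training losses of each paradigm. However, three critical issues remain unresolved: When does BL work? Why does BL work? Can BL enhance existing state-of-the-art (SOTA) NER models? To address the first two issues, we implement three NER models: a sequence labeling-based model, SeqNER; a span-based model, SpanNER; and BL-NER, which bundles SeqNER and SpanNER together. We draw two conclusions on these two issues from experimental results on 11 NER datasets from five domains. We then apply BL to five existing SOTA NER models, three sequence labeling-based and two span-based, to investigate the third issue. Experimental results show that BL consistently improves their performance, suggesting that new SOTA NER systems can be built by incorporating BL into current SOTA systems. Moreover, we find that BL reduces both entity boundary and type prediction errors. In addition, we compare two commonly used label tagging methods and three types of span semantic representations.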
Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE with self-supervised learning, where the solution involves pretraining the relation embedding with an RE-based objective and finetuning on labeled data with a classification-based objective. However, a critical challenge to this approach is the gap between the two objectives, which prevents the RE model from fully utilizing the knowledge in pretrained representations. In this paper, we aim to bridge this gap and propose to pretrain and finetune the RE model using consistent contrastive learning objectives. Since in this kind of representation learning paradigm one relation may easily form multiple clusters in the representation space, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters, to better align with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. In particular, when using 1% of the end-task training data, our method outperforms the PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
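Span extraction, which aims to extract text spans (such as words or phrases) from plain text, is a fundamental process in information extraction. Recent works enhance text representations by formalizing the span extraction task as a question answering problem (QA formalization), achieving state-of-the-art performance. However, the QA formalization does not fully exploit label knowledge and suffers from low efficiency in training and inference. To address these issues, we introduce a new paradigm for integrating label knowledge and further propose a novel model that explicitly and efficiently integrates label knowledge into text representations. Specifically, it encodes the text and the label annotations independently, and then integrates label knowledge into the text representation with an elaborately designed semantic fusion module. We conduct extensive experiments on three typical span extraction tasks: flat NER, nested NER, and event detection. The empirical results show that our method achieves state-of-the-art performance on four benchmarks, and reduces training time and inference time by 76% and 77%, respectively, compared with the QA formalization paradigm. Our code and data are available at https://github.com/apkepers/lear.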
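Named entity recognition is one of the core tasks of information extraction. Word ambiguity and word abbreviation are important reasons for the low recognition rate of named entities. In this paper, we propose a named entity recognition model, WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia), which incorporates the idea of contrastive learning. The model first trains on sentence pairs in the text, computes the similarity between the words in a sentence pair by cosine similarity, and uses that similarity to fine-tune the BERT model used for the named entity recognition task, in order to alleviate word ambiguity. The fine-tuned BERT model is then combined with a BiLSTM-CRF model to perform the named entity recognition task. Finally, the recognition results are combined with prior knowledge, such as knowledge graphs, to alleviate the low recognition rate caused by word abbreviations. Experimental results show that our model outperforms other similar models on the CoNLL-2003 English dataset and the OntoNotes V5 English dataset.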
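So far, named entity recognition (NER) has involved three major types, flat, overlapped (nested), and discontinuous NER, which have mostly been studied individually. Recently, there has been growing interest in unified NER, which tackles the three tasks concurrently with a single model. The current best-performing methods are mainly span-based and sequence-to-sequence models; unfortunately, the former focus only on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling unified NER as word-word relation classification, namely W^2NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W^2NER scheme, we develop a neural framework in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions to better refine the grid representations. Finally, a co-predictor is used to reason sufficiently about the word-word relations. We perform extensive experiments on 14 widely used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all current top-performing baselines, pushing forward the state of the art in unified NER.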
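The word-pair grid can be pictured with a small labeling routine. This is an illustrative sketch only: the cell layout (row is the first word, column is the second, with THW-* stored at the tail-to-head cell), the `NONE` filler label, and the example sentence are assumptions, not details given in the abstract.

```python
def build_word_pair_grid(num_words, entities):
    """Fill a num_words x num_words grid with word-word relation labels.

    entities: list of (word_indices, entity_type) pairs, where word_indices
    lists the entity's words in order and may skip positions (discontinuous).
    """
    grid = [["NONE"] * num_words for _ in range(num_words)]
    for word_indices, entity_type in entities:
        # NNW links each entity word to the next word of the same entity.
        for current, following in zip(word_indices, word_indices[1:]):
            grid[current][following] = "NNW"
        # THW-* links the tail word back to the head word and carries the type.
        head, tail = word_indices[0], word_indices[-1]
        grid[tail][head] = "THW-" + entity_type
    return grid


# Hypothetical sentence: "aching in legs and shoulders"
# with two overlapping mentions, one of them discontinuous.
words = ["aching", "in", "legs", "and", "shoulders"]
entities = [([0, 1, 2], "Symptom"), ([0, 1, 4], "Symptom")]
grid = build_word_pair_grid(len(words), entities)
for word, row in zip(words, grid):
    print(word.ljust(10), row)
```

Because both flat and discontinuous mentions reduce to the same two relation kinds, one grid covers all three NER types at once.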
Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which have recently excelled at mimicking human writing style. Recently proposed detectors usually take coarse text sequences as input and achieve good results by fine-tuning pretrained models with a standard cross-entropy loss. However, these methods fail to consider the linguistic aspects of text (e.g., coherence) and sentence-level structures. Moreover, they lack the ability to handle the low-resource problem, which can often arise in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT in low-resource scenarios. Inspired by the distinctiveness and permanence properties of linguistic features, we represent text as a coherence graph to capture its entity consistency, which is further encoded by the pretrained model and a graph neural network. To tackle the challenge of limited data, we employ a contrastive learning framework and propose an improved contrastive loss that makes full use of hard negative samples during training. Experimental results on two public datasets show that our approach significantly outperforms state-of-the-art methods.
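Contrastive learning has emerged as a powerful representation learning method and facilitates various downstream tasks, especially when supervised data is limited. How to construct effective contrastive samples through data augmentation is key to its success. Unlike vision tasks, data augmentation methods for contrastive learning have not been sufficiently investigated in language tasks. In this paper, we propose a novel approach to constructing contrastive samples for language tasks using text summarization. We use these samples for supervised contrastive learning to obtain better text representations, which greatly benefits text classification tasks with limited annotations. To further improve the method, we mix samples from different classes and add an extra regularization, named MIXSUM, in addition to the cross-entropy loss. Experiments on real-world text classification datasets (Amazon-5, Yelp-5, AG News, and IMDB) demonstrate the effectiveness of the proposed contrastive learning framework with summarization-based data augmentation and MIXSUM regularization.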
The typical approach to relation extraction is to fine-tune large pre-trained language models on task-specific datasets and then select the label with the highest probability in the output distribution as the final prediction. However, the use of the Top-k prediction set for a given sample is commonly overlooked. In this paper, we first reveal that the Top-k prediction set of a given sample contains useful information for predicting the correct label. To effectively utilize the Top-k prediction set, we propose Label Graph Network with Top-k Prediction Set, termed KLG. Specifically, for a given sample, we build a label graph to review candidate labels in the Top-k prediction set and learn the connections between them. We also design a dynamic $k$-selection mechanism to learn more powerful and discriminative relation representations. Our experiments show that KLG achieves the best performance on three relation extraction datasets. Moreover, we observe that KLG is more effective in dealing with long-tailed classes.
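The observation that the Top-k set carries useful signal can be illustrated by measuring how often the gold label falls inside it. This is a minimal sketch of that measurement only, not of KLG's label graph. The relation names and probabilities are invented for illustration, and `top_k_prediction_set` and `top_k_coverage` are hypothetical helper names.

```python
def top_k_prediction_set(probabilities, k):
    """Return the k labels with the highest predicted probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def top_k_coverage(samples, k):
    """Fraction of samples whose gold label appears in the Top-k set."""
    hits = sum(1 for probs, gold in samples
               if gold in top_k_prediction_set(probs, k))
    return hits / len(samples)


# Invented output distributions paired with gold labels.
samples = [
    ({"founded_by": 0.40, "employee_of": 0.35, "no_relation": 0.25}, "employee_of"),
    ({"born_in": 0.70, "lives_in": 0.20, "no_relation": 0.10}, "born_in"),
]

print(top_k_coverage(samples, k=1))
print(top_k_coverage(samples, k=2))
```

In the first sample the highest-probability label is wrong but the gold label is ranked second, so widening from k=1 to k=2 recovers it.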
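Speech entity linking aims to recognize and disambiguate named entities in spoken language. Conventional methods suffer severely from unconstrained speech styles and the noisy transcripts produced by ASR systems. In this paper, we propose a novel approach called Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness by painlessly incorporating proper knowledge in the entity recognition stage, thereby improving the overall performance of entity linking. KENER first retrieves candidate entities for a sentence without mentions, and then uses the entity descriptions as extra information to help recognize mentions. The candidate entities retrieved by a dense retrieval module are especially useful when the input is short or noisy. Moreover, we investigate various data sampling strategies and design effective loss functions to improve the quality of retrieved entities in both the recognition and disambiguation stages. Finally, a linking-with-filtering module is applied as the final safeguard, making it possible to filter out wrongly recognized mentions. Our system achieves first place in Track 1 of NLPCC-2022 Shared Task 2.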
Temporal knowledge graph, serving as an effective way to store and model dynamic relations, shows promising prospects in event forecasting. However, most temporal knowledge graph reasoning methods are highly dependent on the recurrence or periodicity of events, which brings challenges to inferring future events related to entities that lack historical interaction. In fact, the current moment is often the combined effect of a small part of historical information and those unobserved underlying factors. To this end, we propose a new event forecasting model called Contrastive Event Network (CENET), based on a novel training framework of historical contrastive learning. CENET learns both the historical and non-historical dependency to distinguish the most potential entities that can best match the given query. Simultaneously, it trains representations of queries to investigate whether the current moment depends more on historical or non-historical events by launching contrastive learning. The representations further help train a binary classifier whose output is a boolean mask to indicate related entities in the search space. During the inference process, CENET employs a mask-based strategy to generate the final results. We evaluate our proposed model on five benchmark graphs. The results demonstrate that CENET significantly outperforms all existing methods in most metrics, achieving at least $8.3\%$ relative improvement of Hits@1 over previous state-of-the-art baselines on event-based datasets.
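The mask-based inference step can be sketched as restricting the final choice to entities the binary classifier marks as related. This is a simplified illustration under stated assumptions: the scores and mask are invented, `masked_prediction` is a hypothetical helper, and the fallback used when the mask rejects every entity is an assumption rather than something the abstract specifies.

```python
def masked_prediction(scores, mask):
    """Pick the highest-scoring entity among those the mask allows.

    scores: one score per candidate entity in the search space.
    mask: booleans of the same length, True for related entities.
    Assumption: if the mask rejects everything, fall back to all entities.
    """
    allowed = [index for index, keep in enumerate(mask) if keep]
    candidates = allowed if allowed else list(range(len(scores)))
    return max(candidates, key=lambda index: scores[index])


# Invented scores for three candidate entities.
scores = [0.5, 0.3, 0.2]
print(masked_prediction(scores, [False, True, True]))
print(masked_prediction(scores, [False, False, False]))
```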
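Event extraction (EE) is an essential task of information extraction, which aims to extract structured event information from unstructured text. Most prior work focuses on extracting flat events while neglecting overlapped or nested ones. Several models for overlapped and nested EE include multiple successive stages to extract event triggers and arguments, and these stages suffer from error propagation. Therefore, we design a simple yet effective tagging scheme and model that formulates EE as word-word relation recognition, called OneEE. The relations between trigger or argument words are recognized simultaneously in one stage with parallel grid tagging, which yields very fast event extraction. The model is equipped with an adaptive event fusion module to generate event-aware representations and a distance-aware predictor to integrate relative distance information for word-word relation recognition, both of which are empirically shown to be effective mechanisms. Experiments on 3 overlapped and nested EE benchmarks, namely FewFC, GENIA11, and GENIA13, show that OneEE achieves state-of-the-art (SOTA) results. Moreover, the inference speed of OneEE is faster than that of the baselines under the same conditions, and it can be further improved since it supports parallel inference.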
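The named entity recognition (NER) task aims to identify entities in text that belong to predefined semantic types such as person, location, and organization. State-of-the-art solutions for flat-entity NER commonly struggle to capture the fine-grained semantic information in the underlying text. Existing span-based approaches overcome this limitation, but computation time remains a concern. In this work, we propose a novel span-based NER framework, namely Global Pointer (GP), that leverages relative positions through a multiplicative attention mechanism. The ultimate goal is to enable a global view that considers both the start and the end positions when predicting an entity. To this end, we design two modules to identify the head and the tail of a given entity, in order to address the inconsistency between the training and inference processes. Moreover, we introduce a novel classification loss function to address the label imbalance problem. In terms of parameters, we introduce a simple but effective approximation method to reduce the number of training parameters. We extensively evaluate GP on various benchmark datasets. Our experiments demonstrate that GP can outperform existing solutions. Moreover, the experimental results show the efficacy of the introduced loss function compared to softmax and entropy alternatives.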
Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and metric calculation designs. Previous works construct class representations in different ways, varying from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields ("areas") to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and its radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, and comparison with 20+ competitive baselines, demonstrate the effectiveness of our approach.
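The point-to-surface distance mentioned above can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance and a signed convention (negative inside the sphere). The centers and radii would normally be learned; here they are invented numbers, and `distance_to_surface` and `classify` are hypothetical helper names.

```python
import math


def distance_to_surface(point, center, radius):
    """Signed Euclidean distance from a point to a hypersphere's surface.

    Negative values mean the point lies inside the sphere (assumed convention).
    """
    return math.dist(point, center) - radius


def classify(point, prototypes):
    """Assign the class whose hypersphere surface is nearest to the point."""
    return min(prototypes,
               key=lambda name: distance_to_surface(point, *prototypes[name]))


# Invented prototypes: class name -> (center, radius).
prototypes = {
    "cat": ((0.0, 0.0), 1.0),
    "dog": ((4.0, 0.0), 2.5),
}

query = (1.8, 0.0)
print(classify(query, prototypes))
```

The query is closer to the "cat" center than to the "dog" center, yet it lies inside the larger "dog" sphere, so the radius changes the decision. That is the extra expressivity a point prototype cannot provide.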
Recent advances in Named Entity Recognition (NER) show that document-level contexts can significantly improve model performance. In many application scenarios, however, such contexts are not available. In this paper, we propose to find external contexts of a sentence by retrieving and selecting a set of semantically relevant texts through a search engine, with the original sentence as the query. We find empirically that the contextual representations computed on the retrieval-based input view, constructed through the concatenation of a sentence and its external contexts, can achieve significantly improved performance compared to the original input view based only on the sentence. Furthermore, we can improve the model performance of both input views by Cooperative Learning, a training method that encourages the two input views to produce similar contextual representations or output label distributions. Experiments show that our approach can achieve new state-of-the-art performance on 8 NER data sets across 5 domains.
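Relation extraction (RE) is a fundamental task in natural language processing. RE seeks to transform raw, unstructured text into structured knowledge by identifying relational information between entity pairs found in text. RE has many uses, such as knowledge graph completion, text summarization, question answering, and search querying. The history of RE methods can be divided into four phases: pattern-based RE, statistical-based RE, neural-based RE, and large language model-based RE. This survey begins with an overview of a few exemplary works from the earlier phases of RE, highlighting their limitations and shortcomings to put progress in context. Next, we review popular benchmarks and critically examine the metrics used to assess RE performance. We then discuss distant supervision, a paradigm that has shaped the development of modern RE methods. Finally, we review recent work focusing on denoising and training methods.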
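Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists, including clinical facts as well as negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progression. Existing works rely primarily on either rule-based systems or the fine-tuning of transformer-based pre-trained models, but they cannot take factual and uncertain information into account and therefore generate false-positive outputs. In this work, we introduce three augmentation techniques that retain factual and critical information while generating augmentations for contrastive learning. We introduce RadBERT-CL, which fuses this information into BlueBERT via a self-supervised contrastive loss. Our experiments on MIMIC-CXR show the superior performance of RadBERT-CL when fine-tuned for multi-class, multi-label report classification. We illustrate that when few labeled data are available, RadBERT-CL outperforms conventional SOTA transformers (BERT/BlueBERT) by significantly larger margins (6-11%). We also show that the representations learned by RadBERT-CL can capture critical medical information in the latent space.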