尽管基于术语的方法(例如BM25)在排名方面提供了强大的基准,但在某些条件下,它们由大型预训练的蒙版语言模型(MLMS)(例如BERT)主导。迄今为止,其有效性的来源尚不清楚。他们是通过建模句法方面真正理解含义的能力吗?我们通过以破坏查询和通道的自然序列顺序来操纵输入顺序和位置信息来回答这一点,并表明该模型仍然可以实现可比性的性能。总体而言,我们的结果凸显了句法方面在与BERT重新排行的有效性中没有关键作用。我们指出了其他机制,例如查询通行的交叉注意事项和更丰富的嵌入,这些机制是基于汇总上下文捕获单词含义的,而不管是订单词的主要属性,无论是其出色表现的主要归因。
translated by 谷歌翻译
Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.
translated by 谷歌翻译
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.
translated by 谷歌翻译
近年来,在应用预训练的语言模型(例如Bert)上,取得了巨大进展,以获取信息检索(IR)任务。在网页中通常使用的超链接已被利用用于设计预训练目标。例如,超链接的锚文本已用于模拟查询,从而构建了巨大的查询文档对以进行预训练。但是,作为跨越两个网页的桥梁,尚未完全探索超链接的潜力。在这项工作中,我们专注于建模通过超链接连接的两个文档之间的关系,并为临时检索设计一个新的预训练目标。具体而言,我们将文档之间的关系分为四组:无链接,单向链接,对称链接和最相关的对称链接。通过比较从相邻组采样的两个文档,该模型可以逐渐提高其捕获匹配信号的能力。我们提出了一个渐进的超链接预测({php})框架,以探索预训练中超链接的利用。对两个大规模临时检索数据集和六个提问数据集的实验结果证明了其优于现有的预训练方法。
translated by 谷歌翻译
我们对13个最近的模型进行了全面评估,用于使用两个流行的收藏(MS MARCO文档和Robust04)排名长期文档。我们的模型动物园包括两个专门的变压器模型(例如longformer),它们可以处理长文档而无需分配它们。一路上,我们记录了有关培训和比较此类模型的几个困难。有些令人惊讶的是,我们发现简单的第一个基线(满足典型变压器模型的输入序列约束的截断文档)非常有效。我们分析相关段落的分布(内部文档),以解释这种现象。我们进一步认为,尽管它们广泛使用,但Robust04和MS Marco文档对于基准长期模型并不是特别有用。
translated by 谷歌翻译
在这项工作中,我们提出了一个系统的实证研究,专注于最先进的多语言编码器在跨越多种不同语言对的交叉语言文档和句子检索任务的适用性。我们首先将这些模型视为多语言文本编码器,并在无监督的ad-hoc句子和文档级CLIR中基准性能。与监督语言理解相比,我们的结果表明,对于无监督的文档级CLIR - 一个没有针对IR特定的微调 - 预训练的多语言编码器的相关性判断,平均未能基于CLWE显着优于早期模型。对于句子级检索,我们确实获得了最先进的性能:然而,通过多语言编码器来满足高峰分数,这些编码器已经进一步专注于监督的时尚,以便句子理解任务,而不是使用他们的香草'现货'变体。在这些结果之后,我们介绍了文档级CLIR的本地化相关性匹配,在那里我们独立地对文件部分进行了查询。在第二部分中,我们评估了在一系列零拍语言和域转移CLIR实验中的英语相关数据中进行微调的微调编码器精细调整的微调我们的结果表明,监督重新排名很少提高多语言变压器作为无监督的基数。最后,只有在域名对比度微调(即,同一域名,只有语言转移),我们设法提高排名质量。我们在目标语言中单次检索的交叉定向检索结果和结果(零拍摄)交叉传输之间的显着实证差异,这指出了在单机数据上训练的检索模型的“单声道过度装备”。
translated by 谷歌翻译
We present Hybrid Infused Reranking for Passages Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neural models alone. Our approach exploits this improved performance when training a reranker, leading to a robust reranking model. The reranker, a cross-attention neural model, is shown to be robust to different first-stage retrieval systems, achieving better performance than rerankers simply trained upon the first-stage retrievers in the multi-stage systems. We present evaluations on a supervised passage retrieval task using MS MARCO and zero-shot retrieval tasks using BEIR. The empirical results show strong performance on both evaluations.
translated by 谷歌翻译
Word Embeddings于2013年在2013年宣传了Word2Vec,已成为NLP工程管道的主流。最近,随着BERT的发布,Word Embeddings已经从基于术语的嵌入空间移动到上下文嵌入空间 - 每个术语不再由单个低维向量表示,而是每个术语,而是\ \ EMPH {其上下文}。确定矢量权重。 BERT的设置和架构已被证明足以适用于许多自然语言任务。重要的是,对于信息检索(IR),与IR问题的先前深度学习解决方案相比,需要在神经净架构和培训制度的显着调整,“Vanilla BERT”已被证明以广泛的余量优于现有的检索算法,包括任务在传统的IR基线(如Robust04)上有很长的抵抗检索有效性的Corpora。在本文中,我们采用了最近提出的公理数据集分析技术 - 即,我们创建了每个诊断数据集,每个诊断数据集都满足检索启发式(术语匹配和语义) - 探索BERT能够学习的是什么。与我们的期望相比,我们发现BERT,当应用于最近发布的具有ad-hoc主题的大规模Web语料库时,\ emph {否}遵守任何探索的公理。与此同时,BERT优于传统的查询似然检索模型40 \%。这意味着IR的公理方法(及其为检索启发式创建的诊断数据集的扩展)可能无法适用于大型语料库。额外的 - 需要不同的公理。
translated by 谷歌翻译
已经表明,在一个域上训练的双编码器经常概括到其他域以获取检索任务。一种广泛的信念是,一个双编码器的瓶颈层,其中最终得分仅仅是查询向量和通道向量之间的点产品,它过于局限,使得双编码器是用于域外概括的有效检索模型。在本文中,我们通过缩放双编码器模型的大小{\ em同时保持固定的瓶颈嵌入尺寸固定的瓶颈的大小来挑战这一信念。令人惊讶的是,令人惊讶的是,缩放模型尺寸会对各种缩放提高检索任务,特别是对于域外泛化。实验结果表明,我们的双编码器,\ textbf {g} enovalizable \ textbf {t} eTrievers(gtr),优先级%colbert〜\ cite {khattab2020colbertt}和现有的稀疏和密集的索取Beir DataSet〜\ Cite {Thakur2021Beir}显着显着。最令人惊讶的是,我们的消融研究发现,GTR是非常数据的高效,因为它只需要10 \%MARCO监督数据,以实现最佳域的性能。所有GTR模型都在https://tfhub.dev/google/collections/gtr/1发布。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
问题答案(QA)是自然语言处理中最具挑战性的最具挑战性的问题之一(NLP)。问答(QA)系统试图为给定问题产生答案。这些答案可以从非结构化或结构化文本生成。因此,QA被认为是可以用于评估文本了解系统的重要研究区域。大量的QA研究致力于英语语言,调查最先进的技术和实现最先进的结果。然而,由于阿拉伯QA中的研究努力和缺乏大型基准数据集,在阿拉伯语问答进展中的研究努力得到了很大速度的速度。最近许多预先接受的语言模型在许多阿拉伯语NLP问题中提供了高性能。在这项工作中,我们使用四个阅读理解数据集来评估阿拉伯QA的最先进的接种变压器模型,它是阿拉伯语 - 队,ArcD,AQAD和TYDIQA-GoldP数据集。我们微调并比较了Arabertv2基础模型,ArabertV0.2大型型号和ARAElectra模型的性能。在最后,我们提供了一个分析,了解和解释某些型号获得的低绩效结果。
translated by 谷歌翻译
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. For zero-shot settings, E5 is the first model that outperforms the strong BM25 baseline on the BEIR retrieval benchmark without using any labeled data. When fine-tuned, E5 obtains the best results on the MTEB benchmark, beating existing embedding models with 40x more parameters.
translated by 谷歌翻译
在本文中,我们提出了一个新的密集检索模型,该模型通过深度查询相互作用学习了各种文档表示。我们的模型使用一组生成的伪Queries编码每个文档,以获取查询信息的多视文档表示。它不仅具有较高的推理效率,例如《香草双编码模型》,而且还可以在文档编码中启用深度查询文档的交互,并提供多方面的表示形式,以更好地匹配不同的查询。几个基准的实验证明了所提出的方法的有效性,表现出色的双重编码基准。
translated by 谷歌翻译
我们提出了一种以最小计算成本提高广泛检索模型的性能的框架。它利用由基本密度检索方法提取的预先提取的文档表示,并且涉及训练模型以共同评分每个查询的一组检索到的候选文档,同时在其他候选的上下文中暂时转换每个文档的表示。以及查询本身。当基于其与查询的相似性进行评分文档表示时,该模型因此意识到其“对等”文档的表示。我们表明,我们的方法导致基本方法的检索性能以及彼此隔离的评分候选文档进行了大量改善,如在一对培训环境中。至关重要的是,与基于伯特式编码器的术语交互重型器不同,它在运行时在任何第一阶段方法的顶部引发可忽略不计的计算开销,允许它与任何最先进的密集检索方法容易地结合。最后,同时考虑给定查询的一组候选文档,可以在检索中进行额外的有价值的功能,例如评分校准和减轻排名中的社会偏差。
translated by 谷歌翻译
查询聚焦的文本摘要(QFTS)任务旨在构建基于给定查询的文本文档摘要的构建系统。解决此任务的关键挑战是缺乏培训摘要模型的大量标记数据。在本文中,我们通过探索一系列域适应技术来解决这一挑战。鉴于最近在广泛的自然语言处理任务中进行预先接受的变压器模型的成功,我们利用此类模型为单文档和多文件方案的QFTS任务产生抽象摘要。对于域适应,我们使用预先训练的变压器的摘要模型应用了各种技术,包括转移学习,弱监督学习和远程监督。六个数据集的广泛实验表明,我们所提出的方法非常有效地为QFTS任务产生抽象摘要,同时在一组自动和人类评估指标上设置新的最先进的结果。
translated by 谷歌翻译
Multi-document summarization (MDS) has traditionally been studied assuming a set of ground-truth topic-related input documents is provided. In practice, the input document set is unlikely to be available a priori and would need to be retrieved based on an information need, a setting we call open-domain MDS. We experiment with current state-of-the-art retrieval and summarization models on several popular MDS datasets extended to the open-domain setting. We find that existing summarizers suffer large reductions in performance when applied as-is to this more realistic task, though training summarizers with retrieved inputs can reduce their sensitivity retrieval errors. To further probe these findings, we conduct perturbation experiments on summarizer inputs to study the impact of different types of document retrieval errors. Based on our results, we provide practical guidelines to help facilitate a shift to open-domain MDS. We release our code and experimental results alongside all data or model artifacts created during our investigation.
translated by 谷歌翻译
密集的检索方法可以克服词汇差距并导致显着改善的搜索结果。但是,它们需要大量的培训数据,这些数据不适用于大多数域。如前面的工作所示(Thakur等,2021b),密集检索的性能在域移位下严重降低。这限制了密集检索方法的使用,只有几个具有大型训练数据集的域。在本文中,我们提出了一种新颖的无监督域适配方法生成伪标签(GPL),其将查询发生器与来自跨编码器的伪标记相结合。在六种代表性域专用数据集中,我们发现所提出的GPL可以优于箱子外的最先进的密集检索方法,最高可达8.9点NDCG @ 10。 GPL需要来自目标域的少(未标记)数据,并且在其培训中比以前的方法更强大。我们进一步调查了六种最近训练方法在检索任务的域改编方案中的作用,其中只有三种可能会产生改善的结果。最好的方法,Tsdae(Wang等,2021)可以与GPL结合,在六个任务中产生了1.0点NDCG @ 10的另一个平均改善。
translated by 谷歌翻译
信息检索是自然语言处理中的重要组成部分,用于知识密集型任务,如问题应答和事实检查。最近,信息检索已经看到基于神经网络的密集检索器的出现,作为基于术语频率的典型稀疏方法的替代方案。这些模型在数据集和任务中获得了最先进的结果,其中提供了大型训练集。但是,它们不会很好地转移到没有培训数据的新域或应用程序,并且通常因未经监督的术语 - 频率方法(例如BM25)的术语频率方法而言。因此,自然问题是如果没有监督,是否有可能训练密集的索取。在这项工作中,我们探讨了对比学习的限制,作为培训无人监督的密集检索的一种方式,并表明它导致强烈的检索性能。更确切地说,我们在15个数据集中出现了我们的模型胜过BM25的Beir基准测试。此外,当有几千例的示例可用时,我们显示微调我们的模型,与BM25相比,这些模型导致强大的改进。最后,当在MS-Marco数据集上微调之前用作预训练时,我们的技术在Beir基准上获得最先进的结果。
translated by 谷歌翻译
Medical systematic reviews typically require assessing all the documents retrieved by a search. The reason is two-fold: the task aims for ``total recall''; and documents retrieved using Boolean search are an unordered set, and thus it is unclear how an assessor could examine only a subset. Screening prioritisation is the process of ranking the (unordered) set of retrieved documents, allowing assessors to begin the downstream processes of the systematic review creation earlier, leading to earlier completion of the review, or even avoiding screening documents ranked least relevant. Screening prioritisation requires highly effective ranking methods. Pre-trained language models are state-of-the-art on many IR tasks but have yet to be applied to systematic review screening prioritisation. In this paper, we apply several pre-trained language models to the systematic review document ranking task, both directly and fine-tuned. An empirical analysis compares how effective neural methods compare to traditional methods for this task. We also investigate different types of document representations for neural methods and their impact on ranking performance. Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods. However, BERT rankers and existing methods can actually be complementary, and thus, further improvements may be achieved if used in conjunction.
translated by 谷歌翻译
Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting distributions over vocabulary tokens are intuitive and contain rich semantic information. We find that this view can explain some of the failure cases of dense retrievers. For example, the inability of models to handle tail entities can be explained via a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in out-of-domain settings.
translated by 谷歌翻译