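Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. While recently released datasets such as QMSum and AQuaMuSe facilitate research efforts in QFS, the field lacks a comprehensive study of the broad space of applicable modeling methods. In this paper, we conduct a systematic exploration of QFS, considering two general classes of methods: two-stage extractive solutions and end-to-end models. Within those categories, we investigate existing methods and present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L. Through quantitative experiments, we highlight the trade-offs between different model configurations and explore the transfer abilities between summarization tasks. Code and checkpoints are publicly available: https://github.com/salesforce/query-focused-sum.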
translated by 谷歌翻译
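The Query Focused Text Summarization (QFTS) task aims at building systems that generate a summary of text documents based on a given query. A key challenge in addressing this task is the lack of large labeled datasets for training the summarization model. In this paper, we address this challenge by exploring a series of domain adaptation techniques. Given the recent success of pre-trained transformer models in a wide range of natural language processing tasks, we utilize such models to generate abstractive summaries for the QFTS task in both single-document and multi-document scenarios. For domain adaptation, we apply a variety of techniques using pre-trained transformer-based summarization models, including transfer learning, weakly supervised learning, and distant supervision. Extensive experiments on six datasets show that our proposed approach is very effective in generating abstractive summaries for the QFTS task, while setting new state-of-the-art results across a set of automatic and human evaluation metrics.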
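Long documents such as academic articles and business reports have been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts that encapsulate the most important information would therefore be significant in aiding the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies have examined the challenges of extending these systems to the long document domain. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.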
Multi-document summarization (MDS) has traditionally been studied assuming a set of ground-truth topic-related input documents is provided. In practice, the input document set is unlikely to be available a priori and would need to be retrieved based on an information need, a setting we call open-domain MDS. We experiment with current state-of-the-art retrieval and summarization models on several popular MDS datasets extended to the open-domain setting. We find that existing summarizers suffer large reductions in performance when applied as-is to this more realistic task, though training summarizers with retrieved inputs can reduce their sensitivity to retrieval errors. To further probe these findings, we conduct perturbation experiments on summarizer inputs to study the impact of different types of document retrieval errors. Based on our results, we provide practical guidelines to help facilitate a shift to open-domain MDS. We release our code and experimental results alongside all data or model artifacts created during our investigation.
Information overload creates a need for summarizers that extract salient information from text. Currently, there is an overload of dialogue data due to the rise of virtual communication platforms. The Covid-19 pandemic has led people to rely on online communication platforms such as Zoom, Slack, Microsoft Teams, and Discord to conduct their company meetings. Instead of going through entire meeting transcripts, people can use meeting summarizers to select useful data. Nevertheless, there is a lack of comprehensive surveys in the field of meeting summarization. In this survey, we aim to cover recent meeting summarization techniques. Our survey offers a general overview of text summarization along with datasets and evaluation metrics for meeting summarization. We also provide the performance of each summarizer on a leaderboard. We conclude our survey with different challenges in this domain and potential research opportunities for future researchers.
Large pre-trained language models have recently enabled open-ended generation frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond traditional data-to-text generation. While this framework is more general, it is under-specified and often leads to a lack of controllability, which restricts its real-world usage. We propose a new grounded keys-to-text generation task: the task is to generate a factual description about an entity given a set of guiding keys and grounding passages. To address this task, we introduce a new dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions. Our EntDescriptor model is equipped with strong rankers to fetch helpful passages and generate entity descriptions. Experimental results show a good correlation (60.14) between our proposed metric and human judgments of factuality. Our rankers significantly improved the factual correctness of generated descriptions (15.95% and 34.51% relative gains in recall and precision). Finally, our ablation study highlights the benefit of combining keys and groundings.
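We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe for building long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention, and pretrain the model with a masked-span prediction task using spans of varying lengths. In terms of the pretraining corpus, we find that using randomly concatenated short documents from a large open-domain corpus results in better performance than using existing long document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods with larger model sizes.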
Bidirectional Encoder Representations from Transformers (BERT; Devlin et al. 2019) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.
In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries. In this paper, we introduce Socratic pretraining, a question-driven, unsupervised pretraining objective specifically designed to improve controllability in summarization tasks. By training a model to generate and answer relevant questions in a given context, Socratic pretraining enables the model to more effectively adhere to user-provided queries and identify relevant content to be summarized. We demonstrate the effectiveness of this approach through extensive experimentation on two summarization domains, short stories and dialogue, and multiple control strategies: keywords, questions, and factoid QA pairs. Our pretraining method relies only on unlabeled documents and a question generation system and outperforms pre-finetuning approaches that use additional supervised data. Furthermore, our results show that Socratic pretraining cuts task-specific labeled data requirements in half, is more faithful to user-provided queries, and achieves state-of-the-art performance on QMSum and SQuALITY.
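Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, frequently run over a few thousand words. There is still a lack of corresponding research and powerful tools to understand and process such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer input, we augment the model with sparse attention, which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering the tasks of dialogue summarization, abstractive question answering, and topic segmentation. Experimentally, we show that our pre-trained model DialogLM significantly surpasses the state-of-the-art models across datasets and tasks. Source code and all the pre-trained models are available in our GitHub repository (https://github.com/microsoft/DialogLM).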
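The window-based denoising idea above can be illustrated with a minimal sketch. The abstract does not list the noise types, so the two used here (masking speaker names and permuting turns inside the window) are assumptions chosen for illustration, and `corrupt_window` and the `<speaker>` placeholder are hypothetical names rather than part of the DialogLM code.

```python
import random

def corrupt_window(turns, start, size, rng, speaker_mask="<speaker>"):
    """Corrupt one window of a dialogue and return (noisy_dialogue, target).

    turns: list of (speaker, text) pairs.
    The window turns[start:start + size] has its speakers masked and its
    turn order shuffled (assumed noise types); the rest is left untouched.
    The reconstruction target is the original, uncorrupted window.
    """
    window = turns[start:start + size]
    noisy_window = [(speaker_mask, text) for _, text in window]
    rng.shuffle(noisy_window)
    noisy_dialogue = turns[:start] + noisy_window + turns[start + size:]
    return noisy_dialogue, window

# Example: corrupt turns 1..2 of a four-turn meeting excerpt.
dialogue = [
    ("A", "Shall we start with the budget?"),
    ("B", "Yes, the numbers are in."),
    ("C", "Marketing is over by ten percent."),
    ("A", "Let us revisit that next week."),
]
noisy, target = corrupt_window(dialogue, start=1, size=2, rng=random.Random(0))
```

A sequence-to-sequence model would then be trained to generate `target` from `noisy`, so that it has to use the surrounding conversation to restore speakers and turn order.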
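Query-focused summarization (QFS) requires generating a textual summary for a given query using a set of relevant documents. In practice, however, such relevant documents are not readily available and must first be retrieved from a document collection. Therefore, we show how to extend this task to make it more realistic. The task setup thereby also resembles that of the open-domain question answering task, where the answer is a summary of the top-retrieved documents. To address this extended task, we combine passage retrieval with text generation to produce a summary of the retrieved passages given the input query. We present the first evaluation results on the proposed task and show that a few samples are sufficient to fine-tune a large generative model with retrieved passages.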
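Automatic meeting summarization is becoming increasingly popular these days. The ability to automatically summarize meetings and extract key information could greatly increase the efficiency of our work and life. In this paper, we experiment with different approaches to improve the performance of query-based meeting summarization. We start with HMNet \cite{HMNET}, a hierarchical network that employs both a word-level transformer and a turn-level transformer, as the baseline. We explore the effectiveness of pre-training the model on a large news summarization dataset. We investigate adding the embeddings of queries as part of the input vectors for query-based summarization. Furthermore, we extend the locate-then-summarize approach of QMSum \cite{QMSUM} with an intermediate clustering step. Lastly, we compare the baseline model with BART, a state-of-the-art language model that is effective for summarization. We achieve improved performance by adding query embeddings to the input of the model, by using BART as an alternative language model, and by using clustering methods to extract key information at the utterance level before feeding the text into the summarization model.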
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
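The two noising operations named above, sentence shuffling and span in-filling, can be sketched in a few lines. This is an illustrative approximation rather than the BART implementation: the `<mask>` string, the per-position start probability, and the uniform span-length sampling are all assumptions made for the example.

```python
import random

MASK = "<mask>"  # assumed mask token string, for illustration only

def shuffle_sentences(sentences, rng):
    """Return a copy of the sentences in random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def infill_spans(tokens, start_prob, max_span_len, rng):
    """Replace randomly chosen token spans with a single MASK token each.

    start_prob: probability that a span starts at the current position.
    max_span_len: spans cover between 1 and max_span_len tokens (assumed
    uniform sampling). Each span collapses to one MASK, so the model must
    also infer how many tokens are missing.
    """
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < start_prob:
            out.append(MASK)
            i += rng.randint(1, max_span_len)  # skip the whole span
        else:
            out.append(tokens[i])
            i += 1
    return out

rng = random.Random(0)
sentences = ["the cat sat .", "it was warm .", "then it slept ."]
noisy_order = shuffle_sentences(sentences, rng)
noisy_tokens = infill_spans(" ".join(noisy_order).split(), 0.2, 3, rng)
```

The training pair is then (`noisy_tokens`, original token sequence): the encoder reads the corrupted text and the decoder reconstructs the original.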
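The task of information retrieval is an important component of many natural language processing systems, such as open-domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks have recently obtained competitive results. A challenge of using such methods is obtaining supervised data to train the retriever model, in the form of pairs of queries and support documents. In this paper, we propose a technique for learning retriever models that is inspired by knowledge distillation and does not require annotated pairs of queries and documents. Our approach leverages the attention scores of a reader model, which is used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.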
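One way to read the distillation step above is as matching the retriever's score distribution over passages to a target derived from the reader's attention. The sketch below assumes that the reader's attention is averaged per passage and that the training signal is a KL divergence; both choices, and all function names, are assumptions for illustration and are not taken from the abstract.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_targets(attn_per_passage):
    """Turn reader attention into synthetic labels for the retriever.

    attn_per_passage: one list of attention weights per retrieved passage.
    Each passage is scored by its mean attention (assumed aggregation),
    and the scores are normalized into a probability distribution.
    """
    means = [sum(a) / len(a) for a in attn_per_passage]
    total = sum(means)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    return [m / total for m in means]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# The reader attended mostly to passage 0; the retriever scored both equally.
target = attention_targets([[0.2, 0.4], [0.1, 0.1]])
loss = kl_divergence(target, softmax([0.0, 0.0]))
```

Minimizing `loss` with respect to the retriever's scores pushes the retriever to rank highly the passages the reader found useful, without any annotated query-document pairs.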
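While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate which model architectural changes and pretraining paradigms can most efficiently adapt a pretrained Transformer for long input summarization. We find that a staggered, block-local Transformer with global encoder tokens strikes a good balance of performance and efficiency, and that an additional pretraining phase on long sequences meaningfully improves downstream summarization performance. Based on our findings, we introduce PEGASUS-X, an extension of the PEGASUS model with additional long input pretraining to handle inputs of up to 16K tokens. PEGASUS-X achieves strong performance on long input summarization tasks, comparable with much larger models, while adding few additional parameters and not requiring model parallelism to train.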
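To make the staggered, block-local idea concrete, the sketch below builds attention masks in which each token attends only within its block, and the block boundaries are shifted by an offset in alternating layers. The offset of half a block and the helper names are assumptions for illustration; global encoder tokens are omitted to keep the example small.

```python
def block_ids(seq_len, block_size, offset=0):
    """Assign each position to a block; offset shifts the block boundaries."""
    return [(i + offset) // block_size for i in range(seq_len)]

def block_local_mask(seq_len, block_size, offset=0):
    """mask[i][j] == 1 if position i may attend to position j."""
    ids = block_ids(seq_len, block_size, offset)
    return [[1 if ids[i] == ids[j] else 0 for j in range(seq_len)]
            for i in range(seq_len)]

def staggered_masks(seq_len, block_size, num_layers):
    """Alternate unshifted and half-block-shifted masks across layers."""
    return [block_local_mask(seq_len, block_size,
                             offset=(block_size // 2) * (layer % 2))
            for layer in range(num_layers)]

masks = staggered_masks(seq_len=4, block_size=2, num_layers=2)
# Layer 0 blocks: {0, 1}, {2, 3}; layer 1 blocks: {0}, {1, 2}, {3}.
# Positions 1 and 2 cannot see each other in layer 0 but can in layer 1.
```

Staggering lets information cross block boundaries over successive layers while each layer's cost stays linear in sequence length for a fixed block size.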
This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
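The role of the self-attention masks described above can be shown with a small sketch that builds one mask per objective for a sequence made of a source segment followed by a target segment. The function name and the 0/1 list-of-lists representation are assumptions for illustration, and only the left-to-right variant of the unidirectional objective is shown.

```python
def attention_mask(src_len, tgt_len, mode):
    """Build a mask where mask[i][j] == 1 means token i may attend to token j.

    bidirectional:  every token sees every token.
    unidirectional: every token sees itself and tokens to its left.
    seq2seq:        source tokens see the whole source; target tokens see
                    the whole source plus target tokens up to themselves.
    """
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "bidirectional":
                allowed = True
            elif mode == "unidirectional":
                allowed = j <= i
            elif mode == "seq2seq":
                allowed = j < src_len if i < src_len else j <= i
            else:
                raise ValueError(f"unknown mode: {mode}")
            mask[i][j] = 1 if allowed else 0
    return mask

m = attention_mask(src_len=2, tgt_len=2, mode="seq2seq")
# m == [[1, 1, 0, 0],
#       [1, 1, 0, 0],
#       [1, 1, 1, 0],
#       [1, 1, 1, 1]]
```

Because only the mask changes between objectives, the same Transformer parameters can be shared across all three pre-training tasks.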
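In this paper, we propose to leverage a unique characteristic of dialogues, the commonsense knowledge shared across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that relies solely on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ uses commonsense as supervision, adding the task of generating commonsense inferences to dialogue summarization in a multi-task learning setting. Experimental results show that, with injected commonsense knowledge, our framework generates more informative and consistent summaries than existing methods.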
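The limited size of existing query-focused summarization datasets makes training data-driven summarization models challenging. Meanwhile, the manual construction of a query-focused summarization corpus is costly and time-consuming. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named WikiRef) of more than 280,000 examples, which can serve as a means of data augmentation. We also develop a BERT-based query-focused summarization model (Q-BERT) to extract sentences from the documents as summaries. To better adapt a huge model containing millions of parameters, we identify and fine-tune only a sparse subnetwork, which corresponds to a small fraction of the whole model's parameters. Experimental results on three DUC benchmarks show that the model pre-trained on WikiRef already achieves reasonable performance. After fine-tuning on the specific benchmark datasets, the model with data augmentation outperforms strong comparison systems. Moreover, both our proposed Q-BERT model and subnetwork fine-tuning further improve model performance. The dataset is publicly available at https://aka.ms/wikiref.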
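Factual consistency is an essential quality of text summarization models in practical settings. Existing work on evaluating this dimension can be broadly categorized into two lines of research: entailment-based metrics and question answering (QA)-based metrics. However, the differing experimental setups presented in recent work lead to contrasting conclusions as to which paradigm performs best. In this work, we conduct an extensive comparison of entailment-based and QA-based metrics, demonstrating that carefully choosing the components of a QA-based metric is critical to performance. Building on those insights, we propose an optimized metric, which we call QAFactEval, that leads to an average improvement over previous QA-based metrics on the SummaC factual consistency benchmark. Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance on this benchmark. Furthermore, we find that QA-based and entailment-based metrics offer complementary signals, and we combine the two into a single learned metric for a further performance boost. Through qualitative and quantitative analyses, we point to question generation and answerability classification as two critical components for future work on QA-based metrics.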
Text summarization is a user-preference based task, i.e., for one document, users often have different priorities for summary. As a key aspect of customization in summarization, granularity is used to measure the semantic coverage between the summary and source document. However, developing systems that can generate summaries with customizable semantic coverage is still an under-explored topic. In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum. We take events as the basic semantic units of the source documents and propose to rank these events by their salience. We also develop a model to summarize input documents with given events as anchors and hints. By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner. Meanwhile, we annotate a new benchmark GranuDUC that contains multiple summaries at different granularities for each document cluster. Experimental results confirm the substantial superiority of GranuSum on multi-granularity summarization over strong baselines. Further, by exploiting the event information, GranuSum also exhibits state-of-the-art performance under the conventional unsupervised abstractive setting. Dataset for this paper can be found at: https://github.com/maszhongming/GranuDUC