Bidirectional Encoder Representations from Transformers (BERT; Devlin et al. 2019) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several intersentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves stateof-the-art results across the board in both extractive and abstractive settings. 1
translated by 谷歌翻译
We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question "What is the article about?". We collect a real-world, large scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures longrange dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans. 1
translated by 谷歌翻译
对比学习模型在无监督的视觉表示学习中取得了巨大成功,这使得相同图像的不同视图的特征表示之间的相似性最大化,同时最小化不同图像的视图的特征表示之间的相似性。在文本摘要中,输出摘要是输入文档的较短形式,它们具有类似的含义。在本文中,我们提出了对监督抽象文本摘要的对比学习模型,在那里我们查看文档,它的金摘要及其模型生成的摘要,与相同的平均表示的不同视图,并在培训期间最大化它们之间的相似性。我们在三个不同的摘要数据集上改进了一个强序列到序列文本生成模型(即,BART)。人类评估还表明,与其对应物相比,我们的模型达到了更好的忠实性评级,没有对比的目标。
translated by 谷歌翻译
查询聚焦的文本摘要(QFTS)任务旨在构建基于给定查询的文本文档摘要的构建系统。解决此任务的关键挑战是缺乏培训摘要模型的大量标记数据。在本文中,我们通过探索一系列域适应技术来解决这一挑战。鉴于最近在广泛的自然语言处理任务中进行预先接受的变压器模型的成功,我们利用此类模型为单文档和多文件方案的QFTS任务产生抽象摘要。对于域适应,我们使用预先训练的变压器的摘要模型应用了各种技术,包括转移学习,弱监督学习和远程监督。六个数据集的广泛实验表明,我们所提出的方法非常有效地为QFTS任务产生抽象摘要,同时在一组自动和人类评估指标上设置新的最先进的结果。
translated by 谷歌翻译
诸如学术文章和商业报告之类的长期文件一直是详细说明重要问题和需要额外关注的复杂主题的标准格式。自动汇总系统可以有效地将长文档置于简短而简洁的文本中,以封装最重要的信息,从而在帮助读者的理解中很重要。最近,随着神经体系结构的出现,已经做出了重大的研究工作,以推动自动文本摘要系统,以及有关将这些系统扩展到长期文档领域的挑战的大量研究。在这项调查中,我们提供了有关长期文档摘要的研究的全面概述,以及其研究环境的三个主要组成部分的系统评估:基准数据集,汇总模型和评估指标。对于每个组成部分,我们在长期汇总的背景下组织文献,并进行经验分析,以扩大有关当前研究进度的观点。实证分析包括一项研究基准数据集的内在特征,摘要模型的多维分析以及摘要评估指标的综述。根据总体发现,我们通过提出可能在这个快速增长的领域中提出未来探索的方向来得出结论。
translated by 谷歌翻译
This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-ofthe-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm. * Equal contribution. † Contact person.
translated by 谷歌翻译
现有以查询为中心的摘要数据集的大小有限,使培训数据驱动的摘要模型提出了挑战。同时,以查询为重点的摘要语料库的手动构造昂贵且耗时。在本文中,我们使用Wikipedia自动收集超过280,000个示例的大型以查询为中心的摘要数据集(名为Wikiref),这可以用作数据增强的手段。我们还开发了一个基于BERT的以查询为重点的摘要模型(Q-bert),以从文档中提取句子作为摘要。为了更好地调整包含数百万个参数的巨大模型,我们仅识别和微调一个稀疏的子网络,这对应于整个模型参数的一小部分。三个DUC基准测试的实验结果表明,在Wikiref中预先培训的模型已经达到了合理的性能。在对特定基准数据集进行了微调后,具有数据增强的模型优于强大比较系统。此外,我们提出的Q-Bert模型和子网微调都进一步改善了模型性能。该数据集可在https://aka.ms/wikiref上公开获取。
translated by 谷歌翻译
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a;Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
translated by 谷歌翻译
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by ( 1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Tranformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new stateof-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
translated by 谷歌翻译
与单案摘要相比,抽象性多文件摘要(MDS)对其冗长和链接的来源的表示和覆盖范围提出了挑战。这项研究开发了一个平行的层次变压器(PHT),具有MDS的注意对齐。通过合并单词和段落级的多头注意,PHT的层次结构可以更好地处理令牌和文档级别的依赖项。为了指导解码到更好的源文档覆盖范围,然后将注意力调整机制引入以校准光束搜索,并预测的最佳注意力分布。根据Wikisum数据,进行了全面的评估,以测试拟议的体系结构对MD的改进。通过更好地处理内部和跨文档的信息,结果胭脂和人类评估都表明,我们的分层模型以相对较低的计算成本生成较高质量的摘要。
translated by 谷歌翻译
传达相关和忠实信息的能力对于有条件生成的许多任务至关重要,但对于神经SEQ-seq seq模型仍然难以捉摸,这些模型的输出通常显示出幻觉,并且无法正确涵盖重要细节。在这项工作中,我们主张规划作为有用的中间表示,以使有条件的一代减少不透明和扎根。我们的作品提出了将文本计划作为一系列提问(QA)对的新概念化。我们用QA蓝图作为内容选择(即〜说什么)和计划(即〜按什么顺序)来增强现有数据集(例如,用于摘要)。我们通过利用最先进的问题生成技术并将输入输出对自动获取蓝图,并将其转换为输入 - 蓝图输出输出元组。我们开发了基于变压器的模型,每个模型都在它们如何将蓝图合并到生成的输出中(例如,作为全局计划或迭代)。跨指标和数据集的评估表明,蓝图模型比不采取计划并允许对生成输出进行更严格控制的替代方案更为事实。
translated by 谷歌翻译
维基百科是可理解知识的重要自由来源。尽管如此,巴西葡萄牙维基百科仍然缺乏对许多科目的描述。为了扩大巴西维基百科,我们贡献了Plsum,这是一种从多个描述性网站生成类似的Wiki的抽象摘要的框架。该框架具有提取阶段,然后是抽象。特别是,对于抽象阶段,我们微调并比较了变压器神经网络,PTT5和啰覆的最近最近的变化。为了微调和评估模型,我们创建了一个具有数千个示例的数据集,将参考网站链接到维基百科。我们的结果表明,可以从巴西葡萄牙语网上内容生成有意义的抽象摘要。
translated by 谷歌翻译
学术研究是解决以前从未解决过的问题的探索活动。通过这种性质,每个学术研究工作都需要进行文献审查,以区分其Novelties尚未通过事先作品解决。在自然语言处理中,该文献综述通常在“相关工作”部分下进行。鉴于研究文件的其余部分和引用的论文列表,自动相关工作生成的任务旨在自动生成“相关工作”部分。虽然这项任务是在10年前提出的,但直到最近,它被认为是作为科学多文件摘要问题的变种。然而,即使在今天,尚未标准化了自动相关工作和引用文本生成的问题。在这项调查中,我们进行了一个元研究,从问题制定,数据集收集,方法方法,绩效评估和未来前景的角度来比较相关工作的现有文献,以便为读者洞察到国家的进步 - 最内容的研究,以及如何进行未来的研究。我们还调查了我们建议未来工作要考虑整合的相关研究领域。
translated by 谷歌翻译
以查询为中心的摘要(QFS)旨在产生应答感兴趣的特定问题的摘要,从而实现更大的用户控制和个性化。虽然最近发布的数据集如QMSUM或Aquamuse,促进QFS中的研究工作,但该领域缺乏对适用建模方法的广泛空间的全面研究。在本文中,考虑到两种普遍的方法,我们对QFS进行了系统探索,探讨了QFS:两阶段的采掘解决方案和端到端模型。在这些类别中,我们调查现有方法,并呈现了在QMSUM数据集上实现最先进的性能的两个模型扩展,其边缘高达3.38 Rouge-1,3.72 Rouge-2和3.28 Rouge-L。通过定量实验,我们突出了不同模型配置之间的权衡,并探讨了摘要任务之间的转移能力。代码和检查点公开可用:https://github.com/salesforce/query-focused-sum。
translated by 谷歌翻译
文本摘要的重写方法结合了提取性和抽象的方法,使用抽象模型提高了提取性摘要的简洁性和可读性。退出重写系统将每个提取性句子作为唯一的输入,它相对集中,但可能会失去必要的背景知识和话语上下文。在本文中,我们调查了上下文化的重写,该重写消耗了整个文档并考虑了摘要上下文。我们将上下文重写正式化为具有组标签对齐的SEQ2SEQ,将组标签引入了模拟对齐方式的解决方案,并通过基于内容的地址来识别提取句子。结果表明,我们的方法显着优于非上下文重写系统,而无需加强学习,从而在多个提取器上实现了胭脂分数的强烈改进。
translated by 谷歌翻译
Modern multi-document summarization (MDS) methods are based on transformer architectures. They generate state of the art summaries, but lack explainability. We focus on graph-based transformer models for MDS as they gained recent popularity. We aim to improve the explainability of the graph-based MDS by analyzing their attention weights. In a graph-based MDS such as GraphSum, vertices represent the textual units, while the edges form some similarity graph over the units. We compare GraphSum's performance utilizing different textual units, i. e., sentences versus paragraphs, on two news benchmark datasets, namely WikiSum and MultiNews. Our experiments show that paragraph-level representations provide the best summarization performance. Thus, we subsequently focus oAnalysisn analyzing the paragraph-level attention weights of GraphSum's multi-heads and decoding layers in order to improve the explainability of a transformer-based MDS model. As a reference metric, we calculate the ROUGE scores between the input paragraphs and each sentence in the generated summary, which indicate source origin information via text similarity. We observe a high correlation between the attention weights and this reference metric, especially on the the later decoding layers of the transformer architecture. Finally, we investigate if the generated summaries follow a pattern of positional bias by extracting which paragraph provided the most information for each generated summary. Our results show that there is a high correlation between the position in the summary and the source origin.
translated by 谷歌翻译
多文件摘要(MDS)是信息聚合的有效工具,它从与主题相关文档集群生成信息和简洁的摘要。我们的调查是,首先,系统地概述了最近的基于深度学习的MDS模型。我们提出了一种新的分类学,总结神经网络的设计策略,并进行全面的最先进的概要。我们突出了在现有文献中很少讨论的各种客观函数之间的差异。最后,我们提出了与这个新的和令人兴奋的领域有关的几个方向。
translated by 谷歌翻译
Though many algorithms can be used to automatically summarize legal case decisions, most fail to incorporate domain knowledge about how important sentences in a legal decision relate to a representation of its document structure. For example, analysis of a legal case summarization dataset demonstrates that sentences serving different types of argumentative roles in the decision appear in different sections of the document. In this work, we propose an unsupervised graph-based ranking model that uses a reweighting algorithm to exploit properties of the document structure of legal case decisions. We also explore the impact of using different methods to compute the document structure. Results on the Canadian Legal Case Law dataset show that our proposed method outperforms several strong baselines.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
Current abstractive summarization systems present important weaknesses which prevent their deployment in real-world applications, such as the omission of relevant information and the generation of factual inconsistencies (also known as hallucinations). At the same time, automatic evaluation metrics such as CTC scores have been recently proposed that exhibit a higher correlation with human judgments than traditional lexical-overlap metrics such as ROUGE. In this work, we intend to close the loop by leveraging the recent advances in summarization metrics to create quality-aware abstractive summarizers. Namely, we propose an energy-based model that learns to re-rank summaries according to one or a combination of these metrics. We experiment using several metrics to train our energy-based re-ranker and show that it consistently improves the scores achieved by the predicted summaries. Nonetheless, human evaluation results show that the re-ranking approach should be used with care for highly abstractive summaries, as the available metrics are not yet sufficiently reliable for this purpose.
translated by 谷歌翻译