The diverse demands of different summarization tasks and their high annotation costs are driving a need for few-shot summarization. However, despite the emergence of many summarization tasks and datasets, the current training paradigm for few-shot summarization systems ignores potentially shareable knowledge in heterogeneous datasets. To this end, we propose \textsc{UniSumm}, a unified few-shot summarization model that is pre-trained on multiple summarization tasks and can be prefix-tuned to excel at any few-shot summarization dataset. Meanwhile, to better evaluate few-shot summarization systems under the principles of diversity and robustness, we assemble and publish a new benchmark, \textsc{SummZoo}. It consists of $8$ diverse summarization tasks with multiple sets of few-shot samples for each task, covering both monologue and dialogue domains. Experimental results and ablation studies show that \textsc{UniSumm} outperforms strong baseline systems by a large margin across all tasks in \textsc{SummZoo} under both automatic and human evaluations. We release our code and benchmark at \url{https://github.com/microsoft/UniSumm}.
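Pre-trained language models (PLMs) have achieved remarkable success in natural language generation (NLG) tasks. Up to now, most PLMs have been pre-trained in an unsupervised manner on large-scale general corpora. Meanwhile, an increasing number of models pre-trained with labeled data (i.e., supervised pre-training) show superior performance compared to unsupervised pre-trained models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. To pre-train the text generation model MVP, we collect a labeled pre-training corpus from 45 datasets covering seven generation tasks. For each task, we further pre-train task-specific soft prompts to stimulate the model's capacity to perform that task. Extensive experiments demonstrate the effectiveness of our supervised pre-training on a number of NLG tasks, and our general method achieves state-of-the-art performance on 12 of 17 datasets.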
Query-focused summarization is considered an important extension of text summarization. It aims to generate a concise highlight for a given query. Unlike generic text summarization, query-focused summarization has long been plagued by a lack of high-quality, large-scale datasets. In this paper, we investigate whether we can integrate and transfer the knowledge of text summarization and question answering to assist few-shot learning in query-focused summarization. To this end, we propose prefix-merging, a prefix-based pretraining strategy for few-shot learning in query-focused summarization. Drawing inspiration from prefix-tuning, we integrate the task knowledge from text summarization and question answering into a properly designed prefix and apply the merged prefix to query-focused summarization. With only a small number of trainable parameters, prefix-merging outperforms fine-tuning on query-focused summarization. We further discuss the influence of different prefix designs and provide a visual explanation of how prefix-merging works.
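To make the idea of a merged prefix concrete, here is a minimal sketch. It assumes, for illustration only, that each prefix is a list of trainable vectors (one per virtual token, simplified to a single layer) and that merging means concatenating a shared prefix with the task-specific prefixes along the length dimension; `merge_prefixes` and `prepend_prefix` are hypothetical helpers, and the actual prefix design in the paper may differ.

```python
from typing import Dict, List

Vector = List[float]
Prefix = List[Vector]  # one vector per virtual prefix token (single-layer simplification)


def merge_prefixes(shared: Prefix, task_prefixes: Dict[str, Prefix], tasks: List[str]) -> Prefix:
    """Hypothetical merge: the shared prefix followed by each requested task-specific prefix."""
    merged: Prefix = [list(v) for v in shared]
    for task in tasks:
        merged.extend(list(v) for v in task_prefixes[task])
    return merged


def prepend_prefix(prefix: Prefix, token_states: List[Vector]) -> List[Vector]:
    """Attention sees the prefix positions first, followed by the real input tokens."""
    return [list(v) for v in prefix] + [list(v) for v in token_states]


# Toy setup: hidden size 4, a 2-token shared prefix and 3-token prefixes per source task.
d = 4
shared = [[0.1] * d for _ in range(2)]
task_prefixes = {
    "summarization": [[0.2] * d for _ in range(3)],
    "qa": [[0.3] * d for _ in range(3)],
}
# Query-focused summarization reuses the knowledge stored in both source-task prefixes.
qfs_prefix = merge_prefixes(shared, task_prefixes, ["summarization", "qa"])
inputs = [[0.0] * d for _ in range(5)]
print(len(qfs_prefix), len(prepend_prefix(qfs_prefix, inputs)))  # 8 13
```

In prefix-tuning generally, only the prefix vectors are trained while the language model stays frozen, which is why the number of trainable parameters stays small.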
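Current efficient fine-tuning methods (e.g., adapters, prefix-tuning, etc.) optimize conditional text generation by training a small set of extra parameters of a neural language model while freezing the rest for efficiency. While they show strong performance on some generation tasks, they do not generalize across all generation tasks. In this work, we show that prompt-based conditional text generation can be improved with simple and effective methods that model the discourse structure of human-written text. We introduce two key design choices. First, we show that the higher-level discourse structure of human-written text can be modeled with \textit{hierarchical blocking} on prefix parameters, which enables spanning different parts of the input and output text and yields more coherent output generations. Second, we propose sparse prefix-tuning, which introduces \textit{attention sparsity} on the prefix parameters at different layers of the network and learns sparse transformations over the softmax function. We find that sparse attention enables prefix-tuning to better control the input contents (salient facts), yielding more efficient tuning of the prefix parameters. Experiments on a wide variety of text generation tasks show that a structured design of prefix parameters can achieve results comparable to fine-tuning all parameters, while outperforming standard prefix-tuning on all generation tasks, even in low-resource settings.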
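The abstract only states that a sparse transformation is learned over the softmax. As a swapped-in illustration, not necessarily the authors' exact transformation, the well-known sparsemax function (Martins & Astudillo, 2016) shows how attention over prefix positions can become exactly sparse: unlike softmax, it assigns zero weight to low-scoring positions. The toy scores below are made up.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Standard softmax: every position receives a strictly positive weight."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z: List[float]) -> List[float]:
    """Sparsemax: Euclidean projection of z onto the probability simplex.

    Positions whose score falls below the threshold tau receive exactly zero weight.
    """
    z_sorted = sorted(z, reverse=True)
    running, support_sum, k_z = 0.0, 0.0, 0
    for k, value in enumerate(z_sorted, start=1):
        running += value
        if 1 + k * value > running:  # the k-th largest score is still inside the support
            k_z, support_sum = k, running
    tau = (support_sum - 1) / k_z
    return [max(v - tau, 0.0) for v in z]


scores = [1.0, 0.8, 0.1, -1.0]  # toy attention scores over four prefix positions
print([round(p, 3) for p in softmax(scores)])    # all four weights are non-zero
print([round(p, 3) for p in sparsemax(scores)])  # the last two weights are exactly zero
```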
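In this paper, we propose to leverage a unique characteristic of dialogues, namely the commonsense knowledge shared across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that relies solely on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ uses commonsense as supervision, adding the task of generating commonsense inferences to dialogue summarization in a multi-task learning setting. Experimental results show that by injecting commonsense knowledge, our framework generates more informative and consistent summaries than existing methods.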
Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing methods for summarizing dialogues are suboptimal because they do not take into account the inherent structure of dialogue and rely heavily on labeled data, which can lead to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one is produced by a fine-tuned summarization model, and the other is a collection of dialogue turns that convey important information. We then choose one of these pseudo summaries based on the difference in information distribution across different types of dialogues. This selected pseudo summary serves as the objective for pre-training DIONYSUS using a self-supervised approach on a large dialogue corpus. Our experiments show that DIONYSUS outperforms existing methods on six datasets, as demonstrated by its ROUGE scores in zero-shot and few-shot settings.
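The abstract does not specify how the turns "that convey important information" are chosen. A minimal sketch, assuming a PEGASUS-style greedy heuristic: repeatedly add the turn that most increases a ROUGE-1-style unigram F1 between the selected turns and the rest of the dialogue, stopping when no turn helps. `select_principal_turns` and its stopping rule are hypothetical, and the dialogue is a toy example.

```python
from collections import Counter
from typing import List


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-1-style F1 between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def select_principal_turns(turns: List[str], budget: int) -> List[int]:
    """Greedily pick up to `budget` turn indices that best cover the remaining turns."""
    tokenized = [t.lower().split() for t in turns]
    selected: List[int] = []
    best_score = 0.0
    while len(selected) < budget:
        best_idx = None
        for i in range(len(turns)):
            if i in selected:
                continue
            chosen = set(selected + [i])
            summary = [tok for j in sorted(chosen) for tok in tokenized[j]]
            rest = [tok for j in range(len(turns)) if j not in chosen for tok in tokenized[j]]
            score = unigram_f1(summary, rest) if rest else 0.0
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining turn improves the score
            break
        selected.append(best_idx)
    return sorted(selected)


dialogue = [
    "Amy: Can we move the meeting to Friday?",
    "Ben: Sure, Friday works.",
    "Amy: Great, Friday at noon then.",
    "Ben: ok",
]
print(select_principal_turns(dialogue, budget=1))
```

The selected turns, joined in dialogue order, would then serve as one of the two pseudo-summary candidates.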
Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of designated annotations of controlled summaries, existing works have to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling single attributes individually (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control is still challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
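Recent advances in abstractive summarization leverage pre-trained language models rather than training a model from scratch. However, such models are slow to train and carry a large overhead. Researchers have proposed a few lightweight alternatives, such as small adapters, to mitigate these drawbacks. Nonetheless, it remains uncertain whether using adapters benefits the task of summarization in terms of improved efficiency without an unpleasant sacrifice in performance. In this work, we carry out multifaceted investigations of fine-tuning and adapters on summarization tasks of varying complexity: language, domain, and task transfer. In our experiments, fine-tuning a pre-trained language model generally attains better performance than using adapters, and the performance gap correlates positively with the amount of training data used. Notably, adapters outperform fine-tuning under extremely low-resource conditions. We further provide insights on multilinguality, model convergence, and robustness, hoping to shed light on the practical choice between fine-tuning and adapters in abstractive summarization.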
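The Query-Focused Text Summarization (QFTS) task aims at building systems that generate a summary of one or more text documents based on a given query. A key challenge in addressing this task is the lack of large labeled datasets for training summarization models. In this paper, we address this challenge by exploring a series of domain adaptation techniques. Given the recent success of pre-trained transformer models across a wide range of natural language processing tasks, we use such models to generate abstractive summaries for the QFTS task in both single-document and multi-document scenarios. For domain adaptation, we apply a variety of techniques with pre-trained transformer-based summarization models, including transfer learning, weakly supervised learning, and distant supervision. Extensive experiments on six datasets show that our proposed approach is very effective at generating abstractive summaries for the QFTS task, while setting new state-of-the-art results on a set of automatic and human evaluation metrics.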
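Long documents such as academic articles and business reports have been the standard format for detailing important issues and complex subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts encapsulating the most important information would therefore be valuable in aiding readers' comprehension. Recently, with the advent of neural architectures, significant research effort has gone into advancing automatic text summarization systems, along with numerous studies on the challenges of extending these systems to the long-document domain. In this survey, we provide a comprehensive overview of research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature in the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.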
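Cross-lingual summarization is the task of generating a summary in one language (e.g., English) for a given document in a different language (e.g., Chinese). In the context of globalization, this task has attracted increasing attention from the computational linguistics community. Nevertheless, a comprehensive review of this task is still lacking. Therefore, we present the first systematic critical review of the datasets, approaches, and challenges in this field. Specifically, we carefully organize existing datasets and approaches according to their construction methods and solution paradigms, respectively. For each type of dataset or approach, we thoroughly introduce and summarize previous efforts and compare them with each other to provide deeper analyses. Finally, we discuss promising directions and offer our thoughts to facilitate future research. This survey is intended for both beginners and experts in cross-lingual summarization, and we hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in this area.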
Aspect- or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, current datasets for aspect- or query-based summarization either focus on specific domains, contain relatively few instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale, open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability to support diverse aspect-based summary generation. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.
A primary objective of news articles is to establish the factual record for an event, frequently achieved by conveying both the details of the specified event (i.e., the 5 Ws; Who, What, Where, When and Why regarding the event) and how people reacted to it (i.e., reported statements). However, existing work on news summarization almost exclusively focuses on the event details. In this work, we propose the novel task of summarizing the reactions of different speakers, as expressed by their reported statements, to a given event. To this end, we create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements from various public figures obtained from 633 news articles discussing 132 events. We propose an automatic silver training data generation approach for our task, which helps smaller models like BART achieve GPT-3 level performance on this task. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show to generate summaries that are more abstractive and factual than baseline query-focused summarization approaches.
Narrative summarization aims to produce a distilled version of a narrative to describe its most salient events and characters. Summarizing a narrative is challenging as it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, which are collected from plot descriptions of movies and TV episodes with diverse genres, and their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and the state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.
With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. Similarly to the supervised setup, we notice a very high variance in quality among summary candidates from these models whereas only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the pre-trained unsupervised PEGASUS by 4.37% to 7.27% relative mean ROUGE across four widely-adopted summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73%) averaged over 30 transfer setups.
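The abstract does not detail the re-ranking score. As one simple unsupervised heuristic (an assumption for illustration, not the paper's exact recipe), each candidate can be scored by its lexical overlap with the source plus its average agreement with the other candidates, and the top-ranked candidate kept. `rerank` and the weight `alpha` are hypothetical; the strings are toy examples.

```python
from collections import Counter
from typing import List


def unigram_f1(a: List[str], b: List[str]) -> float:
    """ROUGE-1-style F1 between two token lists."""
    overlap = sum((Counter(a) & Counter(b)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(a), overlap / len(b)
    return 2 * p * r / (p + r)


def rerank(source: str, candidates: List[str], alpha: float = 0.5) -> List[int]:
    """Return candidate indices from best to worst under an unsupervised score.

    score = alpha * overlap with the source + (1 - alpha) * mean overlap with the other candidates
    """
    src = source.lower().split()
    toks = [c.lower().split() for c in candidates]
    scores = []
    for i, cand in enumerate(toks):
        others = [unigram_f1(cand, o) for j, o in enumerate(toks) if j != i]
        consensus = sum(others) / len(others) if others else 0.0
        scores.append(alpha * unigram_f1(cand, src) + (1 - alpha) * consensus)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


source = "the cat sat on the mat"
candidates = ["the cat sat on the mat", "dogs run fast", "the cat sat"]
print(rerank(source, candidates))  # [0, 2, 1]
```

The consensus term rewards candidates that agree with the rest of the beam, which is one common way to exploit the high variance among candidates without any labels.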
Text summarization is a user-preference-based task: for the same document, different users often have different priorities for the summary. As a key aspect of customization in summarization, granularity is used to measure the semantic coverage between the summary and the source document. However, developing systems that can generate summaries with customizable semantic coverage is still an under-explored topic. In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum. We take events as the basic semantic units of the source documents and propose to rank these events by their salience. We also develop a model to summarize input documents with given events as anchors and hints. By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner. Meanwhile, we annotate a new benchmark, GranuDUC, that contains multiple summaries at different granularities for each document cluster. Experimental results confirm the substantial superiority of GranuSum on multi-granularity summarization over strong baselines. Further, by exploiting the event information, GranuSum also exhibits state-of-the-art performance under the conventional unsupervised abstractive setting. The dataset for this paper can be found at: https://github.com/maszhongming/GranuDUC
本文介绍了Z-Code ++,这是一种针对抽象文本摘要优化的新的预训练的语言模型。该模型使用三种技术扩展了艺术编码器模型的状态。首先,我们使用两阶段的预训练过程来改善模型在低资源摘要任务上的性能。该模型首先是使用文本语料库进行语言理解的预先培训的,然后在汇总语料库中不断预先培训,以进行基础文本生成。其次,我们用分离的注意力层代替编码器中的自我发项层,其中每个单词都使用两个向量分别代表其内容和位置。第三,我们使用融合编码器,这是一种以层次方式编码长序列的简单而有效的方法。 Z-Code ++在13个文本摘要任务中的9个跨5种语言中创建了新的艺术状态。我们的模型的参数有效,因为它的表现优于XSUM上600倍较大的Palm-540b,并且在Samsum上的易经的200倍GPT3-175B较大。在零射击和少量设置中,我们的模型大大优于竞争模型。
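Knowledge-intensive language tasks (KILT) usually require a large body of information to provide correct answers. A popular paradigm for this problem is to combine a search system with a machine reader, where the former retrieves supporting evidence and the latter examines it to produce answers. Recently, the reader component has seen significant advances with the help of large-scale pre-trained generative models. Meanwhile, most existing solutions for the search component rely on the traditional "index-retrieve-then-rank" pipeline, which suffers from a large memory footprint and is difficult to optimize end to end. Inspired by recent efforts to build model-based IR models, we propose to replace the traditional multi-step search pipeline with a novel single-step generative model, which can dramatically simplify the search process and be optimized in an end-to-end manner. We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and can be adopted to improve a variety of downstream KILT tasks through further fine-tuning. We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index. Empirical results show that CorpusBrain significantly outperforms strong baselines on the retrieval task of the KILT benchmark and establishes new state-of-the-art downstream performance. We also show that CorpusBrain works well in zero-resource and low-resource settings.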
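A generative retriever must emit identifiers that actually exist in the corpus. One common way to guarantee this in generative retrieval (used, for example, by GENRE) is constrained decoding over a prefix tree of document identifiers; whether CorpusBrain uses exactly this mechanism is an assumption here. In the minimal sketch below, `IdentifierTrie`, `constrained_greedy_decode`, the `</s>` end marker, and the toy scoring function (a stand-in for model logits) are all hypothetical.

```python
from typing import Callable, Dict, List

EOS = "</s>"  # assumed end-of-identifier token


class IdentifierTrie:
    """Prefix tree over tokenized document identifiers such as page titles."""

    def __init__(self) -> None:
        self.children: Dict[str, "IdentifierTrie"] = {}
        self.is_end = False

    def insert(self, tokens: List[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, IdentifierTrie())
        node.is_end = True

    def allowed_next(self, prefix: List[str]) -> List[str]:
        """Tokens that keep `prefix` on a path to at least one valid identifier."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []
            node = node.children[tok]
        allowed = sorted(node.children)
        if node.is_end:
            allowed.append(EOS)  # the prefix is already a complete identifier
        return allowed


def constrained_greedy_decode(
    trie: IdentifierTrie, score: Callable[[List[str], str], float], max_len: int = 16
) -> List[str]:
    """Greedy decoding that can only produce identifiers stored in the trie."""
    prefix: List[str] = []
    for _ in range(max_len):
        allowed = trie.allowed_next(prefix)
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == EOS:
            break
        prefix.append(best)
    return prefix


trie = IdentifierTrie()
for title in ["Barack Obama", "Barack Obama Sr.", "Michelle Obama"]:
    trie.insert(title.split())

query = set("who is michelle obama".split())


def toy_score(prefix: List[str], tok: str) -> float:
    """Toy stand-in for a model's next-token score: reward tokens that appear in the query."""
    return 1.0 if tok.lower() in query else 0.0


print(constrained_greedy_decode(trie, toy_score))  # ['Michelle', 'Obama']
```

A real system would use beam search over model log-probabilities instead of this greedy toy scorer, but the trie constraint works the same way.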
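Output length is critical to dialogue summarization systems. Dialogue summary length is determined by multiple factors, including dialogue complexity, summary objective, and personal preferences. In this work, we approach dialogue summary length from three perspectives. First, we analyze the length differences between existing models' outputs and the corresponding human references, and find that summarization models tend to produce more verbose summaries due to their pre-training objectives. Second, we identify salient features for summary length prediction by comparing different model settings. Third, we experiment with a length-aware summarizer and show notable improvements over existing models when summary length is well incorporated. Analyses and experiments are conducted on the popular DialogSum and SAMSum datasets to validate our findings.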
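The first analysis compares system output lengths with reference lengths. A minimal sketch of that comparison, assuming whitespace word counts as the length measure (the paper may count tokens differently); `length_gap` is a hypothetical helper and the strings are toy examples, not data from DialogSum or SAMSum.

```python
from statistics import mean
from typing import List


def length_gap(outputs: List[str], references: List[str]) -> float:
    """Mean of (output length - reference length) in whitespace-separated words.

    A positive value means system summaries are, on average, longer than the references.
    """
    if len(outputs) != len(references) or not outputs:
        raise ValueError("need the same non-zero number of outputs and references")
    return mean(len(o.split()) - len(r.split()) for o, r in zip(outputs, references))


model_outputs = [
    "Amy and Ben agree to move their weekly meeting to Friday at noon.",
    "Tom will bring the slides.",
]
human_refs = ["The meeting moves to Friday noon.", "Tom brings slides."]
print(length_gap(model_outputs, human_refs))  # 4.5
```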
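The recent success of zero- and few-shot prompting with models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how zero-shot GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, but these summaries also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold-standard test sets. Our experiments show that both reference-based and reference-free automatic metrics, such as recently proposed QA- or entailment-based factuality approaches, cannot reliably evaluate zero-shot summaries. Finally, we discuss future research challenges beyond generic summarization, specifically keyword- and aspect-based summarization, showing how dominant fine-tuning approaches compare to zero-shot prompting. To support further research, we release: (a) a corpus of 10K generated summaries from fine-tuned and zero-shot models across 4 standard summarization benchmarks, and (b) 1K human preference judgments and rationales comparing different systems for generic and keyword-based summarization.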