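Many classical fairy tales, novels, and screenplays use dialogue to advance the plot and establish characters. We present the first study to explore whether machines can understand and generate dialogue in stories, which requires capturing the traits of different characters and the relationships between them. To this end, we propose two new tasks, masked dialogue generation and dialogue speaker recognition, i.e., generating dialogue turns and predicting the speakers of specified dialogue turns, respectively. We build a new dataset, DialStory, which consists of 105K Chinese stories containing a large amount of dialogue, to support the evaluation. We show the difficulty of the proposed tasks by testing existing models on DialStory with automatic and manual evaluation. Furthermore, we propose to learn explicit character representations to improve performance on these tasks. Extensive experiments and case studies show that our approach generates more coherent and informative dialogue and achieves higher speaker recognition accuracy than strong baselines.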
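As a rough illustration of the masked dialogue generation task, the sketch below builds one training example by hiding a single dialogue turn in a story. It is not the authors' data pipeline: the helper `mask_turn`, the `[MASK]` token, and the assumption that turns are delimited by full-width quotation marks are all hypothetical choices made for this example.

```python
import re
from typing import Tuple

# Assumption: a dialogue turn is any span enclosed in full-width quotes “…”.
QUOTE = re.compile(r"“[^”]*”")


def mask_turn(story: str, index: int, mask_token: str = "[MASK]") -> Tuple[str, str]:
    """Replace the index-th quoted turn with mask_token.

    Returns (masked_story, target_turn), where target_turn is the hidden
    text without its surrounding quotation marks.
    """
    turns = list(QUOTE.finditer(story))
    if not 0 <= index < len(turns):
        raise IndexError("no dialogue turn at that index")
    match = turns[index]
    masked = story[:match.start()] + "“" + mask_token + "”" + story[match.end():]
    target = match.group()[1:-1]  # strip the opening and closing quotes
    return masked, target
```

A model for this task would receive the masked story as input and be trained to produce the hidden turn as output.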
We have a Christmas gift for Harry Potter fans all over the world. In this paper, we present Harry Potter Dialogue (HPD), a dataset that helps train Harry Potter-like dialogue agents. Such a task is typically viewed as a variant of building personalized dialogue agents, but the two differ significantly in three respects: 1) Harry lives in a virtual world of wizards, so real-world commonsense may not apply to Harry's conversations; 2) Harry's behavior is strongly linked to background information in conversations: the scene, its attributes, and its relationship to other speakers; and 3) such backgrounds change dynamically as the storyline goes on. The HPD dataset, the first dataset to facilitate the study of dialogue agent construction for characters within a story, provides rich contextual information about each dialogue session, such as scenes, character attributes, and relations. More importantly, all of this background information changes over the course of the story. In addition, HPD supports both dialogue generation and retrieval tasks. We evaluate baselines such as Dialog-GPT and BOB to determine the extent to which they can generate Harry Potter-like responses. The experimental results are disappointing: although the generated responses are fluent, they still seem out of character for Harry. We also test ChatGPT, currently the most robust dialogue agent, which likewise fails to generate plausible Harry Potter-like responses in some cases. Our results suggest that there is much scope for future research.
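Despite advances in generating fluent text, existing pre-trained models tend to attach incoherent event sequences to the entities involved when generating narratives such as stories and news. We conjecture that these issues result from representing entities as static embeddings of surface words while neglecting to model their ever-changing states, i.e., the information they carry, as the text unfolds. We therefore extend the Transformer model to dynamically perform entity state updates and sentence realization for narrative generation. We propose a contrastive framework to learn the state representations in a discrete space, and insert additional attention layers into the decoder to better exploit these states. Experiments on two narrative datasets show that, with the guidance of meaningful entity states, our model generates more coherent and diverse narratives.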
Due to the lack of human resources for mental health support, there is an increasing demand for employing conversational agents for support. Recent work has demonstrated the effectiveness of dialogue models in providing emotional support. As previous studies have demonstrated that seekers' persona is an important factor for effective support, we investigate whether there are benefits to modeling such information in dialogue models for support. In this paper, our empirical analysis verifies that persona has an important impact on emotional support. Therefore, we propose a framework for dynamically inferring and modeling seekers' persona. We first train a model for inferring the seeker's persona from the conversation history. Accordingly, we propose PAL, a model that leverages persona information and, in conjunction with our strategy-based controllable generation method, provides personalized emotional support. Automatic and manual evaluations demonstrate that our proposed model, PAL, achieves state-of-the-art results, outperforming the baselines on the studied benchmark. Our code and data are publicly available at https://github.com/chengjl19/PAL.
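Non-parallel text style transfer is an important task in natural language generation. However, previous studies concentrate on the token or sentence level, such as sentence sentiment and formality transfer, and neglect long-text style transfer at the discourse level. Long texts usually involve more complex authorial linguistic preferences, such as discourse structure, than sentences do. In this paper, we formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style while maintaining the source semantics. To tackle this problem, we propose a generation model named StoryTrans, which leverages discourse representations to capture source content information and transfers it to target styles with learnable style embeddings. We use an additional training objective to disentangle stylistic features from the learned discourse representations, which prevents the model from degenerating into an auto-encoder. Moreover, to enhance content preservation, we design a mask-and-fill framework that explicitly fuses style-specific keywords of the source text into the generation. We also construct new datasets for this task in Chinese and English, respectively. Extensive experiments show that our model outperforms strong baselines in the overall performance of style transfer and content preservation.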
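Prediction over event sequences is critical for many real-world applications in information retrieval and natural language processing. Within event sequence prediction, future event generation (FEG) is a challenging task because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, COEP. It highlights and integrates two types of event knowledge: sequential knowledge of direct event-event relations, and inferential knowledge that reflects the intermediate character psychology between events (e.g., intents, causes, reactions), which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, IM and GM, one for each type of knowledge, which are combined via prompt tuning. First, IM focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for GM. We also design a contrastive discriminator to improve generalization ability. Second, GM generates future events by modeling direct sequential knowledge under the guidance of IM. Automatic and human evaluations show that our approach generates more coherent, specific, and logical future events.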
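State-of-the-art dialogue models still often stumble with regard to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout a discourse; more specifically, they may take on the role of their interlocutor. In this work, we formalize and quantify this deficiency, and show through human evaluation experiments that it is indeed a problem. In contrast, we show that discriminative models trained specifically to recognize who is speaking can perform well; moreover, these can be used as automatic metrics. Finally, we evaluate a variety of mitigation methods, including changes to model architecture, training protocol, and decoding strategy. According to human annotators, our best models reduce mistaken-identity issues by nearly 65% while also improving engagingness. Despite these results, we find that maintaining character identity remains a challenging problem.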
Dialogue models are able to generate coherent and fluent responses, but they can still be challenging to control and may produce non-engaging, unsafe results. This unpredictability diminishes user trust and can hinder the use of the models in the real world. To address this, we introduce DialGuide, a novel framework for controlling dialogue model behavior using natural language rules, or guidelines. These guidelines provide information about the context they are applicable to and what should be included in the response, allowing the models to generate responses that are more closely aligned with the developer's expectations and intent. We evaluate DialGuide on three tasks in open-domain dialogue response generation: guideline selection, response generation, and response entailment verification. Our dataset contains 10,737 positive and 15,467 negative dialogue context-response-guideline triplets across two domains - chit-chat and safety. We provide baseline models for the tasks and benchmark their performance. We also demonstrate that DialGuide is effective in the dialogue safety domain, producing safe and engaging responses that follow developer guidelines.
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
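In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible, attribute-controlled text in highly specialized domains such as play scripts. Supporting materials are available at: https://catlab-team.github.io/cuegen.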
Abstractive dialogue summarization has received increasing attention recently. Although most current dialogue summarization systems are trained to maximize the likelihood of human-written summaries and have achieved significant results, there is still a huge gap in generating summaries that humans judge to be high-quality, for example in coherence and faithfulness, partly due to the misalignment introduced by maximizing the likelihood of a single human-written summary. To this end, we propose to incorporate different levels of human feedback into the training process. This enables us to guide the models to capture the behaviors humans care about in summaries. Specifically, we ask humans to highlight the salient information to be included in summaries, which provides the local feedback, and to make overall comparisons among summaries in terms of coherence, accuracy, coverage, conciseness, and overall quality, which provides the global feedback. We then combine both local and global feedback to fine-tune the dialogue summarization policy with reinforcement learning. Experiments conducted on multiple datasets demonstrate the effectiveness and generalization of our methods over state-of-the-art supervised baselines, especially in terms of human judgments.
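For conversational AI and virtual assistants to communicate with humans in a realistic way, they must exhibit human characteristics such as the expression of emotion and personality. Current attempts at building human-like dialogue agents have encountered significant difficulties. We propose Human Level Attributes (HLAs), based on tropes, as the basis of a method for learning dialogue agents that can imitate the personalities of fictional characters. Tropes are characteristics of fictional personalities that are observed recurrently and determined by viewers' impressions. By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters' language styles through their HLAs. We then introduce a three-component system, Aloha (which stands for Artificial Learning of Human Attributes), that combines character space mapping, character community detection, and language style retrieval to build a character-specific (or personality-specific) language model. Our preliminary experiments show that two variations of Aloha, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and remain stable regardless of the character's identity, the genre of the show, and the context of the dialogue.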
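Recent advances in pre-trained language models have significantly improved neural response generation. However, existing methods usually treat the dialogue context as a linear sequence of tokens and learn to generate the next word through token-level self-attention. Such token-level encoding hinders the exploration of discourse-level coherence among utterances. This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models. DialogBERT employs a hierarchical Transformer architecture. To efficiently capture discourse-level coherence among utterances, we propose two training objectives, masked utterance regression and distributed utterance order ranking, in analogy to the original BERT training. Experiments on three multi-turn conversation datasets show that our approach substantially outperforms baselines such as BART and DialoGPT in quantitative evaluation. Human evaluation suggests that DialogBERT generates more coherent, informative, and human-like responses than the baselines by significant margins.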
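Crosstalk is a traditional Chinese theatrical performing art. It is usually performed by two performers in the form of a dialogue. Along with the typical features of dialogue, crosstalk is also designed to be humorous in order to amuse the audience. In this study, we introduce CrossDial, the first open-source dataset containing most of the classic Chinese crosstalk scripts available on the web. In addition, we define two new tasks, provide two benchmarks, and investigate the ability of current dialogue generation models in the field of crosstalk generation. Experimental results and case studies show that crosstalk generation is challenging for straightforward methods and remains an interesting topic for future work.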
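Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained models, substantial amounts of hallucinated content are still found during human evaluation. Pre-trained models are most commonly fine-tuned with a cross-entropy loss for text summarization, which may not be an optimal strategy. In this work, we provide a typology of factual errors with annotated data, to highlight the types of errors and move away from a binary understanding of factuality. We further propose a training strategy that improves the factual consistency and overall quality of summaries via a novel contrastive fine-tuning. Based on our linguistically informed typology of errors, we design different modular objectives that each target a specific error type. Specifically, we use hard negative samples containing errors to reduce the generation of factual inconsistencies. To capture the key information exchanged between speakers, we also design a dialogue-specific loss. Using human evaluation and automatic faithfulness metrics, we show that our model significantly reduces all kinds of factual errors on the dialogue summarization corpus SAMSum. Moreover, our model generalizes to the meeting summarization corpus AMI, and it produces significantly higher scores than the baselines on both datasets with respect to word-overlap metrics.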
Building dialogue agents that can converse naturally with humans has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent by the indistinguishability of its dialogues from those of humans. It should come as no surprise that human-level dialogue systems are very challenging to build. While early efforts on rule-based systems found limited success, the emergence of deep learning has enabled great advances on this topic. In this thesis, we focus on methods that address the numerous issues behind the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and evaluated in ways inspired by general state-of-the-art AI methodologies, but they also target the characteristics specific to dialogue systems.
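Conversational agents have become an integral part of daily life for the general population in simple task-enabling situations. However, these systems have yet to have any social impact on diverse and minority populations, for example by helping people with neurological disorders such as ALS, or people with speech, language, and social communication disorders. Language model technology can play a huge role in helping these users carry out daily communication and social interactions. To support this population, we build a dialogue system that users can control with cues or keywords. We build models that suggest relevant cues in the dialogue response context; these cues are used to control response generation and can speed up communication. We also introduce a keyword loss to constrain the model output. We show both qualitatively and quantitatively that our models can effectively induce the keyword into the model response without degrading response quality. In the context of using such systems for people with degenerative disorders, we present a human evaluation of our cue or keyword predictor and of the controllable dialogue system, and show that our models perform significantly better than models without control. Our study shows that keyword control on end-to-end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day-to-day communication.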
Narrative summarization aims to produce a distilled version of a narrative to describe its most salient events and characters. Summarizing a narrative is challenging as it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, which are collected from plot descriptions of movies and TV episodes with diverse genres, and their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and the state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.
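Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, frequently exceed a few thousand words. There is still a lack of corresponding research and powerful tools to understand and process such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer inputs, we augment the model with sparse attention, which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering the tasks of dialogue summarization, abstractive question answering, and topic segmentation. Experimentally, we show that our pre-trained model DialogLM significantly surpasses state-of-the-art models across datasets and tasks. Source code and all pre-trained models are available in our GitHub repository (https://github.com/microsoft/dialoglm).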
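The window-based denoising idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the DialogLM implementation: the abstract does not specify the noise types, so speaker masking and turn shuffling are used here as stand-ins, and the function name `corrupt_window` and the `[MASK]` token are hypothetical.

```python
import random
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)
MASK = "[MASK]"  # hypothetical placeholder for a hidden speaker name


def corrupt_window(
    turns: List[Turn], window_size: int, rng: random.Random
) -> Tuple[List[Turn], List[Turn], int]:
    """Corrupt one contiguous window of turns and return the clean target.

    Returns (corrupted_dialogue, original_window, window_start). A model
    would be trained to reconstruct original_window from corrupted_dialogue.
    """
    if not 0 < window_size <= len(turns):
        raise ValueError("window_size must be between 1 and len(turns)")
    start = rng.randrange(len(turns) - window_size + 1)
    window = turns[start:start + window_size]
    # Illustrative noise 1: hide who is speaking inside the window.
    noisy = [(MASK, utterance) for _, utterance in window]
    # Illustrative noise 2: shuffle the order of turns inside the window.
    rng.shuffle(noisy)
    corrupted = turns[:start] + noisy + turns[start + window_size:]
    return corrupted, window, start
```

Only the selected window is altered, so the rest of the conversation stays available as context for the reconstruction.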
Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that are more natural and better meet the specific constraints of practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing the generation of more diverse and fluent text. However, due to the limited interpretability of deep neural networks, the controllability of these methods needs to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging research hotspot. A diverse range of approaches has emerged in the past 3-4 years, targeting different CTG tasks that may require different types of controlled constraints. In this paper, we present a systematic critical review of the common tasks, main approaches, and evaluation methods in this area. Finally, we discuss the challenges the field is facing and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields quickly track the academic frontier by providing them with a landscape of the area and a roadmap for future research.