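Open-domain dialogue systems aim to interact with humans through natural language text in an open-ended fashion. However, the widely successful neural networks may not work well for dialogue systems, because they tend to generate generic responses. In this work, we propose an Equal-size Hard Expectation-Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. Our algorithm assigns each sample to a decoder in a hard manner and additionally imposes an equal-assignment constraint to ensure that all decoders are well trained. We provide a detailed theoretical analysis to justify our approach. In addition, experiments on two large-scale open-domain dialogue datasets verify that our EqHard-EM algorithm generates high-quality, diverse responses.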
translated by Google Translate
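The hard, equal-size assignment step described in the abstract above can be illustrated with a small sketch. This is not the authors' procedure: it uses a simple greedy heuristic, and the names `equal_size_hard_assign` and `losses` are hypothetical. It assumes `losses[i][k]` holds the loss of decoder `k` on sample `i`, and that each decoder may receive at most `ceil(N / K)` samples.

```python
import math


def equal_size_hard_assign(losses):
    """Assign each sample to exactly one decoder under a per-decoder capacity.

    losses[i][k] is a (hypothetical) loss of decoder k on sample i.
    Greedy heuristic: visit (sample, decoder) pairs from lowest loss upward
    and accept a pair if the sample is still free and the decoder has room.
    """
    n = len(losses)
    k = len(losses[0])
    capacity = math.ceil(n / k)  # equal-size constraint on every decoder
    pairs = sorted((losses[i][j], i, j) for i in range(n) for j in range(k))
    assignment = [None] * n
    load = [0] * k
    for _, i, j in pairs:
        if assignment[i] is None and load[j] < capacity:
            assignment[i] = j
            load[j] += 1
    return assignment


# Decoder 0 has the lowest loss on every sample, so an unconstrained argmin
# would send all four samples to it; the capacity forces an even split.
toy_losses = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
toy_assignment = equal_size_hard_assign(toy_losses)
```

The greedy pass is only meant to show the effect of the constraint; an exact solver for the underlying assignment problem would generally find a lower total loss.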
Natural Language Generation (NLG) represents a large collection of tasks in the field of NLP. While many of these tasks have been tackled well by the cross-entropy (CE) loss, the task of dialog generation poses a few unique challenges for this loss function. First, CE loss assumes that for any given input, the only possible output is the one available as the ground truth in the training dataset. In general, this is not true for any task, as there can be multiple semantically equivalent sentences, each with a different surface form. This problem is exacerbated for the dialog generation task, as there can be multiple valid responses (for a given context) that not only have different surface forms but are also not semantically equivalent. Second, CE loss does not take the context into consideration while processing the response and, hence, it treats all ground truths with equal importance irrespective of the context. However, we may want our final agent to avoid certain classes of responses (e.g. bland, non-informative or biased responses) and give relatively higher weight to more context-specific responses. To circumvent these shortcomings of the CE loss, in this paper, we propose a novel loss function, CORAL, that directly optimizes recently proposed estimates of human preference for generated responses. Using CORAL, we can train dialog generation models without assuming that no valid response exists other than the ground truth. Also, the CORAL loss is computed based on both the context and the response. Extensive comparisons on two benchmark datasets show that the proposed methods outperform strong state-of-the-art baseline models of different sizes.
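The wave of pre-trained language models has continuously improved the quality of machine-generated conversations. However, some generated responses still suffer from excessive repetition, sometimes repeating words from the utterance, sometimes repeating words within the self-generated response, or both. Inappropriate repetition of words can significantly degrade the quality of the generated text. Penalized sampling is a popular solution that reduces the sampling probability of already-present words during inference, but it is highly vulnerable to an inappropriate static setting. Setting the penalty too high can yield strange and unrealistic sentences, while setting it too low leaves repetition largely unsuppressed. To address the shortcomings of the above method, we design a context-aware classifier that explicitly decides when to allow repetition and when to apply penalized sampling. Such a classifier can be easily integrated with existing decoding methods, reducing repetition where appropriate while preserving the diversity of the text. Experimental results show that our method can generate higher-quality and more authentic dialogues.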
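As an illustration of penalized sampling with a static penalty, the sketch below rescales the logits of tokens that have already appeared before applying a softmax. It follows one commonly used formulation (divide positive logits by the penalty, multiply negative ones), which is an assumption rather than the exact rule used in the paper. The `allow_repetition` flag is a hypothetical stand-in for the decision of a context-aware classifier; the classifier itself is not modeled here.

```python
import math


def penalized_probs(logits, generated_ids, penalty=1.2, allow_repetition=False):
    """Return next-token probabilities after a static repetition penalty.

    logits: list of raw scores, one per vocabulary id.
    generated_ids: ids of tokens that already appeared.
    allow_repetition: hypothetical classifier decision; True skips the penalty.
    """
    adjusted = list(logits)
    if not allow_repetition:
        for t in set(generated_ids):
            # Push the score of a seen token downward regardless of its sign.
            if adjusted[t] > 0:
                adjusted[t] /= penalty
            else:
                adjusted[t] *= penalty
    # Numerically stable softmax.
    m = max(adjusted)
    exps = [math.exp(x - m) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, -1.0]
plain = penalized_probs(toy_logits, [0], allow_repetition=True)
penalized = penalized_probs(toy_logits, [0], penalty=2.0)
```

With `penalty=2.0`, token 0 drops from a logit of 2.0 to 1.0, so it ends up exactly as likely as token 1. This shows how strongly a single static value can reshape the distribution.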
Chatbots are expected to be knowledgeable across multiple domains, e.g. for daily chit-chat, exchange of information, and grounding in emotional situations. To effectively measure the quality of such conversational agents, a model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another. This calls for a dedicated network architecture for domain generalization. To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures the general knowledge of dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. To validate the idea, we construct a high-quality multi-domain dialogue dataset leveraging data augmentation and pseudo-labeling. The PoE network is comprehensively assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains. It achieves state-of-the-art performance in terms of mean Spearman correlation over all the evaluation datasets. It exhibits better zero-shot generalization than existing state-of-the-art ADEMs and the ability to easily adapt to new domains with few-shot transfer learning.
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
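This paper offers a comprehensive review of research on Natural Language Generation (NLG) over the past two decades, especially in relation to deep learning methods for data-to-text and text-to-text generation, as well as new applications of NLG technology. The survey aims to (a) give an up-to-date synthesis of the core NLG tasks and the architectures adopted in the field; (b) detail the various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; and (c) highlight some future directions and relatively recent research issues that arise from the increasing synergy between NLG and other areas of artificial intelligence, such as computer vision, text, and computational creativity.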
We investigate response generation for multi-turn dialogue in generative chatbots. Existing generative models based on RNNs (Recurrent Neural Networks) usually employ the last hidden state to summarize the sequences, which leaves models unable to capture the subtle variability observed in different dialogues or to distinguish between dialogues that are similar in composition. In this paper, we propose a Pseudo-Variational Gated Recurrent Unit (PVGRU) component without posterior knowledge by introducing a recurrent summarizing variable into the GRU, which can aggregate the accumulated distribution variations of subsequences. PVGRU can perceive subtle semantic variability through summarizing variables that are optimized by the devised distribution consistency and reconstruction objectives. In addition, we build a Pseudo-Variational Hierarchical Dialogue (PVHD) model based on PVGRU. Experimental results demonstrate that PVGRU can broadly improve the diversity and relevance of responses on two benchmark datasets.
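Although the Conditional Variational AutoEncoder (CVAE) model can generate more diverse responses than the traditional Seq2Seq model, its responses often have low relevance to the input words or are illogical with respect to the question. A causal analysis is carried out to study the reasons behind this, and a method is provided for finding mediators and mitigating the confounding bias in dialogues. Specifically, we propose to predict the mediators in order to preserve relevant information, and to automatically incorporate the mediators into the generation process. In addition, a dynamic topic graph guided conditional variational autoencoder (TGG-CVAE) model is used to complement the semantic space and reduce the confounding bias in responses. Extensive experiments demonstrate that the proposed model is able to generate both relevant and informative responses, and that it outperforms the state of the art in terms of automatic metrics and human evaluation.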
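Crosstalk is a traditional Chinese theatrical performing art. It is usually performed by two performers in the form of a dialogue. Along with the typical features of dialogue, crosstalk is also designed to be funny in order to amuse the audience. In this study, we introduce CrossDial, the first open-source dataset containing most of the classic Chinese crosstalk pieces collected from the Web. Moreover, we define two new tasks, provide two benchmarks, and investigate the ability of current dialogue generation models in the field of crosstalk generation. The experimental results and case studies show that crosstalk generation is challenging for straightforward methods and remains an interesting topic for future work.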
Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). It is regarded as crucial for the development of advanced text generation technologies that are more natural and better meet the specific constraints of practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the lower level of interpretability of deep neural networks, the controllability of these methods needs to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging new research hotspot. A diverse range of approaches has emerged in the last 3-4 years, targeting different CTG tasks that may require different types of controlled constraints. In this paper, we present a systematic critical review of the common tasks, main approaches and evaluation methods in this area. Finally, we discuss the challenges that the field is facing, and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields to quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.
Conditional variational models, using either continuous or discrete latent variables, are powerful for open-domain dialogue response generation. However, previous works show that continuous latent variables tend to reduce the coherence of generated responses. In this paper, we also found that discrete latent variables have difficulty capturing more diverse expressions. To tackle these problems, we combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method. Specifically, HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables. Thus, we diversify the generated responses while maintaining relevance and coherence. In addition, we propose the Conditional Hybrid Variational Transformer (CHVT) to construct and utilize HLV with transformers for dialogue generation. Through fine-grained symbolic-level semantic information and additive Gaussian mixing, we construct the distribution of continuous variables, prompting the generation of diverse expressions. Meanwhile, to maintain relevance and coherence, the discrete latent variable is optimized by self-separation training. Experimental results on two dialogue generation datasets (DailyDialog and OpenSubtitles) show that CHVT is superior to traditional transformer-based variational mechanisms w.r.t. diversity, relevance and coherence metrics. Moreover, we also demonstrate the benefit of applying HLV to fine-tuning two pre-trained dialogue models (PLATO and BART-base).
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
Recent advances in large-scale pre-training provide large models with the potential to learn knowledge from raw text. It is thus natural to ask whether it is possible to leverage these large models as knowledge bases for downstream tasks. In this work, we answer this question for unsupervised knowledge-grounded conversation. We explore various methods that best elicit knowledge from large models. Our human study indicates that, though hallucinations exist, large models have the unique advantage of being able to output common sense and summarize facts that cannot be directly retrieved from a search engine. To better exploit such generated knowledge in dialogue generation, we treat the generated knowledge as a noisy knowledge source and propose posterior-based reweighing as well as a noisy training strategy. Empirical results on two benchmarks show advantages over the state-of-the-art methods.
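Recent advances in pre-trained language models have significantly improved neural response generation. However, existing methods usually view the dialogue context as a linear sequence of tokens and learn to generate the next word through token-level self-attention. Such token-level encoding hinders the exploration of discourse-level coherence among utterances. This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models. DialogBERT employs a hierarchical Transformer architecture. To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, masked utterance regression and distributed utterance order ranking, in analogy to the original BERT training. Experiments on three multi-turn conversation datasets show that our approach substantially outperforms baselines such as BART and DialoGPT in terms of quantitative evaluation. Human evaluation suggests that DialogBERT generates more coherent, informative, and human-like responses than the baselines by significant margins.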
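Recent advances in deep neural language models, combined with the capacity of large-scale datasets, have accelerated the development of natural language generation systems that produce fluent and coherent text (with varying degrees of success) across a multitude of tasks and application contexts. However, controlling the output of these models to meet user needs remains an open challenge. This is crucial not only for customizing the content and style of the generated language, but also for the safe and reliable deployment of these systems in the real world. We present an extensive survey on the emerging topic of constrained neural language generation, in which we formally define and categorize natural language generation problems by distinguishing between conditions and constraints (the latter being testable conditions on the output text rather than the input), present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation. Our aim is to highlight recent progress and trends in this emerging field, pointing out the most promising directions and the limitations, in order to advance the state of the art in constrained neural language generation research.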
Personalized chatbots focus on endowing chatbots with a consistent personality so that they behave like real users and can further act as personal assistants. Previous studies have explored generating implicit user profiles from the user's dialogue history for building personalized chatbots. However, these studies use only the response generation loss to train the entire model, which makes them prone to the problem of data sparsity. Besides, they overemphasize the quality of the final generated response while ignoring the correlations and fusions within the user's dialogue history, leading to rough data representations and performance degradation. To tackle these problems, we propose a self-supervised learning framework, MCP, for capturing better representations from users' dialogue history for personalized chatbots. Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialogue history, and generate pre-training samples for enhancing the model. We design three pre-training tasks based on three types of contrastive pairs from user dialogue history, namely response pairs, sequence augmentation pairs, and user pairs. We pre-train the utterance encoder and the history encoder towards the contrastive objectives and use these pre-trained encoders for generating user profiles during personalized response generation. Experimental results on two real-world datasets show a significant improvement of our proposed model MCP over existing methods.
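Language is the principal tool for human communication, and humor is one of its most attractive parts. Producing natural language like humans using computers, a.k.a. Natural Language Generation (NLG), has been widely used for dialogue systems, chatbots, and machine translation, as well as computer-aided creation, e.g., idea generation and scriptwriting. However, the humor aspect of natural language is relatively under-investigated, especially in the age of pre-trained language models. In this work, we aim to preliminarily test whether NLG can generate humor as humans do. We build a new dataset consisting of numerous digitized Chinese Comical Crosstalk scripts (called C$^3$ for short), drawn from a Chinese performing art called "Xiangsheng" that has been popular since the 1800s. (For the convenience of non-Chinese speakers, we refer to "Xiangsheng" as "crosstalk" in this paper.) We benchmark various generation approaches, including Seq2seq trained from scratch, fine-tuned middle-scale PLMs, and large-scale PLMs (with and without fine-tuning). Moreover, we also conduct a human evaluation, showing that 1) large-scale pre-training largely improves crosstalk generation quality; and 2) even the scripts generated by the best PLM are far from what we expect, reaching only 65% of the quality of human-created crosstalk. We conclude that humor generation can be largely improved using large-scale PLMs, but it is still in its infancy. The data and benchmarking code are publicly available at https://github.com/anonno2/crosstalk-generation.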
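Real human conversation data are complicated, heterogeneous, and noisy, and building open-domain dialogue systems from such data remains a challenging task. In fact, such dialogue data still contain a wealth of information and knowledge, but they have not been fully explored. In this paper, we show that existing open-domain dialogue generation methods, which memorize context-response paired data with autoregressive or encoder-decoder language models, underutilize the training data. Unlike current approaches that use external knowledge, we explore a retrieval-generation training framework that can take advantage of heterogeneous and noisy training data by treating them as "evidence". In particular, we use BERTScore for retrieval, which yields better quality of both the evidence and the generation. Experiments on publicly available datasets show that our method can help models generate better responses, even though such training data are usually regarded as low quality. This performance gain is comparable to, or even better than, the gain obtained by enlarging the training set. We also find that model performance is positively correlated with the relevance of the retrieved evidence. Moreover, our method performs well in zero-shot experiments, which indicates that it can be more robust to real-world data.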
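In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on the large pre-trained language model (PLM) PanGu-alpha (Zeng et al., 2021). Unlike other pre-trained dialogue models that are trained on massive amounts of dialogue data, we aim to build a powerful dialogue model with relatively little data and computation cost by inheriting the valuable language capabilities and knowledge of PLMs. To this end, we train PanGu-Bot from the large PLM PanGu-alpha, which has been shown to perform well on a variety of Chinese natural language tasks. We investigate different aspects of the responses generated by PanGu-Bot, including response quality, knowledge, and safety. We show that PanGu-Bot outperforms state-of-the-art Chinese dialogue systems (CDialGPT (Wang et al., 2020), EVA (Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects. We also demonstrate that PanGu-Bot can be easily deployed to generate emotional responses without further training. Throughout our empirical analysis, we also point out that PanGu-Bot's response quality, knowledge correctness, and safety are still far from perfect, and that further exploration is indispensable for building reliable and intelligent dialogue systems. Our model and code will be available at https://github.com/huawei-noah/pretretaining-language-model/tree/master/master/pangu-bot.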
We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.