In this paper, we analyze neural network-based dialogue systems trained in an end-to-end manner using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words.1 This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering. We provide baselines in two different environments: one where models are trained to select the correct next response from a list of candidate responses, and one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation. These are both evaluated on a recall task that we call next utterance classification (NUC), and using vector-based metrics that capture the topicality of the responses. We observe that current end-to-end models are unable to completely solve these tasks; thus, we provide a qualitative error analysis to determine the primary causes of error for end-to-end models evaluated on NUC, and examine sample utterances from the generative models. As a result of this analysis, we suggest some promising directions for future research on the Ubuntu Dialogue Corpus, which can also be applied to end-to-end dialogue systems in general.

1. This work is an extension of a paper appearing in SIGDIAL (Lowe et al., 2015). This paper further includes results on generative dialogue models, more extensive evaluation of the retrieval models using vector-based generative metrics, and a qualitative examination of responses from the generative models and classification errors made by the Dual Encoder model. Experiments are performed on a new version of the corpus, the Ubuntu Dialogue Corpus v2, which is publicly available: https://github.com/rkadlec/ubuntu-ranking-dataset-creator. The early dataset has been updated to add features and fix bugs, which are detailed in Section 3. This is an open-access article distributed under the terms of a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/).
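The next-utterance-classification (NUC) recall task described above can be sketched as follows: for each context, the model assigns a score to every candidate response, and Recall@k counts how often the ground-truth response lands in the top k. This is a minimal illustration with invented scores standing in for a trained model.

```python
# Minimal sketch of Recall@k for next utterance classification (NUC):
# the candidate scores here are hypothetical model outputs.

def recall_at_k(scored_examples, k):
    """scored_examples: list of (candidate_scores, true_index) pairs."""
    hits = 0
    for scores, true_idx in scored_examples:
        # Rank candidates by descending score and check the top k.
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(scored_examples)

# Toy data: two contexts, ten candidates each, ground truth at index 0.
examples = [
    ([0.9, 0.1, 0.3, 0.2, 0.05, 0.4, 0.1, 0.2, 0.3, 0.1], 0),  # ranked 1st
    ([0.2, 0.8, 0.1, 0.3, 0.50, 0.1, 0.2, 0.1, 0.1, 0.1], 0),  # ranked 4th
]
print(recall_at_k(examples, 1))  # 0.5
print(recall_at_k(examples, 5))  # 1.0
```

In the 1-in-10 setting used for this corpus, each example has one true response and nine distractors, and Recall@1, Recall@2, and Recall@5 are typically reported.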
Current conversational systems can follow simple commands and answer basic questions, but they have trouble maintaining coherent, open-ended conversations about specific topics. Competitions such as the Conversational Intelligence (ConvAI) Challenge are being organized to push research toward this goal. This paper details the RLLChatbot that participated in the 2017 ConvAI Challenge. The goal of this research is to better understand how current deep learning and reinforcement learning tools can be used to build a robust yet flexible open-domain conversational agent. We provide an in-depth description of how to build and train a dialogue system from mostly public-domain datasets using an ensemble model. The first contribution of this work is a detailed description and analysis of different text generation models, in addition to a novel message ranking and selection method. Moreover, a new open-source conversational dataset is presented. Training on this data significantly improves the Recall@k score of the ranking and selection mechanisms compared to our baseline model, which is responsible for selecting the message returned at each interaction.
Sequence-to-sequence (Seq2Seq) models have achieved notable success in generating natural conversational exchanges. Although the responses produced by these neural network models are syntactically well-formed, they tend to be unspecific, short, and generic. In this work, we introduce a Topical Hierarchical Recurrent Encoder Decoder (THRED), a novel, fully data-driven, multi-turn response generation system intended to produce contextual and topic-aware responses. Our model builds on the basic Seq2Seq model, augmenting it with a hierarchical joint attention mechanism that incorporates topical concepts and previous interactions into the response generation. Alongside our model, we present a clean, high-quality conversational dataset derived from Reddit comments. We evaluate THRED on two novel automated metrics, dubbed Semantic Similarity and Response Echo Index, as well as with human evaluation. Our experiments demonstrate that the proposed model is able to generate more diverse and contextually relevant responses than strong baselines.
In this paper, we explore the use of deep neural networks for natural language generation. Specifically, we implement two sequence-to-sequence neural variational models: the variational autoencoder (VAE) and the variational encoder-decoder (VED). VAEs for text generation are difficult to train, owing to issues associated with the Kullback-Leibler (KL) divergence term of the loss function vanishing to zero. We successfully train VAEs by implementing optimization heuristics such as KL weight annealing and word dropout. We also demonstrate the effectiveness of this continuous latent space through random sampling, linear interpolation, and sampling from the neighborhood of the input. We argue that if the VAE is not designed appropriately, bypassing connections may arise, causing the latent space to be ignored during training. We show experimentally, with the example of decoder hidden-state initialization, that such bypassing connections degrade the VAE into a deterministic model, reducing the diversity of generated sentences. We find that the traditional attention mechanism used in sequence-to-sequence VED models acts as a bypassing connection, degrading the model's latent space. To avoid this problem, we propose a variational attention mechanism in which the attention context vector is modeled as a random variable that can be sampled from a distribution. We show empirically, using automatic evaluation metrics, namely entropy and distinct measures, that our variational attention model produces more diverse output sentences than the deterministic attention model. A qualitative analysis through a human evaluation study demonstrates that our model simultaneously produces sentences that are of high quality and as fluent as those produced by its deterministic attention counterpart.
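The KL weight annealing heuristic mentioned above can be sketched briefly: the weight on the KL term is ramped from 0 to 1 over the first training steps so the decoder cannot drive the KL divergence to zero before the latent variable becomes useful. The linear schedule and the specific numbers below are illustrative assumptions; other shapes (e.g. sigmoid ramps) are also common.

```python
# Hedged sketch of KL weight annealing for VAE training.
# The linear ramp and anneal_steps value are illustrative assumptions.

def kl_weight(step, anneal_steps=10000):
    """Linear KL annealing schedule, clipped to [0, 1]."""
    return min(1.0, step / anneal_steps)

def vae_loss(recon_nll, kl_div, step):
    """Total loss = reconstruction NLL + annealed KL divergence."""
    return recon_nll + kl_weight(step) * kl_div

# Early in training the KL term barely contributes; later it counts in full.
print(vae_loss(recon_nll=5.0, kl_div=2.0, step=0))      # 5.0
print(vae_loss(recon_nll=5.0, kl_div=2.0, step=5000))   # 6.0
print(vae_loss(recon_nll=5.0, kl_div=2.0, step=20000))  # 7.0
```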
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter. We also describe two neural learning architectures suitable for analyzing this dataset, and provide benchmark performance on the task of selecting the best next response.
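The benchmark task of selecting the best next response is commonly approached with a dual-encoder scoring rule: encode the context and a candidate response into vectors c and r, then score the pair as sigmoid(cᵀMr) with a learned matrix M. The sketch below substitutes tiny hand-written vectors for the RNN encoders; all numbers are illustrative.

```python
# Minimal sketch of dual-encoder response scoring: sigmoid(c^T M r).
# Toy pre-computed vectors stand in for learned RNN encoders.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(c, r, M):
    """Probability that response r follows context c: sigmoid(c^T M r)."""
    Mr = [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]
    return sigmoid(sum(ci * mi for ci, mi in zip(c, Mr)))

M = [[1.0, 0.0], [0.0, 1.0]]   # identity M: score reduces to sigmoid(c . r)
context   = [0.6, 0.8]
on_topic  = [0.6, 0.8]         # similar direction -> higher score
off_topic = [-0.6, -0.8]       # opposite direction -> lower score
assert score(context, on_topic, M) > score(context, off_topic, M)
```

At prediction time, the candidate with the highest score is chosen; ranking all candidates by this score yields the Recall@k numbers reported for the dataset.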
We propose simple and flexible training and decoding methods for influencing output style and topic in neural encoder-decoder based language generation. This capability is desirable in a variety of applications, including conversational systems, where successful agents need to produce language in a specific style and generate responses steered by a human puppeteer or external knowledge. We decompose the neural generation process into empirically easier sub-problems: a faithfulness model and a decoding method based on selective-sampling. We also describe training and sampling algorithms that bias the generation process with a specific language style restriction, or a topic restriction. Human evaluation results show that our proposed methods are able to restrict style and topic without degrading output quality in conversational tasks.
Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores to input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.
Building systems that can communicate with humans is a core problem in artificial intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context-level attention and incorporates additional external knowledge provided by domain-specific word descriptions. It uses a bi-directional Gated Recurrent Unit (GRU) to encode the context and the response, and learns to attend over context words given the latent response representation, and vice versa. In addition, it incorporates external domain-specific information by encoding domain keyword descriptions with another GRU. This allows a better representation of domain-specific keywords in responses, and thereby improves overall performance. Experimental results show that our model outperforms all other state-of-the-art response selection methods on multi-turn dialogue.
We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent work in response generation has adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses of existing metrics, and provide recommendations for the future development of automatic evaluation metrics for dialogue systems.
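The weakness of word-overlap metrics described above can be made concrete with a crude unigram-precision score (a BLEU-1 stand-in): a perfectly sensible response that shares no words with the single reference scores zero, even though a human would rate it highly. The sentences below are invented for illustration.

```python
# Illustration of why single-reference word-overlap metrics fail for dialogue.

def unigram_precision(reference, candidate):
    """Fraction of candidate words that appear in the reference."""
    ref_words = set(reference.lower().split())
    cand_words = candidate.lower().split()
    return sum(1 for w in cand_words if w in ref_words) / len(cand_words)

reference     = "try reinstalling the driver"
good_response = "have you updated your graphics card software"  # sensible, zero overlap
echo_response = "try reinstalling the driver"                   # parrots the reference

print(unigram_precision(reference, good_response))  # 0.0
print(unigram_precision(reference, echo_response))  # 1.0
```

Because many valid responses exist for a given context, a metric anchored to one reference penalises exactly the kind of topical variation a good dialogue model should produce.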
Building an open-domain multi-turn dialogue system is one of the most interesting and challenging tasks in artificial intelligence. Many researchers have been devoted to building such dialogue systems, yet few have modeled the conversation flow within an ongoing dialogue. Moreover, it is common for people to talk about highly relevant aspects during a conversation: topics are coherent and drift naturally, which demonstrates the necessity of modeling conversation flow. To this end, we propose a multi-turn cue-word driven conversation system with reinforcement learning (RLCw), which strives to select adaptive cue words with the greatest future credit so as to improve the quality of generated responses. We introduce a new metric to measure the quality of cue words in terms of effectiveness and relevance. To further optimize the model for long-term conversation, this paper adopts a reinforcement learning approach. Experiments on a real-world dataset demonstrate that our model consistently outperforms a set of competitive baselines in terms of simulated turns, diversity, and human evaluation.
In this paper, we propose a novel end-to-end neural architecture for ranking candidate answers that adapts a hierarchical recurrent neural network and a latent topic clustering module. With our proposed model, a text is encoded into a vector representation from the word level to the chunk level to effectively capture its entire meaning. In particular, by adopting the hierarchical structure, our model shows very small performance degradation on longer text comprehension, while other state-of-the-art recurrent neural network models suffer from it. Additionally, the latent topic clustering module extracts semantic information from target samples. This clustering module is useful for any text-related task by allowing each data sample to find its nearest topic cluster, thus helping the neural network model analyze the entire data. We evaluate our models on the Ubuntu Dialogue Corpus and a consumer electronics domain question answering dataset, which is related to Samsung products. The proposed model shows state-of-the-art results for ranking question-answer pairs.
Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare, and contrast the various models, and put forward a detailed understanding of the past, present, and future of deep learning in NLP.
Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advances in Deep Learning have sparked an interest in the use of neural networks for modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation of the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity-based reranking, and evaluation setting. Based on the tradeoffs of different models, we propose a new neural generative dialog model conditioned on speakers as well as context history that outperforms previous models on both retrieval and generative metrics. Our findings indicate that pretraining speaker embeddings on larger datasets, as well as bootstrapping word and speaker embeddings, can significantly improve performance (up to 3 points in perplexity), and that promoting diversity using Mutual Information based techniques has a very strong effect on ranking metrics.
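The Mutual Information based diversity technique mentioned above is often realised as anti-LM reranking: each candidate is rescored as log p(r | c) − λ·log p(r), penalising responses that are likely regardless of context ("i don't know"). The log-probabilities below are invented for illustration; in practice they come from the dialogue model and a separate language model.

```python
# Hedged sketch of Mutual-Information (anti-LM) reranking of candidates.

def mmi_rerank(candidates, lam=0.5):
    """candidates: list of (response, logp_given_context, logp_unconditional).
    Rescore as logp(r|c) - lam * logp(r), best first."""
    return sorted(candidates, key=lambda c: c[1] - lam * c[2], reverse=True)

cands = [
    ("i don't know",           -2.0, -1.0),  # generic: likely under any context
    ("check your grub config", -2.5, -6.0),  # specific: unlikely a priori
]
best = mmi_rerank(cands)[0][0]
print(best)  # "check your grub config"
```

With λ = 0 the reranker reduces to plain likelihood and the generic response wins, which is exactly the dullness problem the technique addresses.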
The past decade has witnessed the boom of human-machine interactions, particularly via dialog systems. In this paper, we study the task of response generation in open-domain multi-turn dialog systems. Many research efforts have been dedicated to building intelligent dialog systems, yet few shed light on deepening or widening the chatting topics in a conversational session, which would attract users to talk more. To this end, this paper presents a novel deep scheme consisting of three channels, namely global, wide, and deep ones. The global channel encodes the complete historical information within the given context, the wide one employs an attention-based recurrent neural network model to predict the keywords that may not appear in the historical context, and the deep one trains a Multi-layer Perceptron model to select some keywords for an in-depth discussion. Thereafter, our scheme integrates the outputs of these three channels to generate desired responses. To justify our model, we conducted extensive experiments to compare our model with several state-of-the-art baselines on two datasets: one is constructed by ourselves and the other is a public benchmark dataset. Experimental results demonstrate that our model yields promising performance by widening or deepening the topics of interest.
While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
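The bag-of-word loss mentioned above can be sketched simply: from the latent variable, the model predicts the set of words in the response independent of their order, which forces the latent code to carry content information. Below, a toy unigram distribution stands in for the network's softmax output; all numbers are illustrative.

```python
# Hedged sketch of a bag-of-words auxiliary loss for a latent-variable model.
import math

def bow_loss(word_probs, response_words):
    """Sum of negative log-likelihoods of each response word under the
    unigram distribution predicted from the latent variable."""
    return -sum(math.log(word_probs[w]) for w in response_words)

# Toy predicted distribution over a 4-word vocabulary.
predicted = {"install": 0.4, "driver": 0.3, "sudo": 0.2, "hello": 0.1}
on_content  = ["install", "driver"]
off_content = ["hello", "hello"]
assert bow_loss(predicted, on_content) < bow_loss(predicted, off_content)
```

Minimising this term alongside the usual sequence loss discourages the decoder from ignoring the latent variable, complementing tricks like KL annealing.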
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of natural language dialogue, because the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale dataset consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a dataset consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendation. In particular, we explore new neural architectures, mechanisms, and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model sub-components addressing different parts of the overall problem domain, ranging from sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine these sub-components into a full-fledged dialogue system and examine its behavior.
We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.
A recent ''third wave'' of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research. (Kezban Dilek Onal and Ye Zhang contributed equally; Maarten de Rijke and Matthew Lease contributed equally.)
We propose three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) we introduce a coherence measure, defined as the embedding similarity between the dialogue context and the generated response, (2) we filter our training corpus based on the coherence measure to obtain topically coherent and lexically diverse context-response pairs, and (3) we then train a response generator using a conditional variational autoencoder model that takes the coherence measure as a latent variable and uses a context gate to guarantee topical consistency with the context and to promote lexical diversity. Experiments on the OpenSubtitles corpus show substantial improvements over competitive neural models in terms of BLEU score as well as metrics of coherence and diversity.
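A coherence measure of this kind, embedding similarity between a dialogue context and a response, can be sketched as cosine similarity over averaged word vectors. The tiny word-vector table below is invented for illustration; a real system would use pretrained embeddings.

```python
# Hedged sketch of context-response coherence as embedding cosine similarity.
import math

VECS = {  # toy 2-d word vectors, invented for illustration
    "ubuntu":  [0.9, 0.1], "install": [0.8, 0.2], "driver": [0.7, 0.3],
    "pizza":   [0.1, 0.9], "recipe":  [0.2, 0.8],
}

def avg_vec(words):
    """Average the vectors of in-vocabulary words."""
    vs = [VECS[w] for w in words if w in VECS]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def coherence(context, response):
    return cosine(avg_vec(context.split()), avg_vec(response.split()))

ctx = "how do i install the ubuntu driver"
assert coherence(ctx, "install the driver") > coherence(ctx, "pizza recipe")
```

Thresholding such a score is one plausible way to realise the corpus-filtering step in (2): context-response pairs whose coherence falls below a cutoff are discarded before training.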