While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We further develop a novel variant that integrates linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
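A minimal sketch of how the objective described above might be assembled, assuming a PyTorch implementation with hypothetical encoder/decoder tensors; the bag-of-words term is an auxiliary loss that asks the latent variable alone to predict the response words regardless of order, so the latent space cannot be ignored. The Gaussian prior/posterior parameterization and all argument names are assumptions.

```python
import torch
import torch.nn.functional as F

def cvae_loss(recon_logits, response_ids, bow_logits,
              mu_post, logvar_post, mu_prior, logvar_prior, pad_id=0):
    """Hypothetical CVAE objective: reconstruction + KL(posterior || prior) + bag-of-words loss."""
    # Token-level reconstruction loss from the decoder's logits [batch, T, vocab].
    recon = F.cross_entropy(
        recon_logits.reshape(-1, recon_logits.size(-1)),
        response_ids.reshape(-1), ignore_index=pad_id)

    # KL divergence between recognition (posterior) and prior networks,
    # both assumed to be diagonal Gaussians over the latent intent z.
    kl = 0.5 * torch.sum(
        logvar_prior - logvar_post
        + (logvar_post.exp() + (mu_post - mu_prior) ** 2) / logvar_prior.exp()
        - 1.0, dim=-1).mean()

    # Bag-of-words loss: predict every response token from z alone (bow_logits is
    # [batch, vocab]), ignoring word order, so z must carry discourse-level content.
    bow = F.cross_entropy(
        bow_logits.unsqueeze(1).expand(-1, response_ids.size(1), -1)
                  .reshape(-1, bow_logits.size(-1)),
        response_ids.reshape(-1), ignore_index=pad_id)

    return recon + kl + bow
```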
The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either auto-encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation.
Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning that is not scalable, or constructing deterministic models for learning dialogue sentences that fail to capture natural conversational variability. In this paper, we propose a Latent Intention Dialogue Model (LIDM) that employs a discrete latent variable to learn underlying dialogue intentions in the framework of neural variational inference. In a goal-oriented dialogue scenario, these latent intentions can be interpreted as actions guiding the generation of machine responses, which can be further refined autonomously by reinforcement learning. The experimental evaluation of LIDM shows that the model outperforms published benchmarks for both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues.
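A rough sketch of the discrete latent-intention idea. Note the paper trains with neural variational inference; the Gumbel-softmax relaxation used here is a swapped-in, simpler alternative, and all module names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentIntention(nn.Module):
    """Hypothetical sketch: infer a discrete dialogue intention z from the
    encoded dialogue context and condition response generation on it."""

    def __init__(self, ctx_dim: int, n_intentions: int, temperature: float = 1.0):
        super().__init__()
        self.intent_logits = nn.Linear(ctx_dim, n_intentions)
        self.temperature = temperature

    def forward(self, ctx: torch.Tensor, hard: bool = True):
        logits = self.intent_logits(ctx)  # [batch, n_intentions]
        # Gumbel-softmax relaxation of the categorical latent; hard=True yields a
        # one-hot intention that can be read off as an interpretable action.
        z = F.gumbel_softmax(logits, tau=self.temperature, hard=hard)
        return z, logits

# Usage sketch: the sampled one-hot z would be concatenated to the decoder input
# at every step; a policy-gradient fine-tuning stage could then treat z as the
# action and refine p(z | context) with task reward.
```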
This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be short-sighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need that led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity, length, and with human judges, showing that the proposed algorithm generates more interactive responses and manages to sustain longer conversations in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
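A toy sketch of how the three reward signals above might be combined for policy-gradient training. The `log_prob` stand-in below is a crude placeholder (the actual system scores these terms with the trained seq2seq model and representation similarity), and the mixing weights are assumptions.

```python
import math

DULL_RESPONSES = ["i don't know.", "i'm not sure.", "i have no idea."]

def log_prob(sequence: str, given: str) -> float:
    """Toy stand-in for a seq2seq log-likelihood log p(sequence | given)."""
    overlap = len(set(sequence.split()) & set(given.split()))
    return math.log(1e-6 + overlap / max(len(sequence.split()), 1))

def reward(response: str, prev_turn: str, prev_own_turn: str,
           w_ease: float = 0.25, w_info: float = 0.25, w_coh: float = 0.5) -> float:
    """Hypothetical combination of the three conversational rewards."""
    # Ease of answering: penalize responses likely to be followed by a dull reply.
    r_ease = -sum(log_prob(d, response) for d in DULL_RESPONSES) / len(DULL_RESPONSES)
    # Informativity: penalize repeating the agent's own previous turn.
    r_info = -log_prob(response, prev_own_turn)
    # Coherence: mutual-information-style term between the response and the preceding turn.
    r_coh = log_prob(response, prev_turn) + log_prob(prev_turn, response)
    return w_ease * r_ease + w_info * r_info + w_coh * r_coh
```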
As computer vision algorithms move from passively analyzing pixels to actively reasoning over semantics, the breadth of information over which algorithms must reason has expanded significantly. One of the key challenges here is the ability to identify the information required to make a decision, and to select actions that will recover this information. We propose a reinforcement learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows and what it needs to know towards achieving its goal. Potential actions are then generated according to particles sampled from this distribution. For each potential action, a distribution over expected answers is computed, and the value of the information gained is assessed against the existing internal information. We demonstrate this approach applied to two vision-language problems that have attracted recent interest: visual dialog and visual query generation. In both cases, the method actively selects the actions that will best reduce its internal uncertainty, and outperforms its competitors in achieving the goals of the challenge.
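A toy, discrete rendering of the value-of-information idea described above: the paper works with sampled particles and learned answer models, whereas this sketch assumes an explicit belief dictionary and deterministic answers, purely to show how a question can be scored by its expected reduction in uncertainty.

```python
import math
from collections import Counter

def entropy(belief: dict) -> float:
    """Shannon entropy of a discrete belief {hypothesis: probability}."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def expected_info_gain(belief: dict, question, answer_given) -> float:
    """Expected entropy reduction from asking `question`, where
    answer_given(hypothesis, question) is the answer that hypothesis implies."""
    # Group probability mass by the answer each hypothesis would produce.
    answer_mass = Counter()
    for h, p in belief.items():
        answer_mass[answer_given(h, question)] += p
    gain = 0.0
    for ans, p_ans in answer_mass.items():
        posterior = {h: p / p_ans for h, p in belief.items()
                     if answer_given(h, question) == ans}
        gain += p_ans * (entropy(belief) - entropy(posterior))
    return gain

def best_question(belief: dict, candidate_questions, answer_given):
    """Pick the question that most reduces internal uncertainty."""
    return max(candidate_questions,
               key=lambda q: expected_info_gain(belief, q, answer_given))
```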
In this paper, we explore the application of deep neural networks to natural language generation. Specifically, we implement two sequence-to-sequence neural variational models: the variational autoencoder (VAE) and the variational encoder-decoder (VED). VAEs for text generation are difficult to train because the Kullback-Leibler (KL) divergence term of the loss function tends to vanish to zero. We successfully train VAEs by implementing optimization heuristics such as KL weight annealing and word dropout. We also demonstrate the effectiveness of the resulting continuous latent space through random sampling, linear interpolation, and sampling from the neighborhood of an input. We argue that an inappropriately designed VAE may introduce bypassing connections that cause the latent space to be ignored during training. We show experimentally, using decoder hidden-state initialization as an example, that such a bypassing connection degrades the VAE into a deterministic model, thereby reducing the diversity of the generated sentences. We find that the traditional attention mechanism used in sequence-to-sequence VED models acts as a bypassing connection, thereby deteriorating the model's latent space. To avoid this problem, we propose a variational attention mechanism in which the attention context vector is modeled as a random variable that can be sampled from a distribution. We show empirically, using automatic evaluation metrics (entropy and distinct measures), that our variational attention model produces more diverse output sentences than the deterministic attention model. A qualitative analysis via a human evaluation study demonstrates that the sentences produced by our model are of high quality and as fluent as those produced by the deterministic attention counterpart.
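The two training heuristics mentioned above, KL weight annealing and word dropout, are straightforward to implement; a minimal sketch, with the schedule shape, warmup length, and dropout rate chosen arbitrarily:

```python
import random

def kl_weight(step: int, warmup_steps: int = 10000) -> float:
    """Linearly anneal the KL term's weight from 0 to 1 so the decoder
    cannot ignore the latent space early in training."""
    return min(1.0, step / warmup_steps)

def word_dropout(tokens, unk_token="<unk>", keep_prob=0.75):
    """Randomly replace decoder input tokens with <unk> so the decoder
    must rely on the latent vector rather than on teacher-forced words."""
    return [t if random.random() < keep_prob else unk_token for t in tokens]

# Training-loop sketch: loss = reconstruction + kl_weight(step) * kl_divergence
```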
Recently, several deep-learning-based models have been proposed for end-to-end learning of dialogs. While these models can be trained from data without any additional annotations, it is hard to interpret them. On the other hand, there exist traditional state-based dialog systems, where the states of the dialog are discrete and hence easy to interpret. However, these states need to be hand-crafted and annotated in the data. To achieve the best of both worlds, we propose the Latent State Tracking Network (LSTN), with which we learn an interpretable model in an unsupervised manner. The model defines a discrete latent variable at every turn of the dialog, which can take a finite set of values. Since these discrete variables are not present in the training data, we use the EM algorithm to train our model in an unsupervised manner. In experiments, we show that the LSTN helps achieve interpretability in the dialog model without compromising performance relative to end-to-end approaches.
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other's reward functions must reach an agreement (or a deal) via natural language dialogue. For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance. Our code and dataset are publicly available.
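A sketch of the dialogue-rollout idea: before committing to a candidate message, simulate several possible completions of the negotiation and pick the candidate with the highest expected final reward. The simulator and scoring interfaces below are hypothetical placeholders, not the authors' API.

```python
def choose_message(candidates, simulate_completion, score_outcome, n_rollouts=10):
    """Pick the candidate utterance whose simulated continuations yield the
    highest average end-of-dialogue reward (hypothetical interface)."""
    best, best_value = None, float("-inf")
    for cand in candidates:
        # Roll the conversation forward to completion several times and
        # average the final agreement's score for this agent.
        value = sum(score_outcome(simulate_completion(cand))
                    for _ in range(n_rollouts)) / n_rollouts
        if value > best_value:
            best, best_value = cand, value
    return best
```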
We introduce end-to-end neural-network-based models for simulating users of task-oriented dialogue systems. User simulation in dialogue systems is crucial from two different perspectives: (i) automatic evaluation of different dialogue models, and (ii) training task-oriented dialogue systems. We design a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed-length representations using Recurrent Neural Networks (RNNs). It then encodes the dialogue history using another RNN layer. At each turn, the user response is decoded from the hidden representation of the dialogue-level RNN. This hierarchical user simulator (HUS) approach allows the model to capture undiscovered parts of the user goal without requiring explicit dialogue state tracking. We further develop several variants by utilizing a latent variable model to inject random variation into user responses, promoting diversity in the simulated user responses, and a novel goal regularization mechanism to penalize divergence of user responses from the initial user goal. We evaluate the proposed models on a movie ticket booking domain by systematically interacting each user simulator with various dialogue system policies trained with different objectives and users.
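A rough sketch of the hierarchical encoding described above, assuming PyTorch GRUs: a turn-level RNN encodes the goal and each system turn, a dialogue-level cell consumes those encodings, and the user turn is decoded from its hidden state. All module names, sizes, and the single-turn simplification are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalUserSimulator(nn.Module):
    """Illustrative skeleton of a hierarchical user simulator (HUS-style)."""

    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.turn_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)     # encodes goal / each system turn
        self.dialog_rnn = nn.GRUCell(hid_dim, hid_dim)                 # tracks the dialogue so far
        self.decoder_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)  # generates the user turn
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, goal_ids, system_turn_ids, user_turn_ids):
        # Encode the user goal and the latest system turn to fixed-length vectors.
        _, goal_h = self.turn_rnn(self.embed(goal_ids))
        _, sys_h = self.turn_rnn(self.embed(system_turn_ids))
        # Dialogue-level state: initialize from the goal, update with the system turn.
        dialog_state = self.dialog_rnn(sys_h.squeeze(0), goal_h.squeeze(0))
        # Decode the user response conditioned on the dialogue-level state.
        dec_out, _ = self.decoder_rnn(self.embed(user_turn_ids),
                                      dialog_state.unsqueeze(0))
        return self.out(dec_out)  # [batch, T, vocab] logits for the user turn
```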
Generative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.
This paper introduces zero-shot dialog generation (ZSDG), as a step towards neural dialog systems that can instantly generalize to new situations with minimal data. ZSDG enables an end-to-end generative dialog system to generalize to a new domain for which only a domain description is provided and no training dialogs are available. Then a novel learning framework, Action Matching, is proposed. This algorithm can learn a cross-domain embedding space that models the semantics of dialog responses, which in turn lets a neural dialog generation model generalize to new domains. We evaluate our methods on a new synthetic dialog dataset and an existing human-human dialog dataset. Results show that our method has superior performance in learning dialog models that rapidly adapt their behavior to new domains and suggests promising future research.
Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear retrieval application. Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
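A minimal sketch of the rank-based reward signal described above: after each dialogue turn the retriever re-ranks the catalogue, and the turn is rewarded by how much the target image's rank improved. The exact reward in the paper may be defined differently; this is an assumption for illustration.

```python
def rank_of_target(scores, target_index):
    """Rank of the target image (1 = best) under the current similarity scores."""
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)

def turn_reward(scores_before, scores_after, target_index):
    """Reward a dialogue turn by the improvement in the target image's rank."""
    return (rank_of_target(scores_before, target_index)
            - rank_of_target(scores_after, target_index))
```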
Sequence-to-sequence (seq2seq) models have become very popular for building end-to-end trainable dialogue systems. Though efficient at learning the backbone of human-machine communication, they still suffer from the problem of strongly favoring short, generic responses. In this paper, we argue that a good response should smoothly connect the preceding dialogue history and the following conversation. We strengthen this connection through mutual information maximization. To sidestep the non-differentiability of discrete natural language tokens, we introduce an auxiliary continuous code space and map this code space to a learnable prior distribution for generation purposes. Experiments on two dialogue datasets validate the effectiveness of our model: the generated responses are closely related to the dialogue context and lead to more interactive conversations.
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue, as well as its grounded nature, which makes the context of a dialogue larger than the sole history. This is why only chitchat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
We present ConvLab, an open-source multi-domain end-to-end dialogue system platform that enables researchers to quickly set up experiments with reusable components and to compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in a common environment. ConvLab offers a set of fully annotated datasets and associated pre-trained reference models. As a showcase, we extend the MultiWOZ dataset with user dialogue act annotations to train all component models and demonstrate how ConvLab makes it easy to conduct complex experiments in a multi-domain end-to-end dialogue setting.
Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. We collect a large corpus of Twitter conversations that include emojis in the response and assume the emojis convey the underlying emotions of the sentence. We investigate several conditional variational autoencoders trained on these conversations, which allow us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions.
Many vision-language tasks can be reduced to the problem of sequence prediction with natural language output. In particular, recent advances in image captioning use deep reinforcement learning (RL) to alleviate the "exposure bias" that arises during training: ground-truth subsequences are exposed at every prediction step, which introduces a bias at test time when only the predicted subsequence is available. However, existing RL-based image captioning methods focus only on the language policy and not on the visual policy (e.g., visual attention), and thus struggle to capture the visual context that is crucial for compositional reasoning such as visual relationships (e.g., "man riding horse") and comparisons (e.g., "smaller cat"). To fill this gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning. At each time step, CAVP explicitly takes the previous visual attention as context and then decides whether the context is helpful for the current word generation, given the current visual attention. Compared with traditional visual attention, which fixes on only a single image region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model, CAVP together with a subsequent language policy network, can be efficiently optimized end-to-end using an actor-critic policy gradient method with respect to any caption evaluation metric. We demonstrate the effectiveness of CAVP with state-of-the-art performance on the MS-COCO offline split and online server, using various metrics and sensible visualizations of the qualitative visual context. The code is available at https://github.com/daqingliu/CAVP
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict the lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to solve the task directly, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher-level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
In a dialogue, there can be multiple valid next utterances at any point. Current end-to-end neural approaches to dialogue do not take this into account; they assume that at any point there is only one correct next utterance. In this work, we focus on this problem in goal-oriented dialogue, where there are different paths to reach the goal. We propose a new method that uses a combination of supervised learning and reinforcement learning to address this issue. We also propose a new and more effective testbed, permuted-bAbI dialog tasks, created by introducing multiple valid next utterances into the original bAbI dialog tasks, which allows goal-oriented dialogue systems to be evaluated in a more realistic setting. We show that the performance of existing end-to-end neural approaches drops significantly, from a per-dialog accuracy of 81.5% on the original bAbI dialog tasks to 30.3% on permuted-bAbI dialog tasks. We also show that our proposed method improves performance and achieves a per-dialog accuracy of 47.3% on permuted-bAbI dialog tasks.
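A rough sketch of combining a supervised objective with a reinforcement objective when several next utterances are valid, assuming a candidate-ranking formulation in PyTorch; the interpolation weight, tensor shapes, and interfaces are hypothetical, not the paper's exact method.

```python
import torch

def hybrid_loss(log_probs, supervised_targets, sampled_actions, rewards, alpha=0.5):
    """Hypothetical mix of supervised likelihood on one reference utterance and
    REINFORCE on sampled utterances that also reach the goal.

    log_probs: [batch, n_candidates] log-probabilities over candidate utterances.
    supervised_targets, sampled_actions: [batch] long indices; rewards: [batch] floats.
    """
    # Supervised term: maximize likelihood of the annotated next utterance.
    sl = -log_probs.gather(1, supervised_targets.unsqueeze(1)).mean()
    # RL term: reinforce sampled utterances in proportion to their task reward,
    # so alternative valid continuations are not penalized.
    rl = -(log_probs.gather(1, sampled_actions.unsqueeze(1)).squeeze(1) * rewards).mean()
    return alpha * sl + (1.0 - alpha) * rl
```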