The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog model for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either auto-encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation.
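Discrete latent variables such as DI-VAE's are commonly trained end-to-end with the Gumbel-Softmax reparameterization. A minimal plain-Python sketch of drawing one relaxed categorical sample is below; the logits and temperature are illustrative values, not from the paper:

```python
import math
import random

def sample_gumbel():
    """One Gumbel(0, 1) sample, with the uniform draw clamped away
    from 0 and 1 so the logarithms stay finite."""
    u = min(max(random.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def gumbel_softmax(logits, temperature=1.0):
    """Relaxed sample from a categorical distribution given
    unnormalized logits: add Gumbel noise, divide by the
    temperature, then apply a numerically stable softmax."""
    scores = [(l + sample_gumbel()) / temperature for l in logits]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# At low temperature the relaxed sample approaches a one-hot vector,
# so its argmax can be read off as the discrete latent action.
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.1)
```

Because the sample is a differentiable function of the logits, gradients can flow through the discrete choice during training.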
While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at the word level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.
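The bag-of-word loss mentioned above asks the latent variable to predict every word of the response independently of order. A hedged plain-Python sketch follows; the vocabulary and probabilities are toy values, and in the real model the word distribution comes from network logits:

```python
import math

def bow_loss(word_probs, response_words):
    """Bag-of-words auxiliary loss: negative log-likelihood of each
    word in the response under an order-independent distribution
    predicted from the latent variable."""
    return -sum(math.log(word_probs[w]) for w in response_words)

# Toy example: a predictor that puts most of its mass on the words
# actually used in the response incurs a lower (better) loss than a
# flat distribution over the same vocabulary.
probs_good = {"see": 0.4, "you": 0.4, "later": 0.15, "taxes": 0.05}
probs_flat = {w: 0.25 for w in probs_good}
target = ["see", "you", "later"]
```

Adding this term to the variational lower bound pressures the latent variable to carry response content, counteracting the vanishing-KL problem.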
Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning that is not scalable, or construct deterministic models for learning dialogue sentences that fail to capture natural conversational variability. In this paper, we propose a Latent Intention Dialogue Model (LIDM) that employs a discrete latent variable to learn underlying dialogue intentions in the framework of neural variational inference. In a goal-oriented dialogue scenario, these latent intentions can be interpreted as actions guiding the generation of machine responses, which can be further refined autonomously by reinforcement learning. The experimental evaluation of LIDM shows that the model outperforms published benchmarks for both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues.
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be short-sighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need that led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity, length, and with human judges, showing that the proposed algorithm generates more interactive responses and manages to sustain conversations longer in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
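The policy-gradient signal described above is, in essence, a weighted combination of the three conversational rewards, with a baseline subtracted before scaling the log-likelihood gradient. A sketch under assumed, illustrative weights (the paper's actual reward terms involve seq2seq likelihoods and embedding similarities):

```python
def combined_reward(informativity, coherence, ease, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of the three per-sequence conversational rewards.
    The equal weights here are illustrative, not the paper's values."""
    w1, w2, w3 = weights
    return w1 * informativity + w2 * coherence + w3 * ease

def reinforce_weights(rewards, baseline=None):
    """REINFORCE-style multipliers for the log-likelihood gradient:
    each episode's reward minus a baseline (here the batch mean),
    which reduces variance without biasing the gradient."""
    if baseline is None:
        baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]
```

With a mean baseline the multipliers sum to zero over a batch, so above-average dialogues are reinforced and below-average ones suppressed.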
This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strengths of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.
We present an end-to-end neural-network-based model for simulating users of task-oriented dialogue systems. User simulation in dialogue systems is crucial from two different perspectives: (i) automatic evaluation of different dialogue models, and (ii) training task-oriented dialogue systems. We design a hierarchical sequence-to-sequence model that first encodes the initial user goal and system turns into fixed-length representations using Recurrent Neural Networks (RNNs). It then encodes the dialogue history using another RNN layer. At each turn, user responses are decoded from the hidden representation of the dialogue-level RNN. This hierarchical user simulator (HUS) approach allows the model to capture undiscovered parts of the user goal without requiring explicit dialogue state tracking. We further develop several variants: latent variable models that inject random variation into user responses to promote diversity of the simulated user responses, and a novel goal regularization mechanism that penalizes divergence of user responses from the initial user goal. We evaluate the proposed models on a movie ticket booking domain by systematically interacting each user simulator with various dialogue system policies trained with different objectives and users.
As computer vision algorithms move from passive analysis of pixels to active reasoning over semantics, the breadth of information the algorithms need to reason over has expanded significantly. One of the key challenges here is the ability to identify the information required to make a decision, and to select actions that will recover this information. We propose a reinforcement learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal. Potential actions are then generated according to particles sampled from this distribution. For each potential action, a distribution over the expected answers is calculated, and the value of the information to be gained is weighed against the existing internal information. We demonstrate this approach applied to two vision-language problems that have attracted significant recent interest: visual dialogue and visual query generation. In both cases, the method actively selects the actions that will best reduce its internal uncertainty, and outperforms its competitors in achieving the challenge goals.
In this paper, we explore the use of deep neural networks for natural language generation. Specifically, we implement two sequence-to-sequence neural variational models: a variational autoencoder (VAE) and a variational encoder-decoder (VED). VAEs for text generation are difficult to train due to issues with the Kullback-Leibler (KL) divergence term of the loss function vanishing to zero. We successfully train VAEs by implementing optimization heuristics such as KL weight annealing and word dropout. We also demonstrate the effectiveness of the resulting continuous latent space through random sampling, linear interpolation, and sampling from the neighborhood of an input. We argue that an inappropriately designed VAE may exhibit bypassing connections, causing the latent space to be ignored during training. We demonstrate this experimentally with the example of decoder hidden state initialization, where such a bypassing connection degenerates the VAE into a deterministic model, reducing the diversity of generated sentences. We find that the traditional attention mechanism used in sequence-to-sequence VED models serves as a bypassing connection, degrading the model's latent space. To avoid this problem, we propose a variational attention mechanism, in which the attention context vector is modeled as a random variable that can be sampled from a distribution. We show empirically, using automatic evaluation metrics (namely entropy and distinct measures), that our variational attention model generates more diverse output sentences than the deterministic attention model. A qualitative analysis via a human evaluation study demonstrates that our model simultaneously produces sentences of high quality, as fluent as those produced by its deterministic attention counterpart.
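The two optimization heuristics named above, KL weight annealing and word dropout, fit in a few lines of Python. The warmup length, dropout rate, and token name below are illustrative choices, not values from the paper:

```python
import random

def kl_weight(step, warmup_steps=10000):
    """Linear KL annealing: the KL term's weight ramps from 0 to 1
    over warmup_steps, so the decoder first learns to reconstruct
    before the posterior is pulled toward the prior.
    Used as: loss = reconstruction_loss + kl_weight(step) * kl."""
    return min(1.0, step / warmup_steps)

def word_dropout(tokens, rate=0.3, unk="<unk>"):
    """Word dropout: replace decoder-input tokens with <unk> at the
    given rate, forcing the decoder to rely on the latent code
    rather than only on the ground-truth prefix."""
    return [unk if random.random() < rate else t for t in tokens]
```

Both heuristics weaken the decoder's shortcut paths, which is exactly the bypassing-connection problem the abstract discusses.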
This paper introduces zero-shot dialog generation (ZSDG), as a step towards neural dialog systems that can instantly generalize to new situations with minimal data. ZSDG enables an end-to-end generative dialog system to generalize to a new domain for which only a domain description is provided and no training dialogs are available. A novel learning framework, Action Matching, is then proposed. This algorithm can learn a cross-domain embedding space that models the semantics of dialog responses, which in turn lets a neural dialog generation model generalize to new domains. We evaluate our methods on a new synthetic dialog dataset and an existing human-human dialog dataset. Results show that our method has superior performance in learning dialog models that rapidly adapt their behavior to new domains, and suggests promising future research.
Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear retrieval application. Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
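The per-turn reward described above, improving the rank of the target image, can be sketched as follows. The scoring interface and names are illustrative, not the paper's exact formulation:

```python
def rank_of_target(scores, target):
    """1-based rank of the target image under the retriever's
    current similarity scores (higher score = better match)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(target) + 1

def turn_reward(scores_before, scores_after, target):
    """Per-turn reward: how many positions the target image's rank
    improved after incorporating the user's language feedback."""
    return (rank_of_target(scores_before, target)
            - rank_of_target(scores_after, target))
```

A positive reward means the feedback moved the target image up the ranking; a turn that hurts retrieval yields a negative reward.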
Generative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other's reward functions must reach an agreement (or a deal) via natural language dialogue. For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance. Our code and dataset are publicly available.
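Dialogue rollouts can be sketched as scoring each candidate utterance by the average payoff of simulated continuations that begin with it. The simulator interface below is an assumption for illustration; in the paper the continuations come from the model's own generator:

```python
def rollout_score(candidate, simulate_continuation, n_rollouts=10):
    """Average the final reward of n simulated completions of the
    dialogue that start with this candidate utterance."""
    total = 0.0
    for _ in range(n_rollouts):
        total += simulate_continuation(candidate)
    return total / n_rollouts

def choose_utterance(candidates, simulate_continuation, n_rollouts=10):
    """Dialogue rollouts: plan ahead by picking the candidate whose
    simulated continuations yield the highest expected reward."""
    return max(candidates,
               key=lambda c: rollout_score(c, simulate_continuation,
                                           n_rollouts))
```

This is a one-step lookahead over full simulated games, analogous to rollout policies in game-playing agents.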
Recently, several deep-learning-based models have been proposed for end-to-end learning of dialogue. While these models can be trained from data without any additional annotations, it is hard to interpret them. On the other hand, there exist traditional state-based dialogue systems, where the state of the dialogue is discrete and hence easy to interpret; however, these states need to be hand-crafted and annotated in the data. To achieve the best of both worlds, we propose the Latent State Tracking Network (LSTN), with which an interpretable model is learned in an unsupervised manner. The model defines a discrete latent variable at each turn of the dialogue, which can take a finite set of values. Since these discrete variables are not present in the training data, we use the EM algorithm to train our model in an unsupervised fashion. In experiments, we show that the LSTN can help achieve interpretability in dialogue models without much degradation in performance compared to end-to-end approaches.
In dialogue, there can be multiple valid next utterances at any point. Current end-to-end neural approaches to dialogue do not take this into account; they assume that there is only one correct next utterance at any time. In this work, we focus on this problem in goal-oriented dialogue, where there are different paths to reach the goal. We propose a new method that uses a combination of supervised learning and reinforcement learning approaches to address this issue. We also propose a new and more effective testbed, permuted-bAbI dialog tasks, created by introducing multiple valid next utterances into the original bAbI dialog tasks, which allows evaluation of goal-oriented dialogue systems in a more realistic setting. We show that the performance of existing end-to-end neural approaches drops significantly, from 81.5% per-dialog accuracy on the original bAbI dialog tasks to 30.3% on permuted-bAbI dialog tasks. We also show that our proposed method improves performance, achieving 47.3% per-dialog accuracy on permuted-bAbI dialog tasks.
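The per-dialog accuracy metric with multiple valid next utterances can be sketched as below; this is an illustrative reading of the evaluation (a dialog counts as correct only if every predicted turn falls in that turn's set of valid utterances), not the exact implementation:

```python
def per_dialog_accuracy(dialogs):
    """Fraction of dialogs in which every predicted turn is among
    that turn's valid next utterances. Each dialog is a list of
    (prediction, set_of_valid_utterances) pairs."""
    correct = 0
    for turns in dialogs:
        if all(pred in valid for pred, valid in turns):
            correct += 1
    return correct / len(dialogs)
```

One wrong turn anywhere fails the whole dialog, which is why per-dialog accuracy is such a strict metric in these tasks.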
Sequence-to-sequence (seq2seq) models have become very popular for building end-to-end trainable dialogue systems. Though highly efficient as a backbone for studying human-machine communication, they still suffer from a strong tendency to favor short, generic responses. In this paper, we argue that a good response should smoothly connect the preceding dialogue history with the subsequent dialogue. We strengthen this connection through mutual information maximization. To sidestep the non-differentiability of discrete natural-language tokens, we introduce an auxiliary continuous code space and map this code space onto a learnable prior distribution for generation purposes. Experiments on two dialogue datasets validate the effectiveness of our model, whose generated responses relate closely to the dialogue context and lead to more interactive conversations.
We consider negotiation settings in which two agents use natural language to bargain over goods. Agents need to decide on both a high-level strategy (e.g., proposing $50) and the execution of that strategy (e.g., generating "Brand new bike. Selling for just $50."). Recent work trains neural models for negotiation, but their end-to-end nature makes it hard to control the former, and reinforcement learning tends to lead to degenerate solutions. In this paper, we propose a modular approach based on coarse dialogue acts (e.g., propose(price=50)) that decouples strategy and generation. We show that we can flexibly set the strategy using supervised learning, reinforcement learning, or domain-specific knowledge without degeneracy, while our retrieval-based generation maintains context awareness and produces diverse utterances. We test our approach on the recently proposed DEALORNODEAL game, and we also collect a richer dataset based on real items on Craigslist. Human evaluation shows that, compared with previous approaches, our system achieves higher task success rates and more human-like negotiation behavior.
Existing research on vision-and-language grounding for robot navigation focuses on improving model-free deep reinforcement learning (DRL) models in synthetic environments. However, model-free DRL models do not consider the dynamics of real-world environments, and they often fail to generalize to new scenes. In this paper, we take a radical approach to bridging the gap between synthetic studies and real-world practice: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task. Our look-ahead module tightly integrates a look-ahead policy model with an environment model that predicts the next state and the reward. Experimental results suggest that our proposed method significantly outperforms the baselines and achieves the best result on the real Room-to-Room dataset. Moreover, our scalable method is more easily transferable to unseen environments.
Vision-and-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges of this task: cross-modal grounding, ill-posed feedback, and generalization. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). In particular, a matching critic is used to provide an intrinsic reward that encourages global matching between instructions and trajectories, and a reasoning navigator is employed to perform cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms existing methods by 10% on SPL and achieves the new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating the agent's own past, good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which tremendously reduces the success-rate gap between seen and unseen environments (from 30.7% to 11.7%).
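SPL, the metric cited above, is Success weighted by (normalized inverse) Path Length: the average over episodes of success multiplied by the ratio of the shortest-path length to the length actually traveled. A straightforward implementation of the standard definition:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length: mean over episodes of
    success_i * shortest_i / max(taken_i, shortest_i).
    successes are 0/1 flags; lengths are in the same units."""
    terms = [s * (l / max(p, l))
             for s, l, p in zip(successes, shortest_lengths,
                                path_lengths)]
    return sum(terms) / len(terms)
```

SPL penalizes agents that reach the goal by wandering: a successful episode whose path is twice the shortest path contributes only 0.5.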
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to render the intrinsic planning problem inherent to dialogue, as well as its grounded nature, which makes the context of a dialogue larger than the sole history. This is why only chitchat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. HCNs attain state-of-the-art performance on the bAbI dialog dataset (Bordes and Weston, 2016), and outperform two commercially deployed customer-facing dialog systems.
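The way domain-specific software constrains the RNN in an HCN can be sketched as an action mask over template actions: the developer's code marks which actions are currently permitted, and the RNN's scores are only compared within that set. The interface below is illustrative, not the paper's API:

```python
def select_action(rnn_scores, action_mask):
    """HCN-style action selection sketch: domain software supplies a
    binary mask of currently permitted template actions; scores of
    masked-out actions are set to -inf and the best remaining
    action index is returned."""
    masked = [s if m else float("-inf")
              for s, m in zip(rnn_scores, action_mask)]
    best = max(range(len(masked)), key=masked.__getitem__)
    if masked[best] == float("-inf"):
        raise ValueError("no action permitted by the mask")
    return best
```

Encoding domain knowledge this way is what lets HCNs learn from far fewer dialogs: the RNN never has to discover constraints the developer already knows.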