Diverse data formats and ontologies of task-oriented dialogue (TOD) datasets hinder us from developing general dialogue models that perform well on many datasets and studying knowledge transfer between datasets. To address this issue, we present ConvLab-3, a flexible dialogue system toolkit based on a unified TOD data format. In ConvLab-3, different datasets are transformed into one unified format and loaded by models in the same way. As a result, the cost of adapting a new model or dataset is significantly reduced. Compared to the previous releases of ConvLab (Lee et al., 2019b; Zhu et al., 2020b), ConvLab-3 allows developing dialogue systems with much more datasets and enhances the utility of the reinforcement learning (RL) toolkit for dialogue policies. To showcase the use of ConvLab-3 and inspire future work, we present a comprehensive study with various settings. We show the benefit of pre-training on other datasets for few-shot fine-tuning and RL, and encourage evaluating policy with diverse user simulators.
translated by 谷歌翻译
用户模拟器(USS)通常用于通过增强学习训练面向任务的对话系统(DSS)。相互作用通常是在语义层面上以提高效率的,但是从语义动作到自然语言仍然存在差距,这会导致培训和部署环境之间的不匹配。在培训期间,将自然语言生成(NLG)模块与USS结合在一起可以部分解决此问题。但是,由于US的策略和NLG是单独优化的,因此在给定的情况下,这些模拟的用户话语可能不够自然。在这项工作中,我们提出了一个基于生成变压器的用户模拟器(Gentus)。 Gentus由编码器结构组成,这意味着它可以共同优化用户策略和自然语言。 Gentus既产生语义动作又产生自然语言话语,从而保留了解释性和增强语言的变化。另外,通过将输入和输出表示为单词序列以及使用大型的预训练语言模型,我们可以在功能表示中实现普遍性。我们通过自动指标和人类评估评估绅士。我们的结果表明,绅士会产生更多的自然语言,并能够以零拍的方式转移到看不见的本体论中。此外,通过加强学习为培训专业用户模拟器打开大门,可以进一步塑造其行为。
translated by 谷歌翻译
在与用户进行交流时,以任务为导向的对话系统必须根据对话历史记录在每个回合时跟踪用户的需求。这个称为对话状态跟踪(DST)的过程至关重要,因为它直接告知下游对话政策。近年来,DST引起了很大的兴趣,文本到文本范式作为受欢迎的方法。在本评论论文中,我们首先介绍任务及其相关的数据集。然后,考虑到最近出版的大量出版物,我们确定了2021 - 2022年研究的重点和研究进展。尽管神经方法已经取得了重大进展,但我们认为对话系统(例如概括性)的某些关键方面仍未得到充实。为了激励未来的研究,我们提出了几种研究途径。
translated by 谷歌翻译
以任务为导向的对话系统(TDSS)主要在离线设置或人类评估中评估。评估通常仅限于单转或非常耗时。作为替代方案,模拟用户行为的用户模拟器使我们能够考虑一组广泛的用户目标,以生成类似人类的对话以进行模拟评估。使用现有的用户模拟器来评估TDSS是具有挑战性的,因为用户模拟器主要旨在优化TDSS的对话策略,并且评估功能有限。此外,对用户模拟器的评估是一个开放的挑战。在这项工作中,我们提出了一个用于端到端TDS评估的隐喻用户模拟器,如果它在与系统的交互中模拟用户的类似思维,则定义模拟器是隐喻的。我们还提出了一个基于测试人员的评估框架,以生成变体,即具有不同功能的对话系统。我们的用户模拟器构建了一个隐喻的用户模型,该模型通过参考遇到新项目时的先验知识来帮助模拟器进行推理。我们通过检查模拟器与变体之间的模拟相互作用来估计模拟器的质量。我们的实验是使用三个TDS数据集进行的。与基于议程的模拟器和三个数据集上的SEQ2SEQ模型相比,隐喻用户模拟器与手动评估的一致性更好。我们的测试人员框架展示了效率,并且可以更好地概括和可扩展性,因为它可以适用于多个域中的对话和多个任务,例如对话建议和电子商务对话。
translated by 谷歌翻译
许多研究提出了通过使用强化学习在系统中的共同训练模块来优化整个管道任务对话系统对话性能的方法。但是,这些方法受到限制,因为它们只能应用于使用可训练的神经方法实施的模块。为了解决此问题,我们提出了一种方法,以优化由使用任意方法进行对话性能的模块组成的管道系统。使用我们的方法,在此系统中安装了称为后处理网络(PPN)的基于神经的组件(PPN),以后处理每个模块的输出。所有PPN均已更新,以通过使用强化学习来提高系统的整体对话性能,而不必每个模块可区分。通过对MultiWoz数据集的对话模拟和人类评估,我们表明我们的方法可以改善由各种模块组成的管道系统的对话性能。
translated by 谷歌翻译
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
translated by 谷歌翻译
在本文中,我们建议将面向任务导向的对话系统作为纯粹的自然语言生成任务,以便充分利用像GPT-2这样的大规模预训练模型,并简化了复杂的光学化预备。然而,直接应用这种方法严重遭受了通过删除了替代令牌而导致的对话实体不一致,以及在微调期间灾害模型的灾难性遗忘问题,导致表现不令人满意。为了缓解这些问题,我们设计了一种新颖的GPT-Adapter-CopyNet网络,它将轻量级适配器和CopyNet模块包含到GPT-2中,以实现转移学习和对话实体生成的更好性能。在DSTC8轨道1基准和多种数据集上进行的实验结果表明,我们的建议方法显着优于基线模型,在自动和人类评估中具有显着性能。
translated by 谷歌翻译
Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.
translated by 谷歌翻译
如何有效地构建和使用对话数据,以及如何在不同域中在不同域中部署模型可能是建立面向任务的对话系统的两个关键问题。在本文中,我们提出了一种新颖的手动指导对话方案,以减轻这些问题,在该方案中,代理商从对话和手册中学习任务。该手册是一个非结构化的文本文档,可指导代理在对话过程中与用户和数据库进行交互。我们提出的方案降低了对话模型对细粒领域本体的依赖性,并使它们更灵活以适应各种领域。然后,我们为完全注销的多域数据集Magdial贡献以支持我们的方案。它介绍了三个对话建模子任务:指令匹配,参数填充和响应生成。对这些子任务进行建模与人类代理的行为模式一致。实验表明,手动引导对话方案提高了构建对话系统中的数据效率和域可伸缩性。数据集和基准将公开用于促进未来的研究。
translated by 谷歌翻译
本文介绍了端到端以任务为导向的对话(TOD)的本体学预验证的语言模型(OPAL)。与Chit-Chat对话模型不同,面向任务的对话模型至少满足两个特定于任务的模块:对话状态跟踪器(DST)和响应生成器(RG)。对话状态由域插槽值三元组成,它们被认为是用户搜索与域相关数据库的约束。带有带注释的对话状态的大规模面向任务的对话数据通常是无法访问的。它可以防止针对任务对话的审慎语言模型的开发。我们提出了一种简单而有效的预处理方法来减轻此问题,该方法由两个预审进阶段组成。第一阶段是在大规模上下文文本数据上预处理,其中文本的结构化信息是由信息提取工具提取的。为了弥合训练方法和下游任务之间的差距,我们设计了两个预训练的任务:类似于本体的三重恢复和下一文本生成,分别模拟了DST和RG。第二阶段是在TOD数据上微调验证的模型。实验结果表明,即使没有CAMREST676和MULTIWOZ基准的任何TOD数据,我们提出的方法即使没有任何TOD数据,我们提出的方法也可以提高竞争性能。
translated by 谷歌翻译
当在现实世界中以任务为导向的对话系统中实现自然语言(NLG)组件时,不仅需要在训练数据上学习的自然话语,而且还需要适应对话环境(例如,环境中的噪音)听起来)和用户(例如,理解能力水平较低的用户)。受到语言生成任务的强化学习(RL)的最新进展的启发,我们提出了Antor,这是一种通过强化学习来适应以任务为导向对话的自然语言生成的方法。在Antor中,与用户对系统话语的理解相对应的自然语言理解(NLU)模块已纳入RL的目标函数中。如果将NLG的意图正确传达给了NLU,该意图理解了系统的话语,则NLG将获得积极的回报。我们在Multiwoz数据集上进行了实验,并确认Antor可以对语音识别错误和用户的不同词汇水平产生适应性话语。
translated by 谷歌翻译
面向任务导向的对话系统已经受到获得大规模和高质量的注释对话的困难困扰。此外,大多数公开的数据集仅包括书面对话,这不足以反映实际口头对话系统中的实际人类行为。在本文中,我们提出了面向任务的对话数据增强(TOD-DA),这是一种新型模型 - 不可知的数据增强范例,以提高面向任务对话建模的鲁棒性。 TOD-DA由两个模块组成:1)对话丰富,以扩展关于易于执行数据稀疏性的任务对话的培训数据,用于宽松数据稀疏性和2)口语对话模拟器,以模仿各种粒度的口语样式表达和语音识别错误,以弥合书面之间的差距和口头对话。通过这样的设计,我们的方法在DSTC10 Track2的两个任务中排名第一,这是针对口语对话的任务对话建模的基准,展示了我们提出的TOD-DA的优势和有效性。
translated by 谷歌翻译
与具有粗粒度信息的Crosswoz(中文)和多发性(英文)数据集相比,没有数据集,可以正确处理细粒度和分层级别信息。在本文中,我们在香港发布了一份粤语知识驱动的对话数据集(KDDRES),将多转谈话中的信息放在一个特定的餐厅。我们的语料库包含0.8k次谈话,它来自10家餐厅,提供不同地区的各种风格。除此之外,我们还设计了细粒度的插槽和意图,以更好地捕获语义信息。基准实验和数据统计分析显示了我们数据集的多样性和丰富的注释。我们认为,KDDRE的出版可以是当前对话数据集的必要补充,以及社会中小企业(中小企业)更适合和更有价值,如为每家餐馆建立定制的对话系统。语料库和基准模型是公开可用的。
translated by 谷歌翻译
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there currently lacks an accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels of noise and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
translated by 谷歌翻译
We introduce BotSIM, a modular, open-source Bot SIMulation environment with dialog generation, user simulation and conversation analytics capabilities. BotSIM aims to serve as a one-stop solution for large-scale data-efficient end-to-end evaluation, diagnosis and remediation of commercial task-oriented dialog (TOD) systems to significantly accelerate commercial bot development and evaluation, reduce cost and time-to-market. BotSIM adopts a layered design comprising the infrastructure layer, the adaptor layer and the application layer. The infrastructure layer hosts key models and components to support BotSIM's major functionalities via a streamlined "generation-simulation-remediation" pipeline. The adaptor layer is used to extend BotSIM to accommodate new bot platforms. The application layer provides a suite of command line tools and a Web App to significantly lower the entry barrier for BotSIM users such as bot admins or practitioners. In this report, we focus on the technical designs of various system components. A detailed case study using Einstein BotBuilder is also presented to show how to apply BotSIM pipeline for bot evaluation and remediation. The detailed system descriptions can be found in our system demo paper. The toolkit is available at: https://github.com/salesforce/BotSIM .
translated by 谷歌翻译
最近,培训预培训方法在以任务为导向的对话框(TOD)系统中表现出了很大的成功。但是,大多数现有的预培训模型用于TOD专注于对话的理解或对话生成,但并非两者兼而有之。在本文中,我们提出了Space-3,这是一种新型的统一的半监督预培训的预训练的对话模型,从大规模对话CORPORA中学习有限的注释,可以有效地对广泛的下游对话任务进行微调。具体而言,Space-3由单个变压器中的四个连续组件组成,以维护TOD系统中的任务流:(i)对话框编码模块编码对话框历史记录,(ii)对话框理解模块以从任一用户中提取语义向量查询或系统响应,(iii)一个对话框策略模块,以生成包含响应高级语义的策略向量,以及(iv)对话框生成模块以产生适当的响应。我们为每个组件设计一个专门的预训练目标。具体而言,我们预先培训对话框编码模块,使用跨度掩码语言建模,以学习上下文化对话框信息。为了捕获“结构化对话框”语义,我们通过额外的对话注释通过新颖的树诱导的半监视对比度学习目标来预先培训对话框理解模块。此外,我们通过将其输出策略向量与响应响应的语义向量之间的L2距离最小化以进行策略优化,从而预先培训对话策略模块。最后,对话框生成模型由语言建模预先训练。结果表明,Space-3在八个下游对话框基准中实现最新性能,包括意图预测,对话框状态跟踪和端到端对话框建模。我们还表明,在低资源设置下,Space-3比现有模型具有更强的射击能力。
translated by 谷歌翻译
Dialogue State Tracking (DST), a key component of task-oriented conversation systems, represents user intentions by determining the values of pre-defined slots in an ongoing dialogue. Existing approaches use hand-crafted templates and additional slot information to fine-tune and prompt large pre-trained language models and elicit slot values from the dialogue context. Significant manual effort and domain knowledge is required to design effective prompts, limiting the generalizability of these approaches to new domains and tasks. In this work, we propose DiSTRICT, a generalizable in-context tuning approach for DST that retrieves highly relevant training examples for a given dialogue to fine-tune the model without any hand-crafted templates. Experiments with the MultiWOZ benchmark datasets show that DiSTRICT outperforms existing approaches in various zero-shot and few-shot settings using a much smaller model, thereby providing an important advantage for real-world deployments that often have limited resource availability.
translated by 谷歌翻译
最近,通过“向导”模拟游戏收集了一类以任务为导向的对话(TOD)数据集。但是,《巫师》数据实际上是模拟的数据,因此与现实生活中的对话根本不同,这些对话更加嘈杂和随意。最近,Seretod挑战赛是组织的,并发布了Mobilecs数据集,该数据集由来自中国移动的真实用户和客户服务人员之间的真实世界对话框组成。基于Mobilecs数据集,Seretod挑战具有两个任务,不仅评估了对话系统本身的构建,而且还检查了对话框成绩单中的信息提取,这对于建立TOD的知识库至关重要。本文主要介绍了Mobilecs数据集对这两项任务的基线研究。我们介绍了如何构建两个基线,遇到的问题以及结果。我们预计基线可以促进令人兴奋的未来研究,以建立针对现实生活任务的人类机器人对话系统。
translated by 谷歌翻译
Incorporating external knowledge into the response generation process is essential to building more helpful and reliable dialog agents. However, collecting knowledge-grounded conversations is often costly, calling for a better pre-trained model for grounded dialog generation that generalizes well w.r.t. different types of knowledge. In this work, we propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation without relying on extra knowledge annotation. Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords. With these keywords, we construct two kinds of knowledge and pre-train a knowledge-grounded response generation model, aiming at handling two different scenarios: (1) the knowledge should be faithfully grounded; (2) it can be selectively used. For the former, the grounding knowledge consists of keywords extracted from the response. For the latter, the grounding knowledge is additionally augmented with keywords extracted from other utterances in the same dialog. Since the knowledge is extracted from the dialog itself, KPT can be easily performed on a large volume and variety of dialogue data. We considered three data sources (open-domain, task-oriented, conversational QA) with a total of 2.5M dialogues. We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge.
translated by 谷歌翻译
以任务为导向的对话系统通常采用对话状态跟踪器(DST)成功完成对话。最近的最新DST实现依赖于各种服务的模式来改善模型的鲁棒性并处理对新域的零击概括[1],但是这种方法[2,3]通常需要多个大型变压器模型和长时间输入序列以表现良好。我们提出了一个基于多任务BERT的单个模型,该模型共同解决了意图预测的三个DST任务,请求的插槽预测和插槽填充。此外,我们提出了对对话历史和服务模式的高效和简约编码,该编码被证明可以进一步提高性能。对SGD数据集的评估表明,我们的方法的表现优于基线SGP-DST,比最新的方法相比表现良好,同时在计算上的效率更高。进行了广泛的消融研究,以检查我们模型成功的促成因素。
translated by 谷歌翻译