We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities from bot definitions and generate user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) to simulate conversations with the dialog agents; 3) a Remediator to analyze the simulated conversations, visualize the bot health reports and provide actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM's effectiveness in end-to-end evaluation, remediation and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM's "generation-simulation-remediation" paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test cases creation efforts; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLi5iSoly30. We have open-sourced the toolkit at https://github.com/salesforce/botsim
translated by 谷歌翻译
We introduce BotSIM, a modular, open-source Bot SIMulation environment with dialog generation, user simulation and conversation analytics capabilities. BotSIM aims to serve as a one-stop solution for large-scale data-efficient end-to-end evaluation, diagnosis and remediation of commercial task-oriented dialog (TOD) systems to significantly accelerate commercial bot development and evaluation, reduce cost and time-to-market. BotSIM adopts a layered design comprising the infrastructure layer, the adaptor layer and the application layer. The infrastructure layer hosts key models and components to support BotSIM's major functionalities via a streamlined "generation-simulation-remediation" pipeline. The adaptor layer is used to extend BotSIM to accommodate new bot platforms. The application layer provides a suite of command line tools and a Web App to significantly lower the entry barrier for BotSIM users such as bot admins or practitioners. In this report, we focus on the technical designs of various system components. A detailed case study using Einstein BotBuilder is also presented to show how to apply BotSIM pipeline for bot evaluation and remediation. The detailed system descriptions can be found in our system demo paper. The toolkit is available at: https://github.com/salesforce/BotSIM .
translated by 谷歌翻译
用户模拟器(USS)通常用于通过增强学习训练面向任务的对话系统(DSS)。相互作用通常是在语义层面上以提高效率的,但是从语义动作到自然语言仍然存在差距,这会导致培训和部署环境之间的不匹配。在培训期间,将自然语言生成(NLG)模块与USS结合在一起可以部分解决此问题。但是,由于US的策略和NLG是单独优化的,因此在给定的情况下,这些模拟的用户话语可能不够自然。在这项工作中,我们提出了一个基于生成变压器的用户模拟器(Gentus)。 Gentus由编码器结构组成,这意味着它可以共同优化用户策略和自然语言。 Gentus既产生语义动作又产生自然语言话语,从而保留了解释性和增强语言的变化。另外,通过将输入和输出表示为单词序列以及使用大型的预训练语言模型,我们可以在功能表示中实现普遍性。我们通过自动指标和人类评估评估绅士。我们的结果表明,绅士会产生更多的自然语言,并能够以零拍的方式转移到看不见的本体论中。此外,通过加强学习为培训专业用户模拟器打开大门,可以进一步塑造其行为。
translated by 谷歌翻译
最近,通过“向导”模拟游戏收集了一类以任务为导向的对话(TOD)数据集。但是,《巫师》数据实际上是模拟的数据,因此与现实生活中的对话根本不同,这些对话更加嘈杂和随意。最近,Seretod挑战赛是组织的,并发布了Mobilecs数据集,该数据集由来自中国移动的真实用户和客户服务人员之间的真实世界对话框组成。基于Mobilecs数据集,Seretod挑战具有两个任务,不仅评估了对话系统本身的构建,而且还检查了对话框成绩单中的信息提取,这对于建立TOD的知识库至关重要。本文主要介绍了Mobilecs数据集对这两项任务的基线研究。我们介绍了如何构建两个基线,遇到的问题以及结果。我们预计基线可以促进令人兴奋的未来研究,以建立针对现实生活任务的人类机器人对话系统。
translated by 谷歌翻译
以任务为导向的对话系统(TDSS)主要在离线设置或人类评估中评估。评估通常仅限于单转或非常耗时。作为替代方案,模拟用户行为的用户模拟器使我们能够考虑一组广泛的用户目标,以生成类似人类的对话以进行模拟评估。使用现有的用户模拟器来评估TDSS是具有挑战性的,因为用户模拟器主要旨在优化TDSS的对话策略,并且评估功能有限。此外,对用户模拟器的评估是一个开放的挑战。在这项工作中,我们提出了一个用于端到端TDS评估的隐喻用户模拟器,如果它在与系统的交互中模拟用户的类似思维,则定义模拟器是隐喻的。我们还提出了一个基于测试人员的评估框架,以生成变体,即具有不同功能的对话系统。我们的用户模拟器构建了一个隐喻的用户模型,该模型通过参考遇到新项目时的先验知识来帮助模拟器进行推理。我们通过检查模拟器与变体之间的模拟相互作用来估计模拟器的质量。我们的实验是使用三个TDS数据集进行的。与基于议程的模拟器和三个数据集上的SEQ2SEQ模型相比,隐喻用户模拟器与手动评估的一致性更好。我们的测试人员框架展示了效率,并且可以更好地概括和可扩展性,因为它可以适用于多个域中的对话和多个任务,例如对话建议和电子商务对话。
translated by 谷歌翻译
Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.
translated by 谷歌翻译
本文介绍了为法律领域设计域特定的对话代理面临的挑战的关键原则和解决方案。它包括范围,平台,架构和输入数据的准备问题。它提供回答用户查询和记录用户信息,包括联系人详细信息和与案例相关信息的功能。它利用亚马逊Web服务(AWS)Lex后建立的深度学习技术与AWS Lambda相结合。由于缺乏公开的数据,我们确定了两种方法,包括众包实验和存档的查询,以制定许多语言资源。这包括训练数据集,对话代理的一组预定响应,一组回归测试用例和进一步的对话测试集。我们提出了一种分层BOT结构,便于多级别委派并在回归测试集上报告模型准确性。此外,我们突出显示添加到BOT的功能,以改善对话流程和整体用户体验。
translated by 谷歌翻译
Diverse data formats and ontologies of task-oriented dialogue (TOD) datasets hinder us from developing general dialogue models that perform well on many datasets and studying knowledge transfer between datasets. To address this issue, we present ConvLab-3, a flexible dialogue system toolkit based on a unified TOD data format. In ConvLab-3, different datasets are transformed into one unified format and loaded by models in the same way. As a result, the cost of adapting a new model or dataset is significantly reduced. Compared to the previous releases of ConvLab (Lee et al., 2019b; Zhu et al., 2020b), ConvLab-3 allows developing dialogue systems with much more datasets and enhances the utility of the reinforcement learning (RL) toolkit for dialogue policies. To showcase the use of ConvLab-3 and inspire future work, we present a comprehensive study with various settings. We show the benefit of pre-training on other datasets for few-shot fine-tuning and RL, and encourage evaluating policy with diverse user simulators.
translated by 谷歌翻译
这项工作提出了一个新的对话数据集,即cookdial,该数据集促进了对任务知识了解的面向任务的对话系统的研究。该语料库包含260个以人类对任务为导向的对话框,其中代理给出了配方文档,指导用户烹饪菜肴。 Cookdial中的对话框展示了两个独特的功能:(i)对话流与支持文档之间的程序对齐; (ii)复杂的代理决策涉及分割长句子,解释硬说明并在对话框上下文中解决核心。此外,我们在假定的面向任务的对话框系统中确定了三个具有挑战性的(子)任务:(1)用户问题理解,(2)代理操作框架预测和(3)代理响应生成。对于这些任务中的每一个,我们都会开发一个神经基线模型,我们在cookdial数据集上进行了评估。我们公开发布烹饪数据集,包括对话框和食谱文档的丰富注释,以刺激对特定于域的文档接地对话框系统的进一步研究。
translated by 谷歌翻译
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
translated by 谷歌翻译
在过去的十年中,对对话系统的兴趣已经大大增长。从扩展过程中,也有兴趣开发和改进意图分类和插槽填充模型,这是两个组件,这些组件通常在以任务为导向的对话框系统中使用。此外,良好的评估基准对于帮助比较和分析结合此类模型的系统很重要。不幸的是,该领域的许多文献仅限于对相对较少的基准数据集的分析。为了促进针对任务的对话系统的更强大的分析,我们对意图分类和插槽填充任务进行了公开可用数据集的调查。我们分类每个数据集的重要特征,并就每个数据集的适用性,优势和劣势进行讨论。我们的目标是,这项调查有助于提高这些数据集的可访问性,我们希望它们能够在未来评估意图分类和填充插槽模型中用于以任务为导向的对话框系统。
translated by 谷歌翻译
与EMNLP2022 SERETOD车间共同划分的半监督和增强任务的对话系统的挑战。
translated by 谷歌翻译
Many efforts have been made to construct dialog systems for different types of conversations, such as task-oriented dialog (TOD) and open-domain dialog (ODD). To better mimic human-level conversations that usually fuse various dialog modes, it is essential to build a system that can effectively handle both TOD and ODD and access different knowledge sources. To address the lack of available data for the fused task, we propose a framework for automatically generating dialogues that combine knowledge-grounded ODDs and TODs in various settings. Additionally, we introduce a unified model PivotBot that is capable of appropriately adopting TOD and ODD modes and accessing different knowledge sources in order to effectively tackle the fused task. Evaluation results demonstrate the superior ability of the proposed model to switch seamlessly between TOD and ODD tasks.
translated by 谷歌翻译
我们提出了一个开放域的社交聊天机器人Chirpy Cardinal。为了既有信息又有信息,我们的机器人以一种真实的,情感上的方式与用户聊天。通过将受控的神经产生与脚手架,手写的对话整合在一起,我们让用户和机器人都轮流推动对话,从而产生引人入胜且流利的体验。Chirpy Cardinal部署在Alexa奖Socialbot Grand Challenge的第四次迭代中,每天处理数千次对话,在9个机器人中排名第二,平均用户评级为3.58/5。
translated by 谷歌翻译
End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations. We propose SL-AGENT, a novel self-learning framework for building end-to-end task bots. SL-AGENT consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment via reinforcement learning with the incorporated reward model. Experimental results on four well-studied dialog tasks show the effectiveness of SL-AGENT to automatically adapt to changing environments, using both automatic and human evaluations. We will release code and data for further research.
translated by 谷歌翻译
体现的代理需要能够在自然语言中互动理解任务描述,并提出适当的后续问题以获取必要的信息,以有效地成功完成各种用户的任务。在这项工作中,我们提出了一组对话框,用于建模此类对话框,并注释教学数据集,其中包括3,000多个位置,以任务为导向的对话(总计包含39.5k个话语),并具有对话框ACT。 Teach-da是对Dialog ACT的第一个大型数据集注释,用于具体任务完成。此外,我们在培训模型中证明了该注释的数据集在标记给定话语的对话框行为中的使用,预测给定对话框历史记录的下一个响应的对话框行为,并使用对话框行为指导代理商的非第二语言行为。特别是,我们对对话记录任务的教学执行执行的实验,该模型预测在体现任务完成环境中要执行的低级操作的顺序,证明对话框行为可以将最终任务成功提高2分,以提高最终任务成功率到没有对话行为的系统。
translated by 谷歌翻译
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there currently lacks an accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels of noise and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
translated by 谷歌翻译
我们在面向任务为导向的对话框(TOD)的端到端学习中提出了一种新问题,其中对话系统模仿故障排除代理,该故障排除代理通过诊断其问题(例如,汽车而未启动)帮助用户。这些对话框基于特定于域的流程图,该代理在对话期间应该遵循代理。我们的任务暴露了神经TOD的新颖技术挑战,例如在没有显式注释的情况下对流程图的话语接地,当用户询问澄清问题时,提及额外的手动页面,以及在测试时间遵循看不见的流程图。我们释放由2,738个对话框组成的数据集(浮雕),该对话框为12个不同的故障排除流程图。我们还设计了一个神经模型,扑腾,它使用检索增强的生成架构来训练对话框。我们的实验发现,Flonet可以对未来的流程图进行零射流传输,并为未来的研究设定强大的基线。
translated by 谷歌翻译
与具有粗粒度信息的Crosswoz(中文)和多发性(英文)数据集相比,没有数据集,可以正确处理细粒度和分层级别信息。在本文中,我们在香港发布了一份粤语知识驱动的对话数据集(KDDRES),将多转谈话中的信息放在一个特定的餐厅。我们的语料库包含0.8k次谈话,它来自10家餐厅,提供不同地区的各种风格。除此之外,我们还设计了细粒度的插槽和意图,以更好地捕获语义信息。基准实验和数据统计分析显示了我们数据集的多样性和丰富的注释。我们认为,KDDRE的出版可以是当前对话数据集的必要补充,以及社会中小企业(中小企业)更适合和更有价值,如为每家餐馆建立定制的对话系统。语料库和基准模型是公开可用的。
translated by 谷歌翻译
Recent advances in neural approaches greatly improve task-oriented dialogue (TOD) systems which assist users to accomplish their goals. However, such systems rely on costly manually labeled dialogs which are not available in practical scenarios. In this paper, we present our models for Track 2 of the SereTOD 2022 challenge, which is the first challenge of building semi-supervised and reinforced TOD systems on a large-scale real-world Chinese TOD dataset MobileCS. We build a knowledge-grounded dialog model to formulate dialog history and local KB as input and predict the system response. And we perform semi-supervised pre-training both on the labeled and unlabeled data. Our system achieves the first place both in the automatic evaluation and human interaction, especially with higher BLEU (+7.64) and Success (+13.6\%) than the second place.
translated by 谷歌翻译