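As task-oriented dialog systems become increasingly popular in our lives, more realistic tasks have been proposed and explored. However, new practical challenges arise. For instance, current dialog systems cannot effectively handle multiple search results when querying a database, because such scenarios are absent from existing public datasets. In this paper, we propose Database Search Result (DSR) Disambiguation, a novel task that focuses on disambiguating database search results, which enhances the user experience by allowing users to choose from multiple options instead of just one. To study this task, we augment popular task-oriented dialog datasets (MultiWOZ and SGD) with turns that resolve ambiguities by (a) generating turns through a predefined grammar and (b) collecting human paraphrases for a subset. We find that training on our augmented dialog data improves a model's ability to deal with ambiguous scenarios without sacrificing performance on unmodified turns. Furthermore, our data also helps the model improve DSR disambiguation performance even in the absence of in-domain data, suggesting that disambiguation can be learned as a universal dialog skill. Our data and code will be made publicly available.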
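Existing studies in conversational AI mostly treat task-oriented dialog (TOD) and question answering (QA) as separate tasks. Towards the goal of constructing a conversational agent that can both complete user tasks and support information seeking, it is important to build a system that handles TOD and QA together with access to various external knowledge. In this work, we propose a new task, Open-Book TOD (OB-TOD), which combines TOD with QA and expands the external knowledge sources to include both explicit knowledge sources (e.g., the Web) and implicit knowledge sources (e.g., pre-trained language models). We create a new dataset, OB-MultiWOZ, in which we enrich TOD sessions with QA-like information-seeking experiences grounded on external knowledge. We propose a unified model, OPERA (Open-book End-to-end Task-oriented Dialog), which can appropriately access explicit and implicit external knowledge to tackle the defined task. Experimental results demonstrate OPERA's superior performance compared to closed-book baselines and illustrate the value of both knowledge types.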
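This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with task knowledge understanding. The corpus contains 260 human-to-human task-oriented dialogs in which an agent, given a recipe document, guides the user to cook a dish. Dialogs in CookDial exhibit two unique features: (i) procedural alignment between the dialog flow and the supporting document; (ii) complex agent decision-making that involves segmenting long sentences, paraphrasing hard instructions, and resolving coreference in the dialog context. In addition, we identify three challenging (sub)tasks in the assumed task-oriented dialog system: (1) user question understanding, (2) agent action frame prediction, and (3) agent response generation. For each of these tasks, we develop a neural baseline model, which we evaluate on the CookDial dataset. We publicly release the CookDial dataset, comprising rich annotations of both dialogs and recipe documents, to stimulate further research on domain-specific document-grounded dialog systems.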
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
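A commonly observed problem with state-of-the-art natural language technologies, such as Amazon Alexa and Apple Siri, is that their services do not extend to the citizens of most developing countries because of language barriers. Such populations suffer from a lack of available resources in their languages for building NLP products. This paper presents AllWOZ, a multilingual, multi-domain, task-oriented customer service dialog dataset covering eight languages: English, Mandarin, Korean, Vietnamese, Hindi, French, Portuguese, and Thai. Furthermore, we create a benchmark for our multilingual dataset by applying mT5 with meta-learning.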
Many efforts have been made to construct dialog systems for different types of conversations, such as task-oriented dialog (TOD) and open-domain dialog (ODD). To better mimic human-level conversations that usually fuse various dialog modes, it is essential to build a system that can effectively handle both TOD and ODD and access different knowledge sources. To address the lack of available data for the fused task, we propose a framework for automatically generating dialogues that combine knowledge-grounded ODDs and TODs in various settings. Additionally, we introduce a unified model PivotBot that is capable of appropriately adopting TOD and ODD modes and accessing different knowledge sources in order to effectively tackle the fused task. Evaluation results demonstrate the superior ability of the proposed model to switch seamlessly between TOD and ODD tasks.
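Interest in dialog systems has grown substantially over the past decade. By extension, there is also interest in developing and improving intent classification and slot-filling models, two components that are commonly used in task-oriented dialog systems. Moreover, good evaluation benchmarks are important for comparing and analyzing systems that incorporate such models. Unfortunately, much of the literature in the field is limited to the analysis of relatively few benchmark datasets. To promote more robust analyses of task-oriented dialog systems, we survey publicly available datasets for intent classification and slot-filling tasks. We catalog the important characteristics of each dataset and discuss the applicability, strengths, and weaknesses of each. Our goal is for this survey to increase the accessibility of these datasets, and we hope that it enables their use in future evaluations of intent classification and slot-filling models for task-oriented dialog systems.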
One of the biggest challenges of natural language generation (NLG) is the proper handling of named entities. Named entities are a common source of grammar mistakes such as wrong prepositions, wrong article handling, or incorrect entity inflection. Without factoring linguistic representation, such errors are often underrepresented when evaluating on a small set of arbitrarily picked argument values, or when translating a dataset from a linguistically simpler language, like English, to a linguistically complex language, like Russian. However, for some applications, broadly precise grammatical correctness is critical -- native speakers may find entity-related grammar errors silly, jarring, or even offensive. To enable the creation of more linguistically diverse NLG datasets, we release a Corpus of Linguistically Significant Entities (CLSE) annotated by linguist experts. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. To demonstrate one possible use of CLSE, we produce an augmented version of the Schema-Guided Dialog Dataset, SGD-CLSE. Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected language). We establish quality baselines for neural, template-based, and hybrid NLG systems and discuss the strengths and weaknesses of each approach.
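Recently, a class of task-oriented dialog (TOD) datasets has been collected through Wizard-of-Oz simulated games. However, Wizard-of-Oz data are in fact simulated data and are thus fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile. Based on the MobileCS dataset, the SereTOD challenge has two tasks, which not only evaluate the construction of the dialog system itself but also examine information extraction from dialog transcripts, which is crucial for building the knowledge base for TOD. This paper mainly presents a baseline study of the two tasks on the MobileCS dataset. We describe how the two baselines were constructed, the problems encountered, and the results. We anticipate that the baselines can facilitate exciting future research on building human-robot dialog systems for real-life tasks.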
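Task-oriented dialog (TOD) systems often require interaction with an external knowledge base (KB) to retrieve the entity (e.g., restaurant) information needed to support response generation. Most current end-to-end TOD systems either retrieve the KB information explicitly or embed it into model parameters for implicit access. The latter approach shows higher flexibility and efficiency. In both approaches, the system may generate a response with conflicting entity information. To address this issue, we propose to generate the entity autoregressively first and leverage it to guide response generation in an end-to-end system. To ensure entity consistency, we impose a trie constraint on entity generation. We also introduce a logit concatenation strategy to facilitate gradient backpropagation for end-to-end training. Experiments on MultiWOZ 2.1 single and CAMREST show that our system can generate more high-quality and entity-consistent responses.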
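Reading the constraint on entity generation as a prefix-tree (trie) constraint, the idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `build_trie`, `allowed_next`, `constrained_greedy_decode`, the `<eos>` marker, and the toy scoring function (standing in for model logits) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-entity marker


def build_trie(entities: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix tree over tokenized KB entity names."""
    root: Dict = {}
    for tokens in entities:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: Dict, prefix: Sequence[str]) -> List[str]:
    """Return tokens that may follow `prefix`; empty if the prefix is not in the trie."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node.keys())


def constrained_greedy_decode(
    trie: Dict, score: Callable[[List[str], str], float]
) -> List[str]:
    """Greedy decoding that only considers continuations permitted by the trie."""
    out: List[str] = []
    while True:
        candidates = allowed_next(trie, out)
        if not candidates:
            break
        best = max(candidates, key=lambda tok: score(out, tok))
        if best == END:
            break
        out.append(best)
    return out


kb_entities = [["golden", "wok"], ["golden", "house"], ["pizza", "hut"]]
trie = build_trie(kb_entities)


def toy_score(prefix: List[str], tok: str) -> float:
    """Toy scorer standing in for model logits: prefers tokens containing 'h'."""
    return float("h" in tok)


decoded = constrained_greedy_decode(trie, toy_score)
```

Because every step is restricted to children of the current trie node, the decoded token sequence is always a prefix of some KB entity, and it is a complete entity name whenever decoding stops at the end marker, which is what prevents mixing tokens from different entities.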
Incorporating external knowledge into the response generation process is essential to building more helpful and reliable dialog agents. However, collecting knowledge-grounded conversations is often costly, calling for a better pre-trained model for grounded dialog generation that generalizes well w.r.t. different types of knowledge. In this work, we propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation without relying on extra knowledge annotation. Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords. With these keywords, we construct two kinds of knowledge and pre-train a knowledge-grounded response generation model, aiming at handling two different scenarios: (1) the knowledge should be faithfully grounded; (2) it can be selectively used. For the former, the grounding knowledge consists of keywords extracted from the response. For the latter, the grounding knowledge is additionally augmented with keywords extracted from other utterances in the same dialog. Since the knowledge is extracted from the dialog itself, KPT can be easily performed on a large volume and variety of dialogue data. We considered three data sources (open-domain, task-oriented, conversational QA) with a total of 2.5M dialogues. We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge.
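The keyword-extraction step can be sketched roughly as below. This is a simplified illustration, not KPT itself: it assumes per-token probabilities have already been obtained from some pre-trained language model, and it uses surprisal (negative log-probability) as a stand-in for "uncertainty". The function name `extract_keywords` and the example probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def extract_keywords(
    tokens: Sequence[str], token_probs: Sequence[float], k: int
) -> List[str]:
    """Pick the k tokens with the highest surprisal, kept in original order.

    `token_probs[i]` is assumed to be the probability a language model
    assigned to `tokens[i]` in context (an input to this sketch, not computed here).
    """
    if len(tokens) != len(token_probs):
        raise ValueError("tokens and token_probs must have the same length")
    surprisal = [-math.log(p) for p in token_probs]
    top = sorted(range(len(tokens)), key=lambda i: surprisal[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


response = ["i", "booked", "a", "table", "at", "nandos"]
probs = [0.9, 0.2, 0.8, 0.3, 0.7, 0.05]  # made-up values for illustration
keywords = extract_keywords(response, probs, k=2)
```

In the two scenarios the abstract describes, keywords taken from the response alone would serve as faithfully grounded knowledge, while adding keywords drawn from other utterances in the same dialog would yield knowledge that can be used selectively.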
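Collecting data for training dialog systems can be extremely expensive because it involves human participants and requires extensive annotation. Especially in document-grounded dialog systems, human experts need to carefully read unstructured documents to answer users' questions. As a result, existing document-grounded dialog datasets are relatively small and hinder the effective training of dialog systems. In this paper, we propose an automatic data augmentation technique grounded on documents through a generative dialog model. The dialog model consists of a user bot and an agent bot that can synthesize diverse dialogs given an input document, which are then used to train a downstream model. When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods. We also achieve strong performance in the low-resource setting.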
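Recent advances in neural approaches to language processing have sparked a resurgence of interest in building intelligent open-domain chatbots. However, even state-of-the-art neural chatbots cannot produce a satisfying response at every turn of a dialog. A practical solution is to generate multiple response candidates for the same context and then perform response ranking/selection to determine which candidate is best. Previous work in response selection typically trains response rankers on synthetic data formed from existing dialogs, using the ground-truth response as the single appropriate response and constructing inappropriate responses via random selection or adversarial methods. In this work, we curate a dataset in which responses from multiple response generators, produced for the same dialog context, are manually annotated as appropriate (positive) or inappropriate (negative). We argue that such training data better match the actual use case, enabling models to learn to rank responses effectively. With this new dataset, we conduct a systematic evaluation of state-of-the-art response selection methods and demonstrate that both strategies, using multiple positive candidates and using manually verified hard negative candidates, bring significant performance improvements over using adversarial training data, e.g., increases of 3% and 13% in Recall@1, respectively.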
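For readers unfamiliar with the metric, a minimal sketch of Recall@1 in the multiple-positive setting follows. It assumes one common convention, namely that a context counts as a hit when the top-scored candidate is annotated as appropriate; the paper's exact definition may differ, and the function name and sample numbers are hypothetical.

```python
from typing import Sequence


def recall_at_1(
    scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> float:
    """Fraction of contexts whose top-scored candidate is labeled positive (1).

    scores[c][i] is the ranker score of candidate i for context c;
    labels[c][i] is 1 for an appropriate response and 0 otherwise.
    """
    if not scores:
        raise ValueError("at least one context is required")
    hits = 0
    for cand_scores, cand_labels in zip(scores, labels):
        top = max(range(len(cand_scores)), key=lambda i: cand_scores[i])
        hits += int(cand_labels[top] == 1)
    return hits / len(scores)


# Two contexts with three candidates each (made-up scores and annotations).
example_scores = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
example_labels = [[0, 1, 1], [0, 0, 1]]
r1 = recall_at_1(example_scores, example_labels)
```

In the example, the first context's top candidate is positive and the second's is negative, so the ranker scores one hit out of two contexts.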
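Generative open-domain dialog systems can benefit from external knowledge, but the lack of external knowledge resources and the difficulty of finding relevant knowledge limit the development of this technology. To this end, we propose a knowledge-driven dialog task using dynamic service information. Specifically, we use a large number of service APIs, which can provide high coverage and spatiotemporal sensitivity, as external knowledge sources. The dialog system generates queries to request external services along with user information, obtains the relevant knowledge, and generates responses based on this knowledge. To implement this method, we collect and release DuSinc, the first open-domain Chinese service knowledge dialog dataset. We also construct a baseline model, PLATO-SINC, which automatically utilizes service information for dialog. Both automatic and human evaluation show that our proposed method significantly improves open-domain conversation, and the session-level overall score in human evaluation improves by 59.29% compared with the dialog pre-training model PLATO-2. The dataset and benchmark model will be open-sourced.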
End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations. We propose SL-AGENT, a novel self-learning framework for building end-to-end task bots. SL-AGENT consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment via reinforcement learning with the incorporated reward model. Experimental results on four well-studied dialog tasks show the effectiveness of SL-AGENT to automatically adapt to changing environments, using both automatic and human evaluations. We will release code and data for further research.
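Task-oriented dialog systems (TDSs) are assessed mainly in an offline setting or through human evaluation. The evaluation is often limited to single turns or is very time-intensive. As an alternative, user simulators that mimic user behavior allow us to consider a broad set of user goals to generate human-like conversations for simulated evaluation. Employing existing user simulators to evaluate TDSs is challenging, as user simulators are primarily designed to optimize dialog policies for TDSs and have limited evaluation capabilities. Moreover, the evaluation of user simulators is itself an open challenge. In this work, we propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user's analogical thinking in interactions with systems. We also propose a tester-based evaluation framework to generate variants, i.e., dialog systems with different capabilities. Our user simulator constructs a metaphorical user model that assists the simulator in reasoning by referring to prior knowledge when encountering new items. We estimate the quality of simulators by checking the simulated interactions between simulators and variants. Our experiments are conducted using three TDS datasets. The metaphorical user simulator demonstrates better consistency with manual evaluation than an agenda-based simulator and a Seq2seq model on the three datasets. Our tester framework demonstrates efficiency and generalizes and scales better, as it can be applied to conversations in multiple domains and to multiple tasks such as conversational recommendation and e-commerce dialogs.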
The interview is regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice through mock interviews with one another. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to conduct online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.
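We propose a novel problem within end-to-end learning of task-oriented dialogs (TOD), in which the dialog system mimics a troubleshooting agent who helps a user by diagnosing their problem (e.g., a car not starting). Such dialogs are grounded in domain-specific flowcharts, which the agent is supposed to follow during the conversation. Our task exposes novel technical challenges for neural TOD, such as grounding an utterance to the flowchart without explicit annotation, referring to additional manual pages when the user asks a clarification question, and following unseen flowcharts at test time. We release a dataset (FloDial) consisting of 2,738 dialogs grounded on 12 different troubleshooting flowcharts. We also design a neural model, FloNet, which uses a retrieval-augmented generation architecture to train the dialog agent. Our experiments find that FloNet can perform zero-shot transfer to unseen flowcharts and sets a strong baseline for future research.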
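We present TacoBot, a task-oriented dialog system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible conversational experience. To that end, it is equipped with accurate language understanding, flexible dialog management, and engaging response generation. Furthermore, TacoBot is backed by a strong search engine and an automated end-to-end test suite. In bootstrapping the development of TacoBot, we explore a series of data augmentation strategies to train advanced neural language processing models and continuously improve the dialog experience with collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.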
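This paper studies the exposure bias problem in task-oriented dialog systems, where the model's generated content over multiple turns drives the dialog context away from the ground-truth distribution seen at training time, introducing error propagation and damaging the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session-level sampling, which explicitly exposes the model to sampled generated content in the dialog context during training. Additionally, we employ dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model. The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ, and extensive experiments show the effectiveness of the proposed methods.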
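One way to picture sampling at the session level is sketched below. It is an illustrative guess at the general idea, not the UBARv2 training procedure: for each session a single coin flip decides whether the running dialog context is built from ground-truth responses or from the model's own outputs. The function `build_session_contexts`, the `generate` callback, and `sample_prob` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple

Turn = Tuple[str, str]  # (user utterance, ground-truth system response)


def build_session_contexts(
    session: Sequence[Turn],
    generate: Callable[[List[str]], str],
    sample_prob: float,
    rng: random.Random,
) -> List[Tuple[List[str], str]]:
    """Return (context, target) training pairs for one dialog session.

    With probability `sample_prob` the whole session feeds the model's own
    generated responses back into the context; otherwise it uses the
    ground-truth responses. Targets are always the ground-truth responses.
    """
    use_generated = rng.random() < sample_prob  # one decision per session
    context: List[str] = []
    examples: List[Tuple[List[str], str]] = []
    for user, gold in session:
        context = context + [user]
        examples.append((list(context), gold))
        response = generate(context) if use_generated else gold
        context = context + [response]
    return examples


session = [("hi", "hello"), ("book a table", "done")]
fake_generate = lambda ctx: "GEN"  # stand-in for a real model call

gold_ctx = build_session_contexts(session, fake_generate, 0.0, random.Random(0))
sampled_ctx = build_session_contexts(session, fake_generate, 1.0, random.Random(0))
```

Making the choice once per session, rather than per turn, keeps each training context internally consistent with how errors would accumulate at inference time.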