随着世界各地的更多用户正在与日常生活中的对话代理商进行互动,需要更好的言语理解,要求重新关注自动语音识别(ASR)和自然语言理解的研究之间的动态(NLU)。我们简要介绍了这些研究领域,并制定了他们之间的当前关系。鉴于我们在本文中进行的观察,我们认为(1)NLU应该认识到对话系统的管道上游使用的ASR模型的存在,(2)ASR应该能够从NLU中发现的错误(3)(3)需要对口语输入提供语义注释的端到端数据集,(4)ASR和NLU研究社区之间应该更强大的协作。
translated by 谷歌翻译
在过去的十年中,对对话系统的兴趣已经大大增长。从扩展过程中,也有兴趣开发和改进意图分类和插槽填充模型,这是两个组件,这些组件通常在以任务为导向的对话框系统中使用。此外,良好的评估基准对于帮助比较和分析结合此类模型的系统很重要。不幸的是,该领域的许多文献仅限于对相对较少的基准数据集的分析。为了促进针对任务的对话系统的更强大的分析,我们对意图分类和插槽填充任务进行了公开可用数据集的调查。我们分类每个数据集的重要特征,并就每个数据集的适用性,优势和劣势进行讨论。我们的目标是,这项调查有助于提高这些数据集的可访问性,我们希望它们能够在未来评估意图分类和填充插槽模型中用于以任务为导向的对话框系统。
translated by 谷歌翻译
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community, but have not received as much attention as lower-level tasks like speech and speaker recognition. In particular, there are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark datasets for several tasks. In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape. We contribute four tasks: question answering and summarization involve inference over longer speech sequences; named entity localization addresses the speech-specific task of locating the targeted content in the signal; dialog act classification identifies the function of a given speech utterance. We follow the blueprint of the Spoken Language Understanding Evaluation (SLUE) benchmark suite. In order to facilitate the development of SLU models that leverage the success of pre-trained speech representations, we will be publishing for each task (i) annotations for a relatively small fine-tuning set, (ii) annotated development and test sets, and (iii) baseline models for easy reproducibility and comparisons. In this work, we present the details of data collection and annotation and the performance of the baseline models. We also perform sensitivity analysis of pipeline models' performance (speech recognizer + text model) to the speech recognition accuracy, using more than 20 state-of-the-art speech recognition models.
translated by 谷歌翻译
问答系统被认为是流行且经常有效的信息在网络上寻求信息的手段。在这样的系统中,寻求信息者可以通过自然语言提出问题来获得对他们的查询的简短回应。交互式问题回答是一种最近提出且日益流行的解决方案,它位于问答和对话系统的交集。一方面,用户可以以普通语言提出问题,并找到对她的询问的实际回答;另一方面,如果在初始请求中有多个可能的答复,很少或歧义,则系统可以将问题交通会话延长到对话中。通过允许用户提出更多问题,交互式问题回答使用户能够与系统动态互动并获得更精确的结果。这项调查提供了有关当前文献中普遍存在的交互式提问方法的详细概述。它首先要解释提问系统的基本原理,从而定义新的符号和分类法,以将所有已确定的作品结合在统一框架内。然后,根据提出的方法,评估方法和数据集/应用程序域来介绍和检查有关交互式问题解答系统的审查已发表的工作。我们还描述了围绕社区提出的特定任务和问题的趋势,从而阐明了学者的未来利益。 GitHub页面的综合综合了本文献研究中涵盖的所有主要主题,我们的工作得到了进一步的支持。 https://sisinflab.github.io/interactive-question-answering-systems-survey/
translated by 谷歌翻译
在与用户进行交流时,以任务为导向的对话系统必须根据对话历史记录在每个回合时跟踪用户的需求。这个称为对话状态跟踪(DST)的过程至关重要,因为它直接告知下游对话政策。近年来,DST引起了很大的兴趣,文本到文本范式作为受欢迎的方法。在本评论论文中,我们首先介绍任务及其相关的数据集。然后,考虑到最近出版的大量出版物,我们确定了2021 - 2022年研究的重点和研究进展。尽管神经方法已经取得了重大进展,但我们认为对话系统(例如概括性)的某些关键方面仍未得到充实。为了激励未来的研究,我们提出了几种研究途径。
translated by 谷歌翻译
This paper aims to provide a radical rundown on Conversation Search (ConvSearch), an approach to enhance the information retrieval method where users engage in a dialogue for the information-seeking tasks. In this survey, we predominantly focused on the human interactive characteristics of the ConvSearch systems, highlighting the operations of the action modules, likely the Retrieval system, Question-Answering, and Recommender system. We labeled various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorized the framework to ConvSearch and the application is directed toward biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by talking through the challenges and issues of ConvSearch, particularly in Bio-Medicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefit the information-seeking process in healthcare systems.
translated by 谷歌翻译
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically- vs. manually-generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs allows SLU systems to improve in comparison to the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtains performance similar to the oracle setup, and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative to overcome the limitations of working purely automatically generated textual data.
translated by 谷歌翻译
Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors. Furthermore, these systems do not address the differences in written and spoken language. The research on this topic is stymied by the lack of a public corpus. Motivated by these considerations, our goal in hosting the speech-aware dialog state tracking challenge was to create a public corpus or task which can be used to investigate the performance gap between the written and spoken forms of input, develop models that could alleviate this gap, and establish whether Text-to-Speech-based (TTS) systems is a reasonable surrogate to the more-labor intensive human data collection. We created three spoken versions of the popular written-domain MultiWoz task -- (a) TTS-Verbatim: written user inputs were converted into speech waveforms using a TTS system, (b) Human-Verbatim: humans spoke the user inputs verbatim, and (c) Human-paraphrased: humans paraphrased the user inputs. Additionally, we provided different forms of ASR output to encourage wider participation from teams that may not have access to state-of-the-art ASR systems. These included ASR transcripts, word time stamps, and latent representations of the audio (audio encoder outputs). In this paper, we describe the corpus, report results from participating teams, provide preliminary analyses of their results, and summarize the current state-of-the-art in this domain.
translated by 谷歌翻译
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there currently lacks an accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels of noise and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
translated by 谷歌翻译
Any organization needs to improve their products, services, and processes. In this context, engaging with customers and understanding their journey is essential. Organizations have leveraged various techniques and technologies to support customer engagement, from call centres to chatbots and virtual agents. Recently, these systems have used Machine Learning (ML) and Natural Language Processing (NLP) to analyze large volumes of customer feedback and engagement data. The goal is to understand customers in context and provide meaningful answers across various channels. Despite multiple advances in Conversational Artificial Intelligence (AI) and Recommender Systems (RS), it is still challenging to understand the intent behind customer questions during the customer journey. To address this challenge, in this paper, we study and analyze the recent work in Conversational Recommender Systems (CRS) in general and, more specifically, in chatbot-based CRS. We introduce a pipeline to contextualize the input utterances in conversations. We then take the next step towards leveraging reverse feature engineering to link the contextualized input and learning model to support intent recognition. Since performance evaluation is achieved based on different ML models, we use transformer base models to evaluate the proposed approach using a labelled dialogue dataset (MSDialogue) of question-answering interactions between information seekers and answer providers.
translated by 谷歌翻译
插槽填充和意图检测是诸如语音助手的会话代理的骨干,是有效的研究领域。尽管公开的基准上的最先进的技术,但令人印象深刻的性能,他们概括到现实情景的能力尚未得到证明。在这项工作中,我们提出了一种自然,一套简单的口语导向转换,应用于数据集的评估集,在保留话语的语义时引入人类口语变化。我们将大自然应用于共同的插槽填充和意图检测基准,并证明了自然集合的标准评估的简单扰动可以显着降低模型性能。通过我们的实验,我们证明了当自然运营商应用于评估流行基准的评估集时,模型精度可以降低至多40%。
translated by 谷歌翻译
这项工作提出了一个新的对话数据集,即cookdial,该数据集促进了对任务知识了解的面向任务的对话系统的研究。该语料库包含260个以人类对任务为导向的对话框,其中代理给出了配方文档,指导用户烹饪菜肴。 Cookdial中的对话框展示了两个独特的功能:(i)对话流与支持文档之间的程序对齐; (ii)复杂的代理决策涉及分割长句子,解释硬说明并在对话框上下文中解决核心。此外,我们在假定的面向任务的对话框系统中确定了三个具有挑战性的(子)任务:(1)用户问题理解,(2)代理操作框架预测和(3)代理响应生成。对于这些任务中的每一个,我们都会开发一个神经基线模型,我们在cookdial数据集上进行了评估。我们公开发布烹饪数据集,包括对话框和食谱文档的丰富注释,以刺激对特定于域的文档接地对话框系统的进一步研究。
translated by 谷歌翻译
我们提出了Tacobot,这是为首届Alexa Prive Taskbot Challenge构建的面向任务的对话系统,该系统可帮助用户完成多步骤烹饪和家庭装修任务。Tacobot的设计采用以用户为中心的原则,并渴望提供协作且易于访问的对话体验。为此,它具有准确的语言理解,灵活的对话管理和引人入胜的响应生成。此外,Tacobot还以强大的搜索引擎和自动化的端到端测试套件为支持。在引导Tacobot的开发中,我们探索了一系列数据增强策略,以训练先进的神经语言处理模型,并通过收集的真实对话不断改善对话经验。在半决赛结束时,Tacobot的平均评分为3.55/5.0。
translated by 谷歌翻译
The success of the large neural language models on many NLP tasks is exciting. However, we find that these successes sometimes lead to hype in which these models are being described as "understanding" language or capturing "meaning". In this position paper, we argue that a system trained only on form has a priori no way to learn meaning. In keeping with the ACL 2020 theme of "Taking Stock of Where We've Been and Where We're Going", we argue that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.
translated by 谷歌翻译
我们提出了一个开放域的社交聊天机器人Chirpy Cardinal。为了既有信息又有信息,我们的机器人以一种真实的,情感上的方式与用户聊天。通过将受控的神经产生与脚手架,手写的对话整合在一起,我们让用户和机器人都轮流推动对话,从而产生引人入胜且流利的体验。Chirpy Cardinal部署在Alexa奖Socialbot Grand Challenge的第四次迭代中,每天处理数千次对话,在9个机器人中排名第二,平均用户评级为3.58/5。
translated by 谷歌翻译
口语理解(SLU)是大多数人机相互作用系统中的核心任务。随着智能家居,智能手机和智能扬声器的出现,SLU已成为该行业的关键技术。在经典的SLU方法中,自动语音识别(ASR)模块将语音信号转录为文本表示,自然语言理解(NLU)模块从中提取语义信息。最近,基于深神经网络的端到端SLU(E2E SLU)已经获得了动力,因为它受益于ASR和NLU部分的联合优化,因此限制了管道架构的误差效应的级联反应。但是,对于E2E模型用于预测语音输入的概念和意图的实际语言特性知之甚少。在本文中,我们提出了一项研究,以确定E2E模型执行SLU任务的信号特征和其他语言特性。该研究是在必须处理非英语(此处法语)语音命令的智能房屋的应用领域进行的。结果表明,良好的E2E SLU性能并不总是需要完美的ASR功能。此外,结果表明,与管道模型相比,E2E模型在处理背景噪声和句法变化方面具有出色的功能。最后,更细粒度的分析表明,E2E模型使用输入信号的音调信息来识别语音命令概念。本文概述的结果和方法提供了一个跳板,以进一步分析语音处理中的E2E模型。
translated by 谷歌翻译
随着自动语音处理(ASR)系统越来越好,使用ASR输出越来越令于进行下游自然语言处理(NLP)任务。但是,很少的开源工具包可用于在不同口语理解(SLU)基准上生成可重复的结果。因此,需要建立一个开源标准,可以用于具有更快的开始进入SLU研究。我们展示了Espnet-SLU,它旨在在一个框架中快速发展口语语言理解。 Espnet-SLU是一个项目内部到结束语音处理工具包,ESPNET,它是一个广泛使用的开源标准,用于各种语音处理任务,如ASR,文本到语音(TTS)和语音转换(ST)。我们增强了工具包,为各种SLU基准提供实现,使研究人员能够无缝混合和匹配不同的ASR和NLU模型。我们还提供预磨损的模型,具有集中调谐的超参数,可以匹配或甚至优于最新的最先进的性能。该工具包在https://github.com/espnet/espnet上公开提供。
translated by 谷歌翻译
Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness. Enhancing DSs with knowledge alleviates this problem, but, as many ways of doing so exist, keeping track of all proposed methods is difficult. Here, we present the first survey of knowledge-enhanced DSs. We define three categories of systems - internal, external, and hybrid - based on the knowledge they use. We survey the motivation for enhancing DSs with knowledge, used datasets, and methods for knowledge search, knowledge encoding, and knowledge incorporation. Finally, we propose how to improve existing systems based on theories from linguistics and cognitive science.
translated by 谷歌翻译
The advances in language-based Artificial Intelligence (AI) technologies applied to build educational applications can present AI for social-good opportunities with a broader positive impact. Across many disciplines, enhancing the quality of mathematics education is crucial in building critical thinking and problem-solving skills at younger ages. Conversational AI systems have started maturing to a point where they could play a significant role in helping students learn fundamental math concepts. This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while the students are practicing early math concepts with multimodal interactions. We discuss our efforts to improve the SDS pipeline built for math learning, for which we explore utilizing MathBERT representations for potential enhancement to the Natural Language Understanding (NLU) module. We perform an end-to-end evaluation using real-world deployment outputs from the Automatic Speech Recognition (ASR), Intent Recognition, and Dialogue Manager (DM) components to understand how error propagation affects the overall performance in real-world scenarios.
translated by 谷歌翻译
当在现实世界中以任务为导向的对话系统中实现自然语言(NLG)组件时,不仅需要在训练数据上学习的自然话语,而且还需要适应对话环境(例如,环境中的噪音)听起来)和用户(例如,理解能力水平较低的用户)。受到语言生成任务的强化学习(RL)的最新进展的启发,我们提出了Antor,这是一种通过强化学习来适应以任务为导向对话的自然语言生成的方法。在Antor中,与用户对系统话语的理解相对应的自然语言理解(NLU)模块已纳入RL的目标函数中。如果将NLG的意图正确传达给了NLU,该意图理解了系统的话语,则NLG将获得积极的回报。我们在Multiwoz数据集上进行了实验,并确认Antor可以对语音识别错误和用户的不同词汇水平产生适应性话语。
translated by 谷歌翻译