大型研究展示了教师质疑策略如何改善学生学习结果。然而,开发新方案是挑战,因为缺乏特定情景的培训数据以及与标签相关的成本。本文介绍了基于AI的高保真度,级教室模拟器,帮助教师排练基于研究的数学质疑技巧。使用人类循环方法,我们收集了一个高质量的训练数据集,用于数学质疑方案。利用最近的不确定性量化的进步,我们评估了我们的可用性的会话代理,并分析了纳入人类循环方法进行数据收集和系统评估的实用性,以获得数学质疑场景。
translated by 谷歌翻译
高保真,基于AI的模拟课堂系统使教师能够排练有效的教学策略。但是,对话导向的开放式对话,例如教学关于规模因素的教学可能难以模仿。本文建立了一个基于文本的互动会话代理,以帮助教师根据着名的教学质量评估来练习数学质疑技能。我们采取了一种以人为本的设计来设计我们的系统,依靠深度学习,不确定量化和自然语言处理的进步,同时承认对会话代理的局限性进行特定的教学需求。在模拟期间直接使用专家输入,我们展示了如何实现谈话成功率和高用户满意度。
translated by 谷歌翻译
The advances in language-based Artificial Intelligence (AI) technologies applied to build educational applications can present AI for social-good opportunities with a broader positive impact. Across many disciplines, enhancing the quality of mathematics education is crucial in building critical thinking and problem-solving skills at younger ages. Conversational AI systems have started maturing to a point where they could play a significant role in helping students learn fundamental math concepts. This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while the students are practicing early math concepts with multimodal interactions. We discuss our efforts to improve the SDS pipeline built for math learning, for which we explore utilizing MathBERT representations for potential enhancement to the Natural Language Understanding (NLU) module. We perform an end-to-end evaluation using real-world deployment outputs from the Automatic Speech Recognition (ASR), Intent Recognition, and Dialogue Manager (DM) components to understand how error propagation affects the overall performance in real-world scenarios.
translated by 谷歌翻译
Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.
translated by 谷歌翻译
最近,通过“向导”模拟游戏收集了一类以任务为导向的对话(TOD)数据集。但是,《巫师》数据实际上是模拟的数据,因此与现实生活中的对话根本不同,这些对话更加嘈杂和随意。最近,Seretod挑战赛是组织的,并发布了Mobilecs数据集,该数据集由来自中国移动的真实用户和客户服务人员之间的真实世界对话框组成。基于Mobilecs数据集,Seretod挑战具有两个任务,不仅评估了对话系统本身的构建,而且还检查了对话框成绩单中的信息提取,这对于建立TOD的知识库至关重要。本文主要介绍了Mobilecs数据集对这两项任务的基线研究。我们介绍了如何构建两个基线,遇到的问题以及结果。我们预计基线可以促进令人兴奋的未来研究,以建立针对现实生活任务的人类机器人对话系统。
translated by 谷歌翻译
Any organization needs to improve their products, services, and processes. In this context, engaging with customers and understanding their journey is essential. Organizations have leveraged various techniques and technologies to support customer engagement, from call centres to chatbots and virtual agents. Recently, these systems have used Machine Learning (ML) and Natural Language Processing (NLP) to analyze large volumes of customer feedback and engagement data. The goal is to understand customers in context and provide meaningful answers across various channels. Despite multiple advances in Conversational Artificial Intelligence (AI) and Recommender Systems (RS), it is still challenging to understand the intent behind customer questions during the customer journey. To address this challenge, in this paper, we study and analyze the recent work in Conversational Recommender Systems (CRS) in general and, more specifically, in chatbot-based CRS. We introduce a pipeline to contextualize the input utterances in conversations. We then take the next step towards leveraging reverse feature engineering to link the contextualized input and learning model to support intent recognition. Since performance evaluation is achieved based on different ML models, we use transformer base models to evaluate the proposed approach using a labelled dialogue dataset (MSDialogue) of question-answering interactions between information seekers and answer providers.
translated by 谷歌翻译
以任务为导向的对话系统(TODS)继续升高,因为各种行业发现有效地利用其能力,节省时间和金钱。然而,即使是最先进的TOD尚未达到其全部潜力。TOD通常具有主要设计专注于完成手头的任务,因此任务分辨率的度量应优先考虑。可能会忽略可能指向对话的其他可能指向成功或其他方面的会话质量属性。这可能导致人类和对话系统之间的相互作用,让用户不满意或沮丧。本文探讨了对话系统的评价框架的文献,以及对话系统中的会话质量属性的作用,看起来,如何以及在与对话系统的性能相关的情况下,如何相关。
translated by 谷歌翻译
We propose a novel task, G4C (Goal-driven Guidance Generation in Grounded Communication), for studying goal-driven and grounded natural language interactions. Specifically, we choose Dungeons and Dragons (D&D) -- a role-playing game consisting of multiple player characters and a Dungeon Master (DM) who collaborate to achieve a set of goals that are beneficial to the players -- as a testbed for this task. Here, each of the player characters is a student, with their own personas and abilities, and the DM is the teacher, an arbitrator of the rules of the world and responsible for assisting and guiding the students towards a global goal. We propose a theory-of-mind-inspired methodology for training such a DM with reinforcement learning (RL), where a DM: (1) learns to predict how the players will react to its utterances using a dataset of D&D dialogue transcripts; and (2) uses this prediction as a reward function providing feedback on how effective these utterances are at guiding the players towards a goal. Human and automated evaluations show that a DM trained with RL to generate guidance by incorporating a theory-of-mind of the players significantly improves the players' ability to achieve goals grounded in their shared world.
translated by 谷歌翻译
Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end-users. A common issue is that the errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such a case, we consider a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user. If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user instead of providing the response directly. With such an interaction with the user, the system can give a better response to the user. Previous models that predict the user satisfaction are not applicable to DuerOS which is a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns lying behind millions of conversations and temporal dependency in multiple turns of the conversation. Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To face these challenges, we propose a pipeline to predict the user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model to predict the user satisfaction with these weak labels. Empirically, we deploy and evaluate our model on DuerOS, and observe a 19% relative improvement on the accuracy of user satisfaction prediction and 2.3% relative improvement on user experience.
translated by 谷歌翻译
我们提出了Tacobot,这是为首届Alexa Prive Taskbot Challenge构建的面向任务的对话系统,该系统可帮助用户完成多步骤烹饪和家庭装修任务。Tacobot的设计采用以用户为中心的原则,并渴望提供协作且易于访问的对话体验。为此,它具有准确的语言理解,灵活的对话管理和引人入胜的响应生成。此外,Tacobot还以强大的搜索引擎和自动化的端到端测试套件为支持。在引导Tacobot的开发中,我们探索了一系列数据增强策略,以训练先进的神经语言处理模型,并通过收集的真实对话不断改善对话经验。在半决赛结束时,Tacobot的平均评分为3.55/5.0。
translated by 谷歌翻译
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there currently lacks an accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels of noise and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
translated by 谷歌翻译
我们提出了一个开放域的社交聊天机器人Chirpy Cardinal。为了既有信息又有信息,我们的机器人以一种真实的,情感上的方式与用户聊天。通过将受控的神经产生与脚手架,手写的对话整合在一起,我们让用户和机器人都轮流推动对话,从而产生引人入胜且流利的体验。Chirpy Cardinal部署在Alexa奖Socialbot Grand Challenge的第四次迭代中,每天处理数千次对话,在9个机器人中排名第二,平均用户评级为3.58/5。
translated by 谷歌翻译
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
translated by 谷歌翻译
This paper presents a conversational AI platform called Flowstorm. Flowstorm is an open-source SaaS project suitable for creating, running, and analyzing conversational applications. Thanks to the fast and fully automated build process, the dialogues created within the platform can be executed in seconds. Furthermore, we propose a novel dialogue architecture that uses a combination of tree structures with generative models. The tree structures are also used for training NLU models suitable for specific dialogue scenarios. However, the generative models are globally used across applications and extend the functionality of the dialogue trees. Moreover, the platform functionality benefits from out-of-the-box components, such as the one responsible for extracting data from utterances or working with crawled data. Additionally, it can be extended using a custom code directly in the platform. One of the essential features of the platform is the possibility to reuse the created assets across applications. There is a library of prepared assets where each developer can contribute. All of the features are available through a user-friendly visual editor.
translated by 谷歌翻译
As artificial intelligence (AI) becomes a prominent part of modern life, AI literacy is becoming important for all citizens, not just those in technology careers. Previous research in AI education materials has largely focused on the introduction of terminology as well as AI use cases and ethics, but few allow students to learn by creating their own machine learning models. Therefore, there is a need for enriching AI educational tools with more adaptable and flexible platforms for interested educators with any level of technical experience to utilize within their teaching material. As such, we propose the development of an open-source tool (Build-a-Bot) for students and teachers to not only create their own transformer-based chatbots based on their own course material, but also learn the fundamentals of AI through the model creation process. The primary concern of this paper is the creation of an interface for students to learn the principles of artificial intelligence by using a natural language pipeline to train a customized model to answer questions based on their own school curriculums. The model uses contexts given by their instructor, such as chapters of a textbook, to answer questions and is deployed on an interactive chatbot/voice agent. The pipeline teaches students data collection, data augmentation, intent recognition, and question answering by having them work through each of these processes while creating their AI agent, diverging from previous chatbot work where students and teachers use the bots as black-boxes with no abilities for customization or the bots lack AI capabilities, with the majority of dialogue scripts being rule-based. In addition, our tool is designed to make each step of this pipeline intuitive for students at a middle-school level. Further work primarily lies in providing our tool to schools and seeking student and teacher evaluations.
translated by 谷歌翻译
This paper aims to provide a radical rundown on Conversation Search (ConvSearch), an approach to enhance the information retrieval method where users engage in a dialogue for the information-seeking tasks. In this survey, we predominantly focused on the human interactive characteristics of the ConvSearch systems, highlighting the operations of the action modules, likely the Retrieval system, Question-Answering, and Recommender system. We labeled various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorized the framework to ConvSearch and the application is directed toward biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by talking through the challenges and issues of ConvSearch, particularly in Bio-Medicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefit the information-seeking process in healthcare systems.
translated by 谷歌翻译
以人为中心的可解释人工智能(HCXAI)社区提出了将解释过程作为人与机器之间的对话进行构建。在该立场论文中,我们为基于文本的对话剂建立了Desiderata,能够使用自然语言进行交互方式解释神经模型的行为。从自然语言处理(NLP)研究的角度来看,我们设计了这种调解人的蓝图,以进行情感分析的任务,并评估当前的研究在基于对话的解释方面走上了多远。
translated by 谷歌翻译
Empathy is a vital factor that contributes to mutual understanding, and joint problem-solving. In recent years, a growing number of studies have recognized the benefits of empathy and started to incorporate empathy in conversational systems. We refer to this topic as empathetic conversational systems. To identify the critical gaps and future opportunities in this topic, this paper examines this rapidly growing field using five review dimensions: (i) conceptual empathy models and frameworks, (ii) adopted empathy-related concepts, (iii) datasets and algorithmic techniques developed, (iv) evaluation strategies, and (v) state-of-the-art approaches. The findings show that most studies have centered on the use of the EMPATHETICDIALOGUES dataset, and the text-based modality dominates research in this field. Studies mainly focused on extracting features from the messages of the users and the conversational systems, with minimal emphasis on user modeling and profiling. Notably, studies that have incorporated emotion causes, external knowledge, and affect matching in the response generation models, have obtained significantly better results. For implementation in diverse real-world settings, we recommend that future studies should address key gaps in areas of detecting and authenticating emotions at the entity level, handling multimodal inputs, displaying more nuanced empathetic behaviors, and encompassing additional dialogue system features.
translated by 谷歌翻译
在与用户进行交流时,以任务为导向的对话系统必须根据对话历史记录在每个回合时跟踪用户的需求。这个称为对话状态跟踪(DST)的过程至关重要,因为它直接告知下游对话政策。近年来,DST引起了很大的兴趣,文本到文本范式作为受欢迎的方法。在本评论论文中,我们首先介绍任务及其相关的数据集。然后,考虑到最近出版的大量出版物,我们确定了2021 - 2022年研究的重点和研究进展。尽管神经方法已经取得了重大进展,但我们认为对话系统(例如概括性)的某些关键方面仍未得到充实。为了激励未来的研究,我们提出了几种研究途径。
translated by 谷歌翻译
我们描述了对AI代理中既以人为本又基于设计的解释产生的立场。我们通过焦点小组的参与设计收集有关AI代理商的工作问题。我们通过任务方法知识模型捕获代理的设计,该模型明确指定了代理的任务和目标,以及它用于完成任务的机制,知识和词汇。我们通过在Skillsync中产生的解释来说明我们的方法,Skillsync是AI代理,该代理将公司和学院连接起来,以使工人提高和重新锻炼。特别是,我们在Skillsync中嵌入了一个名为AskJill的提问代理,AskJill在其中包含Skillsync设计的TMK模型。AskJill目前回答有关Skillsync任务和词汇的人类生成的问题,从而有助于解释其如何产生建议。
translated by 谷歌翻译