英语水平评估已成为过滤和选择学术界和工业的预期候选人的必要度量。随着这种评估需求的增加,越来越必要拥有自动化的人类可意识的结果,以防止不一致并确保对第二语言学习者有意义的反馈。基于特征的经典方法在理解得分模型学习的内容方面更具可解释。因此,在这项工作中,我们利用古典机器学习模型作为分类和回归问题的语音评分任务,其次是彻底的研究来解释和研究语言线索与扬声器的英语水平之间的关系。首先,我们提取五个类别(流利,发音,内容,语法和词汇和声学)的语言学家特征,并列车模型到级响应。相比之下,我们发现基于回归的模型相当于或更好地比分类方法更好。其次,我们进行消融研究以了解每个特征和特征类别对熟练分级性能的影响。此外,要了解个别特征贡献,我们展示了顶部特征对分级任务的最佳执行算法的重要性。第三,我们利用部分依赖性地块和福芙值来探索特征重要性,并得出结论,最好的培训模式了解用于分级本研究中使用的数据集的底层尺寸。
translated by 谷歌翻译
在这项研究中,我们提出了一种新的多模态端到端神经网络,用于使用注意融合自动评估非母语英语扬声器的自发言论。管道采用双向反复化卷积神经网络和双向长短期记忆神经网络,分别从谱图和转录中编码声学和词汇线索。对这些学习的预测特征进行注意融合,以在最终得分之前学习不同方式之间的复杂相互作用。我们将模型与强型基线进行比较,并发现对词汇和声学线索的综合关注显着提高了系统的整体性能。此外,我们对我们的模型提供了一种定性和定量分析。
translated by 谷歌翻译
在过去的三年里,自动评分发动机已被用于评分大约五百万个测试者。由于Covid-19和相关的教育和测试自动化,这个数字进一步增加。尽管使用了这么广泛,但基于AI的测试文献非常缺乏。提出新模型的大多数论文仅依赖于基于二次加权的Kappa(QWK)与人类评估者的协议,以显示模型效能。然而,这有效地忽略了论文评分的高度多重特征性质。论文评分取决于相干性,语法,相关性,充足和,词汇等特征。迄今为止,没有研究检测自动化论文评分:AES系统在全面上的所有这些功能。通过这种动机,我们为AES系统提出了一种模型不良反对派评估计划和相关指标,以测试其自然语言的理解能力和整体鲁棒性。我们使用所提出的方案评估当前的最先进的AES模型,并在最近的五个模型上报告结果。这些型号范围从基于特征为本的最新深度学习算法的方法。我们发现AES模型是高度不夸张的。即使是重型修改(高达25%)与问题无关的内容也不会降低模型产生的分数。另一方面,平均不相关的内容增加了分数,从而表明应该重新考虑模型评估策略和尺寸。我们还要求200名人类评估者在看到人类可以检测到两者之间的差异以及是否同意自动分数分配的分数的同意,以获得原始和对抗的反应。
translated by 谷歌翻译
由于在线学习和评估平台(例如Coursera,Udemy,Khan Academy等)的兴起,对论文(AES)和自动论文评分的自动评估(AES)已成为一个严重的问题。研究人员最近提出了许多用于自动评估的技术。但是,其中许多技术都使用手工制作的功能,因此从特征表示的角度受到限制。深度学习已成为机器学习中的新范式,可以利用大量数据并确定对论文评估有用的功能。为此,我们提出了一种基于复发网络(RNN)和卷积神经网络(CNN)的新型体系结构。在拟议的体系结构中,多通道卷积层从嵌入矢量和基本语义概念中学习并捕获单词n-gram的上下文特征,并使用max-pooling操作在论文级别形成特征向量。 RNN的变体称为双门复发单元(BGRU),用于访问以前和后续的上下文表示。该实验是对Kaggle上的八个数据集进行的,以实现AES的任务。实验结果表明,我们提出的系统比其他基于深度学习的AES系统以及其他最新AES系统的评分精度明显更高。
translated by 谷歌翻译
本地语言识别(NLI)是培训(通过监督机器学习)的任务,该分类器猜测文本作者的母语。在过去的十年中,这项任务已经进行了广泛的研究,多年来,NLI系统的性能稳步改善。我们专注于NLI任务的另一个方面,即分析由\ emph {Aupplable}机器学习算法培训的NLI分类器的内部组件,以获取其分类决策的解释,并具有获得的最终目标,即获得最终的目标。深入了解语言现象````赋予说话者''的母语''。我们使用这种观点来解决NLI和(研究得多的)伴侣任务,即猜测是由本地人还是非本地人说的文本。使用三个不同出处的数据集(英语学习者论文的两个数据集和社交媒体帖子的数据集),我们研究哪种语言特征(词汇,形态学,句法和统计)最有效地解决了我们的两项任务,即,最大的表明说话者的L1。我们还提出了两个案例研究,一个关于西班牙语,另一个关于意大利英语学习者,其中我们分析了分类器对发现这些L1最重要的单个语言特征。总体而言,我们的研究表明,使用可解释的机器学习可能是TH的宝贵工具
translated by 谷歌翻译
双相情感障碍是一种心理健康障碍,导致情绪波动,从令人沮丧到狂热。双相障碍的诊断通常是根据患者访谈进行的,并从患者的护理人员获得的报告。随后,诊断取决于专家的经验,并且可以与其他精神障碍的疾病混淆。双极性障碍诊断中的自动化过程可以帮助提供定量指标,并让患者的更容易观察较长的时间。此外,在Covid-19大流行期间,对遥控和诊断的需求变得尤为重要。在本论文中,我们根据声学,语言和视觉方式的患者录制来创建一种多模态决策系统。该系统培养在双极障碍语料库上。进行综合分析单峰和多模式系统,以及各种融合技术。除了使用单向特征处理整个患者会话外,还研究了剪辑的任务级调查。在多模式融合系统中使用声学,语言和视觉特征,我们实现了64.8%的未加权平均召回得分,这提高了在该数据集上实现的最先进的性能。
translated by 谷歌翻译
口吃是一种言语障碍,在此期间,语音流被非自愿停顿和声音重复打断。口吃识别是一个有趣的跨学科研究问题,涉及病理学,心理学,声学和信号处理,使检测很难且复杂。机器和深度学习的最新发展已经彻底彻底改变了语音领域,但是对口吃的识别受到了最小的关注。这项工作通过试图将研究人员从跨学科领域聚集在一起来填补空白。在本文中,我们回顾了全面的声学特征,基于统计和深度学习的口吃/不足分类方法。我们还提出了一些挑战和未来的指示。
translated by 谷歌翻译
研究界长期以来一直在非本地语音中研究了计算机辅助的发音训练(上尉)方法。研究人员致力于研究各种模型架构,例如贝叶斯网络和深度学习方法,以及分析语音信号的不同表示。尽管近年来取得了重大进展,但现有的CAPT方法仍无法以高精度检测发音误差(在40 \%-80 \%召回时只有60 \%精度)。关键问题之一是发音错误检测模型的可靠培训所需的语音错误的可用性较低。如果我们有一个可以模仿非本地语音并产生任何数量的训练数据的生成模型,那么检测发音错误的任务将容易得多。我们介绍了基于音素到音量(P2P),文本到语音(T2S)以及语音到语音(S2S)转换的三种创新技术,以生成正确发音和错误发音的合成语音。我们表明,这些技术不仅提高了三个机器学习模型的准确性,以检测发音错误,而且还有助于在现场建立新的最新技术。早期的研究使用了简单的语音生成技术,例如P2P转换,但仅是提高发音误差检测准确性的附加机制。另一方面,我们认为语音生成是检测发音误差的第一类方法。这些技术的有效性在检测发音和词汇应力误差的任务中进行了评估。评估中使用了非本地英语言语语料库。与最先进的方法相比,最佳提出的S2S技术将AUC度量误差的准确性从41 \%提高到41 \%从0.528提高到0.749。
translated by 谷歌翻译
Building an accurate model of travel behaviour based on individuals' characteristics and built environment attributes is of importance for policy-making and transportation planning. Recent experiments with big data and Machine Learning (ML) algorithms toward a better travel behaviour analysis have mainly overlooked socially disadvantaged groups. Accordingly, in this study, we explore the travel behaviour responses of low-income individuals to transit investments in the Greater Toronto and Hamilton Area, Canada, using statistical and ML models. We first investigate how the model choice affects the prediction of transit use by the low-income group. This step includes comparing the predictive performance of traditional and ML algorithms and then evaluating a transit investment policy by contrasting the predicted activities and the spatial distribution of transit trips generated by vulnerable households after improving accessibility. We also empirically investigate the proposed transit investment by each algorithm and compare it with the city of Brampton's future transportation plan. While, unsurprisingly, the ML algorithms outperform classical models, there are still doubts about using them due to interpretability concerns. Hence, we adopt recent local and global model-agnostic interpretation tools to interpret how the model arrives at its predictions. Our findings reveal the great potential of ML algorithms for enhanced travel behaviour predictions for low-income strata without considerably sacrificing interpretability.
translated by 谷歌翻译
Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.
translated by 谷歌翻译
口语理解(SLU)是大多数人机相互作用系统中的核心任务。随着智能家居,智能手机和智能扬声器的出现,SLU已成为该行业的关键技术。在经典的SLU方法中,自动语音识别(ASR)模块将语音信号转录为文本表示,自然语言理解(NLU)模块从中提取语义信息。最近,基于深神经网络的端到端SLU(E2E SLU)已经获得了动力,因为它受益于ASR和NLU部分的联合优化,因此限制了管道架构的误差效应的级联反应。但是,对于E2E模型用于预测语音输入的概念和意图的实际语言特性知之甚少。在本文中,我们提出了一项研究,以确定E2E模型执行SLU任务的信号特征和其他语言特性。该研究是在必须处理非英语(此处法语)语音命令的智能房屋的应用领域进行的。结果表明,良好的E2E SLU性能并不总是需要完美的ASR功能。此外,结果表明,与管道模型相比,E2E模型在处理背景噪声和句法变化方面具有出色的功能。最后,更细粒度的分析表明,E2E模型使用输入信号的音调信息来识别语音命令概念。本文概述的结果和方法提供了一个跳板,以进一步分析语音处理中的E2E模型。
translated by 谷歌翻译
其中一个关键的交际能力是能够维持单声道言论的流利程度以及产生复杂语言的能力,以令人信服地争论一个立场。在本文中,我们的目标是预测由110名个人7小时的演讲组成的争论演讲的众群数据集中的TED谈话风格的情感评级。通过与三个辩论主题有关的任务提示引发语音样本。该样本总共有2211名来自737名有关的人类评估者,其有关的14个情感类别。我们通过微调预先调整在TED谈判公开演讲的大型数据集上进行了微调的模型来提出有效的方法来预测这些类别的分类任务。我们使用从最先进的自动语音识别系统中获得的流利功能的组合和从自动文本分析系统获得的大量人类解释的语言特征。所有14家评级类别的分类准确度大于60%,评分类别“信息性”的峰值性能为72%。在二次实验中,我们确定了使用SP-in-in-infly群体的特征的相对重要性。
translated by 谷歌翻译
It does not matter whether it is a job interview with Tech Giants, Wall Street firms, or a small startup; all candidates want to demonstrate their best selves or even present themselves better than they really are. Meanwhile, recruiters want to know the candidates' authentic selves and detect soft skills that prove an expert candidate would be a great fit in any company. Recruiters worldwide usually struggle to find employees with the highest level of these skills. Digital footprints can assist recruiters in this process by providing candidates' unique set of online activities, while social media delivers one of the largest digital footprints to track people. In this study, for the first time, we show that a wide range of behavioral competencies consisting of 16 in-demand soft skills can be automatically predicted from Instagram profiles based on the following lists and other quantitative features using machine learning algorithms. We also provide predictions on Big Five personality traits. Models were built based on a sample of 400 Iranian volunteer users who answered an online questionnaire and provided their Instagram usernames which allowed us to crawl the public profiles. We applied several machine learning algorithms to the uniformed data. Deep learning models mostly outperformed by demonstrating 70% and 69% average Accuracy in two-level and three-level classifications respectively. Creating a large pool of people with the highest level of soft skills, and making more accurate evaluations of job candidates is possible with the application of AI on social media user-generated data.
translated by 谷歌翻译
在人类循环机器学习应用程序的背景下,如决策支持系统,可解释性方法应在不使用户等待的情况下提供可操作的见解。在本文中,我们提出了加速的模型 - 不可知论解释(ACME),一种可解释的方法,即在全球和本地层面迅速提供特征重要性分数。可以将acme应用于每个回归或分类模型的后验。 ACME计算功能排名不仅提供了一个什么,但它还提供了一个用于评估功能值的变化如何影响模型预测的原因 - 如果分析工具。我们评估了综合性和现实世界数据集的建议方法,同时也与福芙添加剂解释(Shap)相比,我们制作了灵感的方法,目前是最先进的模型无关的解释性方法。我们在生产解释的质量方面取得了可比的结果,同时急剧减少计算时间并为全局和局部解释提供一致的可视化。为了促进该领域的研究,为重复性,我们还提供了一种存储库,其中代码用于实验。
translated by 谷歌翻译
我们使用不同的语言支持特征预处理方法研究特征密度(FD)的有效性,以估计数据集复杂性,这又用于比较估计任何训练之前机器学习(ML)分类器的潜在性能。我们假设估计数据集复杂性允许减少所需实验迭代的数量。这样我们可以优化ML模型的资源密集型培训,这是由于可用数据集大小的增加以及基于深神经网络(DNN)的模型的不断增加的普及而成为一个严重问题。由于训练大规模ML模型引起的令人惊叹的二氧化碳排放量,不断增加对更强大的计算资源需求的问题也在影响环境。该研究是在多个数据集中进行的,包括流行的数据集,例如用于培训典型情感分析模型的Yelp业务审查数据集,以及最近的数据集尝试解决网络欺凌问题,这是一个严重的社会问题,也是一个严重的社会问题一个更复杂的问题,形成了语言代表的观点。我们使用收集多种语言的网络欺凌数据集,即英语,日语和波兰语。数据集的语言复杂性的差异允许我们另外讨论语言备份的单词预处理的功效。
translated by 谷歌翻译
Research on automated essay scoring has become increasing important because it serves as a method for evaluating students' written-responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods than can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition that were then classified using a scoring model that was training with the bidirectional encoder representations from transformer language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
translated by 谷歌翻译
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of scarifying accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
translated by 谷歌翻译
State-of-the-art text simplification (TS) systems adopt end-to-end neural network models to directly generate the simplified version of the input text, and usually function as a blackbox. Moreover, TS is usually treated as an all-purpose generic task under the assumption of homogeneity, where the same simplification is suitable for all. In recent years, however, there has been increasing recognition of the need to adapt the simplification techniques to the specific needs of different target groups. In this work, we aim to advance current research on explainable and controllable TS in two ways: First, building on recently proposed work to increase the transparency of TS systems, we use a large set of (psycho-)linguistic features in combination with pre-trained language models to improve explainable complexity prediction. Second, based on the results of this preliminary task, we extend a state-of-the-art Seq2Seq TS model, ACCESS, to enable explicit control of ten attributes. The results of experiments show (1) that our approach improves the performance of state-of-the-art models for predicting explainable complexity and (2) that explicitly conditioning the Seq2Seq model on ten attributes leads to a significant improvement in performance in both within-domain and out-of-domain settings.
translated by 谷歌翻译
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
translated by 谷歌翻译
众所周知,端到端的神经NLP体系结构很难理解,这引起了近年来为解释性建模的许多努力。模型解释的基本原则是忠诚,即,解释应准确地代表模型预测背后的推理过程。这项调查首先讨论了忠诚的定义和评估及其对解释性的意义。然后,我们通过将方法分为五类来介绍忠实解释的最新进展:相似性方法,模型内部结构的分析,基于反向传播的方法,反事实干预和自我解释模型。每个类别将通过其代表性研究,优势和缺点来说明。最后,我们从它们的共同美德和局限性方面讨论了上述所有方法,并反思未来的工作方向忠实的解释性。对于有兴趣研究可解释性的研究人员,这项调查将为该领域提供可访问且全面的概述,为进一步探索提供基础。对于希望更好地了解自己的模型的用户,该调查将是一项介绍性手册,帮助选择最合适的解释方法。
translated by 谷歌翻译