Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for every new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work we focus on low-resource spoken named entity recognition (NER) and address the question: beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches (speech recognition followed by a text model). We find that several of these approaches improve performance in resource-constrained settings beyond the benefits of pre-trained representations alone. Compared to prior work, we find improved F1 scores of up to 16%. While the best baseline model is a pipeline approach, the best performance when using external data is ultimately achieved by an end-to-end model. We provide detailed comparisons and analyses, showing for example that the end-to-end models are able to focus on the more NER-specific words.
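As a minimal, runnable illustration of the self-training recipe (one of the approaches drawn on above), the sketch below pseudo-labels unlabeled examples with scikit-learn on toy data; the speech models and NER labels of the actual study are not reproduced here.

```python
# Generic self-training on toy data: fit on labeled examples, pseudo-label
# confident unlabeled ones, refit, repeat. The paper applies the same idea
# with speech/NER models, which this toy classifier only stands in for.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_semi = y.copy()
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.9   # hide 90% of the labels
y_semi[unlabeled] = -1                 # -1 marks "unlabeled" for sklearn

# Pseudo-label examples whose predicted probability exceeds the threshold.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_semi)
print("labeled used:", (~unlabeled).sum(), "accuracy:", model.score(X, y))
```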
Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, recent work has shown the possibility of pre-training generic representations and then fine-tuning for multiple tasks with relatively little labeled data. We propose creating a suite of benchmark tasks for spoken language understanding evaluation (SLUE), consisting of limited-size labeled training sets and corresponding evaluation sets. This resource would allow the research community to track progress, evaluate pre-trained representations on higher-level tasks, and study open questions such as the utility of pipeline versus end-to-end approaches. We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets. We focus on naturally produced (not read or synthesized) speech and freely available datasets. We provide new transcriptions and annotations for subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit for reproducing the baselines and evaluating new models.
Speaker diarization is the task of labeling audio or video recordings with classes that correspond to speaker identity, or in short, the task of identifying "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multispeaker recordings to enable speaker-adaptive processing. Over time, these algorithms also gained value as standalone applications, providing speaker-specific metadata for downstream tasks such as audio retrieval. More recently, with the emergence of deep learning technology, which has driven revolutionary changes in research and practice across speech application domains, rapid advances have been made in speaker diarization. In this paper, we review not only the historical development of speaker diarization technology but also recent advances in neural speaker diarization approaches. In addition, we discuss how speaker diarization systems have been integrated with speech recognition applications, and how the recent surge of deep learning is leading the way toward jointly modeling these two components so that they complement each other. By considering these exciting technical trends, we believe this paper offers a valuable contribution to the community, facilitating further progress in speaker diarization by consolidating the recent developments with neural methods.
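To make the "who spoke when" task concrete, here is a schematic of the classic embedding-and-clustering diarization pipeline the review covers; the synthetic embeddings stand in for a real voice activity detection and speaker-embedding (e.g., x-vector) front end.

```python
# Classic diarization pipeline in miniature: segment the audio, embed each
# segment, cluster the embeddings, and read off speaker turns. The speaker
# embeddings below are synthetic stand-ins for a trained front end.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Pretend each 1.5 s speech segment yielded a 128-dim speaker embedding;
# two synthetic speakers with different cluster centers.
centers = rng.normal(size=(2, 128))
segments = [(i * 1.5, (i + 1) * 1.5) for i in range(20)]
true_spk = rng.integers(0, 2, size=20)
emb = centers[true_spk] + 0.1 * rng.normal(size=(20, 128))

# Cluster segment embeddings; each cluster id becomes a speaker label.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(emb)
for (start, end), spk in zip(segments, labels):
    print(f"{start:5.1f}-{end:5.1f}s  speaker_{spk}")
```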
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
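As a simple illustration of the patch-based training workaround reported by 69% of respondents, the sketch below tiles an oversized image into fixed-size patches; the sizes are illustrative and not taken from the survey.

```python
# Patch-based training trick: tile an image too large for GPU memory into
# fixed-size patches that can be fed to a model one at a time.
import numpy as np

def extract_patches(img: np.ndarray, size: int = 256, stride: int = 256):
    """Yield (row, col, patch) tiles covering a 2D image."""
    h, w = img.shape[:2]
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, img[r:r + size, c:c + size]

image = np.zeros((1024, 1024), dtype=np.float32)  # stand-in "too large" scan
patches = list(extract_patches(image))
print(len(patches), "patches of shape", patches[0][2].shape)
```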
Question Answering (QA) is a task that entails reasoning over natural language contexts, and many relevant works augment language models (LMs) with graph neural networks (GNNs) to encode the Knowledge Graph (KG) information. However, most existing GNN-based modules for QA do not take advantage of rich relational information of KGs and depend on limited information interaction between the LM and the KG. To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner. Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations. Then, our Relation-Aware Self-Attention module comprehensively integrates different modalities via the Cross-Modal Relative Position Bias, which guides information exchange between relevant entities of different modalities. We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE. On all the datasets, our method achieves state-of-the-art performance. Our code is available at http://github.com/mlvlab/QAT.
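Below is a toy sketch of attention with an additive relative position bias, the general mechanism that QAT's Relation-Aware Self-Attention builds on; the random bias matrix is only a stand-in for the learned, relation-centric Cross-Modal Relative Position Bias.

```python
# Attention with an additive bias term: the bias is injected into the
# attention logits before the softmax, so it can steer information exchange
# between specific token pairs (here: random values as a stand-in).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 16                      # 6 tokens (mixed LM/KG), dim 16
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
bias = rng.normal(size=(n, n))    # stand-in for the learned cross-modal bias

scores = q @ k.T / np.sqrt(d) + bias   # bias added before the softmax
out = softmax(scores) @ v
print(out.shape)                       # (6, 16)
```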
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
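For readers who want to try the released models, a typical way to load a BLOOM checkpoint through the Hugging Face `transformers` library is sketched below; the smaller `bigscience/bloom-560m` checkpoint stands in for the full 176B-parameter model, which requires multi-GPU hardware.

```python
# Load a small BLOOM checkpoint and generate a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"  # small sibling of the 176B model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The BLOOM language model was trained on", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```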
Despite their popularity in deep learning and machine learning in general, the theoretical properties of adaptive optimizers such as Adagrad, RMSProp, Adam or AdamW are not yet fully understood. In this paper, we develop a novel framework to study the stability and generalization of these optimization methods. Based on this framework, we show provable guarantees about such properties that depend heavily on a single parameter $\beta_2$. Our empirical experiments support our claims and provide practical insights into the stability and generalization properties of adaptive optimization methods.
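To make the role of $\beta_2$ concrete, here is a bare-bones Adam step (the standard update rule, not the paper's analysis): $\beta_2$ sets the averaging horizon of the second-moment estimate that rescales every update.

```python
# Textbook Adam update; beta2 governs the exponential average of squared
# gradients (v), the quantity the paper's stability/generalization
# guarantees depend on.
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g       # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * g**2    # second moment, governed by beta2
    m_hat = m / (1 - beta1**t)            # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):                   # descend on f(w) = ||w||^2 / 2
    w, m, v = adam_step(w, w, m, v, t)    # gradient of f is w itself
print(w)                                  # approaches zero
```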
Spatial-temporal map (STMap)-based methods have shown great potential for processing high-angle video for vehicle trajectory reconstruction, which can satisfy the needs of various data-driven modeling and imitation learning applications. In this paper, we develop a spatial-temporal deep embedding (STDE) model that imposes parity constraints at both the pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation on STMaps. At the pixel level, each pixel is encoded with its 8-neighbor pixels at different ranges, and this encoding is subsequently used to guide a neural network in learning the embedding mechanism. At the instance level, a discriminative loss function is designed to pull pixels belonging to the same instance closer together and to separate the mean values of different instances. The output spatial-temporal affinities are then optimized by a postprocessing clustering algorithm to obtain the final clustering results. Based on segmentation metrics, our model outperforms five other baselines for STMap processing and shows robustness under the influence of shadows, static noise, and overlapping. The designed model is used to process all public NGSIM US-101 videos to generate complete vehicle trajectories, indicating good scalability and adaptability. Last but not least, the strengths of the scanline method with STDE and future directions are discussed. Code, the STMap dataset, and video trajectories are publicly available in an online repository. GitHub link: shorturl.at/jklt0.
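Below is a toy version of the pull/push discriminative loss described above, assuming illustrative margin values; it pulls pixel embeddings toward their instance mean and pushes instance means apart.

```python
# Discriminative instance-embedding loss: a variance (pull) term keeps
# pixels near their instance mean, and a distance (push) term keeps
# instance means apart. Margins d_pull/d_push are illustrative.
import numpy as np

def discriminative_loss(emb, inst_ids, d_pull=0.5, d_push=1.5):
    means = {i: emb[inst_ids == i].mean(axis=0) for i in np.unique(inst_ids)}
    # Pull: penalize pixels farther than d_pull from their instance mean.
    pull = np.mean([
        max(np.linalg.norm(e - means[i]) - d_pull, 0.0) ** 2
        for e, i in zip(emb, inst_ids)
    ])
    # Push: penalize pairs of instance means closer than d_push.
    ids = list(means)
    push = np.mean([
        max(d_push - np.linalg.norm(means[a] - means[b]), 0.0) ** 2
        for ai, a in enumerate(ids) for b in ids[ai + 1:]
    ]) if len(ids) > 1 else 0.0
    return pull + push

emb = np.random.default_rng(0).normal(size=(100, 4))  # pixel embeddings
inst = np.repeat([0, 1], 50)                          # two vehicle stripes
print(discriminative_loss(emb, inst))
```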
Conducting randomized experiments in educational settings raises the question of how we can use machine learning techniques to improve educational interventions. Using multi-armed bandit (MAB) algorithms such as Thompson sampling (TS) in adaptive experiments can increase students' chances of obtaining better outcomes by increasing the probability of assignment to the most optimal condition (arm), even before the intervention completes. This is an advantage over traditional A/B testing, which may allocate an equal number of students to optimal and non-optimal conditions. The problem is the exploration-exploitation trade-off. Although adaptive policies aim to collect enough information to allocate more students to better arms reliably, past work has shown that this may not provide enough exploration to draw reliable conclusions about whether arms differ. It is therefore of interest to provide additional uniform random (UR) exploration throughout the experiment. This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits. The metric of interest is the email open rate, tracked across arms corresponding to different subject lines. These were delivered according to different allocation algorithms: UR, TS, and what we identify as TS†, which combines TS with UR rewards to update its priors. We highlight problems with these adaptive algorithms, such as possible exploitation of an arm when there is no significant difference among arms, and address their causes and consequences. Future directions include studying situations in which early selection of the optimal arm is not ideal and how adaptive algorithms can address them.
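Below is a compact Beta-Bernoulli Thompson sampling loop for the subject-line setting described above; the arm open rates are invented for illustration, and mixing in uniform random sends (as in the UR and TS† conditions) would replace the argmax choice with an occasional coin flip.

```python
# Thompson sampling with Beta-Bernoulli arms: sample a plausible open rate
# for each subject line from its posterior, send the winner, then update
# that arm's posterior with the observed open (1) or no-open (0).
import numpy as np

rng = np.random.default_rng(0)
true_open_rates = [0.10, 0.15, 0.12]   # hypothetical arms (subject lines)
alpha = np.ones(3)                     # Beta(1, 1) priors
beta = np.ones(3)

for student in range(2000):
    theta = rng.beta(alpha, beta)      # posterior sample per arm
    arm = int(np.argmax(theta))        # send the subject line that wins
    opened = rng.random() < true_open_rates[arm]
    alpha[arm] += opened               # posterior update
    beta[arm] += 1 - opened

print("sends per arm:", alpha + beta - 2)
print("posterior means:", alpha / (alpha + beta))
```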
Load balancing (LB) is a challenging problem in hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) due to the nature of heterogeneous access points (APs). Machine learning has the potential to provide a complexity-friendly LB solution with near-optimal network performance, at the cost of a training process. However, state-of-the-art (SOTA) learning-aided LB methods require retraining when the network environment (especially the number of users) changes, significantly limiting their practicality. In this paper, a novel deep neural network (DNN) structure named adaptive target-condition neural network (A-TCNN) is proposed, which conducts AP selection for one target user conditioned on the other users. In addition, an adaptive mechanism is developed to map a smaller number of users to a larger user number by allocating data rate requirements, without affecting the AP selection result of the target user. This enables the proposed method to handle different numbers of users without retraining. Results show that A-TCNN achieves a network throughput very close to that of the testing dataset, with a gap of less than 3%. It is also shown that A-TCNN can obtain a network throughput comparable to two SOTA benchmarks while reducing the runtime by up to three orders of magnitude.
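Below is a schematic of the target-condition idea, with mean pooling as a simplified stand-in for the paper's adaptive user-mapping mechanism: one forward pass scores the candidate APs for a single target user, conditioned on the other users' states (toy dimensions, untrained random weights).

```python
# Target-conditioned AP selection in miniature: concatenate the target
# user's features with a summary of the other users, then score each AP.
# Mean pooling over the other users is a simplification, not the paper's
# adaptive data-rate mapping.
import numpy as np

rng = np.random.default_rng(0)
n_aps, d_user = 5, 8
W1 = rng.normal(size=(d_user * 2, 32)) * 0.1
W2 = rng.normal(size=(32, n_aps)) * 0.1

def ap_scores(target_feat, other_feats):
    cond = other_feats.mean(axis=0)          # condition on the other users
    h = np.tanh(np.concatenate([target_feat, cond]) @ W1)
    return h @ W2                            # one score per candidate AP

target = rng.normal(size=d_user)
others = rng.normal(size=(6, d_user))        # works for any user count
print("chosen AP:", int(np.argmax(ap_scores(target, others))))
```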