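Although supervised deep learning has revolutionized speech and audio processing, it has required building specialist models for individual tasks and application scenarios. It is likewise difficult to apply to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation learning is still a nascent research area, it is closely related to acoustic word embeddings and learning with zero lexical resources, both of which have been actively researched for many years. This review presents approaches for self-supervised speech representation learning and their connections to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent benchmarking efforts that extend the application beyond speech recognition.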
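Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR in SLU. We compare learned speech features from wav2vec 2.0 and state-of-the-art ASR transcripts as input for a novel speech-based named entity recognition task, a task on real-world emergency calls, and two existing SLU benchmarks. We show that learned speech features outperform ASR transcripts on three classification tasks. For machine translation, ASR transcripts remain the better choice. We highlight the intrinsic robustness of wav2vec 2.0 representations to out-of-vocabulary words as key to the better performance.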
The success of neural networks builds to a large extent on their ability to create internal knowledge representations from real-world high-dimensional data, such as images, sound, or text. Extracting and presenting these representations in order to explain a neural network's decisions is an active and multifaceted research field. To gain a deeper understanding of a central aspect of this field, we performed a targeted review focusing on research that aims to associate internal representations with human-understandable concepts. In doing so, we add a perspective on the existing research by using primarily deductive-nomological explanations as a proposed taxonomy. We find this taxonomy, together with theories of causality, useful for understanding what can and cannot be expected from neural network explanations. The analysis additionally uncovers an ambiguity in the reviewed literature regarding the goal of model explainability: is it to understand the ML model, or to provide actionable explanations that are useful in the deployment domain?
Human motion prediction is a complex task, as it involves forecasting variables over time on a graph of connected sensors. This is especially true in few-shot learning, where we strive to forecast motion sequences for previously unseen actions based on only a few examples. Despite this, almost no related approaches for few-shot motion prediction incorporate the underlying graph, even though it is a common component in classical motion prediction. Furthermore, state-of-the-art methods for few-shot motion prediction are restricted to motion tasks with a fixed output space, meaning these tasks are all limited to the same sensor graph. In this work, we extend recent work on few-shot time-series forecasting with heterogeneous attributes using graph neural networks, introducing the first few-shot motion approach that explicitly incorporates the spatial graph while also generalizing across motion tasks with heterogeneous sensors. In our experiments on motion tasks with heterogeneous sensors, we demonstrate significant performance improvements, with lifts from 10.4% up to 39.3% compared to the best state-of-the-art models. Moreover, we show that our model performs on par with the best approach so far when evaluated on tasks with a fixed output space, while using two orders of magnitude fewer parameters.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
This volume contains revised versions of the papers selected for the third volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
Time series, sets of sequences in chronological order, are essential data in statistical research with many forecasting applications. Although many recent Transformer-based models have performed notably well, long multi-horizon time series forecasting remains a very challenging task. Going beyond Transformers in sequence translation and transduction research, we observe that down- and up-sampling can nudge temporal saliency patterns to emerge in time sequences. Motivated by this observation, we propose a novel architecture, Temporal Saliency Detection (TSD), built on top of the attention mechanism, and apply it to multi-horizon time series prediction. We renovate the traditional encoder-decoder architecture by making it a series of deep convolutional blocks that work in tandem with multi-head self-attention. The proposed TSD approach facilitates the multiresolution of saliency patterns upon condensed multi-heads, thus progressively enhancing complex time series forecasting. Experimental results show that our approach significantly outperforms existing state-of-the-art methods across multiple standard benchmark datasets in many far-horizon forecasting settings. Overall, TSD achieves 31% and 46% relative improvement over the current state-of-the-art models in multivariate and univariate time series forecasting scenarios, respectively, on standard benchmarks. The Git repository is available at https://github.com/duongtrung/time-series-temporal-saliency-patterns.
Hyperspectral Imaging (HSI) provides detailed spectral information and has been utilised in many real-world applications. This work introduces an HSI dataset of building facades in a light industry environment with the aim of classifying different building materials in a scene. The dataset is called the Light Industrial Building HSI (LIB-HSI) dataset. This dataset consists of nine categories and 44 classes. In this study, we investigated deep learning based semantic segmentation algorithms on RGB and hyperspectral images to classify various building materials, such as timber, brick and concrete.
We propose a novel multi-task method for quantile forecasting with shared linear layers. Our method is based on the implicit quantile learning approach, where samples from the uniform distribution $\mathcal{U}(0, 1)$ are reparameterized to quantile values of the target distribution. We combine the implicit quantile and input time series representations to directly forecast multiple quantile estimates for multiple horizons jointly. Prior works have adopted a linear layer for the direct estimation of all forecasting horizons in a multi-task learning setup. We show that, following a similar intuition from multi-task learning to exploit correlations among forecast horizons, we can model multiple quantile estimates as auxiliary tasks for each forecast horizon, which improves forecast accuracy across the quantile estimates compared to modeling only a single quantile estimate. We show that learning auxiliary quantile tasks leads to state-of-the-art performance on deterministic forecasting benchmarks with respect to the main task of forecasting the 50$^{th}$ percentile estimate.
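To make the idea of training on sampled quantile levels concrete, here is a minimal sketch of the quantile (pinball) loss evaluated at levels drawn from $\mathcal{U}(0, 1)$. The abstract does not state which loss the authors use; the pinball loss is assumed here because it is the standard objective for quantile regression. All function names are hypothetical, and the model that maps a quantile level to a forecast is left out.

```python
import random


def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for one prediction at level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs tau * diff,
    # over-prediction (diff < 0) costs (1 - tau) * |diff|.
    return max(tau * diff, (tau - 1.0) * diff)


def sample_quantile_levels(n, seed=0):
    """Draw n quantile levels from U(0, 1) (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def mean_pinball_loss(y_true, preds_by_tau):
    """Average the pinball loss over (tau, prediction) pairs for one target."""
    losses = [pinball_loss(y_true, y_hat, tau) for tau, y_hat in preds_by_tau]
    return sum(losses) / len(losses)
```

At `tau = 0.5` the loss reduces to half the absolute error, which is why the 50th percentile estimate can be scored on deterministic forecasting benchmarks.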
Point cloud analysis is receiving increasing attention; however, most existing point cloud models lack the practical ability to deal with the unavoidable presence of unknown objects. This paper discusses point cloud analysis under open-set settings, where we train the model without data from unknown classes and identify them at the inference stage. We propose to solve open-set point cloud analysis using a novel Point Cut-and-Mix mechanism consisting of Unknown-Point Simulator and Unknown-Point Estimator modules. Specifically, we use the Unknown-Point Simulator to simulate unknown data in the training stage by manipulating the geometric context of partial known data. Based on this, the Unknown-Point Estimator module learns to exploit the point cloud's feature context to discriminate between known and unknown data. Extensive experiments show the plausibility of open-set point cloud analysis and the effectiveness of our proposed solutions. Our code is available at https://github.com/ShiQiu0419/pointcam.