在为城市声音分类构建深度神经网络之后,这项工作侧重于辅助驾驶员遭受听力损失的敏感应用。因此,清晰的病因证明和解释模型预测包括强的要求。为此,我们使用了两个不同的音频信号表示,即MEL和常数Q谱图,而深度神经网络所做的决定是通过层明智的相关性传播来解释的。同时,在两个特征集中分配高相关性的频率内容,表示非常辨别的信息表征本发明的分类任务。总的来说,我们为理解深层城市声音分类提供了解释的AI框架。
translated by 谷歌翻译
听觉对于自动驾驶汽车(AV)至关重要,以更好地感知其周围环境。尽管相机,激光雷达和雷达等AV的视觉传感器有助于看到其周围环境,但AV无法看到这些传感器的视线。另一方面,视线无法阻碍AV的听力感。例如,即使紧急车辆不在AV的视线之内,AV也可以通过音频分类识别紧急车辆的警笛。因此,听觉感知与相机,激光雷达和基于雷达的感知系统互补。本文提出了一个基于深度学习的强大音频分类框架,旨在提高对AV的环境感知。提出的框架利用深度卷积神经网络(CNN)来对不同的音频类进行分类。 Urbansound8K是一个城市环境数据集,用于训练和测试开发的框架。七个音频课程,即空调,汽车喇叭,儿童播放,狗皮,发动机空闲,枪声和警报器,是从urbansound8k数据集中识别的,因为它们与AVS相关。我们的框架可以以97.82%的精度对不同的音频类别进行分类。此外,介绍了所有十个类的音频分类精度,这证明,与现有的音频分类框架相比,在与AV相关的声音的情况下,我们的框架的性能更好。
translated by 谷歌翻译
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to wellinformed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
translated by 谷歌翻译
卷积神经网络(CNN)以其出色的功能提取能力而闻名,可以从数据中学习模型,但被用作黑匣子。对卷积滤液和相关特征的解释可以帮助建立对CNN的理解,以区分各种类别。在这项工作中,我们关注的是CNN模型的解释性,称为CNNexplain,该模型用于COVID-19和非CoVID-19分类,重点是卷积过滤器的特征解释性,以及这些功能如何有助于分类。具体而言,我们使用了各种可解释的人工智能(XAI)方法,例如可视化,SmoothGrad,Grad-Cam和Lime来提供卷积滤液的解释及相关特征及其在分类中的作用。我们已经分析了使用干咳嗽光谱图的这些方法的解释。从石灰,光滑果实和GRAD-CAM获得的解释结果突出了不同频谱图的重要特征及其与分类的相关性。
translated by 谷歌翻译
With big data becoming increasingly available, IoT hardware becoming widely adopted, and AI capabilities becoming more powerful, organizations are continuously investing in sensing. Data coming from sensor networks are currently combined with sensor fusion and AI algorithms to drive innovation in fields such as self-driving cars. Data from these sensors can be utilized in numerous use cases, including alerts in safety systems of urban settings, for events such as gun shots and explosions. Moreover, diverse types of sensors, such as sound sensors, can be utilized in low-light conditions or at locations where a camera is not available. This paper investigates the potential of the utilization of sound-sensor data in an urban context. Technically, we propose a novel approach of classifying sound data using the Wigner-Ville distribution and Convolutional Neural Networks. In this paper, we report on the performance of the approach on open-source datasets. The concept and work presented is based on my doctoral thesis, which was performed as part of the Engineering Doctorate program in Data Science at the University of Eindhoven, in collaboration with the Dutch National Police. Additional work on real-world datasets was performed during the thesis, which are not presented here due to confidentiality.
translated by 谷歌翻译
可解释的人工智能(XAI)的新兴领域旨在为当今强大但不透明的深度学习模型带来透明度。尽管本地XAI方法以归因图的形式解释了个体预测,从而确定了重要特征的发生位置(但没有提供有关其代表的信息),但全局解释技术可视化模型通常学会的编码的概念。因此,两种方法仅提供部分见解,并留下将模型推理解释的负担。只有少数当代技术旨在将本地和全球XAI背后的原则结合起来,以获取更多信息的解释。但是,这些方法通常仅限于特定的模型体系结构,或对培训制度或数据和标签可用性施加其他要求,这实际上使事后应用程序成为任意预训练的模型。在这项工作中,我们介绍了概念相关性传播方法(CRP)方法,该方法结合了XAI的本地和全球观点,因此允许回答“何处”和“ where”和“什么”问题,而没有其他约束。我们进一步介绍了相关性最大化的原则,以根据模型对模型的有用性找到代表性的示例。因此,我们提高了对激活最大化及其局限性的共同实践的依赖。我们证明了我们方法在各种环境中的能力,展示了概念相关性传播和相关性最大化导致了更加可解释的解释,并通过概念图表,概念组成分析和概念集合和概念子区和概念子区和概念子集和定量研究对模型的表示和推理提供了深刻的见解。它们在细粒度决策中的作用。
translated by 谷歌翻译
Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems, e.g., image classification, natural language processing or human action recognition. Although these methods perform impressively well, they have a significant disadvantage, the lack of transparency, limiting the interpretability of the solution and thus the scope of application in practice. Especially DNNs act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method is based on deep Taylor decomposition and efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets.
translated by 谷歌翻译
音频分割和声音事件检测是机器聆听中的关键主题,旨在检测声学类别及其各自的边界。它对于音频分析,语音识别,音频索引和音乐信息检索非常有用。近年来,大多数研究文章都采用分类。该技术将音频分为小帧,并在这些帧上单独执行分类。在本文中,我们提出了一种新颖的方法,叫您只听一次(Yoho),该方法受到计算机视觉中普遍采用的Yolo算法的启发。我们将声学边界的检测转换为回归问题,而不是基于框架的分类。这是通过具有单独的输出神经元来检测音频类的存在并预测其起点和终点来完成的。与最先进的卷积复发性神经网络相比,Yoho的F量的相对改善范围从多个数据集中的1%到6%不等,以进行音频分段和声音事件检测。由于Yoho的输出更端到端,并且可以预测的神经元更少,因此推理速度的速度至少比逐个分类快6倍。另外,由于这种方法可以直接预测声学边界,因此后处理和平滑速度约为7倍。
translated by 谷歌翻译
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
translated by 谷歌翻译
We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself -the correspondence between the visual and the audio streams, and we introduce a novel "Audio-Visual Correspondence" learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good visual and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art selfsupervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks.
translated by 谷歌翻译
在本文中,我们评估了基于对抗示例的深度学习的AED系统。我们测试多个安全性关键任务的稳健性,实现为CNNS分类器,以及由Google制造的现有第三方嵌套设备,该模型运行自己的黑盒深度学习模型。我们的对抗示例使用由白色和背景噪声制成的音频扰动。这种干扰易于创建,以执行和再现,并且可以访问大量潜在的攻击者,甚至是非技术精明的攻击者。我们表明,对手可以专注于音频对抗性投入,使AED系统分类,即使我们使用少量给定类型的嘈杂干扰,也能实现高成功率。例如,在枪声课堂的情况下,我们在采用少于0.05白噪声水平时达到近100%的成功率。类似于以前通过工作的工作侧重于来自图像域以及语音识别域的对抗示例。然后,我们寻求通过对策提高分类器的鲁棒性。我们雇用了对抗性培训和音频去噪。我们表明,当应用于音频输入时,这些对策可以是分离或组合的,在攻击时,可以成功地产生近50%的近50%。
translated by 谷歌翻译
我们提出了一种新的四管齐下的方法,在文献中首次建立消防员的情境意识。我们构建了一系列深度学习框架,彼此之叠,以提高消防员在紧急首次响应设置中进行的救援任务的安全性,效率和成功完成。首先,我们使用深度卷积神经网络(CNN)系统,以实时地分类和识别来自热图像的感兴趣对象。接下来,我们将此CNN框架扩展了对象检测,跟踪,分割与掩码RCNN框架,以及具有多模级自然语言处理(NLP)框架的场景描述。第三,我们建立了一个深入的Q学习的代理,免受压力引起的迷失方向和焦虑,能够根据现场消防环境中观察和存储的事实来制定明确的导航决策。最后,我们使用了一种低计算无监督的学习技术,称为张量分解,在实时对异常检测进行有意义的特征提取。通过这些临时深度学习结构,我们建立了人工智能系统的骨干,用于消防员的情境意识。要将设计的系统带入消防员的使用,我们设计了一种物理结构,其中处理后的结果被用作创建增强现实的投入,这是一个能够建议他们所在地的消防员和周围的关键特征,这对救援操作至关重要在手头,以及路径规划功能,充当虚拟指南,以帮助迷彩的第一个响应者恢复安全。当组合时,这四种方法呈现了一种新颖的信息理解,转移和综合方法,这可能会大大提高消防员响应和功效,并降低寿命损失。
translated by 谷歌翻译
Deep Neural Networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multi-layer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the "importance" of individual pixels wrt the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012 and MIT Places data sets. Our main result is that the recently proposed Layer-wise Relevance Propagation (LRP) algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of neural network performance.
translated by 谷歌翻译
在这项工作中,我们详细描述了深度学习和计算机视觉如何帮助检测AirTender系统的故障事件,AirTender系统是售后摩托车阻尼系统组件。监测飞行员运行的最有效方法之一是在其表面上寻找油污渍。从实时图像开始,首先在摩托车悬架系统中检测到Airtender,然后二进制分类器确定Airtender是否在溢出油。该检测是在YOLO5架构的帮助下进行的,而分类是在适当设计的卷积神经网络油网40的帮助下进行的。为了更清楚地检测油的泄漏,我们用荧光染料稀释了荧光染料,激发波长峰值约为390 nm。然后用合适的紫外线LED照亮飞行员。整个系统是设计低成本检测设置的尝试。船上设备(例如迷你计算机)被放置在悬架系统附近,并连接到全高清摄像头框架架上。板载设备通过我们的神经网络算法,然后能够将AirTender定位并分类为正常功能(非泄漏图像)或异常(泄漏图像)。
translated by 谷歌翻译
The occurrence of vacuum arcs or radio frequency (rf) breakdowns is one of the most prevalent factors limiting the high-gradient performance of normal conducting rf cavities in particle accelerators. In this paper, we search for the existence of previously unrecognized features related to the incidence of rf breakdowns by applying a machine learning strategy to high-gradient cavity data from CERN's test stand for the Compact Linear Collider (CLIC). By interpreting the parameters of the learned models with explainable artificial intelligence (AI), we reverse-engineer physical properties for deriving fast, reliable, and simple rule-based models. Based on 6 months of historical data and dedicated experiments, our models show fractions of data with a high influence on the occurrence of breakdowns. Specifically, it is shown that the field emitted current following an initial breakdown is closely related to the probability of another breakdown occurring shortly thereafter. Results also indicate that the cavity pressure should be monitored with increased temporal resolution in future experiments, to further explore the vacuum activity associated with breakdowns.
translated by 谷歌翻译
由生物声监测设备组成的无线声传感器网络运行的专家系统的部署,从声音中识别鸟类物种将使许多生态价值任务自动化,包括对鸟类种群组成的分析或濒危物种的检测在环境感兴趣的地区。由于人工智能的最新进展,可以将这些设备具有准确的音频分类功能,其中深度学习技术出色。但是,使生物声音设备负担得起的一个关键问题是使用小脚印深神经网络,这些神经网络可以嵌入资源和电池约束硬件平台中。因此,这项工作提供了两个重型和大脚印深神经网络(VGG16和RESNET50)和轻量级替代方案MobilenetV2之间的批判性比较分析。我们的实验结果表明,MobileNetV2的平均F1得分低于RESNET50(0.789 vs. 0.834)的5 \%,其性能优于VGG16,其足迹大小近40倍。此外,为了比较模型,我们创建并公开了西部地中海湿地鸟类数据集,其中包括201.6分钟和5,795个音频摘录,摘录了20种特有鸟类的aiguamolls de l'empord \ e empord \`一个自然公园。
translated by 谷歌翻译
随着最近自动驾驶的进步,语音控制系统越来越多地被用作人车相互作用方法。该技术使驱动程序能够使用语音命令来控制车辆,并将在高级驾驶员辅助系统(ADA)中使用。前工作表明,Siri,Alexa和Cortana非常容易受到听不及的指挥攻击。这可以扩展到现实世界应用中的ADA,并且由于麦克风非线性,难以检测这种听不到的指挥威胁。在本文中,我们旨在通过使用相机视图来开发更实用的解决方案,以防御ADA通过多传感器检测其环境的声明命令攻击。为此,我们提出了一种新的多模式深度学习分类系统,以防御听不及的指挥攻击。我们的实验结果证实了建议的防御方法的可行性,最佳分类精度达到89.2%。代码是在https://github.com/itseg-mq/sensor-fusion-against-voiceCommand-attacks上获得的。
translated by 谷歌翻译
口吃是一种言语障碍,在此期间,语音流被非自愿停顿和声音重复打断。口吃识别是一个有趣的跨学科研究问题,涉及病理学,心理学,声学和信号处理,使检测很难且复杂。机器和深度学习的最新发展已经彻底彻底改变了语音领域,但是对口吃的识别受到了最小的关注。这项工作通过试图将研究人员从跨学科领域聚集在一起来填补空白。在本文中,我们回顾了全面的声学特征,基于统计和深度学习的口吃/不足分类方法。我们还提出了一些挑战和未来的指示。
translated by 谷歌翻译
第五代(5G)网络和超越设想巨大的东西互联网(物联网)推出,以支持延长现实(XR),增强/虚拟现实(AR / VR),工业自动化,自主驾驶和智能所有带来的破坏性应用一起占用射频(RF)频谱的大规模和多样化的IOT设备。随着频谱嘎嘎和吞吐量挑战,这种大规模的无线设备暴露了前所未有的威胁表面。 RF指纹识别是预约的作为候选技术,可以与加密和零信任安全措施相结合,以确保无线网络中的数据隐私,机密性和完整性。在未来的通信网络中,在这项工作中,在未来的通信网络中的相关性,我们对RF指纹识别方法进行了全面的调查,从传统观点到最近的基于深度学习(DL)的算法。现有的调查大多专注于无线指纹方法的受限制呈现,然而,许多方面仍然是不可能的。然而,在这项工作中,我们通过解决信号智能(SIGINT),应用程序,相关DL算法,RF指纹技术的系统文献综述来缓解这一点,跨越过去二十年的RF指纹技术的系统文献综述,对数据集和潜在研究途径的讨论 - 必须以百科全书的方式阐明读者的必要条件。
translated by 谷歌翻译
这项工作引入了图像分类器的注意机制和相应的深神经网络(DNN)结构,称为ISNET。在训练过程中,ISNET使用分割目标来学习如何找到图像感兴趣的区域并将注意力集中在其上。该提案基于一个新颖的概念,即在说明热图中的背景相关性最小化。它几乎可以应用于任何分类神经网络体系结构,而在运行时没有任何额外的计算成本。能够忽略背景的单个DNN可以替换分段者的通用管道,然后是分类器,更快,更轻。我们测试了ISNET的三种应用:Covid-19和胸部X射线中的结核病检测以及面部属性估计。前两个任务采用了混合培训数据库,并培养了快捷方式学习。通过关注肺部并忽略背景中的偏见来源,ISNET减少了问题。因此,它改善了生物医学分类问题中外部(分布外)测试数据集的概括,超越了标准分类器,多任务DNN(执行分类和细分),注意力门控神经网络以及标准段 - 分类管道。面部属性估计表明,ISNET可以精确地集中在面孔上,也适用于自然图像。 ISNET提出了一种准确,快速和轻的方法,可忽略背景并改善各种领域的概括。
translated by 谷歌翻译