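Many existing data mining algorithms use feature values directly in their models, which makes them sensitive to the units or scales used to measure or represent the data. Pre-processing based on rank transformation has been proposed as a potential solution to this problem. However, data pre-processed with a rank transformation are uniformly distributed, and this may not be very useful in many data mining applications. In this paper, we present a better and effective alternative based on ranks over multiple sub-samples of the data. We call the proposed pre-processing technique ARES (Average Rank over an Ensemble of Sub-samples). We tested widely used data mining algorithms for classification and anomaly detection on a wide range of data sets. The results show that ARES leads to more consistent task-specific outcomes across algorithms and data sets. In addition, most of the time it produces better or competitive results compared with the most widely used min-max normalization and with the traditional rank transformation.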
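The abstract describes ARES only at a high level. The following is a minimal sketch under stated assumptions. The "rank" of a value in a sub-sample is taken here as the fraction of sub-sample points less than or equal to it, and that rank is averaged over several random sub-samples of one feature. The parameter names `num_subsamples` and `subsample_size` are hypothetical and do not reflect the paper's settings.

```python
import bisect
import random


def fit_ares(values, num_subsamples=10, subsample_size=8, seed=0):
    """Draw sorted random sub-samples of one feature's training values.

    num_subsamples and subsample_size are illustrative (hypothetical) choices.
    """
    rng = random.Random(seed)
    size = min(subsample_size, len(values))
    return [sorted(rng.sample(values, size)) for _ in range(num_subsamples)]


def ares_transform(x, subsamples):
    """Average, over all sub-samples, the fraction of sub-sample points <= x."""
    ranks = [bisect.bisect_right(s, x) / len(s) for s in subsamples]
    return sum(ranks) / len(ranks)


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
subs = fit_ares(data, seed=42)
print(ares_transform(5, subs))  # a value in [0, 1]
```

Ranks depend only on ordering, so the output does not change when the feature is rescaled, for example from metres to millimetres. This is the unit-sensitivity problem the abstract targets. Averaging over sub-samples gives a smoother, data-dependent mapping than a single global rank.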
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life, enabling temporally precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals that underlie many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge. It includes realistic mHealth missingness models, an extensive set of baselines, and clinically relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
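The following hypothetical sketch shows why pulsative signals are hard to impute. It masks contiguous blocks of samples, which is a simple stand-in and not PulseImpute's actual missingness models. It then fills the blocks with a naive linear-interpolation baseline and scores the result only on the missing positions. All function names are illustrative.

```python
import random


def block_missingness(n, num_blocks, block_len, seed=0):
    """Boolean mask (True = missing) made of contiguous missing blocks.

    Hypothetical stand-in for a wearable missingness model: each block
    mimics a burst of dropped samples. Blocks may overlap.
    """
    rng = random.Random(seed)
    mask = [False] * n
    for _ in range(num_blocks):
        start = rng.randrange(0, n - block_len + 1)
        for i in range(start, start + block_len):
            mask[i] = True
    return mask


def linear_interpolate(signal, mask):
    """Baseline: fill missing samples from the nearest observed neighbours."""
    observed = [i for i, missing in enumerate(mask) if not missing]
    if not observed:
        raise ValueError("at least one sample must be observed")
    filled = list(signal)
    for i, missing in enumerate(mask):
        if not missing:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = signal[right]
        elif right is None:
            filled[i] = signal[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = signal[left] + t * (signal[right] - signal[left])
    return filled


def masked_mse(truth, estimate, mask):
    """Mean squared error computed only over the missing positions."""
    errs = [(a - b) ** 2 for a, b, m in zip(truth, estimate, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


# A spike train standing in for a pulsative signal (e.g. one beat every 25 samples).
pulses = [1.0 if i % 25 == 0 else 0.0 for i in range(100)]
gap = [45 <= i < 55 for i in range(100)]  # the gap hides the spike at index 50
print(masked_mse(pulses, linear_interpolate(pulses, gap), gap))
```

When a gap hides a peak, interpolation between the flat neighbours erases the peak entirely. Generic imputers can reconstruct smooth signals well yet still fail on pulsative ones for exactly this reason.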
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is challenging because of complex poses, varying viewpoints, and the diverse environments in which actions take place. To address these complexities, we propose a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. SWTA consists of two parts. The first is a temporal segment network that sparsely samples a given set of frames. The second is weighted temporal attention, which fuses attention maps derived from optical flow with raw RGB images. These are followed by a base network, made up of a convolutional neural network (CNN) module and fully connected layers, that performs activity recognition. The SWTA network can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without a separate temporal stream. It has been evaluated on three publicly available benchmark datasets: Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on these datasets, respectively. This surpasses the previous state-of-the-art results by margins of 25.26%, 18.56%, and 2.94%.
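The first part of SWTA, sparse sampling by a temporal segment network, can be sketched as follows. This is the general Temporal Segment Network (TSN) idea, not the authors' code: split the video into equal segments and take one frame per segment. The offset within each segment is random during training and the segment centre during evaluation. The function and parameter names are hypothetical.

```python
import random


def sparse_segment_sample(num_frames, num_segments, training=True, seed=None):
    """Pick one frame index from each of num_segments near-equal segments.

    TSN-style sampling: a random frame per segment for training,
    the central frame per segment for evaluation.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = random.Random(seed)
    indices = []
    for k in range(num_segments):
        # Integer boundaries guarantee every segment holds at least one frame.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive
        if training:
            indices.append(rng.randrange(start, end))
        else:
            indices.append((start + end - 1) // 2)
    return indices


print(sparse_segment_sample(100, 8, training=False))
```

Only `num_segments` frames per clip pass through the network, regardless of video length. This is what keeps the sparse design cheap while still covering the whole clip.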
A biological system is a complex network of heterogeneous molecular entities and their interactions, which together give rise to the system's biological characteristics. However, current biological networks are noisy, sparse, and incomplete. This limits our ability to form a holistic view of biological systems and to understand biological phenomena. Experimental identification of such interactions is both time-consuming and expensive. High-throughput data generation has advanced and computational power has improved significantly, and as a result various computational methods have been developed to predict novel interactions in noisy networks. Recently, deep learning methods such as graph neural networks have proven effective at modeling graph-structured data and have achieved good performance in biomedical interaction prediction. However, methods based on graph neural networks require human expertise and experimentation to choose an appropriate model complexity, and this choice significantly affects performance. Furthermore, deep graph neural networks are prone to overfitting and tend to be poorly calibrated, assigning high confidence to incorrect predictions. To address these challenges, we propose Bayesian model selection for graph convolutional networks. It jointly infers the most plausible number of graph convolution layers (depth) warranted by the data and simultaneously performs dropout regularization. Experiments on four interaction datasets show that our method achieves accurate and calibrated predictions. It also enables graph convolutional networks to dynamically adapt their depth to accommodate an increasing number of interactions.
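The abstract does not name its calibration metric. One common way to quantify the "poorly calibrated, high confidence on incorrect predictions" problem is expected calibration error (ECE). The sketch below computes ECE for binary interaction predictions with equal-width confidence bins. The bin count is an assumption, and this is not necessarily the evaluation used in the paper.

```python
def expected_calibration_error(probs, labels, num_bins=10):
    """ECE for binary interaction predictions.

    probs: predicted probability that each candidate pair interacts.
    labels: 1 if the interaction exists, else 0.
    Each prediction's confidence is max(p, 1 - p); predictions are grouped into
    equal-width confidence bins, and |accuracy - mean confidence| per bin is
    averaged with weights proportional to bin size.
    """
    n = len(probs)
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, pred == y))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


# Confident but wrong predictions yield a large ECE.
print(expected_calibration_error([0.99, 0.99], [0, 0]))
```

A well-calibrated model scores close to 0. For example, a model that says 0.9 and is right about 90% of the time lands near 0.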
Drone-camera-based human activity recognition (HAR) has received significant attention from the computer vision research community in the past few years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. The task is challenging because of complex poses, varying viewpoints, and the diverse environments in which actions take place. To address these complexities, we propose a novel Sparse Weighted Temporal Fusion (SWTF) module that uses sparsely sampled video frames to obtain a global weighted temporal fusion outcome. SWTF is divided into two components. The first is a temporal segment network that sparsely samples a given set of frames. The second is weighted temporal fusion, which fuses feature maps derived from optical flow with raw RGB images. These are followed by a base network, made up of a convolutional neural network module and fully connected layers, that performs activity recognition. The SWTF network can be used as a plug-in module for existing deep CNN architectures, enabling them to learn temporal information without a separate temporal stream. It has been evaluated on three publicly available benchmark datasets: Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on these datasets, respectively, surpassing the previous state-of-the-art results by a significant margin.
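The abstract does not give the fusion formula, so the following is only a hypothetical sketch of one way weighted temporal fusion could combine optical-flow and RGB features. Within each sampled segment, the two feature vectors are mixed with a fixed weight `alpha`. The segments are then combined with softmax weights computed from per-segment relevance scores. Both `alpha` and `segment_scores` are assumptions. In a trained network the scores would come from a learned component.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_temporal_fusion(rgb_feats, flow_feats, segment_scores, alpha=0.5):
    """Fuse per-segment RGB and optical-flow features into one clip descriptor.

    rgb_feats, flow_feats: one feature vector (list of floats) per segment.
    segment_scores: hypothetical relevance score per segment.
    alpha: hypothetical flow-vs-RGB mixing weight.
    """
    weights = softmax(segment_scores)
    dim = len(rgb_feats[0])
    fused = [0.0] * dim
    for w, rgb, flow in zip(weights, rgb_feats, flow_feats):
        for d in range(dim):
            fused[d] += w * (alpha * flow[d] + (1 - alpha) * rgb[d])
    return fused


print(weighted_temporal_fusion([[1, 0], [3, 0]], [[0, 2], [0, 4]], [0.0, 0.0]))
```

Collapsing the segments into one weighted descriptor is what lets a single-stream CNN head see temporal information. This is the reason the design can drop the separate temporal stream.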
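End-to-end (E2E) models have become the default choice for state-of-the-art speech recognition systems. Such models are trained on large amounts of labelled data, which are often unavailable for low-resource languages. Techniques such as self-supervised learning and transfer learning hold promise but have not yet been effective for training accurate models. On the other hand, collecting labelled datasets across diverse domains and speakers is very expensive. In this work, we demonstrate an inexpensive and effective alternative to these approaches: "mining" text and audio pairs for Indian languages from public sources, specifically the public archives of All India Radio. As a key component, we adapt the Needleman-Wunsch algorithm to align sentences with the corresponding audio segments, given a long audio recording and a PDF of its transcript. The adapted algorithm is robust to errors caused by OCR, extraneous text, and untranscribed speech. We thus create Shrutilipi, a dataset containing over 6,400 hours of labelled audio across 12 Indian languages, totalling 4.95M sentences. On average, Shrutilipi yields a 2.3x increase in publicly available labelled data. We establish the quality of Shrutilipi with 21 human evaluators across the 12 languages. We also establish the diversity of Shrutilipi in terms of represented regions, speakers, and mentioned named entities. Notably, we show that adding Shrutilipi to the training set of Wav2Vec models reduces the word error rate (WER) by 5.8% on average across 7 languages of the IndicSUPERB benchmark. For Hindi, which has the most benchmarks (7), the average WER falls from 18.8% to 13.5%. The improvement extends to efficient models: for a Conformer model (10x smaller than Wav2Vec), we show a 2.3% drop in WER. Finally, we demonstrate the diversity of Shrutilipi by showing that models trained on it are more robust to noisy input.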
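The following is a minimal sketch of the standard Needleman-Wunsch global alignment that the abstract says was adapted. It aligns two token sequences, for example OCR'd transcript words against words recognized in an audio segment. That pairing and the scoring values are illustrative assumptions. The paper's adaptation for long audio and noisy PDFs is not reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Standard global alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b); None marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


transcript = "the cat sat on the mat".split()
heard = "the cat sat on mat".split()
print(needleman_wunsch(transcript, heard))
```

Gaps absorb extraneous transcript text or untranscribed speech, and mismatches absorb OCR errors. This is why a global alignment is a natural fit for the noisy pairing problem the abstract describes.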
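A cornerstone of AI research is the creation and adoption of standardized training and test datasets to mark the progress of state-of-the-art models. A particularly successful example is the GLUE dataset for training and evaluating English natural language understanding (NLU) models. Much of the research on BERT-based language models centred on performance improvements on the NLU tasks in GLUE. To evaluate language models in other languages, several language-specific GLUE datasets were created. The field of spoken language understanding (SLU) has followed a similar trajectory. The success of large self-supervised models such as wav2vec2 has made it possible to build speech models from relatively easy-to-access unlabelled data. These models can then be evaluated on SLU tasks, such as those in the SUPERB benchmark. In this work, we extend this to Indic languages by releasing the IndicSUPERB benchmark. Specifically, we make the following three contributions. (i) We collect Kathbath, which contains 1,684 hours of labelled speech data in 12 Indian languages from 1,218 contributors across 203 districts in India. (ii) Using Kathbath, we create benchmarks for 6 speech tasks in 12 languages: automatic speech recognition, speaker verification, speaker identification (mono/multi), language identification, query by example, and keyword spotting. (iii) On the released benchmarks, we train and evaluate different self-supervised models alongside a commonly used baseline, FBANK. We show that language-specific fine-tuned models are more accurate than the baseline on most tasks, including a large gap of 76% on the language identification task. However, for speaker identification, self-supervised models trained on large datasets show an advantage. We hope IndicSUPERB contributes to progress in developing spoken language understanding models for Indian languages.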
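The abstract does not state the metric for the automatic speech recognition task. Word error rate (WER) is the standard choice for ASR, and the following sketch shows how it is computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name is illustrative. Whitespace tokenization is an assumption that may not suit every Indic script or normalization scheme.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds edit distances between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # one substitution in four words
```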
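We propose a novel two-stage sub-8-bit quantization-aware training algorithm covering all components of a 250K-parameter feedforward, streaming, stateless keyword spotting model. In the first stage, we adapt a recently proposed quantization technique that applies a nonlinear tanh(.) transformation to dense-layer weights. In the second stage, we apply linear quantization to the rest of the network, including the other parameters (biases, gains, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 hours of de-identified production far-field and near-field audio data and evaluating on 4,000 hours of data. We organize our results around two embedded chipset settings:

a) With the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub-8-bit weights (4, 5, and 8 bits) and 8-bit quantization for the rest of the network.

b) With off-the-shelf neural network accelerators, we present accuracy results for a range of weight bit widths (1 and 5 bits) and project the reduction in memory utilization.

In both configurations, our results show that the proposed algorithm achieves the following:

a) Parity with the operating point of a full floating-point model on the detection error tradeoff (DET) curve, measured as false detection rate (FDR) versus false rejection rate (FRR).

b) Significant reductions in compute and memory, with up to a 3x improvement in CPU consumption and more than a 4x improvement in memory consumption.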
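The following forward-pass sketch illustrates the two stages. Stage 1 uses a DoReFa-style tanh weight quantizer, which is an assumption: the abstract says only "a recently proposed technique" using tanh, so the paper's exact transform may differ. Stage 2 uses plain asymmetric linear (affine) quantization for the remaining tensors. Quantization-aware training would also need a straight-through estimator for gradients, which is omitted here. All names are illustrative.

```python
import math


def quantize_unit(x, bits):
    """Uniformly round x in [0, 1] to one of 2**bits levels."""
    levels = 2 ** bits - 1
    return round(x * levels) / levels


def tanh_weight_quantize(weights, bits):
    """Stage 1 (assumed DoReFa-style): tanh squashes outliers, the result is
    rescaled to [0, 1], quantized to 2**bits levels, then mapped to [-1, 1]."""
    t = [math.tanh(w) for w in weights]
    max_abs = max(abs(v) for v in t) or 1.0  # avoid division by zero
    return [2 * quantize_unit(v / (2 * max_abs) + 0.5, bits) - 1 for v in t]


def linear_quantize(values, bits):
    """Stage 2: asymmetric linear quantization to unsigned integer codes.

    Returns (codes, scale, zero_point) with value ~ scale * (code - zero_point).
    """
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0 representable
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    return [scale * (c - zero_point) for c in codes]


print(tanh_weight_quantize([0.3, -2.0, 1.5], bits=1))  # binary weights
print(linear_quantize([-1.0, 0.0, 0.5, 1.0], bits=8))
```

The tanh transform limits how much a few large weights stretch the quantization grid, which matters most at 1 to 5 bits. The linear scheme is adequate for the other tensors at 8 bits.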
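Although robots can be proficient at a large number of isolated tasks, deploying robots in realistic dynamic environments remains a challenging problem. One reason is that robots are rarely equipped with strong introspection capabilities, which means they cannot always handle failures in a reasonable manner. In addition, manual diagnosis is often tedious and requires technicians with considerable robotics expertise. In this paper, we discuss our ongoing efforts in the context of the ROPOD project to address some of these issues. In particular, we do three things. (i) We present our early efforts to develop a robot black box and consider some factors that complicate its design. (ii) We explain our component and system monitoring concepts. (iii) We describe why remote monitoring and experimentation are necessary, along with our initial attempts at carrying them out. Our preliminary work opens a range of promising directions for making robots more usable and reliable in practice, not only in the context of ROPOD but also more generally.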
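In this paper, we propose a new glaucoma classification approach that applies a wavelet neural network (WNN) to optimally enhanced retinal image features. Manual analysis of retinal images by ophthalmologists is tedious and error-prone, and computer-aided diagnosis (CAD) substantially supports robust diagnosis. Our objective is to introduce a CAD system based on a new approach. Retinal image quality is improved in two phases. The preprocessing phase improves image brightness and contrast through quantile-based histogram modification. It is followed by an enhancement phase that applies morphological operations with image-specific dynamic structuring elements to enrich retinal structures. Graph-based retinal image features are then extracted from the enhanced retinal dataset. These are Local Graph Structure (LGS) and Graph Shortest Path (GSP) statistics, along with statistical features. A WNN with a suitable wavelet activation function is used to classify glaucomatous retinal images. The performance of the WNN classifier is compared with that of a multilayer perceptron neural network on various datasets. The results show that our approach outperforms existing methods.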
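The abstract does not define its quantile-based histogram modification. One simple member of that family is percentile clipping followed by linear contrast stretching, sketched below on a flat list of grayscale pixel values. The quantile levels (`low_q`, `high_q`) and the 0 to 255 output range are assumptions, and this is not presented as the paper's preprocessing step.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def quantile_stretch(pixels, low_q=0.02, high_q=0.98, out_max=255):
    """Clip intensities to the [low_q, high_q] quantiles and stretch to [0, out_max]."""
    s = sorted(pixels)
    lo, hi = quantile(s, low_q), quantile(s, high_q)
    if hi <= lo:  # flat image: nothing to stretch
        return list(pixels)
    out = []
    for p in pixels:
        t = (min(max(p, lo), hi) - lo) / (hi - lo)
        out.append(round(t * out_max))
    return out


dim_image = list(range(50, 151))  # low-contrast intensities 50..150
stretched = quantile_stretch(dim_image, low_q=0.0, high_q=1.0)
print(min(stretched), max(stretched))
```

Using quantiles rather than the raw minimum and maximum keeps a few very bright or dark pixels, such as glare at the optic disc, from dominating the stretch.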