这项工作介绍了一种基于高准确性EMG的手势识别的方法。一种新开发的深度学习方法,即,深层残留的收缩网络用于执行手势识别。基于手势引起的EMG信号的特征,进行了优化以提高识别精度。最后,应用三种不同的算法将EMG信号识别的准确性与DRSN的精度进行比较。结果表明,DRSN在EMG识别准确性方面表现出传统的神经网络。本文提供了一种对EMG信号进行分类以及探索DRSN可能应用的可靠方法。
translated by 谷歌翻译
Audio sound recognition and classification is used for many tasks and applications including human voice recognition, music recognition and audio tagging. In this paper we apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present approaches used for data processing and augmentation and compare the results of various state of the art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%
translated by 谷歌翻译
我们将图形神经网络训练来自小工具N体模拟的光晕目录的神经网络,以执行宇宙学参数的无现场级别可能的推断。目录包含$ \ Lessim $ 5,000 HAROS带质量$ \ gtrsim 10^{10} 〜h^{ - 1} m_ \ odot $,定期卷为$(25〜H^{ - 1} {\ rm mpc}){\ rm mpc}) ^3 $;目录中的每个光环都具有多种特性,例如位置,质量,速度,浓度和最大圆速度。我们的模型构建为置换,翻译和旋转的不变性,不施加最低限度的规模来提取信息,并能够以平均值来推断$ \ omega _ {\ rm m} $和$ \ sigma_8 $的值$ \ sim6 \%$的相对误差分别使用位置加上速度和位置加上质量。更重要的是,我们发现我们的模型非常强大:他们可以推断出使用数千个N-n-Body模拟的Halo目录进行测试时,使用五个不同的N-进行测试时,在使用Halo目录进行测试时,$ \ omega _ {\ rm m} $和$ \ sigma_8 $身体代码:算盘,Cubep $^3 $ M,Enzo,PKDGrav3和Ramses。令人惊讶的是,经过培训的模型推断$ \ omega _ {\ rm m} $在对数千个最先进的骆驼水力动力模拟进行测试时也可以使用,该模拟使用四个不同的代码和子网格物理实现。使用诸如浓度和最大循环速度之类的光环特性允许我们的模型提取更多信息,而牺牲了模型的鲁棒性。这可能会发生,因为不同的N体代码不会在与这些参数相对应的相关尺度上收敛。
translated by 谷歌翻译
尽管基于计划的序列建模方法在连续控制方面表现出巨大的潜力,但由于高维空间中规划的高度计算复杂性和天生的困难,将它们扩展到高维状态序列仍然是一个开放的挑战。我们提出了轨迹自动编码计划器(TAP),这是一种基于计划的序列建模RL方法,可扩展到高州行动维度。使用状态条件矢量定量的变分自动编码器(VQ-VAE),点击模拟给定当前状态的轨迹的条件分布。当部署为RL代理时,TAP避免在高维连续动作空间中逐步计划,而是通过Beam Search寻找最佳的潜在代码序列。与$ o(d^3)$轨迹变压器的复杂性不同,TAP享受常数$ o(c)$规划有关州行动维度$ d $的计算复杂性。我们的经验评估还表明,随着维度的增长,TAP的表现越来越强。对于具有较高状态和动作维度的ADROIT机器人手动操纵任务,TAP超过了基于模型的方法,包括TT,其边距很大,并且还击败了强大的无模型参与者 - 批评基准。
translated by 谷歌翻译
随着对深度学习民主化的向往,在资源约束设备上实施基于变压器的自然语言处理(NLP)模型的需求越来越大,以实施低延迟和高准确性。现有的BERT修剪方法要求域专家启发手工制作超参数,以在模型大小,延迟和准确性之间取得平衡。在这项工作中,我们提出了AE-Bert,这是一种具有有效评估的自动和高效的BERT修剪框架,以选择“良好”子网络候选(高精度),鉴于整体修剪比率的约束。我们提出的方法不需要人类专家的经验,并且可以在许多NLP任务上取得更好的准确性能。我们关于一般语言理解评估(胶水)基准的实验结果表明,AE-Bert优于Bert $ _ {\ Mathrm {base}} $的最先进的(SOTA)手工制作的修剪方法。在QNLI和RTE上,我们获得75 \%和42.8%的总体修剪比,同时获得更高的精度。在MRPC上,我们的得分比SOTA高4.6,在相同的整体修剪比为0.5。在STS-B上,与SOTA手工制作的修剪方法相比,我们可以达到40 \%的修剪比,而Spearman相关性的损失非常小。实验结果还表明,在模型压缩之后,单个bert $ _ {\ mathrm {base}} $ coder的推理时间在xilinx alveo u200 fpga板上具有1.83 $ \ times $ speedup,与intel(r)xeon相比)Gold 5218(2.30GHz)CPU,它显示了部署BERT $ _ {\ MATHRM {base}} $模型在计算限制设备上生成的方法生成的子网的合理性。
translated by 谷歌翻译
制定了具有机器学习模拟(骆驼)项目的宇宙学和天体物理学,通过数千名宇宙的流体动力模拟和机器学习将宇宙学与天体物理学结合起来。骆驼包含4,233个宇宙学仿真,2,049个n-body和2,184个最先进的流体动力模拟,在参数空间中采样巨大的体积。在本文中,我们介绍了骆驼公共数据发布,描述了骆驼模拟的特性和由它们产生的各种数据产品,包括光环,次麦,银河系和空隙目录,功率谱,Bispectra,Lyman - $ \ Alpha $光谱,概率分布函数,光环径向轮廓和X射线光子列表。我们还释放了超过骆驼 - 山姆的数十亿个星系的目录:与Santa Cruz半分析模型相结合的大量N身体模拟。我们释放包含350多个Terabytes的所有数据,并包含143,922个快照,数百万光环,星系和摘要统计数据。我们提供有关如何访问,下载,读取和处理数据AT \ URL {https://camels.readthedocs.io}的进一步技术详细信息。
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
translated by 谷歌翻译