Spectral graph convolutional networks (GCNs) are particular deep models that aim at extending neural networks to arbitrary irregular domains. The principle of these networks consists in projecting graph signals using the eigen-decomposition of their Laplacians, achieving filtering in the spectral domain, and then mapping the resulting filtered signals back onto the input graph domain. However, the success of these operations is highly dependent on the relevance of the Laplacians, which are mostly handcrafted, and this makes GCNs clearly suboptimal. In this paper, we introduce a novel spectral GCN that learns not only the usual convolution parameters but also the Laplacian operators. The latter are designed "end-to-end" as a part of a recursive Chebyshev decomposition, with the particularity of conveying both the differential and non-differential properties of the learned representations (with increasing order and discrimination power) without overparametrizing the trained GCNs. Extensive experiments conducted on the challenging task of skeleton-based action recognition show the generalization ability and the outperformance of our proposed Laplacian design with respect to different baselines (built upon handcrafted and other learned Laplacians) as well as the related work.
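For context, the recursive Chebyshev decomposition that this design learns its Laplacian within is, in its standard handcrafted-Laplacian form due to Defferrard et al.:

$$ g_\theta(L)\,x \;\approx\; \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\,x, \qquad \tilde{L} = \tfrac{2}{\lambda_{\max}} L - I, $$
$$ T_0(\tilde{L}) = I, \qquad T_1(\tilde{L}) = \tilde{L}, \qquad T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}). $$

The $\theta_k$ are the usual convolution parameters; the abstract's contribution is to learn $L$ itself end-to-end rather than to handcraft it.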
Learning graph convolutional networks (GCNs) is an emerging field that aims at generalizing convolutional operations to arbitrary non-regular domains. In particular, GCNs operating in the spatial domain show superior performance compared to spectral ones, but their success is highly dependent on how the topology of the input graphs is defined. In this paper, we introduce a novel framework for graph convolutional networks that learns the topological properties of graphs. The design principle of our method is based on the optimization of a constrained objective function that learns not only the usual convolution parameters in GCNs but also a transformation basis that conveys the most relevant topological relationships in these graphs. Experiments conducted on the challenging task of skeleton-based action recognition show the superiority of the proposed method compared to handcrafted graph design as well as the related work.
Graph convolutional networks (GCNs) aim at extending deep learning to arbitrary irregular domains, namely graphs. Their success is highly dependent on how the topology of the input graphs is defined, and most existing GCN architectures rely on predefined or handcrafted graph structures. In this paper, we introduce a novel method that learns the topology (or connectivity) of input graphs as a part of GCN design. The main contribution of our method resides in building an orthogonal connectivity basis that optimally aggregates nodes through their neighborhoods prior to convolution. Our method also considers a stochasticity criterion that acts as a regularizer, making the learned basis and the underlying GCNs lightweight while remaining highly effective. Experiments conducted on the challenging task of skeleton-based hand gesture recognition show the high effectiveness of the learned GCNs with respect to the related work.
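The paper's exact construction of the orthogonal basis is not reproduced here; as an illustrative assumption only, a common way to softly push a learned stack of connectivity matrices toward orthogonality is a Frobenius penalty on their Gram matrix:

```python
import torch

def orthogonality_penalty(basis: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality penalty ||B B^T - I||_F^2 over a learned basis
    whose rows are flattened connectivity matrices (illustrative only)."""
    gram = basis @ basis.t()
    eye = torch.eye(gram.size(0), device=basis.device)
    return ((gram - eye) ** 2).sum()

# K candidate connectivity matrices over N skeleton joints (hypothetical shapes)
K, N = 4, 25
connectivity = torch.nn.Parameter(torch.randn(K, N * N) * 0.01)
loss_reg = orthogonality_penalty(connectivity)  # added to the task loss
```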
In this paper, we design lightweight graph convolutional networks (GCNs) using a particular class of regularizers, dubbed as phase-field models (PFMs). PFMs exhibit a bi-phase behavior using a particular ultra-local term that allows training both the topology and the weight parameters of GCNs as a part of a single "end-to-end" optimization problem. Our proposed solution also relies on a reparametrization that pushes the mask of the topology towards binary values leading to effective topology selection and high generalization while implementing any targeted pruning rate. Both masks and weights share the same set of latent variables and this further enhances the generalization power of the resulting lightweight GCNs. Extensive experiments conducted on the challenging task of skeleton-based recognition show the outperformance of PFMs against other staple regularizers as well as related lightweight design methods.
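A minimal sketch of the reparametrization idea described above, under loudly stated assumptions: the topology mask and the weights are derived from one shared latent tensor, and a temperature pushes the mask toward binary values. The names and the exact coupling are hypothetical, not the paper's code:

```python
import torch

class MaskedGraphWeight(torch.nn.Module):
    """Sketch: weights and a soft topology mask share the same latent
    variables; `tau` anneals the sigmoid toward {0, 1} during training.
    Illustrative assumption only, not the published PFM formulation."""
    def __init__(self, n_nodes: int):
        super().__init__()
        self.latent = torch.nn.Parameter(torch.randn(n_nodes, n_nodes) * 0.01)

    def forward(self, tau: float) -> torch.Tensor:
        weight = self.latent                      # connection strengths
        mask = torch.sigmoid(self.latent / tau)   # tends to binary as tau -> 0
        return weight * mask                      # effectively pruned topology
```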
Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. This formulation not only leads to greater expressive power but also stronger generalization capability. On two large datasets, Kinetics and NTU-RGBD, it achieves substantial improvements over mainstream methods.
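A minimal single-partition sketch of an ST-GCN-style block (the published model additionally uses spatial-configuration partitioning and learnable edge importance, omitted here for brevity):

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """Sketch of one spatial-temporal block in the spirit of ST-GCN:
    a graph convolution over joints followed by a temporal convolution
    over frames. `A` is a normalized joint adjacency of shape (V, V)."""
    def __init__(self, in_c: int, out_c: int, A: torch.Tensor, t_kernel: int = 9):
        super().__init__()
        self.register_buffer("A", A)
        self.gcn = nn.Conv2d(in_c, out_c, kernel_size=1)
        self.tcn = nn.Conv2d(out_c, out_c, kernel_size=(t_kernel, 1),
                             padding=((t_kernel - 1) // 2, 0))
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, T, V)
        x = self.gcn(x)                                   # mix channels per joint
        x = torch.einsum("nctv,vw->nctw", x, self.A)      # aggregate neighbors
        return self.relu(self.tcn(x))                     # temporal modeling
```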
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multiscale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
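The disentangling idea can be sketched as computing, for each scale k, a mask over node pairs at shortest-path distance exactly k, instead of raising the adjacency to the k-th power (which biases weight toward closer nodes). A Python sketch under these assumptions:

```python
import numpy as np

def disentangled_adjacencies(A: np.ndarray, max_scale: int):
    """Sketch of disentangled multi-scale aggregation: the k-th mask keeps
    only node pairs at shortest-path distance exactly k, so far neighbors
    are not drowned out by the bias of plain adjacency powers."""
    binary = (A > 0).astype(int)
    reach = np.eye(len(A), dtype=int)        # pairs at distance <= k-1
    scales = []
    for _ in range(max_scale):
        new_reach = ((reach @ binary) + reach > 0).astype(int)  # distance <= k
        scales.append((new_reach - reach).astype(float))        # distance == k
        reach = new_reach
    return scales  # each mask is degree-normalized before use in practice
```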
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Graph convolutional networks (GCNs) outperform previous methods in skeleton-based human action recognition, including the task of human-human interaction recognition. However, when dealing with interaction sequences, current GCN-based methods simply split the two-person skeleton into two discrete sequences and perform graph convolution separately, in the manner of single-person action classification. Such operations ignore rich interactive information and hinder effective spatial relationship modeling for semantic pattern learning. To overcome the above shortcomings, we introduce a novel unified two-person graph that represents spatial interaction correlations between joints. In addition, a properly designed graph labeling strategy is proposed to let our GCN model learn discriminative spatial-temporal interactive features. Experiments show accuracy improvements on both interactions and individual actions when utilizing the proposed two-person graph topology. Finally, we propose a two-person graph convolutional network (2P-GCN). The proposed 2P-GCN achieves state-of-the-art results on four benchmarks of three interaction datasets: SBU, NTU-RGB+D, and NTU-RGB+D 120.
Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combining the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction. Our code is available at https://github.com/limaosen0/AS-GCN.
GCN-based methods have achieved advanced performance on skeleton-based action recognition tasks. However, the skeleton graph cannot fully represent the motion information contained in skeleton data. In addition, the topology of the skeleton graph in GCN-based methods is manually set according to the natural connections and is fixed for all samples, which cannot adapt well to different situations. In this work, we propose a novel dynamic hypergraph convolutional network (DHGCN) for skeleton-based action recognition. DHGCN uses a hypergraph to represent the skeleton structure in order to effectively exploit the motion information contained in human joints. Each joint in the skeleton hypergraph is dynamically assigned a weight according to its movement, and the hypergraph topology in our model can be dynamically adjusted to different samples according to the relationships between joints. Experimental results demonstrate that our model achieves competitive performance on three datasets: Kinetics-Skeleton 400, NTU RGB+D 60, and NTU RGB+D 120.
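For reference, the standard hypergraph convolution that hypergraph-based skeleton models typically build on (DHGCN additionally makes the incidence structure dynamic per sample) is:

$$ X^{(l+1)} = \sigma\!\left( D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}\, X^{(l)}\, \Theta^{(l)} \right), $$

where $H \in \{0,1\}^{|V| \times |E|}$ is the vertex-hyperedge incidence matrix, $W$ holds diagonal hyperedge weights, and $D_v$, $D_e$ are the vertex and hyperedge degree matrices.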
With the advances in pose estimation and graph convolutional networks, skeleton-based two-person interaction recognition has been drawing increasing attention. Although accuracy has been gradually improving, the growing computational complexity makes these methods impractical for real-world environments. Moreover, since conventional methods do not fully represent the relationships between inter-body joints, there is still room for accuracy improvement. In this paper, we propose a lightweight model that accurately recognizes two-person interactions. In addition to an architecture that incorporates middle fusion, we introduce a factorized convolution technique to reduce the model's weight parameters. We also introduce a network stream that accounts for relative distance changes between inter-body joints to improve accuracy. Experiments on two large-scale datasets, NTU RGB+D 60 and 120, show that our method simultaneously achieves the highest accuracy and relatively low computational complexity compared with conventional methods.
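One common convolution factorization (an illustrative assumption, not necessarily the paper's specific scheme) replaces a full temporal convolution with a depthwise plus pointwise pair, which is a typical route to such parameter savings:

```python
import torch.nn as nn

def factorized_temporal_conv(channels: int, t_kernel: int = 9) -> nn.Module:
    """Depthwise temporal convolution followed by a pointwise 1x1
    convolution, cutting parameters from C*C*t to roughly C*t + C*C.
    Hypothetical sketch of a factorized convolution, for illustration."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(t_kernel, 1),
                  padding=((t_kernel - 1) // 2, 0), groups=channels),
        nn.Conv2d(channels, channels, kernel_size=1),
    )
```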
Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.
In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which shows notable improvement for the recognition accuracy. Extensive experiments on the two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art with a significant margin.
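The adaptive topology described above can be sketched as the sum of three matrices: the fixed skeleton adjacency A, a freely learned global matrix B, and a sample-dependent matrix C from embedded similarities of the input. A simplified sketch (the published model uses K partition subsets and residual connections, omitted here):

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    """Sketch of the adaptive aggregation A + B + C: fixed skeleton
    adjacency, learned shared matrix, and data-dependent attention-like
    matrix from embedded joint features. Simplified for illustration."""
    def __init__(self, in_c: int, out_c: int, A: torch.Tensor, embed_c: int = 16):
        super().__init__()
        self.register_buffer("A", A)                   # (V, V), fixed skeleton
        self.B = nn.Parameter(torch.zeros_like(A))     # learned, shared over data
        self.theta = nn.Conv2d(in_c, embed_c, 1)
        self.phi = nn.Conv2d(in_c, embed_c, 1)
        self.out = nn.Conv2d(in_c, out_c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, C, T, V)
        q = self.theta(x).mean(2)                          # (N, E, V), pooled over time
        k = self.phi(x).mean(2)                            # (N, E, V)
        C = torch.softmax(torch.einsum("nev,new->nvw", q, k), dim=-1)
        adj = self.A + self.B + C                          # broadcasts to (N, V, V)
        x = torch.einsum("nctv,nvw->nctw", x, adj)         # sample-adaptive aggregation
        return self.out(x)
```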
Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.
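At the heart of these tools is the graph Fourier transform. With the eigendecomposition of the graph Laplacian, $L = U \Lambda U^{\top}$, a graph signal $x$ is analyzed, filtered in the spectral domain, and synthesized back as

$$ \hat{x} = U^{\top} x, \qquad y = U\, g(\Lambda)\, U^{\top} x, $$

where $g(\Lambda)$ applies a spectral response to each eigenvalue, the graph analogue of a classical frequency response.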
Graph convolutional networks have been widely used for skeleton-based action recognition owing to their excellent ability to model non-Euclidean data. Since graph convolution is a local operation, it can only utilize short-range joint dependencies and short-term trajectories, but fails to directly model distant joint relations and long-range temporal information that are vital for distinguishing various actions. To solve this problem, we propose a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module to enrich the receptive field of the model in the spatial and temporal dimensions. Concretely, the MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture. Without introducing additional parameters, the features are processed by a series of sub-graph convolutions, and each node can complete multiple spatial and temporal aggregations with its neighborhoods. Therefore, the final equivalent receptive field is enlarged and able to capture both short- and long-range dependencies in the spatial and temporal domains. By coupling these two modules as a basic block, we further propose a multi-scale spatial-temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition. The proposed MST-GCN achieves remarkable performance on three challenging benchmark datasets, NTU RGB+D, NTU-120 RGB+D, and Kinetics-Skeleton, for skeleton-based action recognition.
This paper proposes a new graph convolution operator, called central difference graph convolution (CDGC), for skeleton-based action recognition. It not only aggregates node information like a vanilla graph convolution operation but also introduces gradient information. Without introducing any additional parameters, CDGC can replace vanilla graph convolution in any existing graph convolutional network (GCN). In addition, an accelerated version of CDGC is developed, which greatly improves the training speed. Experiments on two popular large-scale datasets, NTU RGB+D 60 and 120, demonstrate the efficacy of the proposed CDGC. Code is available at https://github.com/iesymiao/cd-gcn.
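By analogy with central difference convolution on images, one plausible reading of such an operator (stated as an assumption for illustration; the paper's exact form may differ) blends vanilla aggregation with a gradient term:

$$ y_i = (1-\theta) \sum_{j \in \mathcal{N}(i)} a_{ij}\, x_j \;+\; \theta \sum_{j \in \mathcal{N}(i)} a_{ij}\,(x_j - x_i), $$

so that setting $\theta = 0$ recovers the vanilla graph convolution, and nothing is added beyond the scalar trade-off $\theta$, consistent with the claim that no extra parameters are introduced.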
Graph convolutional networks (GCNs) have proven to be a powerful concept that has been successfully applied to a large variety of tasks across many domains over the past years. In this work, we study the theory that paved the way to the definition of GCNs, including relevant parts of classical graph theory. We also discuss and experimentally demonstrate key properties and limitations of GCNs, such as those caused by the statistical dependency of samples introduced by the edges of the graph, which leads to biased estimates of the full gradient. Another limitation we discuss is the negative impact of minibatch sampling on model performance. As a consequence, gradients are computed on the whole dataset during parameter updates, undermining scalability to large graphs. To account for this, we research alternative methods that allow safely learning good parameters while sampling only a subset of the data per iteration. We reproduce the results reported in the work of Kipf et al. and propose an implementation inspired by SIGN, a sampling-free minibatch method. Eventually, we compare the two implementations on a benchmark dataset, showing that they are comparable in terms of prediction accuracy on the task of semi-supervised node classification.
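For reference, the propagation rule of the Kipf et al. model reproduced in this work is

$$ H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)} \right), \qquad \tilde{A} = A + I, \quad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}. $$

The neighborhood aggregation couples samples through the graph's edges, which is precisely what biases naive minibatch gradients; SIGN avoids sampling altogether by precomputing propagated features such as $\tilde{A}^k X$ once, after which minibatches of nodes are independent.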
Skeleton-based action recognition methods are limited by the semantic extraction from spatio-temporal skeleton graphs. However, current methods struggle to effectively combine features from the temporal and spatial graph dimensions and tend to be thick on one side and thin on the other. In this paper, we propose a temporal-channel aggregation graph convolutional network (TCA-GCN) to dynamically and efficiently learn spatial and temporal topologies in different temporal and channel dimensions for skeleton-based action recognition. We use a temporal aggregation module to learn temporal dimensional features and a channel aggregation module to efficiently combine spatial dynamic channel-wise topological features with temporal dynamic topological features. In addition, we extract multi-scale skeleton features in temporal modeling and fuse them with an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
Human skeleton data has attracted increasing attention in action recognition due to its background robustness and high efficiency. In skeleton-based action recognition, graph convolutional networks (GCNs) have become the mainstream method. This paper analyzes a fundamental factor of GCN-based models: the adjacency matrix. We notice that most GCN-based methods build their adjacency matrix on the natural human skeleton structure. Based on our previous work and analysis, we argue that the natural-skeleton-structure adjacency matrix is unsuitable for skeleton-based action recognition. We propose a new adjacency matrix that abandons all rigid neighbor connections and instead lets the model adaptively learn the relationships between joints. We conduct extensive experiments and analysis with the validated models on two skeleton-based action recognition datasets (NTURGBD60 and FineGYM). Comprehensive experimental results and analysis reveal that 1) the most widely used natural-skeleton adjacency matrix is unsuitable for skeleton-based action recognition; and 2) the proposed adjacency matrix is superior in model performance, noise robustness, and transferability.