Interpretability has emerged as a crucial aspect of machine learning, aimed at providing insights into the workings of complex neural networks. However, existing solutions vary vastly based on the nature of the interpretability task, with each use case requiring substantial time and effort. This paper introduces MARGIN, a simple yet general approach to address a large set of interpretability tasks ranging from identifying prototypes to explaining image predictions. MARGIN exploits ideas rooted in graph signal analysis to determine influential nodes in a graph, which are defined as those nodes that maximally describe a function defined on the graph. By carefully defining task-specific graphs and functions, we demonstrate that MARGIN outperforms existing approaches in a number of disparate interpretability challenges.
In social settings, individuals interact through webs of relationships. Each individual is a node in a complex network (or graph) of interdependencies and generates data, lots of data. We label the data by its source, or formally stated, we index the data by the nodes of the graph. The resulting signals (data indexed by the nodes) are far removed from time or image signals indexed by well ordered time samples or pixels. DSP, discrete signal processing, provides a comprehensive, elegant, and efficient methodology to describe, represent, transform, analyze, process, or synthesize these well ordered time or image signals. This paper extends to signals on graphs DSP and its basic tenets, including filters, convolution, z-transform, impulse response, spectral representation, Fourier transform, frequency response, and illustrates DSP on graphs by classifying blogs, linear predicting and compressing data from irregularly located weather stations, or predicting behavior of customers of a mobile service provider.
In applications such as social, energy, transportation, sensor, and neuronal networks, high-dimensional data naturally reside on the vertices of weighted graphs. The emerging field of signal processing on graphs merges algebraic and spectral graph theoretic concepts with computational harmonic analysis to process such signals on graphs. In this tutorial overview, we outline the main challenges of the area, discuss different ways to define graph spectral domains, which are the analogues to the classical frequency domain, and highlight the importance of incorporating the irregular structures of graph data domains when processing signals on graphs. We then review methods to generalize fundamental operations such as filtering, translation, modulation, dilation, and downsampling to the graph setting, and survey the localized, multiscale transforms that have been proposed to efficiently extract information from high-dimensional data on graphs. We conclude with a brief discussion of open issues and possible extensions.
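To make the notion of a graph spectral domain concrete, the following is a minimal sketch (not taken from the tutorial itself) of a graph Fourier transform and an ideal low-pass filter. It assumes the combinatorial Laplacian L = D - W of an unweighted path graph as the variation operator; the function names, the graph, and the cutoff value are all illustrative choices.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph (assumed example graph)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def graph_fourier_basis(L):
    """Eigenvalues (graph frequencies, ascending) and eigenvectors of a symmetric Laplacian."""
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals, eigvecs

def lowpass_filter(signal, eigvals, eigvecs, cutoff):
    """Zero out every spectral component whose graph frequency exceeds the cutoff."""
    coeffs = eigvecs.T @ signal          # forward graph Fourier transform
    coeffs[eigvals > cutoff] = 0.0       # ideal low-pass response
    return eigvecs @ coeffs              # inverse graph Fourier transform

n = 32
L = path_graph_laplacian(n)
lam, U = graph_fourier_basis(L)

rng = np.random.default_rng(0)
smooth = np.linspace(0.0, 1.0, n)                  # slowly varying signal on the path
noisy = smooth + 0.1 * rng.standard_normal(n)      # add vertex-wise noise
denoised = lowpass_filter(noisy, lam, U, cutoff=0.5)
```

The filter can only lower the Laplacian quadratic form `x @ L @ x`, which is one common measure of how smooth a signal is with respect to the graph.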
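Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios a single graph is rarely sufficient to succinctly represent all interactions, and hence multi-layered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to classical problems, such as node classification, in the multi-layered case. In this paper, we consider the problem of semi-supervised learning with multi-layered graphs. Though deep network embeddings, e.g., DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning, and develop two architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer dependencies for building multi-layered graph embeddings. Using empirical studies on several benchmark datasets, we evaluate the proposed approaches and demonstrate significant performance improvements in comparison to state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available.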
Mining useful clusters from high dimensional data has received significant attention from the computer vision and pattern recognition community in recent years. Linear and non-linear dimensionality reduction has played an important role to overcome the curse of dimensionality. However, often such methods are accompanied by three different problems: high computational complexity (usually associated with the nuclear norm minimization), non-convexity (for matrix factorization methods) and susceptibility to gross corruptions in the data. In this paper we propose a principal component analysis (PCA) based solution that overcomes these three issues and approximates a low-rank recovery method for high dimensional datasets. We target the low-rank recovery by enforcing two types of graph smoothness assumptions, one on the data samples and the other on the features by designing a convex optimization problem. The resulting algorithm is fast, efficient and scalable for huge datasets with O(n log(n)) computational complexity in the number of data samples. It is also robust to gross corruptions in the dataset as well as to the model parameters. Clustering experiments on 7 benchmark datasets with different types of corruptions and background separation experiments on 3 video datasets show that our proposed model outperforms 10 state-of-the-art dimensionality reduction models. Our theoretical analysis proves that the proposed model is able to recover approximate low-rank representations with a bounded error for clusterable data.
We consider the problem of selecting an optimal mask for an image manifold, i.e., choosing a subset of the pixels of the image that preserves the manifold's geometric structure present in the original data. Such masking implements a form of compressive sensing through emerging imaging sensor platforms for which the power expense grows with the number of pixels acquired. Our goal is for the manifold learned from masked images to resemble its full image counterpart as closely as possible. More precisely, we show that one can indeed accurately learn an image manifold without having to consider a large majority of the image pixels. In doing so, we consider two masking methods that preserve the local and global geometric structure of the manifold, respectively. In each case, the process of finding the optimal masking pattern can be cast as a binary integer program, which is computationally expensive but can be approximated by a fast greedy algorithm. Numerical experiments show that the relevant manifold structure is preserved through the data-dependent masking process, even for modest mask sizes.
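We propose a novel framework for combining datasets via alignment of their associated intrinsic dimensions. Our approach assumes that two datasets are sampled from a common latent space, i.e., they measure equivalent systems. Thus, we expect there to exist a natural (albeit unknown) alignment of the data manifolds associated with the intrinsic geometry of these datasets, which are perturbed by measurement artifacts in the sampling process. Importantly, we do not assume any individual correspondence (partial or complete) between data points. Instead, we rely on our assumption that a subset of data features have correspondence across datasets. We leverage this assumption to estimate relations between intrinsic manifold dimensions, which are given by diffusion map coordinates over each of the datasets. We compute a correlation matrix between diffusion coordinates of the datasets by considering graph (or manifold) Fourier coefficients of corresponding data features. We then orthogonalize this correlation matrix to form an isometric transformation between the diffusion maps of the datasets. Finally, we apply this transformation to the diffusion coordinates and construct a unified diffusion geometry of the datasets together. We show that this approach successfully corrects misalignment artifacts and enables data integration.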
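The Reed-Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection, but it has known limitations: its dependence on the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and the inability to effectively include spatial awareness in its evaluation. In this work, a graph-based solution to the image anomaly detection problem is proposed; leveraging the graph Fourier transform, we are able to overcome some of RXD's limitations while reducing computational cost at the same time. Tests on both hyperspectral and medical images, using both synthetic and real anomalies, show that the proposed technique obtains significant performance gains over other state-of-the-art algorithms.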
In this work we propose the construction of two-channel wavelet filterbanks for analyzing functions defined on the vertices of any arbitrary finite weighted undirected graph. These graph based functions are referred to as graph-signals as we build a framework in which many concepts from the classical signal processing domain, such as Fourier decomposition, signal filtering and downsampling can be extended to the graph domain. In particular, we observe a spectral folding phenomenon in bipartite graphs which occurs during downsampling of these graphs and produces aliasing in graph signals. This property of bipartite graphs allows us to design critically sampled two-channel filterbanks, and we propose quadrature mirror filters (referred to as graph-QMF) for bipartite graphs, which cancel aliasing and lead to perfect reconstruction. For arbitrary graphs we present a bipartite subgraph decomposition which produces an edge-disjoint collection of bipartite subgraphs. Graph-QMFs are then constructed on each bipartite subgraph leading to "multi-dimensional" separable wavelet filterbanks on graphs. Our proposed filterbanks are critically sampled and we state necessary and sufficient conditions for orthogonality, aliasing cancellation and perfect reconstruction. The filterbanks are realized by Chebychev polynomial approximations.
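The spectral folding property of bipartite graphs can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's construction: it uses the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}, an arbitrary small weighted biadjacency matrix `B`, and a sign-flip matrix `J` that negates the entries on one side of the bipartition. If u is an eigenvector with eigenvalue λ, then J u should be an eigenvector with eigenvalue 2 - λ.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}; assumes no isolated vertices."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Hypothetical weighted biadjacency matrix: rows are one vertex set, columns the other.
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 1.0]])
n1, n2 = B.shape

# Adjacency matrix of the bipartite graph: edges only run between the two sets.
A = np.block([[np.zeros((n1, n1)), B],
              [B.T, np.zeros((n2, n2))]])

L_norm = normalized_laplacian(A)
lam, U = np.linalg.eigh(L_norm)

# J flips the sign of every entry that lives on the second vertex set.
J = np.diag(np.concatenate([np.ones(n1), -np.ones(n2)]))
folded = J @ U

# Column j of `folded` should be an eigenvector with eigenvalue 2 - lam[j].
residual = L_norm @ folded - folded * (2.0 - lam)
```

The residual vanishes up to floating-point error, and the spectrum is symmetric about 1, which is the mirror structure that quadrature mirror filters on bipartite graphs rely on.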
We study the problem of selecting the best sampling set for bandlimited reconstruction of signals on graphs. A frequency domain representation for graph signals can be defined using the eigenvectors and eigenvalues of variation operators that take into account the underlying graph connectivity. Smoothly varying signals defined on the nodes are of particular interest in various applications, and tend to be approximately bandlimited in the frequency basis. Sampling theory for graph signals deals with the problem of choosing the best subset of nodes for reconstructing a bandlimited signal from its samples. Most approaches to this problem require a computation of the frequency basis (i.e., the eigenvectors of the variation operator), followed by a search procedure using the basis elements. This can be impractical, in terms of storage and time complexity, for real datasets involving very large graphs. We circumvent this issue in our formulation by introducing quantities called graph spectral proxies, defined using the powers of the variation operator, in order to approximate the spectral content of graph signals. This allows us to formulate a direct sampling set selection approach that does not require the computation and storage of the basis elements. We show that our approach also provides stable reconstruction when the samples are noisy or when the original signal is only approximately bandlimited. Furthermore, the proposed approach is valid for any choice of the variation operator, thereby covering a wide range of graphs and applications. We demonstrate its effectiveness through various numerical experiments.
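As a rough illustration of approximating spectral content from powers of the variation operator, the sketch below computes the quantity (||L^k f|| / ||f||)^(1/k) for a bandlimited signal. The exact form of this proxy, the ring graph, and all names are assumptions made for the example; the abstract does not spell them out. For a symmetric L, this quantity never exceeds the signal's bandwidth and does not decrease as k grows, and evaluating it needs only matrix-vector products. The eigendecomposition in the example is used solely to build a test signal and its reference bandwidth.

```python
import numpy as np

def spectral_proxy(L, f, k):
    """Assumed form of the k-th spectral proxy: (||L^k f|| / ||f||)^(1/k)."""
    g = f.copy()
    for _ in range(k):
        g = L @ g                        # only matrix-vector products are needed
    return (np.linalg.norm(g) / np.linalg.norm(f)) ** (1.0 / k)

# Combinatorial Laplacian of an unweighted ring graph (illustrative choice).
n = 20
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 1.0
W[(idx + 1) % n, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Reference construction: a signal supported on the 5 lowest graph frequencies.
lam, U = np.linalg.eigh(L)
band = 5
rng = np.random.default_rng(1)
f = U[:, :band] @ rng.standard_normal(band)
bandwidth = lam[band - 1]

# Proxies for increasing k approach the true bandwidth from below.
proxies = [spectral_proxy(L, f, k) for k in (1, 2, 4, 8)]
```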
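The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that builds ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectrum, such as the first few eigenvectors of the original graph Laplacian. Our approach can immediately lead to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to be recovered to the LSSTs, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results on a variety of well-known datasets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.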
The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the data sets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP-based graph inference methods, and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data.
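Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification, where previous work has shown that it can be beneficial to incorporate a wide variety of meta features, such as socio-cultural traits, into predictive modeling. A graph-based approach naturally suits these scenarios, where a contextual graph captures traits that characterize a population, while the specific brain activity patterns are used as a multivariate signal at the nodes. Graph neural networks have shown improvements in inference with graph-structured data. Though the underlying graph strongly dictates the overall performance, there exists no systematic way of choosing an appropriate graph in practice, which makes predictive models non-robust. To address this, we propose a bootstrapped version of graph convolutional neural networks (G-CNNs) that uses an ensemble of weakly trained G-CNNs and reduces the sensitivity of models to the choice of graph construction. We demonstrate its effectiveness on the challenging Autism Brain Imaging Data Exchange (ABIDE) dataset and show that our approach improves upon recently proposed graph-based neural networks. We also show that our method is more robust to noisy graphs.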
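Convolutional neural networks (CNNs) are extremely efficient architectures for image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper, we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two constructions, one based on a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian. We show through experiments that for low-dimensional graphs it is possible to learn convolutional layers with a number of parameters independent of the input size, resulting in efficient deep architectures.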
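We introduce a novel harmonic analysis for functions defined on the vertices of a strongly connected directed graph, of which the random walk operator is the cornerstone. As a first step, we consider the set of eigenvectors of the random walk operator as a non-orthogonal Fourier-type basis for functions over directed graphs. We find a frequency interpretation by linking the variation of the eigenvectors of the random walk operator, obtained from their Dirichlet energy, to the real part of their associated eigenvalues. From this Fourier basis, we can proceed further and build multi-scale analyses on directed graphs. We propose both a redundant wavelet transform and a decimated wavelet transform by extending the diffusion wavelets framework of Coifman and Maggioni to directed graphs. The development of our harmonic analysis on directed graphs thus leads us to consider both semi-supervised learning problems and signal modeling problems on directed graphs, highlighting the efficiency of our framework.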
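Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and that our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave's real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%.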
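Machine learning models that can exploit the inherent structure in data have gained prominence. In particular, there is a surge in deep learning solutions for graph-structured data, due to its widespread applicability in several fields. Graph attention networks (GAT), a recent addition to the broad class of feature learning models on graphs, use the attention mechanism to efficiently learn continuous vector representations for semi-supervised learning problems. In this paper, we perform a detailed analysis of GAT models and present interesting insights into their behavior. In particular, we show that these models are vulnerable to adversaries (rogue nodes), and hence propose regularization strategies to improve the robustness of GAT models. Using benchmark datasets, we demonstrate performance improvements on semi-supervised learning with the proposed robust variant of GAT.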
Spectral methods for mesh processing and analysis rely on the eigenvalues, eigenvectors, or eigenspace projections derived from appropriately defined mesh operators to carry out desired tasks. Early work in this area can be traced back to the seminal paper by Taubin in 1995, where spectral analysis of mesh geometry based on a combinatorial Laplacian aids our understanding of the low-pass filtering approach to mesh smoothing. Over the past fifteen years, the list of applications in the area of geometry processing which utilize the eigenstructures of a variety of mesh operators in different manners has been growing steadily. Many works presented so far draw parallels from developments in fields such as graph theory, computer vision, machine learning, graph drawing, numerical linear algebra, and high-performance computing. This paper aims to provide a comprehensive survey on the spectral approach, focusing on its power and versatility in solving geometry processing problems and attempting to bridge the gap between relevant research in computer graphics and other fields. Necessary theoretical background is provided. Existing works covered are classified according to different criteria: the operators or eigenstructures employed, application domains, or the dimensionality of the spectral embeddings used. Despite much empirical success, there still remain many open questions pertaining to the spectral approach. These are discussed as we conclude the survey and provide our perspective on possible future research.
In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
We propose a novel method for constructing wavelet transforms of functions defined on the vertices of an arbitrary finite weighted graph. Our approach is based on defining scaling using the graph analogue of the Fourier domain, namely the spectral decomposition of the discrete graph Laplacian $\mathcal{L}$. Given a wavelet generating kernel $g$ and a scale parameter $t$, we define the scaled wavelet operator $T_g^t = g(t\mathcal{L})$. The spectral graph wavelets are then formed by localizing this operator by applying it to an indicator function. Subject to an admissibility condition on $g$, this procedure defines an invertible transform. We explore the localization properties of the wavelets in the limit of fine scales. Additionally, we present a fast Chebyshev polynomial approximation algorithm for computing the transform that avoids the need for diagonalizing $\mathcal{L}$. We highlight potential applications of the transform through examples of wavelets on graphs corresponding to a variety of different problem domains.
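The definition of the scaled wavelet operator can be illustrated on a small graph. The sketch below is a minimal example under stated assumptions: it forms g(tL) by explicitly diagonalizing the Laplacian, which is exactly the step the fast Chebyshev algorithm mentioned in the abstract is designed to avoid, so it is suitable only for small graphs. The kernel g(x) = x·exp(-x), the weighted path graph, the scale t = 2, and the function names are all illustrative choices, not taken from the paper.

```python
import numpy as np

def wavelet_operator(L, kernel, t):
    """Form g(tL) from the eigendecomposition L = U diag(lam) U^T of a symmetric Laplacian."""
    lam, U = np.linalg.eigh(L)
    return (U * kernel(t * lam)) @ U.T   # scales column i of U by g(t * lam_i)

def bandpass_kernel(x):
    """Assumed wavelet generating kernel with g(0) = 0 and decay at large x."""
    return x * np.exp(-x)

# Small weighted path graph (hypothetical edge weights) and its combinatorial Laplacian.
weights = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
W = np.diag(weights, 1) + np.diag(weights, -1)
L = np.diag(W.sum(axis=1)) - W

T = wavelet_operator(L, bandpass_kernel, t=2.0)

# Localize the operator at vertex 2 by applying it to an indicator function.
delta = np.zeros(L.shape[0])
delta[2] = 1.0
psi = T @ delta
```

Because g(0) = 0 and the constant vector is the zero-frequency eigenvector of a connected graph's combinatorial Laplacian, the resulting wavelet `psi` sums to zero, mirroring the zero-mean property of classical wavelets.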