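Deep neural networks used for image classification often use convolutional filters to extract distinguishing features before passing them to a linear classifier. Most interpretability literature focuses on assigning semantic meaning to convolutional filters in order to explain a model's reasoning process and to confirm that it uses relevant information from the input domain. Fully connected layers can be studied by decomposing their weight matrices with a singular value decomposition, in effect studying the correlations between the rows of each matrix to discover the dynamics of the map. In this work we define a singular value decomposition for the weight tensor of a convolutional layer, which provides an analogous understanding of the correlations between filters and so exposes the dynamics of the convolutional map. We validate our definition using recent results in random matrix theory. By applying the decomposition across the linear layers of an image classification network, we suggest a framework against which interpretability methods might be applied, using hypergraphs to model class separation. Rather than looking to the activations to explain the network, we use the singular vectors with the largest corresponding singular values in each linear layer to identify the features most important to the network. We illustrate our approach with examples and introduce the DeepDataProfiler library, the analysis tool used for this research.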
Translated by Google Translate
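The idea in the first abstract of taking an SVD of a convolutional weight tensor can be illustrated with a minimal sketch. It assumes the simplest possible unfolding, in which each filter is flattened into one row of a matrix; the abstract does not spell out the authors' exact definition, so this may differ from it. The function name `conv_weight_svd` and the layer shape are hypothetical, and the code does not use the DeepDataProfiler API.

```python
import numpy as np

def conv_weight_svd(weight):
    """SVD of a conv weight tensor of shape (c_out, c_in, k_h, k_w).

    Assumption: each filter is flattened into one row, giving a
    (c_out, c_in * k_h * k_w) matrix, and the ordinary matrix SVD is applied.
    """
    c_out = weight.shape[0]
    unfolded = weight.reshape(c_out, -1)  # one row per filter
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    # Fold each right singular vector back into the shape of a filter.
    singular_filters = vt.reshape(-1, *weight.shape[1:])
    return u, s, singular_filters

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 3, 3, 3))  # hypothetical layer: 8 filters, 3 channels, 3x3
u, s, singular_filters = conv_weight_svd(weight)

# Keep only the directions with the largest singular values.
top = 2
approx = (u[:, :top] * s[:top]) @ singular_filters[:top].reshape(top, -1)
```

Because `np.linalg.svd` returns singular values in descending order, the first rows of `singular_filters` are the directions in filter space that this unfolding weights most heavily.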
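The emerging field of explainable artificial intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not what they represent), global explanation techniques visualize which concepts a model has generally learned to encode. Both types of method thus provide only partial insight and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim to combine the principles behind local and global XAI to obtain more informative explanations. Those methods, however, are often limited to specific model architectures or impose additional requirements on training regimes or on data and label availability, which makes post-hoc application to arbitrary pre-trained models impractical. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows both the "where" and the "what" questions to be answered for individual predictions, without additional constraints. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model. We thereby lift the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insight into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.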
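To analyze the abundance of multidimensional data, tensor-based frameworks have been developed. Traditionally, the matrix singular value decomposition (SVD) is used to extract the most dominant features from a matrix containing the vectorized data. While the SVD is highly useful for data that can be appropriately represented as a matrix, the vectorization step loses the high-dimensional relationships intrinsic to the data. To facilitate efficient multidimensional feature extraction, we use a projection-based classification algorithm built on the t-SVDM, a tensor analog of the matrix SVD. Our work extends the t-SVDM framework and the classification algorithm, as originally proposed, to any number of dimensions. We then apply this algorithm to a classification task using the StarPlus fMRI dataset. Our numerical experiments demonstrate that there is a tensor-based approach to fMRI classification that is superior to the best equivalent matrix-based approach. Our results illustrate the advantages of our chosen tensor framework, provide insight into beneficial choices of parameters, and could be developed further to classify more complex imaging data. We provide our Python implementation at https://github.com/elizabethnewman/tensor-fmri.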
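A rough sketch of a slice-wise tensor SVD under an invertible transform, the general idea behind the t-SVDM named above, is given here for an order-3 tensor only. It is not taken from the linked repository: the helper names `mode3_product`, `t_svdm` and `reconstruct` are hypothetical, and the transform `m` is assumed to be orthogonal so that its inverse is its transpose.

```python
import numpy as np

def mode3_product(tensor, matrix):
    """Multiply every tube tensor[i, j, :] by `matrix` (mode-3 product)."""
    return np.einsum("ijk,lk->ijl", tensor, matrix)

def t_svdm(tensor, m):
    """Slice-wise SVD in the transform domain defined by an orthogonal `m`."""
    hat = mode3_product(tensor, m)
    n1, n2, n3 = hat.shape
    r = min(n1, n2)
    u_hat = np.empty((n1, r, n3))
    s_hat = np.empty((r, n3))
    vt_hat = np.empty((r, n2, n3))
    for k in range(n3):  # one ordinary matrix SVD per frontal slice
        u, s, vt = np.linalg.svd(hat[:, :, k], full_matrices=False)
        u_hat[:, :, k], s_hat[:, k], vt_hat[:, :, k] = u, s, vt
    return u_hat, s_hat, vt_hat

def reconstruct(u_hat, s_hat, vt_hat, m):
    """Rebuild the tensor from its transform-domain factors."""
    hat = np.einsum("irk,rk,rjk->ijk", u_hat, s_hat, vt_hat)
    return mode3_product(hat, m.T)  # m is orthogonal, so its inverse is m.T

rng = np.random.default_rng(0)
tensor = rng.standard_normal((4, 5, 3))
m, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # arbitrary orthogonal transform
u_hat, s_hat, vt_hat = t_svdm(tensor, m)
rebuilt = reconstruct(u_hat, s_hat, vt_hat, m)
```

Truncating `s_hat` to its leading rows before calling `reconstruct` would give a low-rank tensor approximation, which is the kind of feature extraction a projection-based classifier could build on.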
We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy, but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations in the lower layers of the model. We exploit the redundancy present within the convolutional filters to derive approximations that significantly reduce the required computation. Using large state-of-the-art models, we demonstrate speedups of convolutional layers on both CPU and GPU by a factor of 2×, while keeping the accuracy within 1% of the original model.
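To make the notion of filter redundancy concrete, here is a minimal sketch of one simple low-rank scheme: factor the flattened filter bank with a truncated SVD, which corresponds to a convolution with `rank` basis filters followed by a 1x1 channel-mixing step. This is an illustrative assumption rather than the specific approximations used in the paper; the synthetic weights, the shapes and the function name `low_rank_filters` are all hypothetical.

```python
import numpy as np

def low_rank_filters(weight, rank):
    """Factor a (c_out, c_in, k, k) filter bank into `rank` basis filters
    plus a (c_out, rank) mixing matrix, using a truncated SVD."""
    c_out = weight.shape[0]
    mat = weight.reshape(c_out, -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    basis = vt[:rank].reshape(rank, *weight.shape[1:])  # rank basis filters
    mix = u[:, :rank] * s[:rank]                        # 1x1 mixing weights
    return basis, mix

rng = np.random.default_rng(0)
c_out, c_in, k, true_rank = 64, 16, 3, 8

# Synthetic, deliberately redundant filters: low rank plus small noise.
low = rng.standard_normal((c_out, true_rank)) @ rng.standard_normal((true_rank, c_in * k * k))
weight = low.reshape(c_out, c_in, k, k) + 0.01 * rng.standard_normal((c_out, c_in, k, k))

basis, mix = low_rank_filters(weight, rank=true_rank)
approx = (mix @ basis.reshape(true_rank, -1)).reshape(weight.shape)

rel_error = np.linalg.norm(approx - weight) / np.linalg.norm(weight)
params_before = weight.size             # c_out * c_in * k * k
params_after = basis.size + mix.size    # rank * c_in * k * k + c_out * rank
```

The parameter counts stand in for the multiply-accumulate cost per output position; real trained filters are less cleanly low rank than this synthetic example, so the achievable rank has to be chosen against an accuracy budget.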
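Understanding the operation of biological and artificial networks remains a difficult and important challenge. To identify general principles, researchers are increasingly interested in surveying large collections of networks that are trained on, or biologically adapted to, similar tasks. A standardized set of analysis tools is now needed to identify how network-level covariates, such as architecture, anatomical brain region, and model organism, affect neural representations (hidden-layer activations). Here, we provide a rigorous foundation for these analyses by defining a broad family of metric spaces that quantify representational dissimilarity. Using this framework, we modify existing representational similarity measures based on canonical correlation analysis to satisfy the triangle inequality, formulate a novel metric that respects the inductive biases of convolutional layers, and identify approximate Euclidean embeddings that allow network representations to be incorporated into essentially any off-the-shelf machine learning method. We demonstrate these methods on large-scale datasets from biology (Allen Institute Brain Observatory) and deep learning (NAS-Bench-101). In doing so, we identify relationships between neural representations that are interpretable in terms of anatomical features and model performance.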
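As an illustration of a representational dissimilarity that obeys the triangle inequality, the sketch below computes the classical orthogonal Procrustes distance between two activation matrices with the same number of rows (stimuli) and columns (units). It is offered as a familiar example of a shape metric and is an assumption here, not necessarily the exact metric defined in the paper; the function names are hypothetical.

```python
import numpy as np

def _normalize(x):
    """Center each column and scale the whole matrix to unit Frobenius norm."""
    x = x - x.mean(axis=0, keepdims=True)
    return x / np.linalg.norm(x)

def procrustes_distance(x, y):
    """Minimum over orthogonal q of ||x - y q||_F after centering and scaling.

    For unit-norm inputs the minimum equals sqrt(2 - 2 * nuclear_norm(x.T @ y)).
    """
    x, y = _normalize(x), _normalize(y)
    nuclear = np.linalg.svd(x.T @ y, compute_uv=False).sum()
    return np.sqrt(max(2.0 - 2.0 * nuclear, 0.0))

rng = np.random.default_rng(0)
x, y, z = (rng.standard_normal((50, 6)) for _ in range(3))
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # random orthogonal matrix

d_rotated = procrustes_distance(x, x @ q)  # rotated copy of the same representation
d_xy = procrustes_distance(x, y)
d_yz = procrustes_distance(y, z)
d_xz = procrustes_distance(x, z)
```

Because the distance is a true metric on centered, normalized representations, a matrix of pairwise distances between many networks can be passed to standard tools such as clustering or multidimensional scaling.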
Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.
Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods show to be practically useful and compare favorably to the state of the art as demonstrated on benchmarks and three use cases.
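In this paper, we propose neural networks that address the problems of stability and field of view in convolutional neural networks (CNNs). As an alternative to increasing a network's depth or width to improve performance, we propose integral-based, spatially nonlocal operators related to the global weighted Laplacian, fractional Laplacian, and inverse fractional Laplacian operators that arise in several problems in the physical sciences. The forward propagation of such networks is inspired by partial integro-differential equations (PIDEs). We test the effectiveness of the proposed neural architectures on benchmark image classification datasets and on semantic segmentation tasks in autonomous driving. Moreover, we investigate the extra computational cost of these dense operators and the stability of forward propagation in the proposed networks.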
Pre-publication draft of a book to be published by Morgan & Claypool Publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
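Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as "polysemanticity" that makes interpretability much more challenging. This paper provides a toy model in which polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition". We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.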
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
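Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) that excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their explicit symmetry bias, a recent line of work has proposed that the implicit bias of training algorithms on particular architectures is key to understanding generalization in overparameterized neural networks. In this context, we show that $L$-layer full-width linear G-CNNs trained by gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis of the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as to band-limited G-CNNs over infinite groups. We validate our theorems experimentally on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier-space implicit regularization results through uncertainty principles.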
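For readers unfamiliar with the $2/L$-Schatten norm mentioned above, the following small sketch evaluates the Schatten $p$ functional, $(\sum_i \sigma_i^p)^{1/p}$ over the singular values $\sigma_i$, for $p = 2/L$. For $p < 1$ this is a quasi-norm rather than a norm. The function name `schatten` and the example matrices are hypothetical, and the sketch only evaluates the quantity; it does not reproduce the paper's analysis.

```python
import numpy as np

def schatten(matrix, p):
    """Schatten-p (quasi-)norm: the l_p norm of the singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

# Two 4x4 matrices with the same Frobenius norm (equal to 1).
rank_one = np.outer([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
full_rank = 0.5 * np.eye(4)

for depth in (2, 4):  # L = 2 gives p = 1 (nuclear norm); L = 4 gives p = 1/2
    p = 2.0 / depth
    print(depth, schatten(rank_one, p), schatten(full_rank, p))
```

As the depth $L$ grows, $p$ shrinks and the penalty on spreading energy across many singular values increases relative to a rank-one matrix of the same Frobenius norm, which is one way to read the low-rank bias described in the abstract.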
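In the pursuit of explaining implicit regularization in deep learning, prominent focus has been given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design an explicit regularization that discourages locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of the conventional wisdom that architectural changes are needed. Our work highlights the potential of enhancing neural networks through theoretical analysis of their implicit regularization.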
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
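Symmetry is a fundamental tool in the exploration of a broad range of complex systems. In machine learning, symmetry has been explored in both models and data. In this paper we seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data. We do this by calculating a set of fundamental symmetry groups, which we call the intertwiner groups of the model. Each of these arises from a particular nonlinear layer of the model, and different nonlinearities lead to different symmetry groups. These groups change the weights of a model in such a way that the underlying function the model represents remains constant, while the internal representations of data inside the model may change. We connect intertwiner groups to a model's internal representations of data through a range of experiments that probe similarities between hidden states across models with the same architecture. Our work suggests that the symmetries of a network are propagated into the symmetries of that network's representation of data, giving us a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than on arbitrary linear combinations thereof.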
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
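This is an introductory machine learning course developed specifically with STEM students in mind. Our goal is to give the interested reader the basics needed to employ machine learning in their own projects and to become familiar with the terminology as a foundation for further reading of the relevant literature. In these lecture notes, we discuss supervised, unsupervised, and reinforcement learning. The notes start with an exposition of machine learning methods that do not use neural networks, such as principal component analysis, t-SNE, clustering, and linear regression and linear classifiers. We continue with an introduction to both basic and advanced neural network structures, such as dense feed-forward and convolutional neural networks, recurrent neural networks, restricted Boltzmann machines, (variational) autoencoders, and generative adversarial networks. Questions of interpretability are discussed for latent-space representations and through the examples of dreaming and adversarial attacks. The final section is dedicated to reinforcement learning, where we introduce the basic notions of value functions and policy learning.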
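There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion mapping, kernel-based methods, and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of data set with tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension, to make the data amenable to traditional techniques such as clustering and classification. Arguably, this is the part where the contrast between algorithmic machine learning methods and statistical modeling (so-called stochastic block modeling) is sharpest. In the paper we discuss the pros and cons of the two approaches. The final part of the survey deals with embedding in $\mathbb{R}^2$, that is, visualization. Three methods are presented: $t$-SNE, UMAP, and LargeVis, based on methods in parts one, two, and three, respectively. The methods are illustrated and compared on two simulated data sets: one consisting of a triplet of noisy ranunculoid curves, and one consisting of networks of increasing complexity generated with stochastic block models and two types of nodes.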
Recent work has sought to understand the behavior of neural networks by comparing representations between layers and between different trained models. We examine methods for comparing neural network representations based on canonical correlation analysis (CCA). We show that CCA belongs to a family of statistics for measuring multivariate similarity, but that neither CCA nor any other statistic that is invariant to invertible linear transformation can measure meaningful similarities between representations of higher dimension than the number of data points. We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation. This similarity index is equivalent to centered kernel alignment (CKA) and is also closely connected to CCA. Unlike CCA, CKA can reliably identify correspondences between representations in networks trained from different initializations.
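The similarity index described above can be sketched for the linear-kernel case, where CKA reduces to a ratio of Frobenius norms of cross-covariance and covariance matrices of column-centered activations. The function name `linear_cka` and the random data are hypothetical; the sketch is a minimal illustration rather than a reference implementation.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices x (n, d1) and y (n, d2).

    Rows are the same n examples in both matrices; columns are units.
    """
    x = x - x.mean(axis=0, keepdims=True)  # center every unit
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 10))
y = rng.standard_normal((100, 15))                  # unrelated representation
q, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random orthogonal matrix

same = linear_cka(x, 3.0 * x @ q)  # rotated and rescaled copy of x
different = linear_cka(x, y)
```

The index is unchanged by orthogonal transformations and isotropic scaling of either representation, but not by arbitrary invertible linear maps, which is the property the abstract contrasts with CCA.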
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.
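The CP form described above, a tensor written as a weighted sum of rank-one tensors, can be illustrated for the three-way case with a short sketch. It only assembles a tensor from given factors and inspects its unfoldings; it does not fit a decomposition, and it does not use any of the toolboxes named in the abstract. The function names, weights and shapes are hypothetical.

```python
import numpy as np

def cp_to_tensor(weights, factors):
    """Sum of R rank-one tensors: sum_r weights[r] * a[:, r] o b[:, r] o c[:, r]."""
    a, b, c = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, a, b, c)

def unfold(tensor, mode):
    """Mode-n unfolding: the chosen mode indexes rows, all other modes are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
rank = 2
weights = np.array([2.0, 0.5])
factors = [rng.standard_normal((n, rank)) for n in (4, 5, 6)]
tensor = cp_to_tensor(weights, factors)

# Every unfolding of a rank-R CP tensor has matrix rank at most R.
unfolding_ranks = [np.linalg.matrix_rank(unfold(tensor, mode)) for mode in range(3)]
```

Fitting the factors to a given tensor is the harder inverse problem that the surveyed algorithms address.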