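Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems that are prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods address this challenge by fusing multiple data modalities while leveraging cross-modal dependencies. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. Based on this categorization, we put forward an algorithmic blueprint for multimodal graph learning. The blueprint serves as a way to group state-of-the-art architectures that handle multimodal data by choosing four different components appropriately. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.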
translated by Google Translate
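Graph neural networks (GNNs) have been successfully used in many problems involving graph-structured data, achieving state-of-the-art performance. GNNs typically employ a message-passing scheme in which every node aggregates information from its neighbors using a permutation-invariant aggregation function. Standard, well-examined choices such as the mean or sum aggregation functions have limited capabilities, as they cannot capture interactions among neighbors. In this work, we formalize these interactions using an information-theoretic framework that notably includes synergistic information. Driven by this definition, we introduce the Graph Ordering Attention (GOAT) layer, a novel GNN component that captures interactions between nodes in a neighborhood. This is achieved by learning local node orderings via an attention mechanism and processing the ordered representations with a recurrent neural network aggregator. This design allows us to use a permutation-sensitive aggregator while maintaining the permutation equivariance of the proposed GOAT layer. The GOAT model demonstrates improved performance in modeling graph metrics that capture complex information, such as the betweenness centrality and the effective size of a node. In practical use cases, its superior modeling capability is confirmed by its success on several real-world node classification benchmarks.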
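The ordering-then-recurrent-aggregation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the GOAT layer itself: the dot-product scoring, the plain tanh recurrence, and the names `order_neighbors`, `rnn_aggregate`, and `goat_like_aggregate` are all hypothetical, and the weights are fixed rather than learned.

```python
import math

def order_neighbors(center, neighbors):
    """Sort neighbors by a score against the center node (highest first).

    Assumption: a plain dot product stands in for a learned attention score.
    """
    scored = [(sum(c * x for c, x in zip(center, n)), n) for n in neighbors]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored]

def rnn_aggregate(sequence, w_in, w_rec):
    """Run a simple tanh RNN over the sequence and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][j] * hidden[j] for j in range(len(hidden)))
            )
            for i in range(len(hidden))
        ]
    return hidden

def goat_like_aggregate(center, neighbors, w_in, w_rec):
    """Order the neighbors first, then apply the order-sensitive aggregator."""
    return rnn_aggregate(order_neighbors(center, neighbors), w_in, w_rec)

center = [1.0, 0.0]
neighbors = [[0.5, 1.0], [2.0, -1.0], [-1.0, 0.3]]
w_in = [[0.5, -0.2], [0.1, 0.3]]    # illustrative fixed weights
w_rec = [[0.1, 0.0], [0.0, 0.1]]
print(goat_like_aggregate(center, neighbors, w_in, w_rec))
```

Because the order is derived from the scores rather than from the input order, shuffling `neighbors` leaves the result unchanged as long as no two scores tie, even though the RNN itself is sensitive to order.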
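Graph autoencoders (GAE) and variational graph autoencoders (VGAE) have emerged as powerful methods for link prediction. Their performance is less impressive on community detection problems where, according to recent and concurring experimental evaluations, they are often outperformed by simpler alternatives such as the Louvain method. It remains unclear to what extent community detection can be improved with GAE and VGAE, especially in the absence of node features. It is also uncertain whether this can be done while preserving good performance on link prediction. In this paper, we show that jointly addressing these two tasks with high accuracy is possible. To this end, we introduce and theoretically study a community-preserving message-passing scheme, which dopes our GAE and VGAE encoders by considering both the initial graph structure and modularity-based prior communities when computing embedding spaces. We also propose novel training and optimization strategies, including a modularity-inspired regularizer that complements the existing reconstruction losses for joint link prediction and community detection. We demonstrate the empirical effectiveness of our approach, referred to as Modularity-Aware GAE and VGAE, through in-depth experimental validation on various real-world graphs.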
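For readers unfamiliar with the quantity behind the "modularity-based" priors and regularizer, the sketch below computes the standard Newman modularity of a hard partition on an undirected graph. It is not the regularizer proposed in the abstract; the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition.

    edges: list of undirected (u, v) pairs without duplicates.
    community: dict mapping each node to its community label.
    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where L_c counts intra-community edges and d_c sums the degrees in c.
    """
    m = len(edges)
    degree = {}
    intra_edges = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            intra_edges[c] = intra_edges.get(c, 0) + 1
    degree_sum = {}
    for node, d in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    return sum(
        intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, by_triangle))
```

Splitting the toy graph along the bridge gives a clearly positive score, while putting every node in one community gives zero, which is why modularity is a natural signal for community structure.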
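Graph neural networks (GNNs) have recently emerged as a dominant paradigm for machine learning with graphs. Research on GNNs has mainly focused on the family of message passing neural networks (MPNNs). Similar to the Weisfeiler-Leman (WL) test of isomorphism, these models follow an iterative neighborhood aggregation procedure to update vertex representations, and then compute graph representations by aggregating the vertex representations. Although very successful, MPNNs have been studied intensively over the past few years. There is therefore a need for novel architectures that allow research in the field to break away from MPNNs. In this paper, we propose a new graph neural network model, called $\pi$-GNN, which learns a "soft" permutation (i.e., doubly stochastic) matrix for each graph and thereby projects all graphs into a common vector space. The learned matrices impose a "soft" ordering on the vertices of the input graphs, and based on this ordering, the adjacency matrices are mapped into vectors. These vectors can be fed into fully connected or convolutional layers to tackle supervised learning tasks. For large graphs, to make the model more efficient in terms of running time and memory, we further relax the doubly stochastic matrices to row-stochastic matrices. We empirically evaluate the model on graph classification and graph regression datasets and show that it achieves performance competitive with state-of-the-art models.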
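To make the notion of a "soft" permutation concrete, the sketch below turns a score matrix into an approximately doubly stochastic matrix and uses it to reorder an adjacency matrix. The abstract does not say how the matrix is obtained; Sinkhorn normalization is swapped in here as one standard option, and the names `sinkhorn` and `soft_reorder` are hypothetical.

```python
import math

def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns of exp(scores).

    For a square score matrix this converges to a doubly stochastic matrix.
    """
    n = len(scores)
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        m = [[v / sum(row) for v in row] for row in m]                      # rows sum to 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]   # columns sum to 1
    return m

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(a):
    return [list(col) for col in zip(*a)]

def soft_reorder(adj, p):
    """Apply the (soft) permutation p to both rows and columns: P A P^T."""
    return matmul(matmul(p, adj), transpose(p))

scores = [[0.2, 1.0, -0.5], [0.0, 0.3, 0.8], [1.0, -1.0, 0.1]]  # illustrative values
p = sinkhorn(scores)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(soft_reorder(adj, p))
```

With a hard permutation matrix, `soft_reorder` simply relabels the vertices; with a doubly stochastic matrix it blends rows and columns, which is what makes the ordering differentiable.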
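Neural networks are the pinnacle of artificial intelligence: in recent years we have witnessed many novel architectures and learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze their structure directly to extract meaningful information that can improve the learning process. To our knowledge, graph mining techniques for enhancing learning in neural networks have not yet been explored. In this paper, we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. Since a multipartite graph is a combination of bipartite graphs, which are in turn the incidence graphs of hypergraphs, we design the k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We apply k-hypercore to several neural network architectures, specifically convolutional neural networks and multilayer perceptrons, for image recognition tasks after a very short pretraining. We then use the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thereby biasing the gradient optimization scheme. Extensive experiments show that k-hypercore outperforms state-of-the-art initialization methods.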
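As background for the k-hypercore idea, here is a minimal sketch of the classical k-core decomposition on an ordinary undirected, unweighted graph, computed by repeatedly peeling a minimum-degree vertex. It is not the weighted hypergraph variant described above; the function name `core_numbers` and the toy graph are illustrative assumptions.

```python
def core_numbers(adj):
    """Return the core number of every node.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    A node has core number k if it belongs to the k-core but not the (k+1)-core.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])   # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# A triangle (a, b, c) with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(graph))
```

This simple version rescans for the minimum at every step and is quadratic in the number of nodes; bucket-based implementations achieve linear time but are longer.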
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/fungraph/neural_catacaustics/
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
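The interplay between compression and error feedback can be sketched for a single TD(0) update with linear function approximation. This is a minimal illustration under stated assumptions: top-k sparsification is used as one example of a compression operator, the memory update follows one common form of error feedback, and the names `top_k` and `ef_td_step` are hypothetical rather than taken from the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(vec, k):
    """Example compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out

def ef_td_step(theta, memory, phi_s, phi_next, reward, gamma, alpha, k):
    """One compressed TD(0) step with error feedback.

    theta: weights of the linear value estimate V(s) = theta . phi(s).
    memory: compression error accumulated from earlier steps.
    """
    td_error = reward + gamma * dot(theta, phi_next) - dot(theta, phi_s)
    direction = [td_error * x for x in phi_s]                 # uncompressed TD direction
    corrected = [m + g for m, g in zip(memory, direction)]    # add back past error
    sent = top_k(corrected, k)                                # what is actually communicated
    new_memory = [c - s for c, s in zip(corrected, sent)]     # remember what was dropped
    new_theta = [t + alpha * s for t, s in zip(theta, sent)]
    return new_theta, new_memory

theta, memory = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
theta, memory = ef_td_step(
    theta, memory,
    phi_s=[1.0, 0.5, 0.0], phi_next=[0.0, 0.0, 1.0],
    reward=1.0, gamma=0.9, alpha=0.1, k=1,
)
print(theta, memory)
```

The entries dropped by the compressor are not lost: they stay in `memory` and are added to the next update direction, which is the role of the memory variable mentioned in the abstract.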
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
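A relative value iteration with a worst-case backup can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes a finite uncertainty set given as a list of candidate next-state distributions per state-action pair, and the function name, argument layout, and toy MDP are all hypothetical.

```python
def robust_relative_value_iteration(rewards, kernels, ref_state=0, n_iters=200):
    """Worst-case relative value iteration on a small tabular MDP.

    rewards[s][a]: immediate reward for action a in state s.
    kernels[s][a]: list of candidate next-state distributions (the uncertainty set).
    Returns (gain estimate, relative value function).
    """
    n_states = len(rewards)
    values = [0.0] * n_states
    gain = 0.0
    for _ in range(n_iters):
        backup = []
        for s in range(n_states):
            best = float("-inf")
            for a in range(len(rewards[s])):
                # The adversary picks the distribution with the lowest expected value.
                worst = min(
                    sum(p * v for p, v in zip(dist, values))
                    for dist in kernels[s][a]
                )
                best = max(best, rewards[s][a] + worst)
            backup.append(best)
        gain = backup[ref_state]                  # value at the reference state
        values = [b - gain for b in backup]       # re-center to keep values bounded
    return gain, values

# Two states; action 0 stays, action 1 tries to move to the other state.
# Moving out of state 0 is uncertain: it either succeeds or succeeds half the time.
rewards = [[0.0, 0.0], [1.0, 1.0]]
kernels = [
    [[[1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]]],
    [[[0.0, 1.0]], [[1.0, 0.0]]],
]
gain, values = robust_relative_value_iteration(rewards, kernels)
print(gain, values)
```

Subtracting the reference-state value at every iteration keeps the iterates bounded, which is the usual reason for preferring relative value iteration in the average-reward setting.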
The automated segmentation and tracking of macrophages during their migration are challenging tasks due to their dynamically changing shapes and motions. This paper proposes a new algorithm to achieve automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface segmentation) method. Next, the partial trajectories for cells overlapping in the temporal direction are extracted in the segmented images. Finally, the extracted trajectories are linked by considering their direction of movement. The segmented images and the trajectories obtained with the proposed method are compared with those of semi-automatic segmentation and manual tracking. The proposed tracking achieved 97.4% accuracy on macrophage data under challenging conditions, including feeble fluorescent intensity, irregular shapes, and motion of macrophages. We expect that the automatically extracted trajectories can provide evidence of how macrophages migrate depending on their polarization modes in situations such as wound healing.
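Of the three segmentation ingredients named above, Otsu's thresholding is the easiest to illustrate. The sketch below computes a single global Otsu threshold from a flat list of integer intensities; the paper applies the method locally, which this example does not attempt, and the function name `otsu_threshold` is an illustrative assumption.

```python
def otsu_threshold(pixels, n_levels=256):
    """Return the threshold t that maximizes the between-class variance.

    pixels: iterable of integer intensities in [0, n_levels).
    Pixels with intensity <= t form the first class, the rest the second.
    """
    hist = [0] * n_levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    count0, sum0 = 0, 0.0
    for t in range(n_levels):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so the split is not meaningful
        mean0 = sum0 / count0
        mean1 = (total_sum - sum0) / count1
        # Proportional to the between-class variance (constant factor dropped).
        score = count0 * count1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t

# Synthetic intensities: dark background around 10, bright cells around 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]
print(t, sum(mask))
```

A local variant would run the same computation on a window around each pixel, which helps when fluorescence intensity varies across the image.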