Translated by Google Translate
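Exact inference in Bayesian networks is intractable and depends exponentially on the size of the largest clique in the corresponding clique tree (CT), which makes approximation necessary. Factor-based methods for bounding clique sizes are more accurate than structure-based methods, but they are expensive because they involve inferring beliefs in a large number of candidate structures or region graphs. We propose an alternative approach to approximate inference based on an incremental build-infer-approximate (IBIA) paradigm, which converts the Bayesian network into a data structure containing a sequence of linked clique tree forests (SLCTF), with clique sizes bounded by a user-specified value. In the incremental build stage of this approach, CTFs are constructed incrementally by adding variables to the CTF as long as clique sizes stay within the specified bound. Once the clique size constraint is reached, the CTs in the CTF are calibrated in the infer stage of IBIA. The resulting clique beliefs are used in the approximate stage to obtain an approximate CTF with reduced clique sizes. The approximate CTF forms the starting point for the next CTF in the sequence. These steps are repeated until all variables have been added to a CTF in the sequence. We prove that our algorithm for incremental construction of clique trees always generates a valid CT, and that our approximation technique preserves the joint beliefs of the variables within a clique. Based on this, we show that the SLCTF data structure can be used for efficient approximate inference of the partition function and of prior and posterior marginals. The method was tested on more than 500 benchmarks, and the results show a significant reduction in error compared to other approximate methods, with competitive runtimes.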
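Even though machine learning algorithms already play a significant role in data science, many current methods make unrealistic assumptions about input data. Such methods are difficult to apply because of incompatible data formats, or because of heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework called "HMill" for sample representation, model definition, and training. We review in depth the multi-instance paradigm for machine learning that the framework builds on and extends. To justify the design of HMill's key components theoretically, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technicalities and performance improvements in our implementation, which is published for download under the MIT License. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. In addition to the standard setting in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we use the framework to solve three different problems from the cybersecurity domain. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example is the task of domain blacklist extension through modelling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.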
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
We argue the case for Gaussian Belief Propagation (GBP) as a strong algorithmic framework for the distributed, generic and incremental probabilistic estimation we need in Spatial AI as we aim at high performance smart robots and devices which operate within the constraints of real products. Processor hardware is changing rapidly, and GBP has the right character to take advantage of highly distributed processing and storage while estimating global quantities, as well as great flexibility. We present a detailed tutorial on GBP, relating to the standard factor graph formulation used in robotics and computer vision, and give several simulation examples with code which demonstrate its properties.
We present the Neural Satisfiability Network (NSNet), a general neural framework that models satisfiability problems as probabilistic inference and meanwhile exhibits proper explainability. Inspired by the Belief Propagation (BP), NSNet uses a novel graph neural network (GNN) to parameterize BP in the latent space, where its hidden representations maintain the same probabilistic interpretation as BP. NSNet can be flexibly configured to solve both SAT and #SAT problems by applying different learning objectives. For SAT, instead of directly predicting a satisfying assignment, NSNet performs marginal inference among all satisfying solutions, which we empirically find is more feasible for neural networks to learn. With the estimated marginals, a satisfying assignment can be efficiently generated by rounding and executing a stochastic local search. For #SAT, NSNet performs approximate model counting by learning the Bethe approximation of the partition function. Our evaluations show that NSNet achieves competitive results in terms of inference accuracy and time efficiency on multiple SAT and #SAT datasets.
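Join operations (especially n-way, many-to-many joins) are known to be time- and resource-consuming. At large scales, with respect to table and join-result sizes, current state-of-the-art approaches (including binary-join plans that use nested-loop, hash, or sort-merge join algorithms, or, alternatively, worst-case optimal join algorithms (WOJAs)) may even fail to produce any answer under reasonable resource and time constraints. In this work, we introduce a new approach to n-way equi-join processing, the Graphical Join (GJ). The key idea is twofold. First, we map the physical join computation problem to PGMs and introduce tweaked inference algorithms that can compute a run-length encoding (RLE) based join-result summary, which entails all the statistics necessary to materialize the join result. Second, and most importantly, we show that a join algorithm like GJ, which produces this join-result summary and then desummarizes it, can bring large performance benefits in time and space. Comprehensive experiments are conducted with join queries from the JOB, TPCD, and LastFM datasets, comparing GJ against PostgreSQL and MonetDB, as well as a state-of-the-art WOJA implemented in the Umbra system. The results for in-memory join computation show performance improvements of up to 64X, 388X, and 6X over PostgreSQL, MonetDB, and Umbra, respectively. For on-disk join computation, GJ is faster than PostgreSQL, MonetDB, and Umbra by up to 820X, 717X, and 165X, respectively. Furthermore, GJ's space needs are up to 21,488X, 38,333X, and 78,750X smaller than those of PostgreSQL, MonetDB, and Umbra, respectively.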
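As a rough illustration of the summarize-then-desummarize idea, the sketch below run-length encodes a single column and expands it again. It is generic RLE only, not the GJ inference algorithm; the function names `rle_encode` and `rle_decode` and the sample column are hypothetical and chosen for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


# Hypothetical join-result column with long runs of repeated keys.
column = ["a", "a", "a", "b", "c", "c"]
summary = rle_encode(column)      # compact summary of the column
restored = rle_decode(summary)    # "desummarized" column
```

When many-to-many joins repeat the same key values many times, the summary can be much smaller than the materialized column, which is the intuition behind the space savings reported above.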
We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
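We introduce Reactive Message Passing (RMP) as a framework for executing schedule-free, robust, and scalable message passing-based inference in a factor graph representation of a probabilistic model. RMP is based on the reactive programming style, which only describes how nodes in a factor graph react to changes in connected nodes. The absence of a fixed message passing schedule improves the robustness, scalability, and execution time of the inference procedure. We also present ReactiveMP.jl, a Julia package for realizing RMP through minimization of a constrained free energy. Through user-defined specification of local form and factorization constraints on the variational posterior distribution, ReactiveMP.jl executes hybrid message passing algorithms including belief propagation, variational message passing, expectation propagation, and expectation maximization update rules. Experimental results demonstrate the improved performance of ReactiveMP-based RMP in comparison to other Julia packages for Bayesian inference across a range of probabilistic models. In particular, we show that the RMP framework is capable of running Bayesian inference for large-scale probabilistic state space models with hundreds of thousands of random variables on a standard laptop computer.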
The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and generally provides fertile ground for studying the statistical and computational tradeoffs that arise in network and data sciences. This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a. detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest of achieving the limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, classical and nonbacktracking spectral methods. A few open problems are also discussed.
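To make "a random graph model with planted clusters" concrete, here is a minimal sampling sketch. It assumes the simplest symmetric variant, in which every within-community pair is connected with probability `p_in` and every cross-community pair with probability `p_out`; the general SBM uses a full connectivity matrix instead. The function name `sample_sbm` and its parameters are hypothetical, not taken from the survey.

```python
import random


def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected graph from a symmetric stochastic block model.

    sizes: number of nodes in each planted community.
    p_in:  edge probability for two nodes in the same community (assumption:
           identical for all communities).
    p_out: edge probability for two nodes in different communities.
    Returns (labels, edges) with edges given as (i, j) pairs, i < j.
    """
    rng = random.Random(seed)
    # Node k gets the index of the community it was planted in.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


# Two planted communities of 50 nodes each, denser inside than across.
labels, edges = sample_sbm([50, 50], p_in=0.3, p_out=0.02)
```

Community detection is then the task of recovering `labels` (up to relabeling) from `edges` alone; the closer `p_in` and `p_out` are, the harder that becomes.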
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.
This paper proposes a new 3D gas distribution mapping technique based on the local message passing of Gaussian belief propagation. It is capable of resolving, in real time, concentration estimates in 3D space whilst accounting for the obstacle information within the scenario, the first of its kind in the literature. The gas mapping problem is formulated as a 3D factor graph of Gaussian potentials, the connections of which are conditioned on local occupancy values. The Gaussian belief propagation framework is introduced as the solver, and a new hybrid message scheduler is introduced to increase the rate of convergence. The factor graph problem is then redesigned as a dynamically expanding inference task, coupling the information of consecutive gas measurements with local spatial structure obtained by the robot. The proposed algorithm is compared to state-of-the-art methods in 2D and 3D simulations and is found to resolve distribution maps orders of magnitude quicker than typical direct solvers. The proposed framework is then deployed for the first time onboard a ground robot in a 3D mapping and exploration task. The system is shown to be able to resolve multiple sensor inputs and output high-resolution 3D gas distribution maps in a GPS-denied, cluttered scenario in real time. This online inference of complicated plume structures provides a new layer of contextual information over its 2D counterparts and enables autonomous systems to take advantage of real-time estimates to inform potential next best sampling locations.