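Air pollution monitoring platforms play a very important role in preventing and mitigating the effects of pollution. Recent advances in the field of graph signal processing have made it possible to describe and analyze air pollution monitoring networks using graphs. One of the main applications is the reconstruction of the measured signal in a graph using a subset of sensors. Reconstructing the signal using information from sensor neighbors can help improve the quality of network data. Examples include filling in missing data with correlated neighboring nodes, and correcting a drifting sensor using more accurate neighboring sensors. This paper compares various types of graph signal reconstruction methods applied to a real dataset of Spanish air pollution reference stations. The methods considered are Laplacian interpolation, graph signal processing low-pass-based graph signal reconstruction, and kernel-based graph signal reconstruction. They are compared on real air pollution datasets measuring O3, NO2, and PM10. The ability of the methods to reconstruct the signal of a pollutant is shown, as well as the computational cost of this reconstruction. The results indicate the superiority of methods based on kernel-based graph signal reconstruction. They also show the difficulty these methods have in scaling to an air pollution monitoring network with a large number of low-cost sensors. However, we show that scalability can be overcome with simple methods, such as partitioning the network using a clustering algorithm.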
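Laplacian interpolation, the first reconstruction method listed above, is commonly formulated as harmonic interpolation. The unobserved values are chosen to minimise the smoothness term x^T L x while the observed sensors are held fixed. The following is a minimal NumPy sketch under that assumption, not the authors' implementation; the function name and the toy weight matrix are hypothetical. It also assumes that every connected group of unobserved nodes touches at least one observed node, since otherwise L_uu is singular.

```python
import numpy as np

def laplacian_interpolation(W, known_idx, x_known):
    """Estimate unobserved node values by minimising x^T L x with observed nodes fixed.

    W: symmetric (N, N) non-negative weight matrix of the sensor graph.
    known_idx: indices of nodes with measurements; x_known: their values.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian L = D - W
    known = np.zeros(n, dtype=bool)
    known[known_idx] = True
    unknown = ~known
    x = np.zeros(n)
    x[known_idx] = x_known                    # keeps known_idx / x_known pairing
    # Zero gradient w.r.t. the unknown entries gives L_uu x_u = -L_uk x_k.
    L_uu = L[np.ix_(unknown, unknown)]
    L_uk = L[np.ix_(unknown, known)]
    x[unknown] = np.linalg.solve(L_uu, -L_uk @ x[known])
    return x

# Hypothetical 3-sensor chain: sensors 0 and 2 report, sensor 1 is missing.
W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
print(laplacian_interpolation(W, [0, 2], np.array([10.0, 20.0])))  # prints [10. 15. 20.]
```

On a chain, the missing middle sensor receives the average of its two neighbours. The kernel-based methods replace this fixed smoothness prior with a learned or chosen kernel.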
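Calibrating low-cost sensors with machine learning techniques is now a widely used approach. Many challenges remain in deploying low-cost sensors for air quality monitoring. Even so, low-cost sensors have been shown to be useful in combination with high-precision instruments. For this reason, most research focuses on applying different calibration techniques based on machine learning. However, the successful application of these models depends on the quality of the data acquired by the sensors. Little attention has been paid to the overall data acquisition process, from sensor sampling and data pre-processing to the calibration of the sensor itself. In this paper, we present the main sensor sampling parameters and their respective impact on the quality of machine-learning-based sensor calibration. We also show their influence on energy consumption, which exposes the existing trade-offs. Finally, results on an experimental node show the impact of the data sampling strategy on the calibration of tropospheric ozone, nitrogen dioxide, and nitrogen monoxide low-cost sensors. Specifically, we show how a sampling strategy that minimizes the duty cycle of the sensing subsystem can reduce power consumption while maintaining data quality.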
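The power side of the duty-cycle trade-off can be illustrated with simple arithmetic. If the sensing subsystem is active for t_on seconds in every period T, its average power is D·P_active + (1 − D)·P_sleep, where D = t_on / T. The following is a minimal sketch under that idealised two-state assumption. The function name and the power figures are hypothetical, and the effect on calibration quality is not modelled.

```python
def average_power_mw(on_time_s, period_s, p_active_mw, p_sleep_mw):
    """Mean power of a sensing subsystem that is active for on_time_s in every period_s.

    Assumes an idealised two-state model (active / sleep) with no wake-up overhead.
    """
    if not 0.0 < on_time_s <= period_s:
        raise ValueError("need 0 < on_time_s <= period_s")
    duty = on_time_s / period_s               # duty cycle D
    return duty * p_active_mw + (1.0 - duty) * p_sleep_mw

# Hypothetical figures: sampling for 30 s every 120 s, 200 mW active, 2 mW asleep.
print(average_power_mw(30, 120, 200.0, 2.0))  # prints 51.5
```

Reducing the duty cycle lowers power almost linearly. The open question studied in the paper is how far it can be reduced before calibration accuracy degrades.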
Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.
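One core GSP idea connecting to conventional DSP is the graph Fourier transform. The eigenvectors of the graph Laplacian serve as a frequency basis, and filtering is done by scaling the spectral coefficients. The following is a minimal NumPy sketch, assuming an undirected graph and the combinatorial Laplacian L = D − W (other GSP formulations use the normalised Laplacian or the adjacency matrix); the function names are hypothetical.

```python
import numpy as np

def graph_fourier_basis(W):
    """Eigen-decompose L = D - W; eigenvalues act as graph frequencies (ascending)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigh(L)

def ideal_lowpass(W, x, k):
    """Keep only the k lowest-frequency graph Fourier coefficients of x."""
    _, U = graph_fourier_basis(W)
    x_hat = U.T @ x            # graph Fourier transform
    x_hat[k:] = 0.0            # discard high graph frequencies
    return U @ x_hat           # inverse graph Fourier transform
```

On a connected graph, the lowest frequency (eigenvalue 0) corresponds to a constant vector. Keeping only k = 1 coefficient therefore replaces the signal with its mean, the graph analogue of keeping only the DC component.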
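Wireless sensor networks are among the most promising technologies of the current era because of their small size, low cost, and ease of deployment. As the number of wireless sensors grows, so does the probability of missing data. Such incomplete data can lead to disastrous consequences if used for decision-making. There is a rich literature on this problem. However, most approaches show performance degradation when a significant amount of data is lost. Inspired by the emerging field of graph signal processing, this paper presents a new study of a Sobolev reconstruction algorithm in wireless sensor networks. Experimental comparisons on several publicly available datasets show that the algorithm outperforms multiple state-of-the-art techniques by a maximum margin of 54%. We further show that the algorithm consistently retrieves the missing data even in situations of massive data loss.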
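The core of Sobolev-style reconstruction can be sketched as a regularised least-squares problem. The method fits the observed entries while penalising the Sobolev-type norm x^T (L + εI)^β x, which grows with graph-frequency content. The following is a static, single-snapshot simplification written for illustration and is not the algorithm evaluated in the paper; that algorithm also handles time variation. The function name and the parameter values (alpha, beta, eps) are hypothetical.

```python
import numpy as np

def sobolev_reconstruct(W, y, mask, alpha=0.1, beta=1.0, eps=0.05):
    """Recover a graph signal from the entries of y where mask is True.

    Solves min_x ||J x - J y||^2 + alpha * x^T (L + eps I)^beta x, where J is the
    diagonal 0/1 sampling matrix; the penalty is a Sobolev-type smoothness term.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L + eps * np.eye(n))
    S = (U * lam ** beta) @ U.T                    # (L + eps I)^beta via eigendecomposition
    J = np.diag(mask.astype(float))
    return np.linalg.solve(J + alpha * S, J @ y)   # normal equations (J^T J = J)
```

Entries of y outside the mask are ignored, so placeholder values at missing sensors do not affect the result. The eps > 0 shift keeps the penalty matrix positive definite.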
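There has been intense recent activity in embedding very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part, we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion mapping, kernel-based methods, and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of dataset with tremendous growth is very high-dimensional network data. The task considered in the third part is how to embed such data in a vector space of moderate dimension, so that the data become amenable to traditional techniques such as clustering and classification. Arguably, this is where there is a contrast between algorithmic machine learning methods and statistical modeling, the so-called stochastic block modeling. In the paper, we discuss the pros and cons of the two approaches. The final part of the survey deals with embedding in $\mathbb{R}^2$, i.e., visualization. Three methods are presented: $t$-SNE, UMAP, and LargeVis, based on methods in parts one, two, and three, respectively. The methods are illustrated and compared on two simulated datasets. One consists of a triplet of noisy Ranunculoid curves. The other consists of networks of increasing complexity generated with stochastic block models and two types of nodes.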
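Multidimensional scaling, one of the first-part methods, has a compact classical form. Double-centre the squared distance matrix to obtain a Gram matrix, then use its top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. The following is a minimal NumPy sketch of classical (Torgerson) MDS. It is not taken from the survey, and the function name is hypothetical.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed n points in R^dim from an (n, n) matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]    # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # clip tiny negative values
    return eigvecs[:, order] * scale
```

For distances that are exactly Euclidean in `dim` dimensions, the embedding reproduces the input distances up to a rotation or reflection. ISOMAP applies the same construction to graph shortest-path distances.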
In applications such as social, energy, transportation, sensor, and neuronal networks, high-dimensional data naturally reside on the vertices of weighted graphs. The emerging field of signal processing on graphs merges algebraic and spectral graph theoretic concepts with computational harmonic analysis to process such signals on graphs. In this tutorial overview, we outline the main challenges of the area, discuss different ways to define graph spectral domains, which are the analogues to the classical frequency domain, and highlight the importance of incorporating the irregular structures of graph data domains when processing signals on graphs. We then review methods to generalize fundamental operations such as filtering, translation, modulation, dilation, and downsampling to the graph setting, and survey the localized, multiscale transforms that have been proposed to efficiently extract information from high-dimensional data on graphs. We conclude with a brief discussion of open issues and possible extensions.
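Graph kernels have attracted a lot of attention during the last decade and have evolved into a rapidly developing branch of learning on structured data. During the past 20 years, the considerable research activity in the field has led to the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have been applied successfully in a wide range of domains, from social networks to bioinformatics. The goal of this survey is to provide a unifying view of the literature on graph kernels. In particular, we present an overview of a wide variety of graph kernels. Furthermore, we perform an experimental evaluation of several of these kernels on publicly available datasets and provide a comparative study. Finally, we discuss key applications of graph kernels and outline some challenges that remain to be addressed.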
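As a concrete example of a graph kernel, the Weisfeiler-Lehman subtree kernel repeatedly replaces each node label with the pair (own label, sorted multiset of neighbour labels). It then compares graphs by the inner product of label-count histograms. The following is a minimal pure-Python sketch. Instead of compressing labels to integer ids as practical implementations do, it keeps the nested tuples, which is simpler but slower. The function names and the input format (an adjacency dict plus a label dict) are hypothetical.

```python
from collections import Counter

def wl_features(adj, labels, iterations=2):
    """adj: dict node -> list of neighbours; labels: dict node -> hashable label."""
    feats = Counter(labels.values())           # iteration-0 label counts
    current = dict(labels)
    for _ in range(iterations):
        # New label = (own label, sorted tuple of neighbour labels).
        current = {v: (current[v], tuple(sorted(current[u] for u in adj[v]))) for v in adj}
        feats.update(current.values())
    return feats

def wl_kernel(g1, g2, iterations=2):
    """Inner product of WL label histograms; g = (adj, labels)."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
```

Labels from different iterations have different nesting depths, so they never collide. The kernel is therefore the sum of the per-iteration histogram inner products.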
Data-driven neighborhood definitions and graph constructions are often used in machine learning and signal processing applications. k-nearest neighbor (kNN) and $\epsilon$-neighborhood methods are among the most common methods used for neighborhood selection, due to their computational simplicity. However, the choice of parameters associated with these methods, such as k and $\epsilon$, is still ad hoc. We make two main contributions in this paper. First, we present an alternative view of neighborhood selection, where we show that neighborhood construction is equivalent to a sparse signal approximation problem. Second, we propose an algorithm, non-negative kernel regression (NNK), for obtaining neighborhoods that lead to better sparse representation. NNK shares similarities with the orthogonal matching pursuit approach to signal representation and possesses desirable geometric and theoretical properties. Experiments demonstrate (i) the robustness of the NNK algorithm for neighborhood and graph construction, (ii) its ability to adapt the number of neighbors to the data properties, and (iii) its superior performance in local neighborhood and graph-based machine learning tasks.
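The sparse-approximation view of neighborhood selection can be sketched for a single node i. Take its kNN candidates S as an initial set. Then solve the non-negative quadratic program min over θ ≥ 0 of ½ θ^T K_SS θ − K_Si^T θ, and keep only candidates with nonzero weight. The following NumPy sketch uses projected gradient descent as the solver. The solver choice, the Gaussian kernel, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian kernel matrix for the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def nnk_neighbors(K, i, k=5, iters=500, tol=1e-6):
    """Return (neighbour indices, weights) for node i from a kernel matrix K."""
    sims = K[i].copy()
    sims[i] = -np.inf                           # exclude the node itself
    S = np.argsort(sims)[::-1][:k]              # kNN candidate set
    K_SS, K_Si = K[np.ix_(S, S)], K[S, i]
    step = 1.0 / np.linalg.eigvalsh(K_SS)[-1]   # 1 / Lipschitz constant of the gradient
    theta = np.zeros(len(S))
    for _ in range(iters):
        # Gradient step on 0.5 θ^T K_SS θ - K_Si^T θ, then projection onto θ >= 0.
        theta = np.maximum(0.0, theta - step * (K_SS @ theta - K_Si))
    keep = theta > tol
    return S[keep], theta[keep]
```

For three collinear points at 0, 1 and 2, kNN with k = 2 links point 0 to both others. The non-negative fit keeps only point 1, because point 2 lies "behind" it. This illustrates how the number of neighbours adapts to the geometry.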
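The aim of this review is to introduce the reader to graph kernels and their application to classification problems in chemoinformatics. Graph kernels are functions that allow us to infer chemical properties of molecules, which can help with tasks such as finding compounds suitable for drug design. The use of kernel methods is just one particular way to quantify similarity between graphs. We restrict our discussion to this approach, although popular alternatives have emerged in recent years, most notably graph neural networks.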
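A classic graph kernel used for molecules is the geometric random walk kernel. It counts common walks in two graphs by working on their direct product graph, whose adjacency matrix is the Kronecker product A1 ⊗ A2. Walks of length k are weighted by λ^k, which gives k(G1, G2) = 1^T (I − λ A×)^{-1} 1. The following is a minimal NumPy sketch for unlabelled, undirected graphs. Chemoinformatics applications normally also match atom and bond labels, which this sketch omits. The function name is hypothetical.

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.1):
    """Geometric random walk kernel between two unlabelled undirected graphs."""
    Ax = np.kron(A1, A2)                        # adjacency of the direct product graph
    n = Ax.shape[0]
    rho = np.max(np.abs(np.linalg.eigvalsh(Ax)))  # spectral radius (Ax is symmetric)
    if lam * rho >= 1.0:
        raise ValueError("lam must be below 1 / spectral radius for the series to converge")
    ones = np.ones(n)
    # Sum over all walk lengths k of lam^k * (number of common walks of length k).
    return ones @ np.linalg.solve(np.eye(n) - lam * Ax, ones)
```

Solving the linear system avoids forming the matrix inverse. The direct product still has |V1|·|V2| nodes, which is why faster variants are used for larger molecules.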
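Knowing the pressure at every node of a water distribution system (WDS) at all times facilitates safe and efficient operation. However, complete measurement data cannot be collected because real-life systems have only a limited number of instruments. This paper presents a data-driven methodology for reconstructing all nodal pressures by observing only a limited number of nodes. The reconstruction method is based on K-localized spectral graph filters, which allow graph convolution on water networks. The effects of the number of layers, the layer depth, and the degree of the Chebyshev polynomials are discussed, taking the peculiarities of the application into account. In addition, a weighting method is shown in which information about friction losses can be embedded into the spectral graph filters through the adjacency matrix. The performance of the proposed model is presented for different numbers of observed nodes relative to the total number of nodes. Weighted connections prove to bring no benefit over binary connections. Even so, the proposed model reconstructs nodal pressures with a relative error of at most 5% at an observation ratio of 5%. These results are achieved with shallow graph neural networks by following the considerations discussed in the paper.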
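K-localized spectral filters are usually built from Chebyshev polynomials of the rescaled Laplacian. They use y = Σ_k θ_k T_k(L̃) x, with T_0 = I, T_1 = L̃ and T_k = 2 L̃ T_{k−1} − T_{k−2}. Because each multiplication by L̃ reaches only one hop further, a filter with K coefficients mixes values within K − 1 hops. The following is a minimal NumPy sketch of a single fixed filter, assuming the symmetric normalised Laplacian and a graph without isolated nodes. It is not the paper's trained network, and the function name and coefficients are hypothetical.

```python
import numpy as np

def chebyshev_graph_filter(W, x, theta):
    """Apply y = sum_k theta[k] * T_k(L_s) x with L_s = 2 L / lambda_max - I.

    L is the symmetric normalised Laplacian; assumes every node has at least one
    edge. A filter with K = len(theta) coefficients only mixes values within
    K - 1 hops of each node.
    """
    n = W.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    L_s = 2.0 * L / np.linalg.eigvalsh(L)[-1] - np.eye(n)    # spectrum mapped to [-1, 1]
    t_prev, t_curr = x, L_s @ x                               # T_0 x and T_1 x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * L_s @ t_curr - t_prev  # T_k = 2 L_s T_{k-1} - T_{k-2}
        y = y + theta[k] * t_curr
    return y
```

In a graph neural network, the θ_k become trainable weights per input and output channel. The edge weights in W are where information such as friction losses could be embedded.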
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
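To make the convolutional GNN category concrete, the following shows one layer of the widely used GCN propagation rule of Kipf and Welling: H' = σ(D̂^{-1/2} (A + I) D̂^{-1/2} H W), where D̂ is the degree matrix of A + I. It is a minimal dense NumPy sketch with ReLU as the default activation. The function name is hypothetical, and real implementations use sparse matrices and a deep learning framework.

```python
import numpy as np

def gcn_layer(A, H, W, activation=lambda z: np.maximum(z, 0.0)):
    """One graph convolution: normalised neighbourhood averaging, then a linear map."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return activation(A_norm @ H @ W)
```

With no edges, A_norm reduces to the identity and the layer becomes an ordinary per-node dense layer. This shows that the graph enters only through the propagation matrix.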
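We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (F-HDGM). These models employ a classic mixed-effects regression structure with embedded spatiotemporal dynamics to model georeferenced data observed in a functional domain. Thus, the parameters of interest are functions over this domain. The algorithm simultaneously selects the relevant spline basis functions and regressors that are used to model the fixed-effects relationship between the response variable and the covariates. In this way, it automatically shrinks to zero irrelevant parts of the functional coefficients, or the entire effect of irrelevant regressors. The algorithm is based on iterative optimization and uses an adaptive least absolute shrinkage and selection operator (LASSO) penalty function, in which the weights are obtained from the unpenalized F-HDGM maximum likelihood estimators. The computational burden of the maximization is drastically reduced by a local quadratic approximation of the likelihood. Through a Monte Carlo simulation study, we analyze the performance of the algorithm under different scenarios, including strong correlations among the regressors. We show that the penalized estimator outperforms the unpenalized estimator in all the cases we considered. We apply the algorithm to a real case study in which hourly nitrogen dioxide concentrations recorded in the Lombardy region of Italy are modeled as a functional process with several weather and land cover covariates.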
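The adaptive LASSO idea can be shown on ordinary linear regression. It penalises each coefficient by λ w_j |β_j| with weights w_j = 1 / |β̂_j|^γ taken from an unpenalised fit. Coefficients that the unpenalised fit already finds small are therefore shrunk much harder. The following NumPy sketch uses cyclic coordinate descent with soft-thresholding. It is a deliberate simplification: there is no F-HDGM structure, no spline bases, and no local quadratic approximation of a likelihood. All names and default values are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso(X, y, lam=0.1, gamma=1.0, iters=200):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent."""
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # unpenalised estimate
    w = 1.0 / (np.abs(beta_ols) ** gamma + 1e-12)      # adaptive weights
    beta = beta_ols.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(iters):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]       # partial residual without feature j
            z = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z, lam * w[j]) / col_sq[j]
    return beta
```

With one truly active regressor and two irrelevant ones, the irrelevant coefficients are set exactly to zero. The active coefficient is shrunk only slightly, because its weight is small.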
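Graph signal processing is a ubiquitous task in many applications, such as sensor, social, transportation, and brain networks, point cloud processing, and graph neural networks. Graph signals are often corrupted during the sensing process and therefore need to be restored. In this paper, we propose graph signal restoration methods based on deep algorithm unrolling (DAU). First, we present a graph signal denoiser obtained by unrolling the iterations of the alternating direction method of multipliers (ADMM). We then propose a general restoration method for linear degradation obtained by unrolling the iterations of Plug-and-Play ADMM (PnP-ADMM). In the second method, the unrolled ADMM-based denoiser is incorporated as a submodule, leading to a nested DAU structure. The parameters of the proposed denoising and restoration methods are trained in an end-to-end manner. Our approach is interpretable and keeps the number of parameters small, since we tune only the regularization parameters. We overcome two main challenges of existing graph signal restoration methods: 1) the limited performance of convex optimization algorithms due to fixed parameters, which are often determined manually; 2) the large number of parameters of graph neural networks, which makes training difficult. Several experiments on graph signal denoising and interpolation are performed on synthetic and real-world data. The proposed methods show performance improvements over several existing techniques in terms of root mean square error in both tasks.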
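Algorithm unrolling turns a fixed number of optimisation iterations into network "layers" whose parameters can be learned. The following sketch unrolls ADMM for one common graph denoising objective, min_x ½||x − y||² + λ||Dx||_1 with D the edge incidence matrix (graph total variation). Each layer gets its own λ_k, passed in as a plain list, where a trained model would learn them end to end. The choice of objective, the fixed ρ, and all function names are assumptions for illustration. This is not the paper's exact denoiser, and no training loop is shown.

```python
import numpy as np

def incidence_matrix(edges, n):
    """Signed edge-node incidence matrix D, so that (D x)_e = x_i - x_j for e = (i, j)."""
    D = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    return D

def unrolled_admm_denoise(y, D, lambdas, rho=1.0):
    """Run len(lambdas) ADMM iterations for 0.5||x - y||^2 + lam_k ||D x||_1."""
    n = len(y)
    x_solve = np.linalg.inv(np.eye(n) + rho * D.T @ D)    # shared by all layers (small graphs)
    z, u, x = D @ y, np.zeros(D.shape[0]), y.copy()
    for lam in lambdas:                                    # one "layer" per iteration
        x = x_solve @ (y + rho * D.T @ (z - u))            # quadratic x-update
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold z-update
        u = u + D @ x - z                                  # scaled dual update
    return x
```

Setting every λ_k to zero returns the input unchanged. For any λ_k, each x-update preserves the sum of the signal, because D annihilates constant vectors.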
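Many scientific prediction problems face spatiotemporal data and modeling challenges in handling complex variations in space and time from sparse and unevenly distributed observations. This paper presents a novel deep learning architecture, Deep learning predictions for LocATion-dependent Time-sEries data (DeepLATTE), which explicitly incorporates theories of spatial statistics into neural networks to address these challenges. In addition to a feature selection module and a spatiotemporal learning module, DeepLATTE contains an autocorrelation-guided semi-supervised learning strategy. This strategy enforces both the local autocorrelation patterns and the global autocorrelation trends of the predictions in the learned spatiotemporal embedding space to be consistent with the observed data, overcoming the limitation of sparse and unevenly distributed observations. During training, both supervised and semi-supervised losses guide the updates of the entire network to: 1) prevent overfitting, 2) refine feature selection, 3) learn useful spatiotemporal representations, and 4) improve overall prediction. We demonstrate DeepLATTE using publicly available data on an important public health topic, air quality prediction, in a well-studied, complex physical environment: Los Angeles. The experiment demonstrates that the proposed approach provides accurate fine-spatial-scale air quality predictions and reveals the critical environmental factors affecting the results.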
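A standard spatial-statistics measure of global autocorrelation is Moran's I: I = (N / S0) · Σ_ij w_ij z_i z_j / Σ_i z_i², where z = x − mean(x) and S0 is the sum of all spatial weights. Values near +1 indicate that neighbouring locations have similar values, and values near −1 indicate alternation. The following is a minimal NumPy sketch, assuming a symmetric spatial weight matrix. The abstract does not state which autocorrelation statistic DeepLATTE uses, so this only illustrates the kind of quantity involved. The function name is hypothetical.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I of values x under spatial weight matrix W (zero diagonal)."""
    z = x - x.mean()
    n, s0 = len(x), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)
```

On a chain of four locations, an alternating pattern such as [1, −1, 1, −1] gives I = −1. A two-block pattern such as [1, 1, −1, −1] gives a positive value.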
Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. The design of strong baselines is an indispensable element of such work. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which have emerged as the de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose a proper collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions, which shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison.
Pre-publication draft of a book to be published by Morgan & Claypool Publishers. Unedited version released with permission. All relevant copyrights held by the author and publisher extend to this pre-publication draft.
Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them.

Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.
Network-based analyses of dynamical systems have become increasingly popular in climate science. Here we address network construction from a statistical perspective and highlight the often ignored fact that the calculated correlation values are only empirical estimates. To measure spurious behaviour as deviation from a ground truth network, we simulate time-dependent isotropic random fields on the sphere and apply common network construction techniques. We find several ways in which the uncertainty stemming from the estimation procedure has major impact on network characteristics. When the data has locally coherent correlation structure, spurious link bundle teleconnections and spurious high-degree clusters have to be expected. Anisotropic estimation variance can also induce severe biases into empirical networks. We validate our findings with ERA5 reanalysis data. Moreover we explain why commonly applied resampling procedures are inappropriate for significance evaluation and propose a statistically more meaningful ensemble construction framework. By communicating which difficulties arise in estimation from scarce data and by presenting which design decisions increase robustness, we hope to contribute to more reliable climate network construction in the future.
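A common climate-network construction links two grid points when the absolute empirical correlation of their time series exceeds a threshold. The estimation issue raised above can be seen directly: with short series, even completely independent noise produces many links. The following is a minimal NumPy sketch of that construction. The function name, threshold values, and simulated data are hypothetical, and the isotropic random fields on the sphere used in the paper are not reproduced.

```python
import numpy as np

def correlation_network(series, threshold):
    """series: (n_nodes, n_time) array. Link i-j if |empirical correlation| > threshold."""
    C = np.corrcoef(series)                   # rows are treated as variables
    A = (np.abs(C) > threshold).astype(int)
    np.fill_diagonal(A, 0)                    # no self-links
    return A

rng = np.random.default_rng(1)
# Independent white noise: the true network has no edges at all.
A_short = correlation_network(rng.standard_normal((50, 20)), 0.3)   # 20 time steps
A_long = correlation_network(rng.standard_normal((50, 20000)), 0.1)  # 20000 time steps
print(A_short.sum() // 2, A_long.sum() // 2)  # number of (spurious) undirected links
```

With 20 time steps, the sampling spread of a correlation estimate is roughly 1/√20 ≈ 0.22, so a 0.3 threshold admits many spurious links. With 20000 steps the spread shrinks to about 0.007 and the links disappear.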
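Representing data through a graph structure is recognized as one of the most effective ways to extract information in many data analysis applications. This is especially true when investigating multimodal datasets, in which records collected through various sensing strategies are considered and explored together. However, classical graph signal processing is based on a model of information propagation configured according to a heat diffusion mechanism. This model imposes several constraints and assumptions on the data properties that may not hold for multimodal data analysis. This is especially the case for large-scale datasets collected from heterogeneous sources, where the accuracy and robustness of the results may be severely compromised. In this paper, we introduce a novel model for graph definition based on fluid diffusion. The approach improves the ability of graph-based data analysis to address several issues of modern data analysis in operational scenarios. It thereby provides a platform for a precise, versatile, and efficient understanding of the phenomena underlying the records under examination. It also fully exploits the potential offered by the diversity of the records to obtain a thorough characterization of the data and their significance. In this work, we focus on using this fluid diffusion model to drive a community detection scheme, i.e., to divide multimodal datasets into groups according to the similarity of their nodes in terms of information content. Experimental results on real multimodal datasets in different application scenarios show that our approach strongly outperforms state-of-the-art schemes for community detection in multimodal data analysis.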
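Even though machine learning algorithms already play a significant role in data science, many current methods make unrealistic assumptions about input data. Applying such methods is difficult because of incompatible data formats, or heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework for sample representation, model definition, and training, called "HMill". We review in depth the multi-instance paradigm for machine learning that the framework builds on and extends. To theoretically justify the design of key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technicalities and performance improvements in our implementation, which is published for download under the MIT License. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. In addition to the standard setting, in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we use the framework to solve three different problems from the cybersecurity domain. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example is the task of domain blacklist extension by modeling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.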