Smart Paper Notes

Unsupervised embedding and similarity detection of microregions using public transport schedules

Piotr Gramacki

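Category: Machine Learning

2021-11-03

The role of spatial data in tackling city-related tasks has been growing in recent years. To use such data in machine learning models, it usually has to be transformed into a vector representation, which has driven the development of the field of spatial data representation learning. Representation learning methods are also being proposed for a growing variety of spatial data types. So far, public transport timetables have not been used for learning representations of regions in a city. In this work, a method is developed to embed public transport availability information into a vector space. To run experiments on its application, public transport timetables were collected from 48 cities. Using the H3 spatial indexing method, the cities were divided into micro-regions. A method is also proposed for identifying regions with similar public transport offer characteristics. On that basis, a multi-level typology of public transport offers in the regions is defined. This thesis shows that the proposed representation method can identify micro-regions with similar public transport characteristics across cities, and that it can be used to evaluate the quality of public transport available in a city.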


gtfs2vec -- Learning GTFS Embeddings for comparing Public Transport Offer in Microregions

Piotr Gramacki , Szymon Woźniak , Piotr Szymański

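Categories: Machine Learning | Artificial Intelligence

2021-11-01

We selected 48 European cities and gathered their public transport timetables in the GTFS format. We used Uber's H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetable data, we created features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each region. With these representations, we used a hierarchical clustering approach to identify similar regions. Specifically, we applied an agglomerative clustering algorithm with Euclidean distance between regions and Ward's method to minimize within-cluster variance. Finally, we analyzed the obtained clusters at different levels to identify a number of clusters that qualitatively describe public transport availability. We show that our typology matches the characteristics of the analyzed cities and allows a successful search for areas with similar public transport schedule characteristics.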
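The clustering step named in this abstract (agglomerative clustering with Euclidean distance and Ward's method) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function name `ward_clustering`, the toy two-dimensional "embeddings", and the naive pairwise search are all hypothetical choices made for readability.

```python
from itertools import combinations


def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge criterion.

    points: list of equal-length numeric tuples (hypothetical region embeddings).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    # Every cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in the within-cluster sum of squares if a and b are merged:
        # |A||B| / (|A| + |B|) * squared Euclidean distance between centroids.
        (members_a, centroid_a), (members_b, centroid_b) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
        size_a, size_b = len(members_a), len(members_b)
        return size_a * size_b / (size_a + size_b) * sq_dist

    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases within-cluster variance the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (members_i, centroid_i), (members_j, centroid_j) = clusters[i], clusters[j]
        total = len(members_i) + len(members_j)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [
            (len(members_i) * x + len(members_j) * y) / total
            for x, y in zip(centroid_i, centroid_j)
        ]
        merged = (sorted(members_i + members_j), centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return sorted(members for members, _ in clusters)


# Toy embeddings (made up): two tight pairs and one isolated region.
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.0, 0.0)]
clusters = ward_clustering(embeddings, n_clusters=3)
```

Stopping the merging at different values of `n_clusters` gives coarser or finer groupings, which is one way to read the abstract's analysis of clusters "at different levels".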


Predicting the Location of Bicycle-sharing Stations using OpenStreetMap Data

Kamil Raczycki

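Categories: Machine Learning | Artificial Intelligence

2021-11-02

Planning the layout of bicycle-sharing stations is a complex process, especially in cities where a bicycle-sharing system is only just being introduced. Urban planners usually have to make many estimates based on both publicly available data and data provided privately by the administration, and then apply the location-allocation model that is popular in the field. Many municipalities in smaller cities may have difficulty hiring specialists to carry out such planning. This thesis proposes a new solution that streamlines and facilitates this planning process by using spatial embedding methods. Based only on publicly available data from OpenStreetMap and station layouts from 34 cities in Europe, a method has been developed that divides cities into micro-regions using the Uber H3 discrete global grid system and, using transfer learning from existing systems in different cities, indicates the regions where it is worth placing a station. The result of the work is a mechanism that supports planners in their decisions when planning a station layout, with a choice of reference cities.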


Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags

Szymon Woźniak , Piotr Szymański

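Categories: Machine Learning | Artificial Intelligence

2021-11-01

Representation learning of spatial and geographic data is a rapidly developing field that allows similarity detection between areas and high-quality inference using deep neural networks. Past approaches, however, concentrated on embedding raster imagery (maps, street or satellite photos), mobility data, or road networks. In this paper we propose the first approach to learning vector representations of OpenStreetMap regions with respect to urban functions and land use in a micro-region grid. We identify a subset of OSM tags related to major characteristics of land use, building and urban region functions, and types of water, green, or other natural areas. Through manual verification of tagging quality, we selected 36 cities for training the region representations. Uber's H3 index was used to divide the cities into hexagons, and OSM tags were aggregated for each hexagon. We propose the hex2vec method, based on the Skip-gram model with negative sampling. The resulting vector representations showcase semantic structures of the map characteristics, similar to those found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.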
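For readers unfamiliar with the Skip-gram model with negative sampling mentioned above, the per-pair loss can be sketched as follows. This is the standard word2vec-style objective, not code from the paper; treating a hexagon as the "center" and a nearby hexagon as the "context" is our assumption about how the analogy maps to regions, and all names below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: embedding vectors of a positive pair.
    negatives: embedding vectors of randomly sampled non-context items.
    """
    # Pull the positive pair together ...
    loss = -math.log(sigmoid(dot(center, context)))
    # ... and push each sampled negative away from the center.
    for negative in negatives:
        loss -= math.log(sigmoid(-dot(center, negative)))
    return loss


# Made-up vectors: an aligned context and an opposed negative give a low loss.
center = [1.0, 2.0]
good_loss = sgns_loss(center, context=[1.0, 2.0], negatives=[[-1.0, -2.0]])
bad_loss = sgns_loss(center, context=[-1.0, -2.0], negatives=[[1.0, 2.0]])
```

Minimizing this loss over many pairs is what places items that share contexts close together in the embedding space.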


Transfer Learning Approach to Bicycle-sharing Systems' Station Location Planning using OpenStreetMap Data

Kamil Raczycki , Piotr Szymański

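Categories: Machine Learning | Artificial Intelligence

2021-11-01

Bicycle-sharing systems (BSS) have become a daily reality for many citizens of larger, wealthier cities in developed regions. However, planning the layout of bicycle-sharing stations usually requires expensive data gathering, surveys of travel behavior, and trip modelling, followed by station layout optimization. Many smaller cities and towns, especially in developing areas, may have difficulty financing such projects. Planning a BSS also takes a considerable amount of time. Yet, as the pandemic has shown us, municipalities will face the need to adapt rapidly to mobility shifts, including citizens leaving public transport for bicycles. Laying out a bicycle-sharing system quickly will become critical in addressing the increase in bicycle demand. This paper addresses the problems of cost and time in BSS layout design and proposes a new solution that streamlines and facilitates this planning process by using spatial embedding methods. Based only on publicly available data from OpenStreetMap and station layouts from 34 cities in Europe, a method has been developed that divides cities into micro-regions using the Uber H3 discrete global grid system and, using transfer learning from existing systems in different cities, indicates the regions where it is worth placing a station. The result of the work is a mechanism that supports planners in their decisions when planning a station layout, with a choice of reference cities.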


Deep Learning based Urban Vehicle Trajectory Analytics

Seongjin Choi

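Category: Machine Learning

2021-11-15

A "trajectory" refers to a trace generated by a moving object in geographical space, usually represented by a series of chronologically ordered points, where each point consists of a set of geospatial coordinates and a timestamp. Rapid advances in location sensing and wireless communication technology have enabled us to collect and store massive amounts of trajectory data. As a result, many researchers use trajectory data to analyze the mobility of various moving objects. In this dissertation, we focus on the "urban vehicle trajectory", which refers to the trajectories of vehicles in urban traffic networks, and on "urban vehicle trajectory analytics". Urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks, including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and many previous researchers have therefore used various methods to understand this structure. In particular, deep learning models are attracting the attention of many researchers because of their powerful function approximation and feature representation abilities. The objective of this dissertation is therefore to develop deep learning based models for urban vehicle trajectory analytics, to better understand the mobility patterns of urban traffic networks. In particular, the dissertation focuses on two research topics of high necessity, importance, and applicability: next location prediction and synthetic trajectory generation. In this study, we propose various novel models for urban vehicle trajectory analytics using deep learning.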
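The definition of a trajectory given in this abstract maps naturally onto a small data structure. The sketch below is only one possible encoding: the class name `TrajectoryPoint`, the latitude/longitude fields, and the use of seconds for the timestamp are assumptions, not details from the dissertation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrajectoryPoint:
    lat: float        # geospatial coordinate (assumed: latitude in degrees)
    lon: float        # geospatial coordinate (assumed: longitude in degrees)
    timestamp: float  # assumed unit: seconds since some reference time


def make_trajectory(points: Iterable[TrajectoryPoint]) -> List[TrajectoryPoint]:
    """Return the points in chronological order, as the definition requires."""
    return sorted(points, key=lambda p: p.timestamp)


# Made-up observations arriving out of order.
raw = [
    TrajectoryPoint(37.51, 127.02, 120.0),
    TrajectoryPoint(37.50, 127.00, 0.0),
    TrajectoryPoint(37.505, 127.01, 60.0),
]
trajectory = make_trajectory(raw)
```

A next-location prediction model of the kind the abstract describes would take a prefix of such a sequence as input and predict the point that follows it.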


Clustering -- Basic concepts and methods

Jan-Oliver Felix Kapp-Joswig , Bettina G. Keller

Category: Machine Learning

2022-12-01

We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.


On the Design of Graph Embeddings for the Sensorless Estimation of Road Traffic Profiles

Eric L. Manibardo , Ibai Laña , Esther Villar , Javier Del Ser

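Categories: Machine Learning | Artificial Intelligence

2022-01-11

Traffic forecasting models rely on data that needs to be sensed, processed, and stored. This requires deploying and maintaining traffic sensing infrastructure, often at unaffordable monetary cost. Locations without sensors can be covered by synthetic data simulations, which further lower the economic investment needed for traffic monitoring. One of the most common data generation approaches consists of producing realistic traffic patterns from the data distributions of analogous roads. Detecting roads with similar traffic is the key step in these systems. However, without collecting data at the target location, no flow metrics can be used for this similarity-based search. We present a method that selects locations from among those with available traffic data by inspecting the topological features of road segments. Relevant topological features are extracted as numerical representations (embeddings) in order to compare different locations and eventually find the most similar roads based on the similarity between their embeddings. The performance of this novel selection system is examined and compared with simpler traffic estimation approaches. After a similar data source is found, a generative method is used to synthesize traffic profiles. Depending on how closely the traffic behavior of the sensed road resembles the target, the generation method can be fed with data from a single road. Several generation approaches are analyzed in terms of the precision of the synthesized samples. Above all, this work intends to stimulate further research efforts towards improving the quality of synthetic traffic samples and thereby reducing the need for sensing infrastructure.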
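The selection step described here, finding the sensed road whose embedding is most similar to that of the target location, can be sketched as a nearest-neighbour search. The abstract does not say which similarity measure is used, so cosine similarity is an assumption, and the road names and vectors below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_road(target, candidates):
    """Return the name of the candidate whose embedding is closest to target.

    target: embedding of the location without sensors.
    candidates: dict mapping road name -> embedding of a sensed road.
    """
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


# Hypothetical topological embeddings of three roads with traffic sensors.
sensed_roads = {
    "road_a": [1.0, 1.0, 0.0],
    "road_b": [0.0, 3.0, 0.1],
    "road_c": [2.0, 0.0, 4.0],
}
target_embedding = [1.0, 0.0, 2.0]
best = most_similar_road(target_embedding, sensed_roads)
```

Data from the selected road would then feed the generative step that synthesizes a traffic profile for the unsensed location.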


From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation

Reduan Achtibat , Maximilian Dreyer , Ilona Eisenbraun , Sebastian Bosse , Thomas Wiegand , Wojciech Samek , Sebastian Lapuschkin

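Categories: Machine Learning | Artificial Intelligence

2022-06-07

The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize what concepts a model has generally learned to encode. Both types of methods thus provide only partial insights and leave the burden of interpreting the model's reasoning to the user. Only a few contemporary techniques aim to combine the principles behind local and global XAI to obtain more informative explanations. Those methods, however, are often limited to specific model architectures or impose additional requirements on training regimes or on data and label availability, which makes post-hoc application to arbitrary pre-trained models practically impossible. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives of XAI and thus allows answering both the "where" and the "what" questions for individual predictions, without additional constraints. We further introduce the principle of Relevance Maximization for finding representative examples of encoded concepts based on their usefulness to the model. We thereby lift the dependency on the common practice of Activation Maximization and its limitations. We demonstrate the capabilities of our methods in various settings, showing that Concept Relevance Propagation and Relevance Maximization lead to more human-interpretable explanations and provide deep insights into the model's representations and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.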


Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

Thilo Reintjes

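Categories: Artificial Intelligence | Machine Learning

2022-09-26

This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognize such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with small amounts of training data, we achieve an accuracy of up to 90%. The thesis uses these NLP methods to generate a large dataset consisting of 57,155 share buyback announcements. By analyzing this dataset, the thesis aims to show that most companies that announce a share buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World. This significant overperformance leads to a net gain when looking at the average across all companies. If the benchmark index is adjusted for the size of the respective companies, the average overperformance disappears and the majority underperforms by an even greater margin. However, companies that announce a share buyback with a volume of at least 1% of their market capitalization deliver, on average, a significant overperformance, even against the adjusted benchmark. It was also found that companies that announce share buybacks in times of crisis fare better than the overall market. In addition, the generated dataset was used to train 72 machine learning models. This made it possible to find many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and a significant overperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, producing a significant improvement by fusing weak learners into one strong learner.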


Mapping the Internet: Modelling Entity Interactions in Complex Heterogeneous Networks

Simon Mandlik , Tomas Pevny

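Category: Machine Learning

2021-04-19

Even though machine learning algorithms already play a significant role in data science, many current methods make unrealistic assumptions about their input data. Such methods are difficult to apply because of incompatible data formats, or because of heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework called "HMill" for sample representation, model definition, and training. We review in depth the multi-instance paradigm for machine learning that the framework builds on and extends. To justify the design of HMill's key components theoretically, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technicalities and performance improvements in our implementation, which is published for download under the MIT License. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. In addition to the standard setting, in which a set of attributes is observed for each object individually, we explain how message-passing inference in graphs that represent whole systems of objects can be implemented in the framework. To support our claims, we use the framework to solve three different problems from the cybersecurity domain. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example is the task of domain blacklist extension through modelling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.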


Survey of Generative Methods for Social Media Analysis

Stan Matwin , Aristides Milios , Paweł Prałat , Amilcar Soares , François Théberge

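Category: Machine Learning

2021-12-13

This survey draws a broad, panoramic picture of the state of the art (SoTA) of research on generative methods for the analysis of social media data. It fills a void, as existing survey articles are either much narrower in scope or dated. We include two aspects that are currently gaining importance in mining and modeling social media: dynamics and networks. Social dynamics are important for understanding the spread of influence or diseases, the formation of friendships, and so on. Networks, on the other hand, can capture various complex relationships, providing additional insight and identifying important patterns that would otherwise go unnoticed.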


Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Categories: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.


A Comprehensive Survey on the Convergence of Vehicular Social Networks and Fog Computing

Farimasadat Miri , Richard Pazzi

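Category: Artificial Intelligence

2021-11-30

In recent years, the number of IoT devices has been growing fast, which makes managing, storing, analyzing, and making decisions about raw data from different IoT devices a challenging task, especially for delay-sensitive applications. In a vehicular network (VANET) environment, the dynamic nature of vehicles makes the current open research issues even more challenging, because frequent topology changes can lead to disconnections between vehicles. To this end, a number of research works have been proposed in the context of cloud and fog computing over the 5G infrastructure. On the other hand, a variety of research proposals aim to extend the connection time between vehicles. Vehicular Social Networks (VSNs) have been defined to reduce the burden of connection time between vehicles. This survey paper first provides the necessary background information and definitions for fog, cloud, and related paradigms such as 5G and SDN. It then introduces the reader to Vehicular Social Networks, the different metrics, and the main differences between VSNs and online social networks. Finally, the survey investigates related works in the context of VANETs that present different architectures for addressing the different issues in fog computing. Moreover, it provides a categorization of the different approaches, discusses the required metrics in the context of fog and cloud, and compares them with Vehicular Social Networks. A comparison of the relevant related works is discussed, along with new research challenges and trends in the domain of VSNs and fog computing.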


Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

Sebastian Lapuschkin , Stephan Wäldchen , Alexander Binder , Grégoire Montavon , Wojciech Samek , Klaus-Robert Müller

Category:

2019-02-26

Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.


Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

Lennart Küll

Category: Natural Language Processing

2022-12-01

The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions to analyze those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, only focus on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends of the quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this we introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis include that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) the preference of researchers to publish in conferences over journals dwindles, v) on average, journal articles receive twice as many citations compared to conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also get more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than journals.


Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction

Matthew Stevenson , Christophe Mues , Cristián Bravo

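Categories: Machine Learning | Computer Vision

2021-12-02

LiDAR (short for "Light Detection And Ranging" or "Laser Imaging, Detection, And Ranging") technology can be used to provide detailed three-dimensional elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging has been largely confined to the environmental and archaeological domains. However, the geographically granular and open-source nature of this data also lends itself to an array of societal, organizational, and business applications in which geo-demographic data is used. Arguably, the complexity involved in processing this multi-dimensional data has so far restricted its broader adoption. In this paper, we propose a series of convenient, task-agnostic tile elevation embeddings to address this challenge, using recent advances in unsupervised deep learning. We test the potential of our embeddings by predicting seven indices of deprivation (2019) for small geographies in the Greater London area. These indices cover a range of socio-economic outcomes and serve as a proxy for the wide variety of downstream tasks to which the embeddings can be applied. We consider the suitability of this data not only on its own but also as an auxiliary data source in combination with demographic features, thus providing a realistic use case for the embeddings. Having tried various model and embedding configurations, we find that our best-performing embeddings lead to root-mean-squared error (RMSE) improvements of up to 21% over using standard demographic features alone. We also demonstrate how our embedding pipeline, which combines deep learning with K-means clustering, produces coherent tile segments that allow the latent embedding features to be interpreted.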
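To make the reported metric concrete, the sketch below shows how an RMSE and a relative RMSE improvement over a baseline are typically computed. Defining the improvement as the fractional reduction relative to the baseline is our assumption about what "improvements of up to 21%" means, and the numbers are illustrative, not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Illustrative values only: a baseline RMSE of 1.00 reduced to 0.79.
improvement = relative_improvement(1.00, 0.79)
```

Under this reading, an improvement of 0.21 means the model that uses the embeddings makes errors about one fifth smaller, in the RMSE sense, than the model using demographic features alone.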


Applications of deep learning in traffic congestion detection, prediction and alleviation: A survey

Nishant Kumar , Martin Raubal

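Categories: Machine Learning | Machine Learning (Statistics)

2021-02-19

Detecting, predicting, and alleviating traffic congestion all aim to improve the level of service of the transportation network. With increasing access to larger datasets of higher resolution, deep learning is becoming more relevant to these tasks. Several comprehensive survey papers in recent years have summarized deep learning applications in the transportation domain. However, the system dynamics of the transportation network differ greatly between the non-congested and the congested state, which calls for a clear understanding of the challenges specific to congestion prediction. In this survey, we present the current state of deep learning applications in tasks related to the detection, prediction, and alleviation of congestion. Recurring and non-recurring congestion are discussed separately. Our survey uncovers inherent challenges and gaps in the current state of research. Finally, we offer suggestions for future research directions as answers to the identified challenges.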


A Machine Learning Enhanced Approach for Automated Sunquake Detection in Acoustic Emission Maps

Vanessa Mercea , Alin Razvan Paraschiv , Daniela Adriana Lacatus , Anca Marginean , Diana Besliu-Ionescu

Categories: Computer Vision | Machine Learning

2022-12-13

Sunquakes are seismic emissions visible on the solar surface, associated with some solar flares. Although discovered in 1998, they have only recently become a more commonly detected phenomenon. Despite the availability of several manual detection guidelines, to our knowledge, the astrophysical data produced for sunquakes is new to the field of Machine Learning. Detecting sunquakes is a daunting task for human operators and this work aims to ease and, if possible, to improve their detection. Thus, we introduce a dataset constructed from acoustic egression-power maps of solar active regions obtained for Solar Cycles 23 and 24 using the holography method. We then present a pedagogical approach to the application of machine learning representation methods for sunquake detection using AutoEncoders, Contrastive Learning, Object Detection and recurrent techniques, which we enhance by introducing several custom domain-specific data augmentation transformations. We address the main challenges of the automated sunquake detection task, namely the very high noise patterns in and outside the active region shadow and the extreme class imbalance given by the limited number of frames that present sunquake signatures. With our trained models, we find temporal and spatial locations of peculiar acoustic emission and qualitatively associate them to eruptive and high energy emission. While noting that these models are still in a prototype stage and there is much room for improvement in metrics and bias levels, we hypothesize that their agreement on example use cases has the potential to enable detection of weak solar acoustic manifestations.


AWT -- Clustering Meteorological Time Series Using an Aggregated Wavelet Tree

Christina Pacher , Irene Schicker , Rosmarie deWit , Katerina Hlavackova-Schindler , Claudia Plant

Category: Machine Learning

2022-12-13

Both clustering and outlier detection play an important role for meteorological measurements. We present the AWT algorithm, a clustering algorithm for time series data that also performs implicit outlier detection during the clustering. AWT integrates ideas of several well-known K-Means clustering algorithms. It chooses the number of clusters automatically based on a user-defined threshold parameter, and it can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size. We apply AWT to crowd-sourced 2-m temperature data with an hourly resolution from the city of Vienna to detect outliers and to investigate whether the final clusters show general similarities and similarities with urban land-use characteristics. It is shown that both the outlier detection and the implicit mapping to land-use characteristics are possible with AWT, which opens new possible fields of application, specifically in the rapidly evolving field of urban climate and urban weather.
