Convolutional networks are considered shift invariant, but it has been demonstrated that their response can vary with the exact position of an object. In this paper we show that most commonly used datasets have a bias, where objects are over-represented at the center of the image during training. This bias, together with the boundary conditions of these networks, can have a significant effect on the performance of these architectures, and their accuracy drops significantly as an object approaches the boundary. We also demonstrate how this effect can be mitigated with data augmentation techniques.
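As a hedged illustration of the mitigation the abstract mentions, the snippet below applies a simple shift-based augmentation (random padded crops plus flips) using torchvision; the transform choice and parameters are illustrative assumptions rather than the authors' exact setup.

```python
# Illustrative shift augmentation: pad the image and take a random crop so that
# objects appear at varying positions instead of always near the image center.
import torchvision.transforms as T

shift_augment = T.Compose([
    T.RandomCrop(32, padding=8, padding_mode="reflect"),  # random translations of up to 8 px
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
# Example usage (CIFAR-10 sized images):
# dataset = torchvision.datasets.CIFAR10(root, train=True, transform=shift_augment)
```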
Vehicle trajectory data has received increasing research attention over the past decades. With technological sensing improvements such as high-resolution video cameras, in-vehicle radars and lidars, abundant individual and contextual traffic data is now available. However, though the data quantity is massive, it is by itself of limited utility for traffic research because of noise and systematic sensing errors, thus necessitating proper processing to ensure data quality. We draw particular attention to extracting high-resolution vehicle trajectory data from video cameras, as traffic monitoring cameras are becoming increasingly ubiquitous. We explore methods for automatic trajectory data reconciliation, given "raw" vehicle detection and tracking information from automatic video processing algorithms. We propose a pipeline including a) an online data association algorithm to match fragments associated with the same object (vehicle), formulated as a min-cost network flow problem on a graph, and b) a trajectory reconciliation method formulated as a quadratic program to enhance raw detection data. The pipeline leverages vehicle dynamics and physical constraints to associate tracked objects when they become fragmented, remove measurement noise from trajectories, and impute missing data due to fragmentations. The accuracy is benchmarked on a sample of manually labeled data, which shows that the reconciled trajectories improve accuracy on all tested input data across a wide range of measures. An online version of the reconciliation pipeline is implemented and will be applied in a continuous video processing system running on a camera network covering a 4-mile stretch of Interstate-24 near Nashville, Tennessee.
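The reconciliation step described above can be pictured with a small stand-in: the least-squares sketch below denoises a one-dimensional position trace by trading fidelity to raw measurements against a third-difference (jerk) smoothness penalty. It is an unconstrained simplification of the paper's quadratic program; the penalty choice and weight are assumptions.

```python
import numpy as np

def reconcile(raw_positions, lam=50.0):
    """Smooth a noisy 1-D position trace: min ||x - raw||^2 + lam * ||D3 x||^2."""
    raw = np.asarray(raw_positions, dtype=float)
    n = len(raw)
    D3 = np.diff(np.eye(n), n=3, axis=0)           # third-difference (jerk) operator
    A = np.vstack([np.eye(n), np.sqrt(lam) * D3])  # stack fidelity and smoothness terms
    b = np.concatenate([raw, np.zeros(n - 3)])
    smoothed, *_ = np.linalg.lstsq(A, b, rcond=None)
    return smoothed
```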
To apply federated learning to drug discovery, we developed a novel platform in the context of the European Innovative Medicines Initiative (IMI) project MELLODDY (grant no. 831472), which comprised 10 pharmaceutical companies, academic research labs, large industrial companies and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographically secure way following each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administered in a decentralized way. The MELLODDY platform generated new scientific discoveries, which are described in a companion paper.
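A didactic sketch of the kind of aggregation involved: with pairwise additive masking, each partner's update is blinded by random masks that cancel in the sum, so only the aggregate gradient is revealed. This is a toy illustration, not the MELLODDY platform's actual cryptographic protocol.

```python
import numpy as np

def masked_updates(gradients, seed=0):
    """Add pairwise masks that cancel in the sum of all partners' gradients."""
    rng = np.random.default_rng(seed)
    k, d = len(gradients), len(gradients[0])
    masked = [np.asarray(g, dtype=float).copy() for g in gradients]
    for i in range(k):
        for j in range(i + 1, k):
            mask = rng.normal(size=d)  # shared secret between partners i and j
            masked[i] += mask          # partner i adds the mask ...
            masked[j] -= mask          # ... partner j subtracts it
    return masked

grads = [np.ones(4) * p for p in range(1, 4)]        # three partners' local gradients
aggregate = sum(masked_updates(grads)) / len(grads)  # equals the plain average
```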
Federated learning (FL) has been proposed as a privacy-preserving approach to distributed machine learning. A federated learning architecture consists of a central server and a number of clients that have access to private, potentially sensitive data. Clients keep their data on their local machines and only share their locally trained model parameters with a central server that manages the collaborative learning process. FL has delivered promising results in real-life scenarios such as healthcare, energy, and finance. However, when the number of participating clients is large, the overhead of managing the clients slows down learning. Thus, client selection has been introduced as a strategy to limit the number of communicating parties at every step of the process. Since the early naïve random selection of clients, several client selection methods have been proposed in the literature. Unfortunately, given that this is an emerging field, there is a lack of a taxonomy of client selection methods, making it hard to compare approaches. In this paper, we propose a taxonomy of client selection in Federated Learning that enables us to shed light on current progress in the field and identify potential areas of future research in this promising area of machine learning.
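A minimal sketch of the naive random client selection baseline mentioned above, in which only a fixed fraction of clients communicates with the server at each round; the names and the selection fraction are illustrative assumptions.

```python
import random

def select_clients(all_clients, fraction=0.1, seed=None):
    """Sample a subset of clients to participate in the current round."""
    k = max(1, int(len(all_clients) * fraction))
    return random.Random(seed).sample(all_clients, k)

# Per round: train locally only on the selected clients and aggregate
# only their model updates at the central server.
```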
This work develops new algorithms with rigorous efficiency guarantees for infinite-horizon imitation learning (IL) with linear function approximation, without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular the proximal point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid the nested policy evaluation and cost updates for online IL that appear in the prior literature. In particular, we do away with the conventional alternating updates by optimizing a single convex and smooth objective over both the cost and the Q-function. When it is solved inexactly, we relate the optimization error to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing centered at the expert policy, we also obtain an offline IL algorithm with theoretical guarantees in terms of the number of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural network function approximation.
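For readers unfamiliar with the classical tool the abstract builds on, the sketch below runs a generic proximal point iteration, x_{k+1} = argmin_x f(x) + ||x - x_k||^2 / (2*eta), on a simple convex test function; it is not the paper's imitation-learning objective over costs and Q-functions.

```python
import numpy as np
from scipy.optimize import minimize

def proximal_point(f, x0, eta=1.0, iters=20):
    """Generic PPM: repeatedly minimize f plus a quadratic proximity term."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        prox_obj = lambda z, x=x: f(z) + np.sum((z - x) ** 2) / (2 * eta)
        x = minimize(prox_obj, x).x  # solve the regularized subproblem (possibly inexactly)
    return x

f = lambda z: np.sum(z ** 4) + np.sum((z - 1.0) ** 2)  # smooth convex test function
print(proximal_point(f, x0=np.array([3.0, -2.0])))
```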
How do words change their meaning? Although semantic evolution is driven by a variety of factors, including linguistic, social, and technological ones, we find that there is one law that holds universally across five major Indo-European languages: semantic evolution is strongly subdiffusive. Using an automated pipeline of distributional semantic embeddings that controls for underlying symmetries, we show that words follow stochastic trajectories in meaning space with an anomalous diffusion exponent $\alpha = 0.45 \pm 0.05$, in contrast to diffusing particles, for which $\alpha = 1$. Randomization methods indicate that preserving temporal correlations in the direction of semantic change is necessary to recover the strongly subdiffusive behavior; however, correlations in the magnitude of change also play an important role. We further show that strong subdiffusion is a robust phenomenon with respect to choices made in the data analysis and interpretation, such as whether the best-fit exponent is obtained from the average displacement or by averaging the exponents of individual word trajectories.
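A hedged sketch of how an anomalous diffusion exponent such as the reported $\alpha \approx 0.45$ can be estimated: compute the mean squared displacement (MSD) of embedding trajectories over time lags and fit the log-log slope. The synthetic Brownian trajectories below are placeholders, not the paper's data or pipeline.

```python
import numpy as np

def diffusion_exponent(trajectories):
    """trajectories: array (n_words, n_timesteps, dim) of positions in meaning space."""
    n_words, n_t, _ = trajectories.shape
    lags = np.arange(1, n_t)
    msd = np.array([
        np.mean(np.sum((trajectories[:, lag:] - trajectories[:, :-lag]) ** 2, axis=-1))
        for lag in lags
    ])
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)  # MSD ~ lag ** alpha
    return slope

rng = np.random.default_rng(0)
brownian = np.cumsum(rng.normal(size=(200, 50, 10)), axis=1)  # ordinary diffusion
print(diffusion_exponent(brownian))  # close to 1, unlike the subdiffusive alpha = 0.45
```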
The convergence of the 5G architecture and deep learning has gained much research interest in both the wireless communication and artificial intelligence fields. This is because deep learning techniques have been identified as potential drivers of the 5G technologies that make up the 5G architecture. Hence, extensive surveys have been conducted on this convergence. However, most existing survey papers mainly focus on how deep learning converges with one specific 5G technology and therefore do not cover the full scope of the 5G architecture. Although a recent survey paper appears to be robust, a review of that paper shows that it is not well structured to specifically cover the convergence of deep learning with the 5G technologies. This paper therefore provides an overview of the convergence of the key 5G technologies and deep learning, and discusses the challenges faced by this convergence. In addition, a brief overview of the future 6G architecture, and how it could converge with deep learning, is also provided.
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision-making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (the behavior policy). The learner faces an additional challenge in this setting: because of the limited observations outside of its control, it may not be able to estimate the value of each policy equally well. To address this issue, we propose a set of algorithms that guarantee bounds scaling with a natural notion of mismatch between any comparator policy and the behavior policy, thereby achieving improved performance against comparators that are well covered by the observations. We also provide an extension to the setting of adversarial linear contextual bandits, and validate the theoretical guarantees with a set of experiments. Our key algorithmic idea is to adapt the notion of pessimistic reward estimators that has recently become popular in the context of off-policy reinforcement learning.
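To make the last idea concrete, the toy estimator below importance-weights observed rewards by the target-to-behavior probability ratio and subtracts a coverage-dependent confidence width; both the estimator form and the width are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def pessimistic_value(rewards, actions, behavior_probs, target_probs, beta=1.0):
    """Lower-bound style estimate of a target policy's value from off-policy data."""
    ratios = target_probs[actions] / behavior_probs[actions]     # importance weights
    iw_estimate = np.mean(ratios * rewards)                      # standard IW estimate
    width = beta * np.sqrt(np.mean(ratios ** 2) / len(rewards))  # larger when coverage is poor
    return iw_estimate - width                                   # pessimism: subtract the width
```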
The lack of freely available (real-life or synthetic) high- or ultra-high-dimensional, multi-class datasets may hamper the rapidly growing research on feature screening, especially in the field of biometrics, where the use of such datasets is common. This paper reports a Python package called BiometricBlender, an ultra-high-dimensional, multi-class synthetic data generator for benchmarking a wide range of feature screening methods. During data generation, the user can control the overall usefulness and the intercorrelations of the blended features, hence the synthetic feature space is able to mimic the key properties of a real biometric dataset.
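BiometricBlender's own API is not reproduced here; as a stand-in, scikit-learn's make_classification sketches the general idea of a synthetic, high-dimensional, multi-class dataset with a controllable share of informative versus redundant features, which is the kind of feature-screening benchmark the package targets.

```python
from sklearn.datasets import make_classification

# Illustrative high-dimensional, multi-class synthetic dataset (not BiometricBlender).
X, y = make_classification(
    n_samples=2000,
    n_features=5000,       # ultra-high-dimensional feature space
    n_informative=50,      # only a small fraction of features carry signal
    n_redundant=200,       # correlated mixtures of the informative ones
    n_classes=20,
    n_clusters_per_class=1,
    random_state=0,
)
```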
The focus of this paper is functional output regression (FOR) with convoluted losses. While most existing works consider the squared loss setting, we leverage extensions of the Huber and the $\epsilon$-insensitive losses (induced by infimal convolution) and propose a flexible framework capable of handling various forms of outliers and sparsity in the FOR family. We derive computationally tractable algorithms relying on duality to tackle the resulting tasks in vector-valued reproducing kernel Hilbert spaces. The efficiency of the approach is demonstrated and contrasted with the classical squared loss setting on both synthetic and real-world benchmarks.
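For reference, the two robust losses the framework extends are shown below in their standard scalar forms; the paper's functional, infimal-convolution-based versions in vector-valued RKHSs are beyond this sketch.

```python
import numpy as np

def huber(r, delta=1.0):
    """Quadratic near zero, linear in the tails: robust to outliers."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def eps_insensitive(r, eps=0.1):
    """Zero inside the eps-tube, linear outside: induces sparsity."""
    return np.maximum(np.abs(r) - eps, 0.0)
```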