Smart Paper Notes

Distributive Justice as the Foundational Premise of Fair ML: Unification, Extension, and Interpretation of Group Fairness Metrics

Joachim Baumann , Corinna Hertweck , Michele Loi , Christoph Heitz

Category: Machine Learning

2022-06-06

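Group fairness metrics are an established way of assessing the fairness of prediction-based decision-making systems. However, these metrics remain insufficiently linked to philosophical theories, and their moral meaning is often unclear. We propose a general framework for analyzing the fairness of decision systems based on theories of distributive justice, encompassing different established "patterns of justice" that correspond to different normative positions. We show that the most popular group fairness metrics can be interpreted as special cases of our approach. Thus, we provide a unifying and interpretative framework for group fairness metrics that reveals the normative choices associated with each of them and allows their moral substance to be understood. At the same time, we extend the space of possible fairness metrics beyond those currently discussed in the fair ML literature. Our framework also overcomes several limitations of group fairness metrics that have been criticized in the literature, most notably (1) that they are parity-based, i.e., they demand some form of equality between groups, which may sometimes harm marginalized groups, (2) that they only compare decisions across groups, but not the resulting consequences for these groups, and (3) that the full breadth of the distributive justice literature is not sufficiently represented.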

Translated by Google Translate
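
As a rough illustration of how a group fairness metric can be read as a comparison of utility distributions, the sketch below computes an egalitarian gap and a maximin score over per-group mean utilities. It assumes, purely for illustration, that each subject's utility equals the binary decision they receive, in which case the egalitarian gap reduces to the statistical parity difference. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def group_mean_utility(utilities, groups):
    """Average utility per group label."""
    by_group = {}
    for u, g in zip(utilities, groups):
        by_group.setdefault(g, []).append(u)
    return {g: mean(us) for g, us in by_group.items()}


def egalitarian_gap(utilities, groups):
    """Largest difference in mean utility between groups (0 means parity)."""
    means = group_mean_utility(utilities, groups)
    return max(means.values()) - min(means.values())


def maximin_score(utilities, groups):
    """Mean utility of the worst-off group (higher is better under maximin)."""
    return min(group_mean_utility(utilities, groups).values())


# Toy data (assumption): utility = decision, so the gap is the
# statistical parity difference between groups "a" and "b".
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = egalitarian_gap(decisions, groups)      # 0.75 - 0.25 = 0.5
worst_off = maximin_score(decisions, groups)  # 0.25
```

Swapping in a different utility (for example, one that depends on the true outcome as well as the decision) or a different pattern of justice changes the metric without changing the overall structure.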

A Justice-Based Framework for the Analysis of Algorithmic Fairness-Utility Trade-Offs

Corinna Hertweck , Joachim Baumann , Michele Loi , Eleonora Viganò , Christoph Heitz

Category: Machine Learning

2022-06-06

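In prediction-based decision-making systems, different perspectives can be at odds: the short-term business goals of the decision makers often conflict with the wishes of the decision subjects. Balancing these two perspectives is a question of values. We provide a framework that makes these value-laden choices clearly visible. To this end, we assume that we are given a trained model and want to find decision rules that balance the perspectives of the decision maker and the decision subjects. We provide an approach to formalize both perspectives, i.e., to assess the utility of the decision maker and the fairness towards the decision subjects. In both cases, the idea is to elicit values from decision makers and decision subjects and then turn them into something measurable. For the fairness evaluation, we build on the literature on welfare-based fairness and ask what a fair distribution of utility (or welfare) looks like. In this step, we build on well-known theories of distributive justice. This allows us to derive a fairness score, which we then compare with the decision maker's utility for many different decision rules. In this way, we provide an approach for balancing the utility of the decision maker and the fairness towards the decision subjects in a prediction-based decision-making system.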
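
The comparison of a fairness score against the decision maker's utility across many decision rules can be sketched as a simple threshold sweep. Everything specific here is an assumption made for illustration: the decision rules are score thresholds, the decision maker gains `gain` for each accepted positive and loses `loss` for each accepted negative, and the fairness score is the negated gap in acceptance rates between groups. The paper elicits these quantities from stakeholders rather than fixing them as below.

```python
def evaluate_rule(threshold, scores, outcomes, groups, gain=1.0, loss=2.0):
    """Return (decision-maker utility, fairness score) for one threshold rule."""
    decisions = [int(s >= threshold) for s in scores]

    # Decision-maker utility (assumed form): gain per accepted positive,
    # loss per accepted negative, nothing for rejections.
    dm_utility = sum(
        d * (gain if y == 1 else -loss) for d, y in zip(decisions, outcomes)
    )

    # Subject utility is assumed to equal the decision, so the fairness
    # score is minus the gap in acceptance rates (0 is the best value).
    by_group = {}
    for d, g in zip(decisions, groups):
        by_group.setdefault(g, []).append(d)
    rates = [sum(v) / len(v) for v in by_group.values()]
    fairness = -(max(rates) - min(rates))
    return dm_utility, fairness


# Hypothetical model scores, true outcomes, and group labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.1]
outcomes = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4

# One (utility, fairness) pair per candidate decision rule.
tradeoff = {t: evaluate_rule(t, scores, outcomes, groups) for t in (0.25, 0.5, 0.75)}
```

On this toy data the threshold 0.75 gives the highest decision-maker utility but an acceptance-rate gap of 0.5, while 0.5 gives lower utility with no gap, which is the kind of trade-off the framework is meant to expose.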

PCCC: The Pairwise-Confidence-Constraints-Clustering Algorithm

Philipp Baumann , Dorit S. Hochbaum

Category: Machine Learning

2022-12-29

We consider a semi-supervised $k$-clustering problem where information is available on whether pairs of objects are in the same or in different clusters. This information is either available with certainty or with a limited level of confidence. We introduce the PCCC algorithm, which iteratively assigns objects to clusters while accounting for the information provided on the pairs of objects. Our algorithm can include relationships as hard constraints that are guaranteed to be satisfied or as soft constraints that can be violated subject to a penalty. This flexibility distinguishes our algorithm from the state-of-the-art in which all pairwise constraints are either considered hard, or all are considered soft. Unlike existing algorithms, our algorithm scales to large-scale instances with up to 60,000 objects, 100 clusters, and millions of cannot-link constraints (which are the most challenging constraints to incorporate). We compare the PCCC algorithm with state-of-the-art approaches in an extensive computational study. Even though the PCCC algorithm is more general than the state-of-the-art approaches in its applicability, it outperforms the state-of-the-art approaches on instances with all hard constraints or all soft constraints both in terms of running time and various metrics of solution quality. The source code of the PCCC algorithm is publicly available on GitHub.
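
To make the distinction between hard and soft pairwise constraints concrete, here is a minimal greedy assignment sketch. It is not the PCCC algorithm and does not reproduce how the authors solve the assignment step. It only illustrates the idea that a hard cannot-link constraint forbids a cluster outright, while a soft one adds a penalty to the assignment cost. Must-link constraints are omitted, and the function name, argument layout, and penalty value are all hypothetical.

```python
def assign(points, centers, hard_cl, soft_cl, penalty):
    """Greedily assign each point to the cheapest admissible cluster.

    hard_cl / soft_cl map a point index to the indices it cannot link with.
    Hard cannot-links exclude a cluster; soft ones add `penalty` per violation.
    """
    labels = {}
    for i, p in enumerate(points):
        best, best_cost = None, float("inf")
        for k, c in enumerate(centers):
            # Hard constraint: skip cluster k if a forbidden partner is already in it.
            if any(labels.get(j) == k for j in hard_cl.get(i, ())):
                continue
            cost = sum((a - b) ** 2 for a, b in zip(p, c))
            # Soft constraint: pay a penalty for each violated cannot-link.
            cost += penalty * sum(labels.get(j) == k for j in soft_cl.get(i, ()))
            if cost < best_cost:
                best, best_cost = k, cost
        if best is None:
            raise ValueError(f"no admissible cluster for point {i}")
        labels[i] = best
    return [labels[i] for i in range(len(points))]


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
pair = {0: {1}, 1: {0}}  # points 0 and 1 should not share a cluster

hard_labels = assign(points, centers, pair, {}, penalty=1.0)  # [0, 1, 1]
soft_labels = assign(points, centers, {}, pair, penalty=1.0)  # [0, 0, 1]
```

With the constraint treated as hard, point 1 is forced into the distant cluster. With the same constraint treated as soft and a small penalty, violating it is cheaper than moving the point, so it stays with its nearest center.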

Deep Learning Models for River Classification at Sub-Meter Resolutions from Multispectral and Panchromatic Commercial Satellite Imagery

Joachim Moortgat , Ziwei Li , Michael Durand , Ian Howat , Bidhyananda Yadav , Chunli Dai

Category: Computer Vision | Machine Learning

2022-12-27

Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state of the art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) developing FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct those manually. First, we use the RGB and NIR bands of the 8-band multispectral sensors. Those trained models all achieve precision and recall above 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve precision and recall of over 85%. We provide our open-source code and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.

Is it worth it? An experimental comparison of six deep- and classical machine learning methods for unsupervised anomaly detection in time series

Ferdinand Rewicki , Joachim Denzler , Julia Niebling

Category: Machine Learning | Artificial Intelligence

2022-12-21

The detection of anomalies in time series data is crucial in a wide range of applications, such as system monitoring, health care, or cyber security. The vast number of available methods already makes it hard to select the right one for a given application, and different methods also have different strengths, e.g., regarding the types of anomalies they are able to find. In this work, we compare six unsupervised anomaly detection methods of different complexity to answer two questions: Do the more complex methods usually perform better? And are there specific anomaly types that those methods are tailored to? The comparison is done on the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We compare the six methods by analyzing the experimental results at the dataset level and the anomaly-type level after tuning the necessary hyperparameters for each method. Additionally, we examine the ability of individual methods to incorporate prior knowledge about the anomalies and analyze the differences between point-wise and sequence-wise features. We show in broad experiments that the classical machine learning methods outperform the deep learning methods across a wide range of anomaly types.

Découvrir de nouvelles classes dans des données tabulaires

Colin Troisemaine , Joachim Flocon-Cholet , Stéphane Gosselin , Sandrine Vaton , Alexandre Reiffers-Masson , Vincent Lemaire

Category: Machine Learning

2022-11-28

In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, even though this is a very common data representation. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery of novel classes in tabular data that contains heterogeneous variables. Part of this process relies on a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is applicable not only to images but also to heterogeneous tabular data.

Sensor Visibility Estimation: Metrics and Methods for Systematic Performance Evaluation and Improvement

Joachim Börger , Marc Patrick Zapf , Marat Kopytjuk , Xinrun Li , Claudius Gläser

Category: Computer Vision | Robotics

2022-11-11

Sensor visibility is crucial for safety-critical applications in automotive, robotics, smart infrastructure, and other domains: complementing object detection and occupancy mapping, visibility describes where a sensor can potentially measure and where it is blind. This knowledge can enhance functional safety and perception algorithms or optimize sensor topologies. Despite its significance, to the best of our knowledge, neither a common definition of visibility nor performance metrics exist yet. We close this gap and provide a definition of visibility, derived from a use case review. We introduce metrics and a framework to assess the performance of visibility estimators. Our metrics are verified with labeled real-world and simulation data from infrastructure radars and cameras: the framework easily identifies false-visible or false-invisible estimations, which are safety-critical. Applying our metrics, we enhance the radar and camera visibility estimators by modeling the 3D elevation of the sensor and objects. This refinement outperforms the conventional planar 2D approach in trustworthiness and thus safety.

Rmagine: 3D Range Sensor Simulation in Polygonal Maps via Raytracing for Embedded Hardware on Mobile Robots

Alexander Mock , Thomas Wiemann , Joachim Hertzberg

Category: Robotics

2022-09-27

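Sensor simulation has emerged as a promising and powerful technique for solving many real-world robotic tasks, such as localization and pose tracking. However, commonly used simulators have high hardware requirements and are therefore used mostly on high-end computers. In this paper, we present an approach to simulate range sensors directly on the embedded hardware of mobile robots that use triangle meshes as environment maps. The resulting library, called Rmagine, allows a robot to simulate sensor data for arbitrary range sensors directly on board via raytracing. Since robots typically have only limited computational resources, Rmagine aims to be flexible and lightweight while still scaling well to large environment maps. It runs on multiple platforms, such as the Nvidia Jetson, by placing a unified API over the specific proprietary libraries provided by the hardware manufacturers. This work is intended to support the future development of robotic applications that depend on the simulation of range data which previously could not be computed in reasonable time on mobile systems.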

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Đorđe Miladinović , Kumar Shridhar , Kushal Jain , Max B. Paulus , Joachim M. Buhmann , Carl Allen

Category: Machine Learning

2022-09-26

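In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without using the latent space, a phenomenon known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared with uniform dropout on standard text benchmark datasets, our targeted approach improves both sequence modeling performance and the information captured in the latent space.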
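
The uniform baseline mentioned above, uniformly random dropout on the decoder input, can be sketched in a few lines. This shows only that baseline and not the adversarial, information-based strategy the paper proposes. The function name, the `<unk>` placeholder, and the example tokens are assumptions made for illustration.

```python
import random


def word_dropout(tokens, rate, unk="<unk>", rng=None):
    """Replace each decoder-input token with `unk` independently with probability `rate`."""
    rng = rng or random.Random()
    return [unk if rng.random() < rate else tok for tok in tokens]


decoder_input = ["<bos>", "the", "cat", "sat", "on", "the", "mat"]

# rate=0.0 leaves the input untouched; rate=1.0 hides every token, so the
# decoder can only rely on the latent code.
untouched = word_dropout(decoder_input, rate=0.0)
fully_masked = word_dropout(decoder_input, rate=1.0)
partly_masked = word_dropout(decoder_input, rate=0.3, rng=random.Random(0))
```

Because every position is masked with the same probability, the baseline ignores how informative each token is, which is the gap a targeted dropout strategy aims to address.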

Sequential Causal Effect Variational Autoencoder: Time Series Causal Link Estimation under Hidden Confounding

Violeta Teodora Trifunov , Maha Shadaydeh , Joachim Denzler

Category: Machine Learning

2022-09-23

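Estimating causal effects from observational data in the presence of latent variables sometimes leads to spurious relationships that can be mistaken for causal ones. This is an important issue in many fields, such as finance and climate science. We propose the Sequential Causal Effect Variational Autoencoder (SCEVAE), a novel method for time series causality analysis under hidden confounding. It is based on the CEVAE framework and recurrent neural networks. The strength of the causal links between the confounded variables is calculated using direct causal criteria based on Pearl's do-calculus. We show the efficacy of SCEVAE by applying it to synthetic datasets with both linear and nonlinear causal links. Furthermore, we apply our method to real aerosol-climate observation data. On the synthetic data, we compare our approach with a time series deconfounding method, with and without substitute confounders. We demonstrate that our method performs better by comparing both methods with the ground truth. For the real data, we use expert knowledge of the causal links and show how the use of correct proxy variables aids data reconstruction.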
