The goal of this paper is to detect objects by exploiting their interrelationships. Rather than relying on predefined and labeled graph structures, we infer a graph prior from object co-occurrence statistics. The key idea of our paper is to model object relations as a function of initial class predictions and co-occurrence priors to generate a graph representation of an image for improved classification and bounding box regression. We additionally learn the object-relation joint distribution via energy based modeling. Sampling from this distribution generates a refined graph representation of the image which in turn produces improved detection performance. Experiments on the Visual Genome and MS-COCO datasets demonstrate our method is detector agnostic, end-to-end trainable, and especially beneficial for rare object classes. What is more, we establish a consistent improvement over object detectors like DETR and Faster-RCNN, as well as state-of-the-art methods modeling object interrelationships.
translated by 谷歌翻译
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
translated by 谷歌翻译
We present a machine-learning framework to accurately characterize morphologies of Active Galactic Nucleus (AGN) host galaxies within $z<1$. We first use PSFGAN to decouple host galaxy light from the central point source, then we invoke the Galaxy Morphology Network (GaMorNet) to estimate whether the host galaxy is disk-dominated, bulge-dominated, or indeterminate. Using optical images from five bands of the HSC Wide Survey, we build models independently in three redshift bins: low $(0<z<0.25)$, medium $(0.25<z<0.5)$, and high $(0.5<z<1.0)$. By first training on a large number of simulated galaxies, then fine-tuning using far fewer classified real galaxies, our framework predicts the actual morphology for $\sim$ $60\%-70\%$ host galaxies from test sets, with a classification precision of $\sim$ $80\%-95\%$, depending on redshift bin. Specifically, our models achieve disk precision of $96\%/82\%/79\%$ and bulge precision of $90\%/90\%/80\%$ (for the 3 redshift bins), at thresholds corresponding to indeterminate fractions of $30\%/43\%/42\%$. The classification precision of our models has a noticeable dependency on host galaxy radius and magnitude. No strong dependency is observed on contrast ratio. Comparing classifications of real AGNs, our models agree well with traditional 2D fitting with GALFIT. The PSFGAN+GaMorNet framework does not depend on the choice of fitting functions or galaxy-related input parameters, runs orders of magnitude faster than GALFIT, and is easily generalizable via transfer learning, making it an ideal tool for studying AGN host galaxy morphology in forthcoming large imaging survey.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness to health implications for minority populations stemming from historical redlining practices; and studies have found varying quality and coverage in the collection and sharing of open-source geospatial data. Despite the extensive literature on machine learning (ML) fairness, few algorithmic strategies have been proposed to mitigate such biases. In this paper we highlight the unique challenges for quantifying and addressing spatio-temporal biases, through the lens of use cases presented in the scientific literature and media. We envision a roadmap of ML strategies that need to be developed or adapted to quantify and overcome these challenges -- including transfer learning, active learning, and reinforcement learning techniques. Further, we discuss the potential role of ML in providing guidance to policy makers on issues related to spatial fairness.
translated by 谷歌翻译
Annotation of multimedia data by humans is time-consuming and costly, while reliable automatic generation of semantic metadata is a major challenge. We propose a framework to extract semantic metadata from automatically generated video captions. As metadata, we consider entities, the entities' properties, relations between entities, and the video category. We employ two state-of-the-art dense video captioning models with masked transformer (MT) and parallel decoding (PVDC) to generate captions for videos of the ActivityNet Captions dataset. Our experiments show that it is possible to extract entities, their properties, relations between entities, and the video category from the generated captions. We observe that the quality of the extracted information is mainly influenced by the quality of the event localization in the video as well as the performance of the event caption generation.
translated by 谷歌翻译
With the rising adoption of Machine Learning across the domains like banking, pharmaceutical, ed-tech, etc, it has become utmost important to adopt responsible AI methods to ensure models are not unfairly discriminating against any group. Given the lack of clean training data, generative adversarial techniques are preferred to generate synthetic data with several state-of-the-art architectures readily available across various domains from unstructured data such as text, images to structured datasets modelling fraud detection and many more. These techniques overcome several challenges such as class imbalance, limited training data, restricted access to data due to privacy issues. Existing work focusing on generating fair data either works for a certain GAN architecture or is very difficult to tune across the GANs. In this paper, we propose a pipeline to generate fairer synthetic data independent of the GAN architecture. The proposed paper utilizes a pre-processing algorithm to identify and remove bias inducing samples. In particular, we claim that while generating synthetic data most GANs amplify bias present in the training data but by removing these bias inducing samples, GANs essentially focuses more on real informative samples. Our experimental evaluation on two open-source datasets demonstrates how the proposed pipeline is generating fair data along with improved performance in some cases.
translated by 谷歌翻译
与训练数据中心的训练传统机器学习(ML)模型相反,联合学习(FL)训练ML模型,这些模型在资源受限的异质边缘设备上包含的本地数据集上。现有的FL算法旨在为所有参与的设备学习一个单一的全球模型,这对于所有参与培训的设备可能没有帮助,这是由于整个设备的数据的异质性。最近,Hanzely和Richt \'{A} Rik(2020)提出了一种新的配方,以培训个性化的FL模型,旨在平衡传统的全球模型与本地模型之间的权衡,该模型可以使用其私人数据对单个设备进行培训只要。他们得出了一种称为无环梯度下降(L2GD)的新算法,以解决该算法,并表明该算法会在需要更多个性化的情况下,可以改善沟通复杂性。在本文中,我们为其L2GD算法配备了双向压缩机制,以进一步减少本地设备和服务器之间的通信瓶颈。与FL设置中使用的其他基于压缩的算法不同,我们的压缩L2GD算法在概率通信协议上运行,在概率通信协议中,通信不会按固定的时间表进行。此外,我们的压缩L2GD算法在没有压缩的情况下保持与香草SGD相似的收敛速率。为了验证算法的效率,我们在凸和非凸问题上都进行了多种数值实验,并使用各种压缩技术。
translated by 谷歌翻译
随着图像识别中深度学习模型的快速发展和使用的增加,安全成为其在安全至关重要系统中的部署的主要关注点。由于深度学习模型的准确性和鲁棒性主要归因于训练样本的纯度,因此,深度学习体系结构通常容易受到对抗性攻击的影响。对抗性攻击通常是通过对正常图像的微妙扰动而获得的,正常图像对人类最不可感知,但可能会严重混淆最新的机器学习模型。我们提出了一个名为Apudae的框架,利用DeNoing AutoCoders(DAES)通过以自适应方式使用这些样品来纯化这些样本,从而提高了已攻击目标分类器网络的分类准确性。我们还展示了如何自适应地使用DAE,而不是直接使用它们,而是进一步提高分类精度,并且更强大,可以设计自适应攻击以欺骗它们。我们在MNIST,CIFAR-10,Imagenet数据集上展示了我们的结果,并展示了我们的框架(Apudae)如何在净化对手方面提供可比性和在大多数情况下的基线方法。我们还设计了专门设计的自适应攻击,以攻击我们的净化模型,并展示我们的防御方式如何强大。
translated by 谷歌翻译
野火越来越多地影响环境,人类健康和安全。在加利福尼亚前20名野火中,2020 - 2021年的野火比上世纪的燃烧更大。加利福尼亚的2018年野火季节造成了1485亿美元的损失。在数百万受影响的人中,由于不足的警报手段,残疾人(约占世界人口的15%)受到不成比例的影响。在该项目中,基于先进的机器学习体系结构开发了多模式野火预测和个性化预警系统。从2012年到2018年的环境保护局和历史野火数据的传感器数据已编译,以建立一个全面的野火数据库,即同类最大的数据库。接下来,设计了一种新型的U-Convolutional-LSTM(长短期记忆)神经网络,设计了一种特殊的体系结构,可从连续的环境参数中提取关键的空间和时间特征,以指示即将来临的野火。环境和气象因素被纳入数据库,并分类为主要指标和落后指标,分别与野火构想和传播的风险相关。此外,地质数据还用于提供更好的野火风险评估。这种新颖的时空神经网络使用传统的卷积神经网络实现了> 97%的精度,而左右的卷积神经网络则达到了约76%,成功地预测了2018年2018年最具破坏性的野火,提前5-14天提前5-14天。最后,提出了一种个性化的预警系统,该警告系统针对有感觉障碍或呼吸系统加剧条件的人量身定制。该技术将使消防部门在袭击之前预测和防止野火,并为处于危险中的个人提供早期警告以更好地准备,从而挽救生命并减少经济损失。
translated by 谷歌翻译