Vision transformers have emerged as powerful tools for many computer vision tasks. It has been shown that their features and class tokens can be used for salient object segmentation. However, the properties of segmentation transformers remain largely unstudied. In this work we conduct an in-depth study of the spatial attentions of different backbone layers of semantic segmentation transformers and uncover interesting properties. The spatial attentions of a patch intersecting with an object tend to concentrate within that object, whereas the attentions of larger, more uniform image areas instead exhibit a diffusive behavior. In other words, vision transformers trained to segment a fixed set of object classes generalize to objects well beyond this set. We exploit this by extracting heatmaps that can be used to segment unknown objects within diverse backgrounds, such as obstacles in traffic scenes. Our method is training-free and its computational overhead is negligible. We use off-the-shelf transformers trained for street-scene segmentation to process other scene types.
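As an illustration of how such heatmaps might be extracted, the following minimal sketch aggregates the softmaxed self-attention of one backbone layer into a per-patch objectness map. Using attention entropy as the concentration measure is our illustrative choice here, not necessarily the exact scoring the paper uses.

```python
import numpy as np

def unknown_object_heatmap(attn, grid_hw):
    """Aggregate patch-to-patch self-attention into a spatial heatmap.

    attn    : array of shape (num_heads, N, N), softmaxed self-attention of
              one backbone layer over N = H*W patch tokens (class token
              assumed already stripped).
    grid_hw : (H, W) patch grid of the backbone.

    Idea from the abstract: patches on objects attend in a concentrated
    way (low entropy), uniform background attends diffusively (high
    entropy), so low entropy is read as high objectness.
    """
    h, w = grid_hw
    a = attn.mean(axis=0)                        # average over heads -> (N, N)
    a = a / a.sum(axis=-1, keepdims=True)        # renormalize rows
    entropy = -(a * np.log(a + 1e-12)).sum(-1)   # per-patch attention entropy
    score = entropy.max() - entropy              # low entropy -> high score
    return score.reshape(h, w)
```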
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks such as object detection, where labels are difficult and expensive to acquire. The development of active learning methods in such fields is highly computationally expensive and time-consuming, which obstructs the progress of research and leads to a lack of comparability between methods. In this work, we propose and investigate a sandbox setup for the rapid development and transparent evaluation of active learning in deep object detection. Our experiments with commonly used configurations of datasets and detection architectures found in the literature show that results obtained in our sandbox environment are representative of results on standard configurations. The total compute time to obtain results and assess the learning behavior can thereby be reduced by factors of up to 14 compared with Pascal VOC and up to 32 compared with BDD100k. This allows for testing and evaluating data acquisition and labeling strategies in under half a day and contributes to transparency and development speed in the field of active learning for object detection.
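A minimal sketch of the pool-based acquisition loop that such a sandbox has to run repeatedly is shown below; `train` and `score` are user-injected placeholders (assumptions, not the paper's API), which is precisely what makes swapping datasets, detectors, and query strategies cheap.

```python
from typing import Callable, List, Set

def active_learning_loop(
    pool: List[int],                       # indices of unlabeled images
    seed: Set[int],                        # initially labeled indices
    train: Callable[[Set[int]], object],   # trains a detector on the labeled set
    score: Callable[[object, int], float], # acquisition score, e.g. uncertainty
    budget_per_round: int,
    rounds: int,
) -> Set[int]:
    """Generic pool-based active learning loop.

    Every component is injected, so the same loop can be evaluated on a
    small sandbox configuration or on a full-scale benchmark.
    """
    labeled = set(seed)
    for _ in range(rounds):
        model = train(labeled)
        candidates = [i for i in pool if i not in labeled]
        candidates.sort(key=lambda i: score(model, i), reverse=True)
        labeled.update(candidates[:budget_per_round])  # query top-scoring images
    return labeled
```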
Current state-of-the-art deep neural networks for image classification comprise 10 to 100 million learnable weights and are therefore inherently prone to overfitting. The weight count can be seen as a function of the number of channels, the spatial extent of the input, and the number of layers of the network. Due to the use of convolutional layers, the weight complexity usually scales linearly with the resolution dimensions but remains quadratic in the number of channels. Active research in recent years on multigrid-inspired ideas in deep neural networks has shown that, on the one hand, a significant number of weights can be saved by appropriate weight sharing and, on the other hand, that a hierarchical structure in the channel dimension can improve the weight complexity to linear. In this work, we combine these multigrid ideas into a joint framework of multigrid-inspired architectures that exploit multigrid structures in all relevant dimensions to achieve linear weight-complexity scaling and drastically reduced weight counts. Our experiments show that this structured reduction in weight count reduces overfitting and thus improves performance over state-of-the-art ResNet architectures on typical image classification benchmarks at lower network complexity.
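The complexity argument can be made concrete with a few lines of counting. The fixed-group structure below only illustrates the linear-versus-quadratic scaling in the channel dimension; it is not the paper's actual architecture, which couples the groups hierarchically.

```python
def dense_conv_weights(c, k=3):
    # Standard convolution: every output channel sees every input channel,
    # so the weight count is quadratic in the channel count c.
    return c * c * k * k

def block_conv_weights(c, k=3, group_size=16):
    # Structured sparsity in the channel dimension: channels interact only
    # within fixed-size groups (cheap inter-group coupling omitted here),
    # so the weight count grows linearly in c for fixed group_size.
    groups = c // group_size
    return groups * group_size * group_size * k * k

for c in (64, 128, 256, 512):
    print(c, dense_conv_weights(c), block_conv_weights(c))
```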
Domain adaptation has attracted great interest, since labeling is a costly and error-prone task, especially when labels are required on pixel level as in semantic segmentation. Therefore, one would like to be able to train neural networks on synthetic domains, where data is abundant and labels are precise. However, these models often perform poorly on out-of-domain images. To mitigate the shift in the inputs, image-to-image approaches can be used. Nevertheless, standard image-to-image approaches that bridge a deployment domain and the synthetic training domain do not focus on the downstream task but only on the level of visual inspection. We therefore propose a "task-aware" version of a GAN as part of an image-to-image domain adaptation approach. With the help of a small amount of labeled ground-truth data, we guide the image-to-image translation towards input images that are more suitable for a semantic segmentation network trained on synthetic data (the synthetic-domain expert). The main contributions of this work are: 1) a modular semi-supervised domain adaptation method that trains a downstream-task-aware CycleGAN while refraining from adapting the synthetic semantic segmentation expert, 2) the applicability of the method to complex domain adaptation tasks, and 3) a less biased domain-gap analysis using from-scratch networks. We evaluate our method on a classification task as well as on semantic segmentation. Our experiments show that, on the classification task, our method outperforms standard image-to-image approaches by 7% in accuracy while using only 70 (10%) ground-truth images. For semantic segmentation, we achieve an improvement of about 4% to 7% in mean intersection over union on the evaluation dataset while using only 14 ground-truth images during training.
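A sketch of what "task awareness" could look like in the generator objective follows, assuming a frozen synthetic-domain segmentation expert and a non-saturating GAN loss; cycle-consistency terms and the exact weighting are omitted and the names are placeholders.

```python
import torch
import torch.nn.functional as F

def task_aware_generator_loss(G, D, seg_expert, x_real, x_lab, y_lab, lam=1.0):
    """One generator update of a task-aware image-to-image GAN (sketch).

    G            : real -> synthetic-like generator being trained
    D            : discriminator on the synthetic-like target domain (logits)
    seg_expert   : segmentation network trained on synthetic data; its
                   parameters are frozen (requires_grad=False outside this
                   function), but gradients still flow through it into G
    x_real       : batch of unlabeled real images
    x_lab, y_lab : the few labeled real images and their pixel labels
    """
    # Non-saturating adversarial term: translated images should look synthetic.
    adv = -torch.log(torch.sigmoid(D(G(x_real)))).mean()
    # Task term: translated labeled images must remain segmentable
    # by the frozen synthetic-domain expert.
    task = F.cross_entropy(seg_expert(G(x_lab)), y_lab)
    return adv + lam * task
```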
In this work, we present for the first time a method for detecting label errors, i.e., errors in the pixel-wise class labels, in semantic segmentation image datasets. Annotation acquisition for semantic segmentation datasets is time-consuming and requires a large amount of human labor. In particular, the review process is time-consuming, and label errors can easily be overlooked by humans. The consequences are biased benchmarks and, in extreme cases, performance degradation of deep neural networks (DNNs) trained on such datasets. DNNs for semantic segmentation yield pixel-wise predictions, which makes the detection of label errors via uncertainty quantification a complex task. Uncertainty is particularly pronounced at the transitions between connected components of the prediction. By lifting the consideration of uncertainty to the level of predicted components, we make use of a DNN together with component-level uncertainty quantification to detect label errors. We present a principled approach to benchmark the task of label error detection by dropping labels from the Cityscapes dataset as well as from a dataset extracted from the CARLA driving simulator, where in the latter case we have the labels under control. Our experiments show that our approach is able to detect the vast majority of label errors while controlling the number of false label error detections. Furthermore, we apply our method to semantic segmentation datasets frequently used by the computer vision community and present a collection of label errors along with example statistics.
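A simplified sketch of component-level scoring: confident predictions that contradict the annotation become label-error candidates. Using the mean top-probability as the component certainty is an assumption made here for illustration; the method itself may use richer component-wise uncertainty metrics.

```python
import numpy as np
from scipy.ndimage import label as connected_components

def label_error_candidates(probs, annotation, min_score=0.9):
    """Flag connected components where a confident prediction contradicts
    the given annotation.

    probs      : (C, H, W) softmax output of a segmentation DNN
    annotation : (H, W) integer label map under review
    Returns a list of (component id, pixel mask, confidence score).
    """
    pred = probs.argmax(axis=0)
    confidence = probs.max(axis=0)            # pixel-wise top probability
    comps, n = connected_components(pred != annotation)
    candidates = []
    for k in range(1, n + 1):
        mask = comps == k
        score = float(confidence[mask].mean())  # component-wise confidence
        if score >= min_score:                  # confident disagreement
            candidates.append((k, mask, score))
    return candidates
```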
State-of-the-art deep neural networks demonstrate outstanding performance in semantic segmentation. However, their performance is tied to the domain represented by the training data. Open-world scenarios cause inaccurate predictions, which is hazardous in safety-relevant applications such as automated driving. In this work, we enhance semantic segmentation predictions using monocular depth estimation to improve segmentation by reducing the occurrence of non-detected objects in the presence of domain shift. To this end, we infer a depth heatmap via a modified segmentation network that generates foreground-background masks and runs in parallel to the given semantic segmentation network. Both segmentation masks are aggregated with a focus on foreground classes (here: road users) to reduce false negatives. To also reduce the occurrence of false positives, we apply pruning based on uncertainty estimates. Our approach is modular in the sense that it post-processes the output of any semantic segmentation network. In our experiments, we observe fewer non-detected objects for most of the important classes compared to the basic semantic segmentation prediction, as well as enhanced generalization to other domains.
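The aggregation and pruning steps could look roughly as follows; the union-then-prune rule below is a plausible reading of the abstract, not the exact procedure, and the threshold name is a placeholder.

```python
def fuse_masks(seg_fg, depth_fg, uncertainty, tau=0.5):
    """Aggregate the segmentation's foreground mask with the depth-derived
    foreground-background mask, then prune uncertain pixels.

    seg_fg, depth_fg : (H, W) boolean foreground masks
    uncertainty      : (H, W) pixel-wise uncertainty estimate in [0, 1]
    """
    fused = seg_fg | depth_fg       # union favors recall on road users
                                    # (fewer false negatives)
    fused &= uncertainty <= tau     # uncertainty-based pruning limits the
                                    # false positives the union introduces
    return fused
```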
Deep neural networks (DNNs) have achieved impressive progress in interpreting image data, so that it is conceivable, to some extent, to use them in safety-critical applications such as automated driving. From an ethical standpoint, the AI algorithm should take into account the vulnerability of objects or subjects on the street, ranging from "not at all", e.g. the road itself, to "highly vulnerable" for pedestrians. One way to take this into account is to define the cost of confusing one semantic category with another and to use a cost-based decision rule for the interpretation of the probabilities, i.e., the output of the DNN. However, it is an open problem how the cost structure should be defined and who should be in charge of doing so, thereby defining what the AI algorithms will actually "see". As one possible answer, we follow a participatory approach and set up an online survey asking the public to define the cost structure. We present the survey design and the acquired data along with an evaluation that also distinguishes between perspective (car passenger vs. external traffic participant) and gender. Using simulation-based $F$-tests, we find highly significant differences between the two groups. These differences have consequences for reliable detection within a safety-critical distance to a self-driving car. We discuss the ethical issues connected with this approach and, from the perspective of psychology, the questions on human-machine interaction that arise from the survey. Finally, we include a comment from industry leaders in the field of AI safety on the applicability of survey-based elements in the design of AI functionalities for automated driving.
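The cost-based decision rule mentioned here is standard Bayes decision theory and can be stated compactly: instead of taking the argmax over class probabilities, one picks the class with the lowest expected confusion cost. A minimal sketch:

```python
import numpy as np

def cost_based_decision(probs, cost):
    """Cost-based decision rule replacing the usual argmax.

    probs : (C,) or (C, H, W) class probabilities p(j|x) from the DNN
    cost  : (C, C) matrix, cost[i, j] = cost of deciding class i when the
            true class is j (e.g. mistaking a pedestrian for road is
            expensive, the converse confusion is cheap)

    Returns the class minimizing the expected confusion cost; with a 0-1
    cost matrix this reduces to the standard argmax rule, so the
    vulnerability-aware behavior enters purely through the off-diagonal
    costs, e.g. those the survey participants define.
    """
    expected = np.tensordot(cost, probs, axes=([1], [0]))  # (C,) or (C, H, W)
    return expected.argmin(axis=0)
```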
We present an approach to quantifying both aleatoric and epistemic uncertainty for deep neural networks in image classification, based on generative adversarial networks (GANs). While most works in the literature that use GANs to generate out-of-distribution (OoD) examples only focus on the evaluation of OoD detection, we present a GAN-based approach to learn a classifier that produces proper uncertainties for OoD examples as well as for false positives (FPs). Instead of shielding the entire in-distribution data with GAN-generated OoD examples, as is state of the art, we shield each class separately with out-of-class examples generated by a conditional GAN and complement this with a one-vs-all image classifier. In our experiments, in particular on CIFAR10, CIFAR100 and Tiny ImageNet, we improve over the OoD detection and FP detection performance of state-of-the-art GAN-training-based classifiers. Furthermore, we find that the generated GAN examples do not significantly affect the calibration error of our classifier and result in a significant gain in model accuracy.
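A sketch of how a single one-vs-all head could be shielded with out-of-class GAN samples; the binary cross-entropy formulation and variable names are assumptions about the exact objective, intended only to make the class-wise shielding idea concrete.

```python
import torch
import torch.nn.functional as F

def one_vs_all_shielding_loss(logits_real, y, logits_ooc, k):
    """Loss for the k-th one-vs-all head (illustrative, not the paper's
    exact objective).

    logits_real : (B,) head-k logits on real images with labels y
    y           : (B,) integer class labels of the real images
    logits_ooc  : (B,) head-k logits on conditional-GAN samples generated
                  "out of class k" -- they shield class k individually
                  instead of shielding the whole in-distribution region
    """
    is_k = (y == k).float()
    real = F.binary_cross_entropy_with_logits(logits_real, is_k)
    ooc = F.binary_cross_entropy_with_logits(
        logits_ooc, torch.zeros_like(logits_ooc))  # GAN samples are negatives
    return real + ooc
```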
For the semantic segmentation of images, state-of-the-art deep neural networks (DNNs) achieve high segmentation accuracy if the task is restricted to a closed set of classes. However, as of now, DNNs have limited ability to operate in an open world, where they are tasked to identify pixels belonging to unknown objects and eventually to learn novel classes incrementally. Humans have the ability to say: "I don't know what that is, but I've already seen something like it." Therefore, it is desirable to perform such incremental learning tasks in an unsupervised fashion. We introduce a method that clusters unknown objects by visual similarity. Those clusters are used to define new classes and serve as training data for unsupervised incremental learning. More precisely, the connected components of a predicted semantic segmentation are assessed by a segmentation quality estimate. Connected components with a low estimated prediction quality are candidates for subsequent clustering. Additionally, the component-wise quality assessment allows for obtaining predicted segmentation masks for image regions that potentially contain unknown objects. The respective pixels of such masks are pseudo-labeled and afterwards used for retraining the DNN, i.e., without using ground truth generated by humans. In our experiments we demonstrate that, without access to ground truth and even with few data, a DNN's class space can be extended by a novel class, achieving considerable segmentation accuracy.
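A minimal sketch of the clustering and pseudo-labeling step, assuming some embedding network has already mapped each low-quality connected component to a feature vector; DBSCAN is one possible clustering choice made here for illustration, not necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_unknown_components(embeddings, eps=0.5, min_samples=5):
    """Group components that likely show unknown objects by visual
    similarity; embeddings is (N, D), one feature vector per low-quality
    connected component. Returns cluster ids, -1 marking noise."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)

def pseudo_labels(cluster_ids, num_known_classes):
    """Append each cluster to the class space as a novel class id for
    retraining; DBSCAN noise (-1) stays unlabeled."""
    ids = np.asarray(cluster_ids)
    out = np.full_like(ids, -1)
    valid = ids >= 0
    out[valid] = num_known_classes + ids[valid]
    return out
```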
While automated driving is often advertised with better-than-human driving performance, this work reviews that it is nearly impossible to provide direct statistical evidence on the system level that this is actually the case. The amount of labeled data required would exceed the dimensions of present-day technological and economical capabilities. A commonly used strategy therefore is the use of redundancy along with the proof of sufficient subsystem performance. As is well known, this strategy is particularly efficient for subsystems that operate independently, i.e., whose errors occur statistically independently. Here, we provide some first considerations and experimental evidence that this strategy is not a free ride, as the errors of neural networks fulfilling the same computer vision task, at least in some cases, show correlated occurrences. This remains true if training data, architecture, and training are kept separate, or if independence is encouraged during training using special loss functions. In our experiments, using data from different sensors (realized via up to five projections of the MNIST dataset) reduces correlations more effectively, however not to a degree that realizes the potential one would obtain, as measured on test data, for redundant and statistically independent subsystems.
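The independence argument is easy to test empirically: compare the observed joint error rate of two subsystems with the product of their marginal error rates, e.g., as in the following sketch.

```python
import numpy as np

def error_dependence(err_a, err_b):
    """Quantify how far two subsystems are from statistical independence.

    err_a, err_b : 1-D boolean arrays, True where each model errs on the
                   same test samples.
    Returns (joint error rate, product of marginal error rates, Pearson
    correlation of the error indicators). Under independence the first
    two coincide; joint >> product means redundancy buys less safety
    than the independence assumption suggests.
    """
    a = err_a.astype(float)
    b = err_b.astype(float)
    joint = float((a * b).mean())
    product = float(a.mean() * b.mean())
    corr = float(np.corrcoef(a, b)[0, 1])
    return joint, product, corr
```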