Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally view objects and scenes through the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion. Our code can be found at \url{https://github.com/shlokk/HCL/tree/main/HCL}.
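As a rough illustration of the hyperbolic objective described above, the sketch below (a minimal simplification, not the paper's implementation; the function names and the InfoNCE-style formulation are our assumptions) computes distances in the Poincaré ball and uses negated distances as contrastive logits, so a scene embedding is pulled toward one of its constituent objects and pushed away from objects of other scenes:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

def hyperbolic_contrastive_loss(scene, pos_object, neg_objects, tau=1.0):
    """InfoNCE-style loss using negated hyperbolic distances as logits:
    small when the scene is much closer to its own object than to negatives."""
    logits = [-poincare_distance(scene, pos_object) / tau]
    logits += [-poincare_distance(scene, o) / tau for o in neg_objects]
    m = max(logits)  # stabilized log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

Because distances blow up near the ball's boundary, optimizing this loss also shapes the norms of the embeddings, which is how the scene-object hypernymy discussed above can be encoded.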
Semi-supervised anomaly detection is a common problem, as datasets containing anomalies are often partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that is not limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, this assumption is often violated in practice: for example, the labeled data may contain only anomalies, unlike the unlabeled data; the unlabeled data may contain different types of anomalies; or the labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling under distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial when labeled data are limited. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
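The ensemble pseudo-labeling idea can be sketched as follows, with a toy centroid-plus-radius one-class classifier standing in for SPADE's actual OCC models; all names, the bootstrap-style subsetting, and the unanimous-agreement rule are illustrative assumptions, not the paper's design:

```python
import random
import statistics

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit_occ(samples):
    """Toy one-class classifier: the centroid of a training subset plus the
    radius that covers every sample in that subset."""
    dim = len(samples[0])
    centroid = [statistics.mean(s[i] for s in samples) for i in range(dim)]
    radius = max(euclid(s, centroid) for s in samples)
    return centroid, radius

def pseudo_label(unlabeled, labeled_normal, n_models=5, seed=0):
    """Ensemble pseudo-labeling: each OCC votes anomaly/normal; a sample
    receives a pseudo-label only when all members agree, and samples the
    ensemble disagrees on are left unlabeled."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = rng.sample(labeled_normal, k=max(2, len(labeled_normal) // 2))
        models.append(fit_occ(subset))
    labels = []
    for x in unlabeled:
        anomaly_votes = sum(euclid(x, c) > r for c, r in models)
        if anomaly_votes == n_models:
            labels.append((x, 1))   # confident anomaly
        elif anomaly_votes == 0:
            labels.append((x, 0))   # confident normal
        # otherwise: no pseudo-label assigned
    return labels
```

Requiring agreement across members trained on different subsets is what makes the pseudo-labels more robust when the labeled and unlabeled distributions do not match.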
We introduce anomaly clustering, whose goal is to group data into semantically coherent clusters of anomaly types. This differs from anomaly detection, whose goal is to separate anomalies from normal data. Unlike object-centric image clustering applications, anomaly clustering is particularly challenging because anomalous patterns are subtle and local. We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods. We define a distance function between images, each represented as a bag of embeddings, as the Euclidean distance between their weighted average embeddings. The weights define the importance of the instances (i.e., patch embeddings) in the bag, and can highlight defective regions. We compute the weights in an unsupervised way, or in a semi-supervised way when labeled normal data are available. Extensive experimental studies show the effectiveness of the proposed clustering framework, together with the novel distance function, over existing multiple-instance or deep clustering frameworks. Overall, our framework achieves 0.451 and 0.674 normalized mutual information scores on the MVTec object and texture categories, improving further with a few labeled normal data (0.577, 0.669), far exceeding the baselines (0.244, 0.273) and state-of-the-art deep clustering methods (0.176, 0.277).
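The distance function described above can be sketched as follows (a minimal illustration with plain Python lists standing in for patch embeddings; the function names are ours):

```python
def weighted_mean(bag, weights):
    """Weighted average of a bag of patch embeddings (lists of floats)."""
    total = sum(weights)
    dim = len(bag[0])
    return [sum(w * p[i] for w, p in zip(weights, bag)) / total
            for i in range(dim)]

def image_distance(bag_a, weights_a, bag_b, weights_b):
    """Distance between two images, each a bag of patch embeddings: the
    Euclidean distance between their weighted average embeddings."""
    a = weighted_mean(bag_a, weights_a)
    b = weighted_mean(bag_b, weights_b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Up-weighting a defective patch amplifies the distance between a defective image and a normal one, which is why learning the weights can surface subtle, local anomaly patterns that a plain average would wash out.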
Generative adversarial networks (GANs) are a powerful family of models for learning the underlying distribution in order to generate synthetic data. Many existing studies on GANs focus on improving the realism of the generated image data, while far fewer pay attention to improving the quality of the generated data for training other classifiers, a task known as the model-compatibility problem. As a result, existing GANs often prefer to generate 'easier' synthetic data that lie far from the boundaries of the classifiers, and avoid generating near-boundary data, which are known to play an important role in training classifiers. To improve GANs on model compatibility, we propose Boundary-Calibration GANs (BCGANs), which leverage boundary information from a set of classifiers pre-trained on the original data. In particular, we introduce an auxiliary boundary-calibration loss (BC loss) into the GAN generator to match the statistics between the posterior distributions of the original data and the generated data with respect to the boundaries of the pre-trained classifiers. The BC loss is provably unbiased and can be easily coupled with different GAN variants to improve their model compatibility. Experimental results demonstrate that BCGANs not only generate images as realistic as the original GANs but also achieve superior model compatibility over the original GANs.
Conditional generative adversarial networks (cGANs) are implicit generative models that allow sampling from class-conditional distributions. Existing cGANs are based on a wide variety of different discriminator designs and training objectives. One popular design in earlier works is to include a classifier during training, with the assumption that a good classifier can help eliminate samples generated with the wrong class. However, including a classifier in a cGAN often comes with the side effect of only generating easy-to-classify samples. Recently, some representative cGANs have avoided this shortcoming and reached state-of-the-art performance without a classifier. Somehow, it remains unanswered whether the classifier can be resurrected to design better cGANs. In this work, we demonstrate that a classifier can be properly leveraged to improve cGANs. We first use the decomposition of the joint probability distribution to connect the goals of cGANs under a unified framework. This framework, together with a classic energy model for parameterizing distributions, justifies the use of classifiers for cGANs in a principled way. It explains several popular cGAN variants, such as ACGAN, ProjGAN, and ContraGAN, as special cases with different levels of approximation, which provides a unified view and brings new insights into understanding cGANs. Experimental results demonstrate that the design inspired by the proposed framework outperforms state-of-the-art cGANs on multiple benchmark datasets, especially on the most challenging ImageNet. The code is available at https://github.com/sian-chen/pytorch-ecgan.
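The decomposition referred to above can be written as the standard Bayes factorization of the joint distribution (the notation here is ours and not necessarily the paper's), which makes explicit how the classifier and the conditional generator parameterize the same joint:

```latex
p(x, y) \;=\; \underbrace{p(y \mid x)}_{\text{classifier}} \, p(x)
        \;=\; \underbrace{p(x \mid y)}_{\text{conditional generator}} \, p(y)
```

Equating the two factorizations is what allows a training objective for the conditional generator to be connected to, and regularized by, a classifier in a principled way.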
Representation learning has recently made exceptional progress due to the development of more effective contrastive learning methods. However, CNNs are prone to relying on low-level features that humans consider non-semantic. This reliance is conjectured to cause a lack of robustness to image perturbations or domain shift. In this paper, we show that, with carefully designed negative samples, contrastive learning can learn more robust representations that depend less on such features. Contrastive learning uses positive pairs that preserve semantic information while perturbing superficial features in the training images. Analogously, we propose to generate negative samples in the reverse way, where only the superfluous rather than the semantic features are preserved. We develop two methods, texture-based and patch-based augmentations, to generate such negative samples. These samples achieve better generalization, especially in out-of-domain settings. We also analyze our method and the generated texture-based samples, showing that texture features are indispensable for classifying particular ImageNet classes, especially finer-grained ones. We further show that model bias favors texture and shape features differently under different test settings. Our code, trained models, and ImageNet-Texture dataset can be found at https://github.com/songsoneige/contrastive-learning-with-non-semantic-negatiens.
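A patch-based negative sample of the kind described above can be sketched as follows (a toy version operating on a 2D grid of values; the paper's actual augmentation operates on image tensors, and the tile size and function name here are assumptions). Shuffling tiles destroys the global shape of the image while keeping its local texture statistics, which is exactly the "superfluous features only" property a negative sample should have:

```python
import random

def patch_shuffle(img, patch=2, seed=0):
    """Patch-based negative: tile the image and shuffle the tiles, destroying
    global structure (shape) while preserving local texture statistics."""
    h, w = len(img), len(img[0])
    tiles = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tiles.append([row[x:x + patch] for row in img[y:y + patch]])
    random.Random(seed).shuffle(tiles)
    # Reassemble the shuffled tiles in row-major order.
    out = [[0] * w for _ in range(h)]
    idx = 0
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tile = tiles[idx]
            idx += 1
            for dy, row in enumerate(tile):
                for dx, v in enumerate(row):
                    out[y + dy][x + dx] = v
    return out
```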
Anomaly detection (AD), separating anomalies from normal data, has many applications in domains ranging from security to healthcare. While most previous works have proven effective for cases with fully or partially labeled data, that setting is less common in practice because labels are particularly tedious to obtain for this task. In this paper, we focus on fully unsupervised AD, in which the entire training dataset, containing both normal and anomalous samples, is unlabeled. To tackle this problem effectively, we propose to improve the robustness of one-class classification trained on self-supervised representations by using a data refinement process. Our proposed data refinement approach is based on an ensemble of one-class classifiers (OCCs), each of which is trained on a subset of the training data. The representations learned via self-supervised learning are updated as the data refinement improves. We demonstrate our method on various unsupervised AD tasks with image and tabular data. With a 10% anomaly ratio on CIFAR-10 image data / a 2.5% anomaly ratio on thyroid tabular data, the proposed method outperforms the state-of-the-art one-class classifiers by 6.3 AUC and 12.5 average precision / 22.9 F1 score.
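The refinement idea can be sketched as a simple iterative filter. This is a toy stand-in: a distance-to-mean score replaces the paper's ensemble of one-class classifiers over self-supervised representations, and the function name, drop fraction, and iteration count are all illustrative assumptions:

```python
import statistics

def refine(data, drop_frac=0.1, iters=3):
    """Iterative data refinement: score each sample by its squared distance
    to the mean of the currently retained set, drop the most anomalous
    fraction, and repeat so the estimate of 'normal' sharpens each round."""
    kept = list(data)
    for _ in range(iters):
        dim = len(kept[0])
        mean = [statistics.mean(s[i] for s in kept) for i in range(dim)]
        kept.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(s, mean)))
        kept = kept[: max(1, int(len(kept) * (1 - drop_frac)))]
    return kept
```

In the full method, each refinement round would also retrain the representation on the cleaned data, so refinement and representation learning reinforce each other.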
We aim at constructing a high-performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data against data from CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of the image. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects. We improve upon previous arts by 3.1 AUC when learning representations from scratch. By transfer learning on representations pretrained on ImageNet, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches, allowing defective areas to be localized without annotations during training.
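The CutPaste augmentation itself is simple enough to sketch directly (a toy version on a 2D grid of values rather than an image tensor; the patch sizes and function name are ours). Training a classifier to distinguish original images from such pasted ones is the self-supervised pretext task described above:

```python
import random

def cutpaste(img, patch_h=2, patch_w=2, seed=0):
    """CutPaste augmentation: copy a rectangular patch of the image and
    paste it at another random location, leaving the input untouched."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    sy = rng.randrange(h - patch_h + 1)  # source corner
    sx = rng.randrange(w - patch_w + 1)
    ty = rng.randrange(h - patch_h + 1)  # target corner
    tx = rng.randrange(w - patch_w + 1)
    out = [row[:] for row in img]
    for dy in range(patch_h):
        for dx in range(patch_w):
            out[ty + dy][tx + dx] = img[sy + dy][sx + dx]
    return out
```

The pasted patch creates a local irregularity that loosely mimics a real defect, which is why representations learned from this pretext task transfer to defect detection.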
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. This domain has seen fast progress recently, at the cost of requiring more complex methods. In this paper we propose FixMatch, an algorithm that is a significant simplification of existing SSL methods. FixMatch first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels, just 4 labels per class. We carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. The code is available at https://github.com/google-research/fixmatch.
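The pseudo-label selection step can be sketched as follows (a minimal illustration of the confidence-thresholding rule; the function name and list-of-probabilities interface are assumptions, and the strong-augmentation training step is omitted):

```python
def fixmatch_pseudo_labels(weak_probs, threshold=0.95):
    """FixMatch label selection: keep a pseudo-label for an unlabeled image
    only when the class probabilities predicted on its weakly-augmented view
    are confident; low-confidence images get None and contribute no loss."""
    labels = []
    for probs in weak_probs:
        confidence = max(probs)
        labels.append(probs.index(confidence) if confidence >= threshold else None)
    return labels
```

The retained labels are then used as training targets for the model's predictions on strongly-augmented views of the same images, which is the consistency signal that drives FixMatch.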
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.