Smart Paper Notes

Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning

Ajinkya Tejankar , Soroush Abbasi Koohpayegani , KL Navaneet , Kossar Pourahmadi , Akshayvarun Subramanya , Hamed Pirsiavash

Category: Computer Vision

2021-12-08

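We are interested in representation learning in self-supervised, supervised, or semi-supervised settings. Prior work that applies the mean-shift idea to self-supervised learning, MSF, generalizes the BYOL idea by pulling a query image closer not only to its other augmentation but also to the nearest neighbors (NNs) of that other augmentation. We believe learning can benefit from choosing neighbors that are far away yet still semantically related to the query. Hence, we propose to generalize the MSF algorithm by constraining the search space for nearest neighbors. We show that our method outperforms MSF in the SSL setting when the constraint uses a different augmentation of an image, and outperforms PAWS in the semi-supervised setting with fewer training resources when the constraint ensures that the NNs have the same pseudo-label as the query.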
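
The constrained neighbor search described above can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the function names, the boolean `allowed_mask` (here built from hypothetical pseudo-labels), and the toy sizes are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def constrained_top_k(target_emb, bank, allowed_mask, k):
    """Return indices of the k bank entries most similar to target_emb,
    searching only entries where allowed_mask is True."""
    if not allowed_mask.any():
        raise ValueError("the constraint excludes every bank entry")
    sims = bank @ target_emb                      # cosine similarity for unit vectors
    sims = np.where(allowed_mask, sims, -np.inf)  # disallowed entries can never win
    k = min(k, int(allowed_mask.sum()))
    return np.argsort(-sims)[:k]


def mean_shift_loss(query_emb, bank, neighbor_idx):
    """Mean squared Euclidean distance between the query and its chosen neighbors."""
    diffs = bank[neighbor_idx] - query_emb
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


# Toy usage with random data (hypothetical sizes and pseudo-labels).
rng = np.random.default_rng(0)
bank = l2_normalize(rng.normal(size=(16, 8)))
pseudo_labels = np.arange(16) % 4
target = l2_normalize(rng.normal(size=8))
query = l2_normalize(rng.normal(size=8))
allowed = pseudo_labels == 2          # constraint: same pseudo-label as the query
neighbors = constrained_top_k(target, bank, allowed, k=3)
loss = mean_shift_loss(query, bank, neighbors)
```

With an all-True mask the search reduces to an unconstrained top-k, which is the sense in which a constraint generalizes the plain nearest-neighbor step.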

translated by Google Translate

Delving into Inter-Image Invariance for Unsupervised Visual Representations

Jiahao Xie , Xiaohang Zhan , Ziwei Liu , Yew Soon Ong , Chen Change Loy

Category: Computer Vision | Machine Learning

2020-08-26

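Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra-image and inter-image invariance learning. Through carefully designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup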

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Mathilde Caron , Ishan Misra , Julien Mairal , Priya Goyal , Piotr Bojanowski , Armand Joulin

Category:

2020-06-17

Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a "swapped" prediction mechanism where we predict the code of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
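
The "swapped" prediction idea above can be illustrated with a small NumPy sketch. It assumes the cluster codes for each view are already given as rows that sum to one (the step that computes them online is not shown), and the function names and the temperature value are illustrative choices, not taken from the abstract.

```python
import numpy as np


def log_softmax(scores, temperature):
    """Row-wise log-softmax of scores / temperature, computed stably."""
    z = scores / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def swapped_prediction_loss(scores_a, scores_b, codes_a, codes_b, temperature=0.1):
    """Cross-entropy where each view predicts the code of the other view.

    scores_*: (batch, num_prototypes) similarities between features and prototypes.
    codes_*:  (batch, num_prototypes) target assignments, each row summing to 1.
    """
    log_p_a = log_softmax(scores_a, temperature)
    log_p_b = log_softmax(scores_b, temperature)
    loss_a_predicts_b = -(codes_b * log_p_a).sum(axis=1).mean()
    loss_b_predicts_a = -(codes_a * log_p_b).sum(axis=1).mean()
    return float(loss_a_predicts_b + loss_b_predicts_a)


# Toy usage: two samples, three prototypes (hypothetical values).
scores = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
matching_codes = np.eye(3)[[0, 1]]
mismatched_codes = np.eye(3)[[1, 2]]
low = swapped_prediction_loss(scores, scores, matching_codes, matching_codes)
high = swapped_prediction_loss(scores, scores, mismatched_codes, mismatched_codes)
```

The loss is small when the prediction from one view agrees with the code of the other view, and large when they disagree, so no explicit pairwise feature comparison is needed.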

Emerging Properties in Self-Supervised Vision Transformers

Mathilde Caron , Hugo Touvron , Ishan Misra , Hervé Jégou , Julien Mairal , Piotr Bojanowski , Armand Joulin

Category:

2021-04-29

Bootstrap your own latent: A new approach to self-supervised Learning

Jean-Bastien Grill , Florian Strub , Florent Altché , Corentin Tallec , Pierre H. Richemond , Elena Buchatskaya , Carl Doersch , Bernardo Avila Pires , Zhaohan Daniel Guo , Mohammad Gheshlaghi Azar

Category:

2020-06-13

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub. * Equal contribution; the order of first authors was randomly selected.
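
The two mechanisms named above, a slow-moving average for the target network and a prediction loss between views, can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: parameters are held in a plain dict of arrays, the decay `tau=0.99` is a placeholder value, and the loss is written as 2 minus twice the cosine similarity between the online prediction and the target projection.

```python
import numpy as np


def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter a small step toward the online parameter."""
    return {
        name: tau * target_params[name] + (1.0 - tau) * online_params[name]
        for name in target_params
    }


def byol_regression_loss(online_prediction, target_projection, eps=1e-12):
    """Mean of 2 - 2 * cosine similarity between row-wise pairs of vectors."""
    p = online_prediction / (np.linalg.norm(online_prediction, axis=1, keepdims=True) + eps)
    z = target_projection / (np.linalg.norm(target_projection, axis=1, keepdims=True) + eps)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))


# Toy usage (hypothetical parameter names and values).
target = {"w": np.zeros(3)}
online = {"w": np.full(3, 2.0)}
target = ema_update(target, online, tau=0.5)   # each entry becomes 1.0

views = np.array([[1.0, 0.0], [0.0, 2.0]])
aligned_loss = byol_regression_loss(views, views)    # close to 0
opposed_loss = byol_regression_loss(views, -views)   # close to 4
```

In a real training loop the target projection would be treated as a constant (no gradient flows through it); that detail is outside what plain NumPy can express here.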

Revisiting Unsupervised Meta-Learning via the Characteristics of Few-Shot Tasks

Han-Jia Ye , Lu Han , De-Chuan Zhan

Category: Computer Vision

2020-11-30

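Meta-learning has become a practical approach to few-shot image classification, where "a strategy to learn a classifier" is meta-learned on labeled base classes and can be applied to tasks with novel classes. We remove the requirement of base class labels and learn generalizable embeddings via Unsupervised Meta-Learning (UML). Specifically, episodes of tasks are constructed with data augmentations from unlabeled base classes during meta-training, and we apply embedding-based classifiers to novel tasks with labeled few-shot examples during meta-test. We observe that two elements play important roles in UML: the way tasks are sampled and the way similarities between instances are measured. Thus we obtain a strong baseline with two simple modifications: a sufficient sampling strategy that efficiently constructs multiple tasks per episode, together with a semi-normalized similarity. We then take advantage of the characteristics of tasks from two directions to get further improvements. First, synthesized confusing instances are incorporated to help extract more discriminative embeddings. Second, we use an additional task-specific embedding transformation as an auxiliary component during meta-training to promote the generalization ability of the pre-adapted embeddings. Experiments on few-shot learning benchmarks verify that our approaches outperform previous UML methods and achieve comparable or even better performance than their supervised variants.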

Improving the Generalization of Supervised Models

Mert Bulent Sariyildiz , Yannis Kalantidis , Karteek Alahari , Diane Larlus

Category: Computer Vision | Machine Learning

2022-06-30

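We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at that task as well as at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization and maintaining its performance on the original task. Models trained with self-supervised learning (SSL) tend to generalize better than their supervised counterparts for transfer learning; yet they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We enrich the common supervised training framework with two key components of recent SSL models: multi-scale crops for data augmentation and the use of an expendable projector head. We replace the last layer of class weights with class prototypes computed on the fly using a memory bank. We show that these three improvements lead to a more favorable trade-off between the IN1K training task and 13 transfer tasks. Over all the explored configurations, we single out two models: T-Rex, which achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and T-Rex*, which matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks. Project page and pretrained models: https://europe.naverlabs.com/t-rex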

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam , Chun-Fu Chen , Rameswar Panda , Leonid Karlinsky , Rogerio Feris , Richard J. Radke

Category: Computer Vision

2021-06-14

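Most existing works in few-shot learning rely on meta-learning the network on a large base dataset, which is typically from the same domain as the target dataset. We tackle the problem of cross-domain few-shot learning, where there is a large shift between the base and target domains. The problem of cross-domain few-shot recognition with unlabeled target data is largely unaddressed in the literature. STARTUP was the first method to tackle this problem using self-training. However, it uses a fixed teacher pretrained on a labeled base dataset to create soft labels for the unlabeled target samples. As the base dataset and the unlabeled dataset are from different domains, projecting the target images into the class domain of the base dataset with a fixed pretrained model might be sub-optimal. We propose a simple dynamic distillation-based approach to make use of unlabeled images from the novel/base dataset. We impose consistency regularization by computing predictions for weakly augmented versions of the unlabeled images with a teacher network and matching them with predictions for strongly augmented versions of the same images from a student network. The parameters of the teacher network are updated as an exponential moving average of the parameters of the student network. We show that the proposed network learns representations that can be easily adapted to the target domain even though it has not been trained with target-specific classes during the pretraining phase. Our model outperforms the current state-of-the-art method by 3.6% for 5-shot classification on the BSCD-FSL benchmark, and shows competitive performance on traditional in-domain few-shot learning tasks.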

Adaptive Soft Contrastive Learning

Chen Feng , Ioannis Patras

Category: Computer Vision

2022-07-22

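Self-supervised learning has recently achieved great success in representation learning without human annotations. The dominant method, contrastive learning, is generally based on instance discrimination tasks, i.e., individual samples are treated as independent categories. However, presuming that all samples are different contradicts the natural grouping of similar samples in common visual datasets, e.g., multiple views of the same dog. To bridge the gap, this paper proposes an adaptive method that introduces soft inter-sample relations, namely Adaptive Soft Contrastive Learning (ASCL). More specifically, ASCL transforms the original instance discrimination task into a multi-instance soft discrimination task and adaptively introduces inter-sample relations. As an effective and concise plug-in module for existing self-supervised learning frameworks, ASCL achieves the best performance on several benchmarks in terms of both performance and efficiency. Code is available at https://github.com/mrchenfeng/ascl_icpr2022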

SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation

K L Navaneet , Soroush Abbasi Koohpayegani , Ajinkya Tejankar , Hamed Pirsiavash

Category: Computer Vision

2022-01-13

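Feature regression is a simple way to distill large neural network models into smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, adding a multi-layer perceptron head to the CNN backbone is beneficial even if it is used only during distillation and discarded in the downstream task. Deeper non-linear projections can thus be used to accurately mimic the teacher without changing the inference architecture and time. Moreover, we use independent projection heads to simultaneously distill multiple teacher networks. We also find that using the same weakly augmented image as input for both the teacher and student networks aids distillation. Experiments on the ImageNet dataset demonstrate the efficacy of the proposed changes in various self-supervised distillation settings.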

HIERMATCH: Leveraging Label Hierarchies for Improving Semi-Supervised Learning

Ashima Garg , Shaurya Bagga , Yashvardhan Singh , Saket Anand

Category: Computer Vision

2021-10-30

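Semi-supervised learning approaches have emerged as an active area of research to combat the challenge of obtaining large amounts of annotated data. Towards the goal of improving the performance of semi-supervised learning methods, we propose a novel framework, HIERMATCH, a semi-supervised approach that leverages hierarchical information to reduce labeling costs and performs as well as a vanilla semi-supervised learning method. Hierarchical information is often available as prior knowledge in the form of coarse labels (e.g., woodpeckers) for images with fine-grained labels (e.g., downy woodpeckers or golden-fronted woodpeckers). However, the use of supervision from coarse category labels to improve semi-supervised techniques has not been explored. In the absence of fine-grained labels, HIERMATCH exploits the label hierarchy and uses coarse class labels as a weak supervisory signal. Additionally, HIERMATCH is a generic approach that can improve any semi-supervised learning framework; we demonstrate this with results on the recent state-of-the-art techniques MixMatch and FixMatch. We evaluate the efficacy of HIERMATCH on two benchmark datasets, namely CIFAR-100 and NABirds. Compared to MixMatch, HIERMATCH can reduce the usage of fine-grained labels by 50% on CIFAR-100 with only a marginal drop of 0.59% in top-1 accuracy. Code: https://github.com/07agarg/hiermatch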

Unsupervised Data Selection for Data-Centric Semi-Supervised Learning

Xudong Wang , Long Lian , Stella X. Yu

Category: Machine Learning | Computer Vision

2021-10-06

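We study unsupervised data selection for semi-supervised learning (SSL), where a large-scale unlabeled dataset is available and a small subset of data is budgeted for label acquisition. Existing SSL methods focus on learning a model that effectively integrates information from the given small labeled data and large unlabeled data, whereas we focus on selecting the right data to annotate for SSL, without any label or task information. Intuitively, the instances to be labeled should collectively have maximum diversity and coverage for downstream tasks, and individually have maximum information propagation utility for SSL. We formalize these concepts in a three-step data-centric SSL method that improves FixMatch in stability and accuracy by 8% on CIFAR-10 (0.08% labeled) and 14% on ImageNet-1K (0.2% labeled). It is also a universal framework that works with various SSL methods, delivering consistent performance gains. Our work demonstrates that a small amount of compute spent on careful selection of the data to annotate brings large gains in annotation efficiency and model performance without changing the learning pipeline. Our completely unsupervised data selection can be easily extended to other weakly supervised learning settings.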

Hard Negative Mixing for Contrastive Learning

Yannis Kalantidis , Mert Bulent Sariyildiz , Noe Pion , Philippe Weinzaepfel , Diane Larlus

Category:

2020-10-02

Contrastive learning has become a key component of self-supervised learning approaches for computer vision. By learning to embed two augmented versions of the same image close to each other and to push the embeddings of different images apart, one can train highly transferable visual representations. As revealed by recent studies, heavy data augmentation and large sets of negatives are both crucial in learning such representations. At the same time, data mixing strategies, either at the image or the feature level, improve both supervised and semi-supervised learning by synthesizing novel examples, forcing networks to learn more robust features. In this paper, we argue that an important aspect of contrastive learning, i.e. the effect of hard negatives, has so far been neglected. To get more meaningful negative samples, current top contrastive self-supervised learning approaches either substantially increase the batch sizes, or keep very large memory banks; increasing memory requirements, however, leads to diminishing returns in terms of performance. We therefore start by delving deeper into a top-performing framework and show evidence that harder negatives are needed to facilitate better and faster learning. Based on these observations, and motivated by the success of data mixing, we propose hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead. We exhaustively ablate our approach on linear classification, object detection, and instance segmentation and show that employing our hard negative mixing procedure improves the quality of visual representations learned by a state-of-the-art self-supervised learning method. Project page: https://europe.naverlabs.com/mochi (34th Conference on Neural Information Processing Systems, NeurIPS 2020)
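
Feature-level mixing of hard negatives can be sketched in a few lines of NumPy. This is an illustrative simplification, not the procedure from the paper: it takes the negatives most similar to the query, forms random convex combinations of pairs of them, and re-normalizes. The function names, the uniform mixing coefficient, and the toy sizes are assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def mix_hard_negatives(query, negatives, num_hard, num_synthetic, rng):
    """Create synthetic negatives as convex combinations of the hardest ones.

    query:     (dim,) unit vector.
    negatives: (n, dim) unit vectors, e.g. a memory bank or queue.
    """
    sims = negatives @ query
    hard_idx = np.argsort(-sims)[:num_hard]          # most similar = hardest
    i = rng.integers(0, num_hard, size=num_synthetic)
    j = rng.integers(0, num_hard, size=num_synthetic)
    alpha = rng.uniform(0.0, 1.0, size=(num_synthetic, 1))
    mixed = alpha * negatives[hard_idx[i]] + (1.0 - alpha) * negatives[hard_idx[j]]
    return l2_normalize(mixed)                        # back onto the unit sphere


# Toy usage with random features (hypothetical sizes).
rng = np.random.default_rng(0)
negatives = l2_normalize(rng.normal(size=(32, 8)))
query = l2_normalize(rng.normal(size=8))
synthetic = mix_hard_negatives(query, negatives, num_hard=5, num_synthetic=10, rng=rng)
```

The synthetic rows would then be appended to the real negatives when computing the contrastive loss; because only existing features are combined, no extra forward passes are required.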

Incremental False Negative Detection for Contrastive Learning

Tsai-Shien Chen , Wei-Chih Hung , Hung-Yu Tseng , Shao-Yi Chien , Ming-Hsuan Yang

Category: Computer Vision

2021-06-07

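Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship among instances and sometimes undesirably repels the anchor from semantically similar samples, termed "false negatives". In this work, we show that the unfavorable effect of false negatives is more significant for large-scale datasets with more semantic concepts. To address the issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes the false negative samples. Specifically, as training proceeds, our method dynamically detects an increasing number of high-quality false negatives, considering that the encoder gradually improves and the embedding space becomes more semantically structured. Next, we discuss two strategies to explicitly remove the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited-resource setup.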

Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations

Chaitanya K. Ryali , David J. Schwab , Ari S. Morcos

Category: Computer Vision | Artificial Intelligence

2021-03-23

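Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation, by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic relevance of parts of an image (e.g., a subject vs. a background), which can lead to the learning of spurious correlations. Our work addresses this problem by investigating a class of simple yet highly effective "background augmentations", which encourage models to focus on semantically relevant content by discouraging them from focusing on image backgrounds. Through a systematic investigation, we show that background augmentations lead to substantial improvements in performance across a spectrum of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, e.g. $\sim$+1-2% gains on ImageNet, enabling performance on par with the supervised baseline. Further, we find that the improvement in limited-label settings is even larger (up to 4.2%). Background augmentations also improve robustness to a number of distribution shifts, including natural adversarial examples, ImageNet-9, adversarial attacks, and ImageNet-Renditions. We also make progress in completely unsupervised saliency detection, in the process of generating the saliency masks used for background augmentations.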

Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment

Dong Hoon Lee , Sungik Choi , Hyunwoo Kim , Sae-Young Chung

Category: Computer Vision

2022-11-04

This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem to find pseudo-labels that maximize the mutual information between the label and data while being close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables a simple yet effective clustering-based representation learning without incorporating extra training techniques or artificial constraints such as sampling strategy, equipartition constraints, etc. With relatively small training epochs, representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including the linear/k-NN evaluation and transfer learning. Especially, with only 400 epochs, our method applied to ImageNet dataset with ResNet-50 architecture achieves 75.6% linear evaluation accuracy.

Extending Momentum Contrast with Cross Similarity Consistency Regularization

Mehdi Seyfi , Amin Banitalebi-Dehkordi , Yong Zhang

Category: Machine Learning | Artificial Intelligence | Computer Vision

2022-06-07

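Contrastive self-supervised representation learning methods maximize the similarity between positive pairs while tending to minimize the similarity between negative pairs. However, the interplay between the negative pairs is generally ignored, as these methods put no special mechanism in place to treat negative pairs differently according to their specific differences and similarities. In this paper, we present Extended Momentum Contrast (XMoCo), a self-supervised representation learning method founded upon the legacy of the momentum-encoder unit proposed in the MoCo family configurations. To this end, we introduce a cross consistency regularization loss, with which we extend transformation consistency to dissimilar images (negative pairs). Under the cross consistency regularization rule, we argue that the semantic representations associated with any pair of images (positive or negative) should preserve their cross-similarity under pretext transformations. Moreover, we further regularize the training loss by enforcing a uniform distribution of similarity over the negative pairs across a batch. The proposed regularization can easily be added to existing self-supervised learning algorithms. Empirically, we report competitive performance on the standard ImageNet-1K linear head classification benchmark. In addition, by transferring the learned representations to common downstream tasks, we show that using XMoCo with the prevalently used augmentations can improve performance on such tasks. We hope the findings of this paper motivate researchers to take into consideration the important interplay among negative examples in self-supervised learning.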

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

Sheng Zhang , Salman Khan , Zhiqiang Shen , Muzammal Naseer , Guangyi Chen , Fahad Khan

Category: Computer Vision

2022-12-11

Although existing semi-supervised learning models achieve remarkable success in learning with unannotated in-distribution data, they mostly fail to learn on unlabeled data sampled from novel semantic classes due to their closed-set assumption. In this work, we target a pragmatic but under-explored Generalized Novel Category Discovery (GNCD) setting. The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes. We propose a two-stage Contrastive Affinity Learning method with auxiliary visual Prompts, dubbed PromptCAL, to address this challenging problem. Our approach discovers reliable pairwise sample affinities to learn better semantic clustering of both known and novel classes for the class token and visual prompts. First, we propose a discriminative prompt regularization loss to reinforce semantic discriminativeness of prompt-adapted pre-trained vision transformer for refined affinity relationships. Besides, we propose a contrastive affinity learning stage to calibrate semantic representations based on our iterative semi-supervised affinity graph generation method for semantically-enhanced prompt supervision. Extensive experimental evaluation demonstrates that our PromptCAL method is more effective in discovering novel classes even with limited annotations and surpasses the current state-of-the-art on generic and fine-grained benchmarks (with nearly $11\%$ gain on CUB-200, and $9\%$ on ImageNet-100) on overall accuracy.

Exploring Non-Contrastive Representation Learning for Deep Clustering

Zhizhong Huang , Jie Chen , Junping Zhang , Hongming Shan

Category: Computer Vision

2021-11-23

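Existing deep clustering methods rely on contrastive learning for representation learning, which requires negative examples to form an embedding space where all instances are well separated. However, the negative examples inevitably give rise to the class collision issue, compromising the representation learning for clustering. In this paper, we explore non-contrastive representation learning for deep clustering, termed NCC, which is based on BYOL, a representative method without negative examples. First, we propose to align one augmented view of an instance with the neighbors of another view in the embedding space, called the positive sampling strategy, which avoids the class collision issue caused by negative examples and hence improves within-cluster compactness. Second, we propose to encourage alignment between two augmented views of one prototype and uniformity among all prototypes, named the prototypical contrastive loss or ProtoCL, which can maximize the inter-cluster distance. Moreover, we formulate NCC in an Expectation-Maximization (EM) framework, in which the E-step uses spherical k-means to estimate the pseudo-labels of instances and the distribution of prototypes from a target network, and the M-step leverages the proposed losses to optimize an online network. As a result, NCC forms an embedding space where all clusters are well separated and within-cluster examples are compact. Experimental results on several clustering benchmark datasets, including ImageNet-1K, demonstrate that NCC outperforms the state-of-the-art methods by a significant margin.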

A soft nearest-neighbor framework for continual semi-supervised learning

Zhiqi Kang , Enrico Fini , Moin Nabi , Elisa Ricci , Karteek Alahari

Category: Computer Vision | Machine Learning

2022-12-09

Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
