Despite significant progress in object categorization in recent years, a number of important challenges remain; chief among them, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires a separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above challenges and address the problems of supervised, zero-shot, generalized zero-shot, and open set recognition within a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. These distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to others. We show that the resulting model yields improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with vocabularies of up to 310K classes, on the Animals with Attributes and ImageNet datasets.
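As a rough illustration of the distance-constraint idea, the PyTorch sketch below penalizes any vocabulary atom that ends up closer to a projected sample than the sample's correct prototype, up to a margin; the paper's per-atom weighting and optimizer are not reproduced, and uniform weights are an assumption here.

```python
import torch
import torch.nn.functional as F

def vocab_margin_loss(embeddings, labels, prototypes, margin=0.1):
    """Hinge-style distance constraint: every projected sample should lie
    closer to its own class prototype than to any other vocabulary atom,
    by at least `margin`. Uniform weights, for simplicity."""
    d = torch.cdist(embeddings, prototypes) ** 2         # (N, V) squared distances
    d_true = d.gather(1, labels.unsqueeze(1))            # (N, 1) dist to own prototype
    viol = torch.clamp(d_true + margin - d, min=0.0)     # hinge over all atoms
    mask = F.one_hot(labels, prototypes.size(0)).bool()  # don't penalize own class
    return viol.masked_fill(mask, 0.0).mean()

# Example: 8 samples in a 64-d semantic space, a 100-atom vocabulary.
loss = vocab_margin_loss(torch.randn(8, 64),
                         torch.randint(0, 100, (8,)),
                         torch.randn(100, 64))
```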
A noisy training set usually leads to degraded generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean-sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under certain conditions. In general scenarios, however, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) in the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
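A toy NumPy sketch of the mean-shift idea behind SPR (not the paper's scalable solver or its knockoff extension): fit the one-hot labels as a linear function of features plus a row-sparse mean-shift term, and flag as clean the samples whose mean-shift row is driven to zero. The alternating scheme and the `lam` knob are our simplifications.

```python
import numpy as np

def spr_select_clean(X, Y, lam=0.5, n_iter=50):
    """Fit Y ~= X @ beta + Gamma with a row-sparse mean-shift Gamma;
    samples whose Gamma row is shrunk to zero are flagged as clean."""
    Gamma = np.zeros_like(Y)
    for _ in range(n_iter):
        # Least-squares fit of the linear part, given current mean shifts.
        beta, *_ = np.linalg.lstsq(X, Y - Gamma, rcond=None)
        R = Y - X @ beta                       # residuals absorb label noise
        # Row-wise group soft-thresholding: small residual rows -> zero shift.
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        Gamma = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
    return np.linalg.norm(Gamma, axis=1) == 0.0   # boolean clean mask
```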
Positive-Unlabeled (PU) learning aims to learn a model from rare positive samples and abundant unlabeled samples. Compared with classical binary classification, PU learning is much more challenging due to the many incompletely annotated data instances: since only a portion of the most confident positive samples is labeled and the evidence is insufficient to categorize the rest, many of the unlabeled instances may in fact be positive. Research on this topic is particularly useful and essential for real-world tasks where labeling is very expensive. For example, recognition tasks in disease diagnosis, recommendation systems, and satellite image recognition may have only a few positive samples that can be annotated by experts. Existing methods largely ignore the intrinsic hardness of some unlabeled data, which can result in sub-optimal performance as a consequence of fitting the easy noisy data and not sufficiently utilizing the hard data. In this paper, we focus on improving the commonly used nnPU with a novel training pipeline. We highlight the intrinsic difference in sample hardness within the dataset and the proper learning strategies for easy and hard data. Accordingly, we propose first splitting the unlabeled dataset with an early-stop strategy: samples that receive inconsistent predictions from the temporary and base models are considered hard. The model then employs a noise-tolerant Jensen-Shannon divergence loss for easy data, and a dual-source consistency regularization for hard data, which includes a cross-consistency between the student and base models for low-level features and a self-consistency for high-level features and predictions.
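For the easy split, a noise-tolerant Jensen-Shannon loss can be sketched as below; this is the standard JS divergence between the predicted distribution and the one-hot label distribution, and the paper's exact weighting of the two KL terms may differ.

```python
import torch
import torch.nn.functional as F

def js_divergence_loss(logits, targets, pi=0.5, eps=1e-8):
    """Jensen-Shannon loss between predictions and (possibly noisy) one-hot
    labels; bounded, hence more robust to label noise than cross-entropy."""
    p = F.softmax(logits, dim=1)
    q = F.one_hot(targets, logits.size(1)).float()
    m = pi * p + (1 - pi) * q                  # mixture distribution
    kl = lambda a, b: (a * (a.clamp_min(eps).log()
                            - b.clamp_min(eps).log())).sum(1)
    return (pi * kl(p, m) + (1 - pi) * kl(q, m)).mean()
```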
Few-shot learning (FSL) aims to transfer the knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is currently an important research question with great practical value in real-world applications. Despite extensive previous efforts on few-shot learning tasks, we emphasize that most existing methods do not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such a selection bias can induce spurious correlation between the semantic causal features, which are causally and semantically related to the class label, and the other non-causal features. Critically, the former should be invariant across changes in distribution and highly related to the classes of interest, and thus generalize well to novel classes, while the latter are not stable under distribution changes. To resolve this problem, we propose a novel data augmentation strategy, dubbed PatchMix, that can break this spurious dependency by replacing the patch-level information and supervision of the query images with those of random gallery images from classes different from the query's. We theoretically show that such an augmentation mechanism, unlike existing ones, is able to identify the causal features. To further make these features discriminative enough for classification, we propose Correlation-guided Reconstruction (CGR) and a Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, the framework can be adapted to the unsupervised FSL scenario.
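A minimal sketch of the PatchMix idea follows; swapping patches on a regular grid and controlling the swap with a single `ratio` knob are our illustrative choices, not the paper's exact parameterization, and the returned mask would also reweight the supervision accordingly.

```python
import torch

def patchmix(query, gallery, patch=16, ratio=0.3):
    """Replace a random subset of non-overlapping query patches with the
    co-located patches of a gallery image from a different class.
    query, gallery: (B, C, H, W) with H, W divisible by `patch`."""
    B, C, H, W = query.shape
    gh, gw = H // patch, W // patch
    # One random binary mask over the patch grid per image in the batch.
    mask = (torch.rand(B, 1, gh, gw, device=query.device) < ratio).float()
    mask = mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return query * (1 - mask) + gallery * mask, mask
```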
This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Moreover, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractor, and the entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During meta-testing, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones, and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.
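The triplet input encoding described above can be sketched in a few lines; feature extraction, any dimensionality reduction, and the MLP head are omitted here.

```python
import torch

def triplet_code(q, a, b):
    """Encode the triplet (query q, candidates a, b) as the difference of
    two vector Kronecker products; a binary classifier on this code then
    predicts whether q is more relevant to a than to b."""
    return torch.kron(q, a) - torch.kron(q, b)   # shape: (dim(q) * dim(a),)
```

A small MLP with a sigmoid output trained on such codes (label 1 when a is the more relevant candidate) would then realize the binary ranking relation classifier.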
Recent progress in 4D implicit representations focuses on globally controlling shape and motion with low-dimensional latent vectors, which easily misses surface details and accumulates tracking errors. While many deep local representations have shown promising results for 3D shape modeling, their 4D counterparts do not yet exist. In this paper, we fill this gap by proposing a novel Local 4D implicit Representation for Dynamic clothed humans, named LoRD, which has the merits of both 4D human modeling and local representations, and enables high-fidelity reconstruction with detailed surface deformation, such as clothing wrinkles. In particular, our key insight is to encourage the network to learn latent codes of local part-level representations, capable of explaining the local geometry and temporal deformation. For inference at test time, we first estimate the inner body skeleton motion to track local parts at each time step, and then optimize the latent codes of each part via auto-decoding based on different types of observed data. Extensive experiments demonstrate that the proposed method has a strong capability to represent 4D humans, and outperforms state-of-the-art methods on practical applications, including 4D reconstruction from sparse points and non-rigid depth fusion, both qualitatively and quantitatively.
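Test-time auto-decoding of the part latent codes might look like the following sketch, in which the `decode` callable and the simple point-fitting loss are hypothetical stand-ins for LoRD's actual local implicit decoder and objective.

```python
import torch

def autodecode_latents(codes, decode, observations, steps=200, lr=1e-2):
    """Freeze the part decoder and optimize only the per-part latent codes
    so the decoded local surfaces fit the observed data (e.g. sparse points)."""
    codes = codes.clone().requires_grad_(True)
    opt = torch.optim.Adam([codes], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((decode(codes) - observations) ** 2).mean()  # placeholder fit
        loss.backward()
        opt.step()
    return codes.detach()
```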
Heterogeneous graph convolutional networks have been widely adopted for various network analysis tasks on heterogeneous network data, from link prediction to node classification. However, most existing works ignore the relation heterogeneity of multiplex networks with multi-typed nodes, and the different importance of relations in meta-paths for node embedding, and thus can hardly capture the heterogeneous structural signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN) for heterogeneous network embedding. Our MHGCN automatically learns useful heterogeneous meta-path interactions of different lengths in multiplex heterogeneous networks through multi-layer convolution aggregation. In addition, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings, under both unsupervised and semi-supervised learning paradigms. Extensive experiments on five real-world datasets with various network analysis tasks demonstrate the superiority of MHGCN over state-of-the-art embedding baselines in terms of all evaluation metrics.
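The relation-weighted aggregation can be sketched as below; the learnable per-relation weights, the dense matrix convolutions, and the averaging over layer outputs are our simplified reading of the multi-layer aggregation, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiplexAggGCN(nn.Module):
    """Learn a weighted combination of per-relation adjacency matrices of a
    multiplex graph, then run plain graph convolutions on the fused graph.
    Averaging the layer outputs lets meta-path interactions of different
    lengths all contribute to the final embedding."""
    def __init__(self, n_relations, in_dim, hid_dim, n_layers=2):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.ones(n_relations))
        self.lins = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hid_dim, hid_dim)
             for i in range(n_layers)])

    def forward(self, adjs, x):           # adjs: (R, N, N), x: (N, in_dim)
        A = torch.einsum('r,rij->ij', self.rel_weights, adjs)  # fused graph
        outs, h = [], x
        for lin in self.lins:
            h = torch.relu(A @ lin(h))    # one simple graph convolution
            outs.append(h)
        return torch.stack(outs).mean(0)  # average multi-length signals
```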
Feature representation learning is the key recipe for learning-based multi-view stereo (MVS). As the common feature extractor of learning-based MVS, the vanilla Feature Pyramid Network (FPN) suffers from discouraged feature representations for reflective and textureless areas, which limits the generalization of MVS. Even FPNs working with pre-trained convolutional neural networks (CNNs) fail to resolve these issues. On the other hand, Vision Transformers (ViTs) have achieved prominent success in many 2D vision tasks. Thus we ask whether ViTs can facilitate feature learning in MVS. In this paper, we propose a pre-trained ViT-enhanced MVS network called MVSFormer, which can learn more reliable feature representations by benefiting from the informative priors provided by ViTs. MVSFormer-P and MVSFormer-H are then further proposed, with fixed and trainable ViT weights, respectively: MVSFormer-P is more efficient, while MVSFormer-H achieves superior performance. To make ViTs robust to arbitrary resolutions for MVS tasks, we propose an efficient multi-scale training strategy with gradient accumulation. Moreover, we discuss the merits and drawbacks of classification-based and regression-based MVS methods, and further propose to unify them with a temperature-based strategy. MVSFormer achieves state-of-the-art performance on the DTU dataset. In particular, our anonymous submission of MVSFormer ranks in the Top-1 position on both the intermediate and advanced sets of the Tanks and Temples leaderboard, compared with other published works. Code and models will be released.
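One plausible reading of the temperature-based unification is a softmax-with-temperature over depth hypotheses, which interpolates between regression-style expectation and classification-style argmax; the sketch below illustrates that idea and is not MVSFormer's exact formulation.

```python
import torch

def depth_from_cost(prob_volume_logits, depth_values, t=1.0):
    """Temperature-controlled depth estimate from per-pixel hypothesis
    scores: high t gives a soft expectation (regression-like), low t a
    peaky, near-argmax selection (classification-like).
    Shapes assumed: logits (B, D, H, W), depth_values (D,)."""
    p = torch.softmax(prob_volume_logits / t, dim=1)
    return torch.einsum('bdhw,d->bhw', p, depth_values)  # expected depth
```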
By leveraging deep neural networks (DNNs) to model various prior information for image restoration, many recent inpainting works have achieved impressive results. Unfortunately, the performance of these methods is largely limited by the representation ability of vanilla convolutional neural network (CNN) backbones. On the other hand, Vision Transformers (ViTs) with self-supervised pre-training have shown great potential for many visual recognition and object detection tasks. A natural question is whether the inpainting task can benefit substantially from a ViT backbone. However, directly replacing the backbone of an inpainting network is nontrivial, since inpainting differs fundamentally from recognition tasks. To this end, this paper incorporates a pre-trained Masked AutoEncoder (MAE) into the inpainting model, which enjoys richer informative priors to enhance the inpainting process. Moreover, we propose to use attention priors from the MAE to make the inpainting model learn more long-range dependencies between masked and unmasked regions. Extensive ablations on the inpainting model and the self-supervised pre-trained model are discussed in this paper. Furthermore, experiments on Places2 and FFHQ demonstrate the effectiveness of our proposed model. Code and pre-trained models are released at https://github.com/ewrfcas/mae-far.
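A hypothetical fusion of MAE priors into a CNN inpainting backbone could look like the following; the resizing, attention-gating, and concatenation choices here are ours for illustration, not MAE-FAR's actual design.

```python
import torch
import torch.nn.functional as F

def fuse_mae_prior(cnn_feat, mae_feat, mae_attn):
    """Resize MAE token features to the CNN feature map, gate them with the
    MAE attention map (emphasizing long-range context), and concatenate.
    cnn_feat: (B, C, H, W); mae_feat: (B, Cm, h, w); mae_attn: (B, 1, h, w)."""
    size = cnn_feat.shape[-2:]
    mae_feat = F.interpolate(mae_feat, size=size,
                             mode='bilinear', align_corners=False)
    gate = torch.sigmoid(F.interpolate(mae_attn, size=size,
                                       mode='bilinear', align_corners=False))
    return torch.cat([cnn_feat, mae_feat * gate], dim=1)
```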
Lane detection is an important component of many practical autonomous systems. Although a wide variety of lane detection methods have been proposed, and steady improvements on benchmarks have been reported over time, lane detection remains an unsolved problem. This is because most existing lane detection methods treat lane detection as either a dense prediction or a detection task, and few consider the unique topologies of lane markers (Y-shaped, fork-shaped, nearly horizontal lanes), which leads to sub-optimal solutions. In this paper, we present a new method for lane detection based on relay chain prediction. Specifically, our model predicts a segmentation map to classify foreground and background regions. For each pixel point in the foreground region, we traverse a forward branch and a backward branch to recover the whole lane. Each branch decodes a transfer map and a distance map, which together produce the direction of the move to the next point and the number of steps to take, progressively predicting the relay station (the next point). In this way, our model is able to capture keypoints along the lanes. Despite its simplicity, our strategy allows us to establish a new state-of-the-art on four major benchmarks, including TuSimple, CULane, CurveLanes, and LLAMAS.
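One branch of the relay-chain decoding can be sketched as an iterative point-following loop; the map layouts (unit-vector transfer map, scalar stride map) and the termination heuristic below are assumptions for illustration.

```python
import numpy as np

def trace_lane(start, transfer, distance, n_steps=32):
    """From a foreground pixel, repeatedly read a unit direction from the
    transfer map and a stride from the distance map, hop to the next relay
    point, and collect the visited lane keypoints.
    transfer: (2, H, W) unit vectors; distance: (H, W) step lengths."""
    H, W = distance.shape
    y, x = start
    points = [(y, x)]
    for _ in range(n_steps):
        dy, dx = transfer[:, y, x]
        step = distance[y, x]
        y = int(round(y + dy * step))
        x = int(round(x + dx * step))
        if not (0 <= y < H and 0 <= x < W) or step < 1:
            break                    # left the image, or the chain terminated
        points.append((y, x))
    return points
```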