Data uncertainty is commonly observed in images used for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, the temperature scalar is exactly the scale of the uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on this observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a confidence score to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS can achieve top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS can perform robustly on datasets with noise. The proposed module is light-weight and only adds negligible computation cost to the model.
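The temperature's role as an uncertainty scale can be sketched in a few lines (an illustrative NumPy sketch, not the authors' implementation; in RTS the temperature is a random variable drawn from a learned prior rather than the fixed scalar used here):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def temperature_scaled_softmax(logits, temperature):
    # Dividing logits by a larger temperature flattens the output
    # distribution, i.e. injects more uncertainty into the prediction.
    return softmax(np.asarray(logits, dtype=float) / temperature)

logits = [4.0, 1.0, 0.5]
sharp = temperature_scaled_softmax(logits, temperature=1.0)
flat = temperature_scaled_softmax(logits, temperature=5.0)
```

At temperature 1 the top class dominates; at temperature 5 the same logits yield a much flatter distribution, which is why the temperature scalar can be read as the scale of the noise implicitly added in the softmax.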
Unsupervised domain adaptation (UDA) has been highly successful in transferring knowledge acquired from a label-rich source domain to a label-scarce target domain. Open-set domain adaptation (ODA) and universal domain adaptation (UNDA) have been proposed as solutions to the problem concerning the presence of additional novel categories in the target domain. Existing ODA and UNDA approaches treat all novel categories as one unified unknown class and attempt to detect this unknown class during the training process. We find that domain variance leads to more significant view-noise in unsupervised data augmentation, which hinders further applications of contrastive learning~(CL), and that the current closed-set and open-set classifiers cause the model to be overconfident in novel class discovery. To address these two issues, we propose the Soft-contrastive All-in-one Network~(SAN) for ODA and UNDA tasks. SAN includes a novel data-augmentation-based CL loss, which is used to improve the representational capability, and a more human-intuitive classifier, which is used to improve the new class discovery capability. The soft contrastive learning~(SCL) loss is used to weaken the adverse effects of the data-augmentation label noise problem, which is amplified in domain transfer. The All-in-One~(AIO) classifier overcomes the overconfidence problem of the current mainstream closed-set and open-set classifiers in a more human-intuitive way. The visualization results and ablation experiments demonstrate the importance of the two proposed innovations. Moreover, extensive experimental results on ODA and UNDA show that SAN has advantages over the existing state-of-the-art methods.
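The idea of softening contrastive targets to tolerate view-noise can be illustrated as follows (a minimal NumPy sketch under our own assumptions; SAN's actual SCL loss may weight pairs differently):

```python
import numpy as np

def soft_contrastive_loss(similarities, soft_targets, temperature=0.1):
    """Cross-entropy between a *soft* target distribution over candidate
    pairs and the softmax over similarity scores. With a one-hot target
    this reduces to the standard InfoNCE loss; a softened target
    down-weights 'positive' pairs whose augmentation label may be noisy."""
    logits = np.asarray(similarities, dtype=float) / temperature
    # log-softmax via the log-sum-exp trick
    log_probs = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))
    return float(-(np.asarray(soft_targets, dtype=float) * log_probs).sum())
```

With similarities `[0.9, 0.1, 0.2]`, the hard target `[1, 0, 0]` gives the usual InfoNCE value, while a softened target such as `[0.8, 0.1, 0.1]` spreads a little probability mass to the other candidates, reducing the gradient pressure from a possibly mislabeled positive.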
Significant progress has been seen in learning-based multi-view stereo (MVS) under both supervised and unsupervised settings. To combine their merits in accuracy and completeness while reducing the demand for expensive labeled data, this paper explores a novel semi-supervised setting of learning-based MVS, where only a tiny part of the MVS data is attached with dense depth ground truth. However, due to huge variations of scenarios and flexible settings in views, the semi-supervised MVS problem (semi-MVS) may break the basic assumption of classic semi-supervised learning, i.e., that unlabeled data and labeled data share the same label space and data distribution. To handle these issues, we propose a novel semi-supervised MVS framework, namely SE-MVS. For the simple case where the basic assumption works in MVS data, consistency regularization encourages the model predictions to be consistent between the original samples and randomly augmented samples via constraints on KL divergence. For the further troublesome case where the basic assumption conflicts in MVS data, we propose a novel style consistency loss to alleviate the negative effect caused by the distribution gap. The visual style of unlabeled samples is transferred to labeled samples to shrink the gap, and the model predictions of the generated samples are further supervised with the labels of the original labeled samples. Experimental results on the DTU, BlendedMVS, GTA-SFM, and Tanks\&Temples datasets show the superior performance of the proposed method. Under the same settings in the backbone networks, our proposed SE-MVS outperforms its fully-supervised and unsupervised baselines.
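The consistency-regularization step can be sketched as follows (an illustrative NumPy sketch; in SE-MVS the KL constraint is applied to depth predictions rather than the generic discrete distributions used here):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) over discrete distributions; a small epsilon avoids log(0).
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def consistency_loss(pred_original, pred_augmented):
    # Penalize disagreement between the prediction on an original sample
    # and the prediction on its randomly augmented counterpart.
    return kl_divergence(pred_original, pred_augmented)
```

Minimizing this loss on unlabeled samples pushes the model toward augmentation-invariant predictions, which is the standard consistency-regularization mechanism the abstract describes.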
Manifold learning~(ML) aims to find low-dimensional embeddings from high-dimensional data. Previous works focus on handcrafted or easy datasets with simple and ideal scenarios; however, we find that they perform poorly on real-world datasets with under-sampled data. In general, ML methods first model the data structure and subsequently process the low-dimensional embedding, where the poor local connectivity of under-sampled data in the former step and the inappropriate optimization objectives in the latter step lead to \emph{structural distortion} and \emph{unsuitable embedding}. To solve this problem, we propose Deep Local-flatness Manifold Embedding (DLME), a novel ML framework that obtains reliable manifold embeddings by reducing distortion. Our proposed DLME constructs semantic manifolds via data augmentation and overcomes the \emph{structural distortion} problem with the help of its smoothness framework. To overcome \emph{unsuitable embedding}, we design a specific loss for DLME and mathematically show that it leads to a more suitable embedding based on our proposed local flatness assumption. In the experiments, by demonstrating the effectiveness of DLME on downstream classification, clustering, and visualization tasks with three types of datasets (toy, biological, and image), our experimental results show that DLME outperforms SOTA ML \& contrastive learning (CL) methods.
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing the object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features within a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero takes a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves a new state of the art on three ZSL benchmarks. The code is available at: \url{https://github.com/shiming-chen/transzero}.
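The attribute-guided localization step can be sketched as plain scaled dot-product attention (an illustrative NumPy sketch; TransZero's decoder has learned projections and more structure than shown here):

```python
import numpy as np

def attribute_region_attention(attributes, regions):
    """Let each semantic attribute vector attend over image-region
    features: the attention weights indicate which regions are most
    relevant to each attribute, and the output is one attribute-localized
    visual feature per attribute."""
    d = regions.shape[-1]
    scores = attributes @ regions.T / np.sqrt(d)   # (n_attributes, n_regions)
    scores -= scores.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ regions, weights
```

Each row of `weights` is a distribution over image regions, so the highest-weight region can be read off as the localization of that attribute.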
Benefiting from the pioneering design of generic object detectors, significant achievements have been made in the field of face detection. Typically, the architectures of the backbone, feature pyramid layer, and detection head module within a face detector all assimilate the excellent experience of general object detectors. However, several effective methods, including label assignment and scale-level data augmentation strategies\footnote{Enriching the scale distribution of training data to address the scale-variance challenge.}, fail to maintain consistent superiority when applied directly to face detectors. Concretely, the former strategy involves a vast body of hyper-parameters and the latter suffers from the challenge of scale distribution bias between different detection tasks, both of which limit their generalization abilities. Furthermore, in order to provide precise face bounding boxes for facial downstream tasks, face detectors need to eliminate false positives. As a result, practical solutions for label assignment, scale-level data augmentation, and reducing false positives are necessary for advancing face detectors. In this paper, we focus on resolving the three aforementioned challenges, which existing methods struggle to finish off, and present a novel face detector, termed MogFace. In our MogFace, three key components, the Adaptive Online Incremental Anchor Mining Strategy, the Selective Scale Enhancement Strategy, and the Hierarchical Context-Aware Module, are proposed to boost the performance of face detection, respectively. Finally, to the best of our knowledge, our MogFace is the best face detector on the WIDER Face leaderboard, achieving all champions across the different testing scenarios. The code is available at https://github.com/idstcv/mogface
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT retains strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
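The core idea behind Koopman-based solvers, learning a linear operator that advances the system state, can be illustrated with a least-squares (DMD-style) fit on snapshot data (a NumPy sketch on raw states under our own assumptions; KNO itself parameterizes the operator with neural networks acting on learned observables):

```python
import numpy as np

def fit_koopman_operator(snapshots):
    """Least-squares fit of a linear operator K with x_{t+1} ~ K x_t.
    `snapshots` has shape (state_dim, n_steps); columns are successive
    states. This is the finite-dimensional linear approximation at the
    heart of Koopman methods."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    return Y @ np.linalg.pinv(X)

# Toy linear dynamics: rotation in the plane, which a linear K can
# represent exactly.
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
traj = [np.array([1.0, 0.0])]
for _ in range(20):
    traj.append(A @ traj[-1])
S = np.stack(traj, axis=1)
K = fit_koopman_operator(S)
```

For genuinely nonlinear PDE dynamics the states must first be lifted into a space of observables where the evolution is (approximately) linear, which is the part KNO learns.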
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of rankings, or on pairwise comparisons of a ranking, under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting rankings. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
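A standard way to draw from the Mallows model, and hence a plausible core for a synthetic-ranking generator, is the repeated-insertion sampler sketched below (our own illustrative Python, not the paper's multistage algorithm; the mapping from $\epsilon$ to the dispersion parameter `phi` is omitted):

```python
import random

def sample_mallows(n, phi, rng=random):
    """Repeated-insertion sampler for the Mallows model with reference
    ranking (1, ..., n) and dispersion phi in (0, 1]: item i is inserted
    at position j with probability proportional to phi**(i - j).
    phi = 1 gives the uniform distribution over rankings; smaller phi
    concentrates mass near the reference ranking."""
    ranking = []
    for i in range(1, n + 1):
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        r = rng.random() * sum(weights)
        acc = 0.0
        for j, w in enumerate(weights, start=1):
            acc += w
            if r <= acc:
                ranking.insert(j - 1, i)
                break
    return ranking
```

As `phi` shrinks toward 0 the sampler returns the reference ranking almost surely; as it approaches 1 the output becomes uniform, mirroring how a smaller privacy budget forces noisier synthetic rankings.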
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
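The per-view uncertainty computation can be sketched with the standard subjective-logic mapping from Dirichlet evidence (an illustrative NumPy sketch; the paper's exact fusion rule may differ, and here we simply weight each view's belief by its certainty):

```python
import numpy as np

def view_uncertainty(evidence):
    """Subjective-logic reading of one view's evidence over K classes:
    Dirichlet strength S = total evidence + K, belief b_k = e_k / S,
    and uncertainty u = K / S. More evidence -> lower uncertainty."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    S = evidence.sum() + K
    return evidence / S, K / S

def fuse_views(evidences):
    """Illustrative decision-level fusion: weight each view's belief by
    its certainty (1 - u), so the lower-risk view obtains more weight."""
    beliefs, certainties = [], []
    for e in evidences:
        b, u = view_uncertainty(e)
        beliefs.append(b)
        certainties.append(1.0 - u)
    weights = np.asarray(certainties) / np.sum(certainties)
    return sum(w * b for w, b in zip(weights, beliefs))
```

A confident aerial view (large total evidence) then dominates an ambiguous ground view in the fused decision, which is exactly the risk-aware behavior the abstract describes.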