translated by 谷歌翻译
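The training process of deep neural networks (DNNs) is usually pipelined, with stages for data preparation on CPUs followed by gradient computation on accelerators such as GPUs. In an ideal pipeline, the end-to-end training throughput is ultimately limited by the throughput of the accelerator, not by that of data preparation. In the past, DNN training pipelines achieved near-optimal throughput by using datasets encoded with a lightweight, lossy image format such as JPEG. However, as high-resolution, losslessly encoded datasets become more popular for applications that require high accuracy, a performance problem arises in the data preparation stage because of low-throughput image decoding on the CPU. We therefore propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. The decoding process of L3 is effectively parallelized on the accelerator, which minimizes CPU intervention for data preparation during DNN training. L3 achieves 9.29x higher data preparation throughput than PNG, the most popular lossless image format, for the Cityscapes dataset on an NVIDIA A100 GPU, which leads to 1.71x higher end-to-end training throughput. Compared with JPEG and WebP, two popular lossy image formats, L3 provides up to 1.77x and 2.87x higher end-to-end training throughput for ImageNet, respectively, at equivalent metric performance.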
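The bottleneck argument above can be illustrated with a minimal sketch, assuming a two-stage pipeline whose stages overlap perfectly so that the slower stage sets the pace. The function name and all throughput figures are hypothetical and are not measurements from the paper.

```python
def pipeline_throughput(prep_images_per_s, accel_images_per_s):
    """Steady-state throughput of a perfectly overlapped two-stage pipeline.

    Assumption: data preparation and gradient computation run concurrently,
    so the slower of the two stages limits the end-to-end rate.
    """
    return min(prep_images_per_s, accel_images_per_s)


# Hypothetical numbers, for illustration only.
accelerator = 1000.0   # images/s the accelerator could consume
slow_decode = 400.0    # images/s from slow CPU-side decoding
fast_decode = 1500.0   # images/s once decoding is no longer the bottleneck

print(pipeline_throughput(slow_decode, accelerator))  # 400.0 (data-preparation bound)
print(pipeline_throughput(fast_decode, accelerator))  # 1000.0 (accelerator bound)
```

Under this simplified model, speeding up decoding only helps until data preparation matches the accelerator; beyond that point the accelerator is the limit, which is the ideal case the abstract describes.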
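Real-world datasets inevitably contain many mislabeled samples. Because deep neural networks (DNNs) have an enormous capacity to memorize labels, a robust training scheme is required to prevent labeling errors from degrading the generalization performance of DNNs. Current state-of-the-art methods use a co-training scheme that trains dual networks on samples associated with small losses. In practice, however, training two networks simultaneously can burden computing resources. In this study, we propose a simple yet effective robust training scheme that operates by training only a single network. During training, the proposed method generates temporal self-ensembles by sampling intermediate network parameters from the weight trajectory formed by stochastic gradient descent optimization. The sum of the losses evaluated with these self-ensembles is used to identify incorrectly labeled samples. In parallel, our method generates multi-view predictions by transforming the input data into various forms and considers their agreement to identify incorrectly labeled samples. By combining these two metrics, we present the self-ensemble-based robust training (SRT) method, which filters out samples with noisy labels to reduce their influence on training. Experiments on widely used public datasets demonstrate that the proposed method achieves state-of-the-art performance in some categories without training dual networks.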
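The loss-sum criterion described above can be sketched as follows. This is a minimal illustration, assuming that per-sample losses have already been recorded at several sampled checkpoints and that a fixed fraction of samples is kept. The function name `select_clean_indices` and the `keep_ratio` parameter are hypothetical and are not taken from the paper, and the multi-view agreement criterion is not modeled here.

```python
def select_clean_indices(loss_history, keep_ratio):
    """Return indices of the samples with the smallest summed loss.

    loss_history: list of per-checkpoint lists; loss_history[t][i] is the loss
                  of sample i under the parameters sampled at checkpoint t.
    keep_ratio:   fraction of samples to keep (hypothetical hyperparameter).
    """
    num_samples = len(loss_history[0])
    # Sum each sample's loss over all sampled checkpoints.
    loss_sum = [sum(step[i] for step in loss_history) for i in range(num_samples)]
    num_keep = max(1, int(round(keep_ratio * num_samples)))
    # Rank samples from smallest to largest summed loss and keep the head.
    ranked = sorted(range(num_samples), key=lambda i: loss_sum[i])
    return sorted(ranked[:num_keep])


# Toy losses for 4 samples at 3 checkpoints; sample 2 stays hard to fit.
history = [
    [0.9, 0.8, 2.5, 1.0],
    [0.4, 0.5, 2.4, 0.6],
    [0.1, 0.2, 2.6, 0.3],
]
print(select_clean_indices(history, keep_ratio=0.75))  # [0, 1, 3]
```

Summing over checkpoints rather than using a single loss value makes the ranking less sensitive to one noisy evaluation, which is the intuition the abstract gives for using the weight trajectory.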
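We present a fully automated method for integrating intraoral scan (IOS) and dental cone-beam computed tomography (CBCT) images into a single image by compensating for the weaknesses of each. Dental CBCT alone may not be able to delineate precise details of the tooth surface because of limited image resolution and various CBCT artifacts, including metal-induced artifacts. IOS is very accurate when scanning narrow areas, but it produces cumulative stitching errors during full-arch scanning. The proposed method is intended not only to compensate for the low quality of CBCT-derived tooth surfaces with IOS, but also to correct the cumulative stitching errors of IOS across the entire dental arch. Moreover, the integration provides both the gingival structure from IOS and the tooth roots from CBCT in one image. The proposed fully automated method consists of four parts: (i) an individual tooth segmentation and identification module for IOS data (TSIM-IOS); (ii) an individual tooth segmentation and identification module for CBCT data (TSIM-CBCT); (iii) global-to-local tooth registration between IOS and CBCT; and (iv) stitching error correction for full-arch IOS. The experimental results show that the proposed method achieves landmark and surface distance errors of 0.11 mm and 0.30 mm, respectively.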
Weakly-supervised object localization aims to indicate both the category and the extent of an object in an image, given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map so that it covers the whole object, yet they ignore the co-occurrence confounder between object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. In addition, the use of CAM creates a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
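For readers unfamiliar with the baseline, the standard CAM computation that the abstract builds on can be sketched as a weighted sum of the final convolutional feature maps, using the classifier weights of one class. This is a minimal pure-Python illustration of plain CAM only; it does not implement the causal intervention or distillation of KD-CI-CAM, and the function name and toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using one class's K weights."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two toy 2x2 feature maps and the classifier weights of a single class.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.5]
print(class_activation_map(features, weights))  # [[0.5, 0.0], [0.0, 3.0]]
```

Because the map is driven purely by what helps classification, regions of co-occurring context can receive high activation too, which is the entanglement the abstract sets out to remove.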
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) an Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, which helps contrastive learning strategies obtain robust feature representations under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to maintain a good trade-off between content details and style features. When an image is stylized with sufficient style patterns, content details may be damaged, and sometimes the objects in the image cannot be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
Domain adaptation methods typically reduce domain shift by learning domain-invariant features. Most existing methods are built on distribution matching, e.g., adversarial domain adaptation, which tends to corrupt feature discriminability. In this paper, we propose Discriminative Radial Domain Adaptation (DRDR), which bridges source and target domains via a shared radial structure. It is motivated by the observation that, as the model is trained to be progressively discriminative, features of different categories expand outwards in different directions, forming a radial structure. We show that transferring such an inherently discriminative structure enhances feature transferability and discriminability simultaneously. Specifically, we represent each domain with a global anchor and each category with a local anchor to form a radial structure, and we reduce domain shift via structure matching. The matching consists of two parts: an isometric transformation to align the structure globally and a local refinement to match each category. To enhance the discriminability of the structure, we further encourage samples to cluster close to their corresponding local anchors based on optimal-transport assignment. In extensive experiments on multiple benchmarks, our method consistently outperforms state-of-the-art approaches on varied tasks, including typical unsupervised domain adaptation, multi-source domain adaptation, domain-agnostic learning, and domain generalization.
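The anchor-based radial structure can be pictured with a small sketch. The abstract does not say how the anchors are computed, so this example assumes (as a labeled guess, not the authors' definition) that the global anchor is the mean of all features in a domain and each local anchor is the mean of one category's features. The names `mean_vector` and `radial_structure` are hypothetical, and the isometric alignment and optimal-transport steps are not modeled.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def radial_structure(features, labels):
    """Return (global_anchor, spokes) for one domain.

    Assumption: global anchor = mean of all features; local anchor = class mean.
    spokes[label] is the offset of that class's local anchor from the global one.
    """
    global_anchor = mean_vector(features)
    by_class = {}
    for feature, label in zip(features, labels):
        by_class.setdefault(label, []).append(feature)
    spokes = {}
    for label, group in by_class.items():
        local_anchor = mean_vector(group)
        spokes[label] = [l - g for l, g in zip(local_anchor, global_anchor)]
    return global_anchor, spokes


# Toy 2-D features for two categories.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
anchor, spokes = radial_structure(feats, labels)
print(anchor)  # [1.0, 1.5]
print(spokes)  # {0: [1.0, -1.5], 1: [-1.0, 1.5]}
```

In this picture, the spokes point in different directions for different categories, and matching two domains amounts to aligning their anchors and spokes rather than their full feature distributions.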
This paper proposes a novel self-supervised Cut-and-Paste GAN that performs foreground object segmentation and generates realistic composite images without manual annotations. We accomplish this goal with a simple yet effective self-supervised approach coupled with a U-Net-based discriminator. The proposed method extends the ability of standard discriminators to learn not only global data representations via classification (real/fake) but also semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn from informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods on standard benchmark datasets.