未配对的图像到图像翻译旨在找到源域和目标域之间的映射。为了减轻缺乏源图像的监督标签的问题,通过假设未配对的图像之间的可逆关系,已经提出了基于周期矛盾的方法来保存图像结构。但是,此假设仅使用图像对之间的有限对应关系。最近,使用基于贴片的正/负学习,对比度学习(CL)已被用来进一步研究未配对图像翻译中的图像对应关系。基于贴片的对比例程通过自相似度计算获得阳性,并将其余的斑块视为负面。这种灵活的学习范式以低成本获得辅助上下文化信息。由于负面的样本人数令人印象深刻,因此我们有好奇心,我们基于一个问题进行了调查:是否需要所有负面的对比度学习?与以前的CL方法不同,在本文中,我们从信息理论的角度研究了负面因素,并通过稀疏和对补丁进行排名来引入一种新的负面修剪技术,以用于未配对的图像到图像翻译(PUT) 。所提出的算法是有效的,灵活的,并使模型能够稳定地学习相应贴片之间的基本信息。通过将质量置于数量上,只需要几个负贴片即可获得更好的结果。最后,我们通过比较实验验证了模型的优势,稳定性和多功能性。
translated by 谷歌翻译
In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so -maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operate on entire images. Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each "domain" is only a single image.
translated by 谷歌翻译
培训监督图像综合模型需要批评评论权来比较两个图像:结果的原始真相。然而,这种基本功能仍然是一个公开问题。流行的方法使用L1(平均绝对误差)丢失,或者在预先预留的深网络的像素或特征空间中。然而,我们观察到这些损失倾向于产生过度模糊和灰色的图像,以及其他技术,如GAN需要用于对抗这些伪影。在这项工作中,我们介绍了一种基于信息理论的方法来测量两个图像之间的相似性。我们认为,良好的重建应该具有较高的相互信息与地面真相。这种观点使得能够以对比的方式学习轻量级批评者以“校准”特征空间,使得相应的空间贴片的重建被置于擦除其他贴片。我们表明,当用作L1损耗的替代时,我们的配方立即提升输出图像的感知现实主义,有或没有额外的GaN丢失。
translated by 谷歌翻译
实体图像超分辨率旨在将现实世界的低分辨率图像恢复到其高质量版本中。典型的RealSR框架通常包括针对不同图像属性设计的多个标准的优化,通过隐含的假设,即基地图像可以在不同标准之间提供良好的权衡。但是,由于不同图像属性之间固有的对比关系,因此在实践中很容易违反该假设。对比学习(CL)提供了一种有希望的食谱,可以通过使用三重态对比损失学习判别特征来缓解此问题。尽管CL在许多计算机视觉任务中取得了重大成功,但由于在这种情况下很难定义有效的阳性图像对,因此将CL引入REALSR是不平凡的。受到观察的启发,即标准之间也可能存在对比的关系,在这项工作中,我们提出了一种新颖的室友训练范式,称为标准比较学习(CRIA-CL),通过开发根据标准而不是图像贴片定义的对比损失。此外,提出了一个空间投影仪,以便在Realsr中获得CRIA-CL的良好视图。我们的实验表明,与典型的加权回归策略相比,我们的方法在相似的参数设置下取得了重大改进。
translated by 谷歌翻译
Image harmonization task aims at harmonizing different composite foreground regions according to specific background image. Previous methods would rather focus on improving the reconstruction ability of the generator by some internal enhancements such as attention, adaptive normalization and light adjustment, $etc.$. However, they pay less attention to discriminating the foreground and background appearance features within a restricted generator, which becomes a new challenge in image harmonization task. In this paper, we propose a novel image harmonization framework with external style fusion and region-wise contrastive learning scheme. For the external style fusion, we leverage the external background appearance from the encoder as the style reference to generate harmonized foreground in the decoder. This approach enhances the harmonization ability of the decoder by external background guidance. Moreover, for the contrastive learning scheme, we design a region-wise contrastive loss function for image harmonization task. Specifically, we first introduce a straight-forward samples generation method that selects negative samples from the output harmonized foreground region and selects positive samples from the ground-truth background region. Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles, which desirably makes our harmonization network more robust to discriminate the foreground and background style features when harmonizing composite images. Extensive experiments on the benchmark datasets show that our method can achieve a clear improvement in harmonization quality and demonstrate the good generalization capability in real-scenario applications.
translated by 谷歌翻译
这项工作提出了一个新颖的框架CISFA(对比图像合成和自我监督的特征适应),该框架建立在图像域翻译和无监督的特征适应性上,以进行跨模式生物医学图像分割。与现有作品不同,我们使用单方面的生成模型,并在输入图像的采样贴片和相应的合成图像之间添加加权贴片对比度损失,该图像用作形状约束。此外,我们注意到生成的图像和输入图像共享相似的结构信息,但具有不同的方式。因此,我们在生成的图像和输入图像上强制实施对比损失,以训练分割模型的编码器,以最大程度地减少学到的嵌入空间中成对图像之间的差异。与依靠对抗性学习进行特征适应的现有作品相比,这种方法使编码器能够以更明确的方式学习独立于域的功能。我们对包含腹腔和全心的CT和MRI图像的分割任务进行了广泛评估。实验结果表明,所提出的框架不仅输出了较小的器官形状变形的合成图像,而且还超过了最先进的域适应方法的较大边缘。
translated by 谷歌翻译
图像到图像转换是最近使用生成对冲网络(GaN)将图像从一个域转换为另一个域的趋势。现有的GaN模型仅利用转换的输入和输出方式执行培训。在本文中,我们执行GaN模型的语义注射训练。具体而言,我们用原始输入和输出方式训练,并注入几个时代,用于从输入到语义地图的翻译。让我们将原始培训称为输入图像转换为目标域的培训。原始训练中的语义训练注射改善了训练的GaN模型的泛化能力。此外,它还以更好的方式在生成的图像中以更好的方式保留分类信息。语义地图仅在训练时间使用,并且在测试时间不需要。通过在城市景观和RGB-NIR立体数据集上使用最先进的GaN模型进行实验。与原始训练相比,在注入语义训练后,我们遵守SSIM,FID和KID等方面的提高性能。
translated by 谷歌翻译
Automatic font generation without human experts is a practical and significant problem, especially for some languages that consist of a large number of characters. Existing methods for font generation are often in supervised learning. They require a large number of paired data, which are labor-intensive and expensive to collect. In contrast, common unsupervised image-to-image translation methods are not applicable to font generation, as they often define style as the set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of FDSC are fed into a mixer to generate final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarity and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to adversarial loss, another two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of FDSC and the adopted loss functions, our model is able to maintain spatial information and generates high-quality character images in an unsupervised manner. Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
translated by 谷歌翻译
对比学习在各种高级任务中取得了显着的成功,但是为低级任务提出了较少的方法。采用VANILLA对比学习技术采用直接为低级视觉任务提出的VANILLA对比度学习技术,因为所获得的全局视觉表现不足以用于需要丰富的纹理和上下文信息的低级任务。在本文中,我们提出了一种用于单图像超分辨率(SISR)的新型对比学习框架。我们从两个视角调查基于对比的学习的SISR:样品施工和特征嵌入。现有方法提出了一些天真的样本施工方法(例如,考虑到作为负样本的低质量输入以及作为正样品的地面真理),并且它们采用了先前的模型(例如,预先训练的VGG模型)来获得该特征嵌入而不是探索任务友好的。为此,我们向SISR提出了一个实用的对比学习框架,涉及在频率空间中产生许多信息丰富的正负样本。我们不是利用其他预先训练的网络,我们设计了一种从鉴别器网络继承的简单但有效的嵌入网络,并且可以用主SR网络迭代优化,使其成为任务最通报。最后,我们对我们的方法进行了广泛的实验评估,与基准方法相比,在目前的最先进的SISR方法中显示出高达0.21 dB的显着增益。
translated by 谷歌翻译
未配对的图像到图像转换的目标是产生反映目标域样式的输出图像,同时保持输入源图像的不相关内容不变。但是,由于缺乏对现有方法的内容变化的关注,来自源图像的语义信息遭受翻译期间的降级。在论文中,为了解决这个问题,我们介绍了一种新颖的方法,全局和局部对齐网络(GLA-NET)。全局对齐网络旨在将输入图像从源域传输到目标域。要有效地这样做,我们通过使用MLP-MILLER基于MATY编码器将多元高斯分布的参数(均值和标准偏差)作为样式特征学习。要更准确地传输样式,我们在编码器中使用自适应实例归一化层,具有目标多功能高斯分布的参数作为输入。我们还采用正常化和可能性损失,以进一步降低领域差距并产生高质量的产出。另外,我们介绍了局部对准网络,该网络采用预磨平的自我监督模型来通过新颖的局部对准丢失来产生注意图,确保翻译网络专注于相关像素。在五个公共数据集上进行的广泛实验表明,我们的方法有效地产生比现有方法更锐利和更现实的图像。我们的代码可在https://github.com/ygjwd12345/glanet获得。
translated by 谷歌翻译
自动驾驶汽车必须能够可靠地处理不利的天气条件(例如,雪地)安全运行。在本文中,我们研究了以不利条件捕获的转动传感器输入(即图像)的想法,将其下游任务(例如,语义分割)可以达到高精度。先前的工作主要将其作为未配对的图像到图像翻译问题,因为缺乏在完全相同的相机姿势和语义布局下捕获的配对图像。虽然没有完美对准的图像,但可以轻松获得粗配上的图像。例如,许多人每天在好天气和不利的天气中驾驶相同的路线;因此,在近距离GPS位置捕获的图像可以形成一对。尽管来自重复遍历的数据不太可能捕获相同的前景对象,但我们认为它们提供了丰富的上下文信息来监督图像翻译模型。为此,我们提出了一个新颖的训练目标,利用了粗糙的图像对。我们表明,我们与一致的训练方案可提高更好的图像翻译质量和改进的下游任务,例如语义分割,单眼深度估计和视觉定位。
translated by 谷歌翻译
Person re-identification (re-ID) models trained on one domain often fail to generalize well to another. In our attempt, we present a "learning via translation" framework. In the baseline, we translate the labeled images from source to target domain in an unsupervised manner. We then train re-ID models with the translated images by supervised methods. Yet, being an essential part of this framework, unsupervised image-image translation suffers from the information loss of source-domain labels during translation.Our motivation is two-fold. First, for each image, the discriminative cues contained in its ID label should be maintained after translation. Second, given the fact that two domains have entirely different persons, a translated image should be dissimilar to any of the target IDs. To this end, we propose to preserve two types of unsupervised similarities, 1) self-similarity of an image before and after translation, and 2) domain-dissimilarity of a translated source image and a target image. Both constraints are implemented in the similarity preserving generative adversarial network (SPGAN) which consists of an Siamese network and a Cy-cleGAN. Through domain adaptation experiment, we show that images generated by SPGAN are more suitable for domain adaptation and yield consistent and competitive re-ID accuracy on two large-scale datasets.
translated by 谷歌翻译
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, it is rather blind to the wealth of prior information assumed: with the increase of the perturbation degree applied on the original graph, 1) the similarity between the original graph and the generated augmented graph gradually decreases; 2) the discrimination between all nodes within each augmented view gradually increases. In this paper, we argue that both such prior information can be incorporated (differently) into the contrastive learning paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained and also be less altered to the perturbations of different degrees. Experiment results on various benchmark datasets verify the effectiveness of our algorithm compared with the supervised and unsupervised models.
translated by 谷歌翻译
最近流行的对比学习范式提出了无监督的哈希的发展。但是,以前的基于学习的作品受到(1)基于全球图像表示的数据相似性挖掘的障碍,以及(2)由数据增强引起的哈希代码语义损失。在本文中,我们提出了一种新颖的方法,即加权的伴侣哈希(WCH),以朝着解决这两个问题迈出一步。我们介绍了一个新型的相互注意模块,以减轻由缺失的图像结构引起的网络特征中信息不对称问题的问题。此外,我们探索了图像之间的细粒语义关系,即,我们将图像分为多个斑块并计算斑块之间的相似性。反映深层图像关系的聚合加权相似性是经过蒸馏而来的,以促进哈希码以蒸馏损失的方式学习,从而获得更好的检索性能。广泛的实验表明,所提出的WCH在三个基准数据集上显着优于现有的无监督哈希方法。
translated by 谷歌翻译
尽管具有生成对抗网络(GAN)的图像到图像(I2I)翻译的显着进步,但使用单对生成器和歧视器将图像有效地转换为多个目标域中的一组不同图像仍然具有挑战性。现有的I2i翻译方法采用多个针对不同域的特定于域的内容编码,其中每个特定于域的内容编码器仅经过来自同一域的图像的训练。然而,我们认为应从所有域之间的图像中学到内容(域变相)特征。因此,现有方案的每个特定于域的内容编码器都无法有效提取域不变特征。为了解决这个问题,我们提出了一个灵活而通用的Sologan模型,用于在多个域之间具有未配对数据的多模式I2I翻译。与现有方法相反,Solgan算法使用具有附加辅助分类器的单个投影鉴别器,并为所有域共享编码器和生成器。因此,可以使用来自所有域的图像有效地训练Solgan,从而可以有效提取域 - 不变性内容表示。在多个数据集中,针对多个同行和sologan的变体的定性和定量结果证明了该方法的优点,尤其是对于挑战i2i翻译数据集的挑战,即涉及极端形状变化的数据集或在翻译后保持复杂的背景,需要保持复杂的背景。此外,我们通过消融研究证明了Sogan中每个成分的贡献。
translated by 谷歌翻译
Segmenting the fine structure of the mouse brain on magnetic resonance (MR) images is critical for delineating morphological regions, analyzing brain function, and understanding their relationships. Compared to a single MRI modality, multimodal MRI data provide complementary tissue features that can be exploited by deep learning models, resulting in better segmentation results. However, multimodal mouse brain MRI data is often lacking, making automatic segmentation of mouse brain fine structure a very challenging task. To address this issue, it is necessary to fuse multimodal MRI data to produce distinguished contrasts in different brain structures. Hence, we propose a novel disentangled and contrastive GAN-based framework, named MouseGAN++, to synthesize multiple MR modalities from single ones in a structure-preserving manner, thus improving the segmentation performance by imputing missing modalities and multi-modality fusion. Our results demonstrate that the translation performance of our method outperforms the state-of-the-art methods. Using the subsequently learned modality-invariant information as well as the modality-translated images, MouseGAN++ can segment fine brain structures with averaged dice coefficients of 90.0% (T2w) and 87.9% (T1w), respectively, achieving around +10% performance improvement compared to the state-of-the-art algorithms. Our results demonstrate that MouseGAN++, as a simultaneous image synthesis and segmentation method, can be used to fuse cross-modality information in an unpaired manner and yield more robust performance in the absence of multimodal data. We release our method as a mouse brain structural segmentation tool for free academic usage at https://github.com/yu02019.
translated by 谷歌翻译
大多数现代脸部完成方法采用AutoEncoder或其变体来恢复面部图像中缺失的区域。编码器通常用于学习强大的表现,在满足复杂的学习任务的挑战方面发挥着重要作用。具体地,各种掩模通常在野外的面部图像中呈现,形成复杂的图案,特别是在Covid-19的艰难时期。编码器很难在这种复杂的情况下捕捉如此强大的陈述。为了解决这一挑战,我们提出了一个自我监督的暹罗推论网络,以改善编码器的泛化和鲁棒性。它可以从全分辨率图像编码上下文语义并获得更多辨别性表示。为了处理面部图像的几何变型,将密集的对应字段集成到网络中。我们进一步提出了一种具有新型双重关注融合模块(DAF)的多尺度解码器,其可以以自适应方式将恢复和已知区域组合。这种多尺度架构有利于解码器利用从编码器学习到图像中的辨别性表示。广泛的实验清楚地表明,与最先进的方法相比,拟议的方法不仅可以实现更具吸引力的结果,而且还提高了蒙面的面部识别的性能。
translated by 谷歌翻译
无教师的在线知识蒸馏(KD)旨在培训多个学生模型的合奏,并彼此提炼知识。尽管现有的在线KD方法实现了理想的性能,但它们通常专注于阶级概率作为核心知识类型,而忽略了宝贵的特征代表性信息。我们为在线KD提供了一个相互的对比学习(MCL)框架。 MCL的核心思想是以在线方式进行对比分布的相互交互和对比度分布的转移。我们的MCL可以汇总跨网络嵌入信息,并最大化两个网络之间的相互信息的下限。这使每个网络能够从他人那里学习额外的对比知识,从而提供更好的特征表示形式,从而提高视觉识别任务的性能。除最后一层外,我们还将MCL扩展到辅助特征细化模块辅助的几个中间层。这进一步增强了在线KD的表示能力。关于图像分类和转移学习到视觉识别任务的实验表明,MCL可以针对最新的在线KD方法带来一致的性能提高。优势表明,MCL可以指导网络生成更好的特征表示。我们的代码可在https://github.com/winycg/mcl上公开获取。
translated by 谷歌翻译
我们从一组未配对的清晰和朦胧的图像中提供了实用的基于学习的图像飞行网络。本文提供了一种新的观点,可以将图像除去作为两类分离的因子分离任务,即清晰图像重建的任务相关因素以及与雾霾相关的分布的任务含量。为了在深度特征空间中实现这两类因素的分离,将对比度学习引入了一个自行车框架中,以通过指导与潜在因素相关的生成的图像来学习分离的表示形式。通过这种表述,提出的对比度拆除的脱掩护方法(CDD-GAN)采用负面发电机与编码器网络合作以交替进行更新,以产生挑战性负面对手的队列。然后,这些负面的对手是端到端训练的,以及骨干代表网络,以通过最大化对抗性对比损失来增强歧视性信息并促进因素分离性能。在培训期间,我们进一步表明,硬性负面例子可以抑制任务 - 无关紧要的因素和未配对的清晰景象可以增强与任务相关的因素,以便更好地促进雾霾去除并帮助图像恢复。对合成和现实世界数据集的广泛实验表明,我们的方法对现有的未配对飞行基线的表现良好。
translated by 谷歌翻译
This work focuses on unsupervised representation learning in person re-identification (ReID). Recent self-supervised contrastive learning methods learn invariance by maximizing the representation similarity between two augmented views of a same image. However, traditional data augmentation may bring to the fore undesirable distortions on identity features, which is not always favorable in id-sensitive ReID tasks. In this paper, we propose to replace traditional data augmentation with a generative adversarial network (GAN) that is targeted to generate augmented views for contrastive learning. A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features. Deviating from previous GAN-based ReID methods that only work in id-unrelated space (pose and camera style), we conduct GAN-based augmentation on both id-unrelated and id-related features. We further propose specific contrastive losses to help our network learn invariance from id-unrelated and id-related augmentations. By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.
translated by 谷歌翻译