Image-to-image translation (I2I) is a challenging computer vision problem used for numerous tasks in multiple domains. Recently, ophthalmology has become one of the major fields where the application of I2I is rapidly increasing. One such application is the generation of synthetic retinal optical coherence tomography (OCT) scans. Existing I2I methods require training multiple models to translate images from normal scans to a specific pathology, limiting their use due to this complexity. To address this issue, we propose an unsupervised multi-domain I2I network with pre-trained style encoders that translates retinal OCT images in one domain to multiple domains. We assume that the image representation can be split into domain-invariant content and domain-specific style codes, and we pre-train these style codes. The performed experiments show that the proposed model outperforms state-of-the-art models such as MUNIT and CycleGAN in synthesizing scans of different pathologies.
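A minimal sketch of the decomposition the abstract describes: a shared content encoder extracts domain-invariant structure, and a frozen, pre-trained style code per pathology domain conditions the generator. All module and variable names here (ContentEncoder, Generator, style codes as a plain dict) are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn as nn

class Translator(nn.Module):
    def __init__(self, content_encoder: nn.Module, generator: nn.Module,
                 pretrained_style_codes: dict[str, torch.Tensor]):
        super().__init__()
        self.enc = content_encoder              # shared, domain-invariant
        self.gen = generator                    # conditioned on a style code
        self.styles = pretrained_style_codes    # one frozen code per pathology domain

    def forward(self, oct_scan: torch.Tensor, target_domain: str) -> torch.Tensor:
        content = self.enc(oct_scan)            # domain-invariant structure
        style = self.styles[target_domain]      # pre-trained, domain-specific
        return self.gen(content, style)         # synthetic scan in the target domain
```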
Generative Adversarial Networks (GANs) have recently introduced effective methods for performing image-to-image translation. These models can be applied to a variety of image-to-image translation domains without changing any parameters. In this paper, we survey and analyze eight image-to-image generative adversarial networks: Pix2Pix, CycleGAN, CoGAN, StarGAN, MUNIT, StarGAN2, DA-GAN, and Self-Attention GAN. Each of these models presented state-of-the-art results and introduced new techniques for building image-to-image GANs. In addition to the survey of the models, we also examine the 18 datasets they were trained on and the 9 metrics they were evaluated on. Finally, we present the results of controlled experiments with 6 of these models on a common set of metrics and datasets. The results are mixed and show that some models outperform others on certain datasets, tasks, and metrics. The final section of this paper discusses these results and identifies areas for future research. As researchers continue to innovate new image-to-image GANs, it is important that they gain a thorough understanding of the existing methods, datasets, and metrics. This paper provides a comprehensive overview and discussion to help build this foundation.
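One metric that appears throughout such evaluations is the Fréchet Inception Distance (FID), which compares Gaussians fitted to Inception activations of real and generated images. A minimal sketch of the standard closed-form computation, assuming the means and covariances have already been estimated:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians (mu, sigma) fitted to Inception activations."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```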
Despite significant advances in image-to-image (I2I) translation with generative adversarial networks (GANs), it remains challenging to effectively translate an image into a set of diverse images in multiple target domains using a single pair of generator and discriminator. Existing I2I translation methods adopt multiple domain-specific content encoders, where each domain-specific content encoder is trained only with images from its own domain. Nevertheless, we argue that the content (domain-invariant) features should be learned from images across all domains. Consequently, each domain-specific content encoder of existing schemes fails to extract domain-invariant features efficiently. To address this issue, we propose a flexible and general SoloGAN model for multimodal I2I translation among multiple domains with unpaired data. In contrast to existing methods, SoloGAN uses a single projection discriminator with an additional auxiliary classifier, and shares the encoder and generator across all domains. Consequently, SoloGAN can be trained effectively with images from all domains such that the domain-invariant content representation can be extracted efficiently. Qualitative and quantitative results over a wide range of datasets, against multiple counterparts and variants of SoloGAN, demonstrate the merits of the method, especially for challenging I2I translation datasets, i.e., datasets involving extreme shape variations or needing to keep complex backgrounds unchanged after translation. Furthermore, we demonstrate the contribution of each component in SoloGAN through ablation studies.
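A sketch of the discriminator design the abstract names: a projection discriminator (after Miyato & Koyama's projection cGAN) with an extra auxiliary domain-classification head. The backbone and layer sizes are placeholders, not SoloGAN's exact architecture.

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_domains: int):
        super().__init__()
        self.backbone = backbone                          # shared image feature extractor
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional real/fake head
        self.embed = nn.Embedding(num_domains, feat_dim)  # projection term
        self.aux = nn.Linear(feat_dim, num_domains)       # auxiliary domain classifier

    def forward(self, x: torch.Tensor, domain: torch.Tensor):
        phi = self.backbone(x)                            # (B, feat_dim), assumed pooled
        adv = self.psi(phi) + (self.embed(domain) * phi).sum(dim=1, keepdim=True)
        return adv, self.aux(phi)                         # adversarial score, domain logits
```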
Swapping Autoencoder achieves state-of-the-art performance in deep image manipulation and image-to-image translation. We improve upon this work by introducing a simple yet effective auxiliary module based on gradient reversal layers. The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information, without using semantic masks. To manipulate an image, we encode both the geometry of the objects and the general style of the input image into two latent codes, with an additional constraint that enforces structural consistency. Moreover, training time is significantly reduced thanks to the auxiliary loss. The superiority of the proposed model is demonstrated on complex domains, such as satellite images, where state-of-the-art methods are known to struggle. Finally, we show that our model improves quality metrics across a wide range of datasets while achieving results comparable to multi-modal image generation techniques.
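The gradient reversal layer (Ganin & Lempitsky) the auxiliary module builds on is small enough to sketch in full: it is the identity on the forward pass and negates (and optionally scales) gradients on the backward pass. A minimal PyTorch version:

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)          # identity forward

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reversed, scaled gradient

def grad_reverse(x, lambd: float = 1.0):
    return GradReverse.apply(x, lambd)
```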
Figure 1. Multi-domain image-to-image translation results on the CelebA dataset via transferring knowledge learned from the RaFD dataset. The first and sixth columns show input images while the remaining columns are images generated by StarGAN. Note that the images are generated by a single generator network, and facial expression labels such as angry, happy, and fearful are from RaFD, not CelebA.
Generative Adversarial Networks (GANs) have opened a new direction for tackling the image-to-image transformation problem. Different GANs use generator and discriminator networks with different losses in the objective function. There is still a gap to fill in terms of both the quality of the generated images and their closeness to the ground truth images. In this work, we introduce a new image-to-image transformation network named Cyclic Discriminative Generative Adversarial Networks (CDGAN) that fills the aforementioned gap. In addition to the original architecture of CycleGAN, the proposed CDGAN generates higher-quality and more realistic images by incorporating additional discriminator networks for the cycled images. The proposed CDGAN is tested on three image-to-image transformation datasets. The quantitative and qualitative results are analyzed and compared with state-of-the-art methods. The proposed CDGAN outperforms the state-of-the-art methods on all three baseline image-to-image transformation datasets. The code is available at https://github.com/kishankancharagunta/cdgan.
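A sketch of the core idea on top of a CycleGAN setup: the cycle-reconstructed image gets its own adversarial term rather than only an L1 cycle loss. The module names (G_AB, G_BA, D_cyc_A) and the gan_loss callable are placeholders, not CDGAN's exact code.

```python
def cyclic_discriminative_loss(G_AB, G_BA, D_cyc_A, real_A, gan_loss):
    fake_B = G_AB(real_A)      # A -> B translation
    cycled_A = G_BA(fake_B)    # B -> back to A (cycle reconstruction)
    # Standard CycleGAN applies only an L1 cycle loss here; CDGAN additionally
    # asks a dedicated discriminator to judge the cycled image as real A.
    return gan_loss(D_cyc_A(cycled_A), target_is_real=True)
```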
Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We have utilized StyleGAN2 with ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data when compared to the existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with tumors, which can mitigate small sample size issues. The approach addresses the limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data.
Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
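A minimal sketch of the translation step the MUNIT abstract describes: keep the source image's content code and swap in a style code sampled from the target domain's prior, yielding a different output per sample. The encoder/decoder objects are assumed to follow the paper's decomposition; the style dimension of 8 is MUNIT's default.

```python
import torch

def translate(content_encoder_a, decoder_b, image_a, style_dim: int = 8):
    content = content_encoder_a(image_a)               # domain-invariant content code
    style_b = torch.randn(image_a.size(0), style_dim)  # random target-domain style
    return decoder_b(content, style_b)                 # diverse outputs per draw
```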
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available. To facilitate GAN training, current methods propose to use data-specific augmentation techniques. Despite the effectiveness, it is difficult for these methods to scale to practical applications. In this work, we present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks. We first produce augmented samples using the convex combinations of the real samples. Then, we optimize the augmented samples by minimizing the norms of the data scores, i.e., the gradients of the log-density functions. This procedure enforces the augmented samples close to the data manifold. To estimate the scores, we train a deep estimation network with multi-scale score matching. For different image synthesis tasks, we train the score estimation network using different data. We do not require the tuning of the hyperparameters or modifications to the network architecture. The ScoreMix method effectively increases the diversity of data and reduces the overfitting problem. Moreover, it can be easily incorporated into existing GAN models with minor modifications. Experimental results on numerous tasks demonstrate that GAN models equipped with the ScoreMix method achieve significant improvements.
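A sketch of the two ScoreMix steps as described above: (1) mix real samples convexly, (2) refine the mixture by gradient descent on the norm of the estimated score so it moves toward the data manifold. The score_net here stands in for the pre-trained multi-scale score estimation network; step count and learning rate are illustrative.

```python
import torch

def scoremix(x1, x2, score_net, steps: int = 5, lr: float = 0.1):
    lam = torch.rand(x1.size(0), 1, 1, 1)              # convex combination weights
    x = (lam * x1 + (1 - lam) * x2).requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score_norm = score_net(x).flatten(1).norm(dim=1).mean()
        score_norm.backward()                          # minimize the data-score norm
        opt.step()
    return x.detach()                                  # augmented, near-manifold samples
```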
Generative adversarial networks (GANs) have been trained to become professional artists capable of creating stunning artworks, e.g., for face generation and image style transfer. In this paper, we focus on a realistic business scenario: the automatic generation of customizable icons given desired mobile applications and theme styles. We first introduce a theme-application icon dataset, termed AppIcon, where each icon has two orthogonal theme and app labels. By studying a strong StyleGAN baseline, we observe mode collapse caused by the entanglement of the orthogonal labels. To solve this challenge, we propose IconGAN, composed of a conditional generator and dual discriminators with orthogonal augmentations, and further design a contrastive feature disentanglement strategy to regularize the feature spaces of the two discriminators. Compared with other approaches, IconGAN demonstrates its advantage on the AppIcon benchmark. Further analysis also justifies the effectiveness of disentangling the app and theme representations. Our project will be released at: https://github.com/architect-road/icongan.
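A hypothetical stand-in for the contrastive disentanglement term: discriminator features of icons sharing a label are pulled together and others pushed apart, here in a supervised-contrastive form. This is an illustrative assumption, not IconGAN's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau
    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))    # anchor never matches itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor.mean()
```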
Existing deep networks for histopathology image synthesis cannot generate accurate boundaries for clustered nuclei and cannot output image styles consistent with different organs. To address these issues, we propose a style-guided instance-adaptive normalization (SIAN) to synthesize realistic color distributions and textures for different organs. SIAN contains four phases: semantization, stylization, instantiation, and modulation. These four phases work together and are integrated into a generative network to embed image semantics, style, and instance-level boundaries. Experimental results demonstrate the effectiveness of all components in SIAN and show that the proposed method outperforms state-of-the-art conditional GANs for histopathology image synthesis in terms of the Fréchet Inception Distance (FID), structural similarity index (SSIM), detection quality (DQ), segmentation quality (SQ), and panoptic quality (PQ). Furthermore, the performance of segmentation networks can be significantly improved by incorporating synthetic images generated using SIAN.
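An illustrative stand-in for the modulation phase: instance-normalize the feature map, then scale and shift it with parameters predicted from a style vector, in the spirit of AdaIN. SIAN's actual layers are more involved; this only sketches the style-conditioned normalization pattern.

```python
import torch
import torch.nn as nn

class StyleModulation(nn.Module):
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, 2 * channels)   # predicts gamma, beta

    def forward(self, feat: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.affine(style).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)          # broadcast over H, W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(feat) + beta
```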
This work proposes a framework developed to generalize Critical Heat Flux (CHF) detection classification models using an Unsupervised Image-to-Image (UI2I) translation model. The framework enables a typical classification model that was trained and tested on boiling images from domain A to predict boiling images coming from domain B that was never seen by the classification model. This is done by using the UI2I model to transform the domain B images to look like the domain A images that the classification model is familiar with. Although a CNN was used as the classification model and Fixed-Point GAN (FP-GAN) was used as the UI2I model, the framework is model agnostic. This means the framework can generalize any image classification model type, making it applicable to a variety of similar applications and not limited to the boiling crisis detection problem. It also means that the more UI2I models advance, the better the framework will perform.
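The inference path is simple enough to sketch: domain-B images are first mapped into domain A by the UI2I model, then scored by the unchanged classifier. The names ui2i_b_to_a and classifier are placeholders for any FP-GAN-like translator and CNN, respectively.

```python
def predict_cross_domain(images_b, ui2i_b_to_a, classifier):
    images_a_like = ui2i_b_to_a(images_b)   # make B images look like domain A
    return classifier(images_a_like)        # classifier only ever sees A-style input
```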
Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating synthetic large-scale datasets of photorealistic (visible light and near-infrared) ocular images, together with corresponding segmentation labels, to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) component that produces semantic annotations by exploiting latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models (in a privacy-aware manner) that perform well across multiple real-world datasets. The source code for the BiOcularGAN framework is publicly available at https://github.com/dariant/BiOcularGAN.
Automatic font generation without human experts is a practical and significant problem, especially for languages that consist of a large number of characters. Existing methods for font generation often rely on supervised learning. They require a large number of paired data, which are labor-intensive and expensive to collect. In contrast, common unsupervised image-to-image translation methods are not applicable to font generation, as they often define style as the set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of FDSC are fed into a mixer to generate the final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarities and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to the adversarial loss, two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of FDSC and the adopted loss functions, our model is able to maintain spatial information and generate high-quality character images in an unsupervised manner. Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
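A rough sketch of the FDSC idea: predict per-location displacement (offset) maps from the content features, then sample the low-level feature maps with a deformable convolution. Channel sizes and the offset predictor below are illustrative choices, not DGFont++'s exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class FDSC(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # 2 * k * k offset channels: an (x, y) displacement per kernel tap
        self.offset_pred = nn.Conv2d(channels, 2 * k * k, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)

    def forward(self, content_feat: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(content_feat)       # predicted displacement maps
        return deform_conv2d(content_feat, offsets, self.weight,
                             padding=self.k // 2)      # geometry-aware feature transfer
```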
Histopathology relies on the analysis of microscopic tissue images to diagnose disease. A crucial part of tissue preparation is staining, whereby a dye is used to make the salient tissue components more distinguishable. However, differences in laboratory protocols and scanning devices result in significant confounding appearance variation in the corresponding images. This variation increases both human error and inter-rater variability, and hinders the performance of automatic or semi-automatic methods. In this paper, we introduce an unsupervised adversarial network to translate (and hence normalize) whole slide images across multiple data acquisition domains. Our key contributions are: (i) an adversarial architecture that learns across multiple domains with a single generator-discriminator network, using an information flow branch that optimizes for perceptual loss, and (ii) the inclusion of an additional feature extraction network during training that guides the transformation network to keep all structural features in the tissue image intact. We (i) first demonstrate the effectiveness of the proposed method on H&E slides of 120 cases of kidney cancer, and (ii) show the benefits of the approach on more general problems, such as flexible illumination-based natural image enhancement and light source adaptation.
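A minimal sketch of a perceptual loss guided by a frozen feature extraction network, which is the general pattern contribution (ii) describes. VGG16 is used here as one common choice of extractor; the paper's actual network and layer selection may differ.

```python
import torch
import torchvision.models as models

# Frozen feature extractor (first blocks of VGG16) used only to compare features.
vgg_features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(original: torch.Tensor, translated: torch.Tensor) -> torch.Tensor:
    # Penalize structural feature drift between the slide and its translation.
    return torch.nn.functional.l1_loss(vgg_features(translated), vgg_features(original))
```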
Recently, unsupervised person re-identification (Re-ID) has attracted increasing attention due to its open-world scenario setting, where only limited annotated data is available. Existing supervised methods often fail to generalize well on unseen domains, while unsupervised methods, most of which lack multi-granularity information, are prone to suffer from confirmation bias. In this paper, we aim to find better feature representations on the unseen target domain from two aspects: 1) performing unsupervised domain adaptation from the labeled source domain, and 2) mining potential similarities on the unlabeled target domain. Besides, a collaborative pseudo-labeling strategy is proposed to alleviate the influence of confirmation bias. First, a generative adversarial network is utilized to transfer images from the source domain to the target domain. Moreover, person identity and identity mapping losses are introduced to improve the quality of the generated images. Second, we propose a novel collaborative multiple-feature clustering framework (CMFC) to learn the internal data structure of the target domain, comprising a global feature branch and a partial feature branch. The global feature branch (GB) employs unsupervised clustering on the global features of person images, while the partial feature branch (PB) mines similarities within different body regions. Finally, extensive experiments on two benchmark datasets show the competitive performance of our method under the unsupervised person Re-ID setting.
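A sketch of the clustering ingredients: pseudo-labels from clustering global features, and part features obtained by pooling horizontal stripes of the feature map, a common Re-ID heuristic. CMFC's exact branches and clustering settings differ; DBSCAN parameters here are illustrative.

```python
import torch
from sklearn.cluster import DBSCAN

def pseudo_labels(global_feats: torch.Tensor, eps: float = 0.6):
    # Cluster global features; label -1 marks outliers left unlabeled.
    return DBSCAN(eps=eps, min_samples=4).fit_predict(global_feats.numpy())

def part_features(feat_map: torch.Tensor, num_parts: int = 2):
    # feat_map: (B, C, H, W) -> num_parts horizontal stripes, each average-pooled
    stripes = feat_map.chunk(num_parts, dim=2)
    return [s.mean(dim=(2, 3)) for s in stripes]    # list of (B, C) part features
```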
This work aims at transferring a Generative Adversarial Network (GAN) pre-trained on one image domain to a new domain, referring to as few as just one target image. The main challenge is that, under limited supervision, it is extremely difficult to synthesize photo-realistic and highly diverse images while acquiring the representative characteristics of the target. Different from existing approaches that adopt a vanilla fine-tuning strategy, we import two lightweight modules into the generator and the discriminator, respectively. Concretely, we introduce an attribute adaptor into the generator while freezing its original parameters, through which it can reuse the prior knowledge to the greatest extent and hence maintain the synthesis quality and diversity. We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the corresponding characteristics from the reference. Furthermore, considering the poor diversity of the training data (i.e., as few as only one image), we propose to also constrain the diversity of the generative domain during training, alleviating the optimization difficulty. Our approach produces appealing results under various settings, substantially surpassing state-of-the-art alternatives, especially in terms of synthesis diversity. Notably, our method works well even with large domain gaps, and robustly converges within a few minutes for each experiment.
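A sketch of the adaptation recipe: freeze the pre-trained generator and train only a lightweight adaptor on its intermediate features. The channel-wise scale/shift form of the adaptor below is an assumption for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class AttributeAdaptor(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Initialized to the identity so adaptation starts from the prior.
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat * self.scale + self.shift

def freeze(generator: nn.Module):
    for p in generator.parameters():     # original knowledge stays intact;
        p.requires_grad_(False)          # only the adaptor's parameters are trained
```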