Climate change is increasing the frequency and severity of harmful algal blooms (HABs), which cause mass fish kills in aquaculture farms. This contributes to marine pollution and greenhouse gas (GHG) emissions, as the dead fish are either dumped into the ocean or sent to landfills, which in turn negatively impacts the climate. Currently, the standard method to enumerate harmful algae and other phytoplankton is to manually observe and count them under a microscope. This is a time-consuming, tedious, and error-prone process, resulting in compromised management decisions by farmers. Hence, automating this process for quick and accurate HAB monitoring would be very helpful. However, this requires large and diverse datasets of phytoplankton images, and such datasets are hard to produce quickly. In this work, we explore the feasibility of generating novel high-resolution photorealistic synthetic phytoplankton images, containing multiple species in the same image, given a small dataset of real images. To this end, we employ generative adversarial networks (GANs) to generate the synthetic images. We evaluate three different GAN architectures, ProjectedGAN, FastGAN, and StyleGANv2, using standard image quality metrics. We empirically show the generation of high-fidelity synthetic phytoplankton images using a training dataset of only 961 real images. Thus, this work demonstrates the ability of GANs to create large synthetic datasets of phytoplankton from small training datasets, accomplishing a key step towards sustainable systems for monitoring harmful algal blooms.
Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We have utilized StyleGAN2 with ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data when compared to the existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with a tumor that can mitigate small-sample-size issues. The approach addresses the limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: ~\url{https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data}.
Generative adversarial networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected GAN improves image quality, sample efficiency, and convergence speed. It is further compatible with resolutions of up to one megapixel and advances the state-of-the-art Fr\'echet Inception Distance (FID) on twenty-two benchmark datasets. Importantly, Projected GANs match the previously lowest FIDs up to 40 times faster, cutting the wall-clock time from 5 days to less than 3 hours given the same computational resources.
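The core trick above, discriminating in a fixed pretrained feature space rather than in pixel space, can be sketched in a few lines. This is a minimal illustration in which a frozen random projection stands in for the pretrained network (the actual method uses a pretrained CNN with cross-channel and cross-resolution feature mixing); all dimensions and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor: a fixed random projection standing
# in for a real pretrained CNN (an assumption made for illustration only).
W_frozen = rng.standard_normal((64, 16))  # 64-dim inputs -> 16-dim features

def project(x):
    """Map samples into the fixed feature space; never updated."""
    return np.maximum(x @ W_frozen, 0.0)  # ReLU features

# Trainable discriminator head that operates on projected features only.
w_disc = rng.standard_normal(16) * 0.01

def discriminator(x):
    """Real-valued logit per sample, computed in the projected space."""
    return project(x) @ w_disc

real = rng.standard_normal((8, 64))  # stand-in for real image batches
fake = rng.standard_normal((8, 64))  # stand-in for generated batches

logits_real = discriminator(real)
logits_fake = discriminator(fake)
```

During training only `w_disc` (and the generator) would receive gradient updates; `W_frozen` stays fixed, which is what keeps the feature space stable across training.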
Histopathological analysis is the present gold standard for precancerous lesion diagnosis. The goal of automated histopathological classification from digital images requires supervised training, which requires a large number of expert annotations that can be expensive and time-consuming to collect. Meanwhile, accurate classification of image patches cropped from whole-slide images is essential for standard sliding-window-based histopathology slide classification methods. To mitigate these issues, we propose a carefully designed conditional GAN model, namely HistoGAN, for synthesizing realistic histopathology image patches conditioned on class labels. We also investigate a novel synthetic augmentation framework that selectively adds new synthetic image patches generated by our proposed HistoGAN, rather than directly expanding the training set with synthetic images. By selecting synthetic images based on the confidence of their assigned labels and their feature similarity to real labeled images, our framework provides quality assurance for synthetic augmentation. Our models are evaluated on two datasets: a cervical histopathology image dataset with limited annotations, and another dataset of lymph node histopathology images with metastatic cancer. Here we show that leveraging HistoGAN-generated images with selective augmentation results in significant and consistent improvements in classification performance (6.7% and 2.8% higher accuracy, respectively) on the cervical histopathology and metastatic cancer datasets.
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at $1024^2$. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
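The progressive-growing schedule above depends on fading new layers in smoothly rather than adding them abruptly. A minimal sketch of that fade-in blend, assuming nearest-neighbor upsampling and treating layer outputs as plain arrays (function names are illustrative):

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbor 2x upsampling of an (H, W) image."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(low_res_out, high_res_out, alpha):
    """Blend used while a new resolution layer is faded in.

    alpha ramps from 0 to 1 over the transition: at 0 the network still
    behaves like the lower-resolution model, at 1 the new layer is fully
    active. Shapes: low_res_out is (H, W), high_res_out is (2H, 2W).
    """
    return alpha * high_res_out + (1.0 - alpha) * upsample2x(low_res_out)

rng = np.random.default_rng(0)
low = rng.standard_normal((4, 4))   # e.g. output of the 4x4 stage
high = rng.standard_normal((8, 8))  # output of the newly added 8x8 layer

out_start = faded_output(low, high, alpha=0.0)  # equals upsampled low-res
out_end = faded_output(low, high, alpha=1.0)    # equals the new layer
```

In training, `alpha` would be increased gradually over the transition period, with a matching blend applied on the discriminator side as resolutions grow.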
Generative adversarial networks (GANs) are one of the state-of-the-art generative models for realistic image synthesis. While training and evaluating GANs become increasingly important, the current GAN research ecosystem does not provide reliable benchmarks for which the evaluation is conducted consistently. Furthermore, because there are few validated GAN implementations, researchers devote considerable time to reproducing baselines. We study the taxonomy of GAN approaches and present a new open-source library named StudioGAN. StudioGAN supports 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 3 differentiable augmentations, 7 evaluation metrics, and 5 evaluation backbones. With our training and evaluation protocol, we present a large-scale benchmark using various datasets (CIFAR10, ImageNet, AFHQv2, FFHQ, and Baby/Papa/Grandpa-ImageNet) and 3 different evaluation backbones (InceptionV3, SwAV, and Swin Transformer). Unlike other benchmarks used in the GAN community, we train representative GANs, including BigGAN, StyleGAN2, and StyleGAN3, in a unified training pipeline and quantify generation performance with 7 evaluation metrics. The benchmark also evaluates other cutting-edge generative models (e.g., StyleGAN-XL, ADM, MaskGIT, and RQ-Transformer). StudioGAN provides GAN implementations, training, and evaluation scripts with pre-trained weights. StudioGAN is available at https://github.com/postech-cvlab/pytorch-studiogan.
It is well known that a classification model performs effectively only if the datasets used for training and testing satisfy some specific requirements. In other words, the larger, more balanced, and more representative the dataset is, the more one can trust the proposed model's effectiveness and, consequently, the obtained results. Unfortunately, large-size anonymous datasets are generally not publicly available in biomedical applications, especially those dealing with pathological human face images. This concern makes deep-learning-based approaches challenging to deploy and difficult to reproduce or verify some published results. In this paper, we suggest an efficient method to generate a realistic anonymous synthetic dataset of human faces with the attributes of acne disorders corresponding to three levels of severity (i.e. Mild, Moderate and Severe). Therefore, a specific hierarchical StyleGAN-based algorithm trained at distinct levels is considered. To evaluate the performance of the proposed scheme, we consider a CNN-based classification system, trained using the generated synthetic acneic face images and tested using authentic face images. Consequently, we show that an accuracy of 97.6\% is achieved using InceptionResNetv2. As a result, this work allows the scientific community to employ the generated synthetic dataset for any data processing application without restrictions on legal or ethical concerns. Moreover, this approach can also be extended to other applications requiring the generation of synthetic medical images. We can make the code and the generated dataset accessible for the scientific community.
The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
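The explicit, non-parametric manifold representation described above can be sketched with k-nearest-neighbor hyperspheres: a query sample counts as covered if it falls within the k-NN radius of any support sample. This is a simplified illustration in raw Euclidean space (the actual metric operates in a learned feature space); `manifold_coverage` is an illustrative name:

```python
import numpy as np

def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)  # column 0 is the zero self-distance
    return d_sorted[:, k]

def manifold_coverage(queries, support, k=3):
    """Fraction of queries inside the k-NN hypersphere of any support point."""
    radii = knn_radii(support, k)
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 8))
fake_good = rng.standard_normal((200, 8))        # matches the real distribution
fake_bad = rng.standard_normal((200, 8)) + 50.0  # a far-off spurious mode

precision_good = manifold_coverage(fake_good, real)  # generated vs. real manifold
precision_bad = manifold_coverage(fake_bad, real)
recall = manifold_coverage(real, fake_good)          # real vs. generated manifold
```

Evaluating `manifold_coverage(fake, real)` estimates precision (sample quality), while the symmetric `manifold_coverage(real, fake)` estimates recall (coverage of the real distribution), which is what lets the two aspects be measured separately.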
Generative adversarial networks (GANs) have recently introduced effective methods for performing image-to-image translation. These models can be applied to a variety of domains in image-to-image translation without changing any parameters. In this paper, we survey and analyze eight image-to-image generative adversarial networks: Pix2Pix, CycleGAN, CoGAN, StarGAN, MUNIT, StarGAN2, DA-GAN, and Self-Attention GAN. Each of these models presented state-of-the-art results and introduced new techniques for building image-to-image GANs. In addition to a survey of the models, we also survey the 18 datasets they were trained on and the 9 metrics they were evaluated on. Finally, we present the results of a controlled experiment for 6 of these models on a common set of metrics and datasets. The results were mixed and show that, on certain datasets, tasks, and metrics, some models outperform others. The last section of this paper discusses those results and establishes areas of future research. As researchers continue to innovate new image-to-image GANs, it is important that they gain a good understanding of the existing methods, datasets, and metrics. This paper provides a comprehensive overview and discussion to help build this foundation.
3D-aware image synthesis focuses on preserving spatial consistency besides generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate a generative NeRF and show remarkable achievement, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated with three image datasets, AFHQ, CelebA, and Cars. As a result, our model shows strong 3D-consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF exhibits a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis with a $\text{128}^{2}$ resolution. Additionally, we provide FIDs of generated 3D-aware images of each class of the datasets as it is possible to synthesize class-conditional images with $\text{C}^{3}$G-NeRF.
Existing deep networks for histopathology image synthesis cannot generate accurate boundaries for clustered nuclei and cannot output image styles consistent with different organs. To address these issues, we propose a style-guided instance-adaptive normalization (SIAN) to synthesize realistic color distributions and textures for different organs. SIAN contains four phases: semantization, stylization, instantiation, and modulation. The four phases work together and are integrated into a generative network to embed image semantics, style, and instance-level boundaries. Experimental results demonstrate the effectiveness of all components in SIAN, and show that the proposed method outperforms state-of-the-art conditional GANs for histopathology image synthesis in terms of Fr\'echet Inception Distance (FID), structural similarity index (SSIM), detection quality (DQ), segmentation quality (SQ), and panoptic quality (PQ). Furthermore, the performance of a segmentation network can be significantly improved by incorporating synthetic images generated using SIAN.
Generating new fonts is a time-consuming and labor-intensive task, especially in a language with a huge number of characters such as Chinese. Various deep learning models have demonstrated the ability to efficiently generate new fonts from a few reference characters in the target style. This project aims to develop a few-shot cross-lingual font generator based on AGIS-Net and to improve on its performance metrics. Our approaches include redesigning the encoder and the loss function. We validate our method on multiple languages and datasets.
For stable training of generative adversarial networks (GANs), injecting instance noise into the discriminator input is considered a theoretically sound solution which, however, has not yet delivered on its promise in practice. This paper introduces Diffusion-GAN, which employs a Gaussian-mixture distribution, defined over all the diffusion steps of a forward diffusion chain, to inject instance noise. A random sample from the mixture, diffused from an observed or generated data point, is fed as the input to the discriminator. The generator is updated by backpropagating its gradient through the forward diffusion chain, whose length is adaptively adjusted to control the maximum noise-to-data ratio allowed at each training step. Theoretical analysis verifies the soundness of the proposed Diffusion-GAN, which provides model- and domain-agnostic differentiable augmentation. A rich set of experiments on diverse datasets shows that Diffusion-GAN can provide stable and data-efficient GAN training, bringing consistent performance improvements over strong GAN baselines for synthesizing photorealistic images.
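The instance-noise mechanism described above can be sketched as follows: each discriminator input is forward-diffused to an independently drawn step t, so the effective noise distribution is a Gaussian mixture over all steps. A minimal illustration with a fixed linear beta schedule (the actual method adapts the maximum step during training; all constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

T_MAX = 50  # current maximum diffusion step (adapted adaptively in the paper)
betas = np.linspace(1e-4, 0.02, T_MAX)
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t for t = 1..T_MAX

def diffuse(x, t):
    """Forward-diffuse a batch x to step t (1-indexed):
    x_t = sqrt(abar_t) * x + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t - 1]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)

def discriminator_input(x):
    """Instance noise drawn from the Gaussian mixture over all diffusion
    steps: each sample gets its own step t ~ U{1, ..., T_MAX}."""
    t = rng.integers(1, T_MAX + 1, size=len(x))
    a = alpha_bars[t - 1][:, None]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)

x = rng.standard_normal((256, 32))  # stand-in for real or generated images
x_noisy = discriminator_input(x)    # what the discriminator actually sees
```

Because larger t implies a smaller `alpha_bars[t-1]`, raising `T_MAX` raises the maximum allowed noise-to-data ratio, which is the knob the adaptive schedule controls.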
Deep long-tailed learning aims to train useful deep networks on practical, real-world imbalanced distributions, wherein most labels of the tail classes are associated with only a few samples. There has been a large body of work to train discriminative models for visual recognition on long-tailed distributions. In contrast, we aim to train conditional generative adversarial networks, a class of image generation models, on long-tailed distributions. We find that, similar to recognition, state-of-the-art methods for image generation also suffer from performance degradation on tail classes. The performance degradation is mainly due to class-specific mode collapse for tail classes, which we observe to be correlated with a spectral explosion of the conditioning parameter matrix. We propose a novel group Spectral Regularizer (gSR) that prevents the spectral explosion, alleviating mode collapse, which results in diverse and plausible image generation even for tail classes. We find that gSR effectively combines with existing augmentation and regularization techniques, leading to state-of-the-art image generation performance on long-tailed data. Extensive experiments demonstrate the efficacy of our regularizer on long-tailed datasets with different degrees of imbalance.
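The spectral explosion described above can be monitored, and penalized, via the spectral norm of each class's conditioning matrix, estimated cheaply by power iteration. The following is a heavily simplified stand-in, assuming one dense matrix per class and a sum-of-squared-spectral-norms penalty; the actual gSR groups conditional-normalization parameters and its exact penalty form differs:

```python
import numpy as np

def spectral_norm(W, n_iter=100):
    """Largest singular value of W, estimated by power iteration."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)  # approx. sigma_max at convergence

def group_spectral_penalty(grouped_weights):
    """Illustrative regularizer on per-class conditioning matrices.

    grouped_weights: array of shape (num_classes, out_dim, in_dim), i.e.
    one conditioning matrix per class (a simplified stand-in for grouped
    conditional-BatchNorm parameters).
    """
    return sum(spectral_norm(W) ** 2 for W in grouped_weights)

rng = np.random.default_rng(1)
weights = rng.standard_normal((5, 8, 8))  # 5 classes, 8x8 matrices
penalty = group_spectral_penalty(weights)

# Cross-check the power-iteration estimate against an exact SVD.
exact = np.linalg.svd(weights[0], compute_uv=False)[0]
```

Adding such a penalty to the generator loss discourages any single class's conditioning matrix from blowing up, which is the failure mode the paper links to tail-class mode collapse.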
Generative adversarial networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of graphics processing units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models either cannot scale to high resolution or are prone to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by using different configurations between training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: First, the memory demand for training on high-resolution images is amortized among sub-volumes. Furthermore, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D chest CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation. We further demonstrate clinical applications of the proposed model in data augmentation and clinically relevant feature extraction.
Synthesizing realistic images from text descriptions is a major challenge in computer vision. Current text-to-image synthesis methods fall short of producing high-resolution images that represent the text descriptors. Most existing studies rely on generative adversarial networks (GANs) or variational autoencoders (VAEs). GANs have the ability to produce sharper images but lack diversity in their outputs, whereas VAEs are good at producing diverse outputs, but the generated images are often blurry. Taking into account the relative advantages of GANs and VAEs, we propose a new architecture combining a conditional VAE (CVAE) and a conditional GAN (CGAN) for synthesizing images conditioned on text descriptions. This study uses the conditional VAE as an initial generator to produce a high-level sketch of the text descriptor. This high-level sketch output from the first stage, together with the text descriptor, is used as the input to the conditional GAN network. The second-stage GAN produces 256x256 high-resolution images. The proposed architecture benefits from conditioning augmentation and residual blocks in the conditional GAN network to achieve its results. Multiple experiments were conducted using the CUB and Oxford-102 datasets, and the results of the proposed approach were compared with state-of-the-art techniques such as StackGAN. The experiments show that the proposed method generates high-resolution images conditioned on text descriptions and yields competitive results based on Inception and Fr\'echet Inception scores on both datasets.
Generative adversarial networks (GANs) are able to generate images that are visually indistinguishable from real images. However, recent studies have shown that generated and real images share significant differences in the frequency domain. In this paper, we explore the influence of high-frequency components in GAN training. According to our observations, during the training of most GANs, severe high-frequency differences make the discriminator focus on excessive high-frequency components, hindering the generator from fitting the low-frequency components that are important for learning image content. We then propose two simple yet effective frequency operations to eliminate the side effects caused by the high-frequency differences in GAN training: high-frequency confusion (HFC) and high-frequency filter (HFF). The proposed operations are general and can be applied to most existing GANs at a small fraction of the cost. The superior performance of the proposed operations is verified across multiple loss functions, network architectures, and datasets. Specifically, the proposed HFF decreases FID by 13.2\% on CelebA (128x128) unconditional generation based on SNGAN, by 30.2\% based on SSGAN, and by 69.3\% based on InfoMaxGAN.
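The high-frequency filter (HFF) operation described above amounts to a low-pass filter in the Fourier domain: zero out the spectral coefficients beyond some radius before the image reaches the discriminator. A minimal sketch assuming a hard radial cutoff on a single-channel image (the cutoff value is illustrative):

```python
import numpy as np

def high_frequency_filter(img, cutoff):
    """Low-pass an (H, W) image by zeroing FFT coefficients whose radial
    frequency exceeds `cutoff` (an HFF-style operation: remove the
    high-frequency components)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)  # radius from the DC component
    f[r > cutoff] = 0.0
    return np.fft.ifft2(np.fft.ifftshift(f)).real

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))  # stand-in for a real or generated image

smooth = high_frequency_filter(img, cutoff=4)

# Total variation drops after removing high frequencies.
tv = lambda x: np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
```

With a sufficiently large cutoff the filter becomes the identity, so the operation degrades gracefully; applying it to both real and generated batches keeps the discriminator's inputs comparable.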
Image-to-image translation (I2I) is a challenging computer vision problem used in numerous domains for multiple tasks. Recently, ophthalmology became one of the major fields where the application of I2I is rapidly increasing. One such application is the generation of synthetic retinal optical coherence tomography (OCT) scans. Existing I2I methods require training multiple models to translate images from normal scans to a specific pathology, limiting their use due to their complexity. To address this issue, we propose an unsupervised multi-domain I2I network with a pre-trained style encoder that translates retinal OCT images in one domain to multiple domains. We assume that the image splits into domain-invariant content and domain-specific style codes, and pre-train these style codes. The performed experiments show that the proposed model outperforms state-of-the-art models like MUNIT and CycleGAN in synthesizing diverse pathological scans.
Digital art has gained an unprecedented level of popularity with the emergence of non-fungible tokens (NFTs). NFTs are cryptographic assets stored on a blockchain network that represent digital certificates of ownership which cannot be forged. NFTs can be incorporated into smart contracts, allowing owners to benefit from a percentage of future sales. While digital art producers can benefit greatly from NFTs, their production is time-consuming. Therefore, this paper explores the possibility of using generative adversarial networks (GANs) for the automatic generation of digital art. GANs are deep learning architectures that are widely and effectively used for the synthesis of audio, image, and video content. However, their application to NFT art has been limited. In this paper, a GAN-based architecture is implemented and evaluated for digital art generation. The results of a qualitative case study indicate that the generated artworks are comparable to real samples.
The success of Deep Learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrary large datasets, but diversity and fidelity are limited, which has recently been addressed by denoising diffusion probabilistic models (DDPMs) whose superiority has been demonstrated on natural images. In this study, we propose Medfusion, a conditional latent DDPM for medical images. We compare our DDPM-based model against GAN-based models, which constitute the current state-of-the-art in the medical domain. Medfusion was trained and compared with (i) StyleGan-3 on n=101,442 images from the AIROGS challenge dataset to generate fundoscopies with and without glaucoma, (ii) ProGAN on n=191,027 from the CheXpert dataset to generate radiographs with and without cardiomegaly and (iii) wGAN on n=19,557 images from the CRCMS dataset to generate histopathological images with and without microsatellite stability. In the AIROGS, CRCMS, and CheXpert datasets, Medfusion achieved lower (=better) FID than the GANs (11.63 versus 20.43, 30.03 versus 49.26, and 17.28 versus 84.31). Also, fidelity (precision) and diversity (recall) were higher (=better) for Medfusion in all three datasets. Our study shows that DDPMs are a superior alternative to GANs for image synthesis in the medical domain.
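The FID scores reported in this study (and in most of the abstracts above) are the Fr\'echet distance between two Gaussians fitted to feature embeddings of real and generated images. A minimal sketch of the computation, using random vectors as a stand-in for InceptionV3 features:

```python
import numpy as np

def fid(feat_a, feat_b):
    """Frechet Inception Distance between two feature sets:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    c_a = np.cov(feat_a, rowvar=False)
    c_b = np.cov(feat_b, rowvar=False)
    # Tr((C_a C_b)^{1/2}) via the eigenvalues of the product, which are
    # real and non-negative for PSD covariance matrices.
    eigvals = np.linalg.eigvals(c_a @ c_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(c_a) + np.trace(c_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))
b = rng.standard_normal((500, 16))        # same distribution -> small FID
c = rng.standard_normal((500, 16)) + 3.0  # shifted distribution -> large FID

fid_close = fid(a, b)
fid_far = fid(a, c)
```

Lower is better: identical feature distributions give an FID near zero, and shifting or distorting the generated distribution inflates both the mean and covariance terms, which is why the Medfusion-versus-GAN comparisons above read "lower (=better)".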