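Image deblurring is an ill-posed problem with multiple plausible solutions for a given input image. However, most existing methods produce a deterministic estimate of the clean image and are trained to minimize pixel-level distortion. These metrics are known to correlate poorly with human perception and often lead to unrealistic reconstructions. We present an alternative framework for blind deblurring based on conditional diffusion models. Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a given input. This leads to a significant improvement in perceptual quality over existing state-of-the-art methods across multiple standard benchmarks. Our predict-and-refine approach also enables much more efficient sampling than typical diffusion models. Combined with a carefully tuned network architecture and inference procedure, our method is competitive in terms of distortion metrics such as PSNR. These results show clear benefits of our diffusion-based method for deblurring and challenge the widely used strategy of producing a single, deterministic reconstruction.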
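The predict-and-refine idea can be illustrated with a small sketch. It assumes, purely for illustration, that the stochastic sampler produces an additive residual on top of the deterministic prediction; the abstract only states that the sampler refines the predictor's output. The names `predict_and_refine`, `toy_predictor`, and `toy_residual_sampler` are hypothetical, and the toy functions stand in for trained networks.

```python
import random

def predict_and_refine(blurry, predictor, sample_residual, num_samples, seed=0):
    """Return one deterministic estimate plus several stochastic refinements."""
    rng = random.Random(seed)
    initial = predictor(blurry)  # single deterministic estimate
    refined = []
    for _ in range(num_samples):
        # Assumption: the sampler outputs a residual that is added to the estimate.
        residual = sample_residual(blurry, initial, rng)
        refined.append([p + r for p, r in zip(initial, residual)])
    return initial, refined

# Hypothetical stand-ins for trained models, used only to make the sketch runnable.
def toy_predictor(blurry):
    return list(blurry)

def toy_residual_sampler(blurry, initial, rng):
    return [rng.gauss(0.0, 0.05) for _ in initial]

initial, candidates = predict_and_refine(
    [0.2, 0.4, 0.6], toy_predictor, toy_residual_sampler, num_samples=3
)
```

Each entry of `candidates` is a different plausible reconstruction of the same input, which is the property that a single deterministic estimate cannot offer.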
translated by Google Translate
Diffusion Probabilistic Models (DPMs) have recently been employed for image deblurring. DPMs are trained via a stochastic denoising process that maps Gaussian noise to the high-quality image, conditioned on the concatenated blurry input. Despite their high-quality generated samples, image-conditioned Diffusion Probabilistic Models (icDPMs) rely on synthetic pairwise training data (in-domain), with potentially unclear robustness towards real-world unseen images (out-of-domain). In this work, we investigate the generalization ability of icDPMs in deblurring, and propose a simple but effective guidance to significantly alleviate artifacts and improve the out-of-distribution performance. In particular, we propose to first extract a multiscale domain-generalizable representation from the input image that removes domain-specific information while preserving the underlying image structure. The representation is then added to the feature maps of the conditional diffusion model as extra guidance that helps improve generalization. To benchmark, we focus on out-of-distribution performance by applying a model trained on a single dataset to three external and diverse test sets. The effectiveness of the proposed formulation is demonstrated by improvements over the standard icDPM, as well as state-of-the-art performance on perceptual quality and competitive distortion metrics compared to existing methods.
Conditional diffusion probabilistic models can model the distribution of natural images and can generate diverse and realistic samples based on given conditions. However, oftentimes their results can be unrealistic with observable color shifts and textures. We believe that this issue results from the divergence between the probabilistic distribution learned by the model and the distribution of natural images. The delicate conditions gradually enlarge the divergence during each sampling timestep. To address this issue, we introduce a new method that brings the predicted samples to the training data manifold using a pretrained unconditional diffusion model. The unconditional model acts as a regularizer and reduces the divergence introduced by the conditional model at each sampling step. We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks. The improvements obtained by our method suggest that the priors can be incorporated as a general plugin for improving conditional diffusion models.
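Although many long-range imaging systems are designed to support extended vision applications, a natural obstacle to their operation is degradation due to atmospheric turbulence. Atmospheric turbulence causes significant degradation of image quality by introducing blur and geometric distortion. In recent years, various deep learning-based single-image atmospheric turbulence mitigation methods, including CNN-based and GAN inversion-based methods, have been proposed in the literature to remove the distortion in the image. However, some of these methods are difficult to train, often fail to reconstruct facial features, and produce unrealistic results, especially in the case of high turbulence. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained some traction because of their stable training process and their ability to generate high-quality images. In this paper, we propose the first DDPM-based solution for the problem of atmospheric turbulence mitigation. We also propose a fast sampling technique for reducing the inference time of conditional DDPMs. Extensive experiments are conducted on synthetic and real-world data to show the significance of our model. To facilitate further research, all code and pretrained models will be made public after the review process.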
While deep learning-based methods for blind face restoration have achieved unprecedented success, they still suffer from two major limitations. First, most of them deteriorate when facing complex degradations out of their training data. Second, these methods require multiple constraints, e.g., fidelity, perceptual, and adversarial losses, which require laborious hyper-parameter tuning to stabilize and balance their influences. In this work, we propose a novel method named DifFace that is capable of coping with unseen and complex degradations more gracefully without complicated loss designs. The key of our method is to establish a posterior distribution from the observed low-quality (LQ) image to its high-quality (HQ) counterpart. In particular, we design a transition distribution from the LQ image to the intermediate state of a pre-trained diffusion model and then gradually transmit from this intermediate state to the HQ target by recursively applying a pre-trained diffusion model. The transition distribution only relies on a restoration backbone that is trained with $L_2$ loss on some synthetic data, which favorably avoids the cumbersome training process in existing methods. Moreover, the transition distribution can contract the error of the restoration backbone and thus makes our method more robust to unknown degradations. Comprehensive experiments show that DifFace is superior to current state-of-the-art methods, especially in cases with severe degradations. Our code and model are available at https://github.com/zsyOAOA/DifFace.
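Image restoration under adverse weather conditions has been of significant interest for various computer vision applications. Recent successful methods rely on the current progress in deep neural network architectural designs (e.g., with vision transformers). Motivated by the recent progress achieved with state-of-the-art conditional generative models, we present a novel patch-based image restoration algorithm based on denoising diffusion probabilistic models. Our patch-based diffusion modeling approach enables size-agnostic image restoration by using a guided denoising process with smoothed noise estimates across overlapping patches during inference. We empirically evaluate our model on benchmark datasets for image desnowing, combined deraining and dehazing, and raindrop removal. We demonstrate that our approach achieves state-of-the-art performance on both weather-specific and multi-weather image restoration, and qualitatively show strong generalization to real-world test images.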
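The smoothed noise estimate across overlapping patches can be sketched as a per-pixel average of patch-wise predictions. This is a minimal illustration rather than the authors' implementation: `patch_origins`, `smoothed_noise_estimate`, and the `estimate_patch_noise` callback are hypothetical names, and the callback stands in for a trained noise-prediction network that only ever sees fixed-size patches.

```python
def patch_origins(size, patch, stride):
    """Offsets along one axis so that patches of length `patch` cover [0, size)."""
    origins = list(range(0, size - patch + 1, stride))
    if origins[-1] != size - patch:
        origins.append(size - patch)  # extra patch flush with the far border
    return origins

def smoothed_noise_estimate(image, patch, stride, estimate_patch_noise):
    """Average the per-patch noise estimates over every pixel they overlap."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in patch_origins(h, patch, stride):
        for left in patch_origins(w, patch, stride):
            crop = [row[left:left + patch] for row in image[top:top + patch]]
            eps = estimate_patch_noise(crop)  # patch-sized noise estimate
            for i in range(patch):
                for j in range(patch):
                    total[top + i][left + j] += eps[i][j]
                    count[top + i][left + j] += 1
    return [[total[i][j] / count[i][j] for j in range(w)] for i in range(h)]

# Toy usage: an identity "estimator" on a 5x7 image with 3x3 patches.
image = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
smoothed = smoothed_noise_estimate(image, patch=3, stride=2,
                                   estimate_patch_noise=lambda crop: crop)
```

Because the estimator operates on patches of a fixed size while the averaging covers the whole grid, the same procedure applies to images of any size at least as large as one patch, which is what makes the restoration size-agnostic.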
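Diffusion models have demonstrated impressive image generation performance and have been used in various computer vision tasks. Unfortunately, image generation using diffusion models is very time-consuming since it requires thousands of sampling steps. To address this problem, we propose a novel pyramidal diffusion model that generates high-resolution images starting from much coarser-resolution images using a single score function trained with a positional embedding. This enables time-efficient sampling for image generation and also solves the low-batch-size problem when training with limited resources. Furthermore, we show that a single score function can be used efficiently for multi-scale super-resolution problems.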
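By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days, and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation makes it possible for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs. Code is available at https://github.com/CompVis/latent-diffusion.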
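Machine learning models are commonly trained end-to-end and in a supervised setting, using paired (input, output) data. Examples include recent super-resolution methods that train on pairs of (low-resolution, high-resolution) images. However, these end-to-end approaches require retraining every time there is a distribution shift in the inputs (e.g., night images vs. daylight) or in relevant latent variables (e.g., camera blur or hand motion). In this work, we leverage state-of-the-art (SOTA) generative models (here StyleGAN2) to build powerful image priors, which enable the application of Bayes' theorem to many downstream reconstruction tasks. Our method, Bayesian Reconstruction through Generative Models (BRGM), uses a single pre-trained generator model to solve different image restoration tasks, i.e., super-resolution and inpainting, by combining it with different forward corruption models. We keep the weights of the generator model fixed and reconstruct the image by estimating the input latent vector that generates the reconstructed image. We further use variational inference to approximate the posterior distribution over the latent vectors, from which we sample multiple solutions. We demonstrate BRGM on three large and diverse datasets: (i) 60,000 images from the Flickr Faces High Quality dataset, (ii) 240,000 chest X-rays from MIMIC III, and (iii) a combined collection of 5 brain MRI datasets with 7,329 scans. Across all three datasets and without any dataset-specific hyperparameter tuning, our simple approach yields performance competitive with current task-specific state-of-the-art methods on super-resolution and inpainting, while being more stable and without requiring any training. Our source code and pre-trained models are available online: https://razvanmarinescu.github.io/brgm/.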
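Diffusion models are a new class of generative models that mark a milestone in high-quality image generation while relying on solid probabilistic principles. This makes them promising candidate models for neural image compression. This paper outlines an end-to-end optimized framework based on a conditional diffusion model. Besides the latent variables inherent to the diffusion process, the model introduces an additional "content" latent variable to condition the denoising process. Upon decoding, the diffusion process conditionally generates/reconstructs an image using ancestral sampling. Our experiments show that this approach outperforms one of the best-performing conventional image codecs (BPG) and one neural codec on two compression benchmarks, where we focus on rate-perception tradeoffs. Qualitatively, our approach shows fewer decompression artifacts than the classical approach.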
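We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using $L_2$ vs. $L_1$ loss on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores, including FID, Inception Score, classification accuracy of a pre-trained ResNet-50, and perceptual distance against reference images, for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well as or better than task-specific specialist counterparts.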
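This paper presents a new variational inference framework for image restoration and a convolutional neural network (CNN) structure that can solve the restoration problems described by the proposed framework. Earlier CNN-based image restoration methods primarily focused on network architecture design or training strategy in non-blind scenarios, where the degradation models are known or assumed. To move closer to real-world applications, CNNs are also trained blindly on the whole dataset, including diverse degradations. However, the conditional distribution of a high-quality image given a diversely degraded one is too complicated to be learned by a single CNN. Therefore, some methods provide additional prior information to train a CNN. Unlike previous approaches, we focus more on the objective of restoration from a Bayesian perspective and on how to reformulate that objective. Specifically, our method relaxes the original posterior inference problem into more manageable sub-problems and thus behaves like a divide-and-conquer scheme. As a result, the proposed framework improves performance on several restoration problems compared to previous ones. In particular, our method delivers state-of-the-art performance on Gaussian denoising, real-world noise reduction, blind image super-resolution, and JPEG compression artifact reduction.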
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
Generative adversarial networks (GANs) have achieved great success in image inpainting yet still have difficulty handling large missing regions. In contrast, iterative algorithms, such as autoregressive and denoising diffusion models, have to be deployed with massive computing resources to achieve decent results. To overcome the respective limitations, we present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image, largely enhancing the inference efficiency. Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion. On multiple benchmarks, we achieve new state-of-the-art performance. Code is released at https://github.com/fenglinglwb/SDM.
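Deep MRI reconstruction is commonly performed with conditional models that map undersampled data as input onto fully sampled data as output. Conditional models perform de-aliasing under knowledge of the accelerated imaging operator, so they generalize poorly under domain shifts in the operator. Unconditional models are a powerful alternative that instead learn generative image priors to improve reliability against domain shifts. Recent diffusion models are particularly promising given their high representational diversity and sample quality. Nevertheless, projections through a static image prior can lead to suboptimal performance. Here we propose AdaDiff, a novel MRI reconstruction method based on an adaptive diffusion prior. To enable efficient image sampling, an adversarial mapper is introduced that allows the use of large diffusion steps. A two-phase reconstruction is performed with the trained prior: a rapid-diffusion phase that produces an initial reconstruction, and an adaptation phase in which the diffusion prior is updated to minimize the reconstruction loss on the acquired k-space data. Demonstrations on multi-contrast brain MRI clearly indicate that AdaDiff achieves superior performance to competing models in cross-domain tasks, and superior or on-par performance in within-domain tasks.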
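Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, the model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burden, i.e., low speed due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models, and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.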
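The forward diffusion stage described above can be sketched in a few lines. The example assumes the standard DDPM parameterization, in which the noised sample at step t can be drawn directly from the clean input as sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise. The linear schedule, its endpoints, and the function names are illustrative assumptions, not details taken from the survey.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_diffuse(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * v + sigma * e for v, e in zip(x0, noise)]
    return x_t, noise

alpha_bar = cumulative_alphas(linear_betas(1000))
x0 = [0.5, -0.25, 1.0, 0.0]
x_t, noise = forward_diffuse(x0, 999, alpha_bar, random.Random(0))
```

At the last step `alpha_bar` is close to zero, so `x_t` is almost pure Gaussian noise. The reverse stage trains a network to predict `noise` from `x_t` and `t`, and that network has to be evaluated once per sampling step, which is the source of the computational burden mentioned in the abstract.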
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256×256 and 3.85 on ImageNet 512×512. We release our code at https://github.com/openai/guided-diffusion.
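As a rough illustration of classifier guidance, one reverse step can be written as a Gaussian whose mean is shifted along the classifier gradient. The notation below is an assumption introduced here and does not appear in the abstract: $\mu_\theta$ and $\Sigma_\theta$ denote the mean and covariance predicted by the diffusion model, $p_\phi(y \mid x_t)$ is a classifier trained on noisy images, and $s$ is a guidance scale.

```latex
\begin{aligned}
\hat{\mu}_\theta(x_t \mid y) &= \mu_\theta(x_t) + s\,\Sigma_\theta(x_t)\,\nabla_{x_t} \log p_\phi(y \mid x_t), \\
x_{t-1} &\sim \mathcal{N}\!\left(\hat{\mu}_\theta(x_t \mid y),\; \Sigma_\theta(x_t)\right).
\end{aligned}
```

Setting $s = 0$ recovers the unguided sampler, while a larger $s$ pushes samples toward the class $y$ more strongly, which is the diversity-for-fidelity trade-off mentioned above.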
Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches only require one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
Denoising diffusion probabilistic models (DDPMs) are a class of generative models which have recently been shown to produce excellent samples. We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code at https://github.com/openai/improved-diffusion.
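Recently, Rissanen et al. (2022) presented a new type of diffusion process for generative modeling based on heat dissipation, or blurring, as an alternative to isotropic Gaussian diffusion. Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. In making this connection, we bridge the gap between inverse heat dissipation and denoising diffusion, and we shed light on the inductive bias that results from this modeling choice. Finally, we propose a generalized class of diffusion models that offers the best of both standard Gaussian denoising diffusion and inverse heat dissipation, which we call Blurring Diffusion Models.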