Conditional diffusion probabilistic models can model the distribution of natural images and generate diverse, realistic samples from given conditions. However, their results are often unrealistic, exhibiting noticeable color shifts and texture artifacts. We believe this issue stems from a divergence between the probabilistic distribution learned by the model and the distribution of natural images, a divergence that the imperfect conditioning gradually enlarges at each sampling timestep. To address this issue, we introduce a new method that pulls the predicted samples toward the training-data manifold using a pretrained unconditional diffusion model. The unconditional model acts as a regularizer, reducing the divergence introduced by the conditional model at each sampling step. We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks. The improvements obtained by our method suggest that such priors can be incorporated as a general plug-in for improving conditional diffusion models.
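The abstract does not spell out the update rule, but a minimal sketch of one plausible instantiation is shown below: at each reverse step, the unconditional model's noise estimate is blended into the conditional one, pulling the sample back toward the learned natural-image manifold. The function names, the DDIM-style deterministic update, and the weight `w` are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

@torch.no_grad()
def regularized_reverse_step(x_t, t, cond, cond_model, uncond_model,
                             alpha_bar, w=0.1):
    """One reverse step (t >= 1) in which a pretrained unconditional model
    pulls the conditional prediction back toward the natural-image manifold.
    `w` is a hypothetical regularization weight, not the paper's notation."""
    eps_cond = cond_model(x_t, t, cond)          # conditional noise estimate
    eps_prior = uncond_model(x_t, t)             # unconditional (prior) estimate
    eps = (1.0 - w) * eps_cond + w * eps_prior   # blend; larger w trusts the prior
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    # Standard DDPM algebra: recover x_0 from the blended noise estimate,
    # then move deterministically (DDIM-style) to step t-1.
    x0_hat = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
    x0_hat = x0_hat.clamp(-1.0, 1.0)
    return a_prev.sqrt() * x0_hat + (1.0 - a_prev).sqrt() * eps
```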
Generating photos that satisfy multiple constraints finds broad utility in the content-creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining with paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from their flexible internal structure. Since each sampling step in a DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on different sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward an outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
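Because each reverse step is Gaussian, a product of per-constraint Gaussians again has a closed-form Gaussian solution whose mean is a weighted combination of the individual predictions. Below is a minimal sketch of that combination step; treating the reliability parameters as normalized convex weights over noise estimates is an assumption for illustration, not the paper's exact derivation.

```python
import torch

@torch.no_grad()
def multiconstraint_eps(x_t, t, conditions, models, reliabilities):
    """Combine noise estimates from several off-the-shelf conditional
    diffusion models, one per constraint. `reliabilities` are hypothetical
    per-model confidence weights."""
    weights = torch.tensor(reliabilities, device=x_t.device)
    weights = weights / weights.sum()        # normalize to a convex combination
    eps = torch.zeros_like(x_t)
    for m, c, w in zip(models, conditions, weights):
        eps = eps + w * m(x_t, t, c)         # reliability-weighted noise estimate
    return eps
```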
Diffusion models have emerged as a powerful generative method for synthesizing high-quality, diverse images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled as an implicit condition, i.e., plausible video motions can be sampled according to the latent features of the frames. We improve the quality of the generated videos through multiple strategies, including sampling-space truncation, a robustness penalty, and positional group normalization. Experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceived visual quality.
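Of the three strategies, sampling-space truncation is the most self-contained to illustrate. The sketch below shows one generic way to truncate the Gaussian prior from which sampling starts; the threshold `tau` and the resampling scheme are assumptions, not the paper's specification.

```python
import torch

def truncated_initial_noise(shape, tau=2.0, device="cpu"):
    """Hypothetical sampling-space truncation: redraw initial latent entries
    whose magnitude exceeds `tau` standard deviations, keeping the sampler
    away from low-density regions of the Gaussian prior."""
    x = torch.randn(shape, device=device)
    mask = x.abs() > tau
    while mask.any():                        # resample out-of-range entries
        x[mask] = torch.randn(int(mask.sum()), device=device)
        mask = x.abs() > tau
    return x
```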
Automatic Target Recognition (ATR) is a category of computer vision algorithms that attempt to recognize targets in data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and surveillance applications. Existing ATR algorithms are developed for the traditional closed-set setting, where training and testing share the same class distribution. Thus, these algorithms are not robust to unknown classes not seen during the training phase, limiting their utility in real-world applications. To this end, we propose an Open-set Automatic Target Recognition framework that enables open-set recognition capability for ATR algorithms. In addition, we introduce a plug-in Category-aware Binary Classifier (CBC) module to effectively tackle unknown classes encountered during inference. The proposed CBC module can be easily integrated with any existing ATR algorithm and can be trained in an end-to-end manner. Experimental results show that the proposed approach outperforms many open-set methods on the DSIAC and CIFAR-10 datasets. To the best of our knowledge, this is the first work to address the open-set classification problem for ATR algorithms. Source code is available at: https://github.com/bardisafa/Open-set-ATR.
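A hypothetical sketch of what such a plug-in CBC head could look like follows: one binary "knownness" logit per known category on top of the base network's features, with an input rejected as unknown when no head is sufficiently confident. The layer choices and threshold are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class CategoryAwareBinaryClassifier(nn.Module):
    """Sketch of a plug-in CBC head trained alongside the base ATR model."""

    def __init__(self, feat_dim: int, num_known: int):
        super().__init__()
        self.heads = nn.Linear(feat_dim, num_known)  # one binary logit per category

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.heads(feats))      # per-class "knownness" score

    def predict(self, feats: torch.Tensor, threshold: float = 0.5):
        scores = self.forward(feats)
        conf, cls = scores.max(dim=1)
        cls[conf < threshold] = -1                   # -1 marks "unknown class"
        return cls
```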
In recent years, deep neural network-based restoration methods have achieved state-of-the-art results on various image deblurring tasks. However, a major drawback of deep learning-based deblurring networks is that they require a large number of blurry-clean image pairs for training to achieve good performance. Moreover, deep networks often fail to perform well when the blurry images and blur kernels encountered at test time differ from those used during training, mainly because the network parameters overfit the training data. In this work, we propose a method that addresses these issues. We view the non-blind image deblurring problem as a denoising problem. To this end, we perform Wiener filtering on a pair of blurry images using the corresponding blur kernels, which yields a pair of images contaminated by colored noise. The deblurring problem is thereby transformed into a denoising problem, which we then solve without using explicit clean target images. Extensive experiments show that our method achieves results on par with state-of-the-art non-blind deblurring methods.
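The Wiener-filtering step that converts deblurring into denoising is classical and can be sketched directly. In the frequency domain, the restored spectrum is conj(H)·Y / (|H|² + NSR); `nsr`, the assumed noise-to-signal power ratio, is the only free parameter, and the kernel is assumed to have its origin at the top-left corner.

```python
import numpy as np

def wiener_deblur(blurry: np.ndarray, kernel: np.ndarray, nsr: float = 0.01):
    """Classical Wiener deconvolution of a single-channel image given its
    blur kernel. The output is sharp but contains colored noise, which is
    what turns non-blind deblurring into a denoising problem."""
    H = np.fft.fft2(kernel, s=blurry.shape)          # blur-kernel spectrum
    Y = np.fft.fft2(blurry)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)          # Wiener filter
    return np.real(np.fft.ifft2(G * Y))
```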
Modern surveillance systems perform person recognition using deep learning-based face verification networks. Most state-of-the-art face verification systems are trained on visible-spectrum images. However, acquiring images in the visible spectrum is impractical in low-light and nighttime conditions, and images are often captured in an alternative domain such as the thermal infrared domain. Face verification on thermal images is then typically performed after retrieving the corresponding visible-domain images. This is a well-established problem commonly known as the Thermal-to-Visible (T2V) image translation problem. In this paper, we propose a Denoising Diffusion Probabilistic Model (DDPM)-based solution for T2V translation of face images. During training, the model learns the conditional distribution of visible face images given their corresponding thermal images through the diffusion process. During inference, the visible-domain image is obtained by starting from Gaussian noise and repeatedly applying the reverse diffusion step. The existing inference process for DDPMs is stochastic and time-consuming; hence, we propose a novel inference strategy to speed up the inference time of DDPMs, specifically for the T2V image translation problem. We achieve state-of-the-art results on multiple datasets. The code and pretrained models are publicly available at http://github.com/nithin-gk/t2v-ddpm
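A minimal sketch of one conditional reverse step is shown below, with the thermal image concatenated to the noisy visible estimate along the channel dimension, which is one common conditioning choice; the paper's architecture and its accelerated inference strategy may differ.

```python
import torch

@torch.no_grad()
def t2v_reverse_step(x_t, t, thermal, model, betas, alphas_bar):
    """One ancestral DDPM step for thermal-to-visible translation.
    `model` is assumed to predict the noise from a channel-concatenated
    (noisy visible, thermal) input."""
    eps = model(torch.cat([x_t, thermal], dim=1), t)   # conditional noise estimate
    beta_t = betas[t]
    ab_t = alphas_bar[t]
    # Posterior mean of x_{t-1} given x_t (standard DDPM update).
    mean = (x_t - beta_t / (1.0 - ab_t).sqrt() * eps) / (1.0 - beta_t).sqrt()
    if t == 0:
        return mean
    return mean + beta_t.sqrt() * torch.randn_like(x_t)  # simplified variance
```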
Although many long-range imaging systems are designed to support extended-vision applications, a natural obstacle to their operation is degradation due to atmospheric turbulence. Atmospheric turbulence causes significant degradation of image quality by introducing blur and geometric distortion. In recent years, various deep learning-based single-image mitigation methods have been proposed in the literature, including CNN-based and GAN inversion-based approaches that attempt to remove the distortions in the image. However, some of these methods are difficult to train and often fail to reconstruct facial features, producing unrealistic results, especially in cases of high turbulence. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained traction due to their stable training process and their ability to generate high-quality images. In this paper, we propose the first DDPM-based solution for the atmospheric turbulence mitigation problem. We also propose a fast sampling technique to reduce the inference time of conditional DDPMs. Extensive experiments are conducted on synthetic and real-world data to show the significance of our model. To facilitate further research, all codes and pretrained models will be made public after the review process.
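The abstract does not describe the proposed fast sampler, so the sketch below uses a generic strided, DDIM-style deterministic sampler only to illustrate how conditional-DDPM inference time can be cut; the function names and step schedule are assumptions.

```python
import torch

@torch.no_grad()
def fast_sample(model, cond, alphas_bar, shape, num_steps=25, device="cpu"):
    """Strided deterministic sampling over a short subsequence of the
    training timesteps (e.g. 1000 -> 25), a generic acceleration scheme."""
    T = alphas_bar.numel()
    steps = torch.linspace(T - 1, 0, num_steps).long()
    x = torch.randn(shape, device=device)
    for i, t in enumerate(steps):
        eps = model(x, t, cond)                           # conditional noise estimate
        ab_t = alphas_bar[t]
        x0 = (x - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()  # predicted clean image
        ab_prev = alphas_bar[steps[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        x = ab_prev.sqrt() * x0 + (1 - ab_prev).sqrt() * eps  # jump to next kept step
    return x
```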
Monocular depth estimation (MDE) has attracted intense research due to its low cost and its critical role in robotic tasks such as localization, mapping, and obstacle detection. With the development of deep learning, supervised methods have achieved great success, but they rely on large amounts of ground-truth depth annotations, which are expensive to acquire. Unsupervised domain adaptation (UDA) transfers knowledge from labeled source data to unlabeled target data, relaxing the constraint of supervised learning. However, due to the domain-shift problem, existing UDA methods may not fully align the domain gap across different datasets. We believe that better domain alignment can be achieved through a carefully designed feature decomposition. In this paper, we propose a novel UDA method for MDE, named Learning Feature Decomposition for Adaptation (LFDA), which learns to decompose the feature space into content and style components. LFDA only attempts to align the content component, since it has a smaller domain gap. Meanwhile, it excludes the style component, which is specific to the source domain, from training the primary task. Furthermore, LFDA uses separate feature distribution estimations to further bridge the domain gap. Extensive experiments on three domain-adaptive MDE scenarios show that the proposed method achieves superior accuracy and lower computational cost compared to state-of-the-art approaches.
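A toy sketch of the content/style factorization and a content-only alignment term is given below; the 1x1-convolution heads and the moment-matching loss are illustrative assumptions standing in for the paper's separate feature-distribution estimations.

```python
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    """Sketch of a content/style split for domain alignment: two lightweight
    heads factor a backbone feature map; only the content part feeds the
    depth decoder and only it is aligned across domains."""

    def __init__(self, channels: int):
        super().__init__()
        self.content = nn.Conv2d(channels, channels, 1)  # domain-shared part
        self.style = nn.Conv2d(channels, channels, 1)    # domain-specific part

    def forward(self, feats):
        return self.content(feats), self.style(feats)

def content_alignment_loss(src_content, tgt_content):
    # Simple first/second-moment matching across domains (an assumption).
    mu_s, mu_t = src_content.mean(dim=(0, 2, 3)), tgt_content.mean(dim=(0, 2, 3))
    var_s, var_t = src_content.var(dim=(0, 2, 3)), tgt_content.var(dim=(0, 2, 3))
    return (mu_s - mu_t).pow(2).mean() + (var_s - var_t).pow(2).mean()
```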
The ultimate aim of image restoration tasks such as denoising is to find the exact correlation between the noisy and clean image domains. However, end-to-end denoising learning with losses such as pixel-wise losses is performed in a sample-to-sample manner, which neglects the intrinsic correlation of images, especially their semantics. In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) denoising network. It leverages the semantic features of a pretrained classification network and implicitly matches the probabilistic distribution of clean images in the semantic feature space. By learning to preserve the semantic distribution of denoised images, we empirically find that our method significantly improves the transferability of the network, and the denoised results can be better understood by high-level vision tasks. Comprehensive experiments on the noisy Cityscapes dataset demonstrate the superiority of our method in both denoising performance and semantic segmentation accuracy. Moreover, the performance improvements observed on our extended tasks, including super-resolution and dehazing experiments, demonstrate its potential as a new general plug-in component.
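A minimal sketch of such a semantic-statistics loss follows, using a frozen VGG16 as an assumed stand-in for the pretrained classification network and simple moment matching as a stand-in for the distribution matching D2SM performs.

```python
import torch
import torchvision.models as tvm

# Frozen semantic feature extractor (VGG16 up to relu3_3 is an assumption).
vgg = tvm.vgg16(weights=tvm.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def semantic_statistics_loss(denoised, clean):
    """Match first- and second-order statistics of semantic features between
    the denoised output and clean images, complementing a pixel-wise loss."""
    f_d, f_c = vgg(denoised), vgg(clean)
    mu_d, mu_c = f_d.mean(dim=(2, 3)), f_c.mean(dim=(2, 3))
    sd_d, sd_c = f_d.std(dim=(2, 3)), f_c.std(dim=(2, 3))
    return (mu_d - mu_c).abs().mean() + (sd_d - sd_c).abs().mean()
```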
Atmospheric turbulence can significantly degrade the quality of images acquired by long-range imaging systems by causing spatially and temporally random fluctuations in the refractive index of the atmosphere. These refractive-index variations make the captured images geometrically distorted and blurry. Hence, it is important to compensate for the visual degradation in images caused by atmospheric turbulence. In this paper, we propose a deep learning-based approach for restoring a single image degraded by atmospheric turbulence. We utilize epistemic uncertainty based on Monte Carlo dropout to capture regions in the image that the network finds difficult to restore. The estimated uncertainty maps are then used to guide the network in obtaining the restored image. Extensive experiments are conducted on synthetic and real images to show the significance of the proposed work. Code is available at: https://github.com/rajeevyasarla/at-net
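Monte Carlo dropout is a standard technique, so the uncertainty-estimation step can be sketched concretely: dropout is kept active at test time, several stochastic forward passes are run, and the per-pixel variance serves as the uncertainty map. How AT-Net then injects this map into the restoration network is not sketched here.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_uncertainty(model, x, n_samples=8):
    """Epistemic uncertainty via Monte Carlo dropout: per-pixel variance over
    stochastic forward passes highlights regions that are hard to restore."""
    model.eval()
    for m in model.modules():                    # enable only the dropout layers
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    preds = torch.stack([model(x) for _ in range(n_samples)])
    mean = preds.mean(dim=0)                     # restored-image estimate
    uncertainty = preds.var(dim=0)               # per-pixel epistemic uncertainty
    return mean, uncertainty
```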