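The advent of generative radiance fields has significantly promoted the development of 3D-aware image synthesis. The cumulative rendering process in radiance fields makes these generative models easier to train, since gradients are distributed over the entire volume, but it leads to diffused object surfaces. Occupancy representations, in contrast, inherently ensure deterministic surfaces. However, if occupancy representations are applied directly to generative models, they receive only sparse gradients located on object surfaces during training and eventually suffer from convergence problems. In this paper, we propose Generative Occupancy Fields (GOF), a novel model based on generative radiance fields that can learn compact object surfaces without impeding training convergence. The key insight of GOF is a dedicated transition from the cumulative rendering of radiance fields to rendering with only the surface points as the learned surface becomes more and more accurate. In this way, GOF combines the merits of the two representations in a unified framework. In practice, the training-time transition from radiance fields to occupancy representations is achieved in GOF by gradually shrinking the sampling region from the entire volume to a minimal neighboring region around the surface. Through comprehensive experiments on multiple datasets, we demonstrate that GOF can synthesize high-quality images with 3D consistency and simultaneously learn compact and smooth object surfaces. Code, models, and demo videos are available at https://shedontsui.github.io/projects/gof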
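The shrinking sampling region described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear schedule, and the `min_half_width` parameter are assumptions introduced here.

```python
def sampling_interval(near, far, surface_depth, progress, min_half_width=0.01):
    """Return the (start, end) depth range to sample along one ray.

    progress = 0.0 samples the whole [near, far] range (radiance-field style);
    progress = 1.0 samples only a small neighborhood around the estimated
    surface depth (occupancy style). The linear interpolation between the
    two is an assumption of this sketch.
    """
    progress = min(max(progress, 0.0), 1.0)
    # Target interval: a thin band around the surface, clipped to the ray range.
    lo_target = max(near, surface_depth - min_half_width)
    hi_target = min(far, surface_depth + min_half_width)
    start = near + progress * (lo_target - near)
    end = far + progress * (hi_target - far)
    return start, end


def sample_depths(start, end, n):
    """Evenly spaced sample depths in [start, end]; requires n >= 2."""
    if n < 2:
        raise ValueError("need at least two samples")
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]
```

Early in training `progress` would be near 0, so gradients reach the whole volume; as the surface estimate improves, `progress` grows and the samples concentrate around `surface_depth`.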
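The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite this progress, they often fail to capture accurate 3D shapes because of the shape-color ambiguity, which limits their applicability in downstream tasks. In this work, we address this ambiguity by proposing a novel shading-guided generative implicit model that is able to learn a substantially improved shape representation. Our key insight is that an accurate 3D shape should also yield a realistic rendering under different lighting conditions. This multi-lighting constraint is realized by modeling illumination explicitly and performing shading under various lighting conditions. Gradients are derived by feeding the synthesized images to a discriminator. To compensate for the additional computational burden of calculating surface normals, we further devise an efficient volume rendering strategy via surface tracking, reducing training and inference time by 24% and 48%, respectively. Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis while capturing accurate underlying 3D shapes. We demonstrate improved performance of our approach over existing methods on 3D shape reconstruction, and show its applicability to image relighting. Our code will be released at https://github.com/xingangpan/shadegan.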
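To make the multi-lighting idea concrete, the sketch below shades one surface point under a given light direction. The abstract only states that illumination is modeled explicitly; the Lambertian model, the `ambient` and `diffuse` coefficients, and the function names are assumptions of this example.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def lambertian_shade(albedo, normal, light_dir, ambient=0.3, diffuse=0.7):
    """Shade an RGB albedo with a Lambertian model (assumed here).

    color = albedo * (ambient + diffuse * max(0, n . l))
    """
    n = normalize(normal)
    l = normalize(light_dir)
    cos_term = max(0.0, sum(a * b for a, b in zip(n, l)))
    shading = ambient + diffuse * cos_term
    return tuple(a * shading for a in albedo)
```

Rendering the same shape with several sampled values of `light_dir` gives images that only look realistic if the normals, and therefore the shape, are accurate.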
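Making generative models 3D-aware bridges the 2D image space and the 3D physical world, yet remains challenging. Recent attempts equip a Generative Adversarial Network (GAN) with a Neural Radiance Field (NeRF), which maps 3D coordinates to pixel values, as a 3D prior. However, the implicit function in NeRF has a very local receptive field, making it hard for the generator to become aware of the global structure. Meanwhile, NeRF is built on volume rendering, which can be too costly for producing high-resolution results and increases the optimization difficulty. To alleviate these two problems, we propose a novel framework, termed VolumeGAN, for high-fidelity 3D-aware image synthesis that explicitly learns a structural representation and a textural representation. We first learn a feature volume to represent the underlying structure, which is then converted to a feature field using a NeRF-like model. The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis. This design enables independent control of shape and appearance. Extensive experiments on a wide range of datasets show that our approach achieves higher image quality and better 3D control than previous methods.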
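Capitalizing on recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some level of controllability, e.g., controlling the shape, expression, texture, and pose of the generated face images. However, these methods focus on 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over the fine-grained 3D face shape of the synthesized image, we additionally incorporate a 3D landmark loss and a volume warping loss into our synthesis algorithm. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods. Find code and demo at https://keqiangsun.github.io/projects/cgof.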
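3D-aware image generative modeling aims to generate 3D-consistent images with explicitly controllable camera poses. Recent works train neural radiance field (NeRF) generators on unstructured 2D images, but still cannot generate highly realistic images with fine details. A critical reason is that the high memory and computation cost of volumetric representation learning greatly restricts the number of point samples available for radiance integration during training. Deficient sampling not only limits the expressive power of the generator to handle fine details but also impedes effective GAN training because of the noise caused by unstable Monte Carlo sampling. We propose a novel approach that regulates point sampling and radiance field learning on 2D manifolds, embodied as a set of learned implicit surfaces in the 3D volume. For each viewing ray, we calculate the ray-surface intersections and accumulate the radiance generated by the network at those points. By training and rendering such radiance manifolds, our generator can produce high-quality images with realistic fine details and strong visual 3D consistency.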
We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches however fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon representation network architectures that are not expressive enough, and their results thus lack in image quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks (π-GAN or pi-GAN), for high-quality 3D-aware image synthesis. π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis with multiple real and synthetic datasets.
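As a rough illustration of a layer with a periodic activation function, the sketch below applies a sine to a scaled affine transform. The abstract does not give the layer definition; the frequency scale `w0` and its default of 30 follow common practice for sinusoidal networks and are assumptions here, as is the function name.

```python
import math


def sine_layer(x, weights, biases, w0=30.0):
    """One fully connected layer with a periodic (sine) activation.

    y_j = sin(w0 * (sum_i weights[j][i] * x[i] + biases[j]))
    `weights` is a list of rows, one row per output unit.
    """
    outputs = []
    for row, b in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        outputs.append(math.sin(w0 * pre_activation))
    return outputs
```

Stacking such layers on 3D coordinates and reading out density and color per point gives a field that can then be rendered volumetrically.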
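We introduce a high-resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of RGB images, and 2) detailed 3D shape. We achieve this by merging an SDF-based 3D representation with a style-based 2D generator. Our 3D implicit network renders low-resolution feature maps, from which the style-based network generates view-consistent 1024x1024 images. Notably, our SDF-based 3D modeling defines detailed 3D surfaces, leading to consistent volume rendering. Our method shows higher-quality results than the state of the art in terms of both visual and geometric quality.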
Neural implicit 3D representations have emerged as a powerful paradigm for reconstructing surfaces from multi-view images and synthesizing novel views. Unfortunately, existing methods such as DVR or IDR require accurate per-pixel object masks as supervision. At the same time, neural radiance fields have revolutionized novel view synthesis. However, NeRF's estimated volume density does not admit accurate surface reconstruction. Our key insight is that implicit surface models and radiance fields can be formulated in a unified way, enabling both surface and volume rendering using the same model. This unified perspective enables novel, more efficient sampling procedures and the ability to reconstruct accurate surfaces without input masks. We compare our method on the DTU, BlendedMVS, and a synthetic indoor dataset. Our experiments demonstrate that we outperform NeRF in terms of reconstruction quality while performing on par with IDR without requiring masks.
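Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have produced impressive results, they mainly operate in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modeled by NeRF, conditioned on a single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses this highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/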
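As several industries move towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines and are thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack geometric detail, are limited in the mesh topology they can produce, typically do not support textures, or use neural renderers in the synthesis process, which makes their use in common 3D software non-trivial. In this work, we introduce GET3D, a generative model that directly generates explicit textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D Generative Adversarial Networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes, and human characters to buildings, achieving significant improvements over previous methods.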
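2D images are observations of the 3D physical world, which is described by geometry, material, and illumination components. Recovering these underlying intrinsic components from 2D images, also known as inverse rendering, usually requires a supervised setting with paired images collected from multiple viewpoints and lighting conditions, which is resource-demanding. In this work, we present GAN2X, a new method for unsupervised inverse rendering that uses only unpaired images for training. Unlike previous Shape-from-GAN approaches that mainly focus on 3D shapes, we make the first attempt to also recover non-Lambertian material properties by exploiting the pseudo-paired data generated by a GAN. To achieve precise inverse rendering, we devise a specularity-aware neural surface representation that continuously models the geometry and material properties. A shading-based refinement technique is adopted to further distill information from the target image and recover finer details. Experiments demonstrate that GAN2X can accurately decompose 2D images into 3D shape, albedo, and specular properties for different object categories, and achieves state-of-the-art performance for unsupervised single-view 3D face reconstruction. We also show its applications in downstream tasks, including real image editing and lifting 2D GANs to decomposed 3D GANs.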
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
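Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high-fidelity portraits but with low view consistency. 3D-aware GAN methods can maintain view consistency, but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally editable portrait images. Our method uses two decoupled latent codes to generate corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry. Benefiting from this underlying 3D representation, FENeRF can jointly render the boundary-aligned image and semantic mask, and use the semantic mask to edit the 3D volume via GAN inversion. We further show that such a 3D representation can be learned from widely available monocular image and semantic mask pairs. Moreover, we reveal that jointly learning semantics and texture helps to generate finer geometry. Our experiments demonstrate that FENeRF outperforms state-of-the-art methods in various face editing tasks.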
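Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits the quality and resolution of the generated images, and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. To this end, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.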
Generative models, an important family of statistical models, aim to learn the observed data distribution by generating new instances. Along with the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers have shifted their attention from 2D space to 3D space, considering that 3D data aligns better with our physical world and hence has great potential in practice. However, unlike a 2D image, which has an efficient representation (i.e., a pixel grid) by nature, representing 3D data poses far more challenges. Concretely, we would expect an ideal 3D representation to be capable enough to model shapes and appearances in detail, and to be efficient enough to model high-resolution data at high speed and low memory cost. However, existing 3D representations, such as point clouds, meshes, and recent neural fields, usually fail to meet these requirements simultaneously. In this survey, we thoroughly review the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and, more importantly, representations. We hope that our discussion helps the community track the evolution of this field and sparks innovative ideas to advance this challenging task.
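We propose VoLux-GAN, a generative framework to synthesize 3D-aware faces with convincing relighting. Our main contribution is a volumetric HDRI relighting method that can efficiently accumulate albedo, diffuse, and specular lighting contributions along each 3D ray for any desired HDR environment map. Additionally, we show the importance of supervising the image decomposition process using multiple discriminators. In particular, we propose a data augmentation technique that leverages recent advances in single-image portrait relighting to enforce consistent geometry, albedo, diffuse, and specular components. Multiple experiments and comparisons with other generative frameworks show how our model is a step forward towards photorealistic relightable 3D generative models.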
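Neural implicit representations have shown their effectiveness in novel view synthesis and high-quality 3D reconstruction from multi-view images. However, most approaches focus on holistic scene representation and ignore the individual objects inside it, which limits potential downstream applications. To learn an object-compositional representation, a few works incorporate the 2D semantic map as a cue in training to grasp the difference between objects. But they neglect the strong connection between object geometry and instance semantic information, which leads to inaccurate modeling of individual instances. This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in both 3D reconstruction and object representation. Observing the ambiguity of conventional volume rendering pipelines, we model the scene by combining the Signed Distance Functions (SDFs) of individual objects to exert an explicit surface constraint. The key to distinguishing different instances is to revisit the strong association between an individual object's SDF and its semantic label. In particular, we convert the semantic information into a function of the object SDF and develop a unified and compact representation for the scene and the objects. Experimental results show the superiority of the ObjectSDF framework in representing both the holistic object-compositional scene and the individual instances. Code can be found at https://qianyiwu.github.io/objectsdf/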
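Combining per-object SDFs into a scene SDF, and deriving a per-object label from each SDF, can be sketched as below. The union-by-minimum rule is the standard way to combine SDFs; the sphere objects, the sigmoid mapping in `soft_labels`, and the `sharpness` parameter are assumptions of this example, not the paper's exact formulation.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(point, center) - radius


def scene_sdf(point, objects):
    """Scene SDF as the minimum over object SDFs.

    `objects` is a list of (center, radius) pairs. Returns the scene
    distance and the index of the object that attains it.
    """
    distances = [sphere_sdf(point, c, r) for c, r in objects]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return distances[idx], idx


def soft_labels(point, objects, sharpness=20.0):
    """Per-object label in (0, 1) as a sigmoid of the negated, scaled SDF."""
    labels = []
    for c, r in objects:
        z = min(sharpness * sphere_sdf(point, c, r), 60.0)  # avoid overflow in exp
        labels.append(1.0 / (1.0 + math.exp(z)))
    return labels
```

A point deep inside one object gets a label near 1 for that object and near 0 for the others, so the semantic output is tied directly to geometry.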
The neural radiance field (NeRF) has shown promising results in preserving the fine details of objects and scenes. However, unlike mesh-based representations, it remains an open problem to build dense correspondences across different NeRFs of the same category, which is essential in many downstream tasks. The main difficulties of this problem lie in the implicit nature of NeRF and the lack of ground-truth correspondence annotations. In this paper, we show it is possible to bypass these challenges by leveraging the rich semantics and structural priors encapsulated in a pre-trained NeRF-based GAN. Specifically, we exploit such priors from three aspects, namely 1) a dual deformation field that takes latent codes as global structural indicators, 2) a learning objective that regards generator features as geometric-aware local descriptors, and 3) a source of infinite object-specific NeRF samples. Our experiments demonstrate that such priors lead to 3D dense correspondence that is accurate, smooth, and robust. We also show that established dense correspondence across NeRFs can effectively enable many NeRF-based downstream applications such as texture transfer.
While 2D generative adversarial networks have enabled high-resolution image synthesis, they largely lack an understanding of the 3D world and the image formation process. Thus, they do not provide precise control over camera viewpoint or object pose. To address this problem, several recent approaches leverage intermediate voxel-based representations in combination with differentiable rendering. However, existing methods either produce low image resolution or fall short in disentangling camera and scene properties, e.g., the object identity may vary with the viewpoint. In this paper, we propose a generative model for radiance fields, which have recently proven successful for novel view synthesis of a single scene. In contrast to voxel-based representations, radiance fields are not confined to a coarse discretization of the 3D space, yet allow for disentangling camera and scene properties while degrading gracefully in the presence of reconstruction ambiguity. By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone. We systematically analyze our approach on several challenging synthetic and real-world datasets. Our experiments reveal that radiance fields are a powerful representation for generative image synthesis, leading to 3D consistent models that render with high fidelity.
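For readers unfamiliar with how a radiance field is turned into a pixel, the sketch below implements the usual discrete volume rendering sum along one ray. The abstract does not spell this out; the quadrature shown is the standard NeRF-style formulation, and the function and argument names are assumptions of this example.

```python
import math


def render_ray(densities, colors, deltas):
    """Accumulate color along one ray from per-sample density and RGB.

    alpha_i = 1 - exp(-sigma_i * delta_i)
    T_i     = prod_{j < i} (1 - alpha_j)
    C       = sum_i T_i * alpha_i * c_i
    Returns the pixel color and the transmittance left after the last sample.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance
```

Because every sample along the ray receives a nonzero weight while the density is soft, gradients from the discriminator reach the whole volume rather than a single surface point.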
NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that real-time rendering is possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by three orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.
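The divide-and-conquer idea, where each small network is responsible for only part of the scene, needs a rule that routes every query point to its network. The sketch below assumes a uniform axis-aligned grid of cells, one tiny MLP per cell; the grid layout and all names are assumptions of this example rather than details given in the abstract.

```python
def cell_index(point, bounds_min, bounds_max, resolution):
    """Flat index of the grid cell (and thus the tiny MLP) containing `point`.

    Points on or outside the boundary are clamped to the nearest cell.
    """
    idx = []
    for p, lo, hi, n in zip(point, bounds_min, bounds_max, resolution):
        t = (p - lo) / (hi - lo)
        idx.append(min(max(int(t * n), 0), n - 1))
    ix, iy, iz = idx
    return (ix * resolution[1] + iy) * resolution[2] + iz


def group_by_cell(points, bounds_min, bounds_max, resolution):
    """Group point indices by cell so each tiny MLP is queried once per batch."""
    groups = {}
    for i, p in enumerate(points):
        key = cell_index(p, bounds_min, bounds_max, resolution)
        groups.setdefault(key, []).append(i)
    return groups
```

Grouping points per cell before evaluation lets each small network process its points in one batched call instead of being invoked point by point.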