We present NeRFEditor, an efficient learning framework for 3D scene editing, which takes a video captured over 360{\deg} as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing such as guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other mutually. Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which can adjust the StyleGAN latent code to generate high-fidelity stylized images for any given angle. To extrapolate editing to GAN out-of-domain views, we devise another module that is trained in a self-supervised learning manner. This module maps novel-view images to the hidden space of StyleGAN that allows StyleGAN to generate stylized images on novel views. These two modules together produce guided images in 360{\deg}views to finetune a NeRF to make stylization effects, where a stable fine-tuning strategy is proposed to achieve this. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes with better editability, fidelity, and identity preservation.
translated by 谷歌翻译
我们呈现剪辑NERF,一种用于神经辐射字段(NERF)的多模态3D对象操纵方法。通过利用近期对比语言图像预培训(剪辑)模型的联合语言图像嵌入空间,我们提出了一个统一的框架,它允许以用户友好的方式操纵nerf,使用短文本提示或示例图像。具体地,为了结合NERF的新型视图合成能力以及从生成模型的潜在表示的可控操纵能力,我们引入了一种允许单独控制形状和外观的脱屑的条件NERF架构。这是通过通过将学习的变形字段应用于对体积渲染阶段的位置编码和延迟颜色调节来实现的来实现。要将这种解除潜在的潜在潜在表示到剪辑嵌入,我们设计了两个代码映射器,将剪辑嵌入为输入并更新潜在码以反映目标编辑。用基于剪辑的匹配损耗训练映射器,以确保操纵精度。此外,我们提出了一种逆优化方法,可以将输入图像精确地将输入图像投影到潜在码以进行操作以使在真实图像上进行编辑。我们在各种文本提示和示例图像上进行广泛的实验评估我们的方法,并为交互式编辑提供了直观的接口。我们的实现是在https://cassiepython.github.io/clipnerf/上获得的
translated by 谷歌翻译
与传统的头像创建管道相反,这是一个昂贵的过程,现代生成方法直接从照片中学习数据分布,而艺术的状态现在可以产生高度的照片现实图像。尽管大量作品试图扩展无条件的生成模型并达到一定程度的可控性,但要确保多视图一致性,尤其是在大型姿势中,仍然具有挑战性。在这项工作中,我们提出了一个3D肖像生成网络,该网络可产生3D一致的肖像,同时根据有关姿势,身份,表达和照明的语义参数可控。生成网络使用神经场景表示在3D中建模肖像,其生成以支持明确控制的参数面模型为指导。尽管可以通过将图像与部分不同的属性进行对比,但可以进一步增强潜在的分离,但在非面积区域(例如,在动画表达式)时,仍然存在明显的不一致。我们通过提出一种体积混合策略来解决此问题,在该策略中,我们通过将动态和静态辐射场融合在一起,形成一个复合输出,并从共同学习的语义场中分割了两个部分。我们的方法在广泛的实验中优于先前的艺术,在自由视点中观看时,在自然照明中产生了逼真的肖像。所提出的方法还证明了真实图像以及室外卡通面孔的概括能力,在实际应用中显示出巨大的希望。其他视频结果和代码将在项目网页上提供。
translated by 谷歌翻译
以前的纵向图像生成方法大致分为两类:2D GAN和3D感知的GAN。 2D GAN可以产生高保真肖像,但具有低视图一致性。 3D感知GaN方法可以维护查看一致性,但它们所生成的图像不是本地可编辑的。为了克服这些限制,我们提出了FENERF,一个可以生成查看一致和本地可编辑的纵向图像的3D感知生成器。我们的方法使用两个解耦潜码,以在具有共享几何体的空间对齐的3D卷中生成相应的面部语义和纹理。从这种底层3D表示中受益,FENERF可以联合渲染边界对齐的图像和语义掩码,并使用语义掩模通过GaN反转编辑3D音量。我们进一步示出了可以从广泛可用的单手套图像和语义面膜对中学习这种3D表示。此外,我们揭示了联合学习语义和纹理有助于产生更精细的几何形状。我们的实验表明FENERF在各种面部编辑任务中优于最先进的方法。
translated by 谷歌翻译
基于生成神经辐射场(GNERF)基于生成神经辐射场(GNERF)的3D感知gan已达到令人印象深刻的高质量图像产生,同时保持了强3D一致性。最显着的成就是在面部生成领域中取得的。但是,这些模型中的大多数都集中在提高视图一致性上,但忽略了分离的方面,因此这些模型无法提供高质量的语义/属性控制对生成。为此,我们引入了一个有条件的GNERF模型,该模型使用特定属性标签作为输入,以提高3D感知生成模型的控制能力和解散能力。我们利用预先训练的3D感知模型作为基础,并集成了双分支属性编辑模块(DAEM),该模块(DAEM)利用属性标签来提供对生成的控制。此外,我们提出了一个Triot(作为INIT的训练,并针对调整进行优化),以优化潜在矢量以进一步提高属性编辑的精度。广泛使用的FFHQ上的广泛实验表明,我们的模型在保留非目标区域的同时产生具有更好视图一致性的高质量编辑。该代码可在https://github.com/zhangqianhui/tt-gnerf上找到。
translated by 谷歌翻译
As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially on simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with the directional constraint to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometry noises which arise easily when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust regarding both single-view stylization quality and cross-view consistency. The code and more results can be found in our project page: https://cassiepython.github.io/nerfart/.
translated by 谷歌翻译
Recent 3D-aware GANs rely on volumetric rendering techniques to disentangle the pose and appearance of objects, de facto generating entire 3D volumes rather than single-view 2D images from a latent code. Complex image editing tasks can be performed in standard 2D-based GANs (e.g., StyleGAN models) as manipulation of latent dimensions. However, to the best of our knowledge, similar properties have only been partially explored for 3D-aware GAN models. This work aims to fill this gap by showing the limitations of existing methods and proposing LatentSwap3D, a model-agnostic approach designed to enable attribute editing in the latent space of pre-trained 3D-aware GANs. We first identify the most relevant dimensions in the latent space of the model controlling the targeted attribute by relying on the feature importance ranking of a random forest classifier. Then, to apply the transformation, we swap the top-K most relevant latent dimensions of the image being edited with an image exhibiting the desired attribute. Despite its simplicity, LatentSwap3D provides remarkable semantic edits in a disentangled manner and outperforms alternative approaches both qualitatively and quantitatively. We demonstrate our semantic edit approach on various 3D-aware generative models such as pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D and VolumeGAN, and on diverse datasets, such as FFHQ, AFHQ, Cats, MetFaces, and CompCars. The project page can be found: \url{https://enisimsar.github.io/latentswap3d/}.
translated by 谷歌翻译
在本文中,我们提出了一个新的基于NERF的参数头模型,该参数头模型集成了神经辐射场到人头的参数表示。它可以实时呈现高保真头图像,并直接控制生成的图像渲染姿势和各种语义属性。与现有相关参数模型不同,我们使用神经辐射字段作为新颖的3D代理而不是传统的3D纹理网格,这使得HeadnerF能够生成高保真图像。然而,原始NERF的计算昂贵的渲染过程阻碍了参数NERF模型的构造。为了解决这个问题,我们采用将2D神经渲染集成到NERF的渲染过程和设计新颖损失条款的策略。结果,可以显着加速头部的渲染速度,并且一帧的渲染时间从5s降至25ms。新颖的设计损失术语还提高了渲染精度,并且人体头部的细级细节,例如牙齿,皱纹和胡须之间的间隙,可以由Headnerf表示和合成。广泛的实验结果和一些应用展示了其有效性。我们将向公众推出代码和培训的模型。
translated by 谷歌翻译
图像翻译和操纵随着深层生成模型的快速发展而引起了越来越多的关注。尽管现有的方法带来了令人印象深刻的结果,但它们主要在2D空间中运行。鉴于基于NERF的3D感知生成模型的最新进展,我们介绍了一项新的任务,语义到网络翻译,旨在重建由NERF模型的3D场景,该场景以一个单视语义掩码作为输入为条件。为了启动这项新颖的任务,我们提出了SEM2NERF框架。特别是,SEM2NERF通过将语义面膜编码到控制预训练的解码器的3D场景表示形式中来解决高度挑战的任务。为了进一步提高映射的准确性,我们将新的区域感知学习策略集成到编码器和解码器的设计中。我们验证了提出的SEM2NERF的功效,并证明它在两个基准数据集上的表现优于几个强基础。代码和视频可从https://donydchen.github.io/sem2nerf/获得
translated by 谷歌翻译
尽管最近通过生成对抗网络(GAN)操纵面部属性最近取得了非常成功的成功,但在明确控制姿势,表达,照明等特征的明确控制方面仍然存在一些挑战。最近的方法通过结合2D生成模型来实现对2D图像的明确控制和3dmm。但是,由于3DMM缺乏现实主义和纹理重建的清晰度,因此合成图像与3DMM的渲染图像之间存在域间隙。由于渲染的3DMM图像仅包含面部区域,因此直接计算这两个域之间的损失是不理想的,因此训练有素的模型将是偏差的。在这项研究中,我们建议通过控制3DMM的参数来明确编辑验证样式的潜在空间。为了解决域间隙问题,我们提出了一个名为“地图和编辑”的新网络,以及一种简单但有效的属性编辑方法,以避免渲染和合成图像之间的直接损失计算。此外,由于我们的模型可以准确地生成多视图的面部图像,而身份保持不变。作为副产品,结合可见性掩模,我们提出的模型还可以生成质地丰富和高分辨率的紫外面部纹理。我们的模型依赖于验证的样式,并且提出的模型以自我监督的方式进行了训练,而无需任何手动注释或数据集训练。
translated by 谷歌翻译
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
translated by 谷歌翻译
最近,大型预磨损模型(例如,BERT,STYLEGAN,CLIP)在其域内的各种下游任务中表现出很好的知识转移和泛化能力。在这些努力的启发中,在本文中,我们提出了一个统一模型,用于开放域图像编辑,重点是开放式域图像的颜色和音调调整,同时保持原始内容和结构。我们的模型了解许多现有照片编辑软件中使用的操作空间(例如,对比度,亮度,颜色曲线)更具语义,直观,易于操作的统一编辑空间。我们的模型属于图像到图像转换框架,由图像编码器和解码器组成,并且在图像之前和图像的成对上培训以产生多模式输出。我们认为,通过将图像对反馈到学习编辑空间的潜在代码中,我们的模型可以利用各种下游编辑任务,例如语言引导图像编辑,个性化编辑,编辑式聚类,检索等。我们广泛地研究实验中编辑空间的独特属性,并在上述任务上展示了卓越的性能。
translated by 谷歌翻译
使用单视图2D照片仅集合,无监督的高质量多视图 - 一致的图像和3D形状一直是一个长期存在的挑战。现有的3D GAN是计算密集型的,也是没有3D-一致的近似;前者限制了所生成的图像的质量和分辨率,并且后者对多视图一致性和形状质量产生不利影响。在这项工作中,我们提高了3D GAN的计算效率和图像质量,而无需依赖这些近似。为此目的,我们介绍了一种表现力的混合明确隐式网络架构,与其他设计选择一起,不仅可以实时合成高分辨率多视图一致图像,而且还产生高质量的3D几何形状。通过解耦特征生成和神经渲染,我们的框架能够利用最先进的2D CNN生成器,例如Stylega2,并继承它们的效率和表现力。在其他实验中,我们展示了与FFHQ和AFHQ猫的最先进的3D感知合成。
translated by 谷歌翻译
多年来,2d Gans在影像肖像的一代中取得了巨大的成功。但是,他们在生成过程中缺乏3D理解,因此他们遇到了多视图不一致问题。为了减轻这个问题,已经提出了许多3D感知的甘斯,并显示出显着的结果,但是3D GAN在编辑语义属性方面努力。 3D GAN的可控性和解释性并未得到太多探索。在这项工作中,我们提出了两种解决方案,以克服2D GAN和3D感知gan的这些弱点。我们首先介绍了一种新颖的3D感知gan,Surf-Gan,它能够在训练过程中发现语义属性,并以无监督的方式控制它们。之后,我们将先验的Surf-GAN注入stylegan,以获得高保真3D控制的发电机。与允许隐姿姿势控制的现有基于潜在的方法不同,所提出的3D控制样式gan可实现明确的姿势控制对肖像生成的控制。这种蒸馏允许3D控制与许多基于样式的技术(例如,反转和风格化)之间的直接兼容性,并且在计算资源方面也带来了优势。我们的代码可从https://github.com/jgkwak95/surf-gan获得。
translated by 谷歌翻译
由于其语义上的理解和用户友好的可控性,通过三维引导,通过三维引导的面部图像操纵已广泛应用于各种交互式场景。然而,现有的基于3D形式模型的操作方法不可直接适用于域名面,例如非黑色素化绘画,卡通肖像,甚至是动物,主要是由于构建每个模型的强大困难具体面部域。为了克服这一挑战,据我们所知,我们建议使用人为3DMM操纵任意域名的第一种方法。这是通过两个主要步骤实现的:1)从3DMM参数解开映射到潜在的STYLEGO2的潜在空间嵌入,可确保每个语义属性的解除响应和精确的控制; 2)通过实施一致的潜空间嵌入,桥接域差异并使人类3DMM适用于域外面的人类3DMM。实验和比较展示了我们高质量的语义操作方法在各种面部域中的优越性,所有主要3D面部属性可控姿势,表达,形状,反照镜和照明。此外,我们开发了直观的编辑界面,以支持用户友好的控制和即时反馈。我们的项目页面是https://cassiepython.github.io/cddfm3d/index.html
translated by 谷歌翻译
High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes 3D-aware generative prior. Different from existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local and low dimensional subspace in the latent space of 3D-GAN. We propose an efficient method to construct the personalized generative prior based on a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering with novel views and the face reenactment can be realized by performing navigation in the latent space. Our proposed method is applicable for different driven signals, including RGB images, 3DMM coefficients, and audios. Compared with existing works, we obtain superior novel view synthesis results and faithfully face reenactment performance.
translated by 谷歌翻译
Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.
translated by 谷歌翻译
随着信息中的各种方式存在于现实世界中的各种方式,多式联信息之间的有效互动和融合在计算机视觉和深度学习研究中的多模式数据的创造和感知中起着关键作用。通过卓越的功率,在多式联运信息中建模互动,多式联运图像合成和编辑近年来已成为一个热门研究主题。与传统的视觉指导不同,提供明确的线索,多式联路指南在图像合成和编辑方面提供直观和灵活的手段。另一方面,该领域也面临着具有固有的模态差距的特征的几个挑战,高分辨率图像的合成,忠实的评估度量等。在本调查中,我们全面地阐述了最近多式联运图像综合的进展根据数据模型和模型架构编辑和制定分类。我们从图像合成和编辑中的不同类型的引导方式开始介绍。然后,我们描述了多模式图像综合和编辑方法,其具有详细的框架,包括生成的对抗网络(GAN),GaN反转,变压器和其他方法,例如NERF和扩散模型。其次是在多模式图像合成和编辑中广泛采用的基准数据集和相应的评估度量的综合描述,以及分析各个优点和限制的不同合成方法的详细比较。最后,我们为目前的研究挑战和未来的研究方向提供了深入了解。与本调查相关的项目可在HTTPS://github.com/fnzhan/mise上获得
translated by 谷歌翻译
通过利用预熟gan的潜在空间,已经提出了许多最近的作品来进行面部图像编辑。但是,很少有尝试将它们直接应用于视频,因为1)他们不能保证时间一致性,2)他们的应用受到视频的处理速度的限制,3)他们无法准确编码面部运动和表达的细节。为此,我们提出了一个新颖的网络,将面部视频编码到Stylegan的潜在空间中,以进行语义面部视频操纵。基于视觉变压器,我们的网络重复了潜在向量的高分辨率部分,以实现时间一致性。为了捕捉微妙的面部运动和表情,我们设计了涉及稀疏面部地标和密集的3D脸部网眼的新颖损失。我们已经彻底评估了我们的方法,并成功证明了其对各种面部视频操作的应用。特别是,我们提出了一个新型网络,用于3D坐标系中的姿势/表达控制。定性和定量结果都表明,我们的方法可以显着优于现有的单图方法,同时实现实时(66 fps)速度。
translated by 谷歌翻译
Recently, a surge of high-quality 3D-aware GANs have been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred as 3D GAN inversion. Although with the facial prior preserved in pre-trained 3D GANs, reconstructing a 3D portrait with only one monocular image is still an ill-pose problem. The straightforward application of 2D GAN inversion methods focuses on texture similarity only while ignoring the correctness of 3D geometry shapes. It may raise geometry collapse effects, especially when reconstructing a side face under an extreme pose. Besides, the synthetic results in novel views are prone to be blurry. In this work, we propose a novel method to promote 3D GAN inversion by introducing facial symmetry prior. We design a pipeline and constraints to make full use of the pseudo auxiliary view obtained via image flipping, which helps obtain a robust and reasonable geometry shape during the inversion process. To enhance texture fidelity in unobserved viewpoints, pseudo labels from depth-guided 3D warping can provide extra supervision. We design constraints aimed at filtering out conflict areas for optimization in asymmetric situations. Comprehensive quantitative and qualitative evaluations on image reconstruction and editing demonstrate the superiority of our method.
translated by 谷歌翻译