尽管神经辐射场(NERF)表现出引人注目的新视图综合结果,但编辑预训练的NERF仍然没有直觉,因为神经网络的参数和场景几何/外观通常没有明确关联。在本文中,我们介绍了第一个框架,该框架使用户能够在3D场景中删除不需要的对象或修饰不希望的区域,该场景由预先训练的NERF表示,而没有任何类别的数据和培训。用户首先绘制一个自由形式的掩码,以通过预先训练的NERF的渲染视图指定包含不需要对象的区域。我们的框架首先将用户提供的掩码转移到其他渲染视图,并估算这些传输的蒙版区域内的指导颜色和深度图像。接下来,我们制定了一个优化问题,该问题通过更新NERF模型的参数,共同对所有掩盖区域中的图像内容共同涂抹图像内容。我们在不同的场景上演示了我们的框架,并证明它在使用较短的时间和用户手动努力的情况下,在多个视图中获得了视觉上的合理和结构一致的结果。
translated by 谷歌翻译
Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects from a NeRF representation created from an RGB-D sequence. Our NeRF inpainting method leverages recent work in 2D image inpainting and is guided by a user-provided mask. Our algorithm is underpinned by a confidence based view selection procedure. It chooses which of the individual 2D inpainted images to use in the creation of the NeRF, so that the resulting inpainted NeRF is 3D consistent. We show that our method for NeRF editing is effective for synthesizing plausible inpaintings in a multi-view coherent manner. We validate our approach using a new and still-challenging dataset for the task of NeRF inpainting.
translated by 谷歌翻译
https://video-nerf.github.io Figure 1. Our method takes a single casually captured video as input and learns a space-time neural irradiance field. (Top) Sample frames from the input video. (Middle) Novel view images rendered from textured meshes constructed from depth maps. (Bottom) Our results rendered from the proposed space-time neural irradiance field.
translated by 谷歌翻译
我们提出了一种基于神经辐射场(NERF)的单个$ 360^\ PANORAMA图像合成新视图的方法。在类似环境中的先前研究依赖于多层感知的邻居插值能力来完成由遮挡引起的丢失区域,这导致其预测中的伪像。我们提出了360Fusionnerf,这是一个半监督的学习框架,我们介绍几何监督和语义一致性,以指导渐进式培训过程。首先,将输入图像重新投影至$ 360^\ Circ $图像,并在其他相机位置提取辅助深度图。除NERF颜色指导外,深度监督还改善了合成视图的几何形状。此外,我们引入了语义一致性损失,鼓励新观点的现实渲染。我们使用预先训练的视觉编码器(例如剪辑)提取这些语义功能,这是一个视觉变压器,经过数以千计的不同2D照片,并通过自然语言监督从网络中挖掘出来。实验表明,我们提出的方法可以在保留场景的特征的同时产生未观察到的区域的合理完成。 360fusionnerf在各种场景中接受培训时,转移到合成结构3D数据集(PSNR〜5%,SSIM〜3%lpips〜13%)时,始终达到最先进的性能,SSIM〜3%LPIPS〜9%)和replica360数据集(PSNR〜8%,SSIM〜2%LPIPS〜18%)。
translated by 谷歌翻译
神经辐射字段(NERF)将场景编码为神经表示,使得能够实现新颖视图的照片逼真。然而,RGB图像的成功重建需要在静态条件下拍摄的大量输入视图 - 通常可以为房间尺寸场景的几百个图像。我们的方法旨在将整个房间的小说视图从数量级的图像中合成。为此,我们利用密集的深度前导者来限制NERF优化。首先,我们利用从用于估计相机姿势的运动(SFM)预处理步骤的结构自由提供的稀疏深度数据。其次,我们使用深度完成将这些稀疏点转换为密集的深度图和不确定性估计,用于指导NERF优化。我们的方法使数据有效的新颖观看综合在挑战室内场景中,使用少量为整个场景的18张图像。
translated by 谷歌翻译
通过隐式表示表示视觉信号(例如,基于坐标的深网)在许多视觉任务中都占了上风。这项工作探讨了一个新的有趣的方向:使用可以适用于各种2D和3D场景的广义方法训练风格化的隐式表示。我们对各种隐式函数进行了试点研究,包括基于2D坐标的表示,神经辐射场和签名距离函数。我们的解决方案是一个统一的隐式神经风化框架,称为INS。与Vanilla隐式表示相反,INS将普通隐式函数分解为样式隐式模块和内容隐式模块,以便从样式图像和输入场景中分别编码表示表示。然后,应用合并模块来汇总这些信息并合成样式化的输出。为了使3D场景中的几何形状进行正规化,我们提出了一种新颖的自我鉴定几何形状一致性损失,该损失保留了风格化场景的几何忠诚度。全面的实验是在多个任务设置上进行的,包括对复杂场景的新型综合,隐式表面的风格化以及使用MLP拟合图像。我们进一步证明,学到的表示不仅是连续的,而且在风格上都是连续的,从而导致不同样式之间毫不费力地插值,并以新的混合样式生成图像。请参阅我们的项目页面上的视频以获取更多查看综合结果:https://zhiwenfan.github.io/ins。
translated by 谷歌翻译
2D-to-3D reconstruction is an ill-posed problem, yet humans are good at solving this problem due to their prior knowledge of the 3D world developed over years. Driven by this observation, we propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models. Formulating single-view reconstruction as an image-conditioned 3D generation problem, we optimize the NeRF representations by minimizing a diffusion loss on its arbitrary view renderings with a pretrained image diffusion model under the input-view constraint. We leverage off-the-shelf vision-language models and introduce a two-section language guidance as conditioning inputs to the diffusion model. This is essentially helpful for improving multiview content coherence as it narrows down the general image prior conditioned on the semantic and visual features of the single-view input image. Additionally, we introduce a geometric loss based on estimated depth maps to regularize the underlying 3D geometry of the NeRF. Experimental results on the DTU MVS dataset show that our method can synthesize novel views with higher quality even compared to existing methods trained on this dataset. We also demonstrate our generalizability in zero-shot NeRF synthesis for in-the-wild images.
translated by 谷歌翻译
Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.
translated by 谷歌翻译
获取3D对象表示对于创建照片现实的模拟器和为AR/VR应用程序收集资产很重要。神经领域已经显示出其在学习2D图像的场景的连续体积表示方面的有效性,但是从这些模型中获取对象表示,并以较弱的监督仍然是一个开放的挑战。在本文中,我们介绍了Laterf,一种从给定的2D图像和已知相机姿势的2D图像中提取感兴趣对象的方法,对象的自然语言描述以及少数对象和非对象标签 - 输入图像中的对象点。为了忠实地从场景中提取对象,后来在每个3D点上都以其他“对象”概率扩展NERF公式。此外,我们利用预先训练的剪辑模型与我们可区分的对象渲染器相结合的丰富潜在空间来注入对象的封闭部分。我们在合成数据集和真实数据集上展示了高保真对象提取,并通过广泛的消融研究证明我们的设计选择是合理的。
translated by 谷歌翻译
在许多计算机视觉和图形应用程序中,从2D图像重建3D室内场景是一项重要任务。这项任务中的一个主要挑战是,典型的室内场景中的无纹理区域使现有方法难以产生令人满意的重建结果。我们提出了一种名为Neuris的新方法,以高质量地重建室内场景。 Neuris的关键思想是将估计的室内场景正常整合为神经渲染框架中的先验,以重建大型无纹理形状,并且重要的是,以适应性的方式进行此操作,以便重建不规则的形状,并具有很好的细节。 。具体而言,我们通过检查优化过程中重建的多视图一致性来评估正常先验的忠诚。只有被接受为忠实的正常先验才能用于3D重建,通常发生在平滑形状的区域中,可能具有弱质地。但是,对于那些具有小物体或薄结构的区域,普通先验通常不可靠,我们只能依靠输入图像的视觉特征,因为此类区域通常包含相对较丰富的视觉特征(例如,阴影变化和边界轮廓)。广泛的实验表明,在重建质量方面,Neuris明显优于最先进的方法。
translated by 谷歌翻译
我们提出了IM2NERF,这是一个学习框架,该框架可以预测在野生中给出单个输入图像的连续神经对象表示,仅通过现成的识别方法进行分割输出而受到监督。构建神经辐射场的标准方法利用了多视图的一致性,需要对场景的许多校准视图,这一要求在野外学习大规模图像数据时无法满足。我们通过引入一个模型将输入图像编码到包含对象形状的代码,对象外观代码以及捕获对象图像的估计相机姿势的模型来迈出解决此缺点的一步。我们的模型条件在预测的对象表示上nerf,并使用卷渲染来从新视图中生成图像。我们将模型端到端训练大量输入图像。由于该模型仅配有单视图像,因此问题高度不足。因此,除了在合成的输入视图上使用重建损失外,我们还对新颖的视图使用辅助对手损失。此外,我们利用对象对称性和循环摄像头的姿势一致性。我们在Shapenet数据集上进行了广泛的定量和定性实验,并在开放图像数据集上进行了定性实验。我们表明,在所有情况下,IM2NERF都从野外的单视图像中实现了新视图合成的最新性能。
translated by 谷歌翻译
Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per-scene leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field. Point-NeRF can be rendered efficiently by aggregating neural point features near scene surfaces, in a ray marching-based rendering pipeline. Moreover, Point-NeRF can be initialized via direct inference of a pre-trained deep network to produce a neural point cloud; this point cloud can be finetuned to surpass the visual quality of NeRF with 30X faster training time. Point-NeRF can be combined with other 3D reconstruction methods and handles the errors and outliers in such methods via a novel pruning and growing mechanism. The experiments on the DTU, the NeRF Synthetics , the ScanNet and the Tanks and Temples datasets demonstrate Point-NeRF can surpass the existing methods and achieve the state-of-the-art results.
translated by 谷歌翻译
我们提出了HRF-NET,这是一种基于整体辐射场的新型视图合成方法,该方法使用一组稀疏输入来呈现新视图。最近的概括视图合成方法还利用了光辉场,但渲染速度不是实时的。现有的方法可以有效地训练和呈现新颖的观点,但它们无法概括地看不到场景。我们的方法解决了用于概括视图合成的实时渲染问题,并由两个主要阶段组成:整体辐射场预测指标和基于卷积的神经渲染器。该架构不仅基于隐式神经场的一致场景几何形状,而且还可以使用单个GPU有效地呈现新视图。我们首先在DTU数据集的多个3D场景上训练HRF-NET,并且网络只能仅使用光度损耗就看不见的真实和合成数据产生合理的新视图。此外,我们的方法可以利用单个场景的密集参考图像集来产生准确的新颖视图,而无需依赖其他明确表示,并且仍然保持了预训练模型的高速渲染。实验结果表明,HRF-NET优于各种合成和真实数据集的最先进的神经渲染方法。
translated by 谷歌翻译
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
translated by 谷歌翻译
3D场景感性风格化旨在根据给定的样式图像从任意新颖的视图中生成光真逼真的图像,同时在从不同观点呈现时确保一致性。一些带有神经辐射场的现有风格化方法可以通过将样式图像的特征与多视图图像结合到训练3D场景来有效地预测风格化的场景。但是,这些方法生成了包含令人反感的伪影的新型视图图像。此外,他们无法为3D场景实现普遍的影迷风格化。因此,样式图像必须根据神经辐射场重新训练3D场景表示网络。我们提出了一个新颖的3D场景,逼真的风格转移框架来解决这些问题。它可以通过2D样式图像实现感性3D场景样式转移。我们首先预先训练了2D逼真的样式传输网络,该网络可以符合任何给定内容图像和样式图像之间的影片风格转移。然后,我们使用体素特征来优化3D场景并获得场景的几何表示。最后,我们共同优化了一个超级网络,以实现场景的逼真风格传输的任意样式图像。在转移阶段,我们使用预先训练的2D影视网络来限制3D场景中不同视图和不同样式图像的感性风格。实验结果表明,我们的方法不仅实现了任意样式图像的3D影像风格转移,而且还优于视觉质量和一致性方面的现有方法。项目页面:https://semchan.github.io/upst_nerf。
translated by 谷歌翻译
神经场景表示,例如神经辐射场(NERF),基于训练多层感知器(MLP),使用一组具有已知姿势的彩色图像。现在,越来越多的设备产生RGB-D(颜色 +深度)信息,这对于各种任务非常重要。因此,本文的目的是通过将深度信息与颜色图像结合在一起,研究这些有希望的隐式表示可以进行哪些改进。特别是,最近建议的MIP-NERF方法使用圆锥形的圆丝而不是射线进行音量渲染,它使人们可以考虑具有距离距离摄像头中心距离的像素的不同区域。所提出的方法还模拟了深度不确定性。这允许解决基于NERF的方法的主要局限性,包括提高几何形状的准确性,减少伪像,更快的训练时间和缩短预测时间。实验是在众所周知的基准场景上进行的,并且比较在场景几何形状和光度重建中的准确性提高,同时将训练时间减少了3-5次。
translated by 谷歌翻译
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. 1
translated by 谷歌翻译
我们将神经渲染与多模态图像和文本表示相结合,以仅从自然语言描述中综合不同的3D对象。我们的方法,梦场,可以产生多种物体的几何和颜色而无需3D监控。由于不同,标题3D数据的稀缺性,先前的方法仅生成来自少数类别的对象,例如ShapEnet。相反,我们指导生成与从Web的标题图像的大型数据集预先培训的图像文本模型。我们的方法优化了许多相机视图的神经辐射场,使得根据预先训练的剪辑模型,渲染图像非常高度地使用目标字幕。为了提高保真度和视觉质量,我们引入简单的几何前瞻,包括突出透射率正则化,场景界限和新的MLP架构。在实验中,梦场从各种自然语言标题中产生现实,多视图一致的物体几何和颜色。
translated by 谷歌翻译
Neural Radiance Fields (NeRFs) encode the radiance in a scene parameterized by the scene's plenoptic function. This is achieved by using an MLP together with a mapping to a higher-dimensional space, and has been proven to capture scenes with a great level of detail. Naturally, the same parameterization can be used to encode additional properties of the scene, beyond just its radiance. A particularly interesting property in this regard is the semantic decomposition of the scene. We introduce a novel technique for semantic soft decomposition of neural radiance fields (named SSDNeRF) which jointly encodes semantic signals in combination with radiance signals of a scene. Our approach provides a soft decomposition of the scene into semantic parts, enabling us to correctly encode multiple semantic classes blending along the same direction -- an impossible feat for existing methods. Not only does this lead to a detailed, 3D semantic representation of the scene, but we also show that the regularizing effects of the MLP used for encoding help to improve the semantic representation. We show state-of-the-art segmentation and reconstruction results on a dataset of common objects and demonstrate how the proposed approach can be applied for high quality temporally consistent video editing and re-compositing on a dataset of casually captured selfie videos.
translated by 谷歌翻译
我们建议使用以光源方向为条件的神经辐射场(NERF)的扩展来解决多视光度立体声问题。我们神经表示的几何部分预测表面正常方向,使我们能够理解局部表面反射率。我们的神经表示的外观部分被分解为神经双向反射率函数(BRDF),作为拟合过程的一部分学习,阴影预测网络(以光源方向为条件),使我们能够对明显的BRDF进行建模。基于物理图像形成模型的诱导偏差的学到的组件平衡使我们能够远离训练期间观察到的光源和查看器方向。我们证明了我们在多视光学立体基准基准上的方法,并表明可以通过NERF的神经密度表示可以获得竞争性能。
translated by 谷歌翻译