With the success of neural volume rendering in novel view synthesis, neural implicit reconstruction with volume rendering has become popular. However, most methods optimize per-scene functions and cannot generalize to novel scenes. We introduce VolRecon, a generalizable implicit reconstruction method based on the Signed Ray Distance Function (SRDF). To reconstruct with fine details and little noise, we combine projection features, aggregated from multi-view features with a view transformer, and volume features interpolated from a coarse global feature volume. A ray transformer computes the SRDF values of all samples along a ray to estimate the surface location, and these values are used for volume rendering of color and depth. Extensive experiments on DTU and ETH3D demonstrate the effectiveness and generalization ability of our method. On DTU, our method outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves quality comparable to MVSNet in full view reconstruction. In addition, our method generalizes well to the large-scale ETH3D benchmark. Project page: https://fangjinhuawang.github.io/VolRecon.
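We introduce SparseNeuS, a novel neural rendering based method for surface reconstruction from multi-view images. This task becomes more difficult when only sparse images are provided as input, a scenario in which existing neural reconstruction approaches usually produce incomplete or distorted results. Moreover, their inability to generalize to unseen scenes impedes their application in practice. In contrast, SparseNeuS can generalize to new scenes and works well with sparse images (as few as 2 or 3). SparseNeuS adopts a signed distance function (SDF) as the surface representation, and learns generalizable priors from image features by introducing geometry encoding volumes for generic surface prediction. In addition, several strategies are introduced to make effective use of sparse views for high-quality reconstruction, including 1) a multi-level geometry reasoning framework that recovers surfaces in a coarse-to-fine manner; 2) a multi-scale color blending scheme for more reliable color prediction; 3) a consistency-aware fine-tuning scheme to control the inconsistent regions caused by occlusion and noise. Extensive experiments show that our approach not only outperforms state-of-the-art methods, but also exhibits good efficiency, generalizability, and flexibility.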
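Neural implicit surfaces have become an important technique for multi-view 3D reconstruction, but their accuracy remains limited. In this paper, we argue that this comes from the difficulty of learning and rendering high-frequency textures with neural networks. We therefore propose to add a direct photo-consistency term across the different views to the standard neural rendering optimization. Intuitively, we optimize the implicit geometry so that it warps views onto each other in a consistent way. We demonstrate that two elements are key to the success of such an approach: (i) warping entire patches, using the predicted occupancy and normals of the 3D points along each ray, and measuring their similarity with a robust structural similarity (SSIM); (ii) handling visibility and occlusion in such a way that incorrect warps are not given too much importance, while encouraging a reconstruction that is as complete as possible. We evaluate our approach, dubbed NeuralWarp, on the standard DTU and EPFL benchmarks and show that it outperforms state-of-the-art unsupervised implicit surface reconstructions by over 20% on both datasets.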
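Reconstructing 3D indoor scenes from 2D images is an important task in many computer vision and graphics applications. A main challenge in this task is that the large texture-less areas in typical indoor scenes make it hard for existing methods to produce satisfactory reconstruction results. We propose a new method, named NeuRIS, for high-quality reconstruction of indoor scenes. The key idea of NeuRIS is to integrate estimated normals of indoor scenes as a prior in a neural rendering framework for reconstructing large texture-less shapes and, importantly, to do this in an adaptive manner so that irregular shapes with fine details can also be reconstructed. Specifically, we evaluate the faithfulness of the normal priors by checking the multi-view consistency of the reconstruction during the optimization process. Only the normal priors accepted as faithful are used for 3D reconstruction, which typically happens in regions of smooth shape, possibly with weak texture. However, for regions with small objects or thin structures, where the normal priors are usually unreliable, we rely only on the visual features of the input images, since such regions typically contain relatively rich visual features (e.g., shading changes and boundary contours). Extensive experiments show that NeuRIS significantly outperforms state-of-the-art methods in terms of reconstruction quality.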
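Virtual content creation and interaction play an important role in modern 3D applications such as AR and VR. Recovering detailed 3D models from real scenes can significantly expand the scope of these applications and has been studied for decades in the computer vision and computer graphics communities. We propose Vox-Surf, a voxel-based implicit surface representation. Vox-Surf divides the space into finite bounded voxels. Each voxel stores geometry and appearance information at its corner vertices. Thanks to the sparsity inherited from the voxel representation, Vox-Surf is suitable for almost any scenario and can be easily trained from multi-view images. We use a progressive training procedure to gradually extract important voxels for further optimization so that only valid voxels are preserved, which greatly reduces the number of sampling points and increases rendering speed. The fine voxels can also be treated as bounding volumes for collision detection. Experiments show that the Vox-Surf representation can learn delicate surface details and accurate color with less memory and faster rendering speed than other methods. We also show that Vox-Surf can be more practical in scene editing and AR applications.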
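Finding accurate correspondences among different views is the Achilles' heel of unsupervised Multi-View Stereo (MVS). Existing methods are built upon the assumption that corresponding pixels share similar photometric features. However, multi-view images in real scenarios observe non-Lambertian surfaces and experience occlusions. In this work, we propose a novel approach with neural rendering (RC-MVSNet) to resolve such ambiguity in correspondences among views. Specifically, we impose a depth rendering consistency loss that constrains the geometry features close to the object surface, to alleviate occlusions. Concurrently, we introduce a reference view synthesis loss to generate consistent supervision, even for non-Lambertian surfaces. Extensive experiments on the DTU and Tanks and Temples benchmarks demonstrate that RC-MVSNet achieves state-of-the-art performance among unsupervised MVS frameworks and competitive performance against many supervised methods. The code is released at https://github.com/BOESE0601/RC-MVSNET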
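3D reconstruction from images has wide applications in virtual reality and autonomous driving, where the precision requirement is very high. Ground-breaking research on neural radiance fields (NeRF), which use multi-layer perceptrons, has dramatically improved the representation quality of 3D objects. Some later studies improved NeRF by building truncated signed distance fields (TSDFs), but they still suffer from surface blurring in 3D reconstruction. In this work, this surface ambiguity is addressed by proposing a novel 3D shape representation, OmniNeRF. It is based on training a hybrid implicit field consisting of an Omni-directional Distance Field (ODF) and a neural radiance field, replacing the apparent density in NeRF with omnidirectional information. Moreover, we introduce additional supervision on the depth map to further improve reconstruction quality. The proposed method is shown to effectively handle the defects of NeRF at the edges of the reconstructed surface, providing higher-quality 3D scene reconstruction results.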
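We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground masks as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation that is robust to optimize, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because the representation lacks sufficient surface constraints. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e., bias) in surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, leading to more accurate surface reconstruction even without mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state of the art in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.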
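The abstract above does not spell out the SDF-based rendering formulation, so the following is only a minimal sketch of the discrete opacity commonly associated with NeuS, in which the opacity between two consecutive samples is derived from a sigmoid of the scaled SDF. The function names, the fixed sharpness `s`, and the toy ray are assumptions made for illustration and are not taken from the authors' code.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, used here as the CDF Phi_s applied to scaled SDF values."""
    return 1.0 / (1.0 + np.exp(-x))

def sdf_to_alpha(sdf, s):
    """Discrete opacity between consecutive samples along one ray.

    sdf: (N,) signed distances at samples ordered from near to far.
    s:   sharpness of the sigmoid (hypothetical fixed value; learned in practice).
    Returns (N-1,) opacities: max((Phi(f_i) - Phi(f_{i+1})) / Phi(f_i), 0).
    """
    cdf = sigmoid(s * np.asarray(sdf, dtype=np.float64))
    alpha = (cdf[:-1] - cdf[1:]) / np.clip(cdf[:-1], 1e-8, None)
    return np.clip(alpha, 0.0, 1.0)

def alpha_to_weights(alpha):
    """Rendering weights w_i = T_i * alpha_i with T_i = prod_{j<i} (1 - alpha_j)."""
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return trans * alpha

# Toy ray that crosses a surface (SDF = 0) at the middle sample.
sdf = np.linspace(1.0, -1.0, 65)
weights = alpha_to_weights(sdf_to_alpha(sdf, s=50.0))
peak = int(np.argmax(weights))  # index of the interval with the largest weight
```

In this toy case the weights concentrate on the intervals adjacent to the zero crossing, which is the behavior that makes the zero-level set a good surface estimate.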
Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per scene, leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field. Point-NeRF can be rendered efficiently by aggregating neural point features near scene surfaces in a ray-marching-based rendering pipeline. Moreover, Point-NeRF can be initialized via direct inference of a pre-trained deep network to produce a neural point cloud; this point cloud can be fine-tuned to surpass the visual quality of NeRF with 30x faster training time. Point-NeRF can be combined with other 3D reconstruction methods and handles the errors and outliers in such methods via a novel pruning and growing mechanism. Experiments on the DTU, NeRF Synthetic, ScanNet, and Tanks and Temples datasets demonstrate that Point-NeRF can surpass existing methods and achieve state-of-the-art results.
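In this paper, we address the problem of multi-view 3D shape reconstruction. While recent differentiable rendering approaches associated with implicit shape representations have provided breakthrough performance, they are still computationally heavy and often lack precision in the estimated geometries. To overcome these limitations, we investigate a new computational approach that builds on a novel shape representation that is volumetric, as in recent differentiable rendering approaches, but parameterized with depth maps to better materialize the shape surface. The shape energy associated with this representation evaluates 3D geometry given color images and does not need appearance prediction, but still benefits from volumetric integration when optimized. In practice, we propose an implicit shape representation, the SRDF, based on signed distances that we parameterize by depths along camera rays. The associated shape energy considers the agreement between depth prediction consistency and photometric consistency at 3D locations within the volumetric representation. Various photo-consistency priors can be accounted for, such as a median-based baseline or a more elaborate criterion such as a learned function. The approach retains pixel accuracy with depth maps and is parallelizable. Our experiments on standard datasets show that it provides state-of-the-art results with respect to recent approaches with implicit shape representations as well as traditional multi-view stereo methods.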
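As a rough illustration of a signed distance parameterized by depth along a camera ray, the sketch below assumes the simplest reading: for a ray whose surface lies at depth `surface_depth`, a sample at depth `t` has signed ray distance `surface_depth - t` (positive in front of the surface, negative behind it). The function names and the zero-crossing helper are hypothetical and are not the paper's implementation.

```python
import numpy as np

def srdf_along_ray(sample_depths, surface_depth):
    """Signed ray distance of each sample to the surface along the same ray.

    Assumption: SRDF(t) = surface_depth - t, so the sign flips at the surface.
    """
    return surface_depth - np.asarray(sample_depths, dtype=np.float64)

def first_zero_crossing(sample_depths, srdf):
    """Depth of the first positive-to-non-positive sign change, by linear interpolation.

    Returns None when the ray never crosses a surface within the sampled range.
    """
    t = np.asarray(sample_depths, dtype=np.float64)
    s = np.asarray(srdf, dtype=np.float64)
    idx = np.where((s[:-1] > 0) & (s[1:] <= 0))[0]
    if idx.size == 0:
        return None
    i = idx[0]
    return t[i] + s[i] * (t[i + 1] - t[i]) / (s[i] - s[i + 1])

# Toy ray: nine samples between depth 0 and 4, surface at depth 1.7.
t = np.linspace(0.0, 4.0, 9)
srdf = srdf_along_ray(t, surface_depth=1.7)
hit = first_zero_crossing(t, srdf)
```

Because the toy SRDF is linear in depth, the interpolated crossing recovers the surface depth; a predicted SRDF would only do so approximately.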
We present a novel neural surface reconstruction method called NeuralRoom for reconstructing room-sized indoor scenes directly from a set of 2D images. Recently, implicit neural representations have become a promising way to reconstruct surfaces from multiview images due to their high-quality results and simplicity. However, implicit neural representations usually cannot reconstruct indoor scenes well because they suffer from severe shape-radiance ambiguity. We assume that an indoor scene consists of texture-rich regions and flat texture-less regions. In texture-rich regions, multiview stereo can obtain accurate results. In flat regions, normal estimation networks usually obtain good normal estimates. Based on these observations, we use reliable geometric priors to reduce the possible spatial variation range of implicit neural surfaces and thereby alleviate shape-radiance ambiguity. Specifically, we use multiview stereo results to limit the NeuralRoom optimization space and then use reliable geometric priors to guide NeuralRoom training. NeuralRoom then produces a neural scene representation that can render images consistent with the input training images. In addition, we propose a smoothing method called perturbation-residual restrictions to improve the accuracy and completeness of flat regions, which assumes that the sampling points in a local surface should have the same normal and a similar distance to the observation center. Experiments on the ScanNet dataset show that our method can reconstruct the texture-less areas of indoor scenes while maintaining the accuracy of details. We also apply NeuralRoom to more advanced multiview reconstruction algorithms and significantly improve their reconstruction quality.
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multiview posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods.
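The classic volume rendering mentioned above is usually discretized by numerical quadrature over samples along each ray. The sketch below shows that standard quadrature in NumPy under stated assumptions: the function name, the large padding value for the last interval, and the toy inputs are illustrative choices and do not come from the paper.

```python
import numpy as np

def volume_render(sigma, rgb, t):
    """Composite color and expected depth along a single ray.

    sigma: (N,) non-negative volume densities at the samples.
    rgb:   (N, 3) colors at the samples.
    t:     (N,) sample depths, sorted from near to far.
    """
    # Distance between adjacent samples; the last interval is padded with a
    # large value (an assumption) so the final sample can absorb the ray.
    delta = np.diff(t, append=t[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)            # per-sample opacity
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    depth = (weights * t).sum()
    return color, depth, weights

# Toy ray: empty space everywhere except one very dense sample.
t = np.linspace(2.0, 6.0, 64)
sigma = np.zeros(64)
sigma[20] = 1e3
rgb = np.tile(np.array([0.2, 0.5, 0.8]), (64, 1))
color, depth, weights = volume_render(sigma, rgb, t)
```

Every step is a differentiable array operation, which is why the same computation, written in an autodiff framework, can be trained from posed images alone.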
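We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views using a set of sparse inputs. Recent generalizing view synthesis methods also leverage radiance fields, but their rendering speed is not real-time. Existing methods can train and render novel views efficiently, but they cannot generalize to unseen scenes. Our approach addresses the problem of real-time rendering for generalizing view synthesis and consists of two main stages: a holistic radiance fields predictor and a convolution-based neural renderer. This architecture not only infers consistent scene geometry based on implicit neural fields, but also renders new views efficiently using a single GPU. We first train HRF-Net on multiple 3D scenes of the DTU dataset, and the network can produce plausible novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that HRF-Net outperforms state-of-the-art neural rendering methods on various synthetic and real datasets.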
Neural Radiance Field (NeRF) has revolutionized free viewpoint rendering tasks and achieved impressive results. However, efficiency and accuracy problems hinder its wide application. To address these issues, we propose Geometry-Aware Generalized Neural Radiance Field (GARF) with a geometry-aware dynamic sampling (GADS) strategy to perform real-time novel view rendering and unsupervised depth estimation on unseen scenes without per-scene optimization. Distinct from most existing generalized NeRFs, our framework infers unseen scenes on both pixel scale and geometry scale with only a few input images. More specifically, our method learns common attributes of novel view synthesis with an encoder-decoder structure and a point-level learnable multi-view feature fusion module that helps avoid occlusion. To preserve scene characteristics in the generalized model, we introduce an unsupervised depth estimation module to derive the coarse geometry, narrow down the ray sampling interval to the proximity space of the estimated surface, and sample in expectation maximum position, constituting the Geometry-Aware Dynamic Sampling (GADS) strategy. Moreover, we introduce a Multi-level Semantic Consistency (MSC) loss to assist more informative representation learning. Extensive experiments on indoor and outdoor datasets show that, compared with state-of-the-art generalized NeRF methods, GARF reduces samples by more than 25% while improving rendering quality and 3D geometry estimation.
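Neural implicit functions have recently shown promising results on surface reconstruction from multiple views. However, current methods still suffer from excessive time complexity and poor robustness when reconstructing unbounded or complex scenes. In this paper, we present RegSDF, which shows that proper point cloud supervision and geometry regularization are sufficient to produce high-quality and robust reconstruction results. Specifically, RegSDF takes an additional oriented point cloud as input, and optimizes a signed distance field and a surface light field within a differentiable rendering framework. We also introduce two critical regularizations for this optimization. The first is the Hessian regularization, which smoothly diffuses the signed distance values to the entire distance field given noisy and incomplete input. The second is the minimal surface regularization, which compactly interpolates and extrapolates the missing geometry. Extensive experiments are conducted on the DTU, BlendedMVS, and Tanks and Temples datasets. Compared with recent neural surface reconstruction approaches, RegSDF is able to reconstruct surfaces with fine details even for open scenes with complex topologies and unstructured camera trajectories.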
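We present DD-NeRF, a novel generalizable implicit field for representing human body geometry and appearance from arbitrary input views. The core contribution is a double diffusion mechanism, which leverages a sparse convolutional neural network to build two volumes that represent a human body at different levels: a coarse body volume takes advantage of an unclothed deformable mesh to provide large-scale geometric guidance, and a detail feature volume learns intricate geometry from local image features. We also employ a transformer network to aggregate image features and raw pixels across views for computing the final high-fidelity radiance field. Experiments on various datasets show that the proposed approach outperforms previous work in both geometry reconstruction and novel view synthesis quality.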
We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via the differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-arts, but also is several times faster in runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
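The variance-based cost metric described above can be sketched as an element-wise variance across the per-view feature volumes, which is what lets the same network accept any number of views N. This is a minimal NumPy illustration under stated assumptions: it takes the feature volumes as already warped into the reference camera frustum, and the function name and the tensor layout `(N, C, D, H, W)` are choices made here, not details from the abstract.

```python
import numpy as np

def variance_cost(feature_volumes):
    """Fuse N per-view feature volumes into a single cost volume.

    feature_volumes: array of shape (N, C, D, H, W), assumed to be already
    warped to the reference view for each depth hypothesis.
    Returns an array of shape (C, D, H, W): the variance across views.
    """
    v = np.asarray(feature_volumes, dtype=np.float64)
    mean = v.mean(axis=0)
    return ((v - mean) ** 2).mean(axis=0)

# Toy example: identical views agree perfectly, so the cost is zero everywhere.
rng = np.random.default_rng(0)
one_view = rng.normal(size=(8, 4, 5, 6))
agree = variance_cost(np.stack([one_view] * 3))
# Views that disagree yield a strictly positive cost somewhere.
disagree = variance_cost(rng.normal(size=(5, 8, 4, 5, 6)))
```

Low variance at a depth hypothesis indicates that the views agree there, which is the signal the subsequent 3D regularization is meant to exploit.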
https://video-nerf.github.io Figure 1. Our method takes a single casually captured video as input and learns a space-time neural irradiance field. (Top) Sample frames from the input video. (Middle) Novel view images rendered from textured meshes constructed from depth maps. (Bottom) Our results rendered from the proposed space-time neural irradiance field.
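Neural radiance fields (NeRF) encode a scene into a neural representation that enables photo-realistic rendering of novel views. However, a successful reconstruction from RGB images requires a large number of input views taken under static conditions, typically up to a few hundred images for room-size scenes. Our method aims to synthesize novel views of whole rooms from an order of magnitude fewer images. To this end, we leverage dense depth priors to constrain the NeRF optimization. First, we take advantage of the sparse depth data that is freely available from the structure from motion (SfM) preprocessing step used to estimate camera poses. Second, we use depth completion to convert these sparse points into dense depth maps and uncertainty estimates, which are used to guide NeRF optimization. Our method enables data-efficient novel view synthesis on challenging indoor scenes, using as few as 18 images for an entire scene.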
Neural implicit 3D representations have emerged as a powerful paradigm for reconstructing surfaces from multiview images and synthesizing novel views. Unfortunately, existing methods such as DVR or IDR require accurate per-pixel object masks as supervision. At the same time, neural radiance fields have revolutionized novel view synthesis. However, NeRF's estimated volume density does not admit accurate surface reconstruction. Our key insight is that implicit surface models and radiance fields can be formulated in a unified way, enabling both surface and volume rendering using the same model. This unified perspective enables novel, more efficient sampling procedures and the ability to reconstruct accurate surfaces without input masks. We compare our method on the DTU and BlendedMVS datasets and on a synthetic indoor dataset. Our experiments demonstrate that we outperform NeRF in terms of reconstruction quality while performing on par with IDR without requiring masks.