轻巧的飞行时间(TOF)深度传感器很小,便宜,低能量,并且已在移动设备上大量部署在移动设备上,以进行自动对焦,障碍物检测等。但是,由于其特定的测量值(深度分布)在某个像素时的区域而不是深度值,并且分辨率极低,它们不足以用于需要高保真深度(例如3D重建)的应用。在本文中,我们提出了Deltar,这是一种新颖的方法,可以通过与颜色图像合作来赋予高分辨率和准确深度的能力。作为Deltar的核心,提出了一种用于深度分布的特征提取器,并提出了基于注意力的神经体系结构,以有效地从颜色和TOF域中融合信息。为了在现实世界中评估我们的系统,我们设计了一个数据收集设备,并提出了一种校准RGB摄像头和TOF传感器的新方法。实验表明,我们的方法比旨在使用商品级RGB-D传感器的PAR性能实现的现有框架比现有的框架产生更准确的深度。代码和数据可在https://zju3dv.github.io/deltar/上获得。
translated by 谷歌翻译
4D隐式表示中的最新进展集中在全球控制形状和运动的情况下,低维潜在向量,这很容易缺少表面细节和累积跟踪误差。尽管许多深层的本地表示显示了3D形状建模的有希望的结果,但它们的4D对应物尚不存在。在本文中,我们通过提出一个新颖的局部4D隐性代表来填补这一空白,以动态穿衣人,名为Lord,具有4D人类建模和局部代表的优点,并实现具有详细的表面变形的高保真重建,例如衣服皱纹。特别是,我们的主要见解是鼓励网络学习本地零件级表示的潜在代码,能够解释本地几何形状和时间变形。为了在测试时间进行推断,我们首先估计内部骨架运动在每个时间步中跟踪本地零件,然后根据不同类型的观察到的数据通过自动编码来优化每个部分的潜在代码。广泛的实验表明,该提出的方法具有强大的代表4D人类的能力,并且在实际应用上胜过最先进的方法,包括从稀疏点,非刚性深度融合(质量和定量)进行的4D重建。
translated by 谷歌翻译
我们引入了一个新的隐式形状表示,称为基于射线的隐式函数(PRIF)。与基于处理空间位置的签名距离函数(SDF)的大多数现有方法相反,我们的表示形式在定向射线上运行。具体而言,PRIF的配制是直接产生给定输入射线的表面命中点,而无需昂贵的球体跟踪操作,因此可以有效地提取形状提取和可区分的渲染。我们证明,经过编码PRIF的神经网络在各种任务中取得了成功,包括单个形状表示,类别形状的生成,从稀疏或嘈杂的观察到形状完成,相机姿势估计的逆渲染以及带有颜色的神经渲染。
translated by 谷歌翻译
最近,神经隐式渲染技术已经迅速发展,并在新型视图合成和3D场景重建中显示出很大的优势。但是,用于编辑目的的现有神经渲染方法提供了有限的功能,例如刚性转换,或不适用于日常生活中的一般物体的细粒度编辑。在本文中,我们通过编码神经隐性字段,并在网格顶点上编码神经隐式字段,并在网格顶点上编码纹理代码,从而促进了一组编辑功能,包括网格引导的几何形状编辑,指定的纹理编辑,纹理交换,纹理交换,,纹理交换,,纹理编辑,,纹理编辑,,纹理编辑,,纹理编辑,,纹理编辑,,纹理编辑,,纹理编辑,,纹理编辑。填充和绘画操作。为此,我们开发了几种技术,包括可学习的符号指标,以扩大基于网格的表示,蒸馏和微调机制的空间区分性,以稳定地收敛,以及空间感知的优化策略,以实现精确的纹理编辑。关于真实和合成数据的广泛实验和编辑示例都证明了我们方法在表示质量和编辑能力上的优越性。代码可在项目网页上找到:https://zju3dv.github.io/neumesh/。
translated by 谷歌翻译
我们提出Volux-GaN,一种生成框架,以合成3D感知面孔的令人信服的回忆。我们的主要贡献是一种体积的HDRI可发感方法,可以沿着每个3D光线沿着任何所需的HDR环境图累计累积Albedo,漫射和镜面照明贡献。此外,我们展示了使用多个鉴别器监督图像分解过程的重要性。特别是,我们提出了一种数据增强技术,其利用单个图像肖像结合的最近的进步来强制实施一致的几何形状,反照镜,漫射和镜面组分。与其他生成框架的多个实验和比较展示了我们的模型是如何向光电型可致力于的3D生成模型前进的一步。
translated by 谷歌翻译
We propose a differentiable sphere tracing algorithm to bridge the gap between inverse graphics methods and the recently proposed deep learning based implicit signed distance function. Due to the nature of the implicit function, the rendering process requires tremendous function queries, which is particularly problematic when the function is represented as a neural network. We optimize both the forward and backward passes of our rendering layer to make it run efficiently with affordable memory consumption on a commodity graphics card. Our rendering method is fully differentiable such that losses can be directly computed on the rendered 2D observations, and the gradients can be propagated backwards to optimize the 3D geometry. We show that our rendering method can effectively reconstruct accurate 3D shapes from various inputs, such as sparse depth and multi-view images, through inverse optimization. With the geometry based reasoning, our 3D shape prediction methods show excellent generalization capability and robustness against various noises. * Work done while Shaohui Liu was an academic guest at ETH Zurich.
translated by 谷歌翻译
We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural network, previous methods usually represent a 3D shape in volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various of mesh related losses to capture properties of different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh model with better details, but also achieves higher 3D shape estimation accuracy compared to the state-of-the-art.
translated by 谷歌翻译
Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
translated by 谷歌翻译
Bird's-Eye-View (BEV) 3D Object Detection is a crucial multi-view technique for autonomous driving systems. Recently, plenty of works are proposed, following a similar paradigm consisting of three essential components, i.e., camera feature extraction, BEV feature construction, and task heads. Among the three components, BEV feature construction is BEV-specific compared with 2D tasks. Existing methods aggregate the multi-view camera features to the flattened grid in order to construct the BEV feature. However, flattening the BEV space along the height dimension fails to emphasize the informative features of different heights. For example, the barrier is located at a low height while the truck is located at a high height. In this paper, we propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights. Instead of flattening the BEV space, we first sample along the height dimension to build the global and local BEV slices. Then, the features of BEV slices are aggregated from the camera features and merged by the attention mechanism. Finally, we fuse the merged local and global BEV features by a transformer to generate the final feature map for task heads. The purpose of local BEV slices is to emphasize informative heights. In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices. Compared with uniform sampling, LiDAR-guided sampling can determine more informative heights. We conduct detailed experiments to demonstrate the effectiveness of BEV-SAN. Code will be released.
translated by 谷歌翻译
Most multimodal multi-objective evolutionary algorithms (MMEAs) aim to find all global Pareto optimal sets (PSs) for a multimodal multi-objective optimization problem (MMOP). However, in real-world problems, decision makers (DMs) may be also interested in local PSs. Also, searching for both global and local PSs is more general in view of dealing with MMOPs, which can be seen as a generalized MMOP. In addition, the state-of-the-art MMEAs exhibit poor convergence on high-dimension MMOPs. To address the above two issues, in this study, a novel coevolutionary framework termed CoMMEA for multimodal multi-objective optimization is proposed to better obtain both global and local PSs, and simultaneously, to improve the convergence performance in dealing with high-dimension MMOPs. Specifically, the CoMMEA introduces two archives to the search process, and coevolves them simultaneously through effective knowledge transfer. The convergence archive assists the CoMMEA to quickly approaching the Pareto optimal front (PF). The knowledge of the converged solutions is then transferred to the diversity archive which utilizes the local convergence indicator and the $\epsilon$-dominance-based method to obtain global and local PSs effectively. Experimental results show that CoMMEA is competitive compared to seven state-of-the-art MMEAs on fifty-four complex MMOPs.
translated by 谷歌翻译