本文提出了一种新的3D形状生成方法,从而在小波域中的连续隐式表示上实现了直接生成建模。具体而言,我们提出了一个带有一对粗糙和细节系数的紧凑型小波表示,通过截短的签名距离函数和多尺度的生物联盟波波隐式表示3D形状,并制定了一对神经网络:基于生成器基于扩散模型的生成器以粗糙系数的形式产生不同的形状;以及一个细节预测因子,以进一步生成兼容的细节系数量,以丰富具有精细结构和细节的生成形状。定量和定性实验结果都表现出我们的方法在产生具有复杂拓扑和结构,干净表面和细节的多样化和高质量形状方面的优势,超过了最先进的模型的3D生成能力。
translated by 谷歌翻译
Diffusion models have shown great promise for image generation, beating GANs in terms of generation diversity, with comparable image quality. However, their application to 3D shapes has been limited to point or voxel representations that can in practice not accurately represent a 3D surface. We propose a diffusion model for neural implicit representations of 3D shapes that operates in the latent space of an auto-decoder. This allows us to generate diverse and high quality 3D surfaces. We additionally show that we can condition our model on images or text to enable image-to-3D generation and text-to-3D generation using CLIP embeddings. Furthermore, adding noise to the latent codes of existing shapes allows us to explore shape variations.
translated by 谷歌翻译
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. IM-NET is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our implicit decoder for representation learning (via IM-AE) and shape generation (via IM-GAN), we demonstrate superior results for tasks such as generative shape modeling, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.
translated by 谷歌翻译
Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D training scenes are all represented by 2D feature planes, and we can directly train existing 2D diffusion models on these representations to generate 3D neural fields with high quality and diversity, outperforming alternative approaches to 3D-aware generation. Our approach requires essential modifications to existing triplane factorization pipelines to make the resulting features easy to learn for the diffusion model. We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
translated by 谷歌翻译
我们为3D形状生成(称为SDF-Stylegan)提供了一种基于stylegan2的深度学习方法,目的是降低生成形状和形状集合之间的视觉和几何差异。我们将stylegan2扩展到3D世代,并利用隐式签名的距离函数(SDF)作为3D形状表示,并引入了两个新颖的全球和局部形状鉴别器,它们区分了真实和假的SDF值和梯度,以显着提高形状的几何形状和视觉质量。我们进一步补充了基于阴影图像的FR \'Echet Inception距离(FID)分数的3D生成模型的评估指标,以更好地评估生成形状的视觉质量和形状分布。对形状生成的实验证明了SDF-Stylegan比最先进的表现出色。我们进一步证明了基于GAN倒置的各种任务中SDF-Stylegan的功效,包括形状重建,部分点云的形状完成,基于单图像的形状形状生成以及形状样式编辑。广泛的消融研究证明了我们框架设计的功效。我们的代码和训练有素的模型可在https://github.com/zhengxinyang/sdf-stylegan上找到。
translated by 谷歌翻译
Generative models, as an important family of statistical modeling, target learning the observed data distribution via generating new instances. Along with the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial network (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers switch their attentions from the 2D space to the 3D space considering that 3D data better aligns with our physical world and hence enjoys great potential in practice. However, unlike a 2D image, which owns an efficient representation (i.e., pixel grid) by nature, representing 3D data could face far more challenges. Concretely, we would expect an ideal 3D representation to be capable enough to model shapes and appearances in details, and to be highly efficient so as to model high-resolution data with fast speed and low memory cost. However, existing 3D representations, such as point clouds, meshes, and recent neural fields, usually fail to meet the above requirements simultaneously. In this survey, we make a thorough review of the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and more importantly representations. We hope that our discussion could help the community track the evolution of this field and further spark some innovative ideas to advance this challenging task.
translated by 谷歌翻译
Figure 1: DeepSDF represents signed distance functions (SDFs) of shapes via latent code-conditioned feed-forward decoder networks. Above images are raycast renderings of DeepSDF interpolating between two shapes in the learned shape latent space. Best viewed digitally.
translated by 谷歌翻译
从单视图重建3D形状是一个长期的研究问题。在本文中,我们展示了深度隐式地面网络,其可以通过预测底层符号距离场来从2D图像产生高质量的细节的3D网格。除了利用全局图像特征之外,禁止2D图像上的每个3D点的投影位置,并从图像特征映射中提取本地特征。结合全球和局部特征显着提高了符合距离场预测的准确性,特别是对于富含细节的区域。据我们所知,伪装是一种不断捕获从单视图图像中存在于3D形状中存在的孔和薄结构等细节的方法。 Disn在从合成和真实图像重建的各种形状类别上实现最先进的单视性重建性能。代码可在https://github.com/xharlie/disn提供补充可以在https://xharlie.github.io/images/neUrips_2019_Supp.pdf中找到补充
translated by 谷歌翻译
本地化隐式功能的最新进展使神经隐式表示能够可扩展到大型场景。然而,这些方法采用的3D空间的定期细分未能考虑到表面占用的稀疏性和几何细节的变化粒度。结果,其内存占地面积与输入体积均别较大,即使在适度密集的分解中也导致禁止的计算成本。在这项工作中,我们为3D表面,编码OCTFIELD提供了一种学习的分层隐式表示,允许具有低内存和计算预算的复杂曲面的高精度编码。我们方法的关键是仅在感兴趣的表面周围分发本地隐式功能的3D场景的自适应分解。我们通过引入分层Octree结构来实现这一目标,以根据表面占用和部件几何形状的丰富度自适应地细分3D空间。随着八十六是离散和不可分辨性的,我们进一步提出了一种新颖的等级网络,其模拟八偏细胞的细分作为概率的过程,并以可差的方式递归地编码和解码八叠结构和表面几何形状。我们展示了Octfield的一系列形状建模和重建任务的价值,显示出在替代方法方面的优越性。
translated by 谷歌翻译
Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic that can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted from ViT and CLIP, and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform extensive experiments with comparisons to SOTA methods to demonstrate our method's effectiveness.
translated by 谷歌翻译
With the rising industrial attention to 3D virtual modeling technology, generating novel 3D content based on specified conditions (e.g. text) has become a hot issue. In this paper, we propose a new generative 3D modeling framework called Diffusion-SDF for the challenging task of text-to-shape synthesis. Previous approaches lack flexibility in both 3D data representation and shape generation, thereby failing to generate highly diversified 3D shapes conforming to the given text descriptions. To address this, we propose a SDF autoencoder together with the Voxelized Diffusion model to learn and generate representations for voxelized signed distance fields (SDFs) of 3D shapes. Specifically, we design a novel UinU-Net architecture that implants a local-focused inner network inside the standard U-Net architecture, which enables better reconstruction of patch-independent SDF representations. We extend our approach to further text-to-shape tasks including text-conditioned shape completion and manipulation. Experimental results show that Diffusion-SDF is capable of generating both high-quality and highly diversified 3D shapes that conform well to the given text descriptions. Diffusion-SDF has demonstrated its superiority compared to previous state-of-the-art text-to-shape approaches.
translated by 谷歌翻译
Training parts from ShapeNet. (b) t-SNE plot of part embeddings. (c) Reconstructing entire scenes with Local Implicit Grids Figure 1:We learn an embedding of parts from objects in ShapeNet [3] using a part autoencoder with an implicit decoder. We show that this representation of parts is generalizable across object categories, and easily scalable to large scenes. By localizing implicit functions in a grid, we are able to reconstruct entire scenes from points via optimization of the latent grid.
translated by 谷歌翻译
我们介绍DMTET,深度3D条件生成模型,可以使用诸如粗体素的简单用户指南来合成高分辨率3D形状。它通过利用新型混合3D表示来结婚隐式和显式3D表示的优点。与当前隐含的方法相比,培训涉及符号距离值,DMTET直接针对重建的表面进行了优化,这使我们能够用更少的伪像来合成更精细的几何细节。与直接生成诸如网格之类的显式表示的深度3D生成模型不同,我们的模型可以合成具有任意拓扑的形状。 DMTET的核心包括可变形的四面体网格,其编码离散的符号距离函数和可分行的行进Tetrahedra层,其将隐式符号距离表示转换为显式谱图表示。这种组合允许使用在表面网格上明确定义的重建和对抗性损耗来联合优化表面几何形状和拓扑以及生成细分层次结构。我们的方法显着优于来自粗体素输入的条件形状合成的现有工作,培训在复杂的3D动物形状的数据集上。项目页面:https://nv-tlabs.github.io/dmtet/
translated by 谷歌翻译
深度生成模型的最新进展导致了3D形状合成的巨大进展。虽然现有模型能够合成表示为体素,点云或隐式功能的形状,但这些方法仅间接强制执行最终3D形状表面的合理性。在这里,我们提出了一种直接将对抗训练施加到物体表面的3D形状合成框架(Surfgen)。我们的方法使用可分解的球面投影层来捕获并表示隐式3D发生器的显式零IsoSurface作为在单元球上定义的功能。通过在对手设置中用球形CNN处理3D对象表面的球形表示,我们的发电机可以更好地学习自然形状表面的统计数据。我们在大规模形状数据集中评估我们的模型,并证明了端到端训练的模型能够产生具有不同拓扑的高保真3D形状。
translated by 谷歌翻译
Point cloud completion is a generation and estimation issue derived from the partial point clouds, which plays a vital role in the applications in 3D computer vision. The progress of deep learning (DL) has impressively improved the capability and robustness of point cloud completion. However, the quality of completed point clouds is still needed to be further enhanced to meet the practical utilization. Therefore, this work aims to conduct a comprehensive survey on various methods, including point-based, convolution-based, graph-based, and generative model-based approaches, etc. And this survey summarizes the comparisons among these methods to provoke further research insights. Besides, this review sums up the commonly used datasets and illustrates the applications of point cloud completion. Eventually, we also discussed possible research trends in this promptly expanding field.
translated by 谷歌翻译
在视觉计算中,3D几何形状以许多不同的形式表示,包括网格,点云,体素电网,水平集和深度图像。每个表示都适用于不同的任务,从而使一个表示形式转换为另一个表示(前向地图)是一个重要且常见的问题。我们提出了全向距离字段(ODF),这是一种新的3D形状表示形式,该表示通过将深度从任何观看方向从任何3D位置存储到对象的表面来编码几何形状。由于射线是ODF的基本单元,因此可以轻松地从通用的3D表示和点云等常见的3D表示。与限制代表封闭表面的水平集方法不同,ODF是未签名的,因此可以对开放表面进行建模(例如服装)。我们证明,尽管在遮挡边界处存在固有的不连续性,但可以通过神经网络(Neururodf)有效地学习ODF。我们还引入了有效的前向映射算法,以转换odf to&从常见的3D表示。具体而言,我们引入了一种有效的跳跃立方体算法,用于从ODF生成网格。实验表明,神经模型可以通过过度拟合单个对象学会学会捕获高质量的形状,并学会概括对共同的形状类别。
translated by 谷歌翻译
We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model which directly operates on an explicit voxel grid representation. However, as radiance fields generated from a set of posed images can be ambiguous and contain artifacts, obtaining ground truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality instead of trying to replicate fitting errors like floating artifacts. In contrast to 2D-diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
translated by 谷歌翻译
本文介绍了一个名为DTNET的新颖框架,用于3D网格重建和通过Distangled Tostology生成。除了以前的工作之外,我们还学习一个特定于每个输入的拓扑感知的神经模板,然后将模板变形以重建详细的网格,同时保留学习的拓扑。一个关键的见解是将复杂的网格重建分解为两个子任务:拓扑配方和形状变形。多亏了脱钩,DT-NET隐含地学习了潜在空间中拓扑和形状的分离表示。因此,它可以启用新型的脱离控件,以支持各种形状生成应用,例如,将3D对象的拓扑混合到以前的重建作品无法实现的3D对象的拓扑结构。广泛的实验结果表明,与最先进的方法相比,我们的方法能够产生高质量的网格,尤其是具有不同拓扑结构。
translated by 谷歌翻译
While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.
translated by 谷歌翻译