Translated by Google Translate
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
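The Fourier feature mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used form γ(v) = [cos(2πBv), sin(2πBv)] with the rows of B drawn from a zero-mean Gaussian, and the names `make_fourier_matrix`, `fourier_features`, and `scale` are hypothetical. Under that assumption, `scale` (the standard deviation of B) plays the role of the tunable bandwidth.

```python
import math
import random


def make_fourier_matrix(in_dim, num_features, scale, seed=0):
    """Sample a (num_features x in_dim) matrix B with entries ~ N(0, scale^2).

    Assumption: Gaussian sampling; `scale` is a problem-specific hyperparameter.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(num_features)]


def fourier_features(v, B):
    """Map an input point v to [cos(2*pi*B v), sin(2*pi*B v)]."""
    proj = [2.0 * math.pi * sum(b_i * v_i for b_i, v_i in zip(row, v))
            for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


# Usage: lift a 2D coordinate in [0, 1)^2 to 16 features before feeding an MLP.
B = make_fourier_matrix(in_dim=2, num_features=8, scale=10.0)
features = fourier_features([0.25, 0.5], B)
```

A larger `scale` exposes the MLP to higher input frequencies, and a smaller one gives smoother fits; the value 10.0 here is arbitrary and would need tuning per task.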
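Multilayer perceptrons (MLPs) have proven to be effective scene encoders when combined with higher-dimensional projections of the input, commonly referred to as positional encoding. However, scenes with a wide frequency spectrum remain a challenge: choosing high frequencies for positional encoding introduces noise in low-structure regions, while low frequencies result in poor fitting of detailed regions. To address this, we propose a progressive positional encoding that exposes a hierarchical MLP structure to incremental sets of frequency encodings. Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail without explicit per-level supervision. The architecture is modular: each level encodes a continuous implicit representation that can be leveraged separately at its respective resolution, meaning a smaller network suffices for coarser reconstructions. Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity, and training speed compared to baselines.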
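We propose a novel embedding field, PREF, as a compact representation to facilitate neural signal modeling and reconstruction tasks. Pure multilayer perceptron (MLP) based neural techniques are biased towards low-frequency signals and rely on deep layers or Fourier encoding to avoid losing details. PREF instead employs a compact and physically interpretable encoding field based on the phasor formulation of the Fourier embedding space. We conduct comprehensive experiments to demonstrate the advantages of PREF over the latest spatial embedding techniques. We then develop a highly efficient frequency learning framework using an approximated inverse Fourier transform scheme and a novel Parseval regularizer. Extensive experiments show that our efficient and compact frequency-based neural signal processing technique is on par with, or even better than, the state of the art in 2D image completion, 3D SDF surface regression, and 5D radiance field reconstruction.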
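We present FourierMask, which employs Fourier series combined with implicit neural representations to generate instance segmentation masks. We apply a Fourier mapping (FM) to the coordinate locations and use the mapped features as inputs to an implicit representation (a coordinate-based multilayer perceptron (MLP)). FourierMask learns to predict the coefficients of the FM for a particular instance, and therefore adapts the FM to a specific object. This allows FourierMask to generalize to predicting instance segmentation masks from natural images. Since implicit functions are continuous in the domain of input coordinates, we illustrate that by sub-sampling the input pixel coordinates we can generate higher-resolution masks during inference. Furthermore, we train a renderer MLP (FourierRend) on the uncertain predictions of FourierMask and illustrate that it significantly improves the quality of the masks. FourierMask shows competitive results on the MS COCO dataset compared to the baseline Mask R-CNN at the same output resolution and surpasses it at higher resolutions.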
Multilayer perceptrons (MLPs) learn high frequencies slowly. Recent approaches encode features in spatial bins to improve speed of learning details, but at the cost of larger model size and loss of continuity. Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding. We call these Quantized Fourier Features (QFF). As a naturally multiresolution and periodic representation, our experiments show that using QFF can result in smaller model size, faster training, and better quality outputs for several applications, including Neural Image Representations (NIR), Neural Radiance Field (NeRF) and Signed Distance Function (SDF) modeling. QFF are easy to code, fast to compute, and serve as a simple drop-in addition to many neural field representations.
We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, our main idea is to learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially, which we sum up to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and efficiency on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields.
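Coordinate-MLPs are emerging as an effective tool for modeling multidimensional continuous signals, overcoming many drawbacks associated with discrete grid-based approximations. However, coordinate-MLPs with ReLU activations, in their rudimentary form, perform poorly at representing signals with high fidelity, which creates the need for positional embedding layers. Recently, Sitzmann et al. proposed a sinusoidal activation function that allows coordinate-MLPs to omit the positional embedding while still preserving high signal fidelity. Despite its potential, ReLUs still dominate the space of coordinate-MLPs; we speculate that this is due to the hypersensitivity of networks that employ such sinusoidal activations to the initialization scheme. In this paper, we attempt to broaden the current understanding of the effect of activations in coordinate-MLPs, and show that there exists a broader class of activations that are suitable for encoding signals. We affirm that sinusoidal activations are only a single example in this class, and propose several non-periodic functions that empirically demonstrate more robust performance against random initialization than sinusoids. Finally, we advocate for a shift towards coordinate-MLPs that employ these non-traditional activation functions, owing to their high performance and simplicity.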
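Recently, implicit neural representations (INRs) parameterized by neural networks have emerged as a powerful and promising tool for representing different kinds of signals, because their continuous, differentiable properties give them advantages over classical discretized representations. However, the training of neural networks for INRs uses only input-output pairs, and the derivatives of the target output with respect to the input are usually ignored. In this paper, we propose a training paradigm for INRs whose target output is image pixels, which encodes image derivatives in addition to image values in the neural network. Specifically, we use finite differences to approximate image derivatives. We show how the training paradigm can be leveraged to solve typical INR problems, namely image regression and inverse rendering, and demonstrate that this training paradigm can improve the data efficiency and generalization capabilities of INRs. The code of our method is available at https://github.com/megvii-research/sobolev_inrs.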
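As an illustration of approximating image derivatives with finite differences, the sketch below computes per-pixel horizontal and vertical derivatives of a 2D grayscale image. The abstract does not specify the stencil, so the choice here (central differences in the interior, one-sided differences at the borders, unit pixel spacing) is an assumption, and `image_gradients` is a hypothetical helper name.

```python
def image_gradients(img):
    """Approximate d(img)/dx and d(img)/dy on a 2D grid of pixel values.

    Assumptions: unit pixel spacing, central differences in the interior,
    one-sided differences at the borders. Requires at least 2 rows and 2 columns.
    """
    h, w = len(img), len(img[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbour indices so border pixels fall back to one-sided stencils.
            x_lo, x_hi = max(x - 1, 0), min(x + 1, w - 1)
            y_lo, y_hi = max(y - 1, 0), min(y + 1, h - 1)
            dx[y][x] = (img[y][x_hi] - img[y][x_lo]) / (x_hi - x_lo)
            dy[y][x] = (img[y_hi][x] - img[y_lo][x]) / (y_hi - y_lo)
    return dx, dy


# Usage: a linear ramp image has constant derivatives everywhere.
ramp = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(4)]
ramp_dx, ramp_dy = image_gradients(ramp)
```

In a derivative-supervised setup, targets like `ramp_dx` and `ramp_dy` would be compared against the network's own input gradients alongside the usual pixel-value loss.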
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIRENs, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIRENs with hypernetworks to learn priors over the space of SIREN functions. Please see the project website for a video overview of the proposed method and all applications.
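A single sine-activated layer of the kind described above might be sketched as follows. The abstract only states that a principled initialization scheme exists; the specific bounds used here (uniform in ±1/fan_in for the first layer, ±sqrt(6/fan_in)/ω₀ for later layers, with ω₀ = 30) are the commonly cited ones and should be treated as assumptions, as should the zero bias initialization and the helper names `init_siren_layer` and `siren_layer`.

```python
import math
import random


def init_siren_layer(fan_in, fan_out, omega_0=30.0, is_first=False, rng=None):
    """Sample weights for one sine layer.

    Assumed bounds: U(-1/fan_in, 1/fan_in) for the first layer, otherwise
    U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0). Biases start at zero
    here purely for simplicity.
    """
    if rng is None:
        rng = random.Random(0)
    bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega_0
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
               for _ in range(fan_out)]
    biases = [0.0] * fan_out
    return weights, biases


def siren_layer(x, weights, biases, omega_0=30.0):
    """Compute sin(omega_0 * (W x + b)) for one input vector x."""
    return [math.sin(omega_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
            for row, b in zip(weights, biases)]


# Usage: two stacked sine layers applied to a 2D coordinate.
rng = random.Random(0)
W1, b1 = init_siren_layer(2, 16, is_first=True, rng=rng)
W2, b2 = init_siren_layer(16, 3, rng=rng)
hidden = siren_layer([0.1, -0.4], W1, b1)
output = siren_layer(hidden, W2, b2)
```

A full network would normally end in a plain linear layer rather than a sine; the second sine layer here is only to show how layers compose.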
The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22× faster.
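Recently, implicit neural representations (INRs) have gained attention as a novel and effective representation for various data types. So far, prior work has mostly focused on optimizing their reconstruction performance. This work investigates INRs from a novel perspective, namely as a tool for image compression. To this end, we propose the first comprehensive compression pipeline based on INRs, including quantization, quantization-aware retraining, and entropy coding. Encoding with INRs, that is, overfitting to a data sample, is typically orders of magnitude slower. To mitigate this drawback, we leverage meta-learned initializations based on MAML to reach the encoding in fewer gradient updates, which also generally improves the rate-distortion performance of INRs. We find that our approach to source compression with INRs vastly outperforms similar prior work, is competitive with common compression algorithms designed specifically for images, and closes the gap to state-of-the-art learned approaches based on rate-distortion autoencoders. Moreover, we provide an extensive ablation study on the importance of the individual components of our method, which we hope facilitates future research on this novel approach to image compression.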
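We propose a novel method to enhance the performance of coordinate-MLPs by learning instance-specific positional embeddings. End-to-end optimization of the positional embedding parameters together with the network weights leads to poor generalization performance. Instead, we develop a generic framework to learn the positional embedding based on classic graph-Laplacian regularization, which can implicitly balance the trade-off between memorization and generalization. This framework is then used to propose a novel positional embedding scheme in which the hyperparameters are learned per coordinate (i.e., per instance) to deliver optimal performance. We show that the proposed embedding achieves better performance with higher stability compared to the well-established random Fourier features (RFF). Further, we demonstrate that the proposed embedding scheme yields stable gradients, enabling seamless integration into deep architectures as intermediate layers.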
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
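The step of projecting per-sample colors and densities along a camera ray into a pixel color can be illustrated with the standard discrete volume rendering quadrature. This is a sketch under assumptions: it uses the usual alpha-compositing form, where each sample contributes its color weighted by its opacity and by the transmittance accumulated before it, and the function name `composite_ray` and its argument layout are hypothetical rather than taken from the authors' code.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    densities: volume density (sigma) at each sample
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    Returns the accumulated RGB and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not yet been absorbed
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        rgb = [acc + weight * channel for acc, channel in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb, weights


# Usage: three samples along a ray, with density and color as a network might predict.
pixel, sample_weights = composite_ray(
    densities=[0.1, 2.0, 5.0],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[0.5, 0.5, 0.5],
)
```

Every operation here is differentiable with respect to the densities and colors, which is what lets image reconstruction error alone drive the optimization.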