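Shallow depth-of-field images keep the subject in focus while the foreground and background are blurred. This effect requires much larger lens apertures than those of smartphone cameras. Conventional methods acquire RGB-D images and blur image regions according to their depth. However, this approach is not suitable for reflective or transparent surfaces, or for finely detailed object contours where the depth values are inaccurate or ambiguous. We present a learning-based method that synthesizes the defocus blur of shallow depth-of-field images from handheld bursts acquired with a single small-aperture lens. Our deep learning model directly produces the shallow depth-of-field image, avoiding explicit depth-based blurring. The simulated aperture diameter equals the camera translation during burst acquisition. Our method does not suffer from artifacts caused by inaccurate or ambiguous depth estimation, and it is well suited to portrait photography.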
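The statement that the simulated aperture equals the camera translation during the burst can be illustrated with classical shift-and-add synthetic-aperture refocusing. The sketch below is that classical technique, not the learned model described above; it assumes integer per-frame shifts of the focus plane that are already known, and `synthetic_aperture` is a hypothetical helper name.

```python
import numpy as np

def synthetic_aperture(frames, offsets_px):
    """Shift-and-add refocusing over a burst (classical, not learned).

    frames:     list of HxW float arrays from slightly translated viewpoints.
    offsets_px: list of integer (dy, dx) displacements of the focus plane in
                each frame relative to the reference frame (assumed known).
    Content on the focus plane is realigned and stays sharp; content at other
    depths lands at a different place in every frame and is averaged into
    blur whose extent grows with the spread of camera positions.
    """
    aligned = [np.roll(f, shift=(-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, offsets_px)]
    return np.mean(aligned, axis=0)

# Toy burst: the subject moves with the camera, the distant background does not.
frames = []
for dx in (-1, 0, 1):
    f = np.zeros((9, 9))
    f[4, 4 + dx] = 1.0   # subject on the focus plane
    f[1, 4] = 1.0        # far background, negligible parallax
    frames.append(f)
out = synthetic_aperture(frames, [(0, -1), (0, 0), (0, 1)])
print(out[4, 4], out[1, 3:6])  # subject stays at 1.0; background spreads to 1/3 each
```

A wider spread of camera positions in the burst spreads off-focus content over more pixels, which is the sense in which translation plays the role of aperture diameter.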
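Portrait mode is widely available on smartphone cameras to provide an enhanced photography experience. One of the primary effects applied to images captured in portrait mode is a synthetic shallow depth of field (DoF). The synthetic DoF (or bokeh effect) selectively blurs regions of the image to emulate the effect of a large lens with a wide aperture. In addition, many applications now incorporate a new image motion attribute (NIMAT) to emulate background motion, where the motion is correlated with the estimated depth at each pixel. In this work, we follow the trend of rendering the NIMAT effect by introducing a modification to the blur synthesis procedure of portrait mode. In particular, our modification enables high-quality synthesis of multi-view bokeh from a single image by applying rotated blurring kernels. Given the synthesized multi-views, we can generate aesthetically realistic image motion similar to the NIMAT effect. We validate our method against the original NIMAT effect and other similar image motions, such as Facebook 3D image. Our image motion demonstrates smooth view transitions with fewer artifacts around object boundaries.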
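One way to picture "rotated blurring kernels" is to give each synthetic view a kernel that covers only part of the lens aperture, rotated to a different angle, so that defocused content is displaced in a different direction in each view. The half-disk kernel below is a hypothetical stand-in chosen for illustration; the abstract does not specify the actual kernel shape, and both function names are assumptions.

```python
import numpy as np

def rotated_half_disk_kernel(radius, angle_rad):
    """Normalized half-disk blur kernel whose open half faces angle_rad.

    Hypothetical sub-aperture kernel: x points right (columns), y points
    down (rows), and angle 0 faces +x.
    """
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    inside = x**2 + y**2 <= radius**2
    facing = x * np.cos(angle_rad) + y * np.sin(angle_rad) >= 0
    k = (inside & facing).astype(float)
    return k / k.sum()

def kernel_centroid(k):
    """(row, col) centre of mass of a square kernel relative to its centre."""
    r = k.shape[0] // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return float((k * y).sum()), float((k * x).sum())

# Eight pseudo-views: the kernel centroid, i.e. the apparent displacement of
# defocused content, circles around as the kernel is rotated.
for angle in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    cy, cx = kernel_centroid(rotated_half_disk_kernel(5, angle))
    print(f"{np.degrees(angle):5.1f} deg -> centroid (row={cy:+.2f}, col={cx:+.2f})")
```

Convolving each depth layer with such a kernel, scaled by that layer's defocus radius, would then give one pseudo-view per angle; in-focus content (radius near zero) does not move between views.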
A recent strand of work in view synthesis uses deep learning to generate multiplane images (a camera-centric, layered 3D representation) given two or more input images at known viewpoints. We apply this representation to single-view view synthesis, a problem which is more challenging but has potentially much wider application. Our method learns to predict a multiplane image directly from a single image input, and we introduce scale-invariant view synthesis for supervision, enabling us to train on online video. We show this approach is applicable to several different datasets, that it additionally generates reasonable depth maps, and that it learns to fill in content behind the edges of foreground objects in background layers. Project page at https://single-view-mpi.github.io/.
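A multiplane image (MPI) is a stack of fronto-parallel RGBα layers, and a view is rendered by alpha-compositing the layers back to front with the "over" operator. The minimal sketch below renders the reference view only; rendering a novel view would additionally warp each plane into the target camera (one homography per plane), which is omitted here. The function name is an assumption.

```python
import numpy as np

def composite_mpi(rgba_planes):
    """Alpha-composite MPI layers with the 'over' operator.

    rgba_planes: float array of shape (D, H, W, 4), ordered from the back
    plane (index 0) to the front plane, with straight (non-premultiplied)
    colour and alpha in [0, 1].
    """
    out = np.zeros(rgba_planes.shape[1:3] + (3,))
    for plane in rgba_planes:                 # iterate back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

planes = np.zeros((2, 1, 1, 4))
planes[0, 0, 0] = [1.0, 0.0, 0.0, 1.0]   # opaque red background plane
planes[1, 0, 0] = [0.0, 1.0, 0.0, 0.5]   # half-transparent green foreground plane
print(composite_mpi(planes)[0, 0])       # red and green mixed 50/50
```

Because background layers persist behind semi-transparent foreground layers, content "behind the edges of foreground objects" can be revealed when the planes are shifted for a new viewpoint.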
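In this paper, we address the problem of monocular bokeh synthesis, in which we attempt to render a shallow depth-of-field image from a single all-in-focus image. Unlike with DSLR cameras, this effect cannot be captured directly by mobile cameras because of the physical constraints of mobile apertures. We therefore propose a network-based approach capable of rendering realistic monocular bokeh from a single image input. To this end, we introduce three new edge-aware bokeh losses based on a predicted monocular depth map, which sharpen foreground edges while blurring the background. The model is then fine-tuned with an adversarial loss to produce a realistic bokeh effect. Experimental results show that our method produces pleasing, natural bokeh effects with sharp edges while handling complex scenes.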
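The abstract does not define its three edge-aware losses, so the following is only a hypothetical illustration of the general idea: use the predicted depth map to find the in-focus region, penalize any change to that region, and weight pixels on image edges more heavily so that foreground contours stay sharp. The function name, the `tol` threshold, and the `1 + edge` weighting are all assumptions.

```python
import numpy as np

def edge_aware_foreground_loss(pred, all_in_focus, depth, focus_depth, tol=0.1):
    """Hypothetical edge-weighted L1 loss restricted to the in-focus region.

    pred, all_in_focus: HxW grayscale images; depth: HxW predicted depth map.
    Pixels whose depth is within `tol` of `focus_depth` should be unchanged by
    bokeh rendering; pixels on strong edges there get extra weight.
    """
    in_focus = np.abs(depth - focus_depth) < tol
    gy, gx = np.gradient(all_in_focus)
    edge_strength = np.hypot(gx, gy)
    weights = in_focus * (1.0 + edge_strength)
    err = weights * np.abs(pred - all_in_focus)
    return float(err.sum() / max(weights.sum(), 1e-8))

img = np.zeros((8, 8))
img[:, 4:] = 1.0                      # sharp vertical edge
depth = np.ones((8, 8))
depth[:, :4] = 5.0                    # left half is background
blurred_bg = img.copy()
blurred_bg[:, :4] = 0.3               # only the background is altered
damaged_fg = img.copy()
damaged_fg[:, 4:] = 0.5               # the in-focus foreground is altered
print(edge_aware_foreground_loss(blurred_bg, img, depth, focus_depth=1.0))      # 0.0
print(edge_aware_foreground_loss(damaged_fg, img, depth, focus_depth=1.0) > 0)  # True
```

A term like this leaves the background free to be blurred while anchoring the in-focus subject; a real training setup would combine it with a term that encourages blur in the background.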
Fast and easy handheld capture with guideline: closest object moves at most D pixels between views. Promote sampled views to local light field via layered scene representation. Blend neighboring local light fields to render novel views.
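Virtual reality (VR) headsets provide an immersive, stereoscopic visual experience, but at the cost of blocking users from directly observing their physical environment. Passthrough techniques are intended to address this limitation by leveraging outward-facing cameras to reconstruct the images that would otherwise be seen by the user without the headset. This is inherently a real-time view synthesis challenge, since passthrough cameras cannot be physically co-located with the eyes. Existing passthrough techniques suffer from distracting reconstruction artifacts, largely due to the lack of accurate depth information (especially for near-field and disoccluded objects), and they also exhibit limited image quality (e.g., low resolution and monochrome output). In this paper, we propose the first learned passthrough method and assess its performance using a custom VR headset that contains a stereo pair of RGB cameras. Through both simulations and experiments, we demonstrate that our learned passthrough method delivers superior image quality compared to state-of-the-art methods, while meeting the strict VR requirements for real-time, perspective-correct stereoscopic view synthesis over a wide field of view for desktop-connected headsets.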
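Why missing or inaccurate depth hurts passthrough can be seen from the basic reprojection step: a camera pixel is lifted to 3D using its depth and re-projected into a virtual camera placed at the eye. The sketch assumes a pinhole model, identical intrinsics `K` for camera and eye, and a pure translation between them; real headsets also involve rotation and lens distortion, and this is not the learned method itself. The function name and numbers are illustrative.

```python
import numpy as np

def reproject_to_eye(u, v, depth, K, eye_offset):
    """Map camera pixel (u, v) with metric depth to the eye's image plane.

    eye_offset: position of the eye's optical centre in camera coordinates
    (metres). Assumes both views share intrinsics K and orientation.
    """
    point_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    point_eye = point_cam - np.asarray(eye_offset, dtype=float)
    x, y, w = K @ point_eye
    return float(x / w), float(y / w)

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
eye = [0.05, 0.0, 0.0]   # eye 5 cm to the right of the camera (illustrative)
print(reproject_to_eye(50, 50, 2.0, K, eye))   # far point: small shift
print(reproject_to_eye(50, 50, 0.5, K, eye))   # near point: 4x larger shift
```

The pixel shift scales with inverse depth, so the same relative depth error produces much larger misplacements for near-field objects, which matches the failure mode described above.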
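Novel view synthesis for humans in motion is a challenging computer vision problem that enables applications such as free-viewpoint video. Existing methods typically use complex setups with multiple input views, 3D supervision, or pre-trained models that do not generalize well to new identities. Aiming to address these limitations, we present a novel view synthesis framework that generates realistic renders from unseen views of any human captured from a single-view sensor with sparse RGB-D, similar to a low-cost depth camera, and without actor-specific models. We propose an architecture that learns dense features in novel views obtained by sphere-based neural rendering, and creates complete renders using a global context inpainting model. Additionally, an enhancer network leverages the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details. We show that our method generates high-quality novel views of synthetic and real human actors given a single sparse RGB-D input. It generalizes to unseen identities and new poses, and faithfully reconstructs facial expressions. Our approach outperforms prior human view synthesis methods and is robust to different levels of input sparsity.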
https://video-nerf.github.io
Figure 1. Our method takes a single casually captured video as input and learns a space-time neural irradiance field. (Top) Sample frames from the input video. (Middle) Novel view images rendered from textured meshes constructed from depth maps. (Bottom) Our results rendered from the proposed space-time neural irradiance field.
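We introduce FaDIV-Syn, a fast depth-independent novel view synthesis method. Related methods are often limited by their depth estimation stage, where incorrect depth predictions can lead to large projection errors. To avoid this issue, we efficiently warp the input images into the target frame for a range of assumed depth planes. The resulting plane sweep volume (PSV) is fed directly into our network, which first estimates soft PSV masks in a self-supervised manner and then directly produces the novel output view. We therefore side-step explicit depth estimation. This improves efficiency and performance on transparent, reflective, thin, and feature-less scene parts. FaDIV-Syn can perform both interpolation and extrapolation tasks on the large-scale RealEstate10K dataset and outperforms state-of-the-art extrapolation methods. In contrast to comparable methods, it achieves real-time performance thanks to its lightweight architecture. We thoroughly evaluate ablations, such as removing the soft-masking network, training from fewer examples, and generalization to higher resolutions and stronger depth discretization.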
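A plane sweep volume can be sketched in a few lines for the special case of rectified cameras, where warping onto a fronto-parallel plane at disparity d reduces to a horizontal shift by d pixels; general camera setups use one homography per plane instead. The helper name and the restriction to non-negative integer disparities are assumptions made to keep the sketch short. The volume itself encodes no depth decision: in the approach described above, a network consumes it directly.

```python
import numpy as np

def plane_sweep_volume(src, disparities):
    """Warp a rectified source image into the target view for each plane.

    src: HxW image. disparities: non-negative integer disparities (pixels),
    one per hypothesised depth plane. Target pixel x samples source pixel
    x - d; samples falling outside the source are set to 0.
    Returns an array of shape (D, H, W).
    """
    h, w = src.shape
    psv = np.zeros((len(disparities), h, w), dtype=float)
    for i, d in enumerate(disparities):
        if d == 0:
            psv[i] = src
        else:
            psv[i, :, d:] = src[:, :w - d]
    return psv

src = np.zeros((4, 10))
src[:, 2] = 1.0                        # a thin vertical structure at column 2
psv = plane_sweep_volume(src, [0, 1, 3])
print(psv.shape, np.argmax(psv[:, 0, :], axis=1))  # structure lands at columns 2, 3, 5
```

With several source views, stacking their volumes lets a network judge per pixel which planes agree, without ever committing to a single depth value.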
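The partial occlusion effect is a phenomenon in which blurry objects near the camera are semi-transparent, so that the occluded background is partially visible. However, it is challenging for existing bokeh rendering methods to simulate realistic partial occlusion effects, because the information in the occluded areas is missing from an all-in-focus image. Inspired by learnable 3D scene representations, we address partial occlusion by introducing a novel MPI-based high-resolution bokeh rendering framework, termed MPIB. To this end, we first present an analysis of how to apply the MPI representation to bokeh rendering. Based on this analysis, we propose an MPI representation module combined with a background inpainting module to implement a high-resolution scene representation. This representation can then be reused to render various bokeh effects according to the control parameters. To train and test our model, we also design a ray-tracing-based bokeh generator for data generation. Extensive experiments on synthetic and real-world images validate the effectiveness and flexibility of this framework.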
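To show how a layered representation yields partial occlusion, the sketch below blurs each MPI layer (premultiplied colour and alpha) by an amount proportional to its distance in disparity from the focal plane, then composites back to front; the blurred foreground edge becomes semi-transparent and the background layer shows through. A box filter stands in for a disk-shaped bokeh kernel, and the blur-radius rule and all names are assumptions, not MPIB's actual rendering procedure.

```python
import numpy as np

def box_blur(img, r):
    """Box blur of integer radius r over the first two axes (edge padding)."""
    if r == 0:
        return img.copy()
    k = 2 * r + 1
    pad_width = ((r, r), (r, r)) + ((0, 0),) * (img.ndim - 2)
    p = np.pad(img, pad_width, mode="edge")
    # Running mean along columns, then along rows, via cumulative sums.
    c = np.cumsum(p, axis=1)
    c = np.concatenate([np.zeros_like(c[:, :1]), c], axis=1)
    p = (c[:, k:] - c[:, :-k]) / k
    c = np.cumsum(p, axis=0)
    c = np.concatenate([np.zeros_like(c[:1]), c], axis=0)
    return (c[k:] - c[:-k]) / k

def render_mpi_bokeh(planes, disparities, focus_disparity, blur_scale):
    """planes: (D, H, W, 4) straight RGBA layers ordered back to front."""
    out = np.zeros(planes.shape[1:3] + (3,))
    for plane, disp in zip(planes, disparities):
        radius = int(round(blur_scale * abs(disp - focus_disparity)))
        alpha = plane[..., 3:4]
        color = plane[..., :3] * alpha            # premultiply before blurring
        color, alpha = box_blur(color, radius), box_blur(alpha, radius)
        out = color + out * (1.0 - alpha)         # 'over' with premultiplied colour
    return out

h, w = 3, 6
planes = np.zeros((2, h, w, 4))
planes[0] = [1.0, 1.0, 1.0, 1.0]          # opaque white background layer (in focus)
planes[1, :, :3] = [0.0, 0.0, 0.0, 1.0]   # opaque black foreground, left half
sharp = render_mpi_bokeh(planes, [0.0, 1.0], focus_disparity=0.0, blur_scale=0.0)
bokeh = render_mpi_bokeh(planes, [0.0, 1.0], focus_disparity=0.0, blur_scale=1.0)
print(sharp[1, :, 0])   # hard edge
print(bokeh[1, :, 0])   # soft edge: background partly visible through the blur
```

Blurring alpha together with premultiplied colour is what makes the defocused foreground semi-transparent; a single all-in-focus image has no background layer to reveal there, which is the gap the inpainting module is meant to fill.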
Image view synthesis has seen great success in reconstructing photorealistic visuals, thanks to deep learning and various novel representations. The next key step in immersive virtual experiences is view synthesis of dynamic scenes. However, several challenges exist due to the lack of high-quality training datasets and the additional time dimension for videos of dynamic scenes. To address this issue, we introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS. The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes. We develop a new algorithm, Deep 3D Mask Volume, which enables temporally stable view extrapolation from binocular videos of dynamic scenes captured by static cameras. Our algorithm addresses the temporal inconsistency of disocclusions by identifying the error-prone areas with a 3D mask volume and replacing them with the static background observed throughout the video. Our method enables manipulation in 3D space, as opposed to simple 2D masks. We demonstrate better temporal stability than frame-by-frame static view synthesis methods or those that use 2D masks. The resulting view synthesis videos show minimal flickering artifacts and allow for larger translational movements.
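The per-plane replacement can be sketched as follows, assuming the scene is held as a multiplane-style stack of RGBα layers: a mask volume with one value per plane and pixel blends the current frame's layers with layers of the static background, and the result is composited as usual. Because the mask lives in 3D, a disoccluded region can be replaced on the back planes while the foreground planes in front of it are kept, which a single 2D image mask cannot express. The layer representation and the names are assumptions; in the method described, the mask volume would be predicted by a network.

```python
import numpy as np

def apply_mask_volume(dynamic_layers, static_layers, mask_volume):
    """Blend per-frame layers with static-background layers, plane by plane.

    Layers have shape (D, H, W, 4); mask_volume has shape (D, H, W, 1), where
    1 means 'this plane and pixel is error-prone, use the static background'.
    """
    return mask_volume * static_layers + (1.0 - mask_volume) * dynamic_layers

def composite(layers):
    """Back-to-front 'over' compositing of straight RGBA layers (D, H, W, 4)."""
    out = np.zeros(layers.shape[1:3] + (3,))
    for layer in layers:
        out = layer[..., :3] * layer[..., 3:4] + out * (1.0 - layer[..., 3:4])
    return out

# One pixel, two planes: the back plane of the current frame has a hole
# (alpha 0) where the background was disoccluded; a foreground plane is in front.
dynamic = np.zeros((2, 1, 1, 4))
dynamic[1, 0, 0] = [0.0, 0.0, 1.0, 0.5]   # semi-transparent blue foreground
static = np.zeros((2, 1, 1, 4))
static[0, 0, 0] = [0.0, 1.0, 0.0, 1.0]    # static green background
mask = np.zeros((2, 1, 1, 1))
mask[0] = 1.0                             # replace only the back plane
print(composite(apply_mask_volume(dynamic, static, mask))[0, 0])
```

Without the mask the hole renders as black behind the foreground; with it, the static green background fills the hole while the blue foreground plane is untouched.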
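In this paper, we present an efficient and robust deep learning solution for novel view synthesis of complex scenes. In our approach, a 3D scene is represented as a light field, i.e., a set of rays, each of which has a corresponding color when it reaches the image plane. For efficient novel view rendering, we adopt a two-plane parameterization of the light field, in which each ray is characterized by a 4D parameter. We then formulate the light field as a 4D function that maps 4D coordinates to the corresponding color values. We train a deep fully connected network to optimize this implicit function and memorize the 3D scene. The scene-specific model is then used to synthesize novel views. Unlike previous light field approaches, which require dense view sampling to reliably render novel views, our method renders novel views by sampling rays and querying the color of each ray directly from the network, enabling high-quality light field rendering from a sparser set of training images. The network can optionally predict per-ray depth, enabling applications such as auto refocus. Our novel view synthesis results are comparable to the state of the art, and even superior in some challenging scenes with refraction and reflection. We achieve this while maintaining an interactive frame rate and a small memory footprint.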
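The two-plane parameterization can be made concrete: a ray is described by where it crosses two parallel planes, here assumed to be z = 0 (giving u, v) and z = 1 (giving s, t). The resulting 4D vector is the kind of input a network of the sort described would map to the ray's colour. The plane placement and the function name are assumptions.

```python
import numpy as np

def ray_to_uvst(origin, direction, z_uv=0.0, z_st=1.0):
    """Two-plane light field coordinates of a ray.

    Returns (u, v, s, t), the x/y coordinates where the ray
    origin + lam * direction meets the planes z = z_uv and z = z_st.
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    if abs(d[2]) < 1e-12:
        raise ValueError("ray is parallel to the parameterization planes")
    lam_uv = (z_uv - o[2]) / d[2]
    lam_st = (z_st - o[2]) / d[2]
    u, v = (o + lam_uv * d)[:2]
    s, t = (o + lam_st * d)[:2]
    return np.array([u, v, s, t])

print(ray_to_uvst([0.0, 0.0, -1.0], [1.0, 2.0, 1.0]))  # crosses z=0 at (1, 2), z=1 at (2, 4)
```

The coordinates depend only on the ray as a line, not on where its origin sits along it or how its direction is scaled, which is why four numbers suffice to index a ray.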
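We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view, where this capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene. To achieve this, we propose a novel self-supervised view generation training paradigm, in which we sample and render virtual camera trajectories, including cyclic ones, allowing our model to learn stable view generation from a collection of single views. At test time, despite never having seen a video during training, our approach can take a single image and generate long camera trajectories comprising hundreds of new views with realistic and diverse content. We compare our approach with recent state-of-the-art supervised view generation methods that require posed multi-view videos, and demonstrate superior performance and synthesis quality.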
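Display technologies have evolved over the years. It is critical to develop practical HDR capture, processing, and display solutions to bring 3D technologies to the next level. Depth estimation from multi-exposure stereo image sequences is an essential task in developing cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is modified. For the stereo depth estimation part of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the need for cost volume construction, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine the disparity maps obtained from stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps computed for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve depth discontinuities. The proposed CNN offers the flexibility to be trained on standard dynamic range stereo data or on multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.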
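The fusion step, merging disparity maps from different exposures with per-pixel weight maps, can be sketched as a normalized weighted average. The abstract does not list its quality measures; this sketch assumes a single one, the well-exposedness term familiar from Mertens-style exposure fusion, which favours pixels whose intensity is close to mid-grey. The function names and `sigma` are assumptions.

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Weight near 1 for mid-grey pixels, near 0 for under/over-exposed ones."""
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def fuse_disparities(disparity_maps, exposures, eps=1e-12):
    """Per-pixel weighted average of disparity maps (all HxW, images in [0, 1])."""
    weights = np.stack([well_exposedness(img) for img in exposures])
    weights /= weights.sum(axis=0, keepdims=True) + eps
    return (weights * np.stack(disparity_maps)).sum(axis=0)

# One pixel seen in an over-exposed frame (value 1.0) and a well-exposed one (0.5).
fused = fuse_disparities([np.array([[10.0]]), np.array([[20.0]])],
                         [np.array([[1.0]]), np.array([[0.5]])])
print(fused)  # close to 20: the well-exposed estimate dominates
```

Additional quality measures (contrast, matching confidence, and so on) would typically be multiplied into the same per-pixel weight before normalization.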
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, consisting of a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth-discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via learned attention maps, leading to a final high-resolution LF image with satisfactory results on both plain textured areas and depth-discontinuous boundaries. In addition, so that our method, trained on simulated hybrid data, remains effective on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
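Combining the two intermediate estimations through attention maps amounts to a per-pixel convex combination. In the method described the attention maps are learned; in this sketch the attention logits are simply given as inputs, and the two-way softmax and the names are assumptions.

```python
import numpy as np

def attention_fuse(est_a, est_b, logit_a, logit_b):
    """Blend two estimates with per-pixel two-way softmax attention weights."""
    m = np.maximum(logit_a, logit_b)          # subtract max for numerical stability
    ea, eb = np.exp(logit_a - m), np.exp(logit_b - m)
    w_a = ea / (ea + eb)
    return w_a * est_a + (1.0 - w_a) * est_b

smooth = np.array([[0.2, 0.2]])   # e.g. spatially consistent but soft estimate
sharp = np.array([[0.8, 0.8]])    # e.g. warped estimate preserving high frequencies
# Left pixel trusts `smooth`, right pixel trusts `sharp` (at an edge, say).
print(attention_fuse(smooth, sharp, np.array([[5.0, -5.0]]), np.array([[-5.0, 5.0]])))
```

Because the weights sum to one at every pixel, the fused result always lies between the two estimates, so the network can choose per region without producing values outside their range.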
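Neural Radiance Field (NeRF) and its variants have exhibited great success in representing 3D scenes and synthesizing photo-realistic novel views. However, they are generally based on the pinhole camera model and assume all-in-focus inputs. This limits their applicability, as images captured in the real world often have a finite depth of field (DoF). To mitigate this issue, we introduce DoF-NeRF, a novel neural rendering approach that can deal with shallow-DoF inputs and can simulate the DoF effect. In particular, it extends NeRF to simulate the aperture of a lens following the principles of geometric optics. This physical grounding allows DoF-NeRF to operate on views with different focus configurations. Benefiting from explicit aperture modeling, DoF-NeRF also enables direct manipulation of the DoF effect by adjusting virtual aperture and focus parameters. It is plug-and-play and can be inserted into NeRF-based frameworks. Experiments on synthetic and real-world datasets show that DoF-NeRF not only performs comparably with NeRF in the all-in-focus setting, but can also synthesize all-in-focus novel views conditioned on shallow-DoF inputs. An interesting application of DoF-NeRF to DoF rendering is also demonstrated. The source code will be made available at https://github.com/zijinwuzijin/dof-nerf.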
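The geometric-optics relation behind explicit aperture modelling can be illustrated with the thin-lens circle of confusion: for aperture diameter A, focal length f, focus distance S_f and object distance S (same units, S_f > f), the blur-circle diameter on the sensor is c = A·f·|S − S_f| / (S·(S_f − f)). The sketch only evaluates this standard formula; how DoF-NeRF integrates aperture modelling into volume rendering is not shown, and the function name is an assumption.

```python
def circle_of_confusion(aperture, focal_length, focus_dist, obj_dist):
    """Thin-lens blur-circle diameter on the sensor (same length units throughout)."""
    if focus_dist <= focal_length:
        raise ValueError("focus distance must exceed the focal length")
    return aperture * focal_length * abs(obj_dist - focus_dist) / (
        obj_dist * (focus_dist - focal_length))

# 50 mm lens at f/2 (25 mm aperture) focused at 2 m, all values in millimetres.
for obj in (1000.0, 2000.0, 4000.0):
    print(obj, round(circle_of_confusion(25.0, 50.0, 2000.0, obj), 4))
```

The blur is zero at the focus distance and scales linearly with the aperture diameter, which is why adjusting virtual aperture and focus parameters is enough to control the rendered DoF.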
We explore the problem of view synthesis from a narrow baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. Our method builds upon prior work in predicting a multiplane image (MPI), which represents scene content as a set of RGBα planes within a reference view frustum and renders novel views by projecting this content into the target viewpoints. We present a theoretical analysis showing how the range of views that can be rendered from an MPI increases linearly with the MPI disparity sampling frequency, as well as a novel MPI prediction procedure that theoretically enables view extrapolations of up to 4× the lateral viewpoint movement allowed by prior work. Our method ameliorates two specific issues that limit the range of views renderable by prior methods: 1) We expand the range of novel views that can be rendered without depth discretization artifacts by using a 3D convolutional network architecture along with a randomized-resolution training procedure to allow our model to predict MPIs with increased disparity sampling frequency. 2) We reduce the repeated texture artifacts seen in disocclusions by enforcing a constraint that the appearance of hidden content at any depth must be drawn from visible content at or behind that depth.
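A back-of-the-envelope version of the linear relation between renderable view range and disparity sampling frequency: with MPI planes spaced uniformly in disparity (inverse depth) and a lateral camera move of b scene units, a plane at disparity δ shifts by f·b·δ pixels (f in pixels), so adjacent planes separate by f·b·Δδ pixels. Keeping that separation under about one pixel gives b ≤ 1/(f·Δδ), which grows in proportion to the number of plane intervals. This simplified criterion and the function names are assumptions for illustration, not the paper's exact analysis.

```python
import numpy as np

def mpi_plane_depths(near, far, num_planes):
    """Plane depths spaced uniformly in disparity (1/depth), nearest first."""
    return 1.0 / np.linspace(1.0 / near, 1.0 / far, num_planes)

def max_lateral_translation(near, far, num_planes, focal_px, max_gap_px=1.0):
    """Largest sideways camera move keeping adjacent-plane parallax <= max_gap_px."""
    disparity_step = (1.0 / near - 1.0 / far) / (num_planes - 1)
    return max_gap_px / (focal_px * disparity_step)

print(mpi_plane_depths(1.0, 100.0, 4))   # dense near the camera, sparse far away
for n in (33, 65, 129):                  # doubling the plane intervals each time
    print(n, max_lateral_translation(1.0, 100.0, n, focal_px=500.0))
```

Doubling the number of plane intervals halves the disparity step and therefore doubles the allowed lateral movement, consistent with the linear relationship stated above.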
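High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at a fine granularity. The acquisition of H2-Stereo video with commodity cameras, however, remains challenging. Existing spatial super-resolution and temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual-camera system in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) video with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) video with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits cross-camera redundancies to enhance both camera views, effectively reconstructing the H2-Stereo video. We utilize a disparity network to transfer spatiotemporal information across views even in large-disparity scenes, based on which we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosting and holes in the HSR-LFR view. LIFnet is trained end to end on a high-quality stereo video dataset collected from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on both synthetic data and camera-captured real data. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures, and applications, to fully understand the model's capability for potential applications.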
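Transferring information across the two views ultimately relies on warping one view by disparity. The sketch below is a plain horizontal backward warp with linear interpolation for rectified stereo; the network's disparity-guided flow-based warping also involves temporal flow and learned components, which are not modelled here, and the function name is an assumption.

```python
import numpy as np

def warp_by_disparity(src, disparity):
    """Backward-warp a rectified view: out[y, x] = src[y, x - disparity[y, x]].

    src, disparity: HxW float arrays. Uses linear interpolation along x;
    samples that fall outside the source image are set to 0 (holes).
    """
    h, w = src.shape
    xs = np.arange(w)[None, :] - disparity          # sample positions, (H, W)
    valid = (xs >= 0) & (xs <= w - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    frac = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(h)[:, None]
    out = (1.0 - frac) * src[rows, x0] + frac * src[rows, x1]
    return np.where(valid, out, 0.0)

src = np.arange(5, dtype=float)[None, :]            # a single ramp row 0..4
print(warp_by_disparity(src, np.full((1, 5), 1.5)))  # shifted right by 1.5 px, zeros at left
```

The zero-filled samples at the left border are exactly the kind of holes that occlusions and view boundaries create, which is why a separate fusion stage is needed after warping.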
Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision [20,29], but their use in graphics problems has been limited ([21, 7] are notable recent exceptions). In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains) and high-quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. To verify our method, we show that it can convincingly reproduce known test views from nearby imagery. Additionally, we show images rendered from novel viewpoints. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.