The quantitative evaluation of optical flow algorithms by Barron et al. (1994) led to significant advances in performance. The challenges for optical flow algorithms today go beyond the datasets and evaluation methods proposed in that paper. Instead, they center on problems associated with complex natural scenes, including nonrigid motion, real sensor noise, and motion discontinuities. We propose a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. To that end, we contribute four types of data to test different aspects of optical flow algorithms: (1) sequences with nonrigid motion where the ground-truth flow is determined by A preliminary version of this paper appeared in the IEEE International Conference on Computer Vision (Baker et al. 2007).
translated by 谷歌翻译
Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods. Our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms. We have also produced several new multi-frame stereo data sets with ground truth and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today's best-performing stereo algorithms.
translated by 谷歌翻译
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image-and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.
translated by 谷歌翻译
传统上,本征成像或内在图像分解被描述为将图像分解为两层:反射率,材料的反射率;和一个阴影,由光和几何之间的相互作用产生。近年来,深入学习技术已广泛应用,以提高这些分离的准确性。在本调查中,我们概述了那些在知名内在图像数据集和文献中使用的相关度量的结果,讨论了预测所需的内在图像分解的适用性。虽然Lambertian的假设仍然是许多方法的基础,但我们表明,对图像形成过程更复杂的物理原理组件的潜力越来越意识到,这是光学准确的材料模型和几何形状,更完整的逆轻型运输估计。考虑使用的前瞻和模型以及驾驶分解过程的学习架构和方法,我们将这些方法分类为分解的类型。考虑到最近神经,逆和可微分的渲染技术的进步,我们还提供了关于未来研究方向的见解。
translated by 谷歌翻译
Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.
translated by 谷歌翻译
We propose "factor matting", an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of video into independent components, each visualizing a counterfactual version of the scene where contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that produces useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections. Our method is trained per-video and requires neither pre-training on external large datasets, nor knowledge about the 3D structure of the scene. We conduct extensive experiments, and show that our method not only can disentangle scenes with complex interactions, but also outperforms top methods on existing tasks such as classical video matting and background subtraction. In addition, we demonstrate the benefits of our approach on a range of downstream tasks. Please refer to our project webpage for more details: https://factormatte.github.io
translated by 谷歌翻译
传统摄像机测量图像强度。相比之下,事件相机以异步测量每像素的时间强度变化。恢复事件的强度是一个流行的研究主题,因为重建的图像继承了高动态范围(HDR)和事件的高速属性;因此,它们可以在许多机器人视觉应用中使用并生成慢动作HDR视频。然而,最先进的方法通过训练映射到图像经常性神经网络(RNN)来解决这个问题,这缺乏可解释性并且难以调整。在这项工作中,我们首次展示运动和强度估计的联合问题导致我们以模拟基于事件的图像重建作为可以解决的线性逆问题,而无需训练图像重建RNN。相反,基于古典和学习的图像前导者可以用于解决问题并从重建的图像中删除伪影。实验表明,尽管仅使用来自短时间间隔(即,没有复发连接),但是,尽管只使用来自短时间间隔的数据,所提出的方法会产生视觉质量的图像。我们的方法还可用于提高首先估计图像Laplacian的方法重建的图像的质量;在这里,我们的方法可以被解释为由图像前提引导的泊松重建。
translated by 谷歌翻译
使用FASS-MVS,我们提出了一种具有表面感知半全局匹配的快速多视图立体声的方法,其允许从UAV捕获的单眼航空视频数据中快速深度和正常地图估计。反过来,由FASS-MVS估计的数据促进在线3D映射,这意味着在获取或接收到图像数据时立即和递增地生成场景的3D地图。 FASS-MVS由分层处理方案组成,其中深度和正常数据以及相应的置信度分数以粗略的方式估计,允许有效地处理由倾斜图像所固有的大型场景深度低无人机。实际深度估计采用用于致密多图像匹配的平面扫描算法,以产生深度假设,通过表面感知半全局优化来提取实际深度图,从而减少了SGM的正平行偏压。给定估计的深度图,然后通过将深度图映射到点云中并计算狭窄的本地邻域内的普通向量来计算像素 - 方面正常信息。在彻底的定量和消融研究中,我们表明,由FASS-MV计算的3D信息的精度接近离线多视图立体声的最先进方法,误差甚至没有一个幅度而不是科麦。然而,同时,FASS-MVS的平均运行时间估计单个深度和正常地图的距离小于ColMAP的14%,允许在1-中执行全高清图像的在线和增量处理2 Hz。
translated by 谷歌翻译
本文介绍了一种新颖的体系结构,用于同时估算高度准确的光流和刚性场景转换,以实现困难的场景,在这种情况下,亮度假设因强烈的阴影变化而违反了亮度假设。如果是旋转物体或移动的光源(例如在黑暗中驾驶汽车遇到的光源),场景的外观通常从一个视图到下一个视图都发生了很大变化。不幸的是,用于计算光学流或姿势的标准方法是基于这样的期望,即场景中特征在视图之间保持恒定。在调查的情况下,这些方法可能经常失败。提出的方法通过组合图像,顶点和正常数据来融合纹理和几何信息,以计算照明不变的光流。通过使用粗到最新的策略,可以学习全球锚定的光流,从而减少了基于伪造的伪相应的影响。基于学习的光学流,提出了第二个体系结构,该体系结构可预测扭曲的顶点和正常地图的稳健刚性变换。特别注意具有强烈旋转的情况,这通常会导致这种阴影变化。因此,提出了一个三步程序,该程序可以利用正态和顶点之间的相关性。该方法已在新创建的数据集上进行了评估,该数据集包含具有强烈旋转和阴影效果的合成数据和真实数据。该数据代表了3D重建中的典型用例,其中该对象通常在部分重建之间以很大的步骤旋转。此外,我们将该方法应用于众所周知的Kitti Odometry数据集。即使由于实现了Brighness的假设,这不是该方法的典型用例,因此,还建立了对标准情况和与其他方法的关系的适用性。
translated by 谷歌翻译
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which cannot be handled by existing methods.
translated by 谷歌翻译
现代计算机视觉已超越了互联网照片集的领域,并进入了物理世界,通过非结构化的环境引导配备摄像头的机器人和自动驾驶汽车。为了使这些体现的代理与现实世界对象相互作用,相机越来越多地用作深度传感器,重建了各种下游推理任务的环境。机器学习辅助的深度感知或深度估计会预测图像中每个像素的距离。尽管已经在深入估算中取得了令人印象深刻的进步,但仍然存在重大挑战:(1)地面真相深度标签很难大规模收集,(2)通常认为相机信息是已知的,但通常是不可靠的,并且(3)限制性摄像机假设很常见,即使在实践中使用了各种各样的相机类型和镜头。在本论文中,我们专注于放松这些假设,并描述将相机变成真正通用深度传感器的最终目标的贡献。
translated by 谷歌翻译
The PASCAL Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection.This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.
translated by 谷歌翻译
大量位移光流是许多计算机视觉任务不可或缺的一部分。基于粗到精细方案的差异稀疏匹配,并在局部优化以颜色,梯度和平滑度来调节能量模型,使它们对稀疏匹配,变形和任意大型位移对噪声敏感。本文解决了此问题并提出了混合流,这是大型位移和变形的变异运动估计框架。在图像对上执行多尺度混合匹配方法。通过按照群集的上下文描述符匹配,通过根据特征描述符对像素进行分类而形成的粗尺度簇。我们在每个匹配的粗尺度簇中包含的精细尺度超级像素上应用多尺度的图形匹配。无法进一步细分的小簇使用局部特征匹配进行匹配。这些初始匹配在一起形成了流动,该流程通过边缘固定插值和变化的细化而传播。我们的方法不需要训练,并且由于现场运动而引起的实质性位移以及刚性和非韧性转换是强大的,因此它非常适合大规模图像(例如宽面积运动图像(WAMI))。更值得注意的是,混合流在代表感知组的任意拓扑的有向图上起作用,从而在存在明显变形的情况下改善运动估计。我们展示了混合流的优越性能,在两个基准数据集上的最先进的变量技术,并通过最先进的基于深度学习的技术报告可比的结果。
translated by 谷歌翻译
Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks where CNNs were successful. In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations.Since existing ground truth datasets are not sufficiently large to train a CNN, we generate a synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.
translated by 谷歌翻译
结合同时定位和映射(SLAM)估计和动态场景建模可以高效地在动态环境中获得机器人自主权。机器人路径规划和障碍避免任务依赖于场景中动态对象运动的准确估计。本文介绍了VDO-SLAM,这是一种强大的视觉动态对象感知SLAM系统,用于利用语义信息,使得能够在场景中进行准确的运动估计和跟踪动态刚性物体,而无需任何先前的物体形状或几何模型的知识。所提出的方法识别和跟踪环境中的动态对象和静态结构,并将这些信息集成到统一的SLAM框架中。这导致机器人轨迹的高度准确估计和对象的全部SE(3)运动以及环境的时空地图。该系统能够从对象的SE(3)运动中提取线性速度估计,为复杂的动态环境中的导航提供重要功能。我们展示了所提出的系统对许多真实室内和室外数据集的性能,结果表明了对最先进的算法的一致和实质性的改进。可以使用源代码的开源版本。
translated by 谷歌翻译
综合照片 - 现实图像和视频是计算机图形的核心,并且是几十年的研究焦点。传统上,使用渲染算法(如光栅化或射线跟踪)生成场景的合成图像,其将几何形状和材料属性的表示为输入。统称,这些输入定义了实际场景和呈现的内容,并且被称为场景表示(其中场景由一个或多个对象组成)。示例场景表示是具有附带纹理的三角形网格(例如,由艺术家创建),点云(例如,来自深度传感器),体积网格(例如,来自CT扫描)或隐式曲面函数(例如,截短的符号距离)字段)。使用可分辨率渲染损耗的观察结果的这种场景表示的重建被称为逆图形或反向渲染。神经渲染密切相关,并将思想与经典计算机图形和机器学习中的思想相结合,以创建用于合成来自真实观察图像的图像的算法。神经渲染是朝向合成照片现实图像和视频内容的目标的跨越。近年来,我们通过数百个出版物显示了这一领域的巨大进展,这些出版物显示了将被动组件注入渲染管道的不同方式。这种最先进的神经渲染进步的报告侧重于将经典渲染原则与学习的3D场景表示结合的方法,通常现在被称为神经场景表示。这些方法的一个关键优势在于它们是通过设计的3D-一致,使诸如新颖的视点合成捕获场景的应用。除了处理静态场景的方法外,我们还涵盖了用于建模非刚性变形对象的神经场景表示...
translated by 谷歌翻译
对医疗保健监控的远程工具的需求从未如此明显。摄像机测量生命体征利用成像装置通过分析人体的图像来计算生理变化。建立光学,机器学习,计算机视觉和医学的进步这些技术以来的数码相机的发明以来已经显着进展。本文介绍了对生理生命体征的相机测量综合调查,描述了它们可以测量的重要标志和实现所做的计算技术。我涵盖了临床和非临床应用以及这些应用需要克服的挑战,以便从概念上推进。最后,我描述了对研究社区可用的当前资源(数据集和代码),并提供了一个全面的网页(https://cameravitals.github.io/),其中包含这些资源的链接以及其中引用的所有文件的分类列表文章。
translated by 谷歌翻译
面部特征跟踪是成像跳芭式(BCG)的关键组成部分,其中需要精确定量面部关键点的位移,以获得良好的心率估计。皮肤特征跟踪能够在帕金森病中基于视频的电机降解量化。传统的计算机视觉算法包括刻度不变特征变换(SIFT),加速强大的功能(冲浪)和LUCAS-KANADE方法(LK)。这些长期代表了最先进的效率和准确性,但是当存在常见的变形时,如图所示,如图所示,如此。在过去的五年中,深度卷积神经网络对大多数计算机视觉任务的传统方法表现优于传统的传统方法。我们提出了一种用于特征跟踪的管道,其应用卷积堆积的AutoEncoder,以将图像中最相似的裁剪标识到包含感兴趣的特征的参考裁剪。 AutoEncoder学会将图像作物代表到特定于对象类别的深度特征编码。我们在面部图像上培训AutoEncoder,并验证其在手动标记的脸部和手视频中通常验证其跟踪皮肤功能的能力。独特的皮肤特征(痣)的跟踪误差是如此之小,因为我们不能排除他们基于$ \ chi ^ 2 $ -test的手动标签。对于0.6-4.2像素的平均误差,我们的方法在所有情况下都表现出了其他方法。更重要的是,我们的方法是唯一一个不分歧的方法。我们得出的结论是,我们的方法为特征跟踪,特征匹配和图像配准比传统算法创建更好的特征描述符。
translated by 谷歌翻译
培训和测试监督对象检测模型需要大量带有地面真相标签的图像。标签定义图像中的对象类及其位置,形状以及可能的其他信息,例如姿势。即使存在人力,标签过程也非常耗时。我们引入了一个新的标签工具,用于2D图像以及3D三角网格:3D标记工具(3DLT)。这是一个独立的,功能丰富和跨平台软件,不需要安装,并且可以在Windows,MacOS和基于Linux的发行版上运行。我们不再像当前工具那样在每个图像上分别标记相同的对象,而是使用深度信息从上述图像重建三角形网格,并仅在上述网格上标记一次对象。我们使用注册来简化3D标记,离群值检测来改进2D边界框的计算和表面重建,以将标记可能性扩展到大点云。我们的工具经过最先进的方法测试,并且在保持准确性和易用性的同时,它极大地超过了它们。
translated by 谷歌翻译
Mapping the seafloor with underwater imaging cameras is of significant importance for various applications including marine engineering, geology, geomorphology, archaeology and biology. For shallow waters, among the underwater imaging challenges, caustics i.e., the complex physical phenomena resulting from the projection of light rays being refracted by the wavy surface, is likely the most crucial one. Caustics is the main factor during underwater imaging campaigns that massively degrade image quality and affect severely any 2D mosaicking or 3D reconstruction of the seabed. In this work, we propose a novel method for correcting the radiometric effects of caustics on shallow underwater imagery. Contrary to the state-of-the-art, the developed method can handle seabed and riverbed of any anaglyph, correcting the images using real pixel information, thus, improving image matching and 3D reconstruction processes. In particular, the developed method employs deep learning architectures in order to classify image pixels to "non-caustics" and "caustics". Then, exploits the 3D geometry of the scene to achieve a pixel-wise correction, by transferring appropriate color values between the overlapping underwater images. Moreover, to fill the current gap, we have collected, annotated and structured a real-world caustic dataset, namely R-CAUSTIC, which is openly available. Overall, based on the experimental results and validation the developed methodology is quite promising in both detecting caustics and reconstructing their intensity.
translated by 谷歌翻译