Event cameras are novel, bio-inspired vision sensors whose pixels output asynchronous and independent timestamped spikes, called "events", in response to local intensity changes. Event cameras offer advantages over conventional frame-based cameras in terms of latency, high dynamic range (HDR), and temporal resolution. Until recently, event cameras were limited to outputting events in the intensity channel; however, recent advances have led to the development of color event cameras, such as the Color-DAVIS346. In this work, we present and release the first Color Event Camera Dataset (CED), containing 50 minutes of footage with both color frames and events. CED features a wide variety of indoor and outdoor scenes, which we hope will help drive forward event-based vision research. We also provide an extension of the event camera simulator ESIM that enables simulation of color events. Finally, we evaluate three state-of-the-art image reconstruction methods that can be used to convert the Color-DAVIS346 into a continuous-time, HDR, color video camera, to visualize the event stream and for use in downstream vision applications.
Event cameras have many advantages over conventional cameras, such as low latency, high temporal resolution, and high dynamic range. However, since the output of an event camera is an asynchronous sequence of events rather than actual intensity images, existing algorithms cannot be applied directly. It is therefore necessary to generate intensity images from events for other tasks. In this paper, we unlock the potential of event-camera-based conditional generative adversarial networks to create images/videos from an adjustable portion of the event data stream. A stack of the spatio-temporal coordinates of the events is used as input, and the network is trained to reproduce images based on the spatio-temporal intensity changes. Event cameras can generate high dynamic range (HDR) images even under extreme illumination conditions, as well as non-blurred images under fast motion. Moreover, it is possible to generate very high frame rate videos, theoretically up to one million frames per second (FPS), since the temporal resolution of event cameras is about 1 μs. The proposed method is evaluated by comparing the results with intensity images captured on the same pixel grid, using real datasets available online and synthetic datasets produced by an event camera simulator.
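To make the input encoding concrete, below is a minimal NumPy sketch of stacking events into per-time-bin polarity frames; the function name, bin count, and normalization are illustrative assumptions, not the exact encoding used by the network above.

```python
import numpy as np

def events_to_stack(events, height, width, num_bins=8):
    """Accumulate an event stream into a stack of per-bin polarity frames.

    `events` is an (N, 4) array of (x, y, t, p) with p in {-1, +1}.
    This is an illustrative encoding, not necessarily the exact input
    used by the cGAN described above.
    """
    stack = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins) and clip the last event into the final bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bin_idx = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3]
    # Sum signed polarities per (bin, y, x) cell.
    np.add.at(stack, (bin_idx, y, x), p)
    return stack
```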
Event cameras provide asynchronous, data-driven measurements of local temporal contrast over a large dynamic range with extremely high temporal resolution. Conventional cameras capture low-frequency reference intensity information. These two sensor modalities provide complementary information. We propose a computationally efficient, asynchronous filter that continuously fuses image frames and events into a single high-temporal-resolution, high-dynamic-range image state. In the absence of conventional image frames, the filter can operate on events alone. We present experimental results on high-speed, high-dynamic-range sequences, as well as on a new ground-truth dataset that we generate, to demonstrate that the proposed algorithm outperforms existing state-of-the-art methods.
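A minimal per-pixel sketch of such a frame/event fusion filter is given below, assuming a crossover gain and a fixed contrast threshold; it illustrates the idea (decay toward the latest frame, step on each event) rather than the exact filter proposed above.

```python
import numpy as np

class ComplementaryFilter:
    """Minimal sketch of fusing frames and events into a log-intensity state.

    The gain `alpha` and the per-event step `contrast` are illustrative
    assumptions, not values from the paper.
    """

    def __init__(self, height, width, alpha=2.0, contrast=0.1):
        self.alpha = alpha         # crossover gain [1/s]: how fast the state decays toward frames
        self.contrast = contrast   # assumed per-event log-intensity step
        self.state = np.zeros((height, width))   # fused log-intensity estimate
        self.frame = np.zeros((height, width))   # latest log-intensity frame
        self.last_t = np.zeros((height, width))  # per-pixel time of last update

    def update_frame(self, log_frame, t):
        # A new conventional frame provides the low-frequency reference.
        self.frame = log_frame.copy()

    def update_event(self, x, y, t, polarity):
        # Decay the state toward the latest frame, then apply the event step.
        dt = t - self.last_t[y, x]
        w = np.exp(-self.alpha * dt)
        self.state[y, x] = self.frame[y, x] + w * (self.state[y, x] - self.frame[y, x])
        self.state[y, x] += self.contrast * polarity
        self.last_t[y, x] = t
```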
Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They offer appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatio-temporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with standard vision pipelines, e.g., convolutional neural networks (CNNs). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-specific network in an end-to-end fashion, and (ii) it provides a taxonomy that unifies the majority of existing event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% over state-of-the-art methods on optical flow estimation and object recognition.
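As an illustration of a grid-based representation that this framework generalizes, the sketch below samples events onto a space-time voxel grid with a fixed triangular temporal kernel; in the framework above, the kernel itself would be learned end-to-end, so this code is only a non-learned instance.

```python
import numpy as np

def events_to_voxel_grid(events, height, width, num_bins=9):
    """Sample events onto a space-time voxel grid with a triangular temporal
    kernel. A fixed-kernel instance of the general framework; in the paper
    the kernel can be learned end-to-end."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Rescale time to [0, num_bins - 1].
    t_star = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    for b in range(num_bins):
        # Triangular kernel: each event contributes to its two nearest bins.
        weight = np.maximum(0.0, 1.0 - np.abs(t_star - b))
        np.add.at(grid[b], (y, x), p * weight)
    return grid
```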
Event cameras are bio-inspired sensors that differ from conventional cameras in a fundamental way: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes. This results in a stream of events that encode the time, location, and sign of the brightness changes. Compared with conventional cameras, event cameras offer outstanding properties: very high dynamic range (140 dB vs. 60 dB), high temporal resolution (on the order of microseconds), low power consumption, and no motion blur. Event cameras therefore have great potential for robotics and computer vision in scenarios that are challenging for conventional cameras, such as high speed and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks they have been used for, from low-level vision (feature detection and tracking, optical flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for more efficient, bio-inspired ways for machines to perceive and interact with the world.
Event cameras are revolutionary sensors that work radically differently from standard cameras. Instead of capturing intensity images at a fixed rate, event cameras measure changes of intensity asynchronously, in the form of a stream of events, which encode per-pixel brightness changes. In the last few years, their outstanding properties (asynchronous sensing, no motion blur, high dynamic range) have led to exciting vision applications, with very low latency and high robustness. However, these sensors are still scarce and expensive to obtain, slowing down the progress of the research community. To address these issues, there is a huge demand for cheap, high-quality, synthetic, labeled event data for algorithm prototyping, deep learning, and algorithm benchmarking. The development of such a simulator, however, is not trivial since event cameras work fundamentally differently from frame-based cameras. We present the first event camera simulator that can generate a large amount of reliable event data. The key component of our simulator is a theoretically sound, adaptive rendering scheme that only samples frames when necessary, through a tight coupling between the rendering engine and the event simulator. We release ESIM as open source: http://rpg.ifi.uzh.ch/esim.
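For intuition about the underlying event model, here is a simplified sketch of how ideal events can be generated between two rendered log-intensity frames; the contrast threshold and the interpolation scheme are assumptions and do not reproduce ESIM's full adaptive sampling.

```python
import numpy as np

def generate_events(log_prev, log_curr, t_prev, t_curr, ref, C=0.15):
    """Generate ideal events between two rendered log-intensity frames.

    `ref` holds the per-pixel log intensity at which the last event fired and
    is updated in place. Timestamps are linearly interpolated between the two
    rendered samples. This is a simplified per-pixel sketch, not ESIM's full
    adaptive sampling scheme; the threshold C is an assumed contrast sensitivity.
    """
    events = []
    diff = log_curr - ref
    ys, xs = np.nonzero(np.abs(diff) >= C)
    for y, x in zip(ys, xs):
        polarity = 1 if diff[y, x] > 0 else -1
        n = int(abs(diff[y, x]) // C)  # number of threshold crossings
        for k in range(1, n + 1):
            # Interpolate the crossing time between the two rendered samples.
            frac = (ref[y, x] + k * C * polarity - log_prev[y, x]) / (
                log_curr[y, x] - log_prev[y, x] + 1e-12)
            ts = t_prev + np.clip(frac, 0.0, 1.0) * (t_curr - t_prev)
            events.append((x, y, ts, polarity))
        ref[y, x] += n * C * polarity
    return events
```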
Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the best out of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large scale event-camera dataset (≈1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g. challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.
Event-based cameras measure intensity changes (called "events") with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), an event camera allows simultaneous output of intensity frames. However, the output images are captured at a relatively low frame rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data with a latent image. Based on the abundant event data and the low-frame-rate, easily blurred images, we propose a simple and effective approach to reconstruct high-quality and high-frame-rate videos. Starting from a single blurry frame and its event data, we propose the Event-based Double Integral (EDI) model. We then extend it to the multiple Event-based Double Integral (mEDI) model to obtain smoother and more convincing results based on multiple images and their events. We also provide an efficient solver to minimize the proposed energy model. By optimizing the energy model, we achieve significant improvements in removing general blur and reconstructing high-temporal-resolution videos. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our mEDI model and optimization method over state-of-the-art methods.
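For concreteness, the double-integral relation described above can be sketched as follows, with notation assumed rather than copied from the paper: B is the blurry frame over the exposure window of length T centered at time f, L(t) the latent image, e(s) the event field, and c the contrast threshold.

```latex
\begin{align}
  E(t) &= \int_{f}^{t} e(s)\, \mathrm{d}s, \\
  L(t) &= L(f)\, \exp\!\bigl(c\, E(t)\bigr), \\
  B    &= \frac{1}{T} \int_{f - T/2}^{f + T/2} L(t)\, \mathrm{d}t
        = \frac{L(f)}{T} \int_{f - T/2}^{f + T/2} \exp\!\bigl(c\, E(t)\bigr)\, \mathrm{d}t, \\
  L(f) &= \frac{B\, T}{\int_{f - T/2}^{f + T/2} \exp\!\bigl(c\, E(t)\bigr)\, \mathrm{d}t}.
\end{align}
```

In this sketch, the remaining scalar unknown is the contrast threshold c, consistent with the single-scalar-variable non-convex optimization mentioned above.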
We present a method that leverages the complementarity of event cameras and standard cameras to track visual features with low latency. Event cameras are novel sensors that output pixel-level brightness changes, called "events". They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and latency on the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features in frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: frames provide a photometric representation that does not depend on motion direction, and events provide low-latency updates. In contrast to previous works, which are based on heuristics, this is the first principled method that directly uses raw intensity measurements, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are more accurate (subpixel accuracy) and longer than the state of the art, across a wide variety of scenes.
Event cameras are bioinspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. However, event cameras output only little information when the amount of motion is limited, such as in the case of almost still motion. Conversely, standard cameras provide instant and rich information about the environment most of the time (in low-speed and good lighting scenarios), but they fail severely in case of fast motions, or difficult lighting such as high dynamic range or low light scenes. In this letter, we present the first state estimation pipeline that leverages the complementary advantages of these two sensors by fusing in a tightly coupled manner events, standard frames, and inertial measurements. We show on the publicly available Event Camera Dataset that our hybrid pipeline leads to an accuracy improvement of 130% over event-only pipelines, and 85% over standard-frames-only visual-inertial systems, while still being computationally tractable. Furthermore, we use our pipeline to demonstrate, to the best of our knowledge, the first autonomous quadrotor flight using an event camera for state estimation, unlocking flight scenarios that were not reachable with traditional visual-inertial odometry, such as low-light environments and high dynamic range scenes. Videos of the experiments: http://rpg.ifi.uzh.ch/ultimateslam.html
Index Terms: SLAM, visual-based navigation, aerial systems: perception and autonomy.
Event-based cameras measure intensity changes (called "events") with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), an event camera allows simultaneous output of intensity frames. However, the output images are captured at a relatively low frame rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data with a latent image. In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high-frame-rate, sharp video from a single blurry frame and its event data. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our EDI model and optimization method over state-of-the-art methods.
Event-based cameras have shown great promise in a variety of situations where frame-based cameras suffer, such as high-speed motion and high dynamic range scenes. However, developing algorithms for event measurements requires a new class of hand-crafted algorithms. Deep learning has shown great success in providing model-free solutions to many problems in the vision community, but existing networks have been developed for frame-based images, and there is no wealth of labeled event data comparable to that available for supervised training on images. In response to these issues, we present EV-FlowNet, a novel self-supervised deep learning pipeline for optical flow estimation for event-based cameras. In particular, we introduce an image-based representation of a given event stream, which is fed into a self-supervised neural network as the sole input. The corresponding grayscale images, captured from the same camera at the same time as the events, are then used as a supervisory signal to provide a loss function at training time, given the flow estimated by the network. We show that the resulting network is able to accurately predict optical flow from events alone in a variety of different scenes, with performance competitive to image-based networks. This method not only allows for accurate estimation of optical flow, but also provides a framework with which other self-supervised methods can be transferred to the event-based domain.
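As an illustration of an image-based event representation of the kind described above, the sketch below builds a 4-channel event image from per-polarity counts and most-recent timestamps; the channel ordering and normalization are assumptions.

```python
import numpy as np

def event_image(events, height, width):
    """Build a 4-channel event image: per-polarity event counts and
    per-polarity timestamps of the most recent event at each pixel
    (an encoding in the spirit of the representation described above;
    exact normalization details are assumptions)."""
    img = np.zeros((4, height, width), dtype=np.float32)
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    for x, y, t, p in events:
        xi, yi = int(x), int(y)
        t_norm = (t - t0) / max(t1 - t0, 1e-9)
        if p > 0:
            img[0, yi, xi] += 1.0                         # positive-event count
            img[2, yi, xi] = max(img[2, yi, xi], t_norm)  # latest positive timestamp
        else:
            img[1, yi, xi] += 1.0                         # negative-event count
            img[3, yi, xi] = max(img[3, yi, xi], t_norm)  # latest negative timestamp
    return img
```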
Event-based cameras have recently drawn the attention of the Computer Vision community thanks to their advantages in terms of high temporal resolution, low power consumption and high dynamic range, compared to traditional frame-based cameras. These properties make event-based cameras an ideal choice for autonomous vehicles, robot navigation or UAV vision, among others. However, the accuracy of event-based object classification algorithms, which is of crucial importance for any reliable system working in real-world conditions, is still far behind their frame-based counterparts. Two main reasons for this performance gap are: 1. The lack of effective low-level representations and architectures for event-based object classification and 2. The absence of large real-world event-based datasets. In this paper we address both problems. First, we introduce a novel event-based feature representation together with a new machine learning architecture. Compared to previous approaches, we use local memory units to efficiently leverage past temporal information and build a robust event-based representation. Second, we release the first large real-world event-based dataset for object classification. We compare our method to the state-of-the-art with extensive experiments, showing better classification performance and real-time computation.
We present EVO, an Event-based Visual Odometry algorithm. Our algorithm successfully leverages the outstanding properties of event cameras to track fast camera motions while recovering a semi-dense 3D map of the environment. The implementation runs in real-time on a standard CPU and outputs up to several hundred pose estimates per second. Due to the nature of event cameras, our algorithm is unaffected by motion blur and operates very well in challenging, high dynamic range conditions with strong illumination changes. To achieve this, we combine a novel, event-based tracking approach based on image-to-model alignment with a recent event-based 3D reconstruction algorithm in a parallel fashion. Additionally, we show that the output of our pipeline can be used to reconstruct intensity images from the binary event stream, though our algorithm does not require such intensity information. We believe that this work makes significant progress in SLAM by unlocking the potential of event cameras. This allows us to tackle challenging scenarios that are currently inaccessible to standard cameras.
Deep learning has raised hopes and expectations as a general solution for many applications; it has indeed proven effective, but it has also shown a strong dependence on large quantities of data. Fortunately, it has been demonstrated that, even when data are scarce, a successful model can be trained by reusing prior knowledge. Thus, in its broadest definition, developing transfer learning techniques is a key factor for deploying effective and accurate intelligent systems. This work focuses on a family of transfer learning methods applied to visual object recognition tasks, in particular image classification. Transfer learning is a general term, and specific settings have been given specific names: when the learner has access only to unlabeled data from the target domain and labeled data from a different domain (the source), the problem is known as unsupervised domain adaptation (DA). The first part of this work focuses on three methods for this setting: one of these methods operates on features, one on images, while the third uses both. The second part focuses on a real-life problem in robot perception, specifically RGB-D recognition. Robotic platforms are usually not limited to color perception; they often carry a depth camera. Unfortunately, the depth modality is rarely used for visual recognition, due to the lack of pre-trained models to transfer from and the scarcity of data for training from scratch. Two approaches to deal with this situation will be presented: one using synthetic data and the other exploiting cross-modality transfer learning.
Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting that exhibits maximal sharpness. However, it is not obvious how to reliably estimate the sharpness level, particularly in low-texture areas. In this paper, we propose "Deep Depth From Focus (DDFF)" as the first end-to-end learning approach to this problem. One of the main challenges we face is the lack of data for deep neural networks. To obtain a large number of focal stacks with corresponding ground-truth depth, we propose to use a light-field camera with a co-calibrated RGB-D sensor. This allows us to digitally create focal stacks of varying sizes. Compared to existing benchmarks, our dataset is 25 times larger, making it possible to apply machine learning to this inverse problem. We compare our results with state-of-the-art DFF methods and also analyze the effect of several key deep architectural components. These experiments show that our proposed method "DDFFNet" achieves state-of-the-art performance in all scenes, reducing depth error by more than 75% compared to classical DFF methods.
Event cameras or neuromorphic cameras mimic the human perception system as they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losing the familiar per-pixel intensity information. In this work we propose a variational model that accurately models the behaviour of event cameras, enabling reconstruction of intensity images with arbitrary frame rate in real-time. Our method is formulated on a per-event-basis, where we explicitly incorporate information about the asynchronous nature of events via an event manifold induced by the relative timestamps of events. In our experiments we verify that solving the variational model on the manifold produces high-quality images without explicitly estimating optical flow.
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
We present an algorithm to estimate the rotational motion of an event camera. In contrast to traditional cameras, which produce images at a fixed rate, event cameras have independent pixels that respond asynchronously to brightness changes, with microsecond resolution. Our method leverages the type of information conveyed by these novel sensors (i.e., edges) to directly estimate the angular velocity of the camera, without requiring optical flow or image intensity estimation. The core of the method is a contrast maximization design. The method performs favorably against ground truth data and gyroscopic measurements from an Inertial Measurement Unit, even in the presence of very high-speed motions (close to 1000 deg/s).
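A simplified sketch of the contrast maximization idea is given below: events are warped back to a reference time under a candidate angular velocity and accumulated into an image whose variance serves as the objective; the small-rotation approximation and the accumulation scheme are simplifying assumptions, not the paper's exact design.

```python
import numpy as np

def contrast(omega, events, K, height, width):
    """Variance ('contrast') of the image of warped events for a candidate
    angular velocity omega (rad/s, shape (3,)). `events` is (N, 4) with
    columns (x, y, t, p); K is the 3x3 camera intrinsic matrix (assumed known)."""
    x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    t_ref = t[0]
    # Back-project pixels to normalized bearing vectors.
    Kinv = np.linalg.inv(K)
    bearings = (Kinv @ np.vstack([x, y, np.ones_like(x)])).T   # (N, 3)
    # Small-rotation approximation: R(-omega*dt) b ~= b - (omega*dt) x b.
    dt = t - t_ref
    warped = bearings - np.cross(omega[None, :] * dt[:, None], bearings)
    # Project back to the image plane and accumulate a polarity image.
    proj = K @ warped.T
    u = proj[0] / proj[2]
    v = proj[1] / proj[2]
    img = np.zeros((height, width))
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    np.add.at(img, (v[valid].astype(int), u[valid].astype(int)), p[valid])
    return img.var()

# The angular velocity can then be estimated by maximizing this contrast,
# e.g. with a generic optimizer such as scipy.optimize.minimize on -contrast.
```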
We propose a learning-based approach to corner detection for event-based cameras that remains stable even under fast and abrupt motions. Event-based cameras offer high temporal resolution, power efficiency, and high dynamic range. However, the properties of event-based data are very different from those of standard intensity images, and simple extensions of corner detection methods designed for such images do not perform well on event-based data. We first introduce an efficient way to compute a time surface that is invariant to the speed of the objects. We then show that we can train a random forest, on our time surfaces, to recognize events generated by moving corners. Random forests are also extremely efficient, and therefore a good choice for dealing with the high capture frequency of event-based cameras; our implementation processes up to 1.6 million events per second on a single CPU. Thanks to our time-surface formulation and this learning approach, our method is significantly more robust to abrupt changes in the direction of corners than previous methods. Our method also naturally assigns a confidence score to each corner, which is useful for post-processing. Moreover, we introduce a high-resolution dataset suitable for quantitative evaluation and comparison of corner detection methods for event-based cameras. We call our approach SILC, for Speed Invariant Learned Corners, and compare it to the state of the art in experiments, showing better performance.
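For background on the time-surface input, the sketch below computes a plain exponentially decaying time surface; note that the method above uses a speed-invariant variant, so this basic version (with an assumed decay constant) is shown only to illustrate the concept.

```python
import numpy as np

def time_surface(events, height, width, tau=0.05):
    """Plain exponentially decaying time surface evaluated at the time of the
    last event. `events` is an (N, 4) array of (x, y, t, p); tau is an assumed
    decay constant in seconds. This is a generic baseline, not the
    speed-invariant time surface introduced above."""
    last_t = np.full((height, width), -np.inf)
    for x, y, t, p in events:
        last_t[int(y), int(x)] = t
    t_now = events[-1, 2]
    # Values in (0, 1]; pixels that never fired an event map to 0.
    return np.exp((last_t - t_now) / tau)
```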