Person detection is a crucial task for mobile robots navigating in human-populated environments. LiDAR sensors are promising for this task, thanks to their accurate depth measurements and large field of view. Two types of LiDAR sensors exist: 2D LiDAR sensors, which scan a single plane, and 3D LiDAR sensors, which scan multiple planes, thus forming a volume. How do they compare for the task of person detection? To answer this, we conduct a series of experiments using the public, large-scale JackRabbot dataset together with state-of-the-art person detectors for 2D and 3D LiDAR (DR-SPAAM and CenterPoint, respectively). Our experiments cover multiple aspects, ranging from a basic performance and speed comparison to a more detailed analysis of localization accuracy and robustness against distance and scene clutter. The insights from these experiments highlight the strengths and weaknesses of 2D and 3D LiDAR sensors as sources for person detection, and are especially valuable for designing mobile robots that will operate in close proximity to surrounding humans (e.g., service or social robots).
We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state of the art results on the KITTI 3D object detection benchmark [1] while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is at: https://github.com/kujason/avod
In this paper, we propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. Our proposed continuous fusion layer encodes both discrete-state image features as well as continuous geometric information. This enables us to design a novel, reliable and efficient end-to-end learnable 3D object detector based on multiple sensors. Our experimental evaluation on both KITTI as well as a large scale 3D object detection benchmark shows significant improvements over the state of the art.
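One way to picture the continuous fusion step described above is sketched below: for each BEV cell, the k nearest LiDAR points are gathered, their image features are read at the points' projections, and the result is pooled together with geometric offsets. This is a hedged illustration, not the paper's learned layer; the mean pooling, shapes, and parameter names are simplifying assumptions.

```python
# A loose sketch of a continuous fusion step: gather image features at the
# projections of the k nearest LiDAR points for each BEV cell and pool them
# together with geometric offsets (the paper uses a learned MLP instead).
import numpy as np

def continuous_fuse(bev_centers, lidar_points, lidar_uv, image_feat, k=3):
    """bev_centers: (M, 2) BEV cell centers; lidar_points: (N, 3);
    lidar_uv: (N, 2) integer image projections; image_feat: (H, W, C)."""
    fused = np.zeros((len(bev_centers), image_feat.shape[-1] + 3), dtype=np.float32)
    for i, center in enumerate(bev_centers):
        dists = np.linalg.norm(lidar_points[:, :2] - center, axis=1)
        nn = np.argsort(dists)[:k]                            # k nearest LiDAR points
        feats = image_feat[lidar_uv[nn, 1], lidar_uv[nn, 0]]  # gather image features
        offsets = lidar_points[nn] - np.append(center, 0.0)   # geometric offsets
        fused[i] = np.concatenate([feats, offsets], axis=1).mean(axis=0)
    return fused  # (M, C + 3) fused BEV features
```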
Adverse weather conditions can negatively affect LiDAR-based object detectors. In this work, we focus on the phenomenon of vehicle gas exhaust condensation in cold weather. This everyday effect can distort the estimated object size and orientation and introduce ghost object detections, compromising the reliability of state-of-the-art object detectors. We propose to address this problem by using data augmentation together with a novel training loss term. Training deep neural networks effectively requires a large amount of labeled data, and in adverse weather this labeling process can be very laborious and expensive. We tackle this issue in two steps: First, we present a gas exhaust data generation method based on 3D surface reconstruction and sampling, which allows us to generate a large number of gas exhaust clouds from a small pool of labeled data. Second, we introduce a point cloud augmentation procedure that can be used to add gas exhaust to datasets recorded in good weather conditions. Finally, we formulate a new training loss term that leverages the augmented point clouds to increase object detection robustness by penalizing predictions that include the noise. In contrast to other works, our method can be used with both grid-based and point-based detectors. Moreover, since our approach does not require any changes to the network architecture, inference time remains unchanged. Experimental results on real data show that our proposed method significantly increases robustness to gas exhaust and noisy data.
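The point cloud augmentation step described above can be pictured with the hedged sketch below, which translates a pre-generated exhaust cloud behind a labelled vehicle and appends it to a clear-weather scan. The placement offset, jitter, and field names are illustrative assumptions, not the paper's parameters.

```python
# A toy sketch of exhaust augmentation: place a sampled exhaust point cloud
# roughly behind a labelled vehicle and append it to a clear-weather scan.
import numpy as np

def inject_exhaust(scan, vehicle_box, exhaust_cloud, rng=None):
    """scan: (N, 3) clear-weather point cloud; vehicle_box: dict with 'center' (3,)
    and 'heading' (rad); exhaust_cloud: (M, 3) sampled exhaust points at the origin."""
    rng = rng or np.random.default_rng(0)
    heading = vehicle_box["heading"]
    # Place the cloud roughly behind the rear bumper, with a little jitter.
    rear_offset = -np.array([np.cos(heading), np.sin(heading), 0.0]) * 2.5
    jitter = rng.normal(scale=0.2, size=3)
    cloud = exhaust_cloud + vehicle_box["center"] + rear_offset + jitter
    return np.concatenate([scan, cloud], axis=0)  # augmented scan for training
```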
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical as detection is a necessary component for safety. Existing approaches are, however, expensive in computation due to high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixelwise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still running at > 28 FPS.
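The abstract hinges on representing the scene in the bird's eye view and decoding boxes from pixelwise predictions. Below is a minimal sketch, not the authors' code, of how such a BEV occupancy grid can be rasterized from a raw point cloud; the grid extents, resolution, and height slicing are illustrative assumptions.

```python
# A minimal sketch of BEV rasterization: discretize the point cloud into a
# fixed grid where each cell stores binary occupancy per height slice.
import numpy as np

def rasterize_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                  z_range=(-2.5, 1.0), resolution=0.1, num_slices=35):
    """points: (N, 3) array of x, y, z in the LiDAR frame."""
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    bev = np.zeros((ny, nx, num_slices), dtype=np.float32)

    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Discretize x/y into grid indices and z into height slices.
    ix = ((pts[:, 0] - x_range[0]) / resolution).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / resolution).astype(np.int64)
    iz = ((pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0]) * num_slices)
    iz = np.clip(iz.astype(np.int64), 0, num_slices - 1)

    bev[iy, ix, iz] = 1.0  # binary occupancy per height slice
    return bev
```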
Object detection in point clouds is an important aspect of many robotics applications such as autonomous driving. In this paper we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. Recent literature suggests two types of encoders; fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate, but slower. In this work we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection architecture, we further propose a lean downstream network. Extensive experimentation shows that PointPillars outperforms previous encoders with respect to both speed and accuracy by a large margin. Despite only using lidar, our full detection pipeline significantly outperforms the state of the art, even among fusion methods, with respect to both the 3D and bird's eye view KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2-4 fold runtime improvement. A faster version of our method matches the state of the art at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.
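As a rough illustration of the pillar encoding idea (not the released PointPillars implementation), the sketch below buckets points into vertical columns and reduces each column to a fixed-length feature with a stand-in for the learned per-point network; all sizes, ranges, and the random linear map are assumed values.

```python
# A rough sketch of pillar encoding: bucket points into vertical columns on the
# x/y plane and max-pool per-point features into a 2D pseudo-image.
import numpy as np

def encode_pillars(points, pillar_size=0.16, x_range=(0.0, 69.12),
                   y_range=(-39.68, 39.68), feat_dim=64, rng=None):
    """points: (N, 4) array of x, y, z, reflectance."""
    rng = rng or np.random.default_rng(0)
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))

    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    points, ix, iy = points[valid], ix[valid], iy[valid]

    # Random linear map as a stand-in for the learned per-point PointNet.
    weight = rng.standard_normal((points.shape[1], feat_dim)).astype(np.float32)

    canvas = np.zeros((ny, nx, feat_dim), dtype=np.float32)
    for pillar_id in np.unique(iy * nx + ix):
        in_pillar = (iy * nx + ix) == pillar_id
        point_feats = points[in_pillar] @ weight   # per-point features
        pillar_feat = point_feats.max(axis=0)      # symmetric max pooling
        canvas[pillar_id // nx, pillar_id % nx] = pillar_feat
    return canvas  # (ny, nx, feat_dim) pseudo-image for a 2D detection backbone
```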
LiDAR-based sensing drives current autonomous vehicles. Despite rapid progress, today's LiDAR sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensor are easily visible, but far-away or small objects comprise only one or two measurements. This is a problem, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into LiDAR-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud. These virtual points naturally integrate into any standard LiDAR-based 3D detector alongside regular LiDAR measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP and outperforms competing fusion approaches. Code and more visualizations are available at https://tianweiy.github.io/mvp/
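To illustrate the virtual-point idea sketched in the abstract, the following hedged example samples pixels inside a hypothetical 2D detection, borrows depth from the nearest projected LiDAR return, and unprojects the samples into 3D. The pinhole model and nearest-neighbour depth assignment are simplifying assumptions, not the released MVP code.

```python
# An illustrative sketch of virtual-point generation from one 2D detection.
import numpy as np

def virtual_points_for_box(lidar_uvd, box, intrinsics, num_samples=50, rng=None):
    """lidar_uvd: (N, 3) LiDAR points projected to (u, v, depth);
    box: (u_min, v_min, u_max, v_max) 2D detection; intrinsics: 3x3 camera matrix."""
    rng = rng or np.random.default_rng(0)
    u_min, v_min, u_max, v_max = box

    # Real LiDAR returns falling inside the 2D detection.
    inside = ((lidar_uvd[:, 0] >= u_min) & (lidar_uvd[:, 0] <= u_max) &
              (lidar_uvd[:, 1] >= v_min) & (lidar_uvd[:, 1] <= v_max))
    seeds = lidar_uvd[inside]
    if len(seeds) == 0:
        return np.empty((0, 3))

    # Sample pixel locations uniformly inside the detection.
    samples = np.stack([rng.uniform(u_min, u_max, num_samples),
                        rng.uniform(v_min, v_max, num_samples)], axis=1)

    # Borrow the depth of the nearest real point in pixel space.
    dists = np.linalg.norm(samples[:, None, :] - seeds[None, :, :2], axis=2)
    depth = seeds[np.argmin(dists, axis=1), 2]

    # Unproject sampled pixels to 3D camera coordinates.
    ones = np.ones((num_samples, 1))
    rays = np.linalg.inv(intrinsics) @ np.concatenate([samples, ones], axis=1).T
    return (rays * depth).T  # (num_samples, 3) virtual 3D points
```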
The popular object detection metric 3D Average Precision (3D AP) relies on the intersection over union between predicted bounding boxes and ground-truth bounding boxes. However, camera-based depth estimation has limited accuracy, which may cause otherwise reasonable predictions that suffer from such longitudinal localization errors to be treated as false positives and false negatives. We therefore propose variants of the popular 3D AP metric that are designed to be more permissive with respect to depth estimation errors. Specifically, our novel longitudinal error tolerant metrics, LET-3D-AP and LET-3D-APL, allow a longitudinal localization error of the predicted bounding box up to a given tolerance. The proposed metrics have been used in the Waymo Open Dataset 3D Camera-Only Detection Challenge. We believe that they will facilitate advances in the field of camera-only 3D detection by providing a more informative performance signal.
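A simplified reading of the longitudinal tolerance idea is sketched below: the center error of a prediction is decomposed along and across the sensor-to-object line of sight, and only the longitudinal part is forgiven up to a range-proportional tolerance. The 10% tolerance and 0.5 m lateral limit are illustrative values, not the challenge's official definition.

```python
# A simplified sketch of longitudinal-error-tolerant matching.
import numpy as np

def let_match(pred_center, gt_center, longitudinal_tolerance=0.1,
              max_lateral_error=0.5):
    """Centers are 3D positions in the sensor frame (sensor at the origin)."""
    los = gt_center / np.linalg.norm(gt_center)   # unit line-of-sight direction
    error = pred_center - gt_center

    longitudinal = np.dot(error, los)              # error along the line of sight
    lateral = np.linalg.norm(error - longitudinal * los)

    # Longitudinal error is tolerated up to a fraction of the object's range.
    allowed = longitudinal_tolerance * np.linalg.norm(gt_center)
    return abs(longitudinal) <= allowed and lateral <= max_lateral_error
```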
In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefiting from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability. (Majority of the work was done as an intern at Nuro, Inc.)
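The frustum-cropping step that this approach relies on (using a 2D detection to carve out the corresponding 3D region) can be sketched as follows; the calibration conventions and parameter names are assumptions, and the snippet is not the authors' implementation.

```python
# A minimal sketch of frustum cropping: project points into the image and keep
# those that fall inside a 2D detection box.
import numpy as np

def frustum_points(points_cam, intrinsics, box2d):
    """points_cam: (N, 3) points already in the camera frame;
    intrinsics: 3x3 camera matrix; box2d: (u_min, v_min, u_max, v_max)."""
    in_front = points_cam[:, 2] > 0            # keep points in front of the camera
    pts = points_cam[in_front]

    uvw = (intrinsics @ pts.T).T               # project to the image plane
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]

    u_min, v_min, u_max, v_max = box2d
    inside = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return pts[inside]                         # frustum crop for one 2D detection
```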
We propose DeepFusion, a modular multi-modal architecture that fuses LiDAR, camera, and radar in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. The extracted features are transformed into a bird's-eye view as a common representation for fusion. Spatial and semantic alignment is performed before fusing the modalities in feature space. Finally, a detection head exploits the rich multi-modal features to improve 3D detection performance. Experimental results for LiDAR-camera, LiDAR-camera-radar, and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of LiDAR-camera fusion. Furthermore, we investigate the point density required of LiDAR for 3D object detection and illustrate the implications with an example of robustness against adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.
LiDAR sensors used in autonomous driving applications are negatively affected by adverse weather conditions. One common, but understudied, effect is the condensation of vehicle gas exhaust in cold weather. This everyday phenomenon can severely degrade the quality of LiDAR measurements and lead to a less accurate perception of the environment by creating artifacts such as ghost object detections. In the literature, learning-based approaches have been used to achieve semantic segmentation of adverse weather effects such as rain and fog. However, such approaches require large amounts of labeled data, which can be extremely costly and laborious to obtain. We address this problem by proposing a two-step approach for the detection of condensed vehicle gas exhaust. First, for each vehicle in a scene, we identify its emission zone and detect gas exhaust if present. Then, isolated clouds are detected by modeling the spatial regions where gas exhaust is likely to be present. We test our method on real urban data, showing that our approach can reliably detect gas exhaust in different scenarios, making it appealing for offline pre-labeling and for online applications such as ghost object detection.
Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single model methods by a large margin and ranks first among all Lidar-only submissions. The code and pretrained models are available at https://github.com/tianweiy/CenterPoint.
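The greedy closest-point matching that the abstract says tracking reduces to can be illustrated with the short sketch below; the distance threshold is an assumed value and the snippet is not the released CenterPoint tracker.

```python
# A compact sketch of greedy closest-point association between tracks and
# detections: visit pairs in order of distance and match each at most once.
import numpy as np

def greedy_match(track_centers, det_centers, max_dist=2.0):
    """Both inputs are (N, 2) / (M, 2) arrays of BEV centers; returns index pairs."""
    if len(track_centers) == 0 or len(det_centers) == 0:
        return []
    dist = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=2)

    matches, used_tracks, used_dets = [], set(), set()
    for flat in np.argsort(dist, axis=None):   # visit pairs closest-first
        t, d = np.unravel_index(flat, dist.shape)
        if t in used_tracks or d in used_dets or dist[t, d] > max_dist:
            continue
        matches.append((int(t), int(d)))
        used_tracks.add(t)
        used_dets.add(d)
    return matches
```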
3D object detection on LiDAR data for autonomous driving has been making remarkable progress in recent years. Among state-of-the-art methods, encoding the point cloud into a bird's-eye view (BEV) has been demonstrated to be both effective and efficient. Unlike the perspective view, the BEV preserves rich spatial and distance information between objects; and while more distant objects of the same type do not appear smaller in the BEV, they contain sparser point cloud features. This fact weakens BEV feature extraction with a shared convolutional neural network. To address this challenge, we propose a Range-Aware Attention Network (RAANet), which extracts stronger BEV features and produces superior 3D object detections. The range-aware attention (RAA) convolution significantly improves feature extraction at close range. Moreover, we propose a novel auxiliary loss for density estimation that further enhances the detection accuracy of RAANet for occluded objects. Notably, our proposed RAA convolution is lightweight and compatible for integration into any CNN architecture used for BEV detection. Extensive experiments on the nuScenes dataset demonstrate that our proposed method outperforms state-of-the-art LiDAR-based 3D object detection methods, with a real-time inference speed of 16 Hz for the full version and 22 Hz for the lite version. The code is publicly available at the anonymous GitHub repository https://github.com/Anonymous0522/ange.
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies, a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that it is not the quality of the data but its representation that accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance, raising the detection accuracy of objects within the 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches. Our code is publicly available at https://github.com/mileyan/pseudo_lidar.
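The core conversion the abstract argues for, turning an estimated depth map into a pseudo-LiDAR point cloud, can be sketched as a standard pinhole back-projection; this is an illustrative snippet, not the authors' released code.

```python
# A small sketch of pseudo-LiDAR conversion: back-project every depth-map pixel
# through the camera matrix into a 3D point.
import numpy as np

def depth_map_to_pseudo_lidar(depth, intrinsics):
    """depth: (H, W) metric depth map; intrinsics: 3x3 camera matrix."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates

    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]

    z = depth
    x = (u - cx) * z / fx                            # back-project to camera frame
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop invalid (zero-depth) pixels
```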
Real-time and high-performance 3D object detection is of critical importance for autonomous driving. Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions, both of which are computationally inefficient for onboard deployment. In contrast, pillar-based methods use only 2D convolutions and thus consume fewer computational resources, but their detection accuracy lags far behind that of their voxel-based counterparts. In this paper, by examining the primary performance gap between pillar-based and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. The proposed PillarNet consists of a powerful encoder network for effective pillar feature learning, a neck network for spatial-semantic feature fusion, and a commonly used detection head. Using only 2D convolutions, PillarNet is flexible with respect to the optional pillar size and compatible with classical 2D CNN backbones such as VGGNet and ResNet. Additionally, PillarNet benefits from our designed orientation-decoupled IoU regression loss along with an IoU-aware prediction branch. Extensive experimental results on the large-scale nuScenes and Waymo Open datasets demonstrate that the proposed PillarNet performs well over the state-of-the-art 3D detectors in terms of both effectiveness and efficiency. The source code is available at https://github.com/agent-sgs/pillarnet.git.
Self-driving cars must detect other vehicles and pedestrians in 3D to plan safe routes and avoid collisions. State-of-the-art 3D object detectors based on deep learning have shown promising accuracy but are prone to overfitting to domain idiosyncrasies, making them fail in new environments, a serious problem if autonomous vehicles are meant to operate autonomously. In this paper, we propose a novel learning approach that drastically reduces this gap by fine-tuning the detector on pseudo-labels in the target domain, which our method generates while the vehicle is parked, based on replays of previously recorded driving sequences. In these replays, objects are tracked over time, and detections are interpolated and extrapolated, crucially leveraging future information to catch hard cases. We show on five autonomous driving datasets that fine-tuning the object detector on these pseudo-labels substantially reduces the domain gap to new driving environments, yielding drastic improvements in accuracy and detection reliability.
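As a toy illustration of the interpolation step mentioned above (filling in frames where a tracked object was missed, using the surrounding confident detections), the sketch below linearly interpolates track centers over time; the function name and inputs are hypothetical.

```python
# A toy sketch of filling in missed frames of a track by linear interpolation
# of its center positions between confident detections.
import numpy as np

def interpolate_track(timestamps, centers, query_times):
    """timestamps: (N,) sorted times of confident detections; centers: (N, 3);
    query_times: (K,) times of frames where the object was missed."""
    timestamps = np.asarray(timestamps, dtype=float)
    centers = np.asarray(centers, dtype=float)
    filled = np.stack([np.interp(query_times, timestamps, centers[:, k])
                       for k in range(3)], axis=1)
    return filled  # (K, 3) interpolated centers usable as pseudo-label positions
```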
3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-A² net). The whole framework consists of the part-aware stage and the part-aggregation stage. Firstly, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-A² net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data. Code is available at https://github.com/sshaoshuai/PointCloudDet3D.
Figure 1: Results obtained from our single-image, monocular 3D object detection network MonoDIS on a KITTI3D test image, with the corresponding bird's-eye view, showing its ability to estimate the size and orientation of objects at different scales.
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
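A simplified sketch of the "fusion before association" idea described above: 3D detections from each camera are first mapped into a common vehicle frame and concatenated into one detection set, so that the tracker associates against a single panoramic view instead of per-camera lists. The transform handling is the usual rigid-body convention; all names are illustrative, not the CC-3DT code.

```python
# A simplified sketch of fusing per-camera 3D detections into one set expressed
# in a common vehicle frame before data association.
import numpy as np

def fuse_detections(per_camera_dets, camera_to_vehicle):
    """per_camera_dets: list of (N_i, 3) arrays of detection centers per camera;
    camera_to_vehicle: list of 4x4 homogeneous transforms, one per camera."""
    fused = []
    for dets, transform in zip(per_camera_dets, camera_to_vehicle):
        if len(dets) == 0:
            continue
        homo = np.concatenate([dets, np.ones((len(dets), 1))], axis=1)  # to homogeneous
        fused.append((transform @ homo.T).T[:, :3])  # map into the vehicle frame
    return np.concatenate(fused, axis=0) if fused else np.empty((0, 3))
```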