We present a novel framework for instance segmentation of 3D buildings in multi-view stereo (MVS) urban scenes. Unlike existing works that focus on semantic segmentation of urban scenes, this work targets the detection and segmentation of 3D building instances, even when they are attached to and embedded in a large, imprecise 3D surface model. Multi-view RGB images are first augmented to RGBH images by adding a height map and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. The roof instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlap, which eliminates segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projection and extended to entire building instances through a Markov random field (MRF) optimization. Quantitative evaluations and ablation studies show the effectiveness of all major steps of the method. A dataset for evaluating instance segmentation of 3D building models is also provided. To the best of our knowledge, it is the first dataset of 3D urban buildings at the instance segmentation level.
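For illustration, a minimal sketch of what the RGB-to-RGBH augmentation step could look like, assuming NumPy arrays; the function name, channel layout, and normalization are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def to_rgbh(rgb: np.ndarray, height: np.ndarray) -> np.ndarray:
    """Stack a per-pixel height map onto an RGB image as a fourth channel.

    rgb:    (H, W, 3) uint8 image from one of the multi-view cameras
    height: (H, W) float heights (e.g., rendered from the MVS surface model)
    """
    h = height - height.min()                    # shift so heights start at zero
    h = 255.0 * h / max(float(h.max()), 1e-6)    # scale into the 8-bit range
    return np.dstack([rgb, h.astype(np.uint8)])  # (H, W, 4) RGBH image
```

The resulting 4-channel images can then be fed to a 2D instance segmentation network whose first convolution accepts four input channels.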
Image segmentation for video analysis plays an essential role in different research fields such as smart cities, healthcare, computer vision, and geoscience and remote sensing applications. In this regard, considerable effort has recently been devoted to developing novel segmentation strategies; one of the latest outstanding achievements is panoptic segmentation, which arose from the fusion of semantic and instance segmentation. Specifically, panoptic segmentation is currently being studied to help obtain a more nuanced understanding of image scenes for video surveillance, crowd counting, autonomous driving, and medical image analysis, and a deeper understanding of scenes in general. To that end, this paper presents, to the best of the authors' knowledge, the first comprehensive review of existing panoptic segmentation methods. Accordingly, a well-defined taxonomy of existing panoptic techniques is established based on the nature of the adopted algorithms, the application scenarios, and the primary objectives. Moreover, the use of panoptic segmentation for annotating new datasets via pseudo-labeling is discussed. Moving on, ablation studies are conducted to understand the panoptic methods from different perspectives. In addition, evaluation metrics suitable for panoptic segmentation are discussed, and a comparison of the performance of existing solutions is provided to inform on the state of the art and to identify their limitations and strengths. Finally, the challenges currently facing the technology and the future trends expected to attract considerable interest in the near future are elaborated, which can serve as a starting point for upcoming research studies. The papers providing code are available at: https://github.com/elharroussomar/awesome-panoptic-egation
Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. In this direction, we explore the tasks of 3D human semantic-, instance- and multi-human body-part segmentation. Few works have attempted to directly segment humans in point clouds (or depth maps), which is largely due to the lack of training data on humans interacting with 3D scenes. We address this challenge and propose a framework for synthesizing virtual humans in realistic 3D scenes. Synthetic point cloud data is attractive since the domain gap between real and synthetic depth is small compared to images. Our analysis of different training schemes using a combination of synthetic and realistic data shows that synthetic data for pre-training improves performance in a wide variety of segmentation tasks and models. We further propose the first end-to-end model for 3D multi-human body-part segmentation, called Human3D, that performs all the above segmentation tasks in a unified manner. Remarkably, Human3D even outperforms previous task-specific state-of-the-art methods. Finally, we manually annotate humans in test scenes from EgoBody to compare the proposed training schemes and segmentation models.
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced in processing point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
Deep learning based methods have significantly boosted the study of automatic building extraction from remote sensing images. However, delineating vectorized and regular building contours like a human does remains very challenging, due to the difficulty of the methodology, the diversity of building structures, and the imperfect imaging conditions. In this paper, we propose the first end-to-end learnable building contour extraction framework, named BuildMapper, which can directly and efficiently delineate building polygons just as a human does. BuildMapper consists of two main components: 1) a contour initialization module that generates initial building contours; and 2) a contour evolution module that performs both contour vertex deformation and reduction, which removes the need for complex empirical post-processing used in existing methods. In both components, we provide new ideas, including a learnable contour initialization method to replace the empirical methods, dynamic predicted and ground truth vertex pairing for the static vertex correspondence problem, and a lightweight encoder for vertex information extraction and aggregation, which benefit a general contour-based method; and a well-designed vertex classification head for building corner vertices detection, which casts light on direct structured building contour extraction. We also built a suitable large-scale building dataset, the WHU-Mix (vector) building dataset, to benefit the study of contour-based building extraction methods. The extensive experiments conducted on the WHU-Mix (vector) dataset, the WHU dataset, and the CrowdAI dataset verified that BuildMapper can achieve a state-of-the-art performance, with a higher mask average precision (AP) and boundary AP than both segmentation-based and contour-based methods.
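To make the contour evolution idea concrete, here is a minimal sketch of iterative vertex deformation: an offset head predicts a 2D displacement per contour vertex, and the contour is updated for a few steps. The module names, the omission of per-step feature re-sampling, and the omission of vertex reduction are simplifying assumptions, not BuildMapper's actual design:

```python
import torch
import torch.nn as nn

class VertexOffsetHead(nn.Module):
    """Predicts a 2D offset for each contour vertex from its feature vector."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, vertex_feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(vertex_feats)          # (N, 2) per-vertex offsets

def evolve_contour(vertices, vertex_feats, head, steps=3):
    """Deform an initial contour by repeatedly adding predicted offsets.
    (A real model would re-sample vertex features at every step.)"""
    for _ in range(steps):
        vertices = vertices + head(vertex_feats)
    return vertices                            # (N, 2) refined vertex positions
```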
This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well aligned and collected from 25K geo-locations in New York City. To alleviate the substantial pixel-wise annotation effort, we propose an efficient street-view image annotation pipeline that leverages the existing label maps of the satellite view and the transformation relations between different views (satellite, panorama, and mono-view). With the new OmniCity dataset, we provide benchmarks for a variety of tasks, including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation. We also analyze the impact of the views on each task, the performance of different models, the limitations of existing methods, etc. Compared with existing multi-level and multi-view benchmarks, our OmniCity contains a larger number of images with richer annotation types and more views, provides more baseline results obtained with state-of-the-art models, and introduces a novel task of fine-grained building instance segmentation on street-level panorama images. Moreover, OmniCity provides new problem settings for existing tasks, such as cross-view matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation. The OmniCity dataset as well as the benchmarks will be available at https://city-super.github.io/omnicity.
This paper studies the problem of polygonal mapping of buildings by tackling the issue of mask reversibility, which leads to a notable performance gap between the masks and the polygons predicted by learning-based methods. We address this issue by exploiting hierarchical supervision (bottom-level vertices, mid-level line segments, and high-level regional masks) and propose a novel scheme for learning reversible building masks for polygonal building mapping. As a result, we show that the learned reversible building masks take all the merits of deep convolutional neural networks for high-performing polygonal mapping of buildings. In the experiments, we evaluate our method on two public benchmarks, AICrowd and Inria. On the AICrowd dataset, our proposed method obtains consistent improvements on the AP, boundary AP, and PoLiS metrics. On the Inria dataset, our proposed method also obtains competitive results on the IoU and accuracy metrics. The models and source code are available at https://github.com/sarahwxu.
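The abstract only names the three levels of supervision; as a rough illustration, a hierarchical loss could combine per-level terms as below. This is a sketch under assumed loss forms and dictionary keys, not the paper's actual heads:

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(pred: dict, gt: dict, w=(1.0, 1.0, 1.0)) -> torch.Tensor:
    """Combine vertex-, line-segment-, and mask-level supervision."""
    l_vertex = F.binary_cross_entropy_with_logits(pred["vertices"], gt["vertices"])
    l_line = F.l1_loss(pred["lines"], gt["lines"])       # line-segment maps
    l_mask = F.binary_cross_entropy_with_logits(pred["mask"], gt["mask"])
    return w[0] * l_vertex + w[1] * l_line + w[2] * l_mask
```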
We present UrbanScene3D, a large-scale data platform for research on urban scene perception and reconstruction. UrbanScene3D contains over 128K high-resolution images covering 16 scenes, including large-scale real urban regions and synthetic cities with a total area of 136 km^2. The dataset also contains high-precision LiDAR scans and hundreds of image sets with different observation patterns, which provide a comprehensive benchmark for designing and evaluating aerial path planning and 3D reconstruction algorithms. In addition, the dataset, built on the Unreal Engine and the AirSim simulator together with a manually annotated unique instance label for each building, enables the generation of all kinds of data, e.g., 2D/3D bounding boxes and 3D point cloud/mesh segmentations. The simulator, with its physics engine and lighting system, not only produces a variety of data but also enables users to simulate cars or drones in the proposed urban environment for future research.
We introduce Similarity Group Proposal Network (SGPN), a simple and intuitive deep learning framework for 3D object instance segmentation on point clouds. SGPN uses a single network to predict point grouping proposals and a corresponding semantic class for each proposal, from which we can directly extract instance segmentation results. Important to the effectiveness of SGPN is its novel representation of 3D instance segmentation results in the form of a similarity matrix that indicates the similarity between each pair of points in embedded feature space, thus producing an accurate grouping proposal for each point. Experimental results on various 3D scenes show the effectiveness of our method on 3D instance segmentation, and we also evaluate the capability of SGPN to improve 3D object detection and semantic segmentation results. We also demonstrate its flexibility by seamlessly incorporating 2D CNN features into the framework to boost performance.
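A minimal sketch of the similarity-matrix idea, assuming per-point embeddings are already computed; the distance metric and the threshold are illustrative choices rather than SGPN's exact formulation:

```python
import torch

def group_proposals(embeddings: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Propose one point group per point by thresholding pairwise distances.

    embeddings: (N, D) per-point features in the learned embedding space.
    Returns a boolean (N, N) matrix; row i is the group seeded at point i.
    """
    sim = torch.cdist(embeddings, embeddings)  # (N, N) pairwise distances
    return sim < threshold                     # nearby points grouped together
```

A post-processing step would then merge highly overlapping rows and assign each point to a single instance.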
Vehicle classification is a hot computer vision topic, with studies ranging from ground view to top view. In remote sensing, the use of top-view images allows for understanding city patterns, vehicle concentration, traffic management, and so on. However, there are some difficulties when aiming at pixel-wise classification: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for that task; (b) creating instance segmentation datasets is laborious; and (c) traditional instance segmentation methods underperform on this task since the objects are small. Thus, the objectives of this study are: (1) to propose a novel semi-supervised iterative learning approach using GIS software, (2) to propose a box-free instance segmentation approach, and (3) to provide a city-scale vehicle dataset. The iterative learning procedure considered: (1) label a small number of vehicles, (2) train on those samples, (3) use the model to classify the entire image, (4) convert the image predictions into a polygon shapefile, (5) correct some areas with errors and include them in the training data, and (6) repeat until the results are satisfactory. To separate instances, we considered the vehicle interiors and the vehicle borders, and the DL model was a U-Net with an EfficientNet-B7 backbone. When the borders are removed, the vehicle interiors become isolated, allowing unique object identification. To recover the removed 1-pixel borders, we propose a simple method of expanding each prediction. The results show better pixel-wise metrics compared with Mask R-CNN (82% against 67% in IoU). In the per-object analysis, the overall accuracy, precision, and recall were all greater than 90%. This pipeline applies to any remote sensing target and is very effective for segmentation and for generating datasets.
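The interior/border trick can be illustrated with standard image-processing primitives: label the isolated interiors, then grow each label by roughly one pixel to recover the removed border. This is a sketch of the idea using SciPy, not the authors' exact expansion method:

```python
import numpy as np
from scipy import ndimage

def instances_from_interiors(interior_mask: np.ndarray) -> np.ndarray:
    """interior_mask: (H, W) boolean map of predicted vehicle interiors."""
    labels, _ = ndimage.label(interior_mask)        # unique id per interior
    # Assign each background pixel the id of its nearest interior pixel...
    dist, inds = ndimage.distance_transform_edt(labels == 0, return_indices=True)
    grown = labels[inds[0], inds[1]]
    grown[dist > 1.5] = 0                           # ...but only grow ~1 pixel
    return grown                                    # (H, W) instance id map
```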
We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. The over-segmentation step generates an initial set of mesh segments that capture the planar and non-planar regions of urban scenes. In the subsequent classification step, we construct a graph that encodes the geometric and photometric features of the segments in its nodes and the multi-scale contextual features in its edges. The final semantic segmentation is obtained by classifying the segments using a graph convolutional network. Experiments and comparisons on two semantic urban mesh benchmarks demonstrate that our approach outperforms the state-of-the-art methods in terms of boundary quality, mean IoU (intersection over union), and generalization ability. We also introduce several new metrics for evaluating mesh over-segmentation methods dedicated to semantic segmentation, and our proposed over-segmentation approach outperforms state-of-the-art methods on all metrics. Our source code is available at \url{https://github.com/WeixiaoGao/PSSNet}.
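For intuition, a single graph-convolution step over segment nodes can be written as below, with nodes holding segment features and the adjacency matrix encoding segment neighborhoods; this hand-rolled layer is a stand-in for the paper's actual GCN, and the multi-scale edge features described in the abstract are omitted:

```python
import torch
import torch.nn as nn

class SegmentGCNLayer(nn.Module):
    """One message-passing step: average neighbor features, then project."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, d_in) segment features; adj: (N, N) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ x / deg))
```

Stacking a few such layers and ending with a per-node classifier yields a semantic label for each mesh segment.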
With the rise of deep convolutional neural networks, object detection has achieved prominent advances in the past few years. However, this prosperity cannot mask the unsatisfactory situation of small object detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representations caused by the intrinsic structure of small targets. In addition, large-scale datasets for benchmarking small object detection methods remain a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale small object detection datasets (SODA), SODA-D and SODA-A, which focus on driving and aerial scenarios, respectively. SODA-D includes 24704 high-quality traffic images and 277596 instances of 9 categories. For SODA-A, we collect 2510 high-resolution aerial images and annotate 800203 instances over 9 classes. To our knowledge, the proposed datasets are the first-ever attempt at large-scale benchmarks with a vast collection of exhaustively annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks will facilitate the development of SOD and spawn more breakthroughs in this field. The datasets and code will soon be available at: \url{https://shaunyuan22.github.io/soda}.
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion abilities so that they can understand an intact object despite it being partially occluded. The main purpose of this survey is to provide an intuitive understanding of the research hotspots, key technologies, and future trends in the field of image amodal completion. Firstly, we present a comprehensive review of the latest literature in this emerging field, exploring the three key tasks of image amodal completion: amodal shape completion, amodal appearance completion, and order perception. Then we examine popular datasets related to image amodal completion, along with their common data collection methods and evaluation metrics. Finally, we discuss real-world applications and future research directions for image amodal completion, facilitating the reader's understanding of the challenges of existing technologies and upcoming research trends.
Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in a scene. However, such strong supervision suffers from large annotation costs, motivating the study of efficient annotation. In this paper, we observe that the locations of instances matter for both instance and semantic 3D scene segmentation. By fully taking advantage of locations, we design a weakly supervised point cloud segmentation algorithm that only requires clicking on one point per instance to indicate its location for annotation. With over-segmentation as pre-processing, we extend these location annotations into seg-level labels. We further design a segment grouping network (SegGroup) to generate point-level pseudo labels from the seg-level labels by hierarchically grouping the unlabeled segments into nearby labeled segments, so that existing point-level supervised segmentation models can directly consume these pseudo labels for training. Experimental results show that our seg-level supervised method (SegGroup) achieves results comparable to fully annotated point-level supervised methods. Moreover, it outperforms recent weakly supervised methods given a fixed annotation budget.
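The grouping step can be caricatured geometrically: each unlabeled segment inherits the label of its nearest labeled segment. The actual SegGroup network does this hierarchically with learned features; the sketch below shows only the one-shot geometric version, with assumed inputs:

```python
import numpy as np

def propagate_seg_labels(centroids: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """centroids: (S, 3) segment centroids; labels: (S,) instance ids, -1 = unlabeled."""
    labeled = np.where(labels >= 0)[0]
    out = labels.copy()
    for i in np.where(labels < 0)[0]:
        d = np.linalg.norm(centroids[labeled] - centroids[i], axis=1)
        out[i] = labels[labeled[d.argmin()]]   # nearest labeled segment wins
    return out                                 # (S,) seg-level pseudo labels
```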
The concept of geo-localization refers to the process of determining where on Earth some 'entity' is located, typically using Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, satellite imagery, or even objects visible within an image. As large-scale datasets of GPS-tagged media have rapidly become available thanks to smartphones and the internet, and as deep learning has risen to enhance the performance of machine learning models, the fields of visual and object geo-localization have emerged, owing to their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, covering both determining from where an image has been captured (image geo-localization) and geo-localizing objects within an image (object geo-localization). We provide an in-depth study, including a summary of popular algorithms, a description of the proposed datasets, and an analysis of performance results to illustrate the current state of each field.
We propose an approach to instance segmentation of 3D point clouds based on dynamic convolution. This enables the method to adapt, at inference time, to varying feature and object scales, and avoids some pitfalls of bottom-up approaches, including a dependence on hyper-parameter tuning and heuristic post-processing pipelines to compensate for the unavoidable variability in object sizes, even within a single scene. The representation capability of the network is greatly improved by gathering homogeneous points that have the same semantic category and close votes for the geometric centroid. Instances are then decoded by several simple convolution layers whose parameters are generated conditioned on the input. The proposed approach is proposal-free and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance. A lightweight transformer, built on the bottleneck layer, allows the model to capture long-range dependencies with limited computational overhead. The result is a simple, efficient, and robust approach that yields strong performance on a variety of datasets: ScanNetV2, S3DIS, and PartNet. Consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/dyco3d
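A minimal sketch of instance decoding with dynamic convolution: a controller maps an instance descriptor to the weights of a tiny per-instance MLP (equivalent to 1x1 convolutions), which is then applied to the shared point features to produce mask logits. Layer sizes and names are illustrative assumptions, not DyCo3D's exact architecture:

```python
import torch
import torch.nn as nn

class DynamicMaskHead(nn.Module):
    def __init__(self, d: int = 32):
        super().__init__()
        self.d = d
        # Parameters for one hidden layer (d*d + d) plus one output layer (d + 1).
        self.controller = nn.Linear(d, d * d + 2 * d + 1)

    def forward(self, inst_feat: torch.Tensor, point_feats: torch.Tensor):
        # inst_feat: (d,) per-instance descriptor; point_feats: (N, d) shared features
        p, d = self.controller(inst_feat), self.d
        w1, b1 = p[: d * d].view(d, d), p[d * d : d * d + d]
        w2, b2 = p[d * d + d : d * d + 2 * d].view(1, d), p[-1]
        h = torch.relu(point_feats @ w1.t() + b1)
        return (h @ w2.t() + b2).squeeze(-1)   # (N,) mask logits for this instance
```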
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods. The code is available at: https://github.com/LiWentomng/boxlevelset.
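To give a flavor of the level-set evolution, the sketch below minimizes a Chan-Vese-style region energy by gradient descent on the implicit function phi within a box crop. This classical energy is a stand-in for the paper's learned formulation, and the names and hyper-parameters are assumptions:

```python
import torch

def chan_vese_energy(phi: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
    """Region energy: pixels should match the mean of their side of the curve."""
    fg = torch.sigmoid(phi)                                       # soft foreground
    c1 = (img * fg).sum() / fg.sum().clamp(min=1e-6)              # mean inside
    c2 = (img * (1 - fg)).sum() / (1 - fg).sum().clamp(min=1e-6)  # mean outside
    return ((img - c1) ** 2 * fg + (img - c2) ** 2 * (1 - fg)).sum()

def evolve(phi, crop, steps=50, lr=0.1):
    """Iteratively evolve the level set inside a bounding-box crop."""
    phi = phi.clone().requires_grad_(True)
    for _ in range(steps):
        (g,) = torch.autograd.grad(chan_vese_energy(phi, crop), phi)
        phi = (phi - lr * g).detach().requires_grad_(True)
    return (phi > 0).detach()                                     # binary mask
```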
Box-supervised instance segmentation has recently attracted much research attention, yet it has received little attention in the aerial image domain. In contrast to generic object collections, aerial objects exhibit large intra-class variance and inter-class similarity against complex backgrounds, and high-resolution satellite images contain many tiny objects. This makes the recent pairwise-affinity modeling methods inevitably involve noisy supervision and perform poorly. To tackle these problems, we propose a novel aerial instance segmentation approach that drives the network to learn a series of level-set functions for the aerial objects, with only box annotations, in an end-to-end fashion. Instead of learning pairwise affinities, the level-set method with a carefully designed energy function treats object segmentation as curve evolution, which is able to recover the object boundaries accurately and prevent interference from indistinguishable backgrounds and similar objects. Experimental results demonstrate that the proposed approach outperforms state-of-the-art box-supervised instance segmentation methods. The source code is available at https://github.com/liwentomng/boxLevelset.
Current 3D segmentation methods rely heavily on large-scale point-wise annotated datasets, which are notoriously laborious to annotate. Few works have attempted to circumvent the need for per-point annotations. In this work, we study weakly supervised 3D semantic instance segmentation. The key idea is to leverage 3D bounding box labels, which are easier and faster to annotate. Indeed, we show that it is possible to train dense segmentation models using only bounding box labels. At the core of our method, \name{} is a deep model, inspired by classical Hough voting, that directly votes for bounding box parameters, together with a clustering method specifically tailored to bounding box votes. This goes beyond the commonly used center votes, which do not fully exploit the bounding box annotations. On the ScanNet test set, our weakly supervised model attains leading performance among weakly supervised methods (+18 mAP@50). Remarkably, it also reaches 97% of the mAP@50 score of current fully supervised models. To further illustrate the practicality of our work, we train Box2Mask on the recently released ARKitScenes dataset, which is annotated only with 3D bounding boxes, and show, for the first time, compelling 3D instance segmentation masks.
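The box-vote clustering can be pictured as below: every point votes for full box parameters, and the votes are clustered in that six-dimensional space. DBSCAN is only a stand-in for the paper's tailored clustering method, and the box parameterization is an assumption:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_box_votes(votes: np.ndarray, eps: float = 0.3) -> np.ndarray:
    """votes: (N, 6) per-point box votes, e.g. (cx, cy, cz, dx, dy, dz).

    Clustering in the full box-parameter space (rather than centers alone)
    lets votes from the same object agree on extents as well as position.
    Returns (N,) instance ids, with -1 for noise points.
    """
    return DBSCAN(eps=eps, min_samples=10).fit_predict(votes)
```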