Smart Paper Notes

PatchPerPix for Instance Segmentation

Peter Hirsch , Lisa Mais , Dagmar Kainmueller

Category: Computer Vision

2020-01-21

We present a novel method for proposal-free instance segmentation that can handle sophisticated object shapes which span large parts of an image and form dense object clusters with crossovers. Our method is based on predicting dense local shape descriptors, which we assemble to form instances. All instances are assembled simultaneously in one go. To our knowledge, our method is the first non-iterative method that yields instances that are composed of learnt shape patches. We evaluate our method on a diverse range of data domains, where it defines the new state of the art on four benchmarks, namely the ISBI 2012 EM segmentation benchmark, the BBBC010 C. elegans dataset, and 2d as well as 3d fluorescence microscopy data of cell nuclei. We show furthermore that our method also applies to 3d light microscopy data of Drosophila neurons, which exhibit extreme cases of complex shape clusters.

Translated by Google Translate

Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings

Adrian Wolny , Qin Yu , Constantin Pape , Anna Kreshuk

Category: Computer Vision | Machine Learning

2021-03-26

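Most state-of-the-art instance segmentation methods have to be trained on densely annotated images. While difficult in general, this requirement is especially daunting for biomedical images, where domain expertise is often required for annotation and no large public data collections are available for pre-training. We propose to address the dense annotation bottleneck with a proposal-free segmentation approach based on non-spatial embeddings, which exploits the structure of the learned embedding space to extract individual instances in a differentiable way. The segmentation loss can then be applied directly to instances, and the overall pipeline can be trained in a fully or weakly supervised manner, including the challenging case of positive-unlabeled supervision, where a novel self-supervised consistency loss is introduced for the unlabeled parts of the training data. We evaluate the proposed method on 2D and 3D segmentation problems in different microscopy modalities as well as on the Cityscapes and CVPPP instance segmentation benchmarks, achieving state-of-the-art results on the latter. The code is available at: https://github.com/kreshuklab/spoco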
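
The idea of extracting one instance from an embedding space in a differentiable way can be illustrated with a minimal sketch. The abstract does not say which kernel is used; the Gaussian kernel, the `sigma` value, and the names `soft_instance_mask` and `anchor_index` below are assumptions made only for illustration.

```python
import math

def soft_instance_mask(embeddings, anchor_index, sigma=0.5):
    """Soft membership of every pixel in the instance of the anchor pixel.

    Assumption: a Gaussian kernel on the squared embedding distance.
    The result is smooth in the embeddings, so a loss on the mask
    could be back-propagated to them.
    """
    anchor = embeddings[anchor_index]
    return [
        math.exp(-sum((a - b) ** 2 for a, b in zip(e, anchor)) / (2 * sigma ** 2))
        for e in embeddings
    ]

# Hypothetical 2-D embeddings for three pixels: two close together, one far away.
pixel_embeddings = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]
mask = soft_instance_mask(pixel_embeddings, anchor_index=0)
```

Pixels whose embeddings lie near the anchor receive values close to 1, distant ones values close to 0, and no hard clustering step is involved.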

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Category: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

Dynamic Convolution for 3D Point Cloud Instance Segmentation

Tong He , Chunhua Shen , Anton van den Hengel

Category: Computer Vision

2021-07-18

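We propose an approach to instance segmentation of 3D point clouds based on dynamic convolution. This enables it to adapt, at inference time, to varying feature and object scales. Doing so avoids some pitfalls of bottom-up approaches, including a dependence on hyper-parameter tuning and heuristic post-processing pipelines to compensate for the inevitable variability in object sizes, even within a single scene. The representation capability of the network is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, whose parameters are generated conditioned on the input. The proposed approach is proposal-free and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance. A light-weight transformer, built on the bottleneck layer, allows the model to capture long-range dependencies with limited computational overhead. The result is a simple, efficient, and robust approach that yields strong performance on various datasets: ScanNetV2, S3DIS, and PartNet. The consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/dyco3d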

BuyTheDips: PathLoss for improved topology-preserving deep learning-based image segmentation

Minh On Vu Ngoc , Yizi Chen , Nicolas Boutry , Jonathan Fabrizio , Clement Mallet

Category: Computer Vision

2022-07-23

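Capturing the global topology of an image is essential for proposing an accurate segmentation of its domain. However, most existing segmentation methods do not preserve the initial topology of the given input, which is detrimental to numerous downstream object-based tasks. This is all the more true for deep learning models, most of which work at local scales. In this paper, we propose a new topology-preserving deep image segmentation method which relies on a new leakage loss: the Pathloss. Our method is an extension of the BALoss [1], in which we want to improve the leakage detection to better recover the closeness property of the image segmentation. This loss allows us to correctly localize and fix the critical points (leakages in the boundaries) that could occur in the predictions, and is based on a shortest-path search algorithm. This way, loss minimization enforces connectivity only where it is necessary and ultimately provides a good localization of the boundaries of the objects in the image. Moreover, according to our research, our Pathloss learns to preserve stronger elongated structures compared to methods that do not use a topology-preserving loss. Trained with our topological loss function, our method outperforms state-of-the-art topology-aware methods on two representative datasets: Electron Microscopy and Historical Map.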

SOLO: Segmenting Objects by Locations

Xinlong Wang , Tao Kong , Chunhua Shen , Yuning Jiang , Lei Li

Category:

2019-12-10

We present a new, embarrassingly simple approach to instance segmentation. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that have made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the "detect-then-segment" strategy (e.g., Mask R-CNN), or predict embedding vectors first then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location and size, thus nicely converting instance segmentation into a single-shot classification-solvable problem. We demonstrate a much simpler and flexible instance segmentation framework with strong performance, achieving on par accuracy with Mask R-CNN and outperforming recent single-shot instance segmenters in accuracy. We hope that this simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation. Code is available at https://git.io/AdelaiDet
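
The notion of location-based "instance categories" can be sketched as a mapping from an object's center to a cell of a regular grid. This is only an illustration of the location part; the abstract also mentions size, which is not modeled here. The grid size, the use of the object center, and the row-major flattening in `instance_category` are assumptions, not details taken from the abstract.

```python
def instance_category(center_x, center_y, image_w, image_h, grid_size):
    """Map an object center to one of grid_size * grid_size location categories.

    Assumption: the image is divided into a regular grid and the category
    is the row-major index of the cell containing the center.
    """
    # Clamp so that a center lying exactly on the right or bottom border
    # still falls into the last cell.
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row * grid_size + col

# Hypothetical example: a 100 x 100 image with a 5 x 5 grid.
middle = instance_category(50, 50, 100, 100, 5)
corner = instance_category(100, 100, 100, 100, 5)
```

With such a mapping every instance is tied to a fixed output slot, which is what turns grouping into a per-cell classification problem.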

Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes

Julian Chibane , Francis Engelmann , Tuan Anh Tran , Gerard Pons-Moll

Category: Computer Vision

2022-06-02

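Current 3D segmentation methods rely heavily on large-scale point-cloud datasets, which are notoriously laborious to annotate. Few attempts have been made to circumvent the need for dense per-point annotations. In this work, we look at weakly supervised 3D semantic instance segmentation. The key idea is to leverage 3D bounding box labels, which are easier and faster to annotate. Indeed, we show that it is possible to train dense segmentation models using only bounding box labels. At the core of our method, Box2Mask, lies a deep model, inspired by classical Hough voting, that directly votes for bounding box parameters, and a clustering method specifically tailored to bounding box votes. This goes beyond commonly used center votes, which would not fully exploit the bounding box annotations. On the ScanNet test set, our weakly supervised model attains leading performance among weakly supervised approaches (+18 mAP@50). Remarkably, it also achieves 97% of the mAP@50 score of current fully supervised models. To further illustrate the practicality of our work, we train Box2Mask on the recently released ARKitScenes dataset, which is annotated with 3D bounding boxes only, and show, for the first time, compelling 3D instance segmentation masks.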

Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

Image Segmentation Using Deep Learning: A Survey

Shervin Minaee , Yuri Boykov , Fatih Porikli , Antonio Plaza , Nasser Kehtarnavaz , Demetri Terzopoulos

Category:

2020-01-15

Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts , Mohamed Omran , Sebastian Ramos , Timo Rehfeld , Markus Enzweiler , Rodrigo Benenson , Uwe Franke , Stefan Roth , Bernt Schiele

Category:

2016-04-06

TU Dresden. Dataset website: www.cityscapes-dataset.net. Splits: train/val with fine annotation, 3475 images; train with coarse annotation, 20 000 images; test with fine annotation, 1525 images.

Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks

Vivien Sainte Fare Garnot , Loic Landrieu

Category: Computer Vision

2021-07-16

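Unprecedented access to multi-temporal satellite imagery has opened new perspectives for a variety of Earth observation tasks. Among them, pixel-precise panoptic segmentation of agricultural parcels has major economic and environmental implications. While researchers have explored this problem for single images, we argue that the complex temporal patterns of crop phenology are better addressed with temporal sequences of images. In this paper, we present the first end-to-end, single-stage method for panoptic segmentation of Satellite Image Time Series (SITS). This module can be combined with our novel image sequence encoding network, which relies on temporal self-attention to extract rich and adaptive multi-scale spatio-temporal features. We also introduce PASTIS, the first open-access SITS dataset with panoptic annotations. We demonstrate the superiority of our encoder for semantic segmentation against multiple competing architectures, and set up the first state of the art of panoptic segmentation of SITS. Our implementation and PASTIS are publicly available.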

Common Limitations of Image Processing Metrics: A Picture Story

Annika Reinke , Minu D. Tizabi , Carole H. Sudre , Matthias Eisenmann , Tim Rädsch , Michael Baumgartner , Laura Acion , Michela Antonelli , Tal Arbel , Spyridon Bakas

Category: Computer Vision

2021-04-12

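While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the automatic algorithms used, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. The purpose of this dynamic document is to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.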
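
The sensitivity of an overlap metric to small target structures can be made concrete with a small worked example. The sketch below uses the Dice similarity coefficient as one such metric; the square shapes, the one-pixel shift, and the helper names `dice` and `square` are assumptions chosen only for illustration and are not taken from the document.

```python
def dice(pred_pixels, target_pixels):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    pred, target = set(pred_pixels), set(target_pixels)
    if not pred and not target:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(pred & target) / (len(pred) + len(target))

def square(top, left, side):
    """Pixel coordinates of an axis-aligned square mask."""
    return {(r, c) for r in range(top, top + side) for c in range(left, left + side)}

# The same one-pixel horizontal shift, applied to a small and a large structure.
small_dsc = dice(square(10, 10, 2), square(10, 11, 2))
large_dsc = dice(square(10, 10, 20), square(10, 11, 20))
```

Here `small_dsc` is 0.5 while `large_dsc` is 0.95, although the localization error is identical in both cases, so the metric penalizes the small structure far more heavily.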

SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations

Tao Wang , Jun Hao Liew , Yu Li , Yunpeng Chen , Jiashi Feng

Category: Computer Vision

2022-02-15

The recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per-grid-cell object masks with fully convolutional networks, yielding performance comparable to the traditional two-stage Mask R-CNN while enjoying a much simpler architecture and higher efficiency. We observe that SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other, as some may better segment certain object parts; most of them, however, are directly discarded by non-maximum suppression. Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information while maintaining the architectural efficiency. The resulting model is named SODAR. Unlike the original per-grid-cell object masks, SODAR is implicitly supervised to learn mask representations that encode the geometric structure of nearby objects and complement adjacent representations with context. The aggregation method further includes two novel designs: 1) a mask interpolation mechanism that enables the model to generate far fewer mask representations by sharing neighboring representations among nearby grid cells, and thus saves computation and memory; 2) a deformable neighbour sampling mechanism that allows the model to adaptively adjust neighbor sampling locations, thus gathering mask representations with more relevant context and achieving higher performance. SODAR significantly improves the instance segmentation performance, e.g., it outperforms a SOLO model with ResNet-101 backbone by 2.2 AP on the COCO test set, with only about 3% additional computation. We further show consistent performance gain with the SOLOv2 model.

NucMM Dataset: 3D Neuronal Nuclei Instance Segmentation at Sub-Cubic Millimeter Scale

Zudi Lin , Donglai Wei , Mariela D. Petkova , Yuelong Wu , Zergham Ahmed , Krishna Swaroop K , Silin Zou , Nils Wendt , Jonathan Boulanger-Weill , Xueying Wang

Category: Computer Vision

2021-07-13

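Segmenting 3D cell nuclei from microscopy image volumes is critical for biological and clinical analysis, enabling the study of cellular expression patterns and cell lineages. However, current datasets for neuronal nuclei usually contain volumes smaller than $10^{-3}\ \mathrm{mm}^3$ with fewer than 500 instances per volume, which cannot reveal the complexity of large brain regions and restricts the investigation of neuronal structures. In this paper, we push the task forward to the sub-cubic millimeter scale and curate the NucMM dataset with two fully annotated volumes: one $0.1\ \mathrm{mm}^3$ electron microscopy (EM) volume containing nearly the entire zebrafish brain with around 170,000 nuclei; and one $0.25\ \mathrm{mm}^3$ micro-CT (uCT) volume containing part of a mouse visual cortex with about 7,000 nuclei. With two imaging modalities and significantly increased volume size and instance numbers, we discover a great diversity of neuronal nuclei in appearance and density, introducing new challenges to the field. We also perform a statistical analysis to illustrate those challenges quantitatively. To tackle the challenges, we propose a novel hybrid-representation learning model that combines the foreground mask, contour map, and signed distance transform to produce high-quality 3D masks. The benchmark comparisons on the NucMM dataset show that our proposed method significantly outperforms state-of-the-art nuclei segmentation approaches. Code and data are available at https://connectomics-bazaar.github.io/proj/nucmm/index.html.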

Learning to Detect Every Thing in an Open World

Kuniaki Saito , Ping Hu , Trevor Darrell , Kate Saenko

Category: Computer Vision

2021-12-03

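Many open-world applications require the detection of novel objects, yet state-of-the-art object detection and instance segmentation networks do not excel at this task. The key issue lies in their assumption that regions without any annotations should be suppressed as negatives, which teaches the model to treat unannotated objects as background. To address this issue, we propose a simple yet surprisingly powerful data augmentation and training scheme that we call Learning to Detect Every Thing (LDET). To avoid suppressing hidden objects, i.e., background objects that are visible but unlabeled, we paste annotated objects onto a background image sampled from a small region of the original image. Since training solely on such synthetically augmented images suffers from domain shift, we decouple the training into two parts: 1) training the region classification and regression head on augmented images, and 2) training the mask heads on original images. In this way, the model does not learn to classify hidden objects as background while still generalizing well to real images. LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines on cross-category generalization on COCO, as well as on cross-dataset evaluation on UVO and Cityscapes.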

A Survey of Self-Supervised and Few-Shot Object Detection

Gabriel Huang , Issam Laradji , David Vazquez , Simon Lacoste-Julien , Pau Rodriguez

Category: Computer Vision | Artificial Intelligence | Machine Learning

2021-10-27

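Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches to few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page at https://gabrielhuang.github.io/fsod-survey/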

SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation

Weiyue Wang , Ronald Yu , Qiangui Huang , Ulrich Neumann

Category:

2017-11-23

We introduce Similarity Group Proposal Network (SGPN), a simple and intuitive deep learning framework for 3D object instance segmentation on point clouds. SGPN uses a single network to predict point grouping proposals and a corresponding semantic class for each proposal, from which we can directly extract instance segmentation results. Important to the effectiveness of SGPN is its novel representation of 3D instance segmentation results in the form of a similarity matrix that indicates the similarity between each pair of points in embedded feature space, thus producing an accurate grouping proposal for each point. Experimental results on various 3D scenes show the effectiveness of our method on 3D instance segmentation, and we also evaluate the capability of SGPN to improve 3D object detection and semantic segmentation results. We also demonstrate its flexibility by seamlessly incorporating 2D CNN features into the framework to boost performance.
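
A similarity matrix over point embeddings, and the per-point grouping proposals read off from its rows, can be sketched as follows. The abstract does not specify the distance measure or how rows are turned into groups; the squared Euclidean distance, the fixed `threshold`, and the function names below are assumptions for illustration only.

```python
def similarity_matrix(embeddings):
    """Pairwise squared Euclidean distances; a smaller value means more similar."""
    return [
        [sum((a - b) ** 2 for a, b in zip(p, q)) for q in embeddings]
        for p in embeddings
    ]

def group_proposals(matrix, threshold):
    """Row i proposes the set of points whose distance to point i is below threshold."""
    return [
        {j for j, d in enumerate(row) if d < threshold}
        for row in matrix
    ]

# Hypothetical 2-D embeddings for four points forming two well-separated groups.
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
matrix = similarity_matrix(embeddings)
proposals = group_proposals(matrix, threshold=1.0)
```

Each of the N rows yields one proposal, so points of the same object produce near-duplicate groups that a later step would have to merge.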

Disentangling Monocular 3D Object Detection

Andrea Simonelli , Samuel Rota Bulò , Lorenzo Porzi , Manuel López-Antequera , Peter Kontschieder

Category:

2019-05-29

Figure 1: Results obtained from our single image, monocular 3D object detection network MonoDIS on a KITTI3D test image with corresponding birds-eye view, showing its ability to estimate size and orientation of objects at different scales.

Robust deep learning-based semantic organ segmentation in hyperspectral images

Silvia Seidlitz , Jan Sellner , Jan Odenthal , Berkin Özdemir , Alexander Studier-Fischer , Samuel Knödler , Leonardo Ayala , Tim Adler , Hannes G. Kenngott , Minu Tizabi

Category: Computer Vision | Machine Learning

2021-11-09

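Semantic image segmentation is an important prerequisite for context-awareness and autonomous robotics in surgery. The state of the art has focused on conventional RGB video data acquired during minimally invasive surgery, but full-scene semantic segmentation based on spectral imaging data obtained during open surgery has received almost no attention to date. To address this gap in the literature, we investigate the following research questions based on hyperspectral imaging (HSI) data of pigs acquired in an open surgery setting: (1) What is an adequate representation of HSI data for neural network-based fully automated organ segmentation, especially with respect to the spatial granularity of the data (pixels vs. superpixels vs. patches vs. full images)? (2) Is there a benefit of using HSI data compared to other modalities, namely RGB data and processed HSI data (e.g. tissue parameters like oxygenation), when performing semantic organ segmentation? According to a comprehensive validation study based on 506 HSI images from 20 pigs, annotated with a total of 19 classes, deep learning-based segmentation performance increases, consistently across modalities, with the spatial context of the input data. Unprocessed HSI data offers an advantage over RGB data or processed data from the camera provider, with the advantage increasing with decreasing size of the input to the neural network. Maximum performance (HSI applied to whole images) yielded a mean Dice similarity coefficient (DSC) of 0.89 (standard deviation (SD) 0.04), which is in the range of the inter-rater variability (DSC of 0.89 (SD 0.07)). We conclude that HSI could become a powerful image modality for fully automatic surgical scene understanding, with many advantages over traditional imaging, including the ability to recover additional functional tissue information.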

NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds

Weiwei Sun , Daniel Rebain , Renjie Liao , Vladimir Tankovich , Soroosh Yazdani , Kwang Moo Yi , Andrea Tagliasacchi

Category: Computer Vision

2022-07-20

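We introduce a method for instance proposal generation for 3D point clouds. Existing techniques typically regress proposals directly in a single feed-forward step, leading to inaccurate estimation. We show that this is a critical bottleneck and propose a method based on iterative bilateral filtering. Following the spirit of bilateral filtering, we consider both the deep feature embedding of each point and its location in 3D space. We show via synthetic experiments that our method brings drastic improvements when generating instance proposals for a given point of interest. We further validate our method on the challenging ScanNet benchmark, achieving the best instance segmentation performance within the sub-category of top-down methods.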
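
Iterative bilateral filtering over a point cloud, where affinity depends on both 3D location and feature embedding, can be sketched as below. This is a classical fixed Gaussian bilateral kernel and not the method of the paper; the bandwidths `sigma_pos` and `sigma_feat`, the number of `steps`, the `keep` threshold, and all function names are assumptions for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_mean(vectors, weights):
    """Component-wise weighted average of a list of vectors."""
    total = sum(weights)
    return tuple(
        sum(w * v[k] for v, w in zip(vectors, weights)) / total
        for k in range(len(vectors[0]))
    )

def bilateral_weights(q_pos, q_feat, positions, features, sigma_pos, sigma_feat):
    """Product of a spatial Gaussian and a feature-space Gaussian for every point."""
    return [
        math.exp(-sq_dist(p, q_pos) / (2 * sigma_pos ** 2))
        * math.exp(-sq_dist(f, q_feat) / (2 * sigma_feat ** 2))
        for p, f in zip(positions, features)
    ]

def propose_instance(query, positions, features,
                     sigma_pos=1.0, sigma_feat=0.5, steps=3, keep=0.5):
    """Refine the query iteratively, then keep the points with high affinity."""
    q_pos, q_feat = positions[query], features[query]
    for _ in range(steps):
        w = bilateral_weights(q_pos, q_feat, positions, features, sigma_pos, sigma_feat)
        q_pos, q_feat = weighted_mean(positions, w), weighted_mean(features, w)
    w = bilateral_weights(q_pos, q_feat, positions, features, sigma_pos, sigma_feat)
    return {i for i, wi in enumerate(w) if wi >= keep}

# Hypothetical scene: points 0-2 form one object, point 3 is spatially close but
# has a different embedding, point 4 has the same embedding but is far away.
positions = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (10.0, 0.0, 0.0)]
features = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
proposal = propose_instance(0, positions, features)
```

Neither cue alone would separate the object in this toy scene: point 3 is rejected by the feature term and point 4 by the spatial term.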
