Smart Paper Notes

Validation of Vector Data using Oblique Images

Pragyana Mishra , Eyal Ofek , Gur Kimchi

Categories: Computer Vision | Machine Learning

2022-06-17

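Oblique images are aerial photographs taken at oblique angles to the earth's surface. The projection of vector and other geospatial data into these images depends on the camera parameters, the positions of the geospatial entities, the surface terrain, occlusions, and visibility. This paper presents a robust and scalable algorithm for detecting inconsistencies in vector data using oblique images. The algorithm uses image descriptors to encode the local appearance of a geospatial entity in the images. These descriptors combine color, pixel-intensity gradients, texture, and steerable filter responses. A support vector machine classifier is trained to detect image descriptors that are inconsistent with the underlying vector data, digital elevation maps (DEMs), building models, and camera parameters. In this paper, we train the classifier on visible road segments and non-road data. The trained classifier then detects inconsistencies in the vectors, which include both occluded and misaligned road segments. Consistent road segments validate our vector, DEM, and 3-D model data for those areas, while inconsistent segments point out errors. We further show that searching the neighborhood of a misaligned road for descriptors consistent with visible road segments yields the desired road alignment, one that agrees with the pixels in the image.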

translated by Google Translate

ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

Corinne Stucker , Konrad Schindler

Categories: Computer Vision

2021-06-15

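Modern optical satellite sensors enable high-resolution stereo reconstruction. However, the challenging imaging conditions when observing the Earth from space push stereo matching to its limits. In practice, the resulting digital surface models (DSMs) are fairly noisy and often do not reach the accuracy needed for high-resolution applications such as 3D city modeling. Arguably, stereo correspondence based on low-level image similarity is insufficient and should be complemented with prior knowledge about the expected surface geometry that goes beyond basic local smoothness. To that end, we introduce ResDepth, a convolutional neural network that learns such an expressive geometric prior from example data. ResDepth refines an initial, raw stereo DSM while conditioning the refinement on the images. That is, it acts as a smart, learned post-processing filter that can seamlessly complement any stereo matching pipeline. In a series of experiments, we find that the proposed method consistently improves stereo DSMs both quantitatively and qualitatively. We show that the prior encoded in the network weights captures meaningful geometric characteristics of urban design, which also generalize across different districts and even from one city to another. Moreover, we demonstrate that, by training on a variety of stereo pairs, ResDepth can acquire a sufficient degree of invariance to variations in imaging conditions and acquisition geometry.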

Skin feature point tracking using deep feature encodings

Jose Ramon Chang , Torbjörn E. M. Nordling

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-12-28

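Facial feature tracking is a key component of imaging ballistocardiography (BCG), where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and the Lucas-Kanade method (LK). These have long represented the state of the art in efficiency and accuracy, but they fail when common deformations are present. Over the past five years, deep convolutional neural networks have outperformed traditional methods on most computer vision tasks. We propose a pipeline for feature tracking that applies a convolutional stacked autoencoder to identify the crop in an image that is most similar to a reference crop containing the feature of interest. The autoencoder learns to represent image crops as deep feature encodings specific to the object category it is trained on. We train the autoencoder on facial images and validate its ability to track skin features in general using manually labeled face and hand videos. The tracking errors of distinctive skin features (moles) are so small that, based on a $\chi^2$-test, we cannot exclude that they stem from the manual labeling. With a mean error of 0.6-4.2 pixels, our method outperformed the other methods in all cases. More importantly, our method was the only one that did not diverge. We conclude that our method creates better feature descriptors for feature tracking, feature matching, and image registration than the traditional algorithms.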

Visual and Object Geo-localization: A Comprehensive Survey

Daniel Wilson , Xiaohan Zhang , Waqas Sultani , Safwan Wshah

Categories: Computer Vision

2021-12-30

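The concept of geo-localization refers to the process of determining where on Earth some "entity" is located, typically using Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, a satellite image, or even objects visible within an image. Massive datasets of GPS-tagged media have rapidly become available thanks to smartphones and the internet, and deep learning has risen to enhance the performance of machine learning models. As a result, the fields of visual and object geo-localization have emerged, driven by their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, which covers either determining where an image was captured (image geo-localization) or geo-locating objects within an image (object geo-localization). We provide an in-depth study, including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results, to illustrate the current state of each field.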

CBHE: Corner-based Building Height Estimation for Complex Street Scene Images

Yunxiang Zhao , Jianzhong Qi , Rui Zhang

Categories: Computer Vision | Artificial Intelligence

2019-04-25

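Building height estimation is important in many applications such as 3D city reconstruction, urban planning, and navigation. Recently, a new building height estimation method using street scene images and 2D maps was proposed. This method is more scalable than traditional methods that rely on high-resolution optical data, LiDAR data, or radar data, which are expensive to obtain. The method needs to detect building rooflines and then compute building height via the pinhole camera model. We observe that this method has limitations in handling complex street scene images in which buildings overlap each other and the rooflines are difficult to locate. We propose CBHE, a building height estimation algorithm that considers both building corners and rooflines. CBHE first obtains building corner and roofline candidates in street scene images based on building footprints from 2D maps and the camera parameters. Then, we use a deep neural network named BuildingNet to classify and filter the corner and roofline candidates. Based on the valid corners and rooflines from BuildingNet, CBHE computes building height via the pinhole camera model. Experimental results show that the proposed BuildingNet achieves higher accuracy in filtering building corner and roofline candidates than state-of-the-art open-set classifiers. Meanwhile, CBHE outperforms the baseline algorithm by over 10% in building height estimation accuracy.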
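
The pinhole-camera step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a level camera (zero pitch and roll), a known horizontal distance to the facade (for example, measured on the 2D map), and a focal length expressed in pixels. The function name and every parameter are hypothetical.

```python
def building_height(distance_m, roofline_row_px, principal_row_px,
                    focal_px, camera_height_m):
    """Estimate roofline height above ground with a level pinhole camera.

    All names are illustrative assumptions, not taken from the CBHE paper.
    """
    if focal_px <= 0 or distance_m <= 0:
        raise ValueError("focal length and distance must be positive")
    # Image rows grow downward, so a roofline above the optical axis
    # has a smaller row index than the principal point.
    rise_px = principal_row_px - roofline_row_px
    # Similar triangles: rise_m / distance_m == rise_px / focal_px.
    rise_m = distance_m * rise_px / focal_px
    # Add the height of the camera itself above the ground.
    return camera_height_m + rise_m


if __name__ == "__main__":
    # Roofline 400 px above the principal point, facade 20 m away.
    print(building_height(20.0, 100.0, 500.0, 1000.0, 2.5))
```

In this sketch an error in the map-derived distance scales the estimated rise proportionally, which is one reason reliable corner and roofline localization matters for the final height.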

Seafloor-Invariant Caustics Removal from Underwater Imagery

Panagiotis Agrafiotis , Konstantinos Karantzalos , Andreas Georgopoulos

Categories: Computer Vision

2022-12-20

Mapping the seafloor with underwater imaging cameras is of significant importance for various applications including marine engineering, geology, geomorphology, archaeology and biology. For shallow waters, among the underwater imaging challenges, caustics, i.e., the complex physical phenomena resulting from the projection of light rays refracted by the wavy surface, are likely the most crucial one. Caustics are the main factor during underwater imaging campaigns that massively degrades image quality and severely affects any 2D mosaicking or 3D reconstruction of the seabed. In this work, we propose a novel method for correcting the radiometric effects of caustics on shallow underwater imagery. Contrary to the state of the art, the developed method can handle seabeds and riverbeds of any anaglyph, correcting the images using real pixel information and thus improving image matching and 3D reconstruction processes. In particular, the developed method employs deep learning architectures to classify image pixels as "non-caustics" or "caustics". It then exploits the 3D geometry of the scene to achieve a pixel-wise correction by transferring appropriate color values between the overlapping underwater images. Moreover, to fill the current gap, we have collected, annotated and structured a real-world caustic dataset, namely R-CAUSTIC, which is openly available. Overall, based on the experimental results and validation, the developed methodology is quite promising in both detecting caustics and reconstructing their intensity.

AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning

Travis Driver , Katherine Skinner , Mehregan Dor , Panagiotis Tsiotras

Categories: Computer Vision | Robotics

2022-08-03

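Missions to small celestial bodies rely heavily on optical feature tracking for characterization and relative navigation. Although deep learning has led to great advances in feature detection and description, training and validating data-driven models for space applications is challenging because large-scale, annotated datasets are scarce. This paper introduces AstroVision, a large-scale dataset comprising 115,970 densely annotated, real images of 16 different bodies captured during past and ongoing missions. We use AstroVision to develop a set of standardized benchmarks and conduct an exhaustive evaluation of both handcrafted and data-driven feature detection and description methods. Next, we employ AstroVision for end-to-end training of a state-of-the-art deep feature detection and description network and demonstrate improved performance on multiple benchmarks. The full benchmarking pipeline and the dataset will be made publicly available to facilitate the development of computer vision algorithms for space applications.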

Towards an unsupervised large-scale 2D and 3D building mapping with airborne LiDAR data

Hunsoo Song , Jinha Jung

Categories: Computer Vision

2022-05-29

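2D and 3D building maps provide valuable information for understanding human activities and their impact on the Earth and its environment. Despite great efforts to improve the quality of building maps, current large-scale building maps produced by automated methods still contain many errors and uncertainties, and are usually limited to 2D building information. This study presents an open-source, unsupervised 2D and 3D building extraction algorithm that uses airborne LiDAR data and is suitable for large-scale building mapping. Our algorithm operates in a fully unsupervised manner and requires neither training labels nor a training procedure. It consists of morphological filtering and planarity-based filtering. Computation is therefore efficient and the results are easy to predict, which can greatly reduce the uncertainty in the resulting building maps. Quantitative and qualitative evaluations on large-scale datasets (> 550 $km^2$) of Denver and New York City show that our algorithm can produce more accurate building maps than Microsoft Building Footprints, which are generated by a deep-learning-based method. Extensive evaluations under different conditions confirm that our algorithm is scalable and can be further improved with appropriate parameter selection. We also detail the effects of the parameters and the potential sources of error to help prospective users of our algorithm. Our LiDAR-based algorithm has the advantage that generating 2D and 3D building maps is computationally efficient while producing accurate and interpretable results. The proposed algorithm offers great potential for global-scale 2D and 3D building mapping with airborne LiDAR data.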
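
To make the idea of morphological filtering concrete, here is a minimal sketch of a generic grayscale opening applied to a 1D height profile. It is not the authors' algorithm: the window size, the height threshold, and all function names are hypothetical, and the planarity-based filtering step is omitted entirely.

```python
def _window_filter(values, half_window, reducer):
    """Apply min or max over a sliding window, truncated at the borders."""
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(values)
    return [reducer(values[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def opening(values, half_window):
    """Grayscale opening: erosion (min filter) followed by dilation (max filter)."""
    eroded = _window_filter(values, half_window, min)
    return _window_filter(eroded, half_window, max)


def above_ground_mask(heights, half_window, min_height):
    """Flag samples that rise at least min_height above the opened surface.

    Objects narrower than the window are removed by the opening, so the
    opened profile acts as a rough ground estimate (an assumption of this sketch).
    """
    ground = opening(heights, half_window)
    return [h - g >= min_height for h, g in zip(heights, ground)]


if __name__ == "__main__":
    profile = [0, 0, 0, 10, 10, 10, 0, 0, 0, 0]  # a 3-sample-wide raised block
    print(above_ground_mask(profile, half_window=2, min_height=2))
```

The window must be wider than the largest object to be separated from the ground; a block wider than the window survives the opening and is treated as terrain.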

Remote Sensing Image Scene Classification: Benchmark and State of the Art

Gong Cheng , Junwei Han , Xiaoqiang Lu

Categories:

2017-03-01

This paper reviews the recent progress of remote sensing image scene classification, proposes a large-scale benchmark dataset, and evaluates a number of state-of-the-art methods using the proposed dataset.

Object recognition from local scale-invariant features

Categories:

An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially-occluded images with a computation time of under 2 seconds.
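
The candidate-matching stage can be sketched as a nearest-neighbor lookup over key vectors. The abstract does not describe the indexing structure, so this sketch substitutes a brute-force search; the function name, the squared-distance threshold, and the toy 2-D keys are all assumptions made for illustration.

```python
def match_keys(query_keys, model_keys, max_sq_dist):
    """Return (query_index, model_index) pairs for accepted nearest neighbors.

    Brute-force stand-in for the paper's indexing method (an assumption).
    """
    matches = []
    for qi, query in enumerate(query_keys):
        best_index, best_dist = None, float("inf")
        for mi, model in enumerate(model_keys):
            # Squared Euclidean distance between the two key vectors.
            dist = sum((a - b) ** 2 for a, b in zip(query, model))
            if dist < best_dist:
                best_index, best_dist = mi, dist
        # Keep the match only if the closest model key is close enough.
        if best_index is not None and best_dist <= max_sq_dist:
            matches.append((qi, best_index))
    return matches


if __name__ == "__main__":
    model = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    query = [(0.9, 1.1), (10.0, 10.0), (0.1, 0.0)]
    print(match_keys(query, model, max_sq_dist=0.5))
```

Brute force costs one distance computation per query-model pair, which is why an indexing method becomes attractive once the model database grows.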

Image Feature Information Extraction for Interest Point Detection: A Review

Junfeng Jing , Tian Gao , Weichuan Zhang , Yongsheng Gao , Changming Sun

Categories: Computer Vision

2021-06-15

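Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we present a comprehensive review of image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how existing interest point detection methods extract IFI from an input image, we propose a taxonomy of IFI extraction techniques for interest point detection. According to this taxonomy, we discuss the different types of IFI extraction techniques. Furthermore, we identify the main unresolved issues related to existing IFI extraction techniques, as well as interest point detection methods that have not been discussed before. Existing popular datasets and evaluation standards are provided, and the performance of 18 state-of-the-art methods is evaluated and discussed. Finally, future research directions for IFI extraction techniques are elaborated.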

The pascal visual object classes (voc) challenge

Categories:

The PASCAL Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

LSVL: Large-scale season-invariant visual localization for UAVs

Jouko Kinnari , Riccardo Renzulli , Francesco Verdoja , Ville Kyrki

Categories: Robotics

2022-12-07

Localization of autonomous unmanned aerial vehicles (UAVs) relies heavily on Global Navigation Satellite Systems (GNSS), which are susceptible to interference. Especially in security applications, robust localization algorithms independent of GNSS are needed to provide dependable operation of autonomous UAVs even under interference. Typical non-GNSS visual localization approaches rely on a known starting pose, work only on a small-sized map, or require known flight paths before a mission starts. We consider the problem of localization with no information on initial pose or planned flight path. We propose a solution for global visual localization on a map at scales up to 100 km², based on matching orthoprojected UAV images to satellite imagery using learned season-invariant descriptors. We show that the method is able to determine the heading, latitude and longitude of the UAV at 12.6-18.7 m lateral translation error in as few as 23.2-44.4 updates from an uninformed initialization, even when there is a significant seasonal appearance difference (winter-summer) between the UAV image and the map. We evaluate the characteristics of multiple neural network architectures for generating the descriptors, and likelihood estimation methods that are able to provide fast convergence and low localization error. We also evaluate the operation of the algorithm using real UAV data and evaluate running time on a real-time embedded platform. We believe this is the first work that is able to recover the pose of a UAV at this scale and rate of convergence, while allowing a significant seasonal difference between camera observations and the map.

Season-invariant GNSS-denied visual localization for UAVs

Jouko Kinnari , Francesco Verdoja , Ville Kyrki

Categories: Robotics

2021-10-05

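Localization without Global Navigation Satellite Systems (GNSS) is a critical functionality in the autonomous operation of unmanned aerial vehicles (UAVs). Vision-based localization on a known map can be an effective solution, but it is burdened by two main problems: places look different depending on weather and season, and the perspective difference between the UAV camera image and the map makes matching hard. In this work, we propose a localization solution that matches UAV camera images to georeferenced orthophotos with a trained convolutional neural network model that is invariant to significant seasonal appearance differences (winter-summer) between the camera image and the map. We compare the convergence speed and localization accuracy of our solution with six reference methods. The results show major improvements over the reference methods, especially under high seasonal variation. Finally, we demonstrate the ability of the method to successfully localize a UAV, showing that the proposed method is robust to perspective changes.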

Vision-Based Environmental Perception for Autonomous Driving

Fei Liu , Zihao Lu , Xianke Lin

Categories: Computer Vision

2022-12-22

Visual perception plays an important role in autonomous driving. One of the primary tasks is object detection and identification. Since vision sensors are rich in color and texture information, they can quickly and accurately identify various road information. The commonly used technique is based on extracting and calculating various features of the image. Recently developed deep learning-based methods offer better reliability and processing speed and have a greater advantage in recognizing complex elements. For depth estimation, vision sensors are also used for ranging because of their small size and low cost. A monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision is based on parallax and on matching feature points across different views, and the application of deep learning further improves its accuracy. In addition, Simultaneous Localization and Mapping (SLAM) can establish a model of the road environment, thus helping the vehicle perceive the surrounding environment and complete its tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare various methods based on monocular, stereo, and RGB-D sensors, next review and compare various methods of SLAM, and finally summarize the current problems and present the future development trends of vision technologies.

Near-field Perception for Low-Speed Vehicle Automation using Surround-view Fisheye Cameras

Ciaran Eising , Jonathan Horgan , Senthil Yogamani

Categories: Computer Vision | Robotics

2021-03-31

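Cameras are the primary sensor in automated driving systems. They provide high information density and are optimal for detecting road infrastructure cues laid out for human vision. Surround-view camera systems typically comprise four fisheye cameras with a 190°+ field of view, covering the entire 360° around the vehicle and focused on near-field sensing. They are the principal sensors for low-speed, high-accuracy, close-range sensing applications such as automated parking, traffic jam assistance, and low-speed emergency braking. In this work, we provide a detailed survey of such vision systems, framing the survey in the context of an architecture that can be decomposed into four modular components, namely Recognition, Reconstruction, Relocalization, and Reorganization. We jointly call this the 4R Architecture. We discuss how each component accomplishes a specific aspect and put forward a position argument that they can be combined synergistically to form a complete perception system for low-speed automation. We support this argument by presenting results from previous works and by presenting architecture proposals for such a system. Qualitative results are presented in the video at HTTPS://youtu.be/ae8bcof7777uy.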

Mapping industrial poultry operations at scale with deep learning and aerial imagery

Caleb Robinson , Ben Chugg , Brandon Anderson , Juan M. Lavista Ferres , Daniel E. Ho

Categories: Computer Vision | Machine Learning

2021-12-21

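Concentrated Animal Feeding Operations (CAFOs) pose serious risks to air, water, and public health, but have proven challenging to regulate. The U.S. Government Accountability Office notes that a basic challenge is the lack of comprehensive location information on CAFOs. We use the USDA's National Agricultural Imagery Program (NAIP) 1 m/pixel aerial imagery to detect poultry CAFOs across the continental United States. We train convolutional neural network (CNN) models to identify individual poultry barns and apply the best-performing model to over 42 TB of imagery to create the first national, open-source dataset of poultry CAFOs. We validate the model predictions against poultry CAFO facility locations from 10 hand-labeled counties in California and demonstrate that this approach has significant potential to fill gaps in environmental monitoring.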

GlacierNet2: A Hybrid Multi-Model Learning Architecture for Alpine Glacier Mapping

Zhiyuan Xie , Umesh K. Haritashya , Vijayan K. Asari , Michael P. Bishop , Jeffrey S. Kargel , Theus H. Aspiras

Categories: Machine Learning

2022-04-06

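In recent decades, climate change has significantly affected glacier dynamics, resulting in mass loss and an increased risk of glacier-related hazards, including the development of supraglacial and proglacial lakes as well as catastrophic outburst floods. Rapidly changing conditions dictate the need for continuous and detailed observation of climate-glacier dynamics. Thematic and quantitative information on glacier geometry is essential for understanding climate forcing and the sensitivity of glaciers to climate change. However, accurately mapping debris-covered glaciers (DCGs) is notoriously difficult when relying on spectral information and conventional machine learning techniques. The objective of this research is to improve upon GlacierNet, an earlier deep-learning-based approach that was designed to exploit a convolutional neural network segmentation model to accurately outline regional DCG ablation zones. Specifically, we developed an enhanced GlacierNet2 architecture that incorporates multiple models, automated post-processing, and basin-level hydrological flow techniques to improve the mapping of DCGs so that it includes both the ablation and accumulation zones. Experimental evaluations show that GlacierNet2 improves the estimation of the ablation zone and achieves a high intersection over union (IOU: 0.8839) score. The proposed architecture outlines complete glaciers (both accumulation and ablation zones) at regional scales, with an overall score of 0.8619. This is a crucial first step in automating complete glacier mapping, which can be used for accurate glacier modeling or mass-balance analysis.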
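
The intersection over union (IOU) score reported above is the standard overlap measure for segmentation masks. The following is a minimal sketch of how it is computed for two binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details from the paper.

```python
def intersection_over_union(pred, truth):
    """IOU of two equally sized binary masks given as nested lists."""
    if len(pred) != len(truth) or any(
            len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel labeled glacier in both masks
            union += p or t          # pixel labeled glacier in either mask
    # Two empty masks agree perfectly; this convention is an assumption.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(intersection_over_union(predicted, reference))
```

Because the union sits in the denominator, both false positives and false negatives lower the score, so a value near 0.88 indicates close agreement between predicted and reference outlines.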

Deep-Learning-Based Single-Image Height Reconstruction from Very-High-Resolution SAR Intensity Data

Michael Recla , Michael Schmitt

Categories: Computer Vision

2021-11-03

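Originally developed in fields such as robotics and autonomous driving with image-based navigation in mind, deep learning-based single-image depth estimation (SIDE) has attracted great interest in the wider image analysis community. Remote sensing is no exception, as the possibility of estimating height maps from a single aerial or satellite image holds great potential for topographic reconstruction. A few pioneering investigations have demonstrated the general feasibility of single-image height prediction from optical remote sensing images and have motivated further research in that direction. With this paper, we present the first demonstration of deep learning-based single-image height prediction for the other important sensor modality in remote sensing: synthetic aperture radar (SAR) data. Besides adapting a convolutional neural network (CNN) architecture to SAR intensity images, we present a workflow for generating training data, along with extensive experimental results for different SAR imaging modes and test sites. Since we place particular emphasis on transferability, we are able to confirm that deep learning-based single-image height estimation is not only possible but also transfers well to unseen data, even when acquired with different imaging modes and imaging parameters.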

S2Looking: A Satellite Side-Looking Dataset for Building Change Detection

Li Shen , Yao Lu , Hao Chen , Hao Wei , Donghai Xie , Jiabao Yue , Rui Chen , Shouye Lv , Bitao Jiang

Categories: Computer Vision | Artificial Intelligence

2021-07-20

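Building change detection underpins many important applications, especially in the military and crisis management domains. Recent change detection methods have shifted towards deep learning, which depends on the quality of its training data. The assembly of large-scale annotated satellite imagery datasets is therefore essential for global building change surveillance. Existing datasets almost exclusively offer near-nadir viewing angles, which limits the range of changes that can be detected. By offering a larger observation range, the scroll imaging mode of optical satellites presents an opportunity to overcome this restriction. This paper therefore introduces S2Looking, a building change detection dataset that contains large-scale side-looking satellite images captured at various off-nadir angles. The dataset consists of 5000 bitemporal image pairs of rural areas and more than 65,920 annotated change instances from around the world. The dataset can be used to train deep-learning-based change detection algorithms. It expands upon existing datasets by providing (1) larger viewing angles; (2) large illumination variances; and (3) the added complexity of rural images. To facilitate use of the dataset, a benchmark task has been established, and preliminary tests suggest that deep learning algorithms find the dataset significantly more challenging than the closest competing near-nadir dataset, LEVIR-CD+. S2Looking may therefore promote important advances in existing building change detection algorithms. The dataset is available at https://github.com/s2looking/.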
