Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies for all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance.
Index Terms: Behavior understanding and description, fusion of data from multiple cameras, motion detection, personal identification, tracking, visual surveillance.
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image and video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and point out promising directions for future research.
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
Vision-based navigation for mobile robots has been the source of countless research contributions, from the domains of both vision and control. Vision is becoming more and more common in applications such as localization, automatic map construction, autonomous navigation, path following, inspection, monitoring, and risky-situation detection. This survey presents the work, from the nineties to the present day, that constitutes substantial progress in visual navigation techniques for land, aerial, and autonomous underwater vehicles. The paper deals with two major approaches: map-based navigation and mapless navigation. Map-based navigation is in turn subdivided into metric map-based navigation and topological map-based navigation. Our outline of mapless navigation includes reactive techniques based on the extraction of qualitative characteristics, appearance-based localization, optical flow, feature tracking, ground-plane detection/tracking, and more. The recent concept of visual sonar is also reviewed.
This paper presents a survey of the latest methods for moving object detection in video sequences captured by a moving camera. Although much research and many excellent works have reviewed methods for object detection and background subtraction with a fixed camera, no survey presents a complete review of the existing methods for the case of a moving camera. Most methods in this field can be classified into four categories: modelling-based background subtraction, trajectory classification, low-rank and sparse matrix decomposition, and object tracking. We discuss each category in detail and present the main methods that propose improvements on the general concepts of these techniques. We also present the challenges and main concerns in this field, as well as performance metrics and some benchmark databases available for evaluating the performance of different moving object detection algorithms.
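The low-rank and sparse matrix decomposition category models the background of the stacked video frames as a low-rank matrix and the moving objects as sparse outliers. A minimal numpy sketch of this idea, using the standard inexact augmented Lagrangian scheme for robust PCA, follows; the parameter defaults are common heuristics from the RPCA literature, not taken from any specific surveyed method:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ (np.maximum(s - tau, 0)[:, None] * Vt)

def shrink(X, tau):
    """Element-wise soft thresholding: the proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def rpca(D, lam=None, n_iter=500, tol=1e-7):
    """Split D (frames stacked as columns) into low-rank L plus sparse S."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_D = np.linalg.norm(D)
    mu = 1.25 / np.linalg.norm(D, 2)    # spectral-norm-based step size
    Y = np.zeros_like(D)                 # Lagrange multiplier
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        Z = D - L - S                    # constraint residual
        Y += mu * Z
        mu *= 1.5                        # gradually tighten the penalty
        if np.linalg.norm(Z) < tol * norm_D:
            break
    return L, S
```

With one column per frame, nonzero entries of S mark pixels that deviate from the recovered background L, i.e., candidate moving objects.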
Over the past few years, deep learning techniques, particularly convolutional neural networks (CNNs), have been used extensively in computer vision and machine learning. These deep learning techniques provide state-of-the-art accuracy on different classification, segmentation, and detection tasks on benchmarks such as MNIST, CIFAR-10, CIFAR-100, Microsoft COCO, and ImageNet. However, over the past decade, much research on Bangla license plate recognition has relied on traditional machine learning methods. Because of their poor recognition accuracy, none of them has been deployed in a physical Bangla license plate recognition system (BLPRS). In this paper, we implement a CNN-based Bangla license plate recognition system with higher accuracy that can be applied for different purposes, including roadside assistance, automatic parking-lot management, and vehicle license status detection. In addition, we have created and released the first standard database for BLPRS.
Road networks in cities are massive and are a key component of mobility. A fast response to defects, which can occur not only through regular wear but also through extreme events such as storms, is essential. Therefore, a fast, scalable, and cost-effective automated system is needed to gather information about defects. We propose a city-scale road audit system that exploits some of the recent developments in deep learning and semantic segmentation. To build and benchmark the system, we curated a dataset with the annotations required for road defects. However, many of the labels needed for a road audit are highly ambiguous; we overcome this ambiguity by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, subdivides the road further into defects, labels each frame with its defects, and finally localizes the defects on a map assembled from GPS data. The model is analyzed and evaluated on both frame-level labeling and segmentation at different levels of the label hierarchy.
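The role of a label hierarchy in this kind of evaluation can be sketched briefly. The class names below are hypothetical illustrations, not the dataset's actual labels: fine labels map to coarser parents, so a prediction that confuses two ambiguous fine labels can still score correctly at the coarse level.

```python
# Hypothetical fine-to-coarse mapping; ambiguous fine labels share a parent.
HIERARCHY = {
    "pothole": "road_defect",
    "water_puddle": "road_defect",
    "faded_marking": "marking_defect",
    "broken_marking": "marking_defect",
}

def lift(label, level):
    """Map a fine label to its parent when scoring at the coarse level."""
    return HIERARCHY.get(label, label) if level == "coarse" else label

def accuracy(preds, truths, level="fine"):
    """Fraction of predictions matching ground truth at the chosen level."""
    pairs = [(lift(p, level), lift(t, level)) for p, t in zip(preds, truths)]
    return sum(p == t for p, t in pairs) / len(pairs)
```

A prediction of "pothole" against a ground truth of "water_puddle" is wrong at the fine level but correct at the coarse level, which is exactly how the hierarchy absorbs label ambiguity.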
Driver distraction, defined as the diversion of attention away from activities critical for safe driving toward a competing activity, is increasingly recognized as a significant source of injuries and fatalities on the roadway. Additionally, the trend toward increasing use of in-vehicle information systems is critical because they induce visual, biomechanical, and cognitive distraction and may affect driving performance in qualitatively different ways. Non-intrusive methods are strongly preferred for monitoring distraction, and vision-based systems appear attractive to both drivers and researchers. Biomechanical, visual, and cognitive distractions are the types most commonly detected by video-based algorithms. Many distraction detection systems use only a single visual cue and may therefore be easily disturbed when occlusion or illumination changes appear. Moreover, the combination of these visual cues is a key and challenging aspect in the development of robust distraction detection systems. These visual cues can be extracted mainly by face monitoring systems, but they should be complemented with additional visual cues (e.g., hand or body information) or even with distraction detection from specific actions (e.g., phone usage). Additionally, these algorithms should run on an embedded device or system inside the car. This is not a trivial task, and several requirements must be taken into account: reliability, real-time performance, low cost, small size, low power consumption, flexibility, and short time-to-market. The key points for the development and implementation of sensors to carry out distraction detection are also reviewed. This paper reviews the role of computer vision technology applied to the development of monitoring systems to detect distraction. Some key points considered both as future work and as challenges yet to be solved are also addressed.
Computer vision has evolved over the past decade as a key technology for numerous applications that replace human supervision. In this paper, we present a survey of vision-based surveillance research related to anomaly detection in public places, focusing mainly on roads. First, we revisit the surveys published in this field over the last 10 years. Since the core building block of typical anomaly detection is learning, we place more emphasis on the learning methods applied to video scenes. We then summarize the important contributions on anomaly detection made during the last six years, focusing mainly on features, underlying techniques, applied scenarios, and types of anomalies, for single static cameras. Finally, we discuss the challenges in computer-vision-related anomaly detection techniques along with some important future possibilities.
This paper analyzes, compares, and contrasts the technical challenges, methods, and performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated, and sub-problems are highlighted, including text localization, verification, segmentation, and recognition. Special issues associated with the enhancement of degraded text and the processing of video text and of multi-oriented, perspective-distorted, and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.
Intersections are known for their integral and complex nature due to the variety of participants' behaviors and interactions. This paper presents a review of recent studies on behavior at intersections and on safety analysis for three types of participants at intersections: vehicles, drivers, and pedestrians. The paper emphasizes techniques that are strong candidates for automation with visual sensing technology. A new behavior and safety classification is presented based on key features used for intersection design, planning, and safety. In addition, performance metrics are introduced to evaluate different studies, and insights are provided regarding the state of the art, inputs, algorithms, challenges, and shortcomings.
In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle text of different sizes, orientations, color distributions, and backgrounds. We use affine rectification to correct the deformation of text regions caused by an oblique camera viewing angle. This procedure can significantly improve the text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features directly from an intensity image. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera and translate the recognized text into English.
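The normalization-plus-Gabor feature stage can be sketched in a few lines of numpy. This is an illustrative stand-in, not the paper's implementation: it uses whole-image rather than local normalization, and the kernel size, scale, and orientation count are arbitrary choices. What it does show is why normalizing before filtering makes the features robust to global lighting changes.

```python
import numpy as np

def normalize(img, eps=1e-6):
    """Zero-mean, unit-variance intensity normalization (a whole-image
    stand-in for the paper's local normalization)."""
    img = img.astype(float)
    return (img - img.mean()) / (img.std() + eps)

def gabor_kernel(size=11, sigma=2.5, theta=0.0, lam=5.0):
    """Real part of a Gabor filter: a Gaussian window times a sinusoid
    oriented along angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return gauss * np.cos(2 * np.pi * xr / lam)

def conv2d_valid(img, kern):
    """Plain 'valid'-mode 2-D correlation, spelled out for clarity."""
    kh, kw = kern.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def gabor_features(img, n_orient=4):
    """Mean and std of filter responses at n_orient orientations."""
    img = normalize(img)
    feats = []
    for k in range(n_orient):
        resp = conv2d_valid(img, gabor_kernel(theta=k * np.pi / n_orient))
        feats += [resp.mean(), resp.std()]
    return np.array(feats)
```

Because the normalization removes any affine change of intensity, a brighter or higher-contrast copy of the same character patch yields the same feature vector.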
In intelligent transportation systems, real-time systems that monitor and analyze road users become increasingly critical as we move toward the smart-city era. Vision-based frameworks for object detection, multi-object tracking, and traffic near-accident detection are important applications of intelligent transportation systems, particularly in video surveillance and related areas. Although deep neural networks have recently achieved great success in many computer vision tasks, a uniform framework for all three tasks remains challenging, given the exponentially growing demands of real-time performance, complex urban environments, highly dynamic traffic events, and many traffic movements. In this paper, we propose a two-stream convolutional network architecture that performs real-time detection, tracking, and near-accident detection of road users in traffic video data. The two-stream model consists of a spatial stream network for object detection and a temporal stream network that leverages motion features for multi-object tracking. We detect near-accidents by combining the appearance features and motion features from the two-stream networks. Using aerial videos, we propose a Traffic Near-Accident Dataset (TNAD) covering various types of traffic interactions that is suitable for vision-based traffic analysis tasks. Our experiments demonstrate the advantages of our framework, with overall competitive qualitative and quantitative performance at high frame rates on the TNAD dataset.
Camera deployments are ubiquitous, but existing methods to analyze video feeds do not scale and are error-prone. We describe Optasia, a dataflow system that employs relational query optimization to efficiently process queries on video feeds from many cameras. The key gains of Optasia result from modularizing vision pipelines in such a manner that relational query optimization can be applied. Specifically, Optasia can (i) de-duplicate the work of common modules, (ii) auto-parallelize query plans based on the video input size, number of cameras, and operation complexity, and (iii) offer chunk-level parallelism that allows multiple tasks to process the feed of a single camera. Evaluation on complex vision queries over traffic videos from a large city shows high accuracy with many-fold improvements in query completion time and resource usage relative to existing systems.
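The de-duplication gain in (i) rests on a simple principle that can be sketched independently of Optasia's actual planner: if an operator is identified by the full prefix of stages that produces its input, then two queries whose pipelines begin with the same stages (same module, same parameters) share those stages and execute them once. The stage names below are hypothetical.

```python
def dedup_pipelines(queries):
    """Return the set of unique operator instances after prefix sharing.

    Each query is a tuple of hashable stage descriptors, e.g.
    ("decode", "cam1") -> ("detect", "vehicle") -> ("count", "trucks").
    An operator instance is identified by its full stage prefix, so
    identical prefixes across queries collapse into one instance.
    """
    ops = set()
    for stages in queries:
        for i in range(1, len(stages) + 1):
            ops.add(stages[:i])
    return ops
```

Two queries over the same camera that share a decode and a detect stage thus schedule four operator instances instead of six, and the saving grows with the number of queries sharing a feed.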
The past few decades have witnessed great progress of unmanned aerial vehicles (UAVs) in civilian fields, especially in photogrammetry and remote sensing. In contrast with manned aircraft and satellite platforms, UAV platforms offer many promising characteristics: flexibility, high efficiency, high spatial/temporal resolution, low cost, and easy operation, which make them an effective complement to other remote sensing platforms and a cost-effective means of remote sensing data acquisition. Considering the popularity and expansion of UAV-based remote sensing in recent years, this paper presents a systematic survey of the status and future perspectives of UAVs in the remote sensing community. Specifically, the main challenges and key technologies of UAV-based remote sensing data processing are first discussed and summarized. We then provide an overview of the wide range of applications of UAVs in remote sensing. Finally, some prospects for future work are discussed. We hope this paper can provide remote sensing researchers with an overall picture of recent UAV-based remote sensing developments and help guide further research on this topic.
Tracking vehicles across multiple cameras with non-overlapping views has been a challenging task for intelligent transportation systems (ITS). This is mainly because of the high similarity among vehicle models, frequent occlusion, large variations across viewing perspectives, and low video resolution. In this work, we propose a fusion of visual and semantic features for both single-camera tracking (SCT) and inter-camera tracking (ICT). Specifically, a histogram-based adaptive appearance model is introduced to learn the long-term history of visual features for each vehicle target. In addition, semantic features including trajectory smoothness, velocity change, and temporal information are incorporated into a bottom-up clustering strategy for data association in each single camera view. Across different camera views, we also exploit other information, such as deep learning features, detected license plate features, and detected car types, for vehicle re-identification. Additionally, evolutionary optimization is applied to camera calibration for reliable 3D speed estimation. Our algorithm achieved the top performance in both 3D speed estimation and vehicle re-identification at the NVIDIA AI City Challenge 2018.
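The shape of a histogram-based adaptive appearance model can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' model: it uses per-channel color histograms, a Bhattacharyya similarity, and an exponential running average whose update rate alpha is an invented parameter.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Per-channel intensity histogram, L1-normalized to a distribution."""
    hs = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
          for c in range(patch.shape[-1])]
    h = np.concatenate(hs).astype(float)
    return h / h.sum()

def bhattacharyya(p, q):
    """Similarity in [0, 1]; 1 means identical distributions."""
    return float(np.sum(np.sqrt(p * q)))

class AppearanceModel:
    """Running average of histograms over a track's history, a crude
    stand-in for a long-term appearance memory."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.hist = None

    def update(self, patch):
        h = color_histogram(patch)
        if self.hist is None:
            self.hist = h
        else:
            self.hist = (1 - self.alpha) * self.hist + self.alpha * h

    def similarity(self, patch):
        return bhattacharyya(self.hist, color_histogram(patch))
```

Averaging over the whole track history, rather than keeping only the latest patch, is what makes the model tolerant to a few occluded or badly lit frames.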
Anomaly detection on road traffic is an important task due to its great potential in urban traffic management and road safety. It is also a very challenging task, since abnormal events happen very rarely and exhibit diverse behaviors. In this work, we present a model to detect anomalies in road traffic by learning vehicle motion patterns in two distinctive yet correlated modes, i.e., the static mode and the dynamic mode, of the vehicles. The static-mode analysis is learned from background modeling followed by a vehicle detection procedure to find abnormal vehicles that remain still on the road. The dynamic-mode analysis is learned from detected and tracked vehicle trajectories to find abnormal trajectories that deviate from the dominant motion patterns. The results of the dual-mode analyses are finally fused, guided by a re-identification model, to obtain the final anomalies. Experimental results on the Track 2 testing set of the NVIDIA AI City Challenge show the effectiveness of the proposed dual-mode learning model and its robustness in different real scenes. Our result ranks first on the final leaderboard of Track 2.
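The static-mode idea, flagging vehicles that stop and stay on the road via background modeling, can be sketched with a slow running-average background and a per-pixel persistence counter. This is an illustrative simplification of the approach described above (thresholds, the update rate, and the persistence window are invented parameters), not the authors' pipeline:

```python
import numpy as np

def stalled_vehicle_mask(frames, alpha=0.05, thresh=20.0, min_frames=10):
    """Flag pixels that differ from a slowly adapting background for at
    least `min_frames` consecutive frames, a crude proxy for an object
    that stops on the road and stays there."""
    bg = frames[0].astype(float)
    persist = np.zeros(frames[0].shape, dtype=int)
    for f in frames[1:]:
        diff = np.abs(f.astype(float) - bg) > thresh
        persist = np.where(diff, persist + 1, 0)  # reset where scene matches
        bg = (1 - alpha) * bg + alpha * f          # background adapts slowly
    return persist >= min_frames
```

Because alpha is small, a vehicle that stops remains different from the background long enough to accumulate persistence, while ordinary moving traffic touches each pixel too briefly to trigger the counter.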
Research in the field of optical character recognition (OCR) has been ongoing for the last few decades, and a large number of articles have been published. A large number of OCR systems are also available commercially. This paper reviews the history of OCR and the various techniques used for OCR development, in chronological order.
We survey the research on self-driving cars published in the autonomous-vehicle literature since the DARPA challenges, focusing on cars equipped with an autonomy system that can be categorized as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into a perception system and a decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacle mapping, moving obstacle detection and tracking, road mapping, and traffic signalization detection and recognition. The decision-making system is commonly partitioned into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant perception and decision-making methods. Furthermore, we present a detailed description of the architecture of the autonomy system of the self-driving car developed at UFES, named IARA. Finally, we list prominent autonomous research cars developed by technology companies and reported in the media.
With the fast advancement of AICity initiatives and omnipresent street cameras, smart transportation can benefit greatly from actionable insights derived from video analytics. We participated in all three tracks of the NVIDIA AICity Challenge 2018. In the Track 1 challenge, we demonstrate automatic traffic flow analysis using the detection and tracking of vehicles with robust speed estimation. In the Track 2 challenge, we develop a reliable anomaly detection pipeline that can recognize abnormal incidents, including stalled vehicles and crashes, with precise locations and time segments. In the Track 3 challenge, we present an early result on vehicle re-identification using deep triplet-loss features that matches vehicles across 4 cameras in more than 15 hours of video. All developed methods are evaluated and compared against 30 contesting methods from 70 registered teams on the real-world challenge videos.
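In the simplest calibrated case, speed estimation from tracking reduces to scaling the per-frame pixel displacement of a tracked vehicle by the ground-plane resolution and the frame rate. The sketch below assumes a constant meters-per-pixel scale (i.e., a top-down or rectified view); the robust estimation referenced above would additionally handle perspective and tracking noise.

```python
import numpy as np

def estimate_speed_kmh(track_px, fps, meters_per_pixel):
    """Average speed of a tracked vehicle from per-frame image positions.

    track_px: (N, 2) sequence of (x, y) centroids, one row per frame.
    meters_per_pixel: ground-plane scale, assumed constant here.
    """
    track_px = np.asarray(track_px, dtype=float)
    steps = np.linalg.norm(np.diff(track_px, axis=0), axis=1)  # px per frame
    meters_per_sec = steps.mean() * meters_per_pixel * fps
    return meters_per_sec * 3.6  # m/s to km/h
```

For example, a vehicle moving 10 pixels per frame at 30 fps with a 0.05 m/pixel scale travels 15 m/s, i.e., 54 km/h.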