In recent years, deep learning techniques, particularly convolutional neural networks (CNNs), have been used extensively in computer vision and machine learning. These techniques provide state-of-the-art accuracy on various classification, segmentation, and detection tasks on benchmarks such as MNIST, CIFAR-10, CIFAR-100, Microsoft COCO, and ImageNet. Over the past decade, however, a large body of Bangla license plate recognition research has relied on traditional machine learning methods. Because of their poor recognition accuracy, none of them has been used to deploy a physical Bangla License Plate Recognition System (BLPRS). In this paper, we implement a CNN-based Bangla license plate recognition system with higher accuracy that can be applied for different purposes, including roadside assistance, automatic parking lot management systems, vehicle license plate status detection, and more. In addition, we create and release the first standard database for BLPRS.
translated by Google Translate
Container name extraction is very important to modern container management systems. Similar techniques have been suggested for vehicle license plate recognition in past decades. Container name extraction is more complex than license plate extraction because of severe non-uniform illumination and the unreliability of color information. The main purpose of this paper is to propose a new methodology for text extraction, segmenting text characters and removing non-text background from images. Existing text extraction methods do not work efficiently on images with noise and complex backgrounds; OCR performs well only on documents that contain text alone. The approach used is based on edge detection, the close operation, detecting connected components, removing non-text regions, and character segmentation.
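The edge-detection → closing → connected-components pipeline summarized above relies on the morphological close operation to bridge small gaps between character strokes before components are extracted. A minimal NumPy sketch of that one step (the 3×3 structuring element and the toy stroke image are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def erode(img, k=3):
    """Binary erosion, expressed as the complement of dilating the complement."""
    return 1 - dilate(1 - img, k)

def close(img, k=3):
    """Morphological closing: dilation followed by erosion; bridges small gaps."""
    return erode(dilate(img, k), k)

# Two strokes separated by a one-pixel gap, as might remain after edge detection.
strokes = np.zeros((5, 7), dtype=np.uint8)
strokes[1:4, 1:3] = 1
strokes[1:4, 4:6] = 1
closed = close(strokes)
```

After closing, the gap column between the two strokes is filled, so a connected-component pass would see one region instead of two.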
Tracking vehicles across multiple cameras with non-overlapping views has been a challenging task for intelligent transportation systems (ITS), mainly because of the high similarity among vehicle models, frequent occlusion, large variation across viewing perspectives, and low video resolution. In this work, we propose a fusion of visual and semantic features for both single-camera tracking (SCT) and inter-camera tracking (ICT). Specifically, a histogram-based adaptive appearance model is introduced to learn a long-term history of visual features for each vehicle target. In addition, semantic features including trajectory smoothness, velocity change, and temporal information are incorporated into a bottom-up clustering strategy for data association in each single camera view. Across different camera views, we also exploit other information, such as deep learning features, detected license plate features, and detected car types, for vehicle re-identification. Additionally, evolutionary optimization is applied to camera calibration for reliable 3D speed estimation. Our algorithm achieves the top performance in both 3D speed estimation and vehicle re-identification at the NVIDIA AI City Challenge 2018.
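The histogram-based appearance model mentioned above can be illustrated, in spirit, by comparing color histograms with histogram intersection, a standard similarity for appearance matching. The bin count and the toy images below are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel intensity histogram, L1-normalized into one descriptor."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1]; 1 means identical histograms."""
    return np.minimum(h1, h2).sum()

rng = np.random.default_rng(0)
a = rng.integers(0, 256, (32, 32, 3))   # toy crop of vehicle A
b = rng.integers(0, 256, (32, 32, 3))   # toy crop of vehicle B
self_sim = intersection(color_histogram(a), color_histogram(a))
cross_sim = intersection(color_histogram(a), color_histogram(b))
```

An adaptive model would update each target's stored histogram over time; here only the comparison step is shown.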
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex background make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image & video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and to point out promising directions for future research.
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
Camera deployments are ubiquitous, but existing methods to analyze video feeds do not scale and are error-prone. We describe Optasia, a dataflow system that employs relational query optimization to efficiently process queries on video feeds from many cameras. The key gains of Optasia result from modularizing vision pipelines in such a manner that relational query optimization can be applied. Specifically, Optasia can (i) de-duplicate the work of common modules, (ii) auto-parallelize the query plans based on the video input size, number of cameras, and operation complexity, and (iii) offer chunk-level parallelism that allows multiple tasks to process the feed of a single camera. Evaluation on traffic videos from a large city on complex vision queries shows high accuracy with manyfold improvements in query completion time and resource usage relative to existing systems.
An automatic extraction method for container identity codes based on template matching is proposed. With various kinds of noise on the image, the container code can hardly be extracted. Initially, the container image is filtered with both adaptive linear and nonlinear filters to reduce noise so that the candidate text lines can be properly located. Then, a series of standard templates is put forward according to the standard alignment modes of the container identification (ID) codes. Finally, the alignment mode of each candidate text line is obtained and matched against those standard templates, and the container ID codes are extracted automatically. Results show that this method can segment the container ID codes with high performance. Index Terms: container identity code, image processing, template matching.
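The core matching step in a pipeline like the one above can be sketched with zero-mean normalized cross-correlation, which scores how well a template fits at each image offset. The toy image, the planted template location, and the brute-force scan below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def ncc_match(image, template):
    """Slide the template over the image and return a score map of
    zero-mean normalized cross-correlation at each valid offset."""
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    tnorm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            w = image[y:y + th, x:x + tw]
            wz = w - w.mean()
            denom = np.sqrt((wz ** 2).sum()) * tnorm
            scores[y, x] = (wz * t).sum() / denom if denom > 0 else 0.0
    return scores

rng = np.random.default_rng(1)
image = rng.random((20, 20))
template = image[5:9, 7:11].copy()   # plant the template at offset (5, 7)
scores = ncc_match(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

The planted location scores exactly 1.0, so `best` recovers the offset `(5, 7)`.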
At present, the number of Peruvian families living in apartments rather than houses has increased. Apartments have many advantages; in some cases, however, there are problems such as theft of goods commonly left in the parking lot, or the entry of strangers who use tenants' parking spaces (this last problem is sometimes related to kidnappings or robberies inside the building). Because of these problems, we propose a self-driving mini vehicle that uses a deep learning model to implement a license plate monitoring system in the underground garage of a building, with the aim of registering vehicles and identifying whether their owners are tenants or not. In addition, the mini robot has its own localization system based on beacons, which allows it to identify the parking space corresponding to each tenant of the building while the mini vehicle is on its route. Finally, one of the goals of this work is to build a low-cost mini robot that can replace expensive cameras, or work alongside them, to safeguard tenants' property.
Training a good deep learning model usually requires a large amount of annotated data. Since large amounts of labeled data are usually difficult to collect and even harder to annotate, data augmentation and data generation are widely used in training deep neural networks. However, there is little consensus on how much labeled data is needed to obtain satisfactory performance. In this paper, we attempt to address this question using license plate character recognition as an example application. We apply computer graphics scripts and generative adversarial networks to generate and augment a large number of annotated synthetic license plate images with realistic colors, fonts, and character compositions from a small number of real, manually labeled license plate images. The generated and augmented data are mixed and used as training data for a license plate recognition network modified from DenseNet. Experimental results show that the model trained on the mixed generated data generalizes well, and the proposed method achieves new state-of-the-art accuracy on Dataset-1 and AOLP, even when the number of original real license plates is very limited. Moreover, the accuracy improvement brought by data generation becomes more significant as the number of labeled images decreases, while data augmentation plays a more important role as the number of labeled images increases.
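Of the two data sources combined above, the GAN half does not lend itself to a short sketch, but the classical augmentation half does. A minimal NumPy version with assumed jitter parameters (brightness range and noise level are illustrative, not the paper's values):

```python
import numpy as np

def augment(img, rng, brightness=0.2, noise_std=8.0):
    """Return a randomly jittered copy of a grayscale uint8 image:
    global brightness scaling plus additive Gaussian pixel noise."""
    scale = 1.0 + rng.uniform(-brightness, brightness)
    noisy = img.astype(float) * scale + rng.normal(0.0, noise_std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)
plate = np.full((24, 94), 200, dtype=np.uint8)      # stand-in for a plate crop
batch = [augment(plate, rng) for _ in range(4)]
```

Each call yields a distinct training sample with the same label, which is the mechanism by which augmentation stretches a small labeled set.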
Vehicle make and model recognition (MMR) systems provide a fully automatic framework for identifying and classifying different vehicle models. Several approaches have been proposed to address this challenge, but they operate only under restricted conditions. Here, we formulate vehicle make and model recognition as a fine-grained classification problem and propose a new, configurable on-road vehicle make and model recognition framework. We benefit from unsupervised feature learning methods and, in more detail, use the Locality-constrained Linear Coding (LLC) method as a fast feature encoder to encode the input SIFT features. The proposed method can operate in real environments under different conditions. The framework can recognize 50 vehicle models and has the advantage of classifying any other vehicle that does not belong to one of the designated 50 classes as an unknown vehicle. The proposed MMR framework can be configured to be faster or more accurate based on the application domain. The proposed method is evaluated on two datasets: the Iranian on-road vehicle dataset and the CompuCar dataset. The Iranian on-road vehicle dataset contains images of 50 vehicle models captured by traffic cameras under different weather and lighting conditions. Experimental results show that the proposed framework outperforms state-of-the-art methods on the Iranian on-road vehicle dataset. Comparable results on the CompuCar dataset are 97.5% and 98.4%, respectively.
Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies for all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance. Index Terms: behavior understanding and description, fusion of data from multiple cameras, motion detection, personal identification, tracking, visual surveillance.
Over the past decade, computer vision has developed into a key technology for many applications, replacing human supervision. In this paper, we survey vision-based surveillance research on anomaly detection in public places, focusing mainly on roads. First, we revisit the surveys published in this field over the past 10 years. Since the fundamental building block of typical anomaly detection is learning, we put more emphasis on learning methods applied to video scenes. We then summarize the important contributions on anomaly detection over the past six years, focusing mainly on the features, underlying techniques, application scenarios, and anomaly types addressed using single static cameras. Finally, we discuss the challenges in computer-vision-based anomaly detection techniques and some important future possibilities.
Mobile robot vision-based navigation has been the source of countless research contributions, from the domains of both vision and control. Vision is becoming more and more common in applications such as localization, automatic map construction, autonomous navigation, path following, inspection, monitoring, or risky situation detection. This survey presents those pieces of work, from the nineties to the present day, which constitute a wide progress in visual navigation techniques for land, aerial, and autonomous underwater vehicles. The paper deals with two major approaches: map-based navigation and mapless navigation. Map-based navigation has in turn been subdivided into metric map-based navigation and topological map-based navigation. Our outline of mapless navigation includes reactive techniques based on qualitative characteristic extraction, appearance-based localization, optical flow, feature tracking, plane ground detection/tracking, etc. The recent concept of visual sonar has also been reviewed.
This paper presents a survey of the latest methods for moving object detection in video sequences captured by a moving camera. Although many excellent works have reviewed the methods of object detection and background subtraction for a fixed camera, there is no survey that presents a complete review of the existing methods for the moving-camera case. Most methods in this field can be classified into four categories: modelling-based background subtraction, trajectory classification, low-rank and sparse matrix decomposition, and object tracking. We discuss each category in detail and present the main methods that propose improvements to the general concept of each technique. We also present challenges and main concerns in this field, as well as performance metrics and some benchmark databases available for evaluating the performance of different moving object detection algorithms.
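As a point of contrast, the classical fixed-camera baseline that the survey sets out from (background subtraction against a running-average background) fits in a few lines. The learning rate, threshold, and toy frames below are illustrative assumptions:

```python
import numpy as np

def detect_moving(frames, alpha=0.1, thresh=30.0):
    """Classical fixed-camera baseline: maintain a running-average
    background and flag pixels that deviate beyond a threshold."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        f = f.astype(float)
        masks.append(np.abs(f - bg) > thresh)
        bg = (1 - alpha) * bg + alpha * f     # slow background update
    return masks

# A static gray scene with a bright 'object' entering in the later frames.
frames = [np.full((10, 10), 50, dtype=np.uint8) for _ in range(5)]
for f in frames[3:]:
    f[4:7, 4:7] = 220
masks = detect_moving(frames)
```

This is exactly the assumption a moving camera breaks: once the background itself shifts every frame, the per-pixel average is no longer a valid reference, which motivates the four method categories surveyed above.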
In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.
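The Gabor transform used above for local features is built from oriented Gabor kernels: a Gaussian envelope multiplied by a sinusoidal carrier. A sketch of generating one real (even) kernel, with assumed size, scale, and frequency parameters:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, psi=0.0):
    """Real (even) Gabor filter: a Gaussian envelope times a cosine
    carrier oriented at angle theta, as used for local texture features."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

k = gabor_kernel()
```

A feature extractor would convolve the normalized character image with a bank of such kernels at several orientations and scales, then feed the responses to LDA for selection.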
With the fast advancements of AICity and omnipresent street cameras, smart transportation can benefit greatly from actionable insights derived from video analytics. We participated in the NVIDIA AICity Challenge 2018 in all three tracks of challenges. In the Track 1 challenge, we demonstrate automatic traffic flow analysis using the detection and tracking of vehicles with robust speed estimation. In the Track 2 challenge, we develop a reliable anomaly detection pipeline that can recognize abnormal incidents, including stalled vehicles and crashes, with precise locations and time segments. In the Track 3 challenge, we present an early result on vehicle re-identification using deep triplet-loss features that match vehicles across 4 cameras in 15+ hours of videos. All developed methods are evaluated and compared against 30 contesting methods from 70 registered teams on the real-world challenge videos.
In this paper, a new algorithm for vehicle logo recognition on the basis of an enhanced scale-invariant feature transform (SIFT)-based feature-matching scheme is proposed. This algorithm is assessed on a set of 1200 logo images that belong to ten distinctive vehicle manufacturers. A series of experiments is conducted, splitting the 1200 images into a training set and a testing set, respectively. It is shown that the enhanced matching approach proposed in this paper boosts the recognition accuracy compared with the standard SIFT-based feature-matching method. The reported results indicate a high recognition rate for vehicle logos and a fast processing time, making it suitable for real-time applications.
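Standard SIFT-style descriptor matching, which an enhanced scheme like the one above builds on, is commonly done with a nearest-neighbor search plus Lowe's ratio test. A sketch on toy descriptors (the descriptors and the 0.8 ratio are illustrative assumptions; the paper's enhancement itself is not reproduced here):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match descriptors by nearest neighbor, keeping a match only when
    the best distance is clearly smaller than the second best (Lowe's
    ratio test), which suppresses ambiguous matches."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches

rng = np.random.default_rng(3)
logo_a = rng.random((5, 16))                    # query descriptors
noise = rng.normal(0.0, 0.01, (5, 16))
logo_b = np.vstack([logo_a + noise,             # true correspondences
                    rng.random((5, 16))])       # unrelated distractors
matches = ratio_test_matches(logo_a, logo_b)
```

Every query descriptor has a near-duplicate planted at the same index in `logo_b`, so the ratio test recovers all five true correspondences while rejecting nothing spurious.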
This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated and sub-problems are highlighted including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text, multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.
This paper focuses on searching for a particular vehicle that appears in a surveillance network. Existing methods usually assume that the vehicle images are well cropped from the surveillance videos and then use visual attributes, such as color and type, or the license plate number to match the target vehicle in an image set. However, a complete vehicle search system should consider problems such as vehicle detection, representation, indexing, storage, and matching. Moreover, attribute-based search cannot accurately find the same vehicle because of the appearance variation of the same vehicle across different cameras and the highly uncertain environment. In addition, license plates may be misrecognized in surveillance scenes because of low resolution and noise. In this paper, a progressive vehicle search system named PVSS is designed to solve the above problems. PVSS consists of three modules: the crawler, the indexer, and the searcher. The vehicle crawler aims to detect and track vehicles in surveillance videos and transfer the captured vehicle images, metadata, and contextual information to the server or cloud. Then, multi-grained attributes, such as visual features and license plate fingerprints, are extracted and indexed by the vehicle indexer. Finally, a query triplet containing an input vehicle image, a time range, and a spatial range is taken as the input to the vehicle searcher. The target vehicle is searched for in the database through a progressive process. Extensive experiments on a public dataset from a real surveillance network validate the effectiveness of PVSS.
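The progressive, coarse-to-fine search idea can be illustrated with a toy sketch: filter candidates by the spatio-temporal part of the query triplet first (cheap), then rank the survivors by appearance distance (expensive). The record schema and features below are assumptions for illustration, not PVSS's actual design:

```python
import numpy as np

# Toy gallery: each record is (camera_id, timestamp, feature vector).
rng = np.random.default_rng(7)
gallery = [(cam, t, rng.random(8))
           for cam in range(3) for t in range(0, 100, 10)]

def progressive_search(query_feat, t_range, cams, gallery, top_k=3):
    """Progressively narrow the candidates: first filter by the
    spatio-temporal query range, then rank survivors by feature
    distance, mirroring a coarse-to-fine search."""
    t0, t1 = t_range
    candidates = [(c, t, f) for c, t, f in gallery
                  if t0 <= t <= t1 and c in cams]
    ranked = sorted(candidates,
                    key=lambda r: np.linalg.norm(r[2] - query_feat))
    return ranked[:top_k]

target = gallery[5]                    # pretend this record is the query vehicle
hits = progressive_search(target[2], (0, 50), {target[0]}, gallery)
```

Because the target record survives the cheap filters, the feature ranking places it first; in a real system the same structure lets the expensive matching run on a fraction of the database.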
The road network in cities is massive and a key component of mobility. A fast response to defects, which can occur not only because of regular wear but also because of extreme events such as storms, is essential. Therefore, a fast, scalable, and cost-effective automated system is needed to collect information about defects. We propose a city-scale road audit system that leverages some of the recent developments in deep learning and semantic segmentation. To build and benchmark the system, we curated a dataset with the annotations required for road defects. However, many of the labels needed for road audits are highly ambiguous, which we overcome by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, further divides the road surface into defects, tags each frame for defects, and finally localizes the defects on a map collected using GPS. The model is analyzed and evaluated on image tagging as well as on segmentation at different levels of the label hierarchy.