Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NOSCOPE, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, an object to detect, and a reference neural network, NOSCOPE automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NOSCOPE cascades two types of models: specialized models that forgo the full generality of the reference model but faithfully mimic its behavior for the target video and object, and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NOSCOPE uses an efficient cost-based optimizer to search across models and cascades. With this approach, NOSCOPE achieves two- to three-order-of-magnitude speed-ups (265-15,500× real time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
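The cascade described above can be sketched as follows. The mean-pixel difference detector, the confidence thresholds, and the model callables here are illustrative assumptions, not NOSCOPE's actual learned components:

```python
import numpy as np

def cascade_classify(frames, diff_threshold, low, high,
                     specialized_model, reference_model):
    """Label each frame via a NoScope-style cascade (illustrative sketch).

    - A difference detector skips frames nearly identical to the last
      labeled frame, reusing its label.
    - A cheap specialized model answers when confident (score outside
      [low, high]); otherwise the expensive reference model is invoked.
    """
    labels = []
    prev_frame, prev_label = None, None
    for frame in frames:
        # Difference detector: mean absolute pixel change vs. last keyframe.
        if prev_frame is not None and np.abs(frame - prev_frame).mean() < diff_threshold:
            labels.append(prev_label)
            continue
        score = specialized_model(frame)  # cheap proxy for the reference model
        if score <= low:
            label = 0
        elif score >= high:
            label = 1
        else:
            label = reference_model(frame)  # fall back on uncertain frames
        prev_frame, prev_label = frame, label
        labels.append(label)
    return labels
```

In NOSCOPE itself the thresholds are chosen by the cost-based optimizer so that the cascade stays within the target accuracy of the reference network.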
As low-cost surveillance cameras proliferate, we advocate for these cameras to be zero streaming: ingesting videos directly to their local storage and only communicating with the cloud in response to queries. To support queries over videos stored on zero-streaming cameras, we describe a system that spans the cloud and the cameras. The system builds on two unconventional ideas. When ingesting video frames, a camera learns accurate knowledge on a sparse sample of frames, rather than learning inaccurate knowledge on all frames; when executing a query, a camera processes frames in multiple passes with multiple operators trained and picked by the cloud during the query, rather than in one pass with operator(s) decided ahead of the query. On diverse queries over 720-hour videos, and with typical wireless network bandwidth and low-cost camera hardware, our system runs at more than 100× video real time. It outperforms competitive alternative designs by at least 4× and up to two orders of magnitude.
Video cameras are pervasively deployed for security and smart city scenarios, with millions of them in large cities worldwide. Achieving the potential of these cameras requires efficiently analyzing the live videos in real time. We describe VideoStorm, a video analytics system that processes thousands of video analytics queries on live video streams over large clusters. Given the high costs of vision processing, resource management is crucial. We consider two key characteristics of video analytics: resource-quality tradeoff with multi-dimensional configurations, and variety in quality and lag goals. VideoStorm's offline profiler generates each query's resource-quality profile, while its online scheduler allocates resources to queries to maximize performance on quality and lag, in contrast to the commonly used fair sharing of resources in clusters. Deployment on an Azure cluster of 101 machines shows improvement by as much as 80% in quality of real-world queries and 7× better lag, processing video from operational traffic cameras.
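As a rough illustration of scheduling against resource-quality profiles, one can allocate resource units greedily by marginal quality gain. The function names and the greedy rule are assumptions for this sketch; VideoStorm's scheduler additionally accounts for lag goals and multi-dimensional configurations:

```python
def allocate(profiles, capacity, step=1):
    """Greedily allocate `capacity` resource units across queries by
    marginal quality gain (an illustrative stand-in for VideoStorm's
    scheduler, which also handles lag goals).

    profiles: {query: quality_fn}, where quality_fn(r) is the quality
    achieved at resource level r, assumed concave and non-decreasing.
    """
    alloc = {q: 0 for q in profiles}
    for _ in range(int(capacity // step)):
        # Pick the query whose next resource unit buys the most quality.
        best = max(alloc, key=lambda q: profiles[q](alloc[q] + step) - profiles[q](alloc[q]))
        alloc[best] += step
    return alloc
```

With concave profiles this greedy rule is the classic marginal-gain allocation; one query stops receiving units once its quality curve flattens.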
Camera deployments are ubiquitous, but existing methods to analyze video feeds do not scale and are error-prone. We describe Optasia, a dataflow system that employs relational query optimization to efficiently process queries on video feeds from many cameras. Key gains of Optasia result from modularizing vision pipelines in such a manner that relational query optimization can be applied. Specifically, Optasia can (i) de-duplicate the work of common modules, (ii) auto-parallelize the query plans based on the video input size, number of cameras, and operation complexity, and (iii) offer chunk-level parallelism that allows multiple tasks to process the feed of a single camera. Evaluation on traffic videos from a large city on complex vision queries shows high accuracy with many-fold improvements in query completion time and resource usage relative to existing systems.
We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache improves model execution efficiency by exploiting temporal locality in the input video stream. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the model's input, DeepCache discovers temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; inside the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache avoids applying video heuristics to the model's internals, which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires zero developer effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%, and reduces system energy consumption by 20% on average.
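A minimal sketch of the input-side idea is a block-level comparison against a cached frame. This toy version uses a plain per-block pixel difference; DeepCache's actual matching borrows block-motion-search heuristics from video compression and propagates reuse through the network's layers, which this sketch does not capture:

```python
import numpy as np

def reusable_blocks(prev_frame, cur_frame, block, tol):
    """Return a boolean grid marking which blocks of cur_frame are close
    enough to the cached prev_frame that their results can be reused
    (illustrative sketch of DeepCache's input-side locality discovery).
    """
    h, w = cur_frame.shape
    grid = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            a = cur_frame[i * block:(i + 1) * block, j * block:(j + 1) * block]
            b = prev_frame[i * block:(i + 1) * block, j * block:(j + 1) * block]
            grid[i, j] = np.abs(a - b).mean() < tol  # True => reuse cached result
    return grid
```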
The deployment of large camera networks for video analytics is an established and accelerating trend. Many real-world video inference applications share a common problem template: searching for an object or activity of interest (e.g., a person, a speeding vehicle) through a large network of live camera feeds. This capability, termed cross-camera analytics, is compute and data intensive, requiring automated search across cameras and across frames at the throughput of live video streams. To address the cost challenge of processing every raw video frame from a large deployment, we present ReXCam, a new system for efficient cross-camera video analytics. ReXCam exploits spatial and temporal locality in the dynamics of real camera networks to guide its inference-time search for a query identity. In an offline profiling phase, ReXCam builds a cross-camera correlation model that encodes the locality observed in historical traffic patterns. At inference time, ReXCam applies this model to filter out frames that are not spatially and temporally correlated with the query identity's current position. On the occasion of a missed detection, ReXCam performs a fast replay search on recently filtered video frames, enabling graceful recovery. Together, these techniques allow ReXCam to reduce compute workload by 4.6× and improve inference precision by 27% on a well-known video dataset with footage from eight cameras, while remaining within 1-2% of the recall of the baseline.
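The inference-time filtering step might look like the following sketch, where the correlation model is reduced to per-camera reachability windows. This is a simplification invented for illustration; ReXCam's actual model learned from historical traffic patterns is richer:

```python
def filter_frames(frames, correlation, last_seen):
    """Keep only frames spatially and temporally correlated with the query
    identity's last sighting (illustrative sketch of ReXCam's filtering).

    frames:      iterable of (camera_id, timestamp)
    correlation: {source_camera: {reachable_camera: (min_dt, max_dt)}},
                 i.e., plausible travel-time windows between cameras
    last_seen:   (camera_id, timestamp) of the query identity
    """
    cam0, t0 = last_seen
    keep = []
    for cam, t in frames:
        window = correlation.get(cam0, {}).get(cam)
        # Keep the frame only if this camera is reachable from the last
        # sighting and the elapsed time falls inside the learned window.
        if window and window[0] <= t - t0 <= window[1]:
            keep.append((cam, t))
    return keep
```

Frames rejected here are exactly the ones ReXCam can revisit later with its replay search if a detection is missed.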
Holistic medical multimedia systems covering end-to-end functionality from data collection to aided diagnosis are highly needed, but rare. In many hospitals, the potential value of multimedia data collected through routine examinations is not recognized. Moreover, the availability of the data is limited, as the health care personnel may not have direct access to stored data. However, medical specialists interact with multimedia content daily through their everyday work and have an increasing interest in finding ways to use it to facilitate their work processes. In this article, we present a novel, holistic multimedia system aiming to tackle automatic analysis of video from gastrointestinal (GI) endoscopy. The proposed system comprises the whole pipeline, including data collection, processing, analysis, and visualization. It combines filters using machine learning, image recognition, and extraction of global and local image features. The novelty is primarily in this holistic approach and its real-time performance, where we automate a complete algorithmic GI screening process. We built the system in a modular way to make it easily extendable to analyze various abnormalities, and we made it efficient in order to run in real time. The conducted experimental evaluation proves that the detection and localization accuracy are comparable or even better than existing systems, but it is by far leading in terms of real-time performance and efficient resource consumption.
From annotation to computer-aided diagnosis: Detailed evaluation of a medical multimedia system. ACM Trans. Multimedia Comput. Commun., 2017.
Mainstream is a new video analysis system that jointly adapts concurrent applications sharing fixed edge resources to maximize aggregate result quality. Mainstream exploits partial-DNN (deep neural network) compute sharing among applications trained through transfer learning from a common base DNN model, decreasing aggregate per-frame compute time. Based on the available resources and mix of applications running on an edge node, Mainstream automatically determines at deployment time the right trade-off between using more specialized DNNs to improve per-frame accuracy, and keeping more of the unspecialized base model to increase sharing and process more frames per second. Experiments with several datasets and event detection tasks on an edge node confirm that Mainstream improves mean event detection F1-scores by up to 47% relative to a static approach of retraining only the last DNN layer and sharing all others ("Max-Sharing") and by 87× relative to the common approach of using fully independent per-application DNNs ("No-Sharing").
High-quality computer vision models typically address the general problem of understanding the full distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ model distillation (supervising a low-cost student model with the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learning a specialized student model offline on data from the stream, we train the student online on the live video, intermittently running the teacher to provide learning targets. Online model distillation yields semantic segmentation models that approach the accuracy of their Mask R-CNN teacher at 7 to 17× lower inference runtime cost (11 to 26× fewer FLOPs), even when the distribution of the target video is non-stationary. Our method requires no offline pretraining on the target video stream, and achieves higher accuracy and lower cost than solutions based on flow or video object segmentation. We also provide a new video dataset for evaluating the efficiency of inference over long-running video streams.
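The online training loop can be illustrated with a toy scalar "student". The model, learning rate, and teacher period here are invented for illustration; the paper trains a compact segmentation network against Mask R-CNN outputs:

```python
class ScalarStudent:
    """A toy student model: predicts w * x, trained online toward teacher labels."""
    def __init__(self, w=0.0, lr=0.1):
        self.w, self.lr = w, lr

    def predict(self, x):
        return self.w * x

    def step(self, x, target):
        # One SGD step on the squared error (w*x - target)^2.
        self.w -= self.lr * 2 * (self.w * x - target) * x

def online_distill(stream, student, teacher, teacher_period=4):
    """Run the cheap student on every frame; every `teacher_period` frames,
    run the expensive teacher and take one online training step toward its
    output (sketch of the online-distillation loop)."""
    outputs = []
    for t, x in enumerate(stream):
        outputs.append(student.predict(x))
        if t % teacher_period == 0:
            student.step(x, teacher(x))  # intermittent expensive supervision
    return outputs
```

Because supervision arrives continuously, the student tracks the teacher even when the stream's distribution drifts, which is the property the abstract highlights for non-stationary video.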
Although accelerating convolutional neural networks (CNNs) has received increasing research attention, savings in resource consumption typically come at the cost of accuracy. To improve accuracy while reducing resource consumption, we explore a kind of contextual information called class skew, which is easily obtained and widely present in daily life. Since class skews may switch over time, we propose a probability layer to exploit class skew at runtime with no overhead. Moreover, we observe that some class skews will reappear in the future, termed hot class skews, while others will not appear again or will appear only rarely, termed cold class skews. Inspired by source-code optimization techniques, we propose two modes: interpretation and compilation. The interpretation mode performs efficient adaptation to cold class skews at runtime, and the compilation mode aggressively optimizes for hot class skews for more efficient deployment in the future. The aggressive optimization is handled by class-specific pruning and provides additional benefits. Finally, we design a systematic framework, SECS, to dynamically detect class skew, perform interpretation and compilation, and select the most accurate architecture under the runtime resource budget. Extensive evaluation shows that SECS can achieve end-to-end classification speedups of 3× to 11× relative to state-of-the-art convolutional neural networks while maintaining high accuracy.
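The probability-layer idea, reweighting a classifier's outputs by the class skew observed at runtime, can be sketched as a Bayesian rescaling of the softmax outputs. The exact formulation in the paper may differ, and the priors below are illustrative:

```python
import numpy as np

def probability_layer(softmax_out, train_prior, runtime_prior):
    """Reweight softmax outputs by the runtime class skew (sketch).

    Applies p'(c|x) proportional to p(c|x) * runtime_prior(c) / train_prior(c),
    then renormalizes, so classes common in the current context are boosted
    without retraining the network.
    """
    adjusted = softmax_out * runtime_prior / train_prior
    return adjusted / adjusted.sum()
```

Because this is a single element-wise rescaling of the final layer's output, it adds essentially no runtime overhead, which matches the abstract's claim for the probability layer.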
Wearable devices with in-built cameras present interesting opportunities for users to capture various aspects of their daily life and are potentially also useful in supporting users with low vision in their everyday tasks. However, state-of-the-art image wearables available in the market are limited to capturing images periodically and do not provide any real-time analysis of the data that might be useful for the wearers. In this paper, we present DeepEye, a matchbox-sized wearable camera that is capable of running multiple cloud-scale deep learning models locally on the device, thereby enabling rich analysis of the captured images in near real time without offloading them to the cloud. DeepEye is powered by a commodity wearable processor (Snapdragon 410) which ensures its wearable form factor. The software architecture for DeepEye addresses a key limitation with executing multiple deep learning models on constrained hardware, namely their limited runtime memory. We propose a novel inference software pipeline that targets the local execution of multiple deep vision models (specifically, CNNs) by interleaving the execution of computation-heavy convolutional layers with the loading of memory-heavy fully-connected layers. Beyond this core idea, the execution framework incorporates a memory caching scheme and a selective use of model compression techniques that further minimize memory bottlenecks. Through a series of experiments, we show that our execution framework outperforms the baseline approaches significantly in terms of inference latency, memory requirements and energy consumption.
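The interleaving idea can be sketched with a background loader thread: while the compute-heavy convolutional layers run on the current input, the memory-heavy fully-connected weights are loaded in parallel. The function names are assumptions for this sketch; DeepEye's real pipeline schedules this across multiple models:

```python
import threading
import queue

def interleaved_inference(conv_fns, load_fc, run_fc, frame):
    """Sketch of DeepEye-style interleaving: overlap FC-weight loading
    (memory-bound) with convolutional execution (compute-bound)."""
    fc_box = queue.Queue(maxsize=1)
    loader = threading.Thread(target=lambda: fc_box.put(load_fc()))
    loader.start()                  # begin loading FC weights in the background
    x = frame
    for conv in conv_fns:           # meanwhile, run the conv layers
        x = conv(x)
    loader.join()                   # FC weights are ready by (or soon after) now
    return run_fc(fc_box.get(), x)  # finish inference with the loaded FC layers
```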
Deep neural networks (DNNs) are currently widely used in many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs, improving energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost, are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of recent advances toward the goal of efficient processing of DNNs. Specifically, it provides an overview of DNNs, discusses various hardware platforms and architectures that support DNNs, and highlights key trends in reducing the computation cost of DNNs, either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It also summarizes various development resources that enable researchers and practitioners to quickly get started in this field, and highlights important benchmarking metrics and design considerations for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, proposed in academia and industry. The reader will take away the following concepts from this article: understanding the key design considerations for DNNs; being able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understanding the trade-offs between various hardware architectures and platforms; being able to evaluate the utility of various DNN design techniques for efficient processing; and understanding recent implementation trends and opportunities.
Unmanned aerial vehicles (UAVs) are emerging as a promising technology for environmental and infrastructure monitoring, with wide applicability across application domains. Many such applications require the use of computer vision algorithms to analyze the information captured from on-board cameras; these applications include detecting vehicles for emergency response and traffic monitoring. This paper therefore explores the trade-offs involved in developing a single-shot object detector, based on deep convolutional neural networks (CNNs), that enables UAVs to perform vehicle detection under resource-constrained conditions such as those on board a UAV. The paper presents a holistic approach to designing such systems: the data collection and training stages, the CNN architecture, and the optimizations necessary to efficiently map the CNN onto a lightweight embedded processing platform suitable for deployment on UAVs. Through the analysis, we propose a CNN architecture that is capable of detecting vehicles in aerial UAV imagery and can run at 5-18 frames per second on a variety of platforms, with an overall accuracy of ~95%. Overall, the proposed architecture is suitable for UAV applications, making use of low-power embedded processors that can be deployed on commercial UAVs.
Internet-enabled cameras pervade daily life, generating a huge amount of data, but most of the video they generate is transmitted over wires and analyzed offline with a human in the loop. The ubiquity of cameras limits the amount of video that can be sent to the cloud, especially on wireless networks where capacity is at a premium. In this paper, we present Vigil, a real-time distributed wireless surveillance system that leverages edge computing to support real-time tracking and surveillance in enterprise campuses, retail stores, and across smart cities. Vigil intelligently partitions video processing between edge computing nodes co-located with cameras and the cloud to save wireless capacity, which can then be dedicated to Wi-Fi hotspots, offsetting their cost. Novel video frame prioritization and traffic scheduling algorithms further optimize Vigil's bandwidth utilization. We have deployed Vigil across three sites in both whitespace and Wi-Fi networks. Depending on the level of activity in the scene, experimental results show that Vigil allows a video surveillance system to support a geographical area of coverage between five and 200 times greater than an approach that simply streams video over the wireless network. For a fixed region of coverage and bandwidth, Vigil outperforms the default equal throughput allocation strategy of Wi-Fi by delivering up to 25% more objects relevant to a user's query.
Due to recent advances in digital technologies and the availability of credible data, an area of artificial intelligence, deep learning, has emerged and has demonstrated its ability and effectiveness in solving complex learning problems. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth, which makes general-purpose CPUs unable to achieve the desired performance levels. Consequently, hardware accelerators using application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and graphics processing units (GPUs) have been employed to improve the throughput of CNNs. More precisely, FPGAs have recently been adopted to accelerate the implementation of deep learning networks due to their ability to maximize parallelism as well as their energy efficiency. In this paper, we review existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques to improve acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNN acceleration. The techniques investigated in this paper represent recent trends in FPGA-based accelerators for deep learning networks. Thus, this review is expected to direct future advances in efficient hardware accelerators and to be useful for deep learning researchers.
Driven by advances in computer vision and the falling cost of camera hardware, organizations are deploying video cameras en masse for spatial monitoring of their physical premises. Scaling video analytics to massive camera deployments, however, presents a new challenge, as the cost of computation grows proportionally with the number of camera feeds. This paper is driven by a simple question: can we scale video analytics so that the cost of processing grows sublinearly, or even remains constant, as we deploy more cameras, while the inference accuracy remains stable or even improves? We believe the answer is yes. Our key observation is that video feeds from wide-area camera deployments demonstrate significant content correlations, both in space and over time (e.g., with other geographically proximate feeds). These spatio-temporal correlations can be harnessed to considerably reduce the size of the inference search space, decreasing both the workload and the false-positive rate in multi-camera video analytics. By discussing use cases and technical challenges, we propose a roadmap for scaling video analytics to large camera networks, and outline a plan for its realization.
In the era of the Internet of Things (IoT), an enormous amount of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Based on the nature of the application, these devices will result in big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life improving technology. In this paper, we provide a thorough overview of using a class of advanced machine learning techniques, namely Deep Learning (DL), to facilitate the analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics is then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that leveraged DL in the IoT domain. The smart IoT devices that have incorporated DL in their intelligence background are also discussed. DL implementation approaches on the fog and cloud centers in support of IoT applications are also surveyed. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and review of the recent literature.
In intelligent transportation systems, real-time systems that monitor and analyze road users become increasingly critical as we move toward the era of smart cities. Vision-based frameworks for object detection, multi-object tracking, and traffic near-accident detection are important applications of intelligent transportation systems, particularly in video surveillance and related fields. Although deep neural networks have recently achieved great success in many computer vision tasks, a unified framework for all three of these tasks remains challenging, given the demands of real-time performance, complex urban environments, highly dynamic traffic events, and the many kinds of traffic movements. In this paper, we propose a two-stream convolutional network architecture that performs real-time detection, tracking, and near-accident detection of road users in traffic video data. The two-stream model consists of a spatial stream network for object detection and a temporal stream network that leverages motion features for multi-object tracking. We detect near-accidents by combining the appearance features and motion features from the two-stream networks. Using aerial videos, we propose a Traffic Near-Accident Dataset (TNAD) covering various types of traffic interactions that is suitable for vision-based traffic analysis tasks. Our experiments demonstrate the advantages of our framework, which delivers overall competitive qualitative and quantitative performance at high frame rates on the TNAD dataset.
MLaaS (ML-as-a-Service) offerings by cloud computing platforms have been gaining popularity recently. Pre-trained machine learning models are deployed in the cloud to support prediction-based applications and services. To achieve higher throughput, incoming requests are served by running multiple replicas of the model on different machines concurrently. The incidence of straggler nodes in distributed inference is a significant concern, since stragglers can increase inference latency and violate the service's SLOs. In this paper, we propose a novel coded-inference model to deal with stragglers in distributed image classification. We propose a modified single-shot object detection model, the Collage-CNN, to provide the necessary resilience efficiently. A Collage-CNN model takes as input a collage image formed by combining multiple images and performs multi-image classification in a single shot. We generate custom training data using images from standard image classification datasets and train the model to achieve high classification accuracy. Deploying the Collage-CNN model in the cloud, we demonstrate that the 99th-percentile latency can be reduced by 1.45× to 2.46× compared to replication-based approaches, without degrading prediction accuracy.
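The encoding step, tiling several requests' images into a single backup input, can be sketched as below. The grid layout and helper name are assumptions for illustration; the deployed model is a single-shot detector trained on such collages, and decoding maps each predicted cell back to its source request:

```python
import numpy as np

def make_collage(images, grid):
    """Tile grid x grid equally-sized images into one collage image so a
    single backup inference can cover all of them (sketch of the coded
    inference encoding step)."""
    n = grid * grid
    assert len(images) == n, "need exactly grid*grid images"
    rows = []
    for r in range(grid):
        # Concatenate one row of images side by side, then stack the rows.
        rows.append(np.concatenate(images[r * grid:(r + 1) * grid], axis=1))
    return np.concatenate(rows, axis=0)
```

If any of the n primary replicas straggles, the classification of its image can be recovered from the single Collage-CNN pass over this combined input.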
The ability to detect pedestrians and other moving objects is crucial for an autonomous vehicle, and this must be done in real time with minimal system overhead. This paper discusses the implementation of a surround-view system to identify both moving and static objects that are close to the ego vehicle. The algorithm operates on 4 views captured by fisheye cameras, which are merged into a single frame. The moving-object detection and tracking solution uses minimal system overhead to isolate regions of interest (ROIs) containing moving objects. These ROIs are then analyzed using a deep neural network (DNN) to classify the moving objects. Through deployment and testing on a real car in urban environments, we demonstrate the practical feasibility of the solution. Video demonstrations of our algorithm have been uploaded to YouTube: https://youtu.be/vpoCfC724iA, https://youtu.be/2X4aqH2bMBs