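We introduce PyTorchVideo, an open-source deep learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools, including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration that enables real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework; for example, PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/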
translated by 谷歌翻译
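As the computational requirements of machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging. While computational needs have driven recent compiler, networking, and hardware advancements, machine learning tools have been slower to take advantage of those advancements. This is in part due to the difficulty of prototyping new computational paradigms with existing frameworks. Large frameworks prioritize machine learning researchers and practitioners as end users and pay comparatively little attention to the systems researchers who can push frameworks forward; we argue that both are equally important stakeholders. We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups. Flashlight allows systems researchers to rapidly prototype and experiment with novel ideas in machine learning computation, and it has low overhead, competing with and often outperforming other popular machine learning frameworks. We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together. Flashlight is available at https://github.com/flashlight/flashlight.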
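Advances in machine learning have opened new opportunities to bring intelligence to low-end Internet nodes such as microcontrollers. Conventional machine learning deployments have a high memory and compute footprint, which hinders their direct deployment on ultra-resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budget stays within the device limits while still maintaining the desired performance. We characterize a widely applicable closed-loop workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify open research challenges and unsolved questions that demand careful consideration moving forward.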
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
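Benefiting from expanding cloud infrastructure, deep neural networks (DNNs) today achieve increasingly high performance when trained in the cloud. Researchers spend months of effort competing for a few extra percentage points of model accuracy. However, when these models are actually deployed on edge devices in practice, performance can often drop abruptly by more than 10% for no obvious reason. The key challenge is that there is little visibility into ML inference execution on edge devices and very little awareness of potential issues during the edge deployment process. We present ML-EXray, an end-to-end framework that provides visibility into layer-level details of ML execution and helps developers analyze and debug cloud-to-edge deployment issues. More often than not, the reason for sub-optimal edge performance lies not only in the model itself but in every operation throughout the data flow and the deployment process. Evaluations show that ML-EXray can effectively catch deployment issues such as preprocessing bugs, quantization issues, and suboptimal kernels. Using ML-EXray, users need to write fewer than 15 lines of code to fully examine the edge deployment pipeline. By eliminating these issues, ML-EXray can correct model performance by up to 30%, pinpoint error-prone layers, and guide users to optimize kernel execution latency by two orders of magnitude. Code and APIs will be released as an open-source multi-lingual instrumentation library and a Python deployment validation library.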
Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.
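Machine learning sensors represent a paradigm shift for the future of embedded machine learning applications. Current instantiations of embedded machine learning (ML) suffer from complex integration, a lack of modularity, and privacy and security concerns arising from data movement. This paper proposes a data-centric paradigm for embedding sensor intelligence on edge devices to address these challenges. Our vision for "sensor 2.0" entails segregating sensor input data and ML processing from the wider system at the hardware level and providing a thin interface that mimics the functionality of traditional sensors. This separation leads to a modular and easy-to-use ML sensor device. We discuss the challenges presented by the standard approach of building ML processing into the software stack of the controlling microprocessor on an embedded system, and how the modularity of ML sensors alleviates these problems. ML sensors increase privacy and accuracy while making it easier for system builders to integrate ML into their products as a simple component. We provide examples of prospective ML sensors and an illustrative datasheet as a demonstration, and we hope that this will start a dialogue that moves us toward sensor 2.0.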
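Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast on-device DL inference, DL libraries play as critical a role as algorithms and hardware do. Unfortunately, no prior work has dived deep into the ecosystem of modern DL libraries and provided quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libraries and 15 diverse DL models. We then perform extensive experiments on 10 mobile devices, which help reveal a complete landscape of the current mobile DL library ecosystem. For example, we find that the best-performing DL library is severely fragmented across different models and hardware, and the gap between DL libraries can be rather large. In fact, the impact of DL libraries can overwhelm the optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, building on these observations, we summarize practical implications for the different roles in the DL library ecosystem.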
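There is a growing demand for shifting the delivery of AI capability from data centers in the cloud to edge or end devices, exemplified by the fast-emerging real-time AI-based applications running on smartphones, AR/VR devices, autonomous vehicles, and various IoT devices. The shift has, however, been seriously hampered by the large and growing gap between DNN computing demands and the computing power available on edge or end devices. This paper presents the design of XGen, an optimizing framework for DNNs designed to bridge the gap. XGen takes cross-cutting co-design as its first-order consideration. Its full-stack, AI-oriented optimizations consist of a number of innovative optimizations at every layer of the DNN software stack, all designed in a cooperative manner. This unique technology enables XGen to optimize various DNNs, including those with extreme depth (e.g., BERT, GPT, and other transformers), and to generate code that runs several times faster than code from existing DNN frameworks while delivering the same level of accuracy.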
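Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning takes over common analytical workflows, traditional data lakes become less useful for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) the Tensor Query Language, (b) an in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and integrate with numerous MLOps tools.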
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms, such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs), requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
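This paper presents a state-of-the-art overview of how to architect, design, and optimize deep neural networks (DNNs) so that performance improves while accuracy is preserved. The paper covers a set of optimizations that span the entire machine learning processing pipeline. We introduce two types of optimizations: the first alters the DNN model and requires retraining, while the second does not. We focus on GPU optimizations, but we believe the presented techniques can be used with other AI inference platforms. To demonstrate the DNN model optimizations, we improve RAFT (arXiv:2003.12039), one of the state-of-the-art deep network architectures for optical flow, on a popular edge AI inference platform (Nvidia Jetson AGX Xavier).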
The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNNs while maintaining the complexity of 2D CNNs. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves low latencies of 13 ms and 35 ms for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.
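The channel shift described in the TSM abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `temporal_shift`, the `fold_div` parameter, the zero padding at the clip boundaries, and the omission of spatial dimensions are all assumptions made to keep the example small. The abstract only states that part of the channels is shifted along the temporal dimension.

```python
def temporal_shift(frames, fold_div=4):
    """Shift a fraction of channels along the time axis (illustrative only).

    frames:   list of T frames, each a list of C channel values
              (spatial dimensions are omitted to keep the sketch small).
    fold_div: hypothetical knob; C // fold_div channels move in each direction.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    # Start from zeros so positions without a neighbouring frame stay zero-padded.
    out = [[0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # first group: frame t receives the value from frame t + 1
            elif c < 2 * fold:
                src = t - 1   # second group: frame t receives the value from frame t - 1
            else:
                src = t       # remaining channels stay in place
            if 0 <= src < num_frames:
                out[t][c] = frames[src][c]
    return out


# Three frames with four channels each; the value 10 * t + c encodes (frame, channel).
frames = [[10 * t + c for c in range(4)] for t in range(3)]
shifted = temporal_shift(frames, fold_div=4)
```

The sketch only copies values between frames: it performs no multiplications and has no learnable weights, which is the sense in which the abstract describes temporal modeling "at zero computation and zero parameters".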
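We introduce LAVIS, an open-source deep learning library for language-vision research and applications. LAVIS aims to serve as a one-stop comprehensive library that makes recent advances in the language-vision field accessible to researchers and practitioners and that empowers future research and development. It features a unified interface for easy access to state-of-the-art image-language and video-language models and common datasets. LAVIS supports training, evaluation, and benchmarking on a rich variety of tasks, including multimodal classification, retrieval, captioning, visual question answering, dialogue, and pre-training. At the same time, the library is highly extensible and configurable, facilitating future development and customization. In this technical report, we describe the design principles, key components, and functionalities of the library, and present benchmarking results across common language-vision tasks. The library is available at: https://github.com/salesforce/lavis.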
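Deep neural networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and other areas. However, DNN-based ML applications also bring much higher computational and storage requirements, which are particularly challenging for embedded systems with limited compute and storage resources, tight power budgets, and small form factors. Challenges also come from diverse application-specific requirements, including real-time response, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methods, including efficient ML model designs, customized hardware accelerator designs, and hardware/software co-design strategies, to enable efficient ML applications on embedded systems.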
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
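To make the idea of a dataflow graph that holds computation, shared state, and state-mutating operations more concrete, here is a toy pure-Python sketch. It does not use or imitate TensorFlow's API; `Variable`, `Node`, `run`, and `apply_update` are hypothetical names invented for this illustration, and the update rule is an arbitrary example.

```python
class Variable:
    """Shared mutable state that persists across graph executions."""
    def __init__(self, value):
        self.value = value


class Node:
    """One operation in the graph; its edges are the `inputs` it depends on."""
    def __init__(self, fn, inputs=()):
        self.fn = fn
        self.inputs = tuple(inputs)


def run(node, cache=None):
    """Evaluate `node` after its inputs, computing each node at most once per run."""
    cache = {} if cache is None else cache
    if node not in cache:
        args = [run(dep, cache) for dep in node.inputs]
        cache[node] = node.fn(*args)
    return cache[node]


weight = Variable(1.0)

# Graph: read the state, derive a gradient-like value, then mutate the state.
read_w = Node(lambda: weight.value)
grad = Node(lambda w: 2.0 * w, [read_w])


def apply_update(w, g):
    # The mutation is itself a graph operation, not something built into `run`.
    weight.value = w - 0.1 * g
    return weight.value


update = Node(apply_update, [read_w, grad])

first = run(update)   # reads 1.0, writes the updated value back to `weight`
second = run(update)  # a second execution starts from the mutated state
```

Because the update is an ordinary node rather than logic hard-wired into the executor, swapping in a different update rule only requires building a different graph, which loosely mirrors the flexibility the abstract contrasts with "parameter server" designs.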