The significantly improved performance of deep neural networks across a range of tasks comes with an ever-growing demand for computational resources, which renders deployment on low-resource devices (with limited memory and battery power) infeasible. Binary neural networks (BNNs) tackle this problem through extreme gains in compression and acceleration compared to real-valued models. We propose a simple but effective method to accelerate inference by unifying BNNs with an early-exit strategy. Our approach allows simple instances to exit early based on a decision threshold, and it uses output layers attached to different intermediate layers to avoid executing the entire binary model. We extensively evaluate our method on three audio classification tasks and across four BNN architectures. Our method demonstrates favorable quality-efficiency trade-offs while being controllable through an entropy-based threshold specified by the system user. Built on existing BNN architectures, it also achieves better speed-ups (latencies of less than 6 ms) from a single model, without requiring individual models for different efficiency levels. Finally, it provides a straightforward way to estimate sample difficulty and to better understand the uncertainty around certain classes within the dataset.
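A minimal sketch of the entropy-based exit rule described above (not the paper's code): the entropy normalization, the threshold value, and the `exit_heads`/`features` interface are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def normalized_entropy(probs, eps=1e-12):
    # Entropy scaled to [0, 1], so one threshold works for any number of classes.
    return -np.sum(probs * np.log(probs + eps)) / np.log(len(probs))

def early_exit_predict(exit_heads, features, threshold=0.3):
    """Query the output layers attached at increasing depths and stop as soon
    as the normalized prediction entropy falls below the user-given threshold.
    The last head corresponds to the full binary model and always answers."""
    for head, feat in zip(exit_heads, features):
        probs = softmax(head(feat))
        if normalized_entropy(probs) < threshold:
            break                      # confident enough: exit early
    return int(np.argmax(probs))
```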
This work introduces BRILLsson, a novel binary-neural-network-based representation learning model for a broad range of non-semantic speech tasks. We train the model with knowledge distillation from a large and real-valued TRILLsson model, using only a small fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2 MB in size with a latency of less than 8 ms, making them suitable for deployment in low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including but not limited to spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting), and demonstrate that our proposed ultra-light and low-latency models perform on par with large models.
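As a rough illustration of this kind of distillation setup (not the authors' exact recipe), the small student can be trained to regress the frozen teacher's embeddings; the L2 loss and the `student`/`teacher` callables here are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, audio_batch, optimizer):
    """One training step: fit the small student's embeddings to the frozen
    teacher's embeddings computed on the same audio batch."""
    teacher.eval()
    with torch.no_grad():
        target = teacher(audio_batch)      # teacher embedding, no gradients
    pred = student(audio_batch)            # student embedding
    loss = F.mse_loss(pred, target)        # simple L2 distillation objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```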
Deep neural networks have long training and processing times. Early exits added to neural networks allow the network to make early predictions using intermediate activations in the network in time-sensitive applications. However, early exits increase the training time of the neural networks. We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks. QuickNets are trained in a layer-wise manner such that each successive layer is only trained on samples that could not be correctly classified by the previous layers. We demonstrate that QuickNets can dynamically distribute learning and have a reduced training cost and inference cost compared to standard Backpropagation. Additionally, we introduce commitment layers that significantly improve the early exits by identifying over-confident predictions, and we demonstrate their success.
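A compact sketch of the cascaded idea (not the QuickNets implementation): each successive stage is trained only on the samples that its predecessors misclassified. The `train_stage` and `predict_stage` callables are placeholders.

```python
import numpy as np

def cascaded_training(stages, X, y, train_stage, predict_stage):
    """Train stages in order; each stage sees only the samples every earlier
    stage got wrong, so easy samples never reach the deeper, costlier stages."""
    remaining = np.arange(len(X))
    for stage in stages:
        if remaining.size == 0:
            break                                     # everything already solved
        train_stage(stage, X[remaining], y[remaining])
        preds = predict_stage(stage, X[remaining])
        remaining = remaining[preds != y[remaining]]  # pass failures onward
    return stages
```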
The problem of reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early-exit methods strive toward this goal by attaching additional internal classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its prediction is discarded and its computation is effectively wasted. To address this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses the predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves significantly better accuracy-versus-inference-time trade-offs than other recently proposed early-exit methods.
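The cascading and ensembling of internal classifiers can be sketched as follows (a simplified view, not the authors' code); how each IC consumes its predecessor's output and the weighting scheme are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def ztw_style_inference(internal_classifiers, features, weights, threshold=0.9):
    """Each IC receives its own features plus the running combination of the
    earlier ICs' outputs, so no predecessor computation is thrown away."""
    combined = None
    for ic, feat, w in zip(internal_classifiers, features, weights):
        probs = softmax(ic(feat, combined))   # cascade: reuse predecessor outputs
        combined = probs if combined is None else (1 - w) * combined + w * probs
        if combined.max() > threshold:        # the ensemble is confident: exit
            break
    return int(np.argmax(combined))
```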
Deploying machine learning (ML) on milliwatt-scale edge devices (TinyML) is gaining popularity due to recent breakthroughs in ML and the Internet of Things (IoT). However, the capabilities of TinyML are restricted by strict power and compute constraints. Most contemporary research in TinyML focuses on model compression techniques, such as model pruning and quantization, to fit ML models on low-end devices. Nevertheless, the improvements in energy consumption and inference time obtained by existing techniques are limited, because aggressive compression quickly shrinks model capacity and accuracy. Another approach to improving inference time and/or reducing power while preserving model capacity is through early-exit networks. These networks place intermediate classifiers along a baseline neural network that facilitate early exit from the computation if an intermediate classifier exhibits sufficient confidence in its prediction. Previous work on early-exit networks has focused on large networks, beyond what is typically used for TinyML applications. In this paper, we discuss the challenges of adding early exits to state-of-the-art tiny CNNs and devise an early-exit architecture, T-RECX, that addresses these challenges. In addition, we develop a method to alleviate the effect of network overthinking at the final exit by leveraging the high-level representations learned by the early exit. We evaluate T-RECX on three CNNs from the MLPerf Tiny benchmark suite, covering image classification, keyword spotting, and visual wake word detection tasks. Our results show that T-RECX improves the accuracy of the baseline networks and significantly reduces the average inference time of tiny CNNs. T-RECX achieves a 32.58% average reduction in FLOPS in exchange for 1% accuracy across all evaluated models. Moreover, our technique improves the accuracy of the baseline network in two of the three models we evaluated.
By exploiting the diversity of data samples, early-exit networks have recently emerged as a prominent neural network architecture for accelerating deep learning inference. However, the intermediate classifiers of early exits introduce additional computational overhead, which is unfavorable for resource-constrained edge artificial intelligence (AI). In this paper, we propose an early-exit prediction mechanism to reduce the on-device computational overhead in a device-edge co-inference system supported by early-exit networks. Specifically, we design a low-complexity module, namely the exit predictor, to guide some distinctly 'hard' samples to bypass the computation of the early exits. Furthermore, considering varying communication bandwidths, we extend the early-exit prediction mechanism to latency-aware edge inference, which adapts the prediction thresholds of the exit predictor and the confidence thresholds of the early-exit network via a few simple regression models. Extensive experimental results demonstrate that the exit predictor strikes a better trade-off between accuracy and on-device computational overhead for early-exit networks. Moreover, compared with baseline methods, the proposed method for latency-aware edge inference attains higher inference accuracy under different bandwidth conditions.
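A minimal sketch of the routing logic (the paper's exit predictor is a learned low-complexity module; the callables and thresholds below are placeholders).

```python
import numpy as np

def co_inference(exit_predictor, early_head, offload_to_edge, x,
                 hardness_threshold=0.5, confidence_threshold=0.8):
    """Route predicted-'hard' samples straight past the early exit so that its
    on-device computation is not wasted; otherwise try to exit locally first."""
    if exit_predictor(x) > hardness_threshold:       # predicted hard: bypass exit
        return int(np.argmax(offload_to_edge(x)))
    probs = early_head(x)
    if probs.max() > confidence_threshold:           # confident early exit on device
        return int(np.argmax(probs))
    return int(np.argmax(offload_to_edge(x)))        # fall back to the edge server
```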
Human activity recognition (HAR) based on inertial data is an increasingly diffused task on embedded devices, from smartphones to ultra-low-power sensors. Because of the high computational complexity of deep learning models, most embedded HAR systems are based on simpler and less accurate classic machine learning algorithms. This work bridges the gap between on-device HAR and deep learning by proposing a set of efficient one-dimensional convolutional neural networks (CNNs) deployable on general-purpose microcontrollers (MCUs). Our CNNs are obtained by combining hyper-parameter optimization with sub-byte and mixed-precision quantization, to find good trade-offs between classification results and memory occupation. Moreover, we also leverage adaptive inference as an orthogonal optimization to tune the inference complexity at runtime based on the processed input, yielding a more flexible HAR system. With experiments on four datasets, targeting an ultra-low-power RISC-V MCU, we show that (i) we obtain a rich set of Pareto-optimal CNNs for HAR, spanning more than one order of magnitude in memory, latency, and energy consumption; (ii) thanks to adaptive inference, we can derive more than 20 runtime operating modes from a single CNN, differing by up to 10% in classification scores and by more than 3x in inference complexity, with limited memory overhead; (iii) on three of the four benchmarks, we outperform all previous deep learning methods while reducing the memory occupation by more than 100x, and the few methods that obtain better performance (both shallow and deep) are not compatible with MCU deployment; (iv) all of our CNNs are compatible with real-time on-device HAR, with inference latencies below 16 ms; their memory occupation varies in 0.05-23.17 kB and their energy consumption in 0.005-61.59 µJ, enabling years of continuous operation on a small battery.
Semantic segmentation is the backbone of many vision systems, spanning from self-driving cars and robot navigation to augmented reality and telecommunications. Frequently operating under strict latency constraints within a limited resource envelope, optimizing for efficient execution becomes important. At the same time, the heterogeneous capabilities of target platforms and the diverse constraints of different applications require the design and training of multiple target-specific segmentation models, leading to excessive maintenance costs. To this end, we propose a framework for converting state-of-the-art segmentation CNNs into Multi-Exit Semantic Segmentation (MESS) networks: specially trained models that employ parametrized early exits along their depth to i) dynamically save computation during inference on easier samples and ii) save training and maintenance costs by offering a customizable speed-accuracy trade-off. Designing and training such networks naively can harm performance. Thus, we propose a novel two-stage training scheme for multi-exit networks. Furthermore, the parametrization of MESS allows the number, placement, and architecture of the attached segmentation heads, as well as the exit policy, to be co-optimized for the deployment scenario via exhaustive search in less than 1 GPU-hour. This enables MESS to rapidly adapt to the device capabilities and application requirements of each target use case, offering a train-once, deploy-everywhere solution. Compared to the original backbone network, MESS variants achieve latency gains of up to 2.83x at the same accuracy, or up to 5.33 pp higher accuracy at the same computational budget. Finally, MESS provides orders-of-magnitude faster architecture selection compared to state-of-the-art techniques.
Deep neural networks are state of the art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep network architecture that is augmented with additional side branch classifiers. The architecture allows prediction results for a large portion of test samples to exit the network early via these branches when samples can already be inferred with high confidence. BranchyNet exploits the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points. For more difficult samples, which are expected less frequently, BranchyNet will use further or all network layers to provide the best likelihood of correct prediction. We study the BranchyNet architecture using several well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR10) and show that it can both improve accuracy and significantly reduce the inference time of the network.
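BranchyNet trains all exits jointly with a weighted sum of per-branch losses; a short sketch follows (the branch weights are illustrative), while its entropy-based exit rule at inference resembles the sketch given after the first abstract above.

```python
import torch
import torch.nn.functional as F

def branchy_joint_loss(branch_logits, labels, weights=(1.0, 0.5, 0.3)):
    """Weighted sum of the cross-entropy losses of every side branch and the
    final exit, so that all classifiers are optimized together."""
    return sum(w * F.cross_entropy(logits, labels)
               for w, logits in zip(weights, branch_logits))
```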
Accurate, efficient, and real-time recognition of large amounts of spectral data with the help of deep learning and IoT technology is currently a hot research topic. Deep neural networks play a key role in spectral analysis. However, the inference of deeper models is performed in a static manner and cannot be adjusted to the device. Not all samples need to be allocated the full amount of computation to reach a confident prediction, which hinders maximizing overall performance. To address the above issues, we propose a framework for spectral data classification with adaptive inference. Specifically, to allocate different computation to different samples while better exploiting the collaboration between different devices, we leverage an early-exit architecture that places intermediate classifiers at different depths of the network and outputs the result as soon as the prediction confidence reaches a preset threshold. We propose a self-distillation training paradigm in which the deepest classifier provides soft supervision to the shallower ones, to maximize their performance and training speed. Meanwhile, to mitigate the sensitivity of performance to the placement and number of intermediate classifiers in the early-exit paradigm, we propose an adaptive residual network. It can adjust the number of layers in each block at different positions of the spectral curve, so that it focuses on important positions of the curve (e.g., Raman peaks) and accurately allocates an appropriate computational budget according to task performance and computing resources. To the best of our knowledge, this paper is the first attempt to optimize spectral detection on IoT platforms via adaptive inference. We conduct numerous experiments, and the results show that our proposed method achieves higher performance than existing methods under the same computational budget.
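The self-distillation objective described above can be sketched as follows; the temperature, the weighting, and the exact loss form are assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(shallow_logits_list, deep_logits, labels,
                           temperature=3.0, alpha=0.5):
    """The deepest classifier is trained on the hard labels and provides
    softened targets that supervise every shallower (early-exit) classifier."""
    soft_targets = F.softmax(deep_logits.detach() / temperature, dim=-1)
    loss = F.cross_entropy(deep_logits, labels)
    for logits in shallow_logits_list:
        kd = F.kl_div(F.log_softmax(logits / temperature, dim=-1),
                      soft_targets, reduction="batchmean") * temperature ** 2
        loss = loss + alpha * kd + (1 - alpha) * F.cross_entropy(logits, labels)
    return loss
```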
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
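As a concrete instance of one of the surveyed techniques (unstructured magnitude pruning; not the article's own code), the sketch below zeroes the globally smallest weights of a PyTorch model; the sparsity ratio is arbitrary.

```python
import torch

def magnitude_prune(model, sparsity=0.5):
    """Zero the globally smallest-magnitude weights of all weight matrices and
    convolution kernels; biases and other 1-D parameters are left untouched."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    all_vals = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(int(sparsity * all_vals.numel()), 1)
    threshold = all_vals.kthvalue(k).values              # k-th smallest magnitude
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).to(w.dtype))    # keep only larger weights
    return model
```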
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing entire DNNs on mobile devices can quickly deplete their batteries. Although task offloading to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network, and edge server load can lead to significant delays in task execution. Recently, approaches based on split computing (SC) have been proposed, in which the DNN is split into a head model executed on the mobile device and a tail model executed on the edge server. Ultimately, this can reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models that present multiple 'exits' across the architecture, each providing an increasingly higher target accuracy. Therefore, the trade-off between accuracy and delay can be tuned according to the current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by providing a set of compelling research challenges.
Spatial redundancy widely exists in visual recognition tasks, i.e., the discriminative features in an image or video frame usually correspond to only a subset of the pixels, while the remaining regions are irrelevant to the task at hand. Therefore, static models that process all pixels with an equal amount of computation incur considerable redundancy in terms of time and space consumption. In this paper, we formulate image recognition as a sequential coarse-to-fine feature learning process, mimicking the human visual system. Specifically, the proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution, and then strategically attends to a series of salient (small) regions to learn finer features. The sequential process naturally facilitates adaptive inference at test time, since it can be terminated as soon as the model is sufficiently confident in its prediction, avoiding further redundant computation. Notably, the problem of locating discriminative regions in our model is formulated as a reinforcement learning task, and thus requires no manual annotations other than classification labels. GFNet is general and flexible, as it is compatible with any off-the-shelf backbone models (such as MobileNets, EfficientNets, and TSM), which can be conveniently deployed as the feature extractor. Extensive experiments on a variety of image classification and video recognition tasks, and with various backbone models, demonstrate the remarkable efficiency of our method. For example, it reduces the average latency of the highly efficient MobileNet-V3 by 1.3x without sacrificing accuracy. Code and pre-trained models are available at https://github.com/blackfeather-wang/gfnet-pytorch.
Deep neural networks (DNNs) have become an essential component of many application domains, including web-based services. These services require high throughput and (near) real-time features, for instance, to respond or react to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is toward larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, the computational complexity of such large models remains relatively significant, hindering low inference latency. Implementing a caching mechanism is a typical systems engineering solution for speeding up service response times. However, traditional caching is often unsuitable for DNN-based services. In this paper, we propose an end-to-end automated solution to improve the performance of DNN-based services in terms of computational complexity and inference latency. Our caching method adopts the ideas of self-distillation of DNN models and early exits. The proposed solution is an automated online layer-caching mechanism that allows a large model to exit early at inference time if the cache model in one of the early exits is sufficiently confident. One of the main contributions of this paper is that we implement the idea as online caching, meaning that the cache models do not need access to training data and operate solely on the incoming data at runtime, making the approach suitable for applications that use pre-trained models. Our experimental results on two downstream tasks (face and object classification) show that, on average, caching can reduce the computational complexity of these services by up to 58% (in terms of FLOPs count) and improve their inference latency by up to 46%, with low to zero reduction in accuracy.
Early-exiting dynamic neural networks (EDNN), as one type of dynamic neural network, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model exits at either the last prediction head or an intermediate prediction head where the prediction confidence is higher than a predefined threshold. To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data. This brings a train-test mismatch problem: all the prediction heads are optimized on all types of data in the training phase, while the deeper heads only see difficult inputs in the testing phase. Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show it achieves the state-of-the-art performance on CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
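The additive, boosting-inspired formulation can be sketched as follows for inference (the confidence rule and the interfaces are simplifications of the paper's method).

```python
import numpy as np

def boosted_early_exit(prediction_heads, features, threshold=0.8):
    """Each head adds a correction to the running sum of logits, in the style
    of gradient boosting; exit once the accumulated prediction is confident."""
    logits, probs = 0.0, None
    for head, feat in zip(prediction_heads, features):
        logits = logits + head(feat)                 # additive refinement
        z = logits - np.max(logits)
        probs = np.exp(z) / np.sum(np.exp(z))
        if probs.max() > threshold:
            break
    return int(np.argmax(probs))
```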
State-of-the-art neural networks with early-exit mechanisms often need considerable training and fine-tuning to achieve good performance at low computational cost. We propose a novel early-exit technique based on the class means of samples, Early Exit Class Means (E$^2$CM). Unlike most existing schemes, E$^2$CM does not require gradient-based training of internal classifiers and does not modify the base network in any way. This makes it particularly useful for neural network training on low-power devices, as in wireless edge networks. We evaluate the performance and overhead of E$^2$CM over various base network architectures such as MobileNetV3, EfficientNet, and ResNet, and datasets such as CIFAR-100, ImageNet, and KMNIST. Our results show that, given a fixed training time budget, E$^2$CM achieves higher accuracy than existing early-exit mechanisms. Moreover, if there are no limitations on the training time budget, E$^2$CM can be combined with existing early-exit schemes to improve the latter's performance, achieving a better trade-off between computational cost and network accuracy. We also show that E$^2$CM can be used to reduce computational cost in unsupervised learning tasks.
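A sketch of the class-mean idea (the interfaces and the distance rule below are assumptions): class means of intermediate-layer features are computed without any gradient-based training, and a sample exits as soon as it lies close enough to one of them.

```python
import numpy as np

def fit_class_means(features, labels, num_classes):
    """Per-class mean of intermediate-layer features; no gradients required."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def class_mean_exit(feature, class_means, distance_threshold):
    """Exit with the nearest class if the feature is close enough to its mean;
    return None to signal that deeper layers should still be computed."""
    dists = np.linalg.norm(class_means - feature, axis=1)
    nearest = int(np.argmin(dists))
    return nearest if dists[nearest] < distance_threshold else None
```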
Deep learning has demonstrated remarkable effectiveness across a wide range of tasks and has the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that, while applying deep models typically incurs substantial memory and computational costs, edge devices usually offer only very limited storage and computational capabilities, which may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require little storage, and incur only low computational overhead. This survey provides comprehensive coverage of techniques for automating the design of deep learning models for edge computing. It offers an overview and comparison of the key metrics commonly used to quantify models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of the state of the art in deep design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, it discusses open issues and directions for future research.
In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition and natural language understanding. One of the significant contributors to its success is the proliferation of end devices that acted as a catalyst to provide data for data-hungry DL models. However, computing DL training and inference is the main challenge. Usually, central cloud servers are used for the computation, but it opens up other significant challenges, such as high latency, increased communication costs, and privacy concerns. To mitigate these drawbacks, considerable efforts have been made to push the processing of DL models to edge servers. Moreover, the confluence point of DL and edge has given rise to edge intelligence (EI). This survey paper focuses primarily on the fifth level of EI, called all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. All in-edge is suitable when the end devices have low computing resources, e.g., Internet-of-Things, and other requirements such as latency and communication cost are important in mission-critical applications, e.g., health care. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed. Secondly, this paper presents enabling technologies, such as model parallelism and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described because the standard cloud-based DL deployment cannot be directly applied to all in-edge due to its limited computational resources. Fourthly, this paper discusses eleven key performance metrics to evaluate the performance of DL at all in-edge efficiently. Finally, several open research challenges in the area of all in-edge are presented.
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, inducing high infrastructure costs and a strong dependence on networking conditions. On the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs sufficiently well. In this paper, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth, and multi-objective requirements. Key components that enable this are its novel CNN-specific data packing method, which exploits the variability of precision needs across different parts of the CNN when onloading computation, and its novel scheduler, which jointly tunes the partition point and the transferred data precision at run time to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state of the art, improving throughput by more than an order of magnitude over competing CNN offloading systems, with up to 60x less transferred data.
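A much-simplified sketch of the split-and-pack idea (the real scheduler and packing format are more involved; the names and the 8-bit default are assumptions): run the head on-device, reduce the precision of the intermediate tensor before transfer, and finish inference on the remote side.

```python
import numpy as np

def pack_activations(tensor, num_bits=8):
    """Uniformly quantize an intermediate activation tensor to `num_bits`
    before transmission; return the values the receiver would reconstruct."""
    lo, hi = float(tensor.min()), float(tensor.max())
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0    # guard against a flat tensor
    codes = np.round((tensor - lo) / scale)           # integer codes to transmit
    return codes * scale + lo                         # receiver-side dequantization

def split_inference(head_model, remote_tail_model, x, num_bits=8):
    """On-device head, low-precision transfer, remote tail."""
    intermediate = head_model(x)
    return remote_tail_model(pack_activations(intermediate, num_bits))
```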