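A key enabler of deploying convolutional neural networks on resource-constrained embedded systems is the binary neural network (BNN). BNNs save memory and simplify computation by binarizing both features and weights. Unfortunately, binarization is inevitably accompanied by a severe decrease in accuracy. To reduce the accuracy gap between binary and full-precision networks, many repair methods have recently been proposed, which we have classified and brought together in a single overview in this chapter. The repair methods are divided into two main branches, training techniques and network topology changes, which can be further split into smaller categories. The latter branch introduces additional cost (energy consumption or extra area) for an embedded system, while the former does not. From our overview, we observe that progress has been made in reducing the accuracy gap, but BNN papers do not agree on which repair methods should be used to obtain highly accurate BNNs. Therefore, this chapter contains an empirical review that evaluates the benefits of many repair methods on the ResNet-20 & CIFAR10 and ResNet-18 & CIFAR100 benchmarks. We found three repair categories to be most beneficial: feature binarizer, feature normalization, and double residual. Based on this review, we discuss future directions and research opportunities. We sketch the benefits and costs associated with BNNs on embedded systems, because it remains to be seen whether BNNs can close the accuracy gap while staying highly energy-efficient on resource-constrained embedded systems.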
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
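Model binarization is an effective method for compressing neural networks and accelerating their inference. However, a significant performance gap remains between 1-bit models and 32-bit models. Empirical studies show that binarization causes information loss in both forward and backward propagation. We present a novel Distribution-sensitive Information Retention Network (DIR-Net) that retains information in forward and backward propagation by improving internal propagation and introducing external representations. DIR-Net mainly relies on three technical contributions: (1) Information Maximized Binarization (IMB): minimizing the information loss and the binarization error of weights/activations simultaneously through weight balance and standardization; (2) Distribution-sensitive Two-stage Estimator (DTE): retaining gradient information through a distribution-sensitive soft approximation that jointly considers updating capability and gradient accuracy; (3) Representation-align Binarization-aware Distillation (RBD): retaining representation information by distilling the representations between full-precision and binarized networks. DIR-Net investigates both the forward and backward processes of BNNs from a unified information perspective, thereby providing new insight into the mechanism of network binarization. The three techniques in DIR-Net are versatile and effective and can be applied to various structures to improve BNNs. Comprehensive experiments on image classification and object detection tasks show that DIR-Net consistently outperforms state-of-the-art binarization approaches under mainstream and compact architectures such as ResNet, VGG, EfficientNet, DARTS, and MobileNet. Additionally, we run DIR-Net on real-world resource-limited devices, where it achieves 11.1x storage saving and 5.4x speedup.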
We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
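Model quantization has emerged as an indispensable technique for accelerating deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting algorithms and hardware. We conduct a comprehensive analysis and find a considerable number of intuitive and counter-intuitive insights. By aligning the training settings, we find that existing algorithms have roughly the same performance on the conventional academic track. For hardware-deployable quantization, however, there is a large accuracy gap that remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.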
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32× memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is the same as the full-precision AlexNet. We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy. Our code is available at: http://allenai.org/plato/xnornet.
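The 32× memory figure follows from storing one bit per weight instead of a 32-bit float. A minimal sketch of that arithmetic, assuming a simple sign-based packing (the helper names `pack_signs` and `unpack_signs` and the sample weights are hypothetical, not taken from the paper's code):

```python
import struct

def pack_signs(weights):
    """Pack the signs of real-valued weights into bytes, one bit per weight.

    A set bit means the weight is non-negative (+1), a cleared bit means -1.
    """
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def unpack_signs(packed, n):
    """Recover the +1/-1 values of the first n packed weights."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]

# 32 illustrative float weights (hypothetical values).
weights = [0.7, -1.2, 0.05, -0.3] * 8

float_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 32-bit floats
binary_bytes = len(pack_signs(weights))                       # 1 bit each

print(float_bytes, binary_bytes, float_bytes // binary_bytes)  # 128 4 32
```

The sketch only covers storage; any scaling factors a binary-weight scheme keeps alongside the sign bits would add a small amount on top.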
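Optimizing for top-1 ImageNet accuracy promotes networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower compute intensity, but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block that improves the quality of BNNs through techniques such as adding multiple residual paths and tuning the activation function. We apply it to ResNet-50 and optimize ResNet's initial convolutional layer, which is hard to binarize. We name the resulting network family PokeBNN. These techniques are chosen to yield favorable improvements in both top-1 accuracy and network cost. To enable joint optimization of cost and accuracy, we define arithmetic computation effort (ACE), a hardware- and energy-inspired cost metric for quantized and binarized networks. We also identify the need to optimize an under-explored hyperparameter that controls the binarization gradient approximation. We establish a new, strong state of the art (SOTA) on top-1 accuracy together with the commonly used CPU64 cost, ACE cost, and network size metrics. ReActNet-Adam, the previous SOTA among BNNs, achieved 70.5% top-1 accuracy at 7.9 ACE. A small PokeBNN variant achieves 70.5% top-1 at more than 3x lower cost; a larger PokeBNN achieves 75.6% top-1 at 7.8 ACE, an accuracy improvement of more than 5% without increasing cost. The PokeBNN implementation in JAX/Flax and reproduction instructions are open source.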
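Quantization-based model compression is a high-performing and fast approach to inference that yields models that are highly compressed compared to their full-precision floating-point counterparts. The most extreme quantization is a 1-bit representation of the parameters, so that they take only two possible values, typically -1 (0) or +1, which enables an efficient implementation of the ubiquitous dot product using only additions. The main contribution of this work is a method that smooths the combinatorial problem of determining a binary vector of weights so as to minimize the expected loss for a given objective by means of empirical risk minimization with backpropagation. This is achieved by approximating a multivariate binary state over the weights using a deterministic and differentiable transformation of real-valued, continuous parameters. The proposed method adds little overhead in training, can be readily applied without any substantial modification to the original architecture, does not introduce additional saturating nonlinearities or auxiliary losses, and does not prohibit applying other binarization methods. Contrary to common assertions in the literature, it is demonstrated that binary weighted networks can be trained well with the same standard optimization techniques and similar hyperparameter settings as their full-precision counterparts, specifically momentum SGD with large learning rates and $L_2$ regularization. To conclude, experiments demonstrate that the method performs remarkably well across a number of inductive image classification tasks with various architectures when compared to their full-precision counterparts. The source code is publicly available at https://bitbucket.org/yanivshu/binary_weighted_networks_public.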
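The binary neural network (BNN) is an extreme quantization version of the convolutional neural network (CNN) in which all features and weights are mapped to just 1 bit. Although BNNs save a great deal of memory and computation, making CNNs applicable on edge or mobile devices, they suffer a drop in network performance due to the reduced representation capability after binarization. In this paper, we propose a new replaceable and easy-to-use convolution module, RepConv, which enhances feature maps by replicating the input or output along the channel dimension $\beta$ times, without extra cost in the number of parameters or convolutional computation. We also define a set of RepTran rules for using RepConv throughout BNN modules such as binary convolution, fully connected layers, and batch normalization. Experiments demonstrate that after the RepTran transformation, a set of highly cited BNNs achieve universally better performance than the original BNN versions. For example, the top-1 accuracy of Rep-ReCU-ResNet-20, i.e., a RepBconv-enhanced ReCU-ResNet-20, reaches 88.97% on CIFAR-10, which is 1.47% higher than the original network. Rep-AdamBNN-ReActNet-A achieves 71.342% top-1 accuracy on ImageNet, a new state-of-the-art result for BNNs. Code and models are available at: https://github.com/imfinethanks/rep_adambnn.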
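This paper studies binary neural networks (BNNs) in which both weights and activations are binarized to 1-bit values, greatly reducing memory usage and computational complexity. Because modern deep neural networks use sophisticated designs with complex architectures for the sake of accuracy, the distributions of weights and activations are highly diverse. The conventional sign function therefore cannot be used to effectively binarize full-precision values in BNNs. To this end, we present a simple yet effective approach called AdaBin that adaptively obtains the optimal binary sets $\{b_1, b_2\}$ ($b_1, b_2 \in \mathbb{R}$) of weights and activations instead of a fixed set (i.e., $\{-1, +1\}$). In this way, the proposed method can better fit different distributions and increase the representation ability of binarized features. In practice, we use the center position and the distance of the 1-bit values to define a new binary quantization function. For the weights, we propose an equalization method that aligns the symmetric center of the binary distribution with that of the real-valued distribution and minimizes the Kullback-Leibler divergence between them. Meanwhile, we introduce a gradient-based optimization method to obtain these two parameters for activations, which are jointly trained in an end-to-end manner. Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin achieves state-of-the-art performance. For instance, we obtain 66.4% top-1 accuracy on ImageNet using the ResNet-18 architecture, and 69.4 mAP on PASCAL VOC using SSD300.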
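To make the idea of a binary set defined by a center and a distance concrete, here is a minimal sketch. It assumes, purely for illustration, that the center is the mean of the values and the distance is their standard deviation; the abstract does not state these choices, and the function name `adaptive_binarize` is hypothetical rather than the authors' API.

```python
from statistics import fmean, pstdev

def adaptive_binarize(values):
    """Map each value to one of two levels b1 < b2 placed around a center.

    Assumption for this sketch: center = mean, distance = population
    standard deviation, so b1 = center - distance, b2 = center + distance.
    """
    center = fmean(values)
    distance = pstdev(values)
    b1, b2 = center - distance, center + distance
    binarized = [b2 if v >= center else b1 for v in values]
    return binarized, (b1, b2)

# Hypothetical activations that are not centered at zero.
activations = [0.1, 0.3, 0.5, 0.7]
binarized, (b1, b2) = adaptive_binarize(activations)
print(binarized, b1, b2)
```

With a fixed $\{-1, +1\}$ set and the sign function, all four sample values would map to $+1$; the shifted set keeps two distinct levels, which is the representational benefit the abstract describes.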
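As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks are largely due to their depth and complexity, which makes them difficult to deploy, especially on resource-constrained devices. Neural network quantization has recently emerged to meet this demand by reducing the size and complexity of neural networks through lowering the precision of the network. With smaller and simpler networks, it becomes possible to run neural networks within the constraints of their target hardware. This paper surveys the many neural network quantization techniques developed over the last decade. Based on this survey and a comparison of neural network quantization techniques, we propose future research directions in the area.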
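Deep learning is pervasive in our daily lives, in self-driving cars, virtual assistants, social network services, healthcare services, face recognition, and more. However, deep neural networks demand substantial compute resources during training and inference. The machine learning community has mainly focused on model-level optimizations such as architectural compression of deep learning models, while the systems community has focused on implementation-level optimization. In between, various arithmetic-level optimization techniques have been proposed in the arithmetic community. This article provides a survey of resource-efficient deep learning techniques in terms of model-, arithmetic-, and implementation-level techniques, and identifies the research gaps for resource-efficient deep learning across these three levels. Based on our definition of resource-efficiency metrics, our survey clarifies the influence of lower-level techniques and discusses future trends in resource-efficient deep learning research.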
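Mixed-precision deep neural networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easy to find, especially given the abundance of models, datasets, and quantization techniques, which creates an enormous search space. To tackle this difficulty, a body of literature has recently emerged, and several frameworks that achieve promising accuracy results have been proposed. In this paper, we first summarize the quantization techniques commonly used in the literature. We then present a thorough survey of mixed-precision frameworks, categorized according to their optimization techniques, such as reinforcement learning, and their quantization techniques, such as deterministic rounding. Furthermore, we discuss the advantages and shortcomings of each framework and present a side-by-side comparison. Finally, we give guidelines for future mixed-precision frameworks.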
In this work we introduce a binarized deep neural network (BDNN) model. BDNNs are trained using a novel binarized back propagation algorithm (BBP), which uses binary weights and binary neurons during the forward and backward propagation, while retaining precision of the stored weights in which gradients are accumulated. At test phase, BDNNs are fully binarized and can be implemented in hardware with low circuit complexity. The proposed binarized networks can be implemented using binary convolutions and proxy matrix multiplications with only standard binary XNOR and population count (popcount) operations. BBP is expected to reduce energy consumption by at least two orders of magnitude when compared to the hardware implementation of existing training algorithms. We obtained near state-of-the-art results with BDNNs on the permutation-invariant MNIST, CIFAR-10 and SVHN datasets.
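The XNOR and popcount combination mentioned above can be illustrated with a short sketch: when two vectors contain only +1 and -1, their dot product equals the number of positions where the signs agree minus the number where they differ. The bit encoding (bit set means +1) and the helper names `encode` and `xnor_popcount_dot` are assumptions made for this example, not the paper's implementation.

```python
def encode(values):
    """Encode a list of +1/-1 values as an integer bit mask (bit i set means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as bit masks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches

a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(x * y for x, y in zip(a, b))
result = xnor_popcount_dot(encode(a), encode(b), len(a))
print(reference, result)  # 2 2
```

On hardware, the XOR/XNOR and popcount each operate on a whole machine word at once, which is where the low circuit complexity comes from; the Python integers here only mimic that behavior.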
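Deep learning technologies have demonstrated remarkable effectiveness across a wide range of tasks, and deep learning has the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs significant memory and computational costs, edge devices typically offer only very limited storage and computational capabilities, which may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to this challenge is to automate the design of effective deep learning models that are lightweight, require only a small amount of storage, and incur only low computational overhead. This survey offers comprehensive coverage of design automation techniques for deep learning models targeting edge computing. It provides an overview and comparison of the key metrics commonly used to quantify the proficiency of models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of state-of-the-art deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.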
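Deep learning has achieved promising results in a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, they generally require longer training time and more computation and communication. In this survey, we aim to provide a clear sketch of optimization for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms most commonly used for optimization, elaborate on the debated topic of the generalization gap that arises in large-batch training, and review the SOTA strategies for addressing communication overhead and reducing memory footprint.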
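Most of today's computer vision pipelines are built around deep neural networks, in which convolution operations account for most of the generally high compute effort. The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm, reducing the operation count by a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized tiles, $F_2$. Although this gain is significant, the Winograd algorithm with larger tile sizes, i.e., $F_4$, offers even more potential for improving throughput and energy efficiency, as it reduces the required MACs by 4x. Unfortunately, the Winograd algorithm with larger tile sizes introduces numerical issues that prevent its use on integer domain-specific accelerators, as well as higher computational overhead for transforming input and output data between the spatial and Winograd domains. To unlock the full potential of Winograd $F_4$, we propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles, enabling integer-only inference. Moreover, we present custom hardware units that process the Winograd transformations in a power- and area-efficient way, and we show how to integrate such custom modules into an industrial-grade, programmable DSA. An extensive experimental evaluation on a large set of state-of-the-art computer vision benchmarks shows that the tap-wise quantization algorithm makes quantized Winograd $F_4$ networks almost as accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x higher energy efficiency and up to 1.83x end-to-end speed-up for state-of-the-art segmentation and detection networks.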
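The 2.25x figure can be traced to the one-dimensional building block $F(2, 3)$, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. Nesting it in two dimensions gives $4 \times 4 = 16$ multiplications per 2x2 output tile instead of $2 \cdot 2 \cdot 9 = 36$, hence $36 / 16 = 2.25$; for 4x4 tiles the same count is $6 \times 6 = 36$ versus $4 \cdot 4 \cdot 9 = 144$, hence 4x. The sketch below shows only the standard floating-point 1D case and is not the tap-wise quantized scheme of the paper; the function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over 4 inputs using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on the filter, so it can be precomputed.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform followed by the four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: the same two outputs with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1, 2, 3, 4]
g = [2, 4, 6]
print(winograd_f23(d, g), direct_f23(d, g))  # [28.0, 40.0] [28, 40]
```

The divisions by 2 in the filter transform hint at the numerical issue the abstract raises: larger tiles use transform matrices with a wider range of constants, which is harder to represent in low-precision integer arithmetic.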
With the advent of deep learning applications on edge devices, researchers actively try to optimize their deployment on low-power and memory-restricted devices. There are established compression methods such as quantization, pruning, and architecture search that leverage commodity hardware. Apart from conventional compression algorithms, one may redesign the operations of deep learning models so that they lead to a more efficient implementation. To this end, we propose EuclidNet, a compression method designed to be implemented on hardware, which replaces multiplication, $xw$, with Euclidean distance $(x-w)^2$. We show that EuclidNet is aligned with matrix multiplication and that it can be used as a measure of similarity in the case of convolutional layers. Furthermore, we show that under various transformations and noise scenarios, EuclidNet exhibits the same performance as deep learning models designed with multiplication operations.
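One way to see why a squared distance can stand in for a product is the identity $xw = \tfrac{1}{2}\left(x^2 + w^2 - (x-w)^2\right)$. The sketch below checks that identity and shows that, for filters of equal norm, ranking filters by negative squared distance matches ranking them by dot product. Using the negative squared distance as the similarity score and restricting to equal-norm filters are assumptions made for this illustration, not the layer definition from the paper; all names and values are hypothetical.

```python
def dot(x, w):
    """Multiplication-based similarity: sum of element-wise products."""
    return sum(xi * wi for xi, wi in zip(x, w))

def neg_sq_distance(x, w):
    """Distance-based similarity: larger (closer to 0) means more similar."""
    return -sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def product_from_distance(x, w):
    """Recover the scalar product x*w from squares and a squared difference."""
    return (x * x + w * w - (x - w) ** 2) / 2

# Hypothetical input patch and three unit-norm filters.
x = [0.9, 0.2]
filters = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

by_dot = sorted(range(len(filters)), key=lambda i: dot(x, filters[i]), reverse=True)
by_dist = sorted(range(len(filters)), key=lambda i: neg_sq_distance(x, filters[i]), reverse=True)
print(by_dot, by_dist)  # [0, 1, 2] [0, 1, 2]
```

When filter norms differ, the two rankings can disagree, so the equal-norm setting here is a simplification chosen to keep the comparison clean.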
Binary neural networks are the extreme case of network quantization, which has long been thought of as a potential edge machine learning solution. However, the significant accuracy gap to the full-precision counterparts restricts their creative potential for mobile applications. In this work, we revisit the potential of binary neural networks and focus on a compelling but unanswered problem: how can a binary neural network achieve the crucial accuracy level (e.g., 80%) on ILSVRC-2012 ImageNet? We achieve this goal by enhancing the optimization process from three complementary perspectives: (1) We design a novel binary architecture BNext based on a comprehensive study of binary architectures and their optimization process. (2) We propose a novel knowledge-distillation technique to alleviate the counter-intuitive overfitting problem observed when attempting to train extremely accurate binary models. (3) We analyze the data augmentation pipeline for binary networks and modernize it with up-to-date techniques from full-precision models. The evaluation results on ImageNet show that BNext, for the first time, pushes the binary model accuracy boundary to 80.57% and significantly outperforms all the existing binary networks. Code and trained models are available at: https://github.com/hpi-xnor/BNext.git.