Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. Recently, SNNs with backpropagation through time (BPTT) have achieved a higher accuracy result on image recognition tasks than other SNN training algorithms. Despite the success from the algorithm perspective, prior works neglect the evaluation of the hardware energy overheads of BPTT due to the lack of a hardware evaluation platform for this SNN training algorithm. Moreover, although SNNs have long been seen as an energy-efficient counterpart of ANNs, a quantitative comparison between the training cost of SNNs and ANNs is missing. To address the aforementioned issues, in this work, we introduce SATA (Sparsity-Aware Training Accelerator), a BPTT-based training accelerator for SNNs. The proposed SATA provides a simple and re-configurable systolic-based accelerator architecture, which makes it easy to analyze the training energy for BPTT-based SNN training algorithms. By utilizing the sparsity, SATA increases its computation energy efficiency by $5.58 \times$ compared to the one without using sparsity. Based on SATA, we show quantitative analyses of the energy efficiency of SNN training and compare the training cost of SNNs and ANNs. The results show that, on Eyeriss-like systolic-based architecture, SNNs consume $1.27\times$ more total energy with sparsities when compared to ANNs. We find that such high training energy cost is from time-repetitive convolution operations and data movements during backpropagation. Moreover, to propel the future SNN training algorithm design, we provide several observations on energy efficiency for different SNN-specific training parameters and propose an energy estimation framework for SNN training. Code for our framework is made publicly available.
translated by 谷歌翻译
在本文中,我们提出了一种节能的SNN体系结构,该体系结构可以通过提高的精度无缝地运行深度尖峰神经网络(SNN)。首先,我们提出了一个转换意识培训(CAT),以减少无硬件实施开销而无需安排SNN转换损失。在拟议的CAT中,可以有效利用用于在ANN训练过程中模拟SNN的激活函数,以减少转换后的数据表示误差。基于CAT技术,我们还提出了一项首要尖峰编码,该编码可以通过使用SPIKE时间信息来轻巧计算。支持提出技术的SNN处理器设计已使用28nm CMOS流程实施。该处理器的推理能量分别为486.7UJ,503.6UJ和1426UJ的最高1级准确性,分别为91.7%,67.9%和57.4%,分别为CIFAR-10,CIFAR-100和TININE-IMIMAGENET处理。16具有5位对数权重。
translated by 谷歌翻译
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems.This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry.The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
translated by 谷歌翻译
由于其异步,稀疏和二进制信息处理,尖峰神经网络(SNN)最近成为人工神经网络(ANN)的低功耗替代品。为了提高能源效率和吞吐量,可以在使用新兴的非挥发性(NVM)设备在模拟域中实现多重和蓄积(MAC)操作的回忆横梁上实现SNN。尽管SNN与回忆性横梁具有兼容性,但很少关注固有的横杆非理想性和随机性对SNN的性能的影响。在本文中,我们对SNN在非理想横杆上的鲁棒性进行了全面分析。我们检查通过学习算法训练的SNN,例如,替代梯度和ANN-SNN转换。我们的结果表明,跨多个时间阶段的重复横梁计算会导致错误积累,从而导致SNN推断期间的性能下降。我们进一步表明,经过较少时间步长培训的SNN在部署在磁带横梁上时可以更好地准确。
translated by 谷歌翻译
尖峰神经网络是低功率环境的有效计算模型。基于SPIKE的BP算法和ANN-TO-SNN(ANN2SNN)转换是SNN培训的成功技术。然而,尖峰碱BP训练速度很慢,需要大量的记忆成本。尽管Ann2NN提供了一种培训SNN的低成本方式,但它需要许多推理步骤才能模仿训练有素的ANN以表现良好。在本文中,我们提出了一个snn-to-ang(SNN2ANN)框架,以快速和记忆的方式训练SNN。 SNN2ANN由2个组成部分组成:a)ANN和SNN和B)尖峰映射单元之间的重量共享体系结构。首先,该体系结构在ANN分支上训练重量共享参数,从而快速训练和SNN的记忆成本较低。其次,尖峰映射单元确保ANN的激活值是尖峰特征。结果,可以通过训练ANN分支来优化SNN的分类误差。此外,我们设计了一种自适应阈值调整(ATA)算法来解决嘈杂的尖峰问题。实验结果表明,我们的基于SNN2ANN的模型在基准数据集(CIFAR10,CIFAR100和TININE-IMAGENET)上表现良好。此外,SNN2ANN可以在0.625倍的时间步长,0.377倍训练时间,0.27倍GPU内存成本以及基于SPIKE的BP模型的0.33倍尖峰活动下实现可比精度。
translated by 谷歌翻译
Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices. However, commercial edge platforms based on standard GPUs are not optimized to deploy SNNs, resulting in high energy and latency. While analog In-Memory Computing (IMC) platforms can serve as energy-efficient inference engines, they are accursed by the immense energy, latency, and area requirements of high-precision ADCs (HP-ADC), overshadowing the benefits of in-memory computations. We propose a hardware/software co-design methodology to deploy SNNs into an ADC-Less IMC architecture using sense-amplifiers as 1-bit ADCs replacing conventional HP-ADCs and alleviating the above issues. Our proposed framework incurs minimal accuracy degradation by performing hardware-aware training and is able to scale beyond simple image classification tasks to more complex sequential regression tasks. Experiments on complex tasks of optical flow estimation and gesture recognition show that progressively increasing the hardware awareness during SNN training allows the model to adapt and learn the errors due to the non-idealities associated with ADC-Less IMC. Also, the proposed ADC-Less IMC offers significant energy and latency improvements, $2-7\times$ and $8.9-24.6\times$, respectively, depending on the SNN model and the workload, compared to HP-ADC IMC.
translated by 谷歌翻译
尖峰神经网络(SNN)由于其固有的高表象激活而引起了传统人工神经网络(ANN)的势能有效替代品。但是,大多数先前的SNN方法都使用类似Ann的架构(例如VGG-NET或RESNET),这可以为SNN中二进制信息的时间序列处理提供亚最佳性能。为了解决这个问题,在本文中,我们介绍了一种新型的神经体系结构搜索(NAS)方法,以找到更好的SNN体系结构。受到最新的NAS方法的启发,这些方法从初始化时从激活模式中找到了最佳体系结构,我们选择了可以代表不同数据样本的不同尖峰激活模式的体系结构,而无需训练。此外,为了进一步利用尖峰之间的时间信息,我们在层之间搜索馈电的连接以及向后连接(即时间反馈连接)。有趣的是,我们的搜索算法发现的SNASNET通过向后连接实现了更高的性能,这表明设计SNN体系结构以适当使用时间信息的重要性。我们对三个图像识别基准进行了广泛的实验,我们表明SNASNET可以实现最新的性能,而时间段明显较低(5个时间段)。代码可在GitHub上找到。
translated by 谷歌翻译
由于稀疏,异步和二进制事件(或尖峰)驱动加工,尖峰神经网络(SNNS)最近成为深度学习的替代方案,可以在神经形状硬件上产生巨大的能效益。然而,从划痕训练高精度和低潜伏期的SNN,患有尖刺神经元的非微弱性质。要在SNNS中解决此培训问题,我们重新批准批量标准化,并通过时间(BNTT)技术提出时间批量标准化。大多数先前的SNN工程到现在忽略了批量标准化,认为它无效地训练时间SNN。与以前的作品不同,我们提出的BNTT沿着时轴沿着时间轴解耦的参数,以捕获尖峰的时间动态。在BNTT中的时间上不断发展的可学习参数允许神经元通过不同的时间步长来控制其尖峰率,从头开始实现低延迟和低能量训练。我们对CiFar-10,CiFar-100,微小想象特和事件驱动的DVS-CIFAR10数据集进行实验。 BNTT允许我们首次在三个复杂的数据集中培训深度SNN架构,只需25-30步即可。我们还使用BNTT中的参数分布提前退出算法,以降低推断的延迟,进一步提高了能量效率。
translated by 谷歌翻译
编译器框架对于广泛使用基于FPGA的深度学习加速器来说是至关重要的。它们允许研究人员和开发人员不熟悉硬件工程,以利用域特定逻辑所获得的性能。存在传统人工神经网络的各种框架。然而,没有多大的研究努力已经进入创建针对尖刺神经网络(SNNS)进行优化的框架。这种新一代的神经网络对于在边缘设备上部署AI的越来越有趣,其具有紧密的功率和资源约束。我们的端到端框架E3NE为FPGA自动生成高效的SNN推理逻辑。基于Pytorch模型和用户参数,它应用各种优化,并评估基于峰值的加速器固有的权衡。多个水平的并行性和新出现的神经编码方案的使用导致优于先前的SNN硬件实现的效率。对于类似的型号,E3NE使用的硬件资源的少于50%,功率较低20%,同时通过幅度降低延迟。此外,可扩展性和通用性允许部署大规模的SNN模型AlexNet和VGG。
translated by 谷歌翻译
由于它们的时间加工能力及其低交换(尺寸,重量和功率)以及神经形态硬件中的节能实现,尖峰神经网络(SNNS)已成为传统人工神经网络(ANN)的有趣替代方案。然而,培训SNNS所涉及的挑战在准确性方面有限制了它们的表现,从而限制了他们的应用。因此,改善更准确的特征提取的学习算法和神经架构是SNN研究中的当前优先级之一。在本文中,我们展示了现代尖峰架构的关键组成部分的研究。我们在从最佳执行网络中凭经验比较了图像分类数据集中的不同技术。我们设计了成功的残余网络(Reset)架构的尖峰版本,并测试了不同的组件和培训策略。我们的结果提供了SNN设计的最新版本,它允许在尝试构建最佳视觉特征提取器时进行明智的选择。最后,我们的网络优于CIFAR-10(94.1%)和CIFAR-100(74.5%)数据集的先前SNN架构,并将现有技术与DVS-CIFAR10(71.3%)相匹配,参数较少而不是先前的状态艺术,无需安静转换。代码在https://github.com/vicenteax/spiking_resnet上获得。
translated by 谷歌翻译
深度神经网络(DNN)的记录断裂性能具有沉重的参数化,导致外部动态随机存取存储器(DRAM)进行存储。 DRAM访问的禁用能量使得在资源受限的设备上部署DNN是不普遍的,呼叫最小化重量和数据移动以提高能量效率。我们呈现SmartDeal(SD),算法框架,以进行更高成本的存储器存储/访问的较低成本计算,以便在推理和培训中积极提高存储和能量效率。 SD的核心是一种具有结构约束的新型重量分解,精心制作以释放硬件效率潜力。具体地,我们将每个重量张量分解为小基矩阵的乘积以及大的结构稀疏系数矩阵,其非零被量化为-2的功率。由此产生的稀疏和量化的DNN致力于为数据移动和重量存储而大大降低的能量,因为由于稀疏的比特 - 操作和成本良好的计算,恢复原始权重的最小开销。除了推理之外,我们采取了另一次飞跃来拥抱节能培训,引入创新技术,以解决培训时出现的独特障碍,同时保留SD结构。我们还设计专用硬件加速器,充分利用SD结构来提高实际能源效率和延迟。我们在不同的设置中对多个任务,模型和数据集进行实验。结果表明:1)应用于推理,SD可实现高达2.44倍的能效,通过实际硬件实现评估; 2)应用于培训,储存能量降低10.56倍,减少了10.56倍和4.48倍,与最先进的训练基线相比,可忽略的准确性损失。我们的源代码在线提供。
translated by 谷歌翻译
基于von-neumann架构的传统计算系统,数据密集型工作负载和应用程序(如机器学习)和应用程序都是基本上限制的。随着数据移动操作和能量消耗成为计算系统设计中的关键瓶颈,对近数据处理(NDP),机器学习和特别是神经网络(NN)的加速器等非传统方法的兴趣显着增加。诸如Reram和3D堆叠的新兴内存技术,这是有效地架构基于NN的基于NN的加速器,因为它们的工作能力是:高密度/低能量存储和近记忆计算/搜索引擎。在本文中,我们提出了一种为NN设计NDP架构的技术调查。通过基于所采用的内存技术对技术进行分类,我们强调了它们的相似之处和差异。最后,我们讨论了需要探索的开放挑战和未来的观点,以便改进和扩展未来计算平台的NDP架构。本文对计算机学习领域的计算机架构师,芯片设计师和研究人员来说是有价值的。
translated by 谷歌翻译
Spiking Neural networks (SNN) have emerged as an attractive spatio-temporal computing paradigm for a wide range of low-power vision tasks. However, state-of-the-art (SOTA) SNN models either incur multiple time steps which hinder their deployment in real-time use cases or increase the training complexity significantly. To mitigate this concern, we present a training framework (from scratch) for one-time-step SNNs that uses a novel variant of the recently proposed Hoyer regularizer. We estimate the threshold of each SNN layer as the Hoyer extremum of a clipped version of its activation map, where the clipping threshold is trained using gradient descent with our Hoyer regularizer. This approach not only downscales the value of the trainable threshold, thereby emitting a large number of spikes for weight update with a limited number of iterations (due to only one time step) but also shifts the membrane potential values away from the threshold, thereby mitigating the effect of noise that can degrade the SNN accuracy. Our approach outperforms existing spiking, binary, and adder neural networks in terms of the accuracy-FLOPs trade-off for complex image recognition tasks. Downstream experiments on object detection also demonstrate the efficacy of our approach.
translated by 谷歌翻译
我们提出了一种新的学习算法,使用传统的人工神经网络(ANN)作为代理训练尖刺神经网络(SNN)。我们分别与具有相同网络架构和共享突触权重的集成和火(IF)和Relu神经元进行两次SNN和ANN网络。两个网络的前进通过完全独立。通过假设具有速率编码的神经元作为Relu的近似值,我们将SNN中的SNN的误差进行了回复,以更新共享权重,只需用SNN的ANN最终输出替换ANN最终输出。我们将建议的代理学习应用于深度卷积的SNNS,并在Fahion-Mnist和CiFar10的两个基准数据集上进行评估,分别为94.56%和93.11%的分类准确性。所提出的网络可以优于培训的其他深鼻涕,训练,替代学习,代理梯度学习,或从深处转换。转换的SNNS需要长时间的仿真时间来达到合理的准确性,而我们的代理学习导致高效的SNN,模拟时间较短。
translated by 谷歌翻译
The term ``neuromorphic'' refers to systems that are closely resembling the architecture and/or the dynamics of biological neural networks. Typical examples are novel computer chips designed to mimic the architecture of a biological brain, or sensors that get inspiration from, e.g., the visual or olfactory systems in insects and mammals to acquire information about the environment. This approach is not without ambition as it promises to enable engineered devices able to reproduce the level of performance observed in biological organisms -- the main immediate advantage being the efficient use of scarce resources, which translates into low power requirements. The emphasis on low power and energy efficiency of neuromorphic devices is a perfect match for space applications. Spacecraft -- especially miniaturized ones -- have strict energy constraints as they need to operate in an environment which is scarce with resources and extremely hostile. In this work we present an overview of early attempts made to study a neuromorphic approach in a space context at the European Space Agency's (ESA) Advanced Concepts Team (ACT).
translated by 谷歌翻译
尖峰神经网络(SNNS)最近成为新一代的低功耗深神经网络,二进制尖峰在多个时间步中传达信息。 SNN的修剪非常重要,因为它们被部署在资源控制移动/边缘设备上。先前的SNN修剪作品的重点是浅SNN(2〜6层),但是,最深层的SNN(> 16层)是由最先进的SNN作品提出的,这很难与当前的修剪工作兼容。为了扩展针对深SNN的修剪技术,我们研究了彩票假说(LTH),该假说(LTH)指出,密集的网络包含较小的子网络(即获胜的票),这些子网与密集网络相当。我们对LTH的研究表明,获胜的门票始终存在于各种数据集和体系结构的深SNN中,可提供多达97%的稀疏性,而没有巨大的性能降级。但是,LTH的迭代搜索过程与SNN的多个时间段相结合时,带来了巨大的培训计算成本。为了减轻这种沉重的搜索成本,我们提出了早期(ET)票,从而从少量的时间步中找到重要的重量连接性。提出的ET票可以与常见的修剪技术无缝结合,以查找获胜门票,例如迭代级修剪(IMP)和早鸟(EB)门票。我们的实验结果表明,与IMP或EB方法相比,提出的ET票可将搜索时间缩短多达38%。
translated by 谷歌翻译
我们日常生活中的深度学习是普遍存在的,包括自驾车,虚拟助理,社交网络服务,医疗服务,面部识别等,但是深度神经网络在训练和推理期间需要大量计算资源。该机器学习界主要集中在模型级优化(如深度学习模型的架构压缩),而系统社区则专注于实施级别优化。在其间,在算术界中提出了各种算术级优化技术。本文在模型,算术和实施级技术方面提供了关于资源有效的深度学习技术的调查,并确定了三种不同级别技术的资源有效的深度学习技术的研究差距。我们的调查基于我们的资源效率度量定义,阐明了较低级别技术的影响,并探讨了资源有效的深度学习研究的未来趋势。
translated by 谷歌翻译
Event-based simulations of Spiking Neural Networks (SNNs) are fast and accurate. However, they are rarely used in the context of event-based gradient descent because their implementations on GPUs are difficult. Discretization with the forward Euler method is instead often used with gradient descent techniques but has the disadvantage of being computationally expensive. Moreover, the lack of precision of discretized simulations can create mismatches between the simulated models and analog neuromorphic hardware. In this work, we propose a new exact error-backpropagation through spikes method for SNNs, extending Fast \& Deep to multiple spikes per neuron. We show that our method can be efficiently implemented on GPUs in a fully event-based manner, making it fast to compute and precise enough for analog neuromorphic hardware. Compared to the original Fast \& Deep and the current state-of-the-art event-based gradient-descent algorithms, we demonstrate increased performance on several benchmark datasets with both feedforward and convolutional SNNs. In particular, we show that multi-spike SNNs can have advantages over single-spike networks in terms of convergence, sparsity, classification latency and sensitivity to the dead neuron problem.
translated by 谷歌翻译
由于降低了von-neumann架构运行深度学习模型的功耗的基本限制,在聚光灯下,基于低功率尖刺神经网络的神经栓塞系统的研究。为了整合大量神经元,神经元需要设计占据一个小面积,而是随着技术缩小,模拟神经元难以缩放,并且它们遭受降低的电压净空/动态范围和电路非线性。鉴于此,本文首先模拟了在28nm工艺中设计的现有电流镜的电压域神经元的非线性行为,并显示了神经元非线性的效果严重降低了SNN推理精度。然后,为了减轻这个问题,我们提出了一种新的神经元,该新型神经元在时域中加入输入的尖峰,并且大大改善了线性度,从而改善了与现有电压域神经元相比的推理精度。在Mnist DataSet上进行测试,所提出的神经元的推理误差率与理想神经元的引起误差率不同于0.1%。
translated by 谷歌翻译
尖峰神经网络(SNNS)模仿大脑中信息传播可以通过离散和稀疏的尖峰来能够能够通过离散和稀疏的尖峰来处理时空信息,从而受到相当大的关注。为了提高SNN的准确性和能源效率,大多数以前的研究仅集中在训练方法上,并且很少研究建筑的效果。我们研究了先前研究中使用的设计选择,从尖峰的准确性和数量来看,发现它们不是最适合SNN的。为了进一步提高准确性并减少SNN产生的尖峰,我们提出了一个称为Autosnn的尖峰感知神经体系结构搜索框架。我们定义一个搜索空间,该搜索空间由架构组成,而没有不良的设计选择。为了启用Spike-Aware Architecture搜索,我们引入了一种健身,该健身既考虑尖峰的准确性和数量。 Autosnn成功地搜索了SNN体系结构,这些体系结构在准确性和能源效率方面都超过了手工制作的SNN。我们彻底证明了AutoSNN在包括神经形态数据集在内的各种数据集上的有效性。
translated by 谷歌翻译