Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and non-linear functions. The most popular and common non-linear layers are activation functions (AFs), such as the logistic sigmoid, Tanh, ReLU, ELU, Swish and Mish. In this paper, a comprehensive overview and survey of AFs in neural networks for deep learning is presented. Different classes of AFs are covered, such as logistic-sigmoid/Tanh-based, ReLU-based, ELU-based, and learning-based AFs. Several characteristics of AFs, such as output range, monotonicity and smoothness, are also pointed out. A performance comparison among 18 state-of-the-art AFs is also performed with different networks on different types of data. Insights into AFs are presented to benefit researchers in doing further research and practitioners in selecting among the different choices. The code used for the experimental comparison is released at \url{https://github.com/shivram1987/activationfunctions}.
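For reference, the activation functions named above have the following standard textbook forms; this is a minimal NumPy sketch of the common definitions, not output from the surveyed paper:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes to (-1, 1), zero-centered.
    return np.tanh(x)

def relu(x):
    # Rectified Linear Unit: identity for x > 0, zero otherwise.
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Exponential Linear Unit: smooth negative saturation at -alpha.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    # Swish (SiLU when beta = 1): x * sigmoid(beta * x), non-monotonic.
    return x * sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x)); logaddexp(0, x) is a stable softplus.
    return x * np.tanh(np.logaddexp(0.0, x))
```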
Inspired by biological neurons, activation functions play an integral part in the learning process of any artificial neural network commonly used in many real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state of the art. In particular, we present various developments in activation functions over the years, along with the advantages as well as disadvantages or limitations of these activation functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to a taxonomy of activation functions based on their characterization, a taxonomy based on their applications is also presented. To this end, a systematic comparison of various fixed and adaptive activation functions is performed on classification datasets such as MNIST, CIFAR-10, and CIFAR-100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computing. To this purpose, we also discuss the various requirements on activation functions that have been used in the physics-informed machine learning framework. Furthermore, various comparisons of different fixed and adaptive activation functions are made across machine learning libraries such as TensorFlow, PyTorch, and JAX.
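One adaptive scheme widely cited in the physics-informed literature scales the pre-activation by a trainable parameter, i.e. tanh(n·a·x) with a fixed scale n and a learnable slope a. The sketch below is a generic PyTorch illustration of that idea, not any single method from this survey:

```python
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    """Tanh with a trainable input scaling (illustrative sketch of the
    adaptive-activation scheme tanh(n * a * x) used in physics-informed ML)."""
    def __init__(self, n: float = 10.0):
        super().__init__()
        self.n = n                                  # fixed scale hyperparameter
        self.a = nn.Parameter(torch.tensor(0.1))    # learnable slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.n * self.a * x)
```

Used as a drop-in replacement for `nn.Tanh`, the slope `a` is trained jointly with the network weights by the same optimizer.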
To enhance the nonlinearity of neural networks and increase their mapping ability between the input and response variables, activation functions play an essential role in modeling more complex relationships and patterns in the data. In this work, a novel methodology is proposed to adaptively customize activation functions merely by adding very few parameters to traditional activation functions such as sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed methodology, some theoretical and experimental analysis on accelerating convergence and improving performance is presented, and a series of experiments are conducted based on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet and DenseNet) and various datasets (such as CIFAR10, CIFAR100, miniImageNet, PASCAL VOC and COCO). To further verify its validity and suitability across optimization strategies and usage scenarios, comparison experiments are also implemented among different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta and Adam) and different recognition tasks such as classification and detection. The results show that the proposed methodology is very simple yet significant in terms of convergence speed, precision and generalization, and it can surpass other popular methods such as ReLU and adaptive functions such as Swish in nearly all experiments in terms of overall performance. The code is publicly available at https://github.com/huhaigen/aptove-custivation-操作系统. The package includes the three proposed adaptive activation functions for reproducibility purposes.
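The abstract does not spell out the exact parametrization; purely as an illustration of "adding very few parameters to a traditional activation function", the sketch below wraps a base activation with a learnable gain and shift. This is a hypothetical form, not the paper's; its repository has the authoritative definitions:

```python
import torch
import torch.nn as nn

class CustomizedActivation(nn.Module):
    """Hypothetical few-parameter wrapper: f(x) = k * base(x + b).
    Illustration only; the paper's exact parametrization may differ."""
    def __init__(self, base=torch.relu):
        super().__init__()
        self.base = base
        self.k = nn.Parameter(torch.ones(1))    # learnable gain
        self.b = nn.Parameter(torch.zeros(1))   # learnable shift

    def forward(self, x):
        return self.k * self.base(x + self.b)
```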
An activation function has a significant impact on the efficiency and robustness of neural networks. As an alternative, we evolved a cutting-edge non-monotonic activation function, the Negative Stimulated Hybrid Activation Function (Nish). It acts as a Rectified Linear Unit (ReLU) in the positive region and as a sinus-sigmoidal function in the negative region. In other words, it incorporates a sigmoid and a sine function, gaining new dynamics over the classical ReLU. We analyzed the consistency of Nish for different combinations of essential networks and the most common activation functions on several of the most popular benchmarks. From the experimental results, we report that the accuracy rates achieved by Nish are slightly better than those of Mish in classification.
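The abstract only specifies the piecewise structure: identity (ReLU-like) on the positive side, sinus-sigmoidal on the negative side. The sketch below is one realization consistent with that description; the exact negative-branch expression and constants are defined in the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nish(x):
    # Positive region: identity (ReLU behavior). Negative region: an assumed
    # sine-times-sigmoid form; see the paper for the exact expression.
    return np.where(x >= 0, x, np.sin(x) * sigmoid(x))
```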
Nonlinear activation functions endow neural networks with the ability to learn complex, high-dimensional functions. The choice of activation function is an important hyperparameter that determines the performance of deep neural networks. It significantly affects the gradient flow, the speed of training, and ultimately the representation power of the neural network. Saturating activation functions like sigmoids suffer from the vanishing gradient problem and cannot be used in deep neural networks. Universal approximation theorems guarantee that multilayer networks of sigmoids or ReLUs can learn arbitrarily complex continuous functions to any accuracy. Despite the ability of multilayer networks to learn arbitrarily complex functions, each neuron in a conventional neural network (one using sigmoid or ReLU-like activations) has a single hyperplane as its decision boundary and hence performs a linear classification. Thus a single neuron with sigmoid, ReLU, Swish, or Mish activation cannot learn the XOR function. Recent research has discovered biological neurons, in layers two and three of the human cortex, that have oscillating activation functions and are capable of individually learning the XOR function. The presence of oscillating activation functions in biological neurons may partially explain the performance gap between biological and artificial neural networks. This paper proposes four new oscillating activation functions that enable individual neurons to learn the XOR function without manual feature engineering. The paper explores the possibility of using oscillating activation functions to solve classification problems with fewer neurons and reduced training time.
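As a concrete illustration of why oscillation helps, a single neuron with the Growing Cosine Unit f(z) = z·cos(z), an oscillatory activation from the same line of work (not necessarily one of the four functions proposed here), separates XOR with hand-picked weights:

```python
import numpy as np

def gcu(z):
    # Growing Cosine Unit: oscillatory, non-monotonic activation.
    return z * np.cos(z)

# Single neuron: z = w . x + b, with w = (pi, pi) and b = 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                # XOR labels

z = X @ np.array([np.pi, np.pi])          # -> [0, pi, pi, 2*pi]
out = gcu(z)                              # -> [0, -pi, -pi, 2*pi]
pred = (out < -1.0).astype(int)           # one threshold on one neuron
print(pred, (pred == y).all())            # [0 1 1 0] True
```

Because z·cos(z) dips below zero between the two class-0 inputs, a single threshold on a single neuron's output suffices, which no monotonic activation can achieve.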
This paper proposes a novel and inspiring activation method, termed FPLUS, which exploits a mathematical power function with polar signs in its form. It is enlightened by common inverse operations while endowed with an intuitive meaning from bionics. The formulation is theoretically derived under conditions of some prior knowledge and anticipated properties, and its feasibility is then verified through a series of experiments on typical benchmark datasets, whose results indicate that our method possesses strong competitiveness among numerous activation functions, as well as compatible stability across many CNN architectures. Furthermore, we extend the presented function to a more generalized type called PFPlus, with two parameters that can be fixed or learnable, so as to augment its expressive capacity; results of the same tests validate this improvement.
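The abstract describes FPLUS only as a power function that carries the sign of its input; the sketch below is a hypothetical sign-preserving power form, given purely for intuition. It is not the paper's derived formula, and PFPlus's two fixable/learnable parameters are collapsed here into the single exponent p:

```python
import numpy as np

def pfplus_like(x, p=0.5):
    # Hypothetical sign-preserving power activation: sgn(x) * |x|**p.
    # Illustration only; FPLUS/PFPlus as derived in the paper use a
    # specific formulation with their own constants and parameters.
    return np.sign(x) * np.abs(x) ** p
```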
The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of presence of features within the stimulus. We derive logit-space operators equivalent to probabilistic Boolean logic-gates AND, OR, and XNOR for independent probabilities. Such theories are important to formalize more complex dendritic operations in real neurons, and these operations can be used as activation functions within a neural network, introducing probabilistic Boolean logic as the core operation of the neural network. Since these functions involve taking multiple exponents and logarithms, they are computationally expensive and not well suited to be used directly within neural networks. Consequently, we construct efficient approximations named $\text{AND}_\text{AIL}$ (the AND operator Approximate for Independent Logits), $\text{OR}_\text{AIL}$, and $\text{XNOR}_\text{AIL}$, which utilize only comparison and addition operations, have well-behaved gradients, and can be deployed as activation functions in neural networks. Like MaxOut, $\text{AND}_\text{AIL}$ and $\text{OR}_\text{AIL}$ are generalizations of ReLU to two dimensions. While our primary aim is to formalize dendritic computations within a logit-space probabilistic-Boolean framework, we deploy these new activation functions, both in isolation and in conjunction, and demonstrate their effectiveness on a variety of tasks including image classification, transfer learning, abstract reasoning, and compositional zero-shot learning.
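The exact logit-space AND for independent probabilities follows directly from the definitions: with p = σ(a) and q = σ(b), the gate returns logit(p·q). The sketch below computes this exact operator; the paper's $\text{AND}_\text{AIL}$ is a cheap approximation of it built from comparisons and additions (see the paper for its exact form):

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) in a numerically stable form.
    return np.logaddexp(0.0, x)

def logit_and_exact(a, b):
    """Exact logit-space AND for independent probabilities:
    returns logit(sigmoid(a) * sigmoid(b))."""
    log_pq = -softplus(-a) - softplus(-b)    # log(p * q)
    pq = np.exp(log_pq)
    return log_pq - np.log1p(-pq)            # log(pq) - log(1 - pq)

# Sanity check: true AND true stays confidently true, while one
# confident "false" input drives the result false.
print(logit_and_exact(4.0, 4.0))    # ~ +3.3
print(logit_and_exact(4.0, -4.0))   # ~ -4.0
```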
In addition to being extremely non-linear, modern problems require millions, if not billions, of parameters to solve, or at least to obtain a good solution; neural networks are known to assimilate that complexity by deepening and widening their topology in order to increase the level of non-linearity needed for a better approximation. However, compact topologies are always preferred to deeper ones, as they offer the advantage of using fewer computational units and fewer parameters. This compactness comes at the price of reduced non-linearity, and thus of a limited solution search space. We propose the 1-Dimensional Polynomial Neural Network (1DPNN) model, which uses automatic polynomial kernel estimation for 1-Dimensional Convolutional Neural Networks (1DCNNs) and introduces a high degree of non-linearity from the first layer, which can compensate for the need for deep and/or wide topologies. We show that this non-linearity enables the model to yield better results, with less computational and spatial complexity, than a regular 1DCNN on various classification and regression problems related to audio signals, even though it introduces more computational and spatial complexity at the neuronal level. The experiments were conducted on three public datasets and demonstrate that, on the problems tackled, the proposed model can extract more relevant information from the data than a 1DCNN, in less time and with less memory.
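A minimal sketch of the core idea as I read it from the abstract: a 1-D convolutional layer whose response is polynomial in its input, with one kernel per degree, i.e. y = Σ_d W_d ⊛ x^d + b. The paper's automatic kernel-estimation procedure is more involved; this PyTorch layer is only an illustration:

```python
import torch
import torch.nn as nn

class PolyConv1d(nn.Module):
    """Degree-D polynomial 1-D convolution (illustrative):
    y = sum over d of conv_d(x ** d), with a bias on the degree-1 term."""
    def __init__(self, in_ch, out_ch, kernel_size, degree=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2,
                       bias=(d == 1)) for d in range(1, degree + 1)])

    def forward(self, x):
        return sum(conv(x ** (d + 1)) for d, conv in enumerate(self.convs))

x = torch.randn(8, 1, 16000)            # a batch of 1-channel audio frames
y = PolyConv1d(1, 32, 9, degree=3)(x)
print(y.shape)                          # torch.Size([8, 32, 16000])
```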
In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks with at least one hidden layer. Inspired by recent discoveries in neuroscience, we propose a new neuron model along with a novel activation function that enables the learning of nonlinear decision boundaries using a single neuron. We show that a standard neuron followed by the novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. Furthermore, we conduct experiments on five benchmark datasets from computer vision, signal processing and natural language processing, namely MOROCO, UTKFace, CREMA-D, Fashion-MNIST and Tiny ImageNet, showing that the ADA and leaky ADA functions provide superior results to the Rectified Linear Unit (ReLU), leaky ReLU, RBF and Swish for various neural network architectures, e.g. multi-layer perceptrons (MLPs) with one or two hidden layers and convolutional neural networks (CNNs) such as LeNet, VGG, ResNet and character-level CNN. We obtain further performance improvements when we change the standard model of the neuron to a pyramidal neuron with apical dendrite activations (PyNADA). Our code is available at: https://github.com/raduionescu/pynada.
While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
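To give a flavor of the post-processing compression techniques evaluated (quantization and pruning), the sketch below applies magnitude pruning and dynamic 8-bit quantization to a small PyTorch model. This is a generic illustration using PyTorch's built-in utilities, not the article's experimental pipeline:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Unstructured magnitude pruning: zero out the 50% smallest weights per layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")   # make the pruning permanent

# Post-training dynamic quantization: int8 weights for all Linear layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```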
Handwritten digit recognition (HDR) is one of the most challenging tasks in the field of optical character recognition (OCR). Irrespective of the language, there are some inherent challenges of HDR, which mostly arise due to variations in writing style across individuals, variations in writing medium and environment, the inability to maintain the same strokes when writing any digit repeatedly, etc. In addition to that, the structural complexities of the digits of a particular language may lead to ambiguities in HDR. Over the years, researchers have developed numerous offline and online HDR pipelines, in which different image processing techniques are combined with traditional machine learning (ML)-based and/or deep learning (DL)-based architectures. Although there is evidence of extensive review studies on HDR in the literature for languages such as English, Arabic, Indian, Farsi, and Chinese, there are few surveys on Bengali HDR (BHDR), and those lack a comprehensive analysis of the challenges, the underlying recognition process, and possible future directions. In this paper, the characteristics and inherent ambiguities of Bengali handwritten digits are analyzed, along with a comprehensive insight into two decades of state-of-the-art datasets and approaches for offline BHDR. Furthermore, several real-life application-specific studies involving BHDR are discussed in detail. This paper will also serve as a compendium for researchers interested in the science behind offline BHDR, instigating the exploration of newer avenues of relevant research that may further lead to better offline recognition of Bengali handwritten digits in different application areas.
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Activation functions (AFs), which are crucial to the success (or failure) of a neural network, have received increasing attention in recent years, with researchers seeking to design novel AFs that improve some aspect of network performance. In this paper we take another direction, wherein we combine a slew of known AFs into successful architectures, proposing three methods for doing so beneficially: 1) generating AF architectures at random; 2) using Optuna, an automatic hyperparameter optimization software framework, with its Tree-structured Parzen Estimator (TPE) sampler; and 3) using Optuna with its Covariance Matrix Adaptation Evolution Strategy (CMA-ES) sampler. We show that all three methods often produce significantly better results on 25 classification problems than a standard network composed of ReLU hidden units and a softmax output unit. Optuna with the TPE sampler emerged as the best method for producing AF architectures.
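A minimal sketch of the Optuna-based approach: let a TPE (or CMA-ES) sampler choose an activation per hidden layer and maximize validation accuracy. The toy task and network below are stand-ins for the paper's 25 classification problems:

```python
import optuna
import torch
import torch.nn as nn

ACTS = {"relu": nn.ReLU, "tanh": nn.Tanh, "elu": nn.ELU, "silu": nn.SiLU}

# Toy 2-class problem: label = XOR of the coordinate signs.
X = torch.randn(512, 2)
y = ((X[:, 0] * X[:, 1]) > 0).long()

def objective(trial):
    # One categorical AF choice per hidden layer.
    layers, width, dims = [], 16, [2, 16, 16]
    for i in range(2):
        act = trial.suggest_categorical(f"act{i}", list(ACTS))
        layers += [nn.Linear(dims[i], dims[i + 1]), ACTS[act]()]
    model = nn.Sequential(*layers, nn.Linear(width, 2))

    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return (model(X).argmax(1) == y).float().mean().item()

# TPESampler is Optuna's default; optuna.samplers.CmaEsSampler() is the
# CMA-ES alternative mentioned in the abstract.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```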
A key enabler for deploying convolutional neural networks on resource-constrained embedded systems is the binary neural network (BNN). BNNs save memory and simplify computation by binarizing both features and weights. Unfortunately, binarization is inevitably accompanied by a severe decrease in accuracy. To reduce the accuracy gap between binary and full-precision networks, many repair methods have been proposed in the recent past, which we have classified and put into a single overview in this chapter. The repair methods are divided into two main branches, training techniques and network topology changes, which can further be split into smaller categories. The latter category introduces additional cost for an embedded system (energy consumption or additional area), while the former does not. From our overview, we observe that progress has been made in reducing the accuracy gap, but BNN papers are not aligned on which repair methods should be used to get highly accurate BNNs. Therefore, this chapter contains an empirical review that evaluates the benefits of many repair methods on the ResNet-20 \& CIFAR10 and ResNet-18 \& CIFAR100 benchmarks. We find three repair categories most beneficial: feature binarizer, feature normalization, and double residual. Based on this review, we discuss future directions and research opportunities. We sketch the benefits and costs associated with BNNs on embedded systems, since it remains to be seen whether BNNs will be able to close the accuracy gap while staying highly energy-efficient on resource-constrained embedded systems.
Stochastic gradient descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it changes all parameters with equal-sized steps, irrespective of gradient behavior. Hence, an efficient way of optimizing a deep network is to use adaptive step sizes for each parameter. Recently, several attempts have been made to improve gradient descent methods, such as AdaGrad, AdaDelta, RMSProp and Adam. These methods rely on the square root of exponential moving averages of squared past gradients. Thus, these methods do not take advantage of the local change in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad optimization technique, the step size is adjusted for each parameter in such a way that it has a larger step size for faster gradient-changing parameters and a lower step size for lower gradient-changing parameters. The convergence analysis is carried out using the regret-bound approach of the online learning framework. A rigorous analysis is provided in this paper over three synthetic, complex non-convex functions. Image categorization experiments are also conducted on the CIFAR10 and CIFAR100 datasets to observe the performance of diffGrad with respect to state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual-unit (ResNet) based convolutional neural network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers. Moreover, we show that diffGrad performs uniformly well when training CNNs with different activation functions. The source code is publicly available at https://github.com/shivram1987/diffgrad.
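My reading of the update described here, as a hedged sketch (the released code at the URL above is authoritative): diffGrad keeps Adam's bias-corrected moments and multiplies the step by a friction coefficient ξ = σ(|g_prev − g|) in (0.5, 1), so parameters whose gradient is changing quickly take larger steps:

```python
import numpy as np

def diffgrad_step(param, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad update (sketch based on the paper's description).
    state holds m, v, prev_grad (all zeros initially) and step count t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # Friction from the change in gradient: ~0.5 when the gradient is flat,
    # ->1.0 when it changes fast, scaling the effective step size.
    xi = 1.0 / (1.0 + np.exp(-np.abs(state["prev_grad"] - grad)))
    state["prev_grad"] = grad
    return param - lr * xi * m_hat / (np.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0, "prev_grad": np.zeros(3)}
w = diffgrad_step(np.ones(3), np.array([0.1, -0.2, 0.3]), state)
print(w)
```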
In recent years, deep learning has infiltrated every field it has touched, reducing the need for specialist knowledge and automating the process of knowledge discovery from data. This review argues that astronomy is no different, and that we are currently in the midst of a deep learning revolution that is transforming the way we do astronomy. We trace the history of astronomical connectionism from the early days of multilayer perceptrons, through the second wave of convolutional and recurrent neural networks, to the current third wave of self-supervised and unsupervised deep learning. We then predict that we will soon enter a fourth wave of astronomical connectionism, in which finetuned versions of an all-encompassing 'foundation' model will replace expertly crafted deep learning models. We argue that such a model can only be brought about through a symbiotic relationship between astronomy and connectionism, whereby astronomy provides high quality multimodal data to train the foundation model, and in turn the foundation model is used to advance astronomical research.
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
Deep learning belongs to the field of artificial intelligence, in which machines perform tasks that typically require some form of human intelligence. Analogous to the basic structure of a brain, a deep learning algorithm consists of an artificial neural network that resembles the biological brain structure. Mimicking the human learning process with its senses, deep learning networks are fed with (sensory) data such as texts, images, videos or sounds. These networks outperform state-of-the-art methods on different tasks and, because of this, the whole field has seen exponential growth during the last years. This growth has resulted in well over 10,000 publications per year in recent years. For example, a search engine covering only a subset of all publications in the medical field already returns, for the search term "deep learning" in Q3 2020, results of which around 90% are from the last three years. Consequently, a complete overview of the field of deep learning is already impossible to obtain, and in the near future it will likely become difficult to obtain even an overview of a subfield. However, there are several review articles on deep learning that focus on specific scientific fields or applications, for example deep learning advances in computer vision or in specific tasks such as object detection. With these surveys as a foundation, the aim of this contribution is to provide a first high-level, categorized meta-survey of deep learning across different scientific disciplines. The categories (computer vision, language processing, medical informatics, and additional works) were chosen according to the underlying data sources (image, language, medical, mixed). In addition, we review the common architectures, methods, their pros and cons, evaluations, challenges, and future directions for each sub-category.
While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra-unlabeled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labeled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and closing the performance gap between neural networks learnt with and without unsupervised pre-training.
We introduce a method to train Quantized Neural Networks (QNNs), i.e., neural networks with extremely low-precision (e.g., 1-bit) weights and activations at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
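The core trick that makes such training possible is to binarize in the forward pass but let gradients flow in the backward pass: the straight-through estimator, clipped to |x| ≤ 1, as is standard in this line of work. A minimal PyTorch sketch:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Forward: 1-bit values; torch.sign maps 0 to 0, real
        # implementations typically map it to +1.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient where |x| <= 1,
        # zero it outside (cancels the gradient once weights saturate).
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

x = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(x).sum().backward()
print(x.grad)   # 1.0 where |x| <= 1, else 0.0
```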