The receptive field (RF), which determines the region of time series to be ``seen'' and used, is critical to improve the performance for time series classification (TSC). However, the variation of signal scales across and within time series data, makes it challenging to decide on proper RF sizes for TSC. In this paper, we propose a dynamic sparse network (DSN) with sparse connections for TSC, which can learn to cover various RF without cumbersome hyper-parameters tuning. The kernels in each sparse layer are sparse and can be explored under the constraint regions by dynamic sparse training, which makes it possible to reduce the resource cost. The experimental results show that the proposed DSN model can achieve state-of-art performance on both univariate and multivariate TSC datasets with less than 50\% computational cost compared with recent baseline methods, opening the path towards more accurate resource-aware methods for time series analyses. Our code is publicly available at: https://github.com/QiaoXiao7282/DSN.
translated by 谷歌翻译
接受场(RF)的大小一直是时间序列分类任务中一维卷积神经网络(1D-CNN)的最重要因素之一。已经采取了巨大的努力来选择适当的大小,因为它对性能产生了巨大影响,并且每个数据集都有很大的不同。在本文中,我们为1D-CNN提出了一个Omni级块(OS-Block),其中内核大小由简单而通用的规则决定。特别是,它是一组内核大小,可以根据时间序列的长度通过多个素数组成,可以有效地覆盖不同数据集的最佳RF大小。实验结果表明,具有OSBlock的模型可以达到与搜索最佳RF尺寸的模型相似的性能,并且由于最佳的最佳RF尺寸捕获能力,具有OS-Block的简单1D-CNN模型可实现最新状态。四个时间序列基准的ART性能,包括来自多个域的单变量和多元数据。全面的分析和讨论阐明了为什么OS-Block可以在不同数据集中捕获最佳的RF尺寸。可用代码[https://github.com/wensi-tang/os-cnn]
translated by 谷歌翻译
Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-tosparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static * .
translated by 谷歌翻译
自视觉变压器(VIT)出现以来,变形金刚在计算机视觉世界中迅速发光。卷积神经网络(CNN)的主要作用似乎受到越来越有效的基于变压器的模型的挑战。最近,几个先进的卷积模型以当地但大量注意机制的驱动的大型内核进行反击,显示出吸引力的性能和效率。尽管其中一个(即Replknet)令人印象深刻地设法将内核大小扩展到31x31,而性能提高,但随着内核大小的持续增长,性能开始饱和,与Swin Transformer等高级VIT的缩放趋势相比。在本文中,我们探讨了训练大于31x31的极端卷积的可能性,并测试是否可以通过策略性地扩大卷积来消除性能差距。这项研究最终是从稀疏性的角度施加极大核的食谱,该核心可以将内核平滑地扩展到61x61,并且性能更好。我们提出了稀疏的大内核网络(SLAK),这是一种纯CNN架构,配备了51x51个核,可以与最先进的层次变压器和现代探测器架构(如Convnext和Repleknet and Replknet and Replknet and Replknet and Replinext and Replknet and Replinext and Convnext and Replentical conternels cor相同或更好在成像网分类以及典型的下游任务上。我们的代码可在此处提供https://github.com/vita-group/slak。
translated by 谷歌翻译
关于稀疏神经网络训练(稀疏训练)的最新研究表明,通过从头开始训练本质上稀疏的神经网络可以实现绩效和效率之间的令人信服的权衡。现有的稀疏训练方法通常努力在一次跑步中找到最佳的稀疏子网,而无需涉及任何昂贵的密集或预训练步骤。例如,作为最突出的方向之一,动态稀疏训练(DST)能够通过在训练过程中迭代发展稀疏拓扑来实现竞争性训练的竞争性能。在本文中,我们认为最好分配有限的资源来创建多个低损失的稀疏子网并将其超级置于更强的基因,而不是完全分配所有资源以找到单个子网络。为了实现这一目标,需要两个Desiderata:(1)在一个培训过程中有效生产许多低损失的子网,即所谓的廉价门票,仅限于用于密集培训的标准培训时间; (2)将这些廉价的门票有效地超级为一个更强的子网,而无需超越约束参数预算。为了证实我们的猜想,我们提出了一种新颖的稀疏训练方法,称为\ textbf {sup-tickets},可以在单个稀疏到较小的训练过程中同时满足上述两个desiderata。在CIFAR-10/100和Imagenet上的各种现代体系结构中,我们表明,SUP-Tickets与现有的稀疏训练方法无缝集成,并显示出一致的性能提高。
translated by 谷歌翻译
彩票票证假设(LTH)表明,密集的模型包含高度稀疏的子网(即获奖门票),可以隔离培训以完全准确。尽管做出了许多激动人心的努力,但仍有一个“常识”很少受到挑战:通过迭代级修剪(IMP)发现了一张获胜的票,因此由此产生的修剪子网仅具有非结构化的稀疏性。这一差距限制了在实践中赢得门票的吸引力,因为高度不规则的稀疏模式在硬件上加速的挑战是挑战性的。同时,直接将结构化修剪替换为非结构化的修剪,以更严重地损害绩效,并且通常无法找到获胜的票。在本文中,我们证明了第一个积极的结果是,总体上可以有效地找到结构上稀疏的获胜票。核心思想是在每一轮(非结构化)IMP之后附加“后处理技术”,以实施结构稀疏的形成。具体而言,我们首先在某些被认为很重要的通道中“重新填充”修剪元素,然后“重新组”非零元素以创建灵活的群体结构模式。我们确定的渠道和团体结构子网都赢得了彩票,并以现有硬件很容易支持的大量推理加速。广泛的实验,在多个网络骨架的不同数据集上进行,一致验证了我们的建议,表明LTH的硬件加速障碍现在已被删除。具体而言,结构上的获胜票最多可获得{64.93%,64.84%,60.23%}的运行时间节省,以{36%〜80%,74%,58%}的稀疏性在{Cifar,cifar,tiny-imageNet,imageNet}上保持可比较的精度。代码在https://github.com/vita-group/structure-lth上。
translated by 谷歌翻译
最近对稀疏神经网络的作品已经证明了独立从头开始训练稀疏子网,以匹配其相应密集网络的性能。然而,识别这种稀疏的子网(获奖票)涉及昂贵的迭代火车 - 培训 - 培训过程(例如,彩票票证假设)或过度扩展的训练时间(例如,动态稀疏训练)。在这项工作中,我们在稀疏神经网络训练和深度合并技术之间汲取了独特的联系,产生了一个名为FreeTickets的新型集合学习框架。 FreeTickets而不是从密集的网络开始,随机初始化稀疏的子网,然后在动态调整其稀疏掩码的同时列举子网,从而在整个训练过程中产生许多不同的稀疏子网。 FreeTickets被定义为这些稀疏子网的集合,在这种单次通过,稀疏稀疏训练中自由获得,其仅使用Vanilla密集培训所需的计算资源的一小部分。此外,尽管是模型的集合,但与单一密集模型相比,FreeTickets的参数和训练拖鞋更少:这种看似反向直观的结果是由于每个子网的高稀疏性。与标准致密基线相比,观察到惯性基因术,以预测准确性,不确定度估计,鲁棒性和效率相比表现出显着的全面改进。 FreeTickets在ImageNet上只使用后者所需的四分之一的培训拖鞋,可以轻松地表达Naive Deep EndleBe。我们的结果提供了对稀疏神经网络的强度的见解,并表明稀疏性的好处超出了通常预期的推理效率。
translated by 谷歌翻译
In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked to a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
translated by 谷歌翻译
现有的多尺度解决方案会导致仅增加接受场大小的风险,同时忽略小型接受场。因此,有效构建自适应神经网络以识别各种空间尺度对象是一个具有挑战性的问题。为了解决这个问题,我们首先引入一个新的注意力维度,即除了现有的注意力维度(例如渠道,空间和分支)之外,并提出了一个新颖的选择性深度注意网络,以对称地处理各种视觉中的多尺度对象任务。具体而言,在给定神经网络的每个阶段内的块,即重新连接,输出层次功能映射共享相同的分辨率但具有不同的接收场大小。基于此结构属性,我们设计了一个舞台建筑模块,即SDA,其中包括树干分支和类似SE的注意力分支。躯干分支的块输出融合在一起,以通过注意力分支指导其深度注意力分配。根据提出的注意机制,我们可以动态选择不同的深度特征,这有助于自适应调整可变大小输入对象的接收场大小。这样,跨块信息相互作用会导致沿深度方向的远距离依赖关系。与其他多尺度方法相比,我们的SDA方法结合了从以前的块到舞台输出的多个接受场,从而提供了更广泛,更丰富的有效接收场。此外,我们的方法可以用作其他多尺度网络以及注意力网络的可插入模块,并创造为SDA- $ x $ net。它们的组合进一步扩展了有效的接受场的范围,可以实现可解释的神经网络。我们的源代码可在\ url {https://github.com/qingbeiguo/sda-xnet.git}中获得。
translated by 谷歌翻译
时间序列数据通常仅在观察过程中的中断时仅在有限的时间范围内获得。为了对这样的部分时间序列进行分类,我们需要考虑1)从2)不同时间戳绘制的可变长度数据。为了解决第一个问题,现有的卷积神经网络在卷积层之后使用全球池取消长度差异。这种体系结构遭受了将整个时间相关性纳入长数据和避免用于简短数据的功能崩溃之间的权衡。为了解决这种权衡,我们提出了自适应多尺度合并,该池从自适应数量的层中汇总了功能,即仅用于简短数据的前几层和更多的长数据层。此外,为了解决第二个问题,我们引入了时间编码,将观察时间戳嵌入中间特征中。我们的私有数据集和UCR/UEA时间序列档案中的实验表明,我们的模块提高了分类精度,尤其是在部分时间序列获得的短数据上。
translated by 谷歌翻译
Dynamic networks have been extensively explored as they can considerably improve the model's representation power with acceptable computational cost. The common practice in implementing dynamic networks is to convert given static layers into fully dynamic ones where all parameters are dynamic and vary with the input. Recent studies empirically show the trend that the more dynamic layers contribute to ever-increasing performance. However, such a fully dynamic setting 1) may cause redundant parameters and high deployment costs, limiting the applicability of dynamic networks to a broader range of tasks and models, and more importantly, 2) contradicts the previous discovery in the human brain that \textit{when human brains process an attention-demanding task, only partial neurons in the task-specific areas are activated by the input, while the rest neurons leave in a baseline state.} Critically, there is no effort to understand and resolve the above contradictory finding, leaving the primal question -- to make the computational parameters fully dynamic or not? -- unanswered. The main contributions of our work are challenging the basic commonsense in dynamic networks, and, proposing and validating the \textsc{cherry hypothesis} -- \textit{A fully dynamic network contains a subset of dynamic parameters that when transforming other dynamic parameters into static ones, can maintain or even exceed the performance of the original network.} Technically, we propose a brain-inspired partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones. Also, we further design Iterative Mode Partition to partition the dynamic- and static-subnet, which alleviates the redundancy in traditional fully dynamic networks. Our hypothesis and method are comprehensively supported by large-scale experiments with typical advanced dynamic methods.
translated by 谷歌翻译
深度神经网络(DNN)在解决许多真实问题方面都有效。较大的DNN模型通常表现出更好的质量(例如,精度,精度),但它们的过度计算会导致长期推理时间。模型稀疏可以降低计算和内存成本,同时保持模型质量。大多数现有的稀疏算法是单向移除的重量,而其他人则随机或贪婪地探索每层进行修剪的小权重子集。这些算法的局限性降低了可实现的稀疏性水平。此外,许多算法仍然需要预先训练的密集模型,因此遭受大的内存占地面积。在本文中,我们提出了一种新颖的预定生长和修剪(间隙)方法,而无需预先培训密集模型。它通过反复生长一个层次的层来解决以前的作品的缺点,然后在一些训练后修剪回到稀疏。实验表明,使用所提出的方法修剪模型匹配或击败高度优化的密集模型的质量,在各种任务中以80%的稀疏度,例如图像分类,客观检测,3D对象分段和翻译。它们还优于模型稀疏的其他最先进的(SOTA)方法。作为一个例子,通过间隙获得的90%不均匀的稀疏resnet-50模型在想象中实现了77.9%的前1个精度,提高了先前的SOTA结果1.5%。所有代码将公开发布。
translated by 谷歌翻译
由于稀疏神经网络通常包含许多零权重,因此可以在不降低网络性能的情况下潜在地消除这些不必要的网络连接。因此,设计良好的稀疏神经网络具有显着降低拖鞋和计算资源的潜力。在这项工作中,我们提出了一种新的自动修剪方法 - 稀疏连接学习(SCL)。具体地,重量被重新参数化为可培训权重变量和二进制掩模的元素方向乘法。因此,由二进制掩模完全描述网络连接,其由单位步进函数调制。理论上,从理论上证明了使用直通估计器(STE)进行网络修剪的基本原理。这一原则是STE的代理梯度应该是积极的,确保掩模变量在其最小值处收敛。在找到泄漏的Relu后,SoftPlus和Identity Stes可以满足这个原理,我们建议采用SCL的身份STE以进行离散面膜松弛。我们发现不同特征的面具梯度非常不平衡,因此,我们建议将每个特征的掩模梯度标准化以优化掩码变量训练。为了自动训练稀疏掩码,我们将网络连接总数作为我们的客观函数中的正则化术语。由于SCL不需要由网络层设计人员定义的修剪标准或超级参数,因此在更大的假设空间中探讨了网络,以实现最佳性能的优化稀疏连接。 SCL克服了现有自动修剪方法的局限性。实验结果表明,SCL可以自动学习并选择各种基线网络结构的重要网络连接。 SCL培训的深度学习模型以稀疏性,精度和减少脚波特的SOTA人类设计和自动修剪方法训练。
translated by 谷歌翻译
图形神经网络(GNNS)由于图形数据的规模和模型参数的数量呈指数增长,因此限制了它们在实际应用中的效用,因此往往会遭受高计算成本。为此,最近的一些作品着重于用彩票假设(LTH)稀疏GNN,以降低推理成本,同时保持绩效水平。但是,基于LTH的方法具有两个主要缺点:1)它们需要对密集模型进行详尽且迭代的训练,从而产生了极大的训练计算成本,2)它们仅修剪图形结构和模型参数,但忽略了节点功能维度,存在大量冗余。为了克服上述局限性,我们提出了一个综合的图形渐进修剪框架,称为CGP。这是通过在一个训练过程中设计在训练图周期修剪范式上进行动态修剪GNN来实现的。与基于LTH的方法不同,提出的CGP方法不需要重新训练,这大大降低了计算成本。此外,我们设计了一个共同策略,以全面地修剪GNN的所有三个核心元素:图形结构,节点特征和模型参数。同时,旨在完善修剪操作,我们将重生过程引入我们的CGP框架,以重新建立修剪但重要的连接。提出的CGP通过在6个GNN体系结构中使用节点分类任务进行评估,包括浅层模型(GCN和GAT),浅但深度散发模型(SGC和APPNP)以及Deep Models(GCNII和RESGCN),总共有14个真实图形数据集,包括来自挑战性开放图基准的大规模图数据集。实验表明,我们提出的策略在匹配时大大提高了训练和推理效率,甚至超过了现有方法的准确性。
translated by 谷歌翻译
The automated machine learning (AutoML) field has become increasingly relevant in recent years. These algorithms can develop models without the need for expert knowledge, facilitating the application of machine learning techniques in the industry. Neural Architecture Search (NAS) exploits deep learning techniques to autonomously produce neural network architectures whose results rival the state-of-the-art models hand-crafted by AI experts. However, this approach requires significant computational resources and hardware investments, making it less appealing for real-usage applications. This article presents the third version of Pareto-Optimal Progressive Neural Architecture Search (POPNASv3), a new sequential model-based optimization NAS algorithm targeting different hardware environments and multiple classification tasks. Our method is able to find competitive architectures within large search spaces, while keeping a flexible structure and data processing pipeline to adapt to different tasks. The algorithm employs Pareto optimality to reduce the number of architectures sampled during the search, drastically improving the time efficiency without loss in accuracy. The experiments performed on images and time series classification datasets provide evidence that POPNASv3 can explore a large set of assorted operators and converge to optimal architectures suited for the type of data provided under different scenarios.
translated by 谷歌翻译
As a powerful engine, vanilla convolution has promoted huge breakthroughs in various computer tasks. However, it often suffers from sample and content agnostic problems, which limits the representation capacities of the convolutional neural networks (CNNs). In this paper, we for the first time model the scene features as a combination of the local spatial-adaptive parts owned by the individual and the global shift-invariant parts shared to all individuals, and then propose a novel two-branch dual complementary dynamic convolution (DCDC) operator to flexibly deal with these two types of features. The DCDC operator overcomes the limitations of vanilla convolution and most existing dynamic convolutions who capture only spatial-adaptive features, and thus markedly boosts the representation capacities of CNNs. Experiments show that the DCDC operator based ResNets (DCDC-ResNets) significantly outperform vanilla ResNets and most state-of-the-art dynamic convolutional networks on image classification, as well as downstream tasks including object detection, instance and panoptic segmentation tasks, while with lower FLOPs and parameters.
translated by 谷歌翻译
Sparse neural networks attract increasing interest as they exhibit comparable performance to their dense counterparts while being computationally efficient. Pruning the dense neural networks is among the most widely used methods to obtain a sparse neural network. Driven by the high training cost of such methods that can be unaffordable for a low-resource device, training sparse neural networks sparsely from scratch has recently gained attention. However, existing sparse training algorithms suffer from various issues, including poor performance in high sparsity scenarios, computing dense gradient information during training, or pure random topology search. In this paper, inspired by the evolution of the biological brain and the Hebbian learning theory, we present a new sparse training approach that evolves sparse neural networks according to the behavior of neurons in the network. Concretely, by exploiting the cosine similarity metric to measure the importance of the connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradient in the backward. We carried out different experiments on eight datasets, including tabular, image, and text datasets, and demonstrate that our proposed method outperforms several state-of-the-art sparse training algorithms in extremely sparse neural networks by a large gap. The implementation code is available on https://github.com/zahraatashgahi/CTRE
translated by 谷歌翻译
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-ofthe-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
translated by 谷歌翻译
神经体系结构搜索(NAS)在从给定的超网中寻找有效的深神经网络(DNN)方面取得了惊人的成功。同时,彩票票证假设表明,DNN包含可以从头开始训练的小子网,以达到比原始DNN的可比精度或更高的精度。因此,目前是通过第一次搜索然后修剪的管道开发有效的DNN的常见做法。然而,这样做通常需要进行搜索训练培训过程,因此计算成本过高。在本文中,我们首次发现高效的DNN及其彩票子网(即彩票)可以直接从超级网络中直接识别,我们将其称为超级票,这是通过共同体系结构的两合一培训方案。搜索和参数修剪。此外,我们制定了一种进步和统一的超级标识识别策略,该策略使子网络在超网训练期间的连通性更改,比传统的稀疏培训更高的准确性和效率折衷。最后,我们评估了从一个任务中汲取的这种确定的超级款项是否可以很好地转移到其他任务,从而验证其同时处理多个任务的潜力。对三个任务和四个基准数据集进行的广泛实验和消融研究表明,与典型的NAS和修剪管道相比,我们所提出的超级款项实现了提高的准确性和效率权衡。可以在https://github.com/rice-eic/supertickets上获得代码和预估计的模型。
translated by 谷歌翻译
最近,稀疏培训已成为有希望的范式,可在边缘设备上有效地深入学习。当前的研究主要致力于通过进一步增加模型稀疏性来降低培训成本。但是,增加的稀疏性并不总是理想的,因为它不可避免地会在极高的稀疏度下引入严重的准确性降解。本文打算探索其他可能的方向,以有效,有效地降低稀疏培训成本,同时保持准确性。为此,我们研究了两种技术,即层冻结和数据筛分。首先,层冻结方法在密集的模型训练和微调方面取得了成功,但在稀疏训练域中从未采用过。然而,稀疏训练的独特特征可能会阻碍层冻结技术的结合。因此,我们分析了在稀疏培训中使用层冻结技术的可行性和潜力,并发现它有可能节省大量培训成本。其次,我们提出了一种用于数据集有效培训的数据筛分方法,该方法通过确保在整个培训过程中仅使用部分数据集来进一步降低培训成本。我们表明,这两种技术都可以很好地整合到稀疏训练算法中,以形成一个通用框架,我们将其配置为SPFDE。我们的广泛实验表明,SPFDE可以显着降低培训成本,同时从三个维度中保留准确性:重量稀疏性,层冻结和数据集筛分。
translated by 谷歌翻译