Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect for this progress are novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and errorprone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy.
translated by 谷歌翻译
神经体系结构搜索(NAS)最近在深度学习社区中变得越来越流行,主要是因为它可以提供一个机会,使感兴趣的用户没有丰富的专业知识,从而从深度神经网络(DNNS)的成功中受益。但是,NAS仍然很费力且耗时,因为在NAS的搜索过程中需要进行大量的性能估计,并且训练DNNS在计算上是密集的。为了解决NAS的主要局限性,提高NAS的效率对于NAS的设计至关重要。本文以简要介绍了NAS的一般框架。然后,系统地讨论了根据代理指标评估网络候选者的方法。接下来是对替代辅助NAS的描述,该NAS分为三个不同类别,即NAS的贝叶斯优化,NAS的替代辅助进化算法和NAS的MOP。最后,讨论了剩余的挑战和开放研究问题,并在这个新兴领域提出了有希望的研究主题。
translated by 谷歌翻译
尽管人工神经网络(ANN)取得了重大进展,但其设计过程仍在臭名昭著,这主要取决于直觉,经验和反复试验。这个依赖人类的过程通常很耗时,容易出现错误。此外,这些模型通常与其训练环境绑定,而没有考虑其周围环境的变化。神经网络的持续适应性和自动化对于部署后模型可访问性的几个领域至关重要(例如,IoT设备,自动驾驶汽车等)。此外,即使是可访问的模型,也需要频繁的维护后部署后,以克服诸如概念/数据漂移之类的问题,这可能是繁琐且限制性的。当前关于自适应ANN的艺术状况仍然是研究的过早领域。然而,一种自动化和持续学习形式的神经体系结构搜索(NAS)最近在深度学习研究领域中获得了越来越多的动力,旨在提供更强大和适应性的ANN开发框架。这项研究是关于汽车和CL之间交集的首次广泛综述,概述了可以促进ANN中充分自动化和终身可塑性的不同方法的研究方向。
translated by 谷歌翻译
深度学习技术在各种任务中都表现出了出色的有效性,并且深度学习具有推进多种应用程序(包括在边缘计算中)的潜力,其中将深层模型部署在边缘设备上,以实现即时的数据处理和响应。一个关键的挑战是,虽然深层模型的应用通常会产生大量的内存和计算成本,但Edge设备通常只提供非常有限的存储和计算功能,这些功能可能会在各个设备之间差异很大。这些特征使得难以构建深度学习解决方案,以释放边缘设备的潜力,同时遵守其约束。应对这一挑战的一种有希望的方法是自动化有效的深度学习模型的设计,这些模型轻巧,仅需少量存储,并且仅产生低计算开销。该调查提供了针对边缘计算的深度学习模型设计自动化技术的全面覆盖。它提供了关键指标的概述和比较,这些指标通常用于量化模型在有效性,轻度和计算成本方面的水平。然后,该调查涵盖了深层设计自动化技术的三类最新技术:自动化神经体系结构搜索,自动化模型压缩以及联合自动化设计和压缩。最后,调查涵盖了未来研究的开放问题和方向。
translated by 谷歌翻译
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
translated by 谷歌翻译
The automated machine learning (AutoML) field has become increasingly relevant in recent years. These algorithms can develop models without the need for expert knowledge, facilitating the application of machine learning techniques in the industry. Neural Architecture Search (NAS) exploits deep learning techniques to autonomously produce neural network architectures whose results rival the state-of-the-art models hand-crafted by AI experts. However, this approach requires significant computational resources and hardware investments, making it less appealing for real-usage applications. This article presents the third version of Pareto-Optimal Progressive Neural Architecture Search (POPNASv3), a new sequential model-based optimization NAS algorithm targeting different hardware environments and multiple classification tasks. Our method is able to find competitive architectures within large search spaces, while keeping a flexible structure and data processing pipeline to adapt to different tasks. The algorithm employs Pareto optimality to reduce the number of architectures sampled during the search, drastically improving the time efficiency without loss in accuracy. The experiments performed on images and time series classification datasets provide evidence that POPNASv3 can explore a large set of assorted operators and converge to optimal architectures suited for the type of data provided under different scenarios.
translated by 谷歌翻译
Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In this work, in order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks-PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS [41], a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results. We note the lack of source material needed to exactly reproduce these results, and further discuss the robustness of published results given the various sources of variability in NAS experimental setups. Relatedly, we provide all information (code, random seeds, documentation) needed to exactly reproduce our results, and report our random search with weight-sharing results for each benchmark on multiple runs.
translated by 谷歌翻译
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space. Direct comparison under the same search space shows that our method is up to 5 times more efficient than the RL method of Zoph et al. (2018) in terms of number of models evaluated, and 8 times faster in terms of total compute. The structures we discover in this way achieve state of the art classification accuracies on CIFAR-10 and ImageNet.
translated by 谷歌翻译
神经结构中的标准范例(NAS)是搜索具有特定操作和连接的完全确定性体系结构。在这项工作中,我们建议寻找最佳运行分布,从而提供了一种随机和近似解,可用于采样任意长度的架构。我们提出并显示,给定架构单元格,其性能主要取决于使用的操作的比率,而不是典型的搜索空间中的任何特定连接模式;也就是说,操作排序的小变化通常是无关紧要的。这种直觉与任何特定的搜索策略都具有正交,并且可以应用于多样化的NAS算法。通过对4数据集和4个NAS技术的广泛验证(贝叶斯优化,可分辨率搜索,本地搜索和随机搜索),我们表明操作分布(1)保持足够的辨别力来可靠地识别解决方案,并且(2)显着识别比传统的编码更容易优化,导致大量速度,几乎没有成本性能。实际上,这种简单的直觉显着降低了电流方法的成本,并可能使NAS用于更广泛的应用中。
translated by 谷歌翻译
大多数机器学习算法由一个或多个超参数配置,必须仔细选择并且通常会影响性能。为避免耗时和不可递销的手动试验和错误过程来查找性能良好的超参数配置,可以采用各种自动超参数优化(HPO)方法,例如,基于监督机器学习的重新采样误差估计。本文介绍了HPO后,本文审查了重要的HPO方法,如网格或随机搜索,进化算法,贝叶斯优化,超带和赛车。它给出了关于进行HPO的重要选择的实用建议,包括HPO算法本身,性能评估,如何将HPO与ML管道,运行时改进和并行化结合起来。这项工作伴随着附录,其中包含关于R和Python的特定软件包的信息,以及用于特定学习算法的信息和推荐的超参数搜索空间。我们还提供笔记本电脑,这些笔记本展示了这项工作的概念作为补充文件。
translated by 谷歌翻译
深入学习的强化学习(RL)的结合导致了一系列令人印象深刻的壮举,许多相信(深)RL提供了一般能力的代理。然而,RL代理商的成功往往对培训过程中的设计选择非常敏感,这可能需要繁琐和易于易于的手动调整。这使得利用RL对新问题充满挑战,同时也限制了其全部潜力。在许多其他机器学习领域,AutomL已经示出了可以自动化这样的设计选择,并且在应用于RL时也会产生有希望的初始结果。然而,自动化强化学习(AutorL)不仅涉及Automl的标准应用,而且还包括RL独特的额外挑战,其自然地产生了不同的方法。因此,Autorl已成为RL中的一个重要研究领域,提供来自RNA设计的各种应用中的承诺,以便玩游戏等游戏。鉴于RL中考虑的方法和环境的多样性,在不同的子领域进行了大部分研究,从Meta学习到进化。在这项调查中,我们寻求统一自动的领域,我们提供常见的分类法,详细讨论每个区域并对研究人员来说是一个兴趣的开放问题。
translated by 谷歌翻译
Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the precomputed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.
translated by 谷歌翻译
超参数优化构成了典型的现代机器学习工作流程的很大一部分。这是由于这样一个事实,即机器学习方法和相应的预处理步骤通常只有在正确调整超参数时就会产生最佳性能。但是在许多应用中,我们不仅有兴趣仅仅为了预测精度而优化ML管道;确定最佳配置时,必须考虑其他指标或约束,从而导致多目标优化问题。由于缺乏知识和用于多目标超参数优化的知识和容易获得的软件实现,因此通常在实践中被忽略。在这项工作中,我们向读者介绍了多个客观超参数优化的基础知识,并激励其在应用ML中的实用性。此外,我们从进化算法和贝叶斯优化的领域提供了现有优化策略的广泛调查。我们说明了MOO在几个特定ML应用中的实用性,考虑了诸如操作条件,预测时间,稀疏,公平,可解释性和鲁棒性之类的目标。
translated by 谷歌翻译
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. ENAS constructs a large computational graph, where each subgraph represents a neural network architecture, hence forcing all architectures to share their parameters. A controller is trained with policy gradient to search for a subgraph that maximizes the expected reward on a validation set. Meanwhile a model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Sharing parameters among child models allows ENAS to deliver strong empirical performances, whilst using much fewer GPU-hours than existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On Penn Treebank, ENAS discovers a novel architecture that achieves a test perplexity of 56.3, on par with the existing state-of-the-art among all methods without post-training processing. On CIFAR-10, ENAS finds a novel architecture that achieves 2.89% test error, which is on par with the 2.65% test error of NASNet (Zoph et al., 2018).
translated by 谷歌翻译
近年来,行业和学术界的深度学习(DL)迅速发展。但是,找到DL模型的最佳超参数通常需要高计算成本和人类专业知识。为了减轻上述问题,进化计算(EC)作为一种强大的启发式搜索方法显示出在DL模型的自动设计中,所谓的进化深度学习(EDL)具有重要优势。本文旨在从自动化机器学习(AUTOML)的角度分析EDL。具体来说,我们首先从机器学习和EC阐明EDL,并将EDL视为优化问题。根据DL管道的说法,我们系统地介绍了EDL方法,从功能工程,模型生成到具有新的分类法的模型部署(即,什么以及如何发展/优化),专注于解决方案表示和搜索范式的讨论通过EC处理优化问题。最后,提出了关键的应用程序,开放问题以及可能有希望的未来研究线。这项调查回顾了EDL的最新发展,并为EDL的开发提供了有见地的指南。
translated by 谷歌翻译
深度学习的巨大进步导致了跨越众多领域的前所未有的成就。虽然深度神经网络的性能是可培制的,但这种模型的架构设计和可解释性是非竞争的。已经引入了通过神经结构搜索(NAS)自动化神经网络架构的设计。最近的进展通过利用分布式计算和新颖的优化算法,这些方法更加务实。但是,在优化架构以获得可解释性的情况下几乎没有作用。为此,我们提出了一种多目标分布式NAS框架,可针对任务性能和内省进行优化。我们利用非主导的分类遗传算法(NSGA-II)并说明可以通过人类更好地理解的造成架构的AI(XAI)技术。框架在几个图像分类数据集上进行评估。我们展示了对内省能力和任务错误的联合优化,导致更具脱屑的体系结构,可在可容忍的错误中执行。
translated by 谷歌翻译
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller discovers neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on a validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Sharing parameters among child models allows ENAS to deliver strong empirical performances, while using much fewer GPUhours than existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS finds a novel architecture that achieves 2.89% test error, which is on par with the 2.65% test error of NAS-Net (Zoph et al., 2018).
translated by 谷歌翻译
Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art. Code is publicly available on GitHub: https://github.com/D-X-Y/NAS-Projects.
translated by 谷歌翻译
Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we, in this paper, study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods which largely focus on searching for a single best architecture, i.e, point estimation, we propose GraphPNAS a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than other search spaces in the literature. We show that our proposed graph generator consistently outperforms RNN-based one and achieves better or comparable performances than state-of-the-art NAS methods.
translated by 谷歌翻译
There is growing interest in automating neural network architecture design. Existing architecture search methods can be computationally expensive, requiring thousands of different architectures to be trained from scratch. Recent work has explored weight sharing across models to amortize the cost of training. Although previous methods reduced the cost of architecture search by orders of magnitude, they remain complex, requiring hypernetworks or reinforcement learning controllers. We aim to understand weight sharing for one-shot architecture search. With careful experimental analysis, we show that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or RL.
translated by 谷歌翻译