The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacities and faulty application behavior under overload conditions. Although the large volume of generated log data can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. Because of this, many companies resort to unsupervised fault-tolerance models. Yet, failure models of this kind can lose accuracy when they need to adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, that proactively avoids system overloads and their adverse effects by optimizing task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system and co-simulation-based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution, as the model size grows by only 3% and 1% per unit increase in the number of active tasks and hosts, respectively. Extensive experimentation on a Raspberry Pi-based edge cluster with DeFog benchmarks shows that DeepFT can outperform state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT gives the highest F1 scores for fault detection, reducing service deadline violations by up to 37% while also improving response time by up to 9%.
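To make the idea of scoring scheduling decisions with a deep surrogate concrete, here is a minimal sketch (not the authors' implementation): a small network predicts a fault score for each candidate migration decision, and the scheduler picks the lowest-risk one. The class name `FaultSurrogate`, the feature layout, and the one-hot decision encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FaultSurrogate(nn.Module):
    """Maps (host-utilisation features + candidate decision encoding) to a fault score in [0, 1]."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def pick_decision(surrogate, state, candidate_decisions):
    """Return the index of the candidate scheduling decision with the lowest predicted fault score."""
    with torch.no_grad():
        scores = [surrogate(torch.cat([state, d])).item() for d in candidate_decisions]
    return min(range(len(candidate_decisions)), key=lambda i: scores[i])

# Toy usage: 8 host-utilisation features and 4 one-hot encoded candidate decisions.
state = torch.rand(8)
candidates = [torch.eye(4)[i] for i in range(4)]
model = FaultSurrogate(8 + 4)
best = pick_decision(model, state, candidates)
```

In the paper's setting the surrogate would be fine-tuned online with co-simulated outcomes; the sketch only shows the decision-scoring step.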
Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, underscoring the need for a smart and parsimonious failure recovery scheme. Existing approaches often fail to adapt to highly volatile workloads or to accurately detect and diagnose faults for optimal remediation. A robust and proactive fault-tolerance mechanism is therefore needed to meet service-level objectives. In this work, we propose PreGAN, a composite AI model that uses a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault tolerance in containerized edge deployments. PreGAN couples co-simulation with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry Pi-based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault detection, diagnosis, and classification, thus achieving high quality of service. Compared to the best method among the considered baselines, PreGAN achieves 5.1% more accurate fault detection, higher diagnosis scores, and 23.8% lower overhead.
Edge federation is a new computing paradigm that seamlessly interconnects the resources of multiple edge service providers. A key challenge in such systems is the deployment of latency-critical and AI-based resource-intensive applications on constrained devices. To address this challenge, we propose a novel memory-efficient deep learning model, namely Generative Optimization Networks (GON). Unlike GANs, GONs use a single network to both discriminate inputs and generate samples, significantly reducing their memory footprint. Leveraging the low memory footprint of GONs, we propose a decentralized fault-tolerance method called DRAGON that runs simulations (akin to a digital modeling twin) to quickly predict and optimize the performance of the edge federation. Extensive experiments with real-world edge computing benchmarks on multiple Raspberry Pi-based federated edge configurations show that DRAGON can outperform the baseline methods in fault-detection and Quality of Service (QoS) metrics. Specifically, the proposed method gives higher F1 scores for fault detection than the best deep learning (DL) method, while consuming less memory than heuristic methods. This leads to improvements in energy consumption, response time, and service-level-agreement violations of up to 74%, 63%, and 82%, respectively.
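A minimal sketch of the generative-optimization-network idea described above: a single scoring network both discriminates inputs and "generates" samples by gradient ascent on its own score with respect to a random input. The architecture, step size, and number of steps are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Scorer(nn.Module):
    """Single network used both to score (discriminate) inputs and to drive generation."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)

def generate(scorer: Scorer, dim: int, steps: int = 50, lr: float = 0.1):
    """Generate a sample by maximising the scorer's output over its input."""
    z = torch.randn(dim, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -scorer(z).mean()     # ascend the score by descending its negative
        loss.backward()
        opt.step()
    return z.detach()

scorer = Scorer(dim=16)
sample = generate(scorer, dim=16)    # generated sample
score = scorer(sample).item()        # the same network scores (discriminates) inputs
```

Because no separate generator network is stored, the memory footprint is roughly half that of a comparable GAN, which is the property DRAGON exploits.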
Workflow scheduling is a long-studied problem in parallel and distributed computing (PDC), aiming to efficiently utilize compute resources to meet users' service requirements. Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize Quality of Service (QoS). However, scheduling workflow applications in mobile edge-cloud systems is challenging due to computational heterogeneity, changing latencies of mobile devices, and the volatile resource requirements of workloads. To overcome these difficulties, it is essential, yet challenging, to develop a long-sighted optimization scheme that efficiently models the QoS objectives. In this work, we propose MCDS: Monte Carlo Learning using Deep Surrogate models to efficiently schedule workflow applications in mobile edge-cloud computing systems. MCDS is an Artificial Intelligence (AI) based scheduling approach that uses a tree-based search strategy and a deep-neural-network-based surrogate model to estimate the long-term QoS impact of immediate actions for robust optimization of scheduling decisions. Experiments on physical and simulated edge-cloud testbeds show that MCDS can improve over the state-of-the-art in terms of energy consumption, response time, SLA violations, and cost by at least 6.13%, 4.56%, 45.09%, and 30.71%, respectively.
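An illustrative sketch, deliberately simplified from the tree-based search the abstract describes, of how Monte Carlo rollouts combined with a learned surrogate can estimate the long-horizon QoS impact of an immediate scheduling action. The toy state, action set, surrogate, and transition function are assumptions for demonstration only.

```python
import random

ACTIONS = [0, 1, 2]   # place the next task on host 0, 1, or 2

def rollout_value(surrogate, transition, state, action, horizon=5, gamma=0.9):
    """Estimate discounted QoS of taking `action` in `state`, following random actions afterwards."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * surrogate(state, action)   # surrogate predicts one-step QoS
        state = transition(state, action)              # co-simulated next state
        action = random.choice(ACTIONS)                # random rollout policy
        discount *= gamma
    return total

def best_action(surrogate, transition, state, n_rollouts=20):
    scores = {a: sum(rollout_value(surrogate, transition, state, a)
                     for _ in range(n_rollouts)) / n_rollouts
              for a in ACTIONS}
    return max(scores, key=scores.get)

# Toy problem: the state is a tuple of per-host loads; QoS rewards balanced hosts.
surrogate = lambda s, a: -abs(s[a] - sum(s) / len(s))
transition = lambda s, a: tuple(v + (1 if i == a else 0) for i, v in enumerate(s))
print(best_action(surrogate, transition, state=(3, 1, 2)))
```

MCDS additionally reuses search statistics in a tree structure; the sketch only conveys why looking ahead with a surrogate yields longer-sighted decisions than greedy one-step scheduling.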
Recently, intelligent scheduling approaches using surrogate models have been proposed to efficiently allocate volatile tasks in heterogeneous fog environments. Advances such as deterministic surrogate models, deep neural networks (DNNs), and gradient-based optimization allow low energy consumption and response times to be reached. However, deterministic surrogate models, which estimate the objective values for optimization, do not consider the uncertainty in the Quality of Service (QoS) objective function, which can lead to high Service Level Agreement (SLA) violation rates. Moreover, the brittle nature of DNN training prevents such models from reaching minimal energy or response times. To overcome these difficulties, we present a novel scheduler, GOSH, i.e., Gradient-based Optimization using Second-order derivatives and a Heteroscedastic deep surrogate model. GOSH uses a second-order gradient-based optimization approach to obtain better QoS and reduce the number of iterations needed to converge to a scheduling decision, subsequently lowering the scheduling time. Instead of a vanilla DNN, GOSH uses a Natural Parameter Network to approximate objective scores. Further, a lower-confidence-bound optimization approach allows GOSH to find an optimal trade-off between greedy minimization and uncertainty reduction by employing error-based exploration. Thus, GOSH and its co-simulation-based extension, GOSH*, can adapt quickly and reach better objective scores than the baseline methods. We show that GOSH* reaches better objective scores than GOSH, but is only suitable for high-resource-availability settings, whereas GOSH is apt for limited-resource settings. Real-system experiments for both GOSH and GOSH* show significant improvements over the state-of-the-art in terms of energy consumption, response time, and SLA violations of up to 18%, 27%, and 82%, respectively.
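A hedged sketch of the lower-confidence-bound (LCB) idea mentioned above: a heteroscedastic surrogate predicts both the mean objective and its uncertainty, and the acquisition score trades off greedy minimization against uncertainty reduction. The two-headed MLP stands in for the Natural Parameter Network, and the value of `kappa` is an illustrative assumption.

```python
import torch
import torch.nn as nn

class HeteroscedasticSurrogate(nn.Module):
    """Predicts the mean and log-variance of the QoS objective for a scheduling decision."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, 1)
        self.logvar_head = nn.Linear(64, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def lcb_score(model, decision, kappa=1.0):
    """Lower confidence bound: smaller is better when minimising the objective."""
    mean, logvar = model(decision)
    std = torch.exp(0.5 * logvar)
    return (mean - kappa * std).item()

model = HeteroscedasticSurrogate(dim=10)
candidates = [torch.rand(10) for _ in range(8)]
best = min(candidates, key=lambda d: lcb_score(model, d))
```

GOSH optimizes the decision with second-order gradients rather than enumerating candidates; the enumeration here only keeps the acquisition logic easy to follow.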
In recent years, with the wide spread of sensors and smart devices, the data generation rate of Internet of Things (IoT) systems has increased dramatically. In IoT systems, massive volumes of data must often be processed, transformed, and analyzed to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capability for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges, specifically effective model selection, design/tuning, and updating, which place heavy demands on experienced data scientists. Additionally, the dynamic nature of IoT data may introduce concept drift issues, causing model performance degradation. To reduce human effort, Automated Machine Learning (AutoML) has become a popular field that aims to automatically select, construct, tune, and update machine learning models to achieve the best performance on specified tasks. In this paper, we review existing methods for model selection, tuning, and updating in the AutoML area to identify and summarize the optimal solutions for each step of applying ML algorithms to IoT data analytics. To justify our findings and help industrial users and researchers better implement AutoML approaches, a case study of applying AutoML to an IoT anomaly detection problem is presented in this work. Finally, we discuss and classify the challenges and research directions in this domain.
Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
Modern industrial facilities generate large volumes of raw sensor data during the production process. This data is used to monitor and control the processes and can be analyzed to detect and predict process abnormalities. Typically, the data has to be annotated by experts to be further used in predictive modeling. Most of today's research focuses either on unsupervised anomaly detection algorithms or on supervised methods that require manually annotated data. These studies are often conducted on data of narrow classes of events generated by process simulators, and the proposed algorithms are rarely validated on publicly available datasets. In this paper, we propose a novel method for unsupervised fault detection and diagnosis for industrial chemical sensor data. We demonstrate our model's performance on two publicly available datasets of the Tennessee Eastman Process with various fault types. The results show that our method significantly outperforms existing approaches (+0.2-0.3 TPR at a fixed FPR) and detects most of the process faults without using expert annotations. In addition, we conducted experiments showing that our method is suitable for real-world applications in which the number of fault types is not known in advance.
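A small utility sketch for the metric quoted above, true-positive rate at a fixed false-positive rate, computed from raw anomaly scores. The data below is synthetic and the quantile-based thresholding is an illustrative assumption.

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr=0.05):
    """Pick the threshold whose FPR on normal samples equals `target_fpr`, report the TPR on faults."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    normal_scores = scores[labels == 0]
    threshold = np.quantile(normal_scores, 1.0 - target_fpr)  # top `target_fpr` of normals get flagged
    predicted = scores > threshold
    return predicted[labels == 1].mean()

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(2.5, 1.0, 100)])  # normal vs. faulty
labels = np.concatenate([np.zeros(1000), np.ones(100)])
print(f"TPR at 5% FPR: {tpr_at_fpr(scores, labels):.2f}")
```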
We present an end-to-end framework for improving the resilience of man-made systems to unforeseen events. The framework is based on a physics-based digital twin model and three modules responsible for real-time fault diagnosis, prognostics, and reconfiguration. The fault diagnosis module uses model-based diagnosis algorithms to detect and isolate faults, and generates interventions in the system to disambiguate uncertain diagnosis solutions. We scale the fault diagnosis algorithms to the required real-time performance through parallelization and surrogate models of the physics-based digital twin. The prognostics module tracks fault progression and trains online degradation models to compute the remaining useful life of system components. In addition, we use the degradation models to assess the impact of fault progression on operational requirements. The reconfiguration module uses PDDL-based planning with semantic attachments to adjust the system controls so that the impact of the fault on system operation is minimized. We define a resilience metric and illustrate, with the example of a fuel system model, how the metric improves through our framework.
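As an illustration of the prognostics step, here is a sketch of estimating remaining useful life (RUL) by fitting a trend to an online health indicator and extrapolating to a failure threshold. The linear degradation model and the threshold value are assumptions for demonstration, not the paper's degradation model.

```python
import numpy as np

def estimate_rul(times, health, failure_threshold):
    """Fit health ~ a*t + b and return the time remaining until it crosses the threshold."""
    a, b = np.polyfit(times, health, deg=1)
    if a >= 0:                       # not degrading: no finite RUL estimate
        return float("inf")
    t_fail = (failure_threshold - b) / a
    return max(t_fail - times[-1], 0.0)

times = np.arange(0, 50)             # hours of operation
health = 1.0 - 0.004 * times + 0.01 * np.random.default_rng(1).standard_normal(50)
print(f"Estimated RUL: {estimate_rul(times, health, failure_threshold=0.6):.1f} h")
```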
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing a structured and comprehensive overview of state-of-the-art deep learning models for time series anomaly detection. It provides a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced while adopting deep anomaly detection models.
Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control often involve dynamic behaviors at various levels: task, model, and layers (or ML operators) within a model. Such dynamic behaviors pose new challenges to the system software in an ML system because the overall system load is unpredictable, unlike traditional ML workloads. In addition, real-time processing requires meeting deadlines, and multi-model workloads involve highly heterogeneous models. As RTMM workloads often run on resource-constrained devices (e.g., VR headsets), developing an effective scheduler is an important research problem. Therefore, we propose a new scheduler, SDRM3, that effectively handles the various dynamicities in RTMM-style workloads targeting multi-accelerator systems. To make scheduling decisions, SDRM3 quantifies the unique requirements of RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. SDRM3 has tunable parameters that provide fast adaptivity to dynamic workload changes based on a gradient-descent-like online optimization, which typically converges within five steps for new workloads. In addition, we propose a method to exploit model-level dynamicity based on a Supernet, which dynamically selects a proper sub-network in the Supernet based on the system load to trade off scheduling effectiveness against model performance (e.g., accuracy). In our evaluation on five realistic RTMM workload scenarios, SDRM3 reduces the overall UXCost, an energy-delay-product (EDP)-equivalent metric for real-time applications defined in the paper, by 37.7% and 53.2% on geometric mean (up to 97.6% and 97.1%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.
Non-intrusive load monitoring (NILM) is the task of disaggregating the total power consumption into individual sub-components. Over the years, signal processing and machine learning algorithms have been combined to achieve this, and many publications and extensive research efforts have been devoted to state-of-the-art methods. The scientific community's initial interest in formulating and describing the NILM problem with machine learning tools has shifted towards a more practical NILM. Nowadays, we are in the mature NILM era, in which NILM is being tried out in real-life application scenarios. Consequently, the complexity of the algorithms, their transferability, reliability, practicality, and overall trustworthiness are the main issues of concern. This review narrows the gap between the early, immature NILM era and the mature one. In particular, it provides a comprehensive literature review of NILM methods for residential appliances only. It analyzes, summarizes, and presents the results of a large number of recently published scholarly articles, discusses the highlights of these methods, and introduces the research dilemmas that should be considered when applying NILM methods. Finally, we show the need to transfer traditional classification models into a practical and trustworthy framework.
Time series partitioning is an essential step in most machine-learning driven, sensor-based IoT applications. This paper introduces a sample-efficient, robust, time-series segmentation model and algorithm. We show that by learning a representation specific to the segmentation objective based on maximum mean discrepancy (MMD), our algorithm can robustly detect time series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (the null hypothesis) and determines the change point between pairs for which the null hypothesis is rejected (i.e., they come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and matches or outperforms state-of-the-art methods. We can fully train our model in 9-93 seconds on average, with little variation across data from different applications.
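A minimal sketch of the maximum mean discrepancy statistic that the segmentation objective above is built on: a large MMD between two adjacent windows suggests they come from different distributions, i.e., a change point. The RBF kernel and the median-heuristic bandwidth are assumptions for illustration, and the paper applies MMD on learned representations rather than raw samples.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(x, y):
    """Biased estimate of squared MMD between samples x and y (n_samples x n_features)."""
    z = np.vstack([x, y])
    pairwise = np.sqrt(np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1))
    bandwidth = np.median(pairwise[pairwise > 0])          # median heuristic
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=(100, 3))    # window before a candidate change point
after = rng.normal(1.5, 1.0, size=(100, 3))     # window after it
print(f"MMD^2 across the candidate change point: {mmd2(before, after):.3f}")
```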
The increasing complexity of modern high-performance computing (HPC) systems requires the introduction of automated and data-driven methods to support system administrators in their effort to increase system availability. Anomaly detection is an integral part of improving availability, as it eases the system administrators' burden and reduces the time between an anomaly and its resolution. However, current state-of-the-art detection methods are supervised or semi-supervised, so they require human-labeled datasets with anomalies, which are typically impractical to collect in production HPC systems. Clustering-based unsupervised anomaly detection approaches, aimed at alleviating the need for accurate anomaly data, have so far shown poor performance. In this work, we overcome these limitations by proposing RUAD, a novel unsupervised anomaly detection model. RUAD achieves better results than the current semi-supervised and unsupervised state-of-the-art (SoA) approaches. This is achieved by considering the temporal dependencies in the data and by including long short-term memory cells in the model architecture. The proposed approach is evaluated on the complete history of a tier-0 system (Marconi100 at CINECA, with 980 nodes). RUAD achieves an area under the curve (AUC) of 0.763 with semi-supervised training and an AUC of 0.767 with unsupervised training, which improves upon the SoA approach that achieves an AUC of 0.747 with semi-supervised training and an AUC of 0.734 with unsupervised training. It also vastly outperforms the current SoA clustering-based unsupervised anomaly detection approach, which achieves an AUC of 0.548.
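A hedged sketch of the kind of recurrent reconstruction model the abstract alludes to: an LSTM autoencoder whose reconstruction error on a window of node telemetry serves as an unsupervised anomaly score. The layer sizes and window shape are illustrative assumptions, not the RUAD architecture.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, n_features, batch_first=True)

    def forward(self, x):                              # x: (batch, time, features)
        latent, _ = self.encoder(x)
        recon, _ = self.decoder(latent)
        return recon

def anomaly_score(model, window):
    """Mean squared reconstruction error of one telemetry window."""
    with torch.no_grad():
        recon = model(window.unsqueeze(0))
    return torch.mean((recon.squeeze(0) - window) ** 2).item()

model = LSTMAutoencoder(n_features=8)
window = torch.rand(60, 8)                             # 60 time steps, 8 metrics per node
print(anomaly_score(model, window))
```

Trained on mostly-normal telemetry, windows with unusually high reconstruction error are flagged as anomalous without requiring labels.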
Computer architectures and systems have long been optimized for the efficient execution of machine learning (ML) models. Now, it is time to reconsider the relationship between ML and systems and to let ML transform the way computer architectures and systems are designed. This has a twofold implication: improving designers' productivity, and completing the virtuous cycle. In this paper, we present a comprehensive review of the work that applies ML to computer architecture and system design. First, we consider the typical roles of ML techniques in architecture/system design, i.e., either for fast predictive modeling or as the design methodology, and provide a high-level taxonomy. Then, we summarize the common problems in computer architecture/system design that are addressed by ML techniques, and the typical ML techniques used to resolve each of them. Besides emphasizing computer architecture in the narrow sense, we adopt the concept that data centers can be recognized as warehouse-scale computers; sketchy discussions are also provided on adjacent computer systems, such as code generation and compilers; we also give attention to how ML techniques can aid and transform design automation. We further provide a future vision of opportunities and potential directions, and envision that applying ML to computer architecture and systems will thrive in the community.
In intelligent transportation systems, detecting anomalous traffic congestion is crucial. Transportation agencies have two-fold objectives: to monitor the general traffic conditions in an area of interest and to locate road segments under abnormally congested states. Modeling congestion patterns can achieve these objectives for citywide roadways, which amounts to learning the distribution of multivariate time series (MTS). However, existing works are either not scalable or unable to capture the spatial information in MTS simultaneously. To this end, we propose a principled and comprehensive framework consisting of a data-driven generative approach that can perform tractable density estimation for detecting traffic anomalies. Our approach first clusters segments in the feature space and then uses conditional normalizing flows to identify anomalous temporal snapshots at the cluster level in an unsupervised setting. Then, we identify anomalies at the segment level by using a kernel density estimator on the anomalous clusters. Extensive experiments on synthetic datasets show that our approach significantly outperforms several state-of-the-art congestion anomaly detection and diagnosis methods in terms of recall and F1 score. We also use the generative model to sample labeled data, which can train classifiers in a supervised setting, alleviating the lack of labeled data for anomaly detection in sparse settings.
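A short sketch of the segment-level step described above: fit a kernel density estimator on historical congestion features of a road segment and flag observations whose density falls below a low quantile. The two-feature representation (speed, occupancy) and the 1st-percentile threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Historical (speed, occupancy) observations for one road segment; shape (2, n_samples).
history = rng.normal(loc=[40.0, 0.3], scale=[8.0, 0.1], size=(500, 2)).T
kde = gaussian_kde(history)

threshold = np.quantile(kde(history), 0.01)   # density below the 1st percentile = anomalous
new_obs = np.array([[12.0], [0.9]])           # a very slow, heavily occupied snapshot
is_anomalous = kde(new_obs)[0] < threshold
print(is_anomalous)
```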
In recent years, mobile devices are equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
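To make the aggregation step described above concrete, here is a minimal sketch of federated averaging, the canonical aggregation rule: clients send model updates, and the server combines them weighted by local dataset size. Plain NumPy parameter vectors stand in for real model weights, and the example is not tied to any specific system covered by the survey.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model parameters, weighted by local sample counts."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with different amounts of local data send their updated parameters.
client_weights = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.2, 0.8])]
client_sizes = [100, 300, 600]
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Note that only the parameter vectors cross the network; the raw training data never leaves the end devices, which is the privacy property FL relies on.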
Cloud auto-scaling mechanisms are typically based on reactive automation rules that scale a cluster whenever some metric, e.g., the average CPU usage among instances, exceeds a predefined threshold. Tuning these rules becomes particularly cumbersome when scaling up a cluster involves non-negligible time to bootstrap new instances, as often happens in production cloud services. To deal with this problem, we propose an architecture for auto-scaling cloud services based on the status into which the system is expected to evolve in the near future. Our approach leverages time-series forecasting techniques, such as those based on machine learning and artificial neural networks, to predict the future dynamics of key metrics, e.g., resource consumption metrics, and applies a threshold-based scaling policy on them. The result is a predictive automation policy that is able, for instance, to automatically anticipate peaks in the load of a cloud application and trigger appropriate scaling actions ahead of time to accommodate the expected increase in traffic. We prototyped our approach as an open-source OpenStack component that relies on, and extends, the monitoring capabilities offered by Monasca, adding predictive metrics that can be leveraged by orchestration components such as Heat or Senlin. We show experimental results using a recurrent neural network and a multi-layer perceptron as predictors, compared with a simple linear regression and a traditional non-predictive auto-scaling policy. The proposed framework, however, allows the prediction policy to be easily customized as needed.
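A compact sketch of the predictive policy described above: forecast the next values of a load metric and trigger a scale-out early if the forecast crosses the threshold. A simple linear extrapolation stands in for the recurrent neural network or multi-layer perceptron predictors mentioned in the abstract, and the threshold and horizon are illustrative assumptions.

```python
import numpy as np

def forecast_next(history, horizon=3):
    """Extrapolate the metric `horizon` steps ahead with a linear fit over recent history."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    future_t = np.arange(len(history), len(history) + horizon)
    return slope * future_t + intercept

def scaling_decision(cpu_history, threshold=80.0):
    predicted = forecast_next(cpu_history)
    return "scale_out" if predicted.max() > threshold else "hold"

cpu_history = [40, 45, 52, 60, 68, 75]        # average CPU % over recent intervals
print(scaling_decision(cpu_history))          # predicted to exceed 80% soon -> scale_out
```

The benefit is that new instances start bootstrapping before the threshold is actually breached, masking the instance start-up delay that reactive rules suffer from.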
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both the industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques for simultaneous design decision investigations of the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator performance is computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It is composed of two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, to efficiently train a neural heteroscedastic surrogate model to converge to an optimal CNN architecture by employing second-order gradients. AccelBench performs cycle-accurate simulations for a diverse set of accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset compared to the state-of-the-art pair, while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, i.e., Auto-NBA, by achieving 1.5% higher accuracy and 34.7x higher throughput, while enabling 11.0x lower energy-delay product (EDP) and 4.0x lower chip area on CIFAR-10.
Advancements in machine learning have opened up new opportunities to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. The high memory and compute footprints of conventional machine learning deployments hinder their direct deployment on ultra-resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use specialized model development workflows for resource-limited applications to ensure that the compute and latency budgets are within device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt specific instances of it. We present qualitative and numerical insights into the different stages of model development by showcasing multiple use cases. Finally, we identify the open research challenges and unresolved questions that demand careful consideration moving forward.