Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research. We introduce a framework that makes this easier by providing a simple interface for tracking realtime energy consumption and carbon emissions, as well as generating standardized online appendices. Utilizing this framework, we create a leaderboard for energy efficient reinforcement learning algorithms to incentivize responsible research in this area as an example for other areas of machine learning. Finally, based on case studies using our framework, we propose strategies for mitigation of carbon emissions and reduction of energy consumption. By making accounting easier, we hope to further the sustainable development of machine learning experiments and spur more research into energy efficient algorithms.
translated by 谷歌翻译
通过提供前所未有的计算资源访问,云计算能够在机器学习等技术中快速增长,其计算需求产生了高能源成本和相应的碳足迹。结果,最近的奖学金呼吁更好地估计AI的温室气体影响:当今的数据科学家无法轻松或可靠地访问该信息的测量,从而排除了可行策略的发展。向用户提供有关软件碳强度的信息的云提供商是一种基本的垫脚石,以最大程度地减少排放。在本文中,我们提供了一个测量软件碳强度的框架,并建议通过使用每个能量单元使用基于位置和特定时间的边际排放数据来测量运行碳排放。我们为一组自然语言处理和计算机视觉的现代模型提供了操作软件强度的测量,以及各种模型尺寸,包括预处理61亿个参数语言模型。然后,我们评估了一套用于减少Microsoft Azure Cloud Compute平台排放的方法套件:使用不同地理区域中的云实例,在一天中的不同时间使用云实例,并在边际碳强度高于某个阈值时动态暂停云实例。我们证实了先前的结果,即数据中心的地理区域在给定云实例的碳强度中起着重要作用,并发现选择合适的区域可能具有最大的运营排放减少影响。我们还表明,一天中的时间对操作软件碳强度有显着影响。最后,我们最终提出了有关机器学习从业人员如何使用软件碳强度信息来减少环境影响的建议。
translated by 谷歌翻译
自动化机器学习(Automl)努力自动配置机器学习算法及其组合的整体(软件)解决方案 - 机器学习管道 - 针对手头的学习任务(数据集)量身定制。在过去十年中,Automl已成为具有数百个贡献的热门研究课题。虽然Automl提供了许多前景,但也称它也是相当资源密集的,这是其主要批评的主要观点之一。高资源消耗的主要原因是许多方法依赖于许多ML管道的(昂贵)评估,同时寻找良好的候选者。由于使用许多数据集和方法进行了大规模实验,因此在Automl方法研究的背景下放大了这个问题,每个数据都是用几种重复来排除随机效应的几个重复的实验。本文阐述了最近的绿色AI的精神,是为了提高对问题的自动化研究人员的意识,并详细阐述可能的补救措施。为此,我们确定了四类行动,社区可能采取更加可持续的自动化计划,即接近设计,基准,研究激励和透明度。
translated by 谷歌翻译
Progress in machine learning (ML) comes with a cost to the environment, given that training ML models requires significant computational resources, energy and materials. In the present article, we aim to quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle. We estimate that BLOOM's final training emitted approximately 24.7 tonnes of~\carboneq~if we consider only the dynamic power consumption, and 50.5 tonnes if we account for all processes ranging from equipment manufacturing to energy-based operational consumption. We also study the energy requirements and carbon emissions of its deployment for inference via an API endpoint receiving user queries in real-time. We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of ML models and future research directions that can contribute towards improving carbon emissions reporting.
translated by 谷歌翻译
深度神经网络的规模和复杂性继续成倍增长,大大增加了这些模型训练和推断的能源消耗。我们介绍了一个开源软件包ECO2AI,以帮助数据科学家和研究人员以直接的方式跟踪其模型的能源消耗和同等的二氧化碳排放。在Eco2ai中,我们强调能源消耗跟踪和正确的区域二氧化碳排放会计的准确性。我们鼓励研究社区搜索具有较低计算成本的新最佳人工智能(AI)架构。动机还来自基于AI的温室气体与可持续AI和绿色AI途径隔离周期的概念。
translated by 谷歌翻译
本文探讨了超线性增长趋势的环境影响,从整体角度来看,跨越数据,算法和系统硬件。我们通过在行业规模机器学习用例中检查模型开发周期来表征AI计算的碳足迹,同时考虑系统硬件的生命周期。进一步迈出一步,我们捕获AI计算的操作和制造碳足迹,并为硬件 - 软件设计和尺度优化的结束分析以及如何帮助降低AI的整体碳足迹。根据行业经验和经验教训,我们分享关键挑战,并在AI的许多方面上绘制了重要的发展方向。我们希望本文提出的关键信息和见解能够激发社区以环保的方式推进AI领域。
translated by 谷歌翻译
丹尼德缩放结束和摩尔法的放缓使能量使用数据中心在不可持续的道路上。数据中心已经是全球电力使用的大部分,应用需求以快速缩放。我们认为,数据中心计算的碳强度的大幅减少可以通过以软件为中心的方法来实现:通过修改系统API,通过修改系统API来使应用程序开发人员可见的能量和碳,使其成为可能进行知情的贸易性能和碳排放之间,并通过提高应用程序编程水平,以便灵活地使用更节能的计算和存储方法。我们还为系统软件奠定了一个研究议程,以减少数据中心计算的碳足迹。
translated by 谷歌翻译
With the rise of AI in recent years and the increase in complexity of the models, the growing demand in computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly more potent accelerators and the use of large compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, energy efficiency plays an important role for AI model developers and infrastructure operators alike. The energy consumption of AI workloads depends on the model implementation and the utilized hardware. Therefore, accurate measurements of the power draw of AI workflows on different types of compute nodes is key to algorithmic improvements and the design of future compute clusters and hardware. To this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes. Our results indicate that 1. deriving energy consumption directly from runtime is not accurate, but the consumption of the compute node needs to be considered regarding its composition; 2. neglecting accelerator hardware on mixed nodes results in overproportional inefficiency regarding energy consumption; 3. energy consumption of model training and inference should be considered separately - while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, enabling an easy transfer to other workloads alongside a raise in user-awareness of energy consumption.
translated by 谷歌翻译
Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.
translated by 谷歌翻译
Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
translated by 谷歌翻译
负责将数据从存储转移到GPU的同时,在培训机器学习模型的同时,数据加载器可能会大大提高培训工作的绩效。最近的进步不仅通过大大减少训练时间,而且还提供了新功能,例如从远程存储(如S3)加载数据,这表明了希望。在本文中,我们是第一个将数据加载器区分为深度学习(DL)工作流程中的单独组件并概述其结构和功能的组件。最后,我们提供了可用的不同数据库,其功能,可用性和性能方面的权衡以及从中获得的见解的全面比较。
translated by 谷歌翻译
多机构增强学习(MARL)已成为解决分散决策问题的有用方法。近年来提出的许多突破性算法一直在稳步增长。在这项工作中,我们仔细研究了这一快速发展,重点是在合作Marl的大量研究中采用的评估方法。通过对先前工作进行详细的荟萃分析,涵盖了从2016年至2022年接受出版的75篇论文,我们引起了人们对真正进步率的质疑的令人担忧的趋势。我们在更广泛的背景下进一步考虑了这些趋势,并从单一AGENT RL文献中获得了有关类似问题的灵感,这些建议以及仍然适用于MARL的建议。将这些建议与我们分析的新见解相结合,我们提出了合作MARL的标准化绩效评估方案。我们认为,这样的标准协议,如果被广泛采用,将大大提高未来研究的有效性和信誉,使复制和可重复性更加容易,并提高该领域的能力,通过能够通过能够准确评估进度的速度进行跨不同作品的合理比较。最后,我们在我们的项目网站上公开发布荟萃分析数据,以供未来的评估研究:https://sites.google.com/view/marl-andard-protocol
translated by 谷歌翻译
比较不同的汽车框架是具有挑战性的,并且经常做错了。我们引入了一个开放且可扩展的基准测试,该基准遵循最佳实践,并在比较自动框架时避免常见错误。我们对71个分类和33项回归任务进行了9个著名的自动框架进行了详尽的比较。通过多面分析,评估模型的准确性,与推理时间的权衡以及框架失败,探索了自动框架之间的差异。我们还使用Bradley-terry树来发现相对自动框架排名不同的任务子集。基准配备了一个开源工具,该工具与许多自动框架集成并自动化经验评估过程端到端:从框架安装和资源分配到深入评估。基准测试使用公共数据集,可以轻松地使用其他Automl框架和任务扩展,并且具有最新结果的网站。
translated by 谷歌翻译
机器学习传感器代表了嵌入式机器学习应用程序未来的范式转移。当前的嵌入式机器学习(ML)实例化遭受了复杂的整合,缺乏模块化以及数据流动的隐私和安全问题。本文提出了一个以数据为中心的范式,用于将传感器智能嵌入边缘设备上,以应对这些挑战。我们对“传感器2.0”的愿景需要将传感器输入数据和ML处理从硬件级别隔离到更广泛的系统,并提供一个薄的界面,以模拟传统传感器的功能。这种分离导致模块化且易于使用的ML传感器设备。我们讨论了将ML处理构建到嵌入式系统上控制微处理器的软件堆栈中的标准方法所带来的挑战,以及ML传感器的模块化如何减轻这些问题。 ML传感器提高了隐私和准确性,同时使系统构建者更容易将ML集成到其产品中,以简单的组件。我们提供了预期的ML传感器和说明性数据表的例子,以表现出来,并希望这将建立对话使我们朝着传感器2.0迈进。
translated by 谷歌翻译
机器学习(ML)系统的开发和部署可以用现代工具轻松执行,但该过程通常是匆忙和意思是结束的。缺乏勤奋会导致技术债务,范围蠕变和未对准的目标,模型滥用和失败,以及昂贵的后果。另一方面,工程系统遵循明确定义的流程和测试标准,以简化高质量,可靠的结果的开发。极端是航天器系统,其中关键任务措施和鲁棒性在开发过程中根深蒂固。借鉴航天器工程和ML的经验(通过域名通过产品的研究),我们开发了一种经过验证的机器学习开发和部署的系统工程方法。我们的“机器学习技术准备水平”(MLTRL)框架定义了一个原则的过程,以确保强大,可靠和负责的系统,同时为ML工作流程流线型,包括来自传统软件工程的关键区别。 MLTRL甚至更多,MLTRL为跨团队和组织的人们定义了一个人工智能和机器学习技术的人员。在这里,我们描述了通过生产化和部署在医学诊断,消费者计算机视觉,卫星图像和粒子物理学等领域,以通过生产和部署在基本研究中开发ML方法的几个现实世界使用情况的框架和阐明。
translated by 谷歌翻译
由于不断增长的计算要求,深度学习(DL)的能源消耗和碳足迹的增加已成为引起人们关注的原因。在这项工作中,我们关注开发医学图像分析模型(MIA)的碳足迹,其中处理了高空间分辨率的体积图像。在这项研究中,我们介绍并比较了文献中四种工具的特征,以量化DL的碳足迹。使用这些工具之一,我们估计了医学图像分割管道的碳足迹。我们选择NNU-NET作为医疗图像分割管道的代理,并在三个常见数据集上进行实验。在我们的工作中,我们希望告知MIA产生的能源成本不断增加。我们讨论了削减环境影响的简单策略,以使模型选择和培训过程更加有效。
translated by 谷歌翻译
我们通过将系统的任务性能以及系统开发和部署产生的时间和资源成本纳入整体框架来重新构架AI中的进度分析。这些成本包括:数据,专家知识,人类监督,软件资源,计算周期,硬件和网络设施以及(什么样的)时间。这些成本分配在系统的生命周期中,并可能对不同的开发人员和用户提出不同的需求。我们提出的多维性能和成本空间可以折叠成单个公用事业指标,该指标衡量了对不同利益相关者的系统价值。即使没有单个效用函数,AI的进步也可以通过它们是否扩展帕累托表面来评估。我们将这些类型的成本标记为被忽视的AI进度维度,并使用四个案例研究探索它们:Alpha*(GO,国际象棋和其他棋盘游戏),ALE(Atari Games),Imagenet(图像分类)和虚拟个人助理( Siri,Alexa,Cortana和Google Assistant)。 AI中的这种更广泛的进步模型将导致估计AI系统潜在的社会使用和影响的新颖方法,以及建立里程碑以实现未来的进步。
translated by 谷歌翻译
如今,由于最近在人工智能(AI)和机器学习(ML)中的近期突破,因此,智能系统和服务越来越受欢迎。然而,机器学习不仅满足软件工程,不仅具有有希望的潜力,而且还具有一些固有的挑战。尽管最近的一些研究努力,但我们仍然没有明确了解开发基于ML的申请和当前行业实践的挑战。此外,目前尚不清楚软件工程研究人员应将其努力集中起来,以更好地支持ML应用程序开发人员。在本文中,我们报告了一个旨在了解ML应用程序开发的挑战和最佳实践的调查。我们合成从80名从业者(以不同的技能,经验和应用领域)获得的结果为17个调查结果;概述ML应用程序开发的挑战和最佳实践。参与基于ML的软件系统发展的从业者可以利用总结最佳实践来提高其系统的质量。我们希望报告的挑战将通知研究界有关需要调查的主题,以改善工程过程和基于ML的申请的质量。
translated by 谷歌翻译
Machine Learning for Source Code (ML4Code) is an active research field in which extensive experimentation is needed to discover how to best use source code's richly structured information. With this in mind, we introduce JEMMA, an Extensible Java Dataset for ML4Code Applications, which is a large-scale, diverse, and high-quality dataset targeted at ML4Code. Our goal with JEMMA is to lower the barrier to entry in ML4Code by providing the building blocks to experiment with source code models and tasks. JEMMA comes with a considerable amount of pre-processed information such as metadata, representations (e.g., code tokens, ASTs, graphs), and several properties (e.g., metrics, static analysis results) for 50,000 Java projects from the 50KC dataset, with over 1.2 million classes and over 8 million methods. JEMMA is also extensible allowing users to add new properties and representations to the dataset, and evaluate tasks on them. Thus, JEMMA becomes a workbench that researchers can use to experiment with novel representations and tasks operating on source code. To demonstrate the utility of the dataset, we also report results from two empirical studies on our data, ultimately showing that significant work lies ahead in the design of context-aware source code models that can reason over a broader network of source code entities in a software project, the very task that JEMMA is designed to help with.
translated by 谷歌翻译
Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new selfsupervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
translated by 谷歌翻译