The ability to dynamically adapt neural networks to newly-available data without performance deterioration would revolutionize deep learning applications. Streaming learning (i.e., learning from one data example at a time) has the potential to enable such real-time adaptation, but current approaches i) freeze a majority of network parameters during streaming and ii) depend upon offline, base initialization procedures over large subsets of data, which damages performance and limits applicability. To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting. Because CSSL updates all model parameters during streaming, the algorithm is capable of beginning streaming from a random initialization, making base initialization optional. Going further, the algorithm's simplicity allows theoretical convergence guarantees to be derived using analysis of the Neural Tangent Random Feature (NTRF). In experiments on the CIFAR100, ImageNet, and Core50 datasets, we find that CSSL outperforms existing baselines for streaming learning. Additionally, we propose a novel multi-task streaming learning setting and show that CSSL performs favorably in this domain. Put simply, CSSL performs well and demonstrates that the complicated, multi-step training pipelines adopted by most streaming methodologies can be replaced with a simple, end-to-end learning approach without sacrificing performance.
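As a rough illustration of the replay-plus-augmentation idea described above, the following sketch pairs each newly arriving example with a few examples replayed from a fixed-size buffer. It is not the authors' implementation: the `ReplayBuffer` class, the random-eviction policy, the `streaming_batches` helper, and the identity `augment` default are all assumptions made for this example.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of past examples (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def add(self, example):
        # Assumption: once full, overwrite a uniformly random slot.
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            self.items[self.rng.randrange(self.capacity)] = example

    def sample(self, k):
        # Draw up to k distinct stored examples.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def streaming_batches(stream, buffer, replay_size, augment=lambda x: x):
    """Yield one training batch per incoming example.

    Each batch holds the new example followed by replayed ones, all passed
    through `augment` (a placeholder for a real augmentation pipeline).
    """
    for example in stream:
        replayed = buffer.sample(replay_size)
        batch = [augment(x) for x in [example] + replayed]
        buffer.add(example)
        yield batch


buffer = ReplayBuffer(capacity=3)
batches = list(streaming_batches(range(5), buffer, replay_size=2))
```

In a real training loop, each yielded batch would feed a gradient step over all model parameters. The buffer starts empty here, which mirrors the cold-start setting with no base initialization.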
translated by 谷歌翻译
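We propose a novel structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between the pruned and dense network outputs, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed-forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and to improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.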
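Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, however, it involves a three-step process (pre-training, pruning, and re-training) that is computationally expensive, as the dense model must be fully pre-trained. Fortunately, several works have demonstrated that high-performing subnetworks can be discovered via pruning without fully pre-training the dense network. Aiming to theoretically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a theoretical bound on the number of SGD pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection yields a subnetwork that achieves good training error. This threshold is shown to be logarithmically dependent upon the size of the dataset, meaning that experiments with larger datasets require more pre-training for subnetworks obtained via pruning to perform well. We empirically demonstrate the validity of our theoretical results across a variety of architectures and datasets, including fully-connected networks trained on MNIST and several deep convolutional neural network (CNN) architectures trained on CIFAR10 and ImageNet.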
Frequent, cost-free satellite imagery is in growing demand in the research world. Satellite constellations such as Landsat 8 and Sentinel-2 provide a massive amount of valuable data daily. However, the discrepancy between the sensor characteristics of these satellites means that a segmentation model trained on one dataset performs poorly when applied to the other, which is why domain adaptation techniques have recently become an active research area in remote sensing. In this paper, a domain adaptation experiment based on style transfer is conducted using the HRSemI2I model to narrow the sensor discrepancy between Landsat 8 and Sentinel-2. This paper's main contribution is analyzing the expediency of that approach by comparing the results of segmentation using domain-adapted images with those without adaptation. The HRSemI2I model, adjusted to work with 6-band imagery, shows significant intersection-over-union performance improvement for both mean and per-class metrics. A second contribution is providing different schemes of generalization between two label schemes: NALCMS 2015 and CORINE. The first scheme is standardization through higher-level land cover classes, and the second is through harmonization validation in the field.
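For readers unfamiliar with the metrics named above, here is a minimal sketch of per-class and mean intersection-over-union for integer label maps. The function names and the toy arrays are illustrative assumptions, and averaging over only the classes that actually occur is one common convention, not necessarily the one used in the paper.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return one IoU per class; NaN where a class is absent from both maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, target_c).sum() / union
    return ious


def mean_iou(pred, target, num_classes):
    """Average the per-class IoU values, ignoring absent classes."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))


# Toy 2x3 label maps with three classes (made-up data).
pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 0]])
ious = per_class_iou(pred, target, 3)   # [1/3, 2/3, 1/2]
miou = mean_iou(pred, target, 3)        # 0.5
```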
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on how to turn it into one that can be productively studied empirically. We first present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.
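Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and the relationships between knowledge-sharing languages. Our experiments suggest that, for some languages beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation yields statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.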
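We introduce a novel deep learning method for detecting individual trees in urban environments using high-resolution multispectral aerial imagery. We use a convolutional neural network to regress a confidence map indicating the locations of individual trees, which are then localized using a peak-finding algorithm. Our method provides complete spatial coverage by detecting trees in both public and private spaces, and it can scale to very large areas. In our study area, spanning five cities in Southern California, we achieve an F-score of 0.735 and an RMSE of 2.157 m. We use our method to produce a map of all trees in California's urban forest, indicating its potential to support future urban forestry studies at unprecedented scales.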
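The localization step described above can be pictured with a simple local-maximum search over a confidence map. The sketch below is only an assumption about what such a peak finder might look like. The `find_peaks` name, the 0.5 threshold, and the square neighbourhood are hypothetical, and the abstract does not specify the algorithm actually used.

```python
import numpy as np


def find_peaks(conf, threshold=0.5, radius=1):
    """Return (row, col) positions that are local maxima above `threshold`.

    A pixel counts as a peak if no value within the surrounding
    (2 * radius + 1)-wide square window exceeds it. Flat plateaus
    therefore report every pixel on the plateau.
    """
    conf = np.asarray(conf, dtype=float)
    height, width = conf.shape
    # Pad with -inf so border pixels can be compared like interior ones.
    padded = np.pad(conf, radius, mode="constant", constant_values=-np.inf)
    peaks = []
    for r in range(height):
        for c in range(width):
            value = conf[r, c]
            if value < threshold:
                continue
            window = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
            if value >= window.max():
                peaks.append((r, c))
    return peaks


# Toy confidence map with two separated peaks and one weaker neighbour.
conf_map = np.zeros((5, 5))
conf_map[1, 1] = 0.9
conf_map[1, 2] = 0.6   # above threshold but overshadowed by (1, 1)
conf_map[3, 3] = 0.8
trees = find_peaks(conf_map)
```

On this toy map the search keeps `(1, 1)` and `(3, 3)` and drops the weaker neighbour at `(1, 2)`. A larger `radius` enforces a larger minimum spacing between detections.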
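Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.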