Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
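As a quick illustration of working with the released checkpoints, below is a minimal sketch of prompting BLOOM through the Hugging Face transformers library. The hub ID bigscience/bloom-560m (a small released variant, used here so the example fits in ordinary memory) and the generation settings are assumptions of this sketch, not prescriptions from the paper; the full 176B model exposes the same API.

```python
# Minimal sketch: greedy generation with a released BLOOM checkpoint.
# Assumes the Hugging Face hub ID "bigscience/bloom-560m" (small variant);
# the full 176B model uses the same API but requires far more memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # assumption: small public variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate to French: I love science.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```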
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
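One of the reported findings concerns calibration improving with scale. As a point of reference for that metric, here is a minimal sketch of expected calibration error (ECE), a standard binned calibration measure; the binning scheme and the toy inputs are illustrative assumptions, not BIG-bench's evaluation code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: average |accuracy - confidence| weighted by bin mass."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical model confidences and correctness on a multiple-choice task:
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```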
Tuning hyperparameters is an important but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and to perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning, which is applicable to widely used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that FedEx correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget.
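To make the weight-sharing connection concrete, here is a toy sketch of the exponentiated-gradient idea that FedEx builds on: maintain a distribution over candidate on-device learning rates, sample a configuration each communication round, and reweight by an unbiased loss estimate. The quadratic objective, candidate grid, and single-client loop are hypothetical simplifications, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
candidate_lrs = [0.01, 0.05, 0.1, 0.5]  # hypothetical on-device learning rates
theta = np.ones(len(candidate_lrs)) / len(candidate_lrs)  # distribution over configs
w = np.zeros(5)   # shared global model (toy linear model)
eta = 1.0         # exponentiated-gradient step size

def local_train_and_loss(w, lr):
    """Toy stand-in for on-device training: one SGD step on a random quadratic."""
    target = rng.normal(size=w.shape)
    w_new = w - lr * (w - target)  # gradient step on 0.5 * ||w - target||^2
    val_loss = 0.5 * np.sum((w_new - target) ** 2)
    return w_new, val_loss

for _ in range(50):  # communication rounds
    idx = rng.choice(len(candidate_lrs), p=theta)  # sample a config this round
    w, loss = local_train_and_loss(w, candidate_lrs[idx])
    grad = np.zeros_like(theta)                    # importance-weighted gradient estimate
    grad[idx] = loss / max(theta[idx], 1e-8)
    theta *= np.exp(-eta * grad)                   # exponentiated-gradient update
    theta /= theta.sum()

print("config distribution:", np.round(theta, 3))
```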
An important goal of AutoML is to automate the design of neural networks on new tasks in under-explored domains. Motivated by this goal, we study the problem of enabling users to discover the right neural operations given data from their specific domain. We introduce a search space of operations called XD-operations that mimic the inductive bias of standard multichannel convolutions while being much more expressive: we prove that it includes many named operations across multiple application areas. Starting with any standard backbone such as ResNet, we show how to transform it into a search space over XD-operations and how to traverse the space using a simple weight-sharing scheme. On a diverse set of tasks, including solving PDEs, distance prediction for protein folding, and music modeling, our approach consistently yields models with lower error than baseline networks, and often lower error than expert-designed domain-specific approaches.
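The weight-sharing traversal can be illustrated with a much simpler relaxation than XD-operations themselves: a softmax over architecture parameters mixes a small set of candidate operations within one shared forward pass. The sketch below is a generic continuous relaxation under that assumption (the fixed kernels and candidate set are hypothetical), not the paper's expressiveness-preserving parameterization of convolutions.

```python
import numpy as np

def conv1d(x, k):
    """'Same'-padded 1-D convolution (correlation) with kernel k."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

# Candidate operations competing under one set of architecture weights alpha.
ops = [
    lambda x: x,                                        # identity / skip
    lambda x: conv1d(x, np.array([0.25, 0.5, 0.25])),   # fixed smoothing kernel
    lambda x: conv1d(x, np.array([-1.0, 0.0, 1.0])),    # fixed difference kernel
]
alpha = np.zeros(len(ops))  # architecture parameters (uniform mixture at init)

def mixed_op(x, alpha):
    """Softmax-weighted mixture of candidate ops (continuous relaxation)."""
    weights = np.exp(alpha) / np.exp(alpha).sum()
    return sum(w * op(x) for w, op in zip(weights, ops))

x = np.sin(np.linspace(0, 6, 32))
print(mixed_op(x, alpha)[:5])
```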
Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In this work, in order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks: PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS [41], a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results. We note the lack of source material needed to exactly reproduce these results, and further discuss the robustness of published results given the various sources of variability in NAS experimental setups. Relatedly, we provide all information (code, random seeds, documentation) needed to exactly reproduce our results, and report our random search with weight-sharing results for each benchmark on multiple runs.
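A generic sketch of the random-search-with-early-stopping baseline is below; the search space, the toy validation-loss curve, and the patience rule are stand-ins for a real training pipeline.

```python
import random

def sample_config(rng):
    """Hypothetical architecture/hyperparameter sample."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),
        "layers": rng.randint(1, 8),
        "dropout": rng.uniform(0.0, 0.5),
    }

def train_epoch(config, epoch, rng):
    """Toy stand-in for one epoch of training; returns a validation loss."""
    base = abs(config["lr"] - 0.01) * 10 + config["layers"] * 0.05
    return base + 1.0 / (epoch + 1) + rng.gauss(0, 0.01)

def random_search_early_stop(n_configs=20, max_epochs=10, patience=2, seed=0):
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(n_configs):
        cfg = sample_config(rng)
        best_loss, bad_epochs = float("inf"), 0
        for epoch in range(max_epochs):
            loss = train_epoch(cfg, epoch, rng)
            if loss < best_loss:
                best_loss, bad_epochs = loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:  # early stopping
                    break
        if best_loss < best[0]:
            best = (best_loss, cfg)
    return best

print(random_search_early_stop())
```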
As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.
Drawing from the resources of psychoanalysis and critical media studies, in this paper we develop an analysis of Large Language Models (LLMs) as automated subjects. We argue the intentional fictional projection of subjectivity onto LLMs can yield an alternate frame through which AI behaviour, including its productions of bias and harm, can be analysed. First, we introduce language models, discuss their significance and risks, and outline our case for interpreting model design and outputs with support from psychoanalytic concepts. We trace a brief history of language models, culminating with the releases, in 2022, of systems that realise state-of-the-art natural language processing performance. We engage with one such system, OpenAI's InstructGPT, as a case study, detailing the layers of its construction and conducting exploratory and semi-structured interviews with chatbots. These interviews probe the model's moral imperatives to be helpful, truthful and harmless by design. The model acts, we argue, as the condensation of often competing social desires, articulated through the internet and harvested into training data, which must then be regulated and repressed. This foundational structure can however be redirected via prompting, so that the model comes to identify with, and transfer, its commitments to the immediate human subject before it. In turn, these automated productions of language can lead to the human subject projecting agency upon the model, occasionally effecting further forms of countertransference. We conclude that critical media methods and psychoanalytic theory together offer a productive frame for grasping the powerful new capacities of AI-driven language systems.
Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.
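The ablation protocol amounts to retraining with each modality subset enabled and comparing validation F1. A minimal sketch of that loop follows; the train_and_eval stand-in returns toy placeholder scores (only the text-plus-layout value echoes the F1=0.684 reported above), so real numbers must come from actual training runs.

```python
from itertools import combinations

MODALITIES = ("text", "layout", "visual")

def train_and_eval(active):
    """Hypothetical stand-in: train the RE model with only the given
    modalities enabled and return validation F1."""
    # Toy placeholder scores; only the text+layout entry echoes the
    # abstract's reported F1. All others are invented for illustration.
    toy_f1 = {("layout",): 0.55, ("text",): 0.60, ("visual",): 0.30,
              ("layout", "text"): 0.684, ("layout", "visual"): 0.56,
              ("text", "visual"): 0.61, ("layout", "text", "visual"): 0.67}
    return toy_f1[tuple(sorted(active))]

# Single-modality, leave-one-out, and full ablations.
for r in range(1, len(MODALITIES) + 1):
    for subset in combinations(sorted(MODALITIES), r):
        print(subset, "F1 =", train_and_eval(subset))
```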
Finite mixture modeling is a popular approach in the field of clustering, largely due to its soft cluster-membership probabilities. However, the EM algorithm, the most common algorithm for fitting finite mixture models, falls victim to a number of issues. We address these issues that plague clustering with finite mixture models, including convergence to solutions corresponding to local maxima and algorithmic speed in high-dimensional settings. This is done by developing two novel algorithms that incorporate a spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations show the validity of our algorithms, demonstrating not only their flexibility but also their ability, compared to other (bootstrapped) clustering algorithms, to avoid solutions corresponding to local maxima. Our novel algorithms typically exhibit more consistent convergence criteria and a significant speedup over other bootstrapped algorithms that fit finite mixture models.
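As a rough sketch of the ingredients the two algorithms combine (not the algorithms themselves), one can pair a spectral step, projecting the data matrix onto its leading principal components, with nonparametric bootstrap resampling before each mixture fit; scikit-learn's GaussianMixture stands in for the mixture-fitting routine here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Toy two-cluster data in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(3, 1, (100, 10))])

Z = PCA(n_components=2).fit_transform(X)  # spectral step: reduce dimension
n, fits = len(Z), []
for b in range(20):                        # nonparametric bootstrap scheme
    idx = rng.integers(0, n, size=n)       # resample rows with replacement
    gm = GaussianMixture(n_components=2, n_init=1, random_state=b).fit(Z[idx])
    fits.append(gm)

# Keep the fit with the highest likelihood on the full (reduced) data.
best = max(fits, key=lambda gm: gm.score(Z))
print("cluster means:\n", best.means_)
```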
This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. Due to ambiguity and intrinsic ill-posedness, the problem is inherently hard to solve and therefore demands strong regularization to achieve disentanglement of different latent factors. Unlike existing works that introduce explicit regularization into objective functions, we look into a different space for implicit regularization: the structure of the latent space. Specifically, we restrict the structure of the latent space to capture a topological causal ordering of latent factors (i.e., representing causal dependency as a directed acyclic graph). We first show that different causal orderings matter for 3D reconstruction, and then explore several approaches to find a task-dependent causal factor ordering. Our experiments demonstrate that the latent space structure indeed serves as an implicit regularization and introduces an inductive bias beneficial for reconstruction.
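A minimal sketch of what a topological causal ordering over latent factors looks like follows: a strictly lower-triangular adjacency matrix encodes a DAG under a chosen ordering, and latents are sampled ancestrally so each factor depends only on its predecessors. The linear ancestral sampler is an illustrative simplification; the paper's dependencies are learned, not fixed linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # number of latent factors

# DAG adjacency over latent factors: entry (j, i) != 0 means factor j
# depends on factor i. A strictly lower-triangular matrix enforces a
# topological causal ordering under the chosen factor order.
A = np.tril(rng.normal(size=(d, d)), k=-1)

def sample_latents(n):
    """Ancestrally sample latents z so that z_j depends only on z_i, i < j."""
    z = np.zeros((n, d))
    eps = rng.normal(size=(n, d))  # exogenous noise, one source per factor
    for j in range(d):             # follow the topological order
        z[:, j] = z[:, :j] @ A[j, :j] + eps[:, j]
    return z

print(sample_latents(3))
```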