Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Deep neural networks perform worse on image recognition tasks (e.g., predicting whether a face is smiling) for under-represented classes of sensitive attributes. We address this problem by introducing fairness-aware regularization losses based on batch estimates of demographic parity, equalized odds, and a novel intersection-over-union measure. Experiments on facial and medical images from CelebA, UTKFace, and the SIIM-ISIC Melanoma classification challenge demonstrate the effectiveness of the proposed fairness losses for bias mitigation, as they improve model fairness while maintaining high classification performance. To our knowledge, our work is the first attempt to incorporate these types of losses into an end-to-end training scheme for mitigating the biases of visual attribute predictors. Our code is available at https://github.com/nish03/fvap.
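To make the regularization idea concrete, here is a minimal sketch (not the authors' implementation) of a batch-estimated demographic parity gap added to a task loss; the function names and the plain-NumPy setting are illustrative assumptions:

```python
import numpy as np

def demographic_parity_gap(probs, group):
    """Batch estimate of the demographic parity gap: the absolute
    difference in mean predicted-positive rate between two groups.

    probs: predicted positive probabilities, shape (N,)
    group: binary sensitive-attribute membership, shape (N,)
    """
    probs = np.asarray(probs, dtype=float)
    group = np.asarray(group, dtype=bool)
    return abs(probs[group].mean() - probs[~group].mean())

def fairness_regularized_loss(task_loss, probs, group, lam=1.0):
    """Total loss = task loss + lambda * fairness penalty, the general
    shape of a fairness-aware regularized objective."""
    return task_loss + lam * demographic_parity_gap(probs, group)
```

In an actual end-to-end scheme the penalty would be computed on differentiable model outputs per mini-batch so gradients flow through both terms.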
In modern NLP applications, word embeddings are a crucial backbone that can be readily shared across tasks. However, as text distributions change and word semantics evolve over time, downstream applications using the embeddings can suffer if the word representations do not conform to the data drift. Aligning word embeddings with the underlying data distribution is therefore a key problem. In this work, we tackle this problem and propose TransDrift, a transformer-based prediction model for word embeddings. Leveraging the flexibility of the transformer, our model accurately learns the dynamics of embedding drift and predicts future embeddings. In experiments, we compare against existing methods and show that our model predicts word embeddings significantly more accurately than the baselines. Crucially, by using the predicted embeddings as a backbone for downstream classification tasks, we show that our embeddings lead to superior performance compared to previous methods.
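To make the drift-prediction setup concrete, here is a minimal stand-in that replaces the transformer with an ordinary least-squares map from one embedding snapshot to the next; the linear model and all names are illustrative assumptions, not the TransDrift architecture:

```python
import numpy as np

def fit_drift_predictor(history):
    """Least-squares stand-in for a learned drift model: fit a linear
    map W with e_{t+1} ≈ e_t @ W from a sequence of embedding
    snapshots (one (vocab, dim) matrix per time step)."""
    X = np.concatenate(history[:-1], axis=0)  # embeddings at time t
    Y = np.concatenate(history[1:], axis=0)   # embeddings at time t+1
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_next(last_snapshot, W):
    """Predict the next-step embeddings from the latest snapshot."""
    return last_snapshot @ W
```

The predicted snapshot would then serve as the backbone for a downstream task in place of stale embeddings.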
TryOnGAN is a recent virtual try-on approach that generates highly realistic images and outperforms most previous methods. In this paper, we reproduce the TryOnGAN implementation and probe it along different angles: the effect of transfer learning, variants of conditioning image generation on pose, and the properties of latent-space interpolation. Some of these aspects have never been explored in the literature before. We find that transfer learning helps training initially, but the gains are lost as the model trains longer, and that pose conditioning via concatenation performs better. The latent space disentangles pose and style features on its own and enables style transfer across poses. Our code and models are available as open source.
Approximate combinatorial optimization has emerged as one of the most promising application areas for quantum computers, particularly those in the near term. In this work, we focus on the quantum approximate optimization algorithm (QAOA) for solving the MaxCut problem. Specifically, we address two problems in the QAOA: how to select the initial parameters, and how to subsequently train the parameters to find an optimal solution. For the former, we propose graph neural networks (GNNs) as an initialization routine for the QAOA parameters, adding to the literature on warm-starting techniques. We show that the GNN approach generalizes not only across graph instances but also to increasing graph sizes, a feature not available to other warm-starting techniques. For training the QAOA, we test several optimizers on the MaxCut problem. These include quantum-aware and quantum-agnostic optimizers proposed in the literature, as well as machine learning techniques such as reinforcement learning and meta-learning. By incorporating these initialization and optimization toolkits, we demonstrate how the QAOA can be trained as an end-to-end differentiable pipeline.
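The classical objective that QAOA maximizes in expectation is easy to state in code. This sketch (illustrative, not part of the paper's pipeline) evaluates cut values and brute-forces tiny instances as a sanity baseline:

```python
def maxcut_value(edges, assignment):
    """Cut value of a 0/1 node assignment: the number of edges whose
    endpoints fall on opposite sides of the partition."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def brute_force_maxcut(n_nodes, edges):
    """Exhaustive baseline for tiny graphs (2^n assignments); useful
    only to check heuristic or quantum solutions on small instances."""
    best_val, best_assign = -1, None
    for bits in range(2 ** n_nodes):
        assign = [(bits >> i) & 1 for i in range(n_nodes)]
        val = maxcut_value(edges, assign)
        if val > best_val:
            best_val, best_assign = val, assign
    return best_val, best_assign
```

QAOA replaces the exhaustive search with a parameterized quantum circuit whose measurement statistics concentrate on high-value cuts; the GNN warm start supplies that circuit's initial parameters.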
Normalizing flows are prominent deep generative models that provide tractable probability distributions and efficient density estimation. However, they are well known to fail at detecting out-of-distribution (OOD) inputs, as they directly encode the local features of the input representations in their latent space. In this paper, we address this issue by demonstrating that flows, if extended by an attention mechanism, can reliably detect outliers, including adversarial attacks. Our approach does not require outlier data for training, and we showcase the efficiency of our OOD detection method by reporting state-of-the-art performance in diverse experimental settings. Code is available at https://github.com/ComputationalRadiationPhysics/InFlow.
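The detection recipe itself, score inputs by model log-likelihood and threshold, can be sketched independently of the flow. The diagonal Gaussian below is a deliberate stand-in for a trained density model, and all names are illustrative assumptions:

```python
import numpy as np

def fit_gaussian(train):
    """Stand-in density model: fit a diagonal Gaussian to
    in-distribution data (a real setup would use a trained
    normalizing flow here)."""
    train = np.asarray(train, dtype=float)
    return train.mean(axis=0), train.std(axis=0) + 1e-8

def log_likelihood(x, mu, sigma):
    """Per-sample log-density under the diagonal Gaussian."""
    x = np.asarray(x, dtype=float)
    z = (x - mu) / sigma
    return (-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum(axis=-1)

def is_ood(x, mu, sigma, threshold):
    """Flag inputs whose log-likelihood falls below a threshold chosen
    on held-out in-distribution data."""
    return log_likelihood(x, mu, sigma) < threshold
```

The paper's point is precisely that a plain flow's likelihood can be misleadingly high on OOD inputs; the attention extension is what makes this thresholding scheme reliable.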
Detecting, predicting, and alleviating traffic congestion are targeted at improving the level of service of a transportation network. With increasing access to larger datasets at higher resolution, the relevance of deep learning for such tasks is growing. Several comprehensive survey papers in recent years have summarized deep learning applications in the transportation domain. However, the system dynamics of a transportation network vary greatly between the non-congested and congested states, necessitating a clear understanding of the challenges specific to congestion prediction. In this survey, we present the current state of deep learning applications in tasks related to the detection, prediction, and alleviation of congestion. Recurring and non-recurring congestion are discussed separately. Our survey leads us to uncover inherent challenges and gaps in the current state of research. Finally, we present some suggestions for future research directions as answers to the identified challenges.
Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation-learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents scales, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal -- in some cases, individual agents may not even have the ability to send communication to other agents or explicitly model other agents. This paper considers the case where there is a single, powerful, \emph{central agent} that can observe the entire observation space, and there are multiple, low-powered \emph{local agents} that can only receive local observations and are not able to communicate with each other. The central agent's job is to learn what message needs to be sent to different local agents based on the global observations, not by centrally solving the entire problem and sending action commands, but by determining what additional information an individual agent should receive so that it can make a better decision. In this work we present our MARL algorithm \algo, describe where it would be most applicable, and implement it in the cooperative navigation and multi-agent walker domains. Empirical results show that 1) learned communication does indeed improve system performance, 2) results generalize to heterogeneous local agents, and 3) results generalize to different reward structures.
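A toy sketch of the described architecture follows; the names, dimensions, and linear maps are illustrative assumptions (in practice both components would be learned networks trained with the MARL objective):

```python
import numpy as np

rng = np.random.default_rng(0)

class CentralAgent:
    """Sees the global observation and produces one message vector per
    local agent via a per-agent map (randomly initialised here; learned
    in the actual setting)."""
    def __init__(self, obs_dim, msg_dim, n_agents):
        self.maps = [rng.standard_normal((msg_dim, obs_dim))
                     for _ in range(n_agents)]

    def messages(self, global_obs):
        return [W @ global_obs for W in self.maps]

class LocalAgent:
    """Acts on its own observation concatenated with the received
    message; it never sees the global state directly."""
    def __init__(self, local_dim, msg_dim, n_actions):
        self.policy = rng.standard_normal((n_actions, local_dim + msg_dim))

    def act(self, local_obs, message):
        logits = self.policy @ np.concatenate([local_obs, message])
        return int(np.argmax(logits))
```

The key design point is the information bottleneck: the central agent compresses the global observation into per-agent messages rather than dictating actions.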
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such robots in a domestic setting is still largely a research problem. This paper discusses the design and virtual simulation of such a robot, capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli with motor or audio responses. Emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but still achieved an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across different cadences and variations. This enabled the quadruped robot to respond to the generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.