一个沿着城市街道行走的人试图对世界各个方面进行建模,这很快就会被许多商店,汽车和人们遵循自己的复杂且难以理解的动态所淹没。在这种环境中的探索和导航是一项日常任务,不需要大量精神资源。是否可以将这种感官信息的消防软管转变为最小的潜在状态,这是代理在世界上成功采取行动的必要和足够的?我们具体地提出了这个问题,并提出了可控制的状态发现算法(AC-State),该算法具有理论保证,并且实际上被证明可以发现\ textit {最小可控的潜在状态},其中包含所有用于控制控制的信息代理,同时完全丢弃所有无关的信息。该算法由一个具有信息瓶颈的多步逆模型(预测遥远观察结果的动作)组成。 AC-State可以在没有奖励或示威的情况下实现本地化,探索和导航。我们证明了在三个领域中发现可控潜在状态的发现:将机器人组分散注意力(例如,照明条件和背景变化),与其他代理商一起在迷宫中进行探索,并在Matterport House Simulator中导航。
translated by 谷歌翻译
顺序决策中的一个核心问题是开发实用且计算上有效的算法,但支持灵活的通用模型的使用。关注上下文匪徒问题,最近的进度在可能的替代品数量(“动作”)很小时提供了可证明的有效算法,并具有很强的经验性能,但是在大型,连续的行动空间中进行决策的保证仍然难以捉摸,导致了重要的重要性理论与实践之间的差距。我们介绍了具有连续线性结构化作用空间的上下文匪徒的第一个有效的通用算法。我们的算法利用了(i)监督学习的计算序列,以及(ii)在动作空间上进行优化,并实现样本复杂性,运行时和内存,独立于动作空间的大小。此外,这是简单而实用的。我们进行大规模的经验评估,并表明我们的方法通常比标准基准相比具有较高的性能和效率。
translated by 谷歌翻译
考虑互动学习的问题设定(IGL),其中学习者的目标是与环境进行最佳互动,而无需明确的奖励以依靠其政策。代理商观察上下文向量,采取行动并接收反馈向量,并使用此信息有效地优化潜在奖励功能的策略。当反馈向量包含该动作时,事先分析的方法失败了,这在许多潜在方案中显着限制了IGL的成功,例如脑部计算机界面(BCI)或人类计算机界面(HCI)应用程序。我们通过创建算法和分析来解决这一问题,该算法和分析即使反馈向量包含以任何方式编码的动作,允许IGL起作用。我们根据监督数据集提供理论保证和大规模实验,以证明新方法的有效性。
translated by 谷歌翻译
在现实世界的强化学习应用中,学习者的观察空间无处不在,有关手头任务的相关信息和无关紧要。从高维观察中学习一直是监督学习和统计数据(例如,通过稀疏性)进行广泛研究的主题,但是即使在有限的状态/行动(表格)领域,也不能很好地理解强化学习中的类似问题。我们引入了一个新的问题设置,用于增强学习,即马尔可夫决策过程(EXOMDP),其中状态空间将(未知)分解成一个小的(或内源性)组件,并且很大的无关(或外源)组件;外源成分独立于学习者的行为,但以任意的,时间相关的方式演变。我们提供了一种新的算法Exorl,该算法学习了一种近乎最佳的政策,其样品复杂性在内源性组件的大小中多项式,几乎独立于外源成分的大小,从而提供了一个双重指数的改进算法。我们的结果首次突出了在存在外源信息的情况下首次可以进行样品高效的增强学习,并为未来的调查提供了简单,用户友好的基准。
translated by 谷歌翻译
大规模的机器学习系统通常涉及分布在用户集合中的数据。联合学习算法通过将模型更新传达给中央服务器而不是整个数据集来利用此结构。在本文中,我们研究了一个个性化联合学习设置的随机优化算法,涉及符合用户级别(联合)差异隐私的本地和全球模型。在学习私人全球模型的同时,促进了隐私成本,但本地学习是完全私人的。我们提供概括保证,表明与私人集中学习协调本地学习可以产生一种普遍有用和改进的精度和隐私之间的权衡。我们通过有关合成和现实世界数据集的实验来说明我们的理论结果。
translated by 谷歌翻译
We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints. We introduce two reductions that work for any representation of the cost-sensitive classifier and compare favorably to prior baselines on a variety of data sets, while overcoming several of their disadvantages.
translated by 谷歌翻译
This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank , that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings. Our second contribution is a new reinforcement learning algorithm that engages in systematic exploration to learn contextual decision processes with low Bellman rank. Our algorithm provably learns near-optimal behavior with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for reinforcement learning with function approximation.
translated by 谷歌翻译
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation.In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.
translated by 谷歌翻译
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. However, we test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
translated by 谷歌翻译
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
translated by 谷歌翻译