Human and robot partners increasingly need to work together to perform tasks as a team. Robots designed for such collaboration must reason about how their task-completion strategies interplay with the behavior and skills of their human team members as they coordinate on achieving joint goals. Our goal in this work is to develop a computational framework for robot adaptation to human partners in human-robot team collaborations. We first present an algorithm for autonomously recognizing available task-completion strategies by observing human-human teams performing a collaborative task. By transforming team actions into low-dimensional representations using hidden Markov models, we can identify strategies without prior knowledge. Robot policies are learned on each of the identified strategies to construct a Mixture-of-Experts model that adapts to the task strategies of unseen human partners. We evaluate our model on a collaborative cooking task using an Overcooked simulator. Results of an online user study with 125 participants demonstrate that our framework improves the task performance and collaborative fluency of human-agent teams, as compared to state-of-the-art reinforcement learning methods.
translated by 谷歌翻译
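While advances in multi-agent learning have enabled the training of increasingly complex agents, most existing techniques produce a final policy that is not designed to adapt to a new partner's strategy. However, we would like our AI agents to adjust their strategy based on the strategies of those around them. In this work, we study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time, and we must interact with and adapt to new partners at test time. This setting is challenging because we must infer a new partner's strategy and adapt our policy to that strategy, all without knowledge of the environment reward or dynamics. We formalize this problem of conditional multi-agent imitation learning and propose a novel approach to address the difficulties of scalability and data scarcity. Our key insight is that variations across partners in multi-agent games are often highly structured and can be represented via a low-rank subspace. Leveraging tools from tensor decomposition, our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace. We experiment with a mix of collaborative tasks, including bandit, particle, and Hanabi environments. Additionally, we test our conditional policies against real human partners in a user study on the Overcooked game. Our model adapts better to new partners compared to baselines, and robustly handles diverse settings ranging from discrete/continuous actions and static/online evaluation with AI/human partners.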
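The low-rank idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it swaps the tensor decomposition for a plain matrix SVD, represents each partner strategy as a flat vector, and infers a new partner from a few observed entries by least squares. The names `fit_subspace` and `infer_partner` are hypothetical.

```python
import numpy as np

def fit_subspace(strategies, rank):
    """Fit a rank-`rank` affine subspace to partner strategy vectors.

    strategies: array of shape (n_partners, dim), one row per training partner.
    Returns the mean strategy and an orthonormal basis of shape (rank, dim).
    Assumption: a matrix SVD stands in for the paper's tensor decomposition.
    """
    mean = strategies.mean(axis=0)
    _, _, vt = np.linalg.svd(strategies - mean, full_matrices=False)
    return mean, vt[:rank]

def infer_partner(mean, basis, observed_idx, observed_vals):
    """Estimate a new partner's full strategy from a few observed entries.

    Solves for subspace coefficients by least squares on the observed
    coordinates, then reconstructs every coordinate from those coefficients.
    """
    design = basis[:, observed_idx].T             # (n_observed, rank)
    target = observed_vals - mean[observed_idx]   # centre the observations
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return mean + coeffs @ basis
```

If the training strategies really do lie in a subspace of the chosen rank, a handful of observations is enough to recover the rest of the new partner's strategy; with noisy or higher-rank data the result is only an approximation.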
When robots interact with humans in homes, on roads, or in factories, the human's behavior often changes in response to the robot. Non-stationary humans are challenging for robot learners: actions the robot has learned to coordinate with the original human may fail after the human adapts to the robot. In this paper we introduce an algorithmic formalism that enables robots (i.e., ego agents) to co-adapt alongside dynamic humans (i.e., other agents) using only the robot's low-level states, actions, and rewards. A core challenge is that humans not only react to the robot's behavior, but the way in which humans react inevitably changes both over time and between users. To deal with this challenge, our insight is that -- instead of building an exact model of the human -- robots can learn and reason over high-level representations of the human's policy and policy dynamics. Applying this insight we develop RILI: Robustly Influencing Latent Intent. RILI first embeds low-level robot observations into predictions of the human's latent strategy and strategy dynamics. Next, RILI harnesses these predictions to select actions that influence the adaptive human towards advantageous, high-reward behaviors over repeated interactions. We demonstrate that -- given RILI's measured performance with users sampled from an underlying distribution -- we can probabilistically bound RILI's expected performance across new humans sampled from the same distribution. Our simulated experiments compare RILI to state-of-the-art representation and reinforcement learning baselines, and show that RILI better learns to coordinate with imperfect, noisy, and time-varying agents. Finally, we conduct two user studies where RILI co-adapts alongside actual humans in a game of tag and a tower-building task. See videos of our user studies here:
Interaction and cooperation with humans are overarching aspirations of artificial intelligence (AI) research. Recent studies demonstrate that AI agents trained with deep reinforcement learning are capable of collaborating with humans. These studies primarily evaluate human compatibility through "objective" metrics such as task performance, obscuring potential variation in the levels of trust and subjective preference that different agents garner. To better understand the factors shaping subjective preferences in human-agent cooperation, we train deep reinforcement learning agents in Coins, a two-player social dilemma. We recruit participants for a human-agent cooperation study and measure their impressions of the agents they encounter. Participants' perceptions of warmth and competence predict their stated preferences for different agents, above and beyond objective performance metrics. Drawing inspiration from social science and biology research, we subsequently implement a new "partner choice" framework to elicit revealed preferences: after playing an episode with an agent, participants are asked whether they would like to play the next round with the same agent or to play alone. As with stated preferences, social perception better predicts participants' revealed preferences than does objective performance. Given these results, we recommend human-agent interaction researchers routinely incorporate the measurement of social perception and subjective preferences into their studies.
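Recent advances in machine learning have led to growing interest in Explainable AI (xAI), which enables humans to gain insight into the decision-making of machine learning models. Despite this recent interest, the utility of xAI techniques has not yet been characterized in human-machine teaming. Importantly, xAI offers the promise of enhancing team situational awareness (SA) and shared mental model development, which are key characteristics of effective human-machine teams. Rapidly developing such mental models is especially critical in ad hoc human-machine teaming, where agents do not have a priori knowledge of others' decision-making strategies. In this paper, we present two novel human-subject experiments quantifying the benefits of deploying xAI techniques within a human-machine teaming scenario. First, we show that xAI techniques can support SA ($p < 0.05$). Second, we examine how different SA levels induced via a collaborative AI policy abstraction affect ad hoc human-machine teaming performance. Importantly, we find that the benefits of xAI are not universal, as there is a strong dependence on the composition of the human-machine team. Novices benefit from xAI providing increased SA ($p < 0.05$) but are susceptible to cognitive overhead ($p < 0.05$). On the other hand, expert performance degrades with the addition of xAI-based support ($p < 0.05$), indicating that the cost of paying attention to the xAI outweighs the benefits obtained from the additional information provided to enhance SA. Our results demonstrate that researchers must deliberately design and deploy the right xAI techniques in the right scenario by carefully considering human-machine team composition and how the xAI method augments SA.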
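The standard problem setting in cooperative multi-agent settings is self-play (SP), where the goal is to train a team of agents that works well together. However, optimal SP policies commonly contain arbitrary conventions ("handshakes") and are not compatible with other, independently trained agents or humans. This latter desideratum was recently formalized by Hu et al. 2020 as the zero-shot coordination (ZSC) setting and partially addressed with their Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance in the card game Hanabi. OP assumes access to the symmetries of the environment and prevents agents from breaking these in a mutually incompatible way during training. However, as the authors point out, discovering symmetries for a given environment is a computationally hard problem. Instead, we show that through a simple adaptation of k-level reasoning (KLR) (Costa Gomes et al. 2006), in which all levels are trained synchronously, we can obtain competitive ZSC and ad hoc teamplay performance in Hanabi, including when paired with a human-like proxy bot. We also introduce a new method, synchronous k-level reasoning with a best response (SyKLRBR), which further improves performance over our synchronous KLR by co-training a best response.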
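The k-level reasoning hierarchy mentioned above can be sketched on a toy problem. This is only an illustration of the classical idea (level 0 acts uniformly at random, level k best-responds to level k-1) on a symmetric, shared-reward matrix game. It is not the synchronous deep RL training the abstract describes, and the function names are hypothetical.

```python
def best_response(payoff, partner_policy):
    """Return a deterministic best response to a partner's mixed policy.

    payoff[a][b] is the shared reward when we play a and the partner plays b.
    Assumption: the game is symmetric, so both players use the same matrix.
    """
    n = len(payoff)
    expected = [
        sum(p * payoff[a][b] for b, p in enumerate(partner_policy))
        for a in range(n)
    ]
    best = max(range(n), key=expected.__getitem__)  # ties go to the lowest index
    return [1.0 if a == best else 0.0 for a in range(n)]

def k_level_policies(payoff, k):
    """Build policies for levels 0..k, each level best-responding to the one below."""
    n = len(payoff)
    policies = [[1.0 / n] * n]  # level 0: uniform random play
    for _ in range(k):
        policies.append(best_response(payoff, policies[-1]))
    return policies
```

Because level 0 is fixed and every higher level is grounded in the level below it, the hierarchy has no room for the arbitrary "handshake" conventions that self-play can produce.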
Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently gained attention due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to effectively and robustly learn models, as learning by imitation poses its own set of challenges. In this paper, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field as well as highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop with them. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
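When robots interact with human partners, these partners often change their behavior in response to the robot. On the one hand, this is challenging because the robot must learn to coordinate with a dynamic partner. On the other hand, if the robot understands these dynamics, it can harness its own behavior, influence the human, and guide the team towards effective collaboration. Prior research enables robots to learn to influence other robots or simulated agents. In this paper we extend these learning approaches to influence humans. What makes humans especially hard to influence is that humans not only react to the robot, but the way a single user reacts to the robot may change over time, and different humans will respond to the same robot behavior in different ways. We therefore propose a robust approach that learns to influence changing partner dynamics. Our method first trains with a set of partners across repeated interactions, and learns to predict the current partner's behavior based on the previous states, actions, and rewards. Next, we rapidly adapt to new partners by sampling trajectories the robot learned with the original partners, and then leveraging those existing behaviors to influence the new partner dynamics. We compare the resulting algorithm against alternatives across simulated environments and in a user study where the robot and participants collaborate to build towers. We find that our approach outperforms the alternatives, even when the partner follows new or unexpected dynamics. Videos of the user study are available here: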
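When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to collaborate effectively in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naïve extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest that humans prefer to collaborate with AI agents implementing our partner-aware strategy.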
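For context, the single-agent Upper Confidence Bound algorithm that the abstract builds on can be sketched as follows. This is the standard UCB1 rule for one agent, not the Partner-Aware strategy itself; the function name `ucb1` and the `pull` callback are assumptions made for illustration.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Run single-agent UCB1 for `horizon` rounds.

    pull(arm) must return a reward in [0, 1] for the chosen arm.
    Returns the per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            # Optimism under uncertainty: empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only on the order of log(horizon) times; this is the source of the logarithmic regret that the single-agent algorithm is known for.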