Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit assignment; in artificial intelligence (AI) this is the question of how to evaluate the utility of the actions within a long-duration behavioral sequence leading to success or failure in a task. Existing approaches to shorter-term credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a new paradigm for reinforcement learning where agents use recall of specific memories to credit actions from the past, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire computational models in neuroscience, psychology, and behavioral economics.
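Why long delays defeat standard credit assignment can be illustrated with a minimal sketch (an assumption-laden toy, not the paper's method): under the usual discounted-return formulation, the credit an action receives for a reward arriving d steps later scales as gamma^d, which vanishes for large d.

```python
# Toy illustration (not from the paper): with a discount factor gamma,
# the weight a delayed reward contributes back to an earlier action
# decays as gamma**delay, so very long delays leave almost no
# learning signal for that action.

def discounted_credit(gamma: float, delay: int) -> float:
    """Weight a reward delayed by `delay` steps assigns to the action."""
    return gamma ** delay

short_delay = discounted_credit(0.99, 10)     # consequence soon after action
long_delay = discounted_credit(0.99, 1000)    # long-delayed consequence

print(f"delay 10:   {short_delay:.3g}")   # near 1: signal survives
print(f"delay 1000: {long_delay:.3g}")    # near 0: signal effectively lost
```

Even with a discount factor as high as 0.99, a reward 1000 steps after the action contributes on the order of 10^-5 of its value, motivating mechanisms, such as the memory-recall paradigm described above, that bridge such gaps directly rather than through exponential discounting.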