We introduce a new RL problem in which the agent is required to generalize to a previously-unseen environment characterized by a subtask graph that describes a set of subtasks and their dependencies. Unlike existing hierarchical multitask RL approaches that explicitly describe what the agent should do at a high level, our problem only describes properties of subtasks and relationships among them, which requires the agent to perform complex reasoning to find the optimal subtask to execute. To solve this problem, we propose a neural subtask graph solver (NSGS), which encodes the subtask graph using a recursive neural network embedding. To overcome the difficulty of training, we propose a novel non-parametric gradient-based policy, graph reward propagation, to pre-train our NSGS agent, and further fine-tune it with an actor-critic method. The experimental results on two 2D visual domains show that our agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalizes well to unseen subtask graphs. In addition, we compare our agent with a Monte-Carlo tree search (MCTS) method, showing that our method is much more efficient than MCTS, and that the performance of NSGS can be further improved by combining it with MCTS.