We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
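As a rough illustration of the actor/learner decoupling described above, the following single-process Python sketch has several actors write transitions into one shared prioritized replay memory while a learner samples from it in proportion to priority. It is a toy under stated assumptions, not the implementation evaluated in this work: the names `PrioritizedReplay`, `actor_step`, `learner_step`, and `run`, the exponent `alpha`, the FIFO eviction rule, the placeholder priority `abs(reward - 0.5)`, and the one-parameter "network" are all hypothetical, and everything runs sequentially rather than across machines.

```python
import random


class PrioritizedReplay:
    """Toy shared replay memory with proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # assumed priority exponent
        self.items = []           # stored transitions
        self.priorities = []      # one priority per stored transition

    def add(self, transition, priority):
        if len(self.items) >= self.capacity:
            # Evict the oldest transition (FIFO); an assumption of this sketch.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, rng):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, new_priorities):
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p


def actor_step(actor_id, step, params, replay, rng):
    """Hypothetical actor: acts using the shared parameters, writes to shared replay."""
    state = rng.random()                      # stand-in for an environment observation
    action = 1 if state * params["w"] > 0.5 else 0
    reward = rng.random()                     # stand-in for an environment reward
    transition = (actor_id, step, state, action, reward)
    # Placeholder initial priority; a real agent might use a TD-error magnitude.
    replay.add(transition, priority=abs(reward - 0.5) + 1e-3)


def learner_step(params, replay, rng, batch_size=4):
    """Hypothetical learner: replays a prioritized batch and updates the shared parameters."""
    indices, batch = replay.sample(batch_size, rng)
    mean_reward = sum(t[4] for t in batch) / len(batch)
    params["w"] += 0.01 * (mean_reward - 0.5)  # stand-in for a gradient update
    # Refresh the priorities of the transitions that were just replayed.
    replay.update(indices, [abs(t[4] - 0.5) + 1e-3 for t in batch])
    return indices


def run(num_actors=3, steps=20, seed=0):
    rng = random.Random(seed)
    params = {"w": 1.0}                       # shared "network" parameters
    replay = PrioritizedReplay(capacity=50)   # shared replay memory
    for step in range(steps):
        for actor_id in range(num_actors):
            actor_step(actor_id, step, params, replay, rng)
        learner_step(params, replay, rng)
    return params, replay
```

Calling `run()` interleaves three actors with one learner for twenty steps. In a distributed setting the actors and the learner would run concurrently as separate processes, with the replay memory and the network parameters exposed through some shared service; the sketch keeps only the data flow.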