The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm, an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as empowerment. Using a variational lower bound on the mutual information, combined with convolutional networks for handling visual input streams, we develop a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions.
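
To make the idea of a variational lower bound on the mutual information concrete, the following is a minimal sketch on a toy discrete channel. It is not the algorithm developed in the paper, which relies on neural networks and stochastic optimisation. It assumes the standard bound I(X;Y) >= H(X) + E_{p(x,y)}[log q(x|y)], valid for any distribution q(x|y) and tight when q is the true posterior. The function names, the binary channel, and the "crude" decoder are all hypothetical choices made for illustration.

```python
import math


def _marginal_y(p_x, p_y_given_x):
    """p(y) = sum_x p(x) p(y|x)."""
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    return [sum(p_x[x] * p_y_given_x[x][y] for x in range(n_x)) for y in range(n_y)]


def mutual_information(p_x, p_y_given_x):
    """Exact I(X;Y) in nats, computed by enumerating every (x, y) pair."""
    p_y = _marginal_y(p_x, p_y_given_x)
    mi = 0.0
    for x, px in enumerate(p_x):
        for y, py in enumerate(p_y):
            joint = px * p_y_given_x[x][y]
            if joint > 0.0:
                mi += joint * math.log(p_y_given_x[x][y] / py)
    return mi


def true_posterior(p_x, p_y_given_x):
    """p(x|y) by Bayes' rule, indexed as posterior[y][x]."""
    p_y = _marginal_y(p_x, p_y_given_x)
    return [
        [p_x[x] * p_y_given_x[x][y] / p_y[y] for x in range(len(p_x))]
        for y in range(len(p_y))
    ]


def variational_bound(p_x, p_y_given_x, q_x_given_y):
    """H(X) + E_{p(x,y)}[log q(x|y)], a lower bound on I(X;Y).

    q_x_given_y[y][x] must be positive wherever p(x, y) > 0.
    """
    entropy_x = -sum(p * math.log(p) for p in p_x if p > 0.0)
    expected_log_q = 0.0
    for x, px in enumerate(p_x):
        for y in range(len(p_y_given_x[x])):
            joint = px * p_y_given_x[x][y]
            if joint > 0.0:
                expected_log_q += joint * math.log(q_x_given_y[y][x])
    return entropy_x + expected_log_q


# Toy setup (assumed): uniform binary source, channel that flips with probability 0.1.
p_x = [0.5, 0.5]
channel = [[0.9, 0.1], [0.1, 0.9]]
crude_q = [[0.7, 0.3], [0.3, 0.7]]  # a deliberately imperfect decoder

exact = mutual_information(p_x, channel)
loose = variational_bound(p_x, channel, crude_q)
tight = variational_bound(p_x, channel, true_posterior(p_x, channel))
print(f"exact MI = {exact:.4f}, crude bound = {loose:.4f}, tight bound = {tight:.4f}")
```

In this sketch the bound with the crude decoder falls below the exact value, and the bound with the true posterior matches it. Improving q therefore tightens the bound without enumerating the posterior, which is the property a scalable optimiser can exploit.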