Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate mean-field approximate inference for the Conditional Random Fields with Gaussian pairwise potentials as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. We apply the proposed method to the problem of semantic image segmentation, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.
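To make the idea of unrolling mean-field inference into repeated, identical steps more concrete, the following is a minimal sketch in plain Python and not the CRF-RNN implementation itself. It assumes a tiny fully connected CRF over five "pixels", a single Gaussian kernel with a fixed bandwidth, and a fixed Potts label compatibility. The function names (`softmax`, `mean_field`), the parameters (`weight`, `bandwidth`, `num_iters`), and the toy data are all hypothetical and chosen only for illustration. In the network described above, the unary scores would come from a CNN and the parameters would be learned by back-propagation, which this sketch does not attempt.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field(unary, features, num_iters=5, weight=1.0, bandwidth=1.0):
    """Approximate CRF marginals with parallel mean-field updates.

    unary[i][l]  : score of label l at pixel i (higher means more likely);
                   assumed here to stand in for CNN outputs.
    features[i]  : feature vector of pixel i (e.g. its position).
    Each loop iteration applies the same update, which is what allows the
    procedure to be viewed as a recurrent computation.
    """
    n = len(unary)
    num_labels = len(unary[0])

    # Gaussian pairwise kernel between every pair of distinct pixels.
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                kernel[i][j] = math.exp(-dist2 / (2.0 * bandwidth ** 2))

    # Initialisation: softmax of the unary scores at each pixel.
    q = [softmax(u) for u in unary]

    for _ in range(num_iters):
        new_q = []
        for i in range(n):
            # Message passing: kernel-weighted sum of neighbours' beliefs.
            msg = [weight * sum(kernel[i][j] * q[j][l] for j in range(n))
                   for l in range(num_labels)]
            # Potts compatibility (an assumption): label l is penalised by
            # the belief mass that neighbours assign to all other labels.
            total = sum(msg)
            penalty = [total - msg[l] for l in range(num_labels)]
            # Add the unary scores and renormalise.
            new_q.append(softmax([unary[i][l] - penalty[l]
                                  for l in range(num_labels)]))
        q = new_q  # parallel update: all pixels use the previous beliefs
    return q

# Toy 1-D "image": four pixels favour label 0, the middle one weakly favours label 1.
unary = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.5], [2.0, 0.0], [2.0, 0.0]]
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]

q = mean_field(unary, features, num_iters=5)
labels = [row.index(max(row)) for row in q]
print(labels)
```

In this toy setting the pairwise term pulls the isolated middle pixel towards the label of its neighbours, which is the kind of smoothing that is otherwise applied as an offline post-processing step.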