We present an active detection model for localizing objects in scenes. Themodel is class-specific and allows an agent to focus attention on candidateregions for identifying the correct location of a target object. This agentlearns to deform a bounding box using simple transformation actions, with thegoal of determining the most specific location of target objects followingtop-down reasoning. The proposed localization agent is trained using deepreinforcement learning, and evaluated on the Pascal VOC 2007 dataset. We showthat agents guided by the proposed model are able to localize a single instanceof an object after analyzing only between 11 and 25 regions in an image, andobtain the best detection results among systems that do not use objectproposals for object localization.
translated by 谷歌翻译