Tracking-by-detection methods have demonstrated competitive performance in recent years. In these approaches, the tracking model heavily relies on the quality of the training set. Due to the limited amount of labeled training data, additional samples need to be extracted and labeled by the tracker itself. This often leads to the inclusion of corrupted training samples, due to occlusions, misalignments and other perturbations. Existing tracking-by-detection methods either ignore this problem or employ a separate component for managing the training set.

We propose a novel generic approach for alleviating the problem of corrupted training samples in tracking-by-detection frameworks. Our approach dynamically manages the training set by estimating the quality of the samples. Contrary to existing approaches, we propose a unified formulation by minimizing a single loss over both the target appearance model and the sample quality weights. The joint formulation enables corrupted samples to be down-weighted while increasing the impact of correct ones. Experiments are performed on three benchmarks: OTB-2015 with 100 videos, VOT-2015 with 60 videos, and Temple-Color with 128 videos. On OTB-2015, our unified formulation significantly improves the baseline, with a gain of 3.8% in mean overlap precision. Finally, our method achieves state-of-the-art results on all three datasets. Code and supplementary material are available at http://www.cvl.isy.liu.se/research/objrec/visualtracking/decontrack/index.html.
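To illustrate the idea of minimizing a single loss jointly over an appearance model and sample quality weights, the sketch below uses a toy weighted ridge-regression model with simplex-constrained weights and alternating closed-form updates. This is a minimal stand-in for intuition only, not the paper's actual formulation; the loss, the regularizer weights `mu` and `lam`, and the helper `project_simplex` are all assumptions introduced here.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)


def joint_decontaminate(X, y, n_iters=10, mu=5.0, lam=0.01):
    """Toy joint minimization over a linear model w and sample weights alpha:

        L(w, alpha) = sum_k alpha_k (x_k . w - y_k)^2 + lam ||w||^2 + mu ||alpha||^2

    with alpha constrained to the probability simplex. Alternating between
    the two closed-form updates decreases the loss monotonically.
    """
    n, d = X.shape
    alpha = np.full(n, 1.0 / n)  # start from uniform sample weights
    for _ in range(n_iters):
        # Fix alpha: weighted ridge regression gives w in closed form.
        Xw = X * alpha[:, None]
        w = np.linalg.solve(X.T @ Xw + lam * np.eye(d), Xw.T @ y)
        # Fix w: minimizing alpha . r + mu ||alpha||^2 over the simplex is
        # the projection of -r / (2 mu); large residuals get low weight.
        r = (X @ w - y) ** 2
        alpha = project_simplex(-r / (2.0 * mu))
    return w, alpha
```

Corrupted samples (e.g. mislabeled ones) keep large residuals, so the projection step drives their weights toward zero while correct samples share the remaining mass, mirroring the down-weighting behavior described above.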