Human visual systems are robust to a wide range of image transformations that are challenging for artificial networks. We present the first study of image model robustness to the minute transformations found across video frames, which we term "natural robustness". Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of common image transformations that occur in the natural environment. Our study across a dozen model architectures shows that more accurate models are more robust to natural transformations, and that robustness to synthetic color distortions is a good proxy for natural robustness. In examining brittleness in videos, we find that the vast majority of the brittleness found in videos (99.9%) lies outside the typical definition of adversarial examples. Finally, we investigate training techniques to reduce brittleness and find that no single technique systematically improves natural robustness across the twelve tested architectures.
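The brittleness the abstract describes can be illustrated with a minimal sketch: natural robustness is measured here as the fraction of adjacent video-frame pairs on which a classifier's prediction stays the same. The `model` and the frame pairs below are toy stand-ins for illustration, not the paper's actual setup.

```python
# Minimal sketch: "natural robustness" as prediction stability across
# adjacent video frames. Frames are represented as a single scalar
# feature here; a real evaluation would use image tensors and a CNN.

def flip_rate(model, frame_pairs):
    """Fraction of (anchor, next-frame) pairs whose predictions differ."""
    flips = sum(1 for a, b in frame_pairs if model(a) != model(b))
    return flips / len(frame_pairs)

# Toy classifier: thresholds the scalar "frame" at 0.5.
model = lambda frame: int(frame > 0.5)

# Toy adjacent-frame pairs: each second frame is a minute perturbation
# of the first, mimicking consecutive video frames.
pairs = [(0.40, 0.42), (0.49, 0.51), (0.90, 0.88), (0.10, 0.11)]

print(flip_rate(model, pairs))  # 0.25: one pair crosses the decision boundary
```

The pair (0.49, 0.51) flips the prediction even though the change is tiny, which is the kind of non-adversarial brittleness the study quantifies.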