In this work we present a framework for the recognition of natural scenetext. Our framework does not require any human-labelled data, and performs wordrecognition on the whole image holistically, departing from the character basedrecognition systems of the past. The deep neural network models at the centreof this framework are trained solely on data produced by a synthetic textgeneration engine -- synthetic data that is highly realistic and sufficient toreplace real data, giving us infinite amounts of training data. This excess ofdata exposes new possibilities for word recognition models, and here weconsider three models, each one "reading" words in a different way: via 90k-waydictionary encoding, character sequence encoding, and bag-of-N-grams encoding.In the scenarios of language based and completely unconstrained textrecognition we greatly improve upon state-of-the-art performance on standarddatasets, using our fast, simple machinery and requiring zero data-acquisitioncosts.
translated by 谷歌翻译