Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, without relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. Since training autoencoders on word observations presents certain computational issues, we propose and compare different variations adapted to this setting. We also propose an explicit correlation-maximizing regularizer that leads to significant improvements in performance. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must generalize to a different language (e.g., German). These experiments demonstrate that our approaches are competitive with the state of the art, achieving improvements of up to 10-14 percentage points over the best previously reported results on this task.
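As a rough illustration of the setup described above, the following NumPy sketch computes the training objective for one batch of aligned sentence pairs: within- and cross-language bag-of-words reconstruction losses, plus a correlation-maximizing term on the two hidden codes. All names, dimensions, and the sigmoid/cross-entropy choices are illustrative assumptions, not the paper's exact architecture, and the gradient-based training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary of each language, hidden dim, batch size.
V_EN, V_DE, H, BATCH = 2000, 2500, 128, 32

# Encoder weights per language (randomly initialized here; in practice
# these would be learned by gradient descent on the loss below).
W_en = rng.normal(0.0, 0.01, (H, V_EN))
W_de = rng.normal(0.0, 0.01, (H, V_DE))
b = np.zeros(H)                        # shared hidden bias
c_en, c_de = np.zeros(V_EN), np.zeros(V_DE)  # output biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W):
    # Map a batch of bag-of-words vectors to hidden codes.
    return sigmoid(x @ W.T + b)

def decode(h, W, c):
    # Tied weights: decoding reuses the encoder matrix transposed.
    return sigmoid(h @ W + c)

def recon_loss(x, x_hat, eps=1e-9):
    # Cross-entropy between a binarized bag-of-words and its reconstruction.
    return -np.mean(np.sum(x * np.log(x_hat + eps)
                           + (1 - x) * np.log(1 - x_hat + eps), axis=1))

def corr_penalty(h_en, h_de, eps=1e-9):
    # Negative sum of per-dimension correlations across the batch;
    # minimizing it pushes the two languages' codes to correlate.
    he = h_en - h_en.mean(axis=0)
    hd = h_de - h_de.mean(axis=0)
    corr = (he * hd).mean(axis=0) / (he.std(axis=0) * hd.std(axis=0) + eps)
    return -corr.sum()

# Fake aligned sentence pairs: sparse binary bag-of-words per language.
x_en = (rng.random((BATCH, V_EN)) < 0.01).astype(float)
x_de = (rng.random((BATCH, V_DE)) < 0.01).astype(float)

h_en, h_de = encode(x_en, W_en), encode(x_de, W_de)

lam = 4.0  # hypothetical regularization strength
loss = (recon_loss(x_en, decode(h_en, W_en, c_en))    # en -> en
        + recon_loss(x_de, decode(h_en, W_de, c_de))  # en -> de (cross)
        + recon_loss(x_en, decode(h_de, W_en, c_en))  # de -> en (cross)
        + recon_loss(x_de, decode(h_de, W_de, c_de))  # de -> de
        + lam * corr_penalty(h_en, h_de))
print(f"bilingual autoencoder loss on a random batch: {loss:.3f}")
```

The cross-reconstruction terms are what tie the two languages together without word alignments: each sentence's code must be able to regenerate the words of its translation, so translation pairs are driven toward nearby codes, and the correlation term reinforces this at the level of individual hidden dimensions.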