LOWE, POW, SERBAN, CHARLINN, LIU AND PINEAU

In this paper, we analyze neural network-based dialogue systems trained in an end-to-end manner using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words.¹ This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering. We provide baselines in two different environments: one where models are trained to select the correct next response from a list of candidate responses, and one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation. These are both evaluated on a recall task that we call next utterance classification (NUC), and using vector-based metrics that capture the topicality of the responses. We observe that current end-to-end models are unable to completely solve these tasks; thus, we provide a qualitative error analysis to determine the primary causes of error for end-to-end models evaluated on NUC, and examine sample utterances from the generative models. As a result of this analysis, we suggest some promising directions for future research on the Ubuntu Dialogue Corpus, which can also be applied to end-to-end dialogue systems in general.

1. This work is an extension of a paper appearing in SIGDIAL (Lowe et al., 2015). This paper further includes results on generative dialogue models, more extensive evaluation of the retrieval models using vector-based generative metrics, and a qualitative examination of responses from the generative models and classification errors made by the Dual Encoder model. Experiments are performed on a new version of the corpus, the Ubuntu Dialogue Corpus v2, which is publicly available: https://github.com/rkadlec/ubuntu-ranking-dataset-creator. The early dataset has been updated to add features and fix bugs, which are detailed in Section 3. This is an open-access article distributed under the terms of a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/).
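
To make the recall-style evaluation behind next utterance classification concrete, the following is a minimal sketch of a Recall@k computation over candidate lists. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `recall_at_k`, the convention that the true response sits at index 0 of each candidate list, and the tie-breaking rule are all hypothetical choices made for this example.

```python
from typing import Sequence


def recall_at_k(
    score_lists: Sequence[Sequence[float]],
    k: int,
    true_index: int = 0,
) -> float:
    """Fraction of examples whose true response ranks within the top k candidates.

    Each inner sequence holds model scores for one context's candidate
    responses. `true_index` marks the true response (assumed convention).
    """
    if not score_lists:
        raise ValueError("need at least one example")
    hits = 0
    for scores in score_lists:
        true_score = scores[true_index]
        # Zero-based rank: number of candidates scored strictly higher.
        # Ties are resolved in favour of the true response (an assumption).
        rank = sum(1 for s in scores if s > true_score)
        if rank < k:
            hits += 1
    return hits / len(score_lists)


# Toy scores for three contexts with three candidates each (made-up values).
examples = [
    [0.9, 0.1, 0.3],  # true response ranked first
    [0.2, 0.8, 0.1],  # true response ranked second
    [0.1, 0.5, 0.7],  # true response ranked third
]
print(recall_at_k(examples, k=1))
print(recall_at_k(examples, k=2))
```

Counting only strictly higher scores means a tie is scored optimistically; a stricter evaluation could count ties against the true response instead.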