Esports, a form of competitive sport played through video games, has become one of the most important sporting events in recent years. Although the amount of esports data is increasing faster than ever, only a small fraction of it is accompanied by text commentaries that let the audience retrieve and understand the plays. In this study, we therefore introduce the task of generating game commentaries from structured data records. We first build a large-scale esports data-to-text dataset using structured data and commentaries from a popular esports game, League of Legends. On this dataset, we devise several data preprocessing methods, including linearization and data splitting, to improve its quality. We then introduce several baseline encoder-decoder models and propose a hierarchical model for generating game commentaries. Considering the characteristics of esports commentaries, we design evaluation metrics covering three aspects of the output: correctness, fluency, and strategic depth. Experimental results on our large-scale esports dataset confirm the advantage of the hierarchical model and reveal several challenges of this novel task.
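As a rough illustration of the linearization step mentioned above, a structured game record can be flattened into a marker-delimited token sequence for an encoder-decoder model; the field names and record below are hypothetical, not the paper's actual schema:

```python
def linearize(record):
    """Flatten one structured game-event record into a token sequence
    that an encoder-decoder model can consume."""
    tokens = []
    for field, value in record.items():
        tokens.append(f"<{field}>")        # field marker, e.g. <killer>
        tokens.extend(str(value).split())  # field value as plain tokens
    return " ".join(tokens)

# Hypothetical kill event from a League of Legends match log.
event = {"time": "12:34", "killer": "Faker", "victim": "Doublelift",
         "event": "CHAMPION_KILL"}
print(linearize(event))
# <time> 12:34 <killer> Faker <victim> Doublelift <event> CHAMPION_KILL
```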
Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can provide spurious cues that models exploit instead of learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer') highlights videos with transcribed live commentaries in English. As the course of a game is unpredictable, so are commentaries, which makes them a unique resource for investigating dynamic language grounding. We also provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval, and play-by-play live commentary generation. Results show that SOTA models perform reasonably well in most tasks. We discuss the implications of these results and suggest new tasks for which GOAL can be used. Our codebase is available at: https://gitlab.com/grounded-sport-convai/goal-baselines.
Recent neural sequence-to-sequence models with copy mechanisms have achieved remarkable progress on various text generation tasks. These models address the out-of-vocabulary problem and facilitate the generation of rare words. However, as observed with previous copy models, the outputs they produce tend to lack abstractiveness, and it is difficult for the model to identify which words should be copied. In this paper, we propose a novel supervision method for copy networks that helps the model decide which words need to be copied and which need to be generated. Specifically, we redefine the objective function, which leverages the source sequence and the target vocabulary as guidance for copying. Experimental results on data-to-text generation and abstractive summarization tasks verify that our approach improves copy quality and increases the degree of abstractiveness.
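The copy-versus-generate mixture that copy networks compute can be sketched as follows; the toy distributions are illustrative, and the supervised objective for deciding which words to copy is not reproduced here:

```python
def mix_distributions(p_gen, vocab_dist, copy_dist):
    """Pointer-generator-style mixture of distributions:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w)."""
    words = set(vocab_dist) | set(copy_dist)
    return {w: p_gen * vocab_dist.get(w, 0.0)
               + (1 - p_gen) * copy_dist.get(w, 0.0)
            for w in words}

vocab_dist = {"the": 0.6, "scored": 0.4}   # decoder softmax over the vocabulary
copy_dist = {"Faker": 0.9, "scored": 0.1}  # attention weights over source tokens
mixed = mix_distributions(0.3, vocab_dist, copy_dist)
# With a low p_gen, a rare source word such as "Faker" dominates.
print(max(mixed, key=mixed.get))  # Faker
```

A rare word absent from the vocabulary distribution can still receive high probability through the copy term, which is why such mixtures mitigate the out-of-vocabulary problem.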
Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automatic collection and cleaning processes, which results in a large amount of noise. Moreover, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games; it has 7,854 commentary-news pairs, and a manual cleaning process is employed to improve quality. (2) Unlike existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus containing information about 523 sports teams and 14,724 sports players. In addition, we introduce a knowledge-enhanced summarizer that utilizes both live commentaries and knowledge to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.
This paper studies the modeling of automated machine description of sports videos, where much progress has recently been made. Nevertheless, state-of-the-art approaches still fail to capture how human experts analyze sports scenes, for several main reasons: (1) the datasets used were collected from unofficial providers, which naturally creates a gap between models trained on these datasets and real-world applications; (2) previously proposed methods require extensive annotation effort (i.e., pixel-level player and ball segmentation) to localize useful visual features in order to produce acceptable results; and (3) few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning, to address the above challenges. We also design a unified approach that processes raw videos into a stack of meaningful features with minimal labeling effort, showing that cross-modeling such features with a Transformer architecture leads to strong performance. In addition, we demonstrate the broad applicability of NSVA by addressing two additional tasks, namely fine-grained sports action recognition and salient player identification. Code and the dataset are available at https://github.com/jackwu502/nsva.
Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and, at times, multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative icon-based game of drawing and guessing, which poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response to the guesses. This back-and-forth frequently uses canonical scenes, visual metaphors, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training. Elite human players outperform our models, particularly at the drawing task, leaving an important gap for future research. We release our dataset, code, and evaluation setup as a challenge to the community at http://www.github.com/allenai/conary.
Sports, due to their global reach and rich prediction tasks, are an exciting domain in which to deploy machine learning models. However, data from conventional sports is often unsuitable for research use because of its size, accuracy, and accessibility. To address these issues, we turn to esports, a growing domain that encompasses video games played in a manner similar to conventional sports. Since esports data is acquired through server logs rather than peripheral sensors, esports provides a unique opportunity to obtain massive amounts of clean and detailed spatiotemporal data, similar to the data collected in conventional sports. To parse esports data, we develop awpy, an open-source esports game-log parsing library that can extract player trajectories and actions from game logs. Using awpy, we parse 860K actions, 790K game frames, and 417K trajectories from 1,558 game logs from professional Counter-Strike matches to create the Esports Trajectories and Actions (ESTA) dataset. ESTA is one of the largest and most granular publicly available sports datasets to date. We use ESTA to develop benchmarks for win prediction using player-specific information. The ESTA data is available at https://github.com/pnxenopoulos/esta, and awpy is publicly available through PyPI.
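As a rough illustration of the win-prediction benchmark described above, each parsed game frame might be turned into a small feature vector before being fed to a classifier; all field names below are hypothetical stand-ins and do not reflect the dataset's real schema:

```python
def frame_features(frame):
    """Turn one parsed game frame into a feature vector for win prediction.
    All keys are illustrative stand-ins, not the dataset's real field names."""
    return [
        frame["ct_alive"] - frame["t_alive"],  # player-count advantage
        frame["ct_equip"] - frame["t_equip"],  # equipment-value advantage
        frame["bomb_planted"],                 # 1 if the bomb is down
        frame["time_left"] / 115.0,            # normalized round clock
    ]

# Hypothetical mid-round frame from a parsed game log.
frame = {"ct_alive": 4, "t_alive": 2, "ct_equip": 18000, "t_equip": 9000,
         "bomb_planted": 0, "time_left": 46.0}
print(frame_features(frame))  # [2, 9000, 0, 0.4]
```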
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
This study aims to achieve two objectives. The first is to curate a large and informative dataset containing critical and concise summaries of players' actions and positions, as well as the ball's back-and-forth travel patterns, in professional and NCAA Div-I indoor volleyball games. Although several prior studies have aimed to create similar datasets for other sports (e.g., badminton and soccer), creating such a dataset for indoor volleyball has not yet been accomplished. The second objective is to introduce a volleyball descriptive language to fully describe the rally processes in games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis: (1) volleyball rally prediction, which aims to predict the outcome of a rally and help players and coaches improve decision-making in practice; (2) set type and hit type prediction, to help coaches and players prepare more effectively for games; and (3) volleyball tactics and attacking-zone statistics, to provide advanced volleyball statistics and help coaches better understand their games and opponents' strategies. We conducted case studies to show how the experimental results can provide insights to the volleyball analytics community. Furthermore, experimental evaluations based on real-world data establish a baseline for future research and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science.
Context: Stack Overflow is very helpful for software developers seeking answers to programming questions. Previous studies have shown that a growing number of questions are of low quality and thus receive less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) to automatically generate question titles from code snippets to improve question quality. However, using only the code snippets in the question body cannot provide sufficient information for title generation, and LSTMs cannot capture long-range dependencies between tokens. Objective: This paper proposes CCBERT, a novel deep-learning-based model that aims to enhance the performance of question title generation by fully leveraging the bi-modal information of the entire question body. Method: CCBERT follows the encoder-decoder paradigm: it uses CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate the predicted tokens, and an additional copy-attention layer to refine the output distribution. Both the encoder and the decoder perform multi-head self-attention operations to better capture long-range dependencies. This paper constructs a dataset containing around 200,000 high-quality questions, filtered from the data officially released by Stack Overflow, to verify the effectiveness of the CCBERT model. Results: CCBERT outperforms all baseline models on the dataset. Experiments on code-only and low-resource datasets show that CCBERT retains its superiority with less performance degradation. Human evaluation also shows the excellent performance of CCBERT with respect to readability and correlation criteria.
Text-based games (TBGs) are complex environments that allow users or computer agents to interact through text and achieve game goals. Building goal-oriented computer agents for text-based games is challenging, especially when we use step-wise feedback as the only textual input to the model. Moreover, it is difficult for agents to evaluate inputs of flexible length and form drawn from a larger text input space. In this paper, we provide an extensive analysis of deep learning methods applied to the field of text-based games.
Nowadays, time-stamped web documents related to general news queries flood the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time-series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process, with sequential information retained, and use it to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also be used to assist in extractive summarization, where the extracted summary likewise comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
As free online encyclopedias with a large volume of content, Wikipedia and Wikidata are key to many natural language processing (NLP) tasks, such as information retrieval, knowledge base construction, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset for generating short descriptions of Wikipedia articles, framed as a text summarization problem. The dataset consists of over 80K English samples on 6,987 topics. We set up a two-phase summarization method, description generation (Phase I) and candidate ranking (Phase II), as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART show their superiority compared to other small-scale pre-trained models. By applying contrastive learning with the different inputs from beam search, the metric-based ranking models outperform the direct description generation models by up to 22 ROUGE on the topic-exclusive and topic-independent splits. Furthermore, the Phase-II descriptions are supported by human evaluation at over 45.33%, against 23.66% for Phase I, when compared with the gold descriptions. In terms of sentiment analysis, the generated descriptions cannot effectively capture all sentiment polarities from the paragraphs, while this task is performed better from the gold descriptions. The automatically generated new descriptions reduce the human effort needed to create them and enrich Wikidata-based knowledge graphs. Our paper has a practical impact on Wikipedia and Wikidata, given the thousands of article descriptions involved. Finally, we expect WikiDes to be a useful dataset for related works on capturing salient information from short paragraphs. The curated dataset is publicly available at: https://github.com/declare-lab/wikides.
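To illustrate the second-stage candidate-ranking idea described above, here is a deliberately simple overlap-based reranker over a set of generated candidates; the actual system uses a learned, contrastively trained ranking model, not this heuristic:

```python
def rank_candidates(paragraph, candidates):
    """Rerank generated descriptions by unigram overlap with the source
    paragraph (a crude stand-in for a learned metric-based ranker)."""
    source = set(paragraph.lower().split())

    def score(candidate):
        tokens = candidate.lower().split()
        return sum(t in source for t in tokens) / max(len(tokens), 1)

    return sorted(candidates, key=score, reverse=True)

# Toy example: two candidate descriptions for one source paragraph.
paragraph = "Alan Turing was an English mathematician and computer scientist"
candidates = ["English mathematician and computer scientist", "French painter"]
print(rank_candidates(paragraph, candidates)[0])
# English mathematician and computer scientist
```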
Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpus. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance level annotations of persuasion strategy, and game level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
The highest grossing media franchise of all time, with over \$90 billion in total revenue, is Pokemon. The video games belong to the class of Japanese Role Playing Games (J-RPG). Developing a powerful AI agent for these games is very hard because they present big challenges to MinMax, Monte Carlo Tree Search, and statistical Machine Learning, as they are vastly different from the games well explored in the AI literature. An AI agent for one of these games means significant progress in AI agents for the entire class. Further, the key principles of such work can hopefully inspire approaches to several domains that require excellent teamwork under conditions of extreme uncertainty, including managing a team of doctors, robots, or employees in an ever-changing environment, like a pandemic-stricken region or a war zone. In this paper we first explain the mechanics of the game and perform a game analysis. We continue by proposing unique AI algorithms based on our understanding that the two biggest challenges in the game are keeping a balanced team and dealing with three sources of uncertainty. Later on, we describe why evaluating the performance of such agents is challenging and present the results of our approach. Our AI agent performed significantly better than all previous attempts and peaked at the 33rd place in the world, in one of the most popular battle formats, while running on only 4 single-socket servers.
Although deep learning has been widely used for video analytics, such as video classification and action detection, dense action detection with fast-moving subjects in sports videos remains challenging. In this work, we release yet another sports video dataset, P²A, for Ping-Pong Action detection, which consists of 2,721 video clips collected from broadcast videos of professional table tennis matches at the World Table Tennis Championships and the Olympic Games. We worked with a crew of table tennis professionals and referees to annotate each ping-pong action that appears in the dataset, and we formulate two sets of action detection problems: action localization and action recognition. We evaluate a number of commonly seen action detection models on both problems over P²A under various settings. These models can only achieve 48% area under the AR-AN curve for localization and 82% accuracy for recognition, since the ping-pong actions are dense with fast-moving subjects, while the broadcast videos are only 25 fps. The results confirm that P²A remains a challenging task and can serve as a benchmark for action detection in videos.
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
Representing board games and their positions through text-based symbolic notation opens up possibilities for NLP applications. Language models can help provide insights into a variety of interesting problems, such as unsupervised learning of a game's rules, detecting players' behavioral patterns, player attribution, and ultimately learning the game well enough to beat the state of the art. In this study, we apply the BERT model to the simple game of NIM to carry out a noise analysis of its few-shot learning architecture in the presence of noise. We analyze the model performance against three virtual players, namely Nim Guru, a Random player, and a Q-learner. In the second part, we apply the game-learning language model to the game of chess, using a series of grandmaster games with exhaustive encyclopedic openings. Finally, we show that the model can actually learn the rules of the chess game and can survive games against Stockfish at the level of a rating category.
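For reference, the "Nim Guru" player mentioned above presumably follows the classical optimal strategy for NIM, which reduces every move to zeroing the nim-sum (the XOR of all heap sizes); a minimal sketch of such a player (our reconstruction, not the study's code):

```python
from functools import reduce

def guru_move(heaps):
    """Optimal NIM move: reduce one heap so the XOR (nim-sum) of all heap
    sizes becomes zero. Returns (heap_index, new_heap_size); from a losing
    position (nim-sum already zero) it just shrinks the largest heap."""
    nim_sum = reduce(lambda a, b: a ^ b, heaps)
    if nim_sum == 0:
        i = max(range(len(heaps)), key=lambda k: heaps[k])
        return i, heaps[i] - 1
    for i, h in enumerate(heaps):
        target = h ^ nim_sum
        if target < h:  # a winning reduction exists on this heap
            return i, target
    raise AssertionError("unreachable: nim_sum != 0 guarantees a move")

print(guru_move([3, 4, 5]))  # (0, 1): after the move, 1 ^ 4 ^ 5 == 0
```

A player that always leaves a zero nim-sum wins whenever the starting position allows it, which makes such a guru a useful noise-free reference opponent.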
The internet has had a dramatic effect on the healthcare industry, allowing documents to be saved, shared, and managed digitally. This has made it easier to locate and share important data, improving patient care and providing more opportunities for medical studies. As there is so much data accessible to doctors and patients alike, summarizing it has become increasingly necessary, a need supported by the introduction of deep learning and transformer-based networks, which have significantly boosted the sector in recent years. This paper gives a comprehensive survey of the current techniques and trends in medical summarization.
The definition generation task aims to automatically generate the definition of a word in a specific context. However, owing to the lack of datasets covering different complexity levels, the definitions produced by models tend to keep the same complexity. This paper proposes the novel task of generating definitions for a word at controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset giving detailed information about Chinese definitions, in which each definition is labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which illustrate that our dataset plays an outstanding role in assisting models to generate definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research on complexity-controllable definition generation.