Smart Paper Notes

Modern French Poetry Generation with RoBERTa and GPT-2

Mika Hämäläinen , Khalid Alnajjar , Thierry Poibeau

Category: Natural Language Processing

2022-12-06

We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder is based on RoBERTa, while the decoder is based on GPT-2. This way the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, human judges gave the lowest score, 3.57, to the typicality and emotionality of the output poetry, and the highest score, 3.79, to its understandability.

Translated by Google Translate

Emotion Conditioned Creative Dialog Generation

Khalid Alnajjar , Mika Hämäläinen

Category: Natural Language Processing

2022-12-06

We present a DialGPT-based model for generating creative dialog responses that are conditioned on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise. Given an input sentence and a desired emotion label, the model produces a contextually apt response. It expresses the desired emotion with an accuracy of 0.6. The best performing emotions are neutral, fear and disgust. When measuring the strength of the expressed emotion, we find that the model expresses anger, fear and disgust most strongly.
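
The accuracy figure above compares the emotion that was requested with the emotion that was actually expressed in the response. A minimal sketch of that kind of measurement follows. The `pairs` list is toy data made up for illustration (it is not taken from the paper), and the function name `emotion_accuracy` is hypothetical; how the expressed emotion is detected is outside the scope of this sketch.

```python
from collections import Counter

# Toy (desired, detected) emotion pairs. Illustrative values only,
# not results or data from the paper.
pairs = [
    ("anger", "anger"),
    ("anger", "disgust"),
    ("fear", "fear"),
    ("fear", "fear"),
    ("sadness", "pain"),
    ("sadness", "sadness"),
    ("surprise", "happiness"),
    ("disgust", "disgust"),
    ("happiness", "happiness"),
    ("pain", "sadness"),
]


def emotion_accuracy(pairs):
    """Return (overall accuracy, per-emotion accuracy) for (desired, detected) pairs."""
    total = Counter()
    correct = Counter()
    for desired, detected in pairs:
        total[desired] += 1
        if desired == detected:
            correct[desired] += 1
    overall = sum(correct.values()) / len(pairs)
    per_emotion = {label: correct[label] / total[label] for label in total}
    return overall, per_emotion


overall, per_emotion = emotion_accuracy(pairs)
print(f"overall accuracy: {overall:.2f}")
for label, acc in sorted(per_emotion.items()):
    print(f"{label}: {acc:.2f}")
```

Reporting the per-emotion breakdown next to the overall number makes it visible which emotions the model expresses reliably and which it confuses with others.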

Automatic Generation of Factual News Headlines in Finnish

Maximilian Koppatz , Khalid Alnajjar , Mika Hämäläinen , Thierry Poibeau

Category: Natural Language Processing

2022-12-05

We present a novel approach to generating news headlines in Finnish for a given news story. We model this as a summarization task in which a model is given a news article and must produce a concise headline describing the main topic of the article. Because there are no openly available GPT-2 models for Finnish, we first build such a model using several corpora. The model is then fine-tuned for the headline generation task using a massive news corpus. The system is evaluated by three expert journalists working in a Finnish media house. The results showcase the usability of the presented approach as a headline suggestion tool to facilitate the news production process.

Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

Mika Hämäläinen , Khalid Alnajjar , Thierry Poibeau

Category: Natural Language Processing

2022-12-05

We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas. The game developers have preannotated every line of dialog in the game with one of 8 different sentiments: anger, disgust, fear, happy, neutral, pained, sad and surprised. The game has been translated into English, Spanish, German, French and Italian. We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLMRoBERTa and language specific BERT models. In our experiments, multilingual BERT outperformed XLMRoBERTa for most of the languages, and language specific models were slightly better than multilingual BERT for most of the languages. The best overall accuracy was 54%, achieved by using multilingual BERT on Spanish data. The extracted data set presents a challenging task for sentiment analysis. We have released the data, including the testing and training splits, openly on Zenodo. The data set has been shuffled for copyright reasons.
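
For a multilabel setup over the eight sentiments listed above, targets are commonly encoded as multi-hot vectors and predictions are read off by thresholding per-label scores. The sketch below shows that encoding only. It assumes a line may carry more than one label, and the helper names `to_multi_hot` and `from_scores` as well as the 0.5 threshold are illustrative choices, not details from the paper.

```python
# The eight sentiment labels named in the abstract, in a fixed order.
LABELS = ["anger", "disgust", "fear", "happy", "neutral", "pained", "sad", "surprised"]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(sentiments):
    """Encode a collection of label names as a 0/1 vector of length len(LABELS)."""
    vec = [0] * len(LABELS)
    for name in sentiments:
        vec[LABEL_INDEX[name]] = 1
    return vec


def from_scores(scores, threshold=0.5):
    """Decode per-label scores into label names; the threshold is an assumed default."""
    return [LABELS[i] for i, score in enumerate(scores) if score >= threshold]


target = to_multi_hot(["fear", "sad"])
predicted = from_scores([0.1, 0.7, 0.2, 0.0, 0.5, 0.3, 0.9, 0.4])
print(target)
print(predicted)
```

Keeping the label order in one shared list means the same index mapping is used for every language in the data set.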

Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP

Teemu Pöyhönen , Mika Hämäläinen , Khalid Alnajjar

Category: Natural Language Processing

2022-07-10

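Role-playing games (RPGs) contain a considerable amount of text in their video game dialogues. Quite often this text is semi-annotated by the game developers. In this paper, we extract a multilingual dataset of persuasive dialogue from several RPGs. We show the viability of this data for building a persuasion detection system using a natural language processing (NLP) model called BERT. We believe that video games have a lot of unused potential as a data source for a variety of NLP tasks. The code and data described in this paper are available on Zenodo.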

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

Niko Partanen , Jack Rueter , Mika Hämäläinen , Khalid Alnajjar

Category: Natural Language Processing

2021-12-28

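This study forms a technical report on various tasks that have been performed on the materials collected and published by the Finnish ethnographer and linguist Matthias Alexander Castrén (1813-1852). The Finno-Ugrian Society is publishing Castrén's manuscripts as new critical and digital editions, and at the same time different research groups have also paid attention to these materials. We discuss the workflows and technical infrastructure used, and consider how datasets that benefit different computational tasks could be created to further improve the usability of these materials and to aid the further processing of similar archived collections. We focus on the parts of the collections that are processed in a way that improves their usability in more technical applications, complementing the earlier work on the cultural and linguistic aspects of these materials. Most of these datasets are openly available on Zenodo. The study points to specific areas where further research is needed and provides benchmarks for text recognition tasks.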

TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language

Quan Duong , Mika Hämäläinen , Khalid Alnajjar

Category: Natural Language Processing

2021-12-23

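Measuring the semantic similarity of different texts has many important applications in digital humanities research, such as information retrieval, document clustering and text summarization. The performance of different methods depends on the length of the text, the domain and the language. This study focuses on experimenting with some of the current approaches on Finnish, which is a morphologically rich language. At the same time, we propose a simple method, TFW2V, which shows high efficiency in handling both long text documents and limited amounts of data. Furthermore, we design an objective evaluation method that can be used as a framework for benchmarking text similarity approaches.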
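
To make the document similarity task concrete, here is a minimal sketch of a plain TF-IDF cosine similarity baseline. This is a generic baseline, not the TFW2V method, whose details are not given in the abstract. The function names, the smoothed IDF formula and the three toy Finnish sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy sentences, made up for illustration.
docs = [
    "kissa istuu talossa",     # "the cat sits in the house"
    "kissa nukkuu talossa",    # "the cat sleeps in the house"
    "koira juoksee metsässä",  # "the dog runs in the forest"
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))
print(cosine(vecs[0], vecs[2]))
```

Because this baseline matches surface forms only, inflected variants such as "talo" and "talossa" count as unrelated tokens, which is one reason a morphologically rich language like Finnish is hard for purely token-based similarity.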

Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

Mika Hämäläinen , Pattama Patpong , Khalid Alnajjar , Niko Partanen , Jack Rueter

Category: Natural Language Processing

2021-11-08

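We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression in several online blogs. We experiment with two different LSTM-based models and two different BERT-based models. We achieve an accuracy of 77.53% in detecting depression. This establishes a good baseline for future researchers working on the same corpus. Furthermore, we identify a need for Thai embeddings trained on a more varied corpus than Wikipedia. Our corpus, code and trained models have been released openly on Zenodo.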

Finnish Dialect Identification: The Effect of Audio and Text

Mika Hämäläinen , Khalid Alnajjar , Niko Partanen , Jack Rueter

Category: Natural Language Processing

2021-11-06

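Finnish is a language with multiple dialects that differ from each other not only in accent (pronunciation) but also in morphological forms and lexical choice. We present an approach for automatically detecting the dialect of a speaker, based either on a dialect transcript alone or on a transcript together with an audio recording, in a dataset consisting of 23 different dialects. Our results show that the best accuracy is obtained by combining both modalities: text alone reaches an overall accuracy of only 57%, whereas text and audio together reach 85%. Our code, models and data have been released openly on GitHub and Zenodo.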

Meta-learning generalizable dynamics from trajectories

Qiaofeng Li , Tianyi Wang , Vwani Roychowdhury , M. Khalid Jawed

Category: Machine Learning

2023-01-03

We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, that is, the functional variations of the force field across dynamical system instances, without knowing the physical parameters. It does so by adopting a bi-level optimization framework: an outer level captures the common force field form among the studied dynamical system instances, and an inner level adapts to individual system instances. A priori physical knowledge, such as a conservative force field or Euclidean symmetry, can be conveniently embedded in the neural network architecture as inductive bias. With the learned meta-knowledge, iMODE can model an unseen system within seconds and, inversely, reveal knowledge about the physical parameters of a system, or serve as a Neural Gauge to "measure" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
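
The bi-level idea described above can be illustrated on a deliberately tiny toy problem. The sketch below is not the iMODE algorithm: it replaces the neural network with a two-parameter model `c - eta * x`, fits observed derivatives directly instead of integrating an ODE, and uses simple alternating first-order updates. The toy dynamics `dx/dt = c - k * x`, all constants, and the function names are assumptions made for illustration only.

```python
TRUE_C = 2.0               # shared part of the toy force field (assumed)
TRUE_KS = [1.0, 2.0, 3.0]  # per-instance physical parameter, hidden from the learner
XS = [0.5, 1.0, 1.5, 2.0]  # states at which derivatives are observed


def make_data(k):
    """Observed (state, derivative) pairs for the toy dynamics dx/dt = c - k * x."""
    return [(x, TRUE_C - k * x) for x in XS]


def grads(c, eta, data):
    """Gradients of the mean squared derivative-matching error w.r.t. c and eta."""
    g_c = 0.0
    g_eta = 0.0
    for x, dxdt in data:
        residual = (c - eta * x) - dxdt
        g_c += 2.0 * residual
        g_eta += -2.0 * residual * x
    return g_c / len(data), g_eta / len(data)


def adapt(c, eta, data, steps, lr=0.1):
    """Inner level: adapt the instance-specific parameter eta with c held fixed."""
    for _ in range(steps):
        _, g_eta = grads(c, eta, data)
        eta -= lr * g_eta
    return eta


def meta_train(datasets, outer_steps=2000, inner_steps=5, lr=0.1):
    """Outer level: update the shared parameter c after each round of inner adaptation."""
    c = 0.0
    etas = [0.0] * len(datasets)
    for _ in range(outer_steps):
        etas = [adapt(c, eta, d, inner_steps, lr) for eta, d in zip(etas, datasets)]
        g_c = sum(grads(c, eta, d)[0] for eta, d in zip(etas, datasets)) / len(datasets)
        c -= lr * g_c
    return c, etas


c, etas = meta_train([make_data(k) for k in TRUE_KS])
print("shared parameter:", round(c, 4))
print("adapted per-instance parameters:", [round(e, 4) for e in etas])

# "Gauge" an unseen instance: keep c fixed and adapt only eta.
eta_new = adapt(c, 0.0, make_data(2.5), steps=50)
print("estimate for unseen instance:", round(eta_new, 4))
```

In this toy setting the adapted `eta` for the unseen instance recovers its hidden parameter, which mirrors, in a very reduced form, the idea of reading physical parameters off the adapted latent variables.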
