Smart Paper Notes

An Opinion Mining of Text in COVID-19 Issues along with Comparative Study in ML, BERT & RNN

Md. Mahadi Hasan Sany , Mumenunnesa Keya , Sharun Akter Khushbu , Akm Shahariar Azad Rabby , Abu Kaisar Mohammad Masum

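Categories: Neural and Evolutionary Computing | Natural Language Processing

2022-01-06

The world is going through a pandemic: a catastrophic outbreak of a respiratory syndrome known as COVID-19. It is a global threat across 212 countries, where people face severe circumstances every day and thousands have been infected. Mental health has also been affected by the coronavirus situation worldwide. As a result, online sources have become a place where ordinary people share their opinions on any topic: positive and negative news about those affected, financial problems, national and family crises, the loss of import and export earnings, and so on. These varied circumstances are trending news everywhere, so vast amounts of text are produced within moments. The subcontinent is in the same situation as other countries, and people's opinions and circumstances are similar, but the language is different. This article proposes specific inputs, together with Bangla text comments from individual sources, to show that machine learning results can support building an assistive system. An opinion mining assistive system could be useful for any language. To the best of our knowledge, the article predicts Bangla input text on COVID-19 issues, analyzes the proposed ML algorithms and deep learning models, and examines future prospects through a comparative analysis. The comparative analysis reports text prediction accuracy for the ML algorithms and for the deep learning models, the latter at 79%.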

translated by Google Translate

A CNN-LSTM-based hybrid deep learning approach to detect sentiment polarities on Monkeypox tweets

Krishna Kumar Mohbey , Gaurav Meena , Sunil Kumar , K Lokesh

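Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-08-25

People have recently begun to communicate their thoughts and viewpoints through user-generated multimedia material on social networking sites. This material can be images, text, video, or audio, and the pattern has become more frequent in recent years. Twitter is one of the most widely used social media sites, and it is also one of the best places to get a sense of how people feel about events linked to the Monkeypox disease, because tweets are short and frequently updated. The basic goal of this study is to gain a deeper understanding of the range of reactions people have to the presence of this disease. The study focuses on finding out what individuals think about monkeypox and presents a hybrid technique based on CNN and LSTM. We consider all three possible polarities of a user's tweet: positive, negative, and neutral. An architecture built on CNN and LSTM is used to determine how accurate the prediction models are. The recommended model reached an accuracy of 94% on the Monkeypox tweet dataset. Additional performance metrics, including accuracy, recall, and F1-score, were also used to evaluate the models in a time- and resource-efficient way. The findings are then compared with more traditional machine learning approaches. The findings of this research contribute to increased awareness of monkeypox infection in the general population.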


An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms

Shini Renjith , Annie Abraham , Surya B. Jyothi , Lekshmi Chandran , Jincy Thomson

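Categories: Natural Language Processing | Machine Learning

2021-12-17

Detecting suicidal ideation from social media is an evolving research area with major challenges. Many people with suicidal tendencies share their thoughts and opinions on social media platforms. Many studies have observed that publicly available social media posts contain valuable signals for effectively detecting individuals with suicidal thoughts. The hardest part of suicide prevention is detecting and understanding the complex risk factors and warning signs that may lead to suicide. This can be achieved by automatically identifying sudden changes in user behavior. Natural language processing techniques can be used to collect behavioral and textual features from social media interactions, and these features can be passed to a specially designed framework that detects anomalies in human interactions that indicate suicidal intent. Deep learning and/or machine learning based classification approaches can detect suicidal ideation quickly. For this purpose, a combination of LSTM and CNN models can be used to detect such emotions in users' posts. Accuracy can be improved by, for example, training on more data or using an attention model to make existing models more effective. This paper proposes a combined LSTM-Attention-CNN model that analyzes social media submissions to detect any underlying suicidal intent. In evaluation, the proposed model achieved an accuracy of 90.3% and an F1-score of 92.6%, both higher than the baseline models.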


Mental Illness Classification on Social Media Texts using Deep Learning and Transfer Learning

Iqra Ameer , Muhammad Arif , Grigori Sidorov , Helena Gòmez-Adorno , Alexander Gelbukh

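Categories: Machine Learning | Natural Language Processing

2022-07-03

Given the current social distancing restrictions around the world, most people now use social media as their main medium of communication. As a result, millions of people with mental illnesses have been isolated and cannot get help in person. They increasingly rely on online venues to express themselves and to seek advice on dealing with mental disorders. According to the World Health Organization (WHO), approximately 450 million people are affected. Mental illnesses such as depression and anxiety are extremely common and also affect an individual's physical health. Artificial intelligence (AI) methods have recently been proposed to assist mental health providers, including psychiatrists and psychologists, based on patients' authentic information (e.g., medical records, behavioral data, social media use). AI innovations have shown leading performance in numerous real-world applications, from computer vision to healthcare. This study analyzes unstructured user data from the Reddit platform and classifies five common mental illnesses: depression, anxiety, bipolar disorder, ADHD, and PTSD. We trained traditional machine learning, deep learning, and transfer learning multi-class models to detect individuals' mental disorders. This work will benefit the public health system by automating the detection process and informing the appropriate authorities about people who need urgent assistance.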


Troll Tweet Detection Using Contextualized Word Representations

Seyhmus Yilmaz , Sultan Zavrak

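Categories: Natural Language Processing | Artificial Intelligence

2022-07-17

In recent years, many troll accounts have emerged to manipulate opinion on social media. Detecting and eliminating trolling is a critical problem for social networking platforms, because businesses, abusers, and nation-state-sponsored troll farms use fake and automated accounts. NLP techniques are used to extract data from social networking text such as Twitter tweets. In many text processing applications, word embedding representation methods such as BERT have performed better than earlier NLP techniques, offering new breakthroughs for accurately understanding and classifying social networking information across a range of tasks. This paper implements and compares nine deep learning based architectures for troll tweet detection, three for each of the BERT, ELMo, and GloVe word embedding models. Precision, recall, F1 score, AUC, and classification accuracy are used to evaluate each architecture. In the experimental results, most architectures using BERT models improved troll tweet detection. A customized ELMo-based architecture with a GRU classifier achieved the highest AUC for detecting troll messages. The proposed architectures could be used by various social-based systems to detect troll messages in the future.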


Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

Hind Saleh , Areej Alhothali , Kawthar Moria

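Categories: Natural Language Processing

2021-11-02

The enormous amount of data generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech reduces its negative impact and influence on others. Much effort in the natural language processing (NLP) domain has aimed to detect hate speech in general or to detect specific kinds of hate speech, such as those targeting religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional misspellings, and coded words in their communication to evade detection, which makes hate speech detection more challenging. Word representation will therefore play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of using domain-specific word embeddings in a bidirectional LSTM based deep model to automatically detect and classify hate speech. We also investigate applying a transfer learning language model (BERT) to the hate speech problem as a binary classification task. The experiments show that domain-specific word embeddings with the bidirectional LSTM based deep model achieved a 93% F1-score, while BERT achieved up to a 96% F1-score on a combined balanced dataset built from available hate speech datasets.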


Turkish Sentiment Analysis Using Machine Learning Methods: Application on Online Food Order Site Reviews

Özlem Aktaş , Berkay Coşkuner , İlker Soner

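Categories: Natural Language Processing | Machine Learning

2022-01-11

Satisfaction measurement, which now appears in every sector, is a very important factor for many companies. This study aims to reach the highest accuracy with various machine learning algorithms using data from Yemek Sepeti and variations of that data. The accuracy of each algorithm was calculated in combination with the various natural language processing methods used, and the parameters of the algorithms were tuned while calculating these accuracy values. The models trained in this study can be used on unlabeled data and can give companies an indication when measuring customer satisfaction. The three different natural language processing methods applied were observed to increase accuracy by approximately 5% in most of the developed models.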


Utilizing distilBert transformer model for sentiment classification of COVID-19's Persian open-text responses

Fatemeh Sadat Masoumi , Mohammad Bahrani

Categories: Natural Language Processing

2022-12-16

The COVID-19 pandemic has caused drastic alterations in all aspects of human life. The government's laws in this regard affected the lifestyle of all people. For this reason, studying the sentiment of individuals is essential for understanding the future impact of coming pandemics. To contribute to this aim, we propose an NLP (Natural Language Processing) model that analyzes open-text answers to a survey in Persian and detects positive and negative feelings of people in Iran. In this study, a distilBert transformer model was applied to this task. We deployed three approaches for comparison, and our best model achieved an accuracy of 0.824, precision of 0.824, recall of 0.798, and F1 score of 0.804.


A Comparative Study on COVID-19 Fake News Detection Using Different Transformer Based Models

Sajib Kumar Saha Joy , Dibyo Fabian Dofadar , Riyo Hayat Khan , Md. Sabbir Ahmed , Rafeed Rahman

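Categories: Natural Language Processing | Machine Learning

2022-08-02

The rapid growth of social networks and the convenience of internet access have accelerated the spread of fake news and rumors on social media sites. During the COVID-19 epidemic, this misleading information has aggravated the situation by putting people's mental and physical well-being in danger. To limit the spread of such inaccuracies, identifying fake news on online platforms could be the first step. In this research, the authors implemented five transformer-based models (BERT, BERT without LSTM, ALBERT, RoBERTa, and a hybrid of BERT and ALBERT) to detect fraudulent COVID-19 news on the internet. The COVID 19 Fake News Dataset was used to train and test the models. Among all these models, RoBERTa performed best, obtaining an F1 score of 0.98 on both the real and fake classes.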


A Comparison of Automatic Labelling Approaches for Sentiment Analysis

Sumana Biswas , Karen Young , Josephine Griffith

Categories: Natural Language Processing | Machine Learning

2022-11-05

Labelling a large quantity of social media data for the task of supervised machine learning is not only time-consuming but also difficult and expensive. On the other hand, the accuracy of supervised machine learning models is strongly related to the quality of the labelled data on which they train, and automatic sentiment labelling techniques could reduce the time and cost of human labelling. We have compared three automatic sentiment labelling techniques: TextBlob, Vader, and Afinn to assign sentiments to tweets without any human assistance. We compare three scenarios: one uses training and testing datasets with existing ground truth labels; the second experiment uses automatic labels as training and testing datasets; and the third experiment uses three automatic labelling techniques to label the training dataset and uses the ground truth labels for testing. The experiments were evaluated on two Twitter datasets: SemEval-2013 (DS-1) and SemEval-2016 (DS-2). Results show that the Afinn labelling technique obtains the highest accuracy of 80.17% (DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model. These findings imply that automatic text labelling could provide significant benefits, and suggest a feasible alternative to the time and cost of human labelling efforts.
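As a rough illustration of what lexicon-based automatic labelling involves, the sketch below scores a tweet by summing word polarities and maps the sign of the total to a label. The tiny `TOY_LEXICON`, the function names, and the zero threshold are all hypothetical stand-ins chosen for this example; they are not the actual TextBlob, Vader, or Afinn lexicons or APIs, and the paper's own pipeline may differ.

```python
import re

# Hypothetical miniature polarity lexicon (assumption for illustration only);
# real lexicons such as Afinn contain thousands of scored entries.
TOY_LEXICON = {
    "good": 3, "great": 3, "love": 3, "happy": 3,
    "bad": -3, "terrible": -3, "hate": -3, "sad": -2,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every known word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def auto_label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a sentiment label without human input."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(auto_labels, gold_labels):
    """Fraction of automatic labels that match the ground-truth labels."""
    if len(auto_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        return 0.0
    matches = sum(a == g for a, g in zip(auto_labels, gold_labels))
    return matches / len(gold_labels)
```

Comparing `auto_label` output against ground-truth labels with `agreement` mirrors, in a very simplified form, the idea of checking automatic labels against human annotations.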


Explainable and High-Performance Hate and Offensive Speech Detection

Marzieh Babaeianjelodar , Gurram Poorna Prudhvi , Stephen Lorenz , Keyu Chen , Sumona Mondal , Soumyabrata Dey , Navin Kumar

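Categories: Natural Language Processing | Machine Learning

2022-06-26

The spread of information through social media platforms can create environments that may be hostile to vulnerable communities and can silence certain groups in society. To mitigate such situations, several models have been developed to detect hate and offensive speech. Because detecting hate and offensive speech on social media platforms can wrongly exclude individuals from those platforms, which reduces trust, explainable and interpretable models are needed. We therefore build an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. On the unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection, with an F1 score of 0.75 compared with scores between 0.37 and 0.38 for the other models. When we down-sampled the data to three separate classes of approximately 5,000 tweets each, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with hate speech detection F1 scores of 0.79 versus 0.69, 0.77, and 0.66, respectively. In the down-sampled version, XGBoost is also reported to perform better than LSTM, AutoGluon, and ULMFiT on offensive speech detection, with F1 scores of 0.83 versus 0.88, 0.82, and 0.79, respectively. We apply SHapley Additive exPlanations (SHAP) to the outputs of our XGBoost models to make them explainable and interpretable, in contrast to LSTM, AutoGluon, and ULMFiT, which are black-box models.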
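The down-sampling step mentioned above can be pictured with a minimal sketch like the following, which caps every class at the same number of examples. The function name, the `(text, label)` input shape, the fixed seed, and the label strings used in the usage note are assumptions for illustration; the abstract does not describe how the authors actually sampled their data.

```python
import random
from collections import defaultdict


def downsample(examples, per_class, seed=0):
    """Return at most `per_class` randomly chosen examples for each label.

    `examples` is a list of (text, label) pairs. A fixed seed keeps the
    selection reproducible (an assumption; any sampling policy could be used).
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        # Classes smaller than the cap are kept whole.
        keep = min(per_class, len(items))
        balanced.extend(rng.sample(items, keep))

    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced
```

For example, calling `downsample(tweets, per_class=5000)` on a list of labelled tweets would yield a roughly balanced set comparable in spirit to the three classes of about 5,000 tweets described in the abstract.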


Computational Sarcasm Analysis on Social Media: A Systematic Review

Faria Binte Kader , Nafisa Hossain Nujat , Tasmia Binte Sogir , Mohsinul Kabir , Hasan Mahmud , Kamrul Hasan

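Categories: Natural Language Processing

2022-09-13

Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because sarcasm in textual data is obscure by nature, detecting it is difficult and of great interest to the sentiment analysis research community. Although research on sarcasm detection spans more than a decade, some significant advances have been made recently, including the use of unsupervised pre-trained transformers in multimodal settings and the integration of context to identify sarcasm. In this study, we aim to give a brief overview of recent advances and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and sarcasm-related tasks that go beyond detection. Our study provides summary tables of sarcasm datasets, sarcastic features and their extraction methods, and a performance analysis of various approaches, which can help researchers in related fields understand current state-of-the-art practice in sarcasm detection.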


Twitter Data Analysis: Izmir Earthquake Case

Özgür Agrali , Hakan Sökün , Enis Karaarslan

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-02

Türkiye is located on a fault line, and earthquakes of both large and small scale occur often. Effective solutions are needed for gathering current information during disasters. Social media can give insight into public opinion, and this insight can be used in public relations and disaster management. In this study, Twitter posts on the Izmir Earthquake of October 2020 are analyzed. We ask whether this analysis can be used to make timely social inferences. Data mining and natural language processing (NLP) methods are used for the analysis: NLP is used for sentiment analysis and topic modelling. The latent Dirichlet allocation (LDA) algorithm is used for topic modelling, and the Bidirectional Encoder Representations from Transformers (BERT) model, built on the Transformer architecture, is used for sentiment analysis. The results show that users shared their goodwill wishes and aimed to contribute to the aid activities initiated after the earthquake. The users wanted to make their voices heard by competent institutions and organizations. The proposed methods work effectively. Future studies are also discussed.


Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets

Douglas Raevan Faisal , Rahmad Mahendra

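Categories: Natural Language Processing

2022-06-30

COVID-19 has had a significant global impact since the beginning of 2020. It has caused a great deal of confusion in society, especially because of misinformation spreading through social media. Although several studies have addressed the detection of misinformation in social media data, most have focused on English datasets, and research on COVID-19 misinformation detection in Indonesia is still scarce. In this research we therefore collect and annotate a dataset in Indonesian and build prediction models for detecting COVID-19 misinformation that take the relevance of the tweet into account. The dataset was constructed by a team of annotators who labeled the tweets for relevance and misinformation. We propose a two-stage classifier model using the IndoBERT pre-trained language model for the tweet misinformation detection task. We also experiment with several other baseline models for text classification. The experimental results show that the combination of a BERT sequence classifier for relevance prediction and a Bi-LSTM for misinformation detection outperformed the other machine learning models, with an accuracy of 87.02%. Overall, using BERT contributes to higher performance in most of the prediction models. We release a high-quality COVID-19 misinformation tweet corpus, as indicated by high inter-annotator agreement.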
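The two-stage idea described above (first decide whether a tweet is relevant, then check only the relevant ones for misinformation) can be sketched as follows. The two classifiers are passed in as plain functions; the keyword-based `toy_relevance` and `toy_misinfo` stand-ins and the output label strings are hypothetical and merely take the place of the BERT and Bi-LSTM models named in the abstract.

```python
from typing import Callable


def two_stage_predict(
    tweet: str,
    is_relevant: Callable[[str], bool],
    is_misinformation: Callable[[str], bool],
) -> str:
    """Run stage 1 (relevance) and only then stage 2 (misinformation)."""
    if not is_relevant(tweet):
        # Irrelevant tweets never reach the second classifier.
        return "irrelevant"
    if is_misinformation(tweet):
        return "misinformation"
    return "not_misinformation"


# Hypothetical keyword rules standing in for trained models.
def toy_relevance(tweet: str) -> bool:
    return "covid" in tweet.lower()


def toy_misinfo(tweet: str) -> bool:
    return "miracle cure" in tweet.lower()
```

One practical consequence of this design is that the second-stage model only has to be trained and evaluated on tweets that pass the relevance filter.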


ReDDIT: Regret Detection and Domain Identification from Text

Fazlourrahman Balouchzahi , Sabur Butt , Grigori Sidorov , Alexander Gelbukh

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-14

In this paper, we present a study of regret and its expression on social media platforms. Specifically, we present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret. We then use this dataset to investigate the language used to express regret on Reddit and to identify the domains of text that are most commonly associated with regret. Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships. We also found that deep learning models using GloVe embedding outperformed other models in all experiments, indicating the effectiveness of GloVe for representing the meaning and context of words in the domain of regret. Overall, our study provides valuable insights into the nature and prevalence of regret on social media, as well as the potential of deep learning and word embeddings for analyzing and understanding emotional language in online text. These findings have implications for the development of natural language processing algorithms and the design of social media platforms that support emotional expression and communication.


Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments

Bharathi Raja Chakravarthi , Ruba Priyadharshini , Rahul Ponnusamy , Prasanna Kumar Kumaresan , Kayalvizhi Sampath , Durairaj Thenmozhi , Sathiyaraj Thangasamy , Rajendran Nallathambi , John Phillip McCrae

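Categories: Natural Language Processing

2021-09-01

The growth of abusive content on social media platforms has an increasingly negative impact on online users. Fear, dislike, discomfort, or mistrust of lesbian, gay, transgender, or bisexual people is defined as homophobia/transphobia. Homophobic/transphobic speech is a type of offensive language that can be summarized as hate speech directed at LGBT+ people, and it has attracted growing attention in recent years. Online homophobia/transphobia is a serious societal problem that can make online platforms toxic and unwelcoming for LGBT+ people while also working against equality, diversity, and inclusion. We provide a new taxonomy for online homophobia and transphobia, together with an expert-labelled dataset that allows homophobic/transphobic content to be identified automatically. Because this is a sensitive issue, we trained the annotators and gave them comprehensive annotation rules; we had previously found that untrained crowdsourced annotators struggle to identify homophobia because of cultural and other biases. The dataset contains 15,141 annotated multilingual comments. This paper describes the process of building the dataset, a qualitative analysis of the data, and inter-annotator agreement. In addition, we create baseline models for the dataset. To the best of our knowledge, our dataset is the first of its kind. Warning: this paper contains explicit statements of homophobia, transphobia, and stereotypes that may be distressing to some readers.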


Improving Sentiment Analysis By Emotion Lexicon Approach on Vietnamese Texts

An Long Doan , Son T. Luu

Categories: Natural Language Processing

2022-10-05

The sentiment analysis task has various applications in practice. In the sentiment analysis task, words and phrases that represent positive and negative emotions are important. Finding out the words that represent the emotion from the text can improve the performance of the classification models for the sentiment analysis task. In this paper, we propose a methodology that combines the emotion lexicon with the classification model to enhance the accuracy of the models. Our experimental results show that the emotion lexicon combined with the classification model improves the performance of models.


PolyHope: Two-Level Hope Speech Detection from Tweets

Fazlourrahman Balouchzahi , Grigori Sidorov , Alexander Gelbukh

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-10-25

Hope is characterized as openness of spirit toward the future, a desire, expectation, and wish for something to happen or to be true that remarkably affects human's state of mind, emotions, behaviors, and decisions. Hope is usually associated with concepts of desired expectations and possibility/probability concerning the future. Despite its importance, hope has rarely been studied as a social media analysis task. This paper presents a hope speech dataset that classifies each tweet first into "Hope" and "Not Hope", then into three fine-grained hope categories: "Generalized Hope", "Realistic Hope", and "Unrealistic Hope" (along with "Not Hope"). English tweets in the first half of 2022 were collected to build this dataset. Furthermore, we describe our annotation process and guidelines in detail and discuss the challenges of classifying hope and the limitations of the existing hope speech detection corpora. In addition, we reported several baselines based on different learning approaches, such as traditional machine learning, deep learning, and transformers, to benchmark our dataset. We evaluated our baselines using weighted-averaged and macro-averaged F1-scores. Observations show that a strict process for annotator selection and detailed annotation guidelines enhanced the dataset's quality. This strict annotation process resulted in promising performance for simple machine learning classifiers with only bi-grams; however, binary and multiclass hope speech detection results reveal that contextual embedding models have higher performance in this dataset.
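For readers unfamiliar with the two averaging schemes named above, the following sketch computes per-class F1 from true and predicted labels and then averages it two ways: macro-averaging gives every class equal weight, while weighted-averaging weights each class by its number of true examples. The function names are illustrative and this is not the authors' evaluation code; libraries such as scikit-learn provide equivalent options.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)
```

On imbalanced data the two numbers can differ noticeably: a classifier that does well on the majority class but poorly on a minority class will typically show a higher weighted-averaged than macro-averaged F1.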


Comparative Study of Sentiment Analysis for Multi-Sourced Social Media Platforms

Keshav Kapur , Rajitha Harikrishnan

Categories: Natural Language Processing | Artificial Intelligence

2022-12-09

There is a vast amount of data generated every second due to the rapidly growing technology in the current world. This area of research attempts to determine the feelings or opinions of people on social media posts. The dataset we used was a multi-source dataset from the comment section of various social networking sites like Twitter, Reddit, etc. Natural Language Processing Techniques were employed to perform sentiment analysis on the obtained dataset. In this paper, we provide a comparative analysis using techniques of lexicon-based, machine learning and deep learning approaches. The Machine Learning algorithm used in this work is Naive Bayes, the Lexicon-based approach used in this work is TextBlob, and the deep-learning algorithm used in this work is LSTM.


CovidMis20: COVID-19 Misinformation Detection System on Twitter Tweets using Deep Learning Models

Aos Mulahuwaish , Manish Osti , Kevin Gyorick , Majdi Maabreh , Ajay Gupta , Basheer Qolomany

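Categories: Machine Learning | Natural Language Processing

2022-09-13

Online news and information sources are convenient and accessible ways to learn about current issues. For example, more than 300 million people worldwide engage with posts on Twitter, which creates the possibility of spreading misleading information. In many cases, violent crimes have been committed because of fake news. This research presents the CovidMis20 dataset (COVID-19 Misinformation 2020 dataset), which consists of 1,375,592 tweets collected from February to July 2020. CovidMis20 can be updated automatically to fetch the latest news and is publicly available at: https://github.com/everythingguy/CovidMis20. The research was conducted using Bi-LSTM deep learning and an ensemble CNN+Bi-GRU for fake news detection. The results show testing accuracies of 92.23% and 90.56%, respectively, with the ensemble CNN+Bi-GRU model consistently providing higher accuracy than the Bi-LSTM model.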
