建模政治参与者的思想视角是许多下游任务中的应用中的计算政治科学的重要任务。现有方法通常限于文本数据和投票记录,而他们忽视了丰富的社会背景和对整体评价的宝贵专家知识。在本文中,我们提出了一个代表性学习框架,政治行为者共同利用了社会背景和专家知识。具体而言,我们检索和提取关于立法者的事实陈述,以利用社会背景信息。然后,我们构建异构信息网络以合并社会背景并使用关系图形神经网络来学习立法器表示。最后,我们用三个目标训练我们的模型,以与专家知识,模型意识形态阶段一致性,模拟回声室现象的表现学习。广泛的实验表明,我们的学到的陈述在三个下游任务中成功地推动了最先进的。进一步分析证明了学到的立法者代表与各种社会政治因素之间的相关性,以及建立了建模政治行动者的社会背景和专业知识的必要性。
translated by 谷歌翻译
识别新闻媒体的政治观点已成为政治评论的快速增长和日益极化的政治意识形态的重要任务。以前的方法专注于文本内容,留出富裕的社会和政治背景,这在论证挖掘过程中至关重要。为了解决这一限制,我们提出了一种政治透视检测方法,包括外部域知识。具体而言,我们构建一个政治知识图形,以作为特定于域的外部知识。然后我们利用异质信息网络来代表新闻文件,共同模仿新闻文本和外部知识。最后,我们采用关系图神经网络,并作为图形级分类进行政治视角检测。广泛的实验表明,我们的方法始终如一地实现了两个现实世界的透视检测基准的最佳性能。消融研究进一步承担了外部知识的必要性以及我们基于图形的方法的有效性。
translated by 谷歌翻译
Twitter机器人检测已成为打击错误信息,促进社交媒体节制并保持在线话语的完整性的越来越重要的任务。最先进的机器人检测方法通常利用Twitter网络的图形结构,在面对传统方法无法检测到的新型Twitter机器人时,它们表现出令人鼓舞的性能。但是,现有的Twitter机器人检测数据集很少是基于图形的,即使这些基于图形的数据集也遭受有限的数据集量表,不完整的图形结构以及低注释质量。实际上,缺乏解决这些问题的大规模基于图的Twitter机器人检测基准,严重阻碍了基于图形的机器人检测方法的开发和评估。在本文中,我们提出了Twibot-22,这是一个综合基于图的Twitter机器人检测基准,它显示了迄今为止最大的数据集,在Twitter网络上提供了多元化的实体和关系,并且与现有数据集相比具有更好的注释质量。此外,我们重新实施35代表性的Twitter机器人检测基线,并在包括Twibot-22在内的9个数据集上进行评估,以促进对模型性能和对研究进度的整体了解的公平比较。为了促进进一步的研究,我们将所有实施的代码和数据集巩固到Twibot-22评估框架中,研究人员可以在其中始终如一地评估新的模型和数据集。 Twibot-22 Twitter机器人检测基准和评估框架可在https://twibot22.github.io/上公开获得。
translated by 谷歌翻译
Nowadays, fake news easily propagates through online social networks and becomes a grand threat to individuals and society. Assessing the authenticity of news is challenging due to its elaborately fabricated contents, making it difficult to obtain large-scale annotations for fake news data. Due to such data scarcity issues, detecting fake news tends to fail and overfit in the supervised setting. Recently, graph neural networks (GNNs) have been adopted to leverage the richer relational information among both labeled and unlabeled instances. Despite their promising results, they are inherently focused on pairwise relations between news, which can limit the expressive power for capturing fake news that spreads in a group-level. For example, detecting fake news can be more effective when we better understand relations between news pieces shared among susceptible users. To address those issues, we propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism. Experiments based on two benchmark datasets show that our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
translated by 谷歌翻译
图形神经网络(GNN)在解决图形结构数据(即网络)方面的各种分析任务方面已广受欢迎。典型的gnns及其变体遵循一种消息的方式,该方式通过网络拓扑沿网络拓扑的特征传播过程获得网络表示,然而,它们忽略了许多现实世界网络中存在的丰富文本语义(例如,局部单词序列)。现有的文本丰富网络方法通过主要利用内部信息(例如主题或短语/单词)来整合文本语义,这些信息通常无法全面地挖掘文本语义,从而限制了网络结构和文本语义之间的相互指导。为了解决这些问题,我们提出了一个具有外部知识(TEKO)的新型文本富裕的图形神经网络,以充分利用文本丰富的网络中的结构和文本信息。具体而言,我们首先提出一个灵活的异质语义网络,该网络结合了文档和实体之间的高质量实体和互动。然后,我们介绍两种类型的外部知识,即结构化的三胞胎和非结构化实体描述,以更深入地了解文本语义。我们进一步为构建的异质语义网络设计了互惠卷积机制,使网络结构和文本语义能够相互协作并学习高级网络表示。在四个公共文本丰富的网络以及一个大规模的电子商务搜索数据集上进行了广泛的实验结果,这说明了Teko优于最先进的基线。
translated by 谷歌翻译
本文介绍了SocialVEC,这是一种从社交网络引出社会世界知识的一般框架,并将此框架应用于Twitter。 SocialVEC了解流行账户的低维嵌入,这代表了一般兴趣的实体,基于其账户内的共同发生模式,然后是个别用户,从而在社会人口统计术语中建模实体相似性。类似于Word Embeddings,这促进了涉及文本处理的任务,我们预计社会实体嵌入将使社会味道的任务受益。我们从推特网络的样本中学习了大约200,000个受欢迎的帐户的社交嵌入,其中包括超过130万用户和他们遵循的帐户,并在两个不同的任务中评估结果嵌入。第一个任务涉及从社交媒体简介中自动推动用户的个人特征。在另一个研究中,我们利用SocialVEC嵌入来衡量Twitter中新闻来源的政治偏见。在这两种情况下,与现有实体嵌入方案相比,我们证明SocialVEC嵌入是有利的。我们将公开为社会顾客实体嵌入而挪用,以支持在Twitter中反映的社会世界知识进一步探索。
translated by 谷歌翻译
异质图卷积网络在解决异质网络数据的各种网络分析任务方面已广受欢迎,从链接预测到节点分类。但是,大多数现有作品都忽略了多型节点之间的多重网络的关系异质性,而在元路径中,元素嵌入中关系的重要性不同,这几乎无法捕获不同关系跨不同关系的异质结构信号。为了应对这一挑战,这项工作提出了用于异质网络嵌入的多重异质图卷积网络(MHGCN)。我们的MHGCN可以通过多层卷积聚合自动学习多重异质网络中不同长度的有用的异质元路径相互作用。此外,我们有效地将多相关结构信号和属性语义集成到学习的节点嵌入中,并具有无监督和精选的学习范式。在具有各种网络分析任务的五个现实世界数据集上进行的广泛实验表明,根据所有评估指标,MHGCN与最先进的嵌入基线的优势。
translated by 谷歌翻译
Graph neural networks (GNNs) have been utilized for various natural language processing (NLP) tasks lately. The ability to encode corpus-wide features in graph representation made GNN models popular in various tasks such as document classification. One major shortcoming of such models is that they mainly work on homogeneous graphs, while representing text datasets as graphs requires several node types which leads to a heterogeneous schema. In this paper, we propose a transductive hybrid approach composed of an unsupervised node representation learning model followed by a node classification/edge prediction model. The proposed model is capable of processing heterogeneous graphs to produce unified node embeddings which are then utilized for node classification or link prediction as the downstream task. The proposed model is developed to classify stock market technical analysis reports, which to our knowledge is the first work in this domain. Experiments, which are carried away using a constructed dataset, demonstrate the ability of the model in embedding extraction and the downstream tasks.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., the news layer and the user layer) to extract multiple relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model named Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process conducts on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake in large scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake which outperforms all baselines, and the users' credibility signals learned by interaction relation can notably improve the performance of our model.
translated by 谷歌翻译
注意机制使图形神经网络(GNN)能够学习目标节点与其单跳邻居之间的注意力权重,从而进一步提高性能。但是,大多数现有的GNN都针对均匀图,其中每一层只能汇总单跳邻居的信息。堆叠多层网络引入了相当大的噪音,并且很容易导致过度平滑。我们在这里提出了一种多跃波异质邻域信息融合图表示方法(MHNF)。具体而言,我们提出了一个混合元自动提取模型,以有效提取多ihop混合邻居。然后,我们制定了一个跳级的异质信息聚合模型,该模型在同一混合Metapath中选择性地汇总了不同的跳跃邻域信息。最后,构建了分层语义注意融合模型(HSAF),该模型可以有效地整合不同的互动和不同的路径邻域信息。以这种方式,本文解决了汇总MultiHop邻里信息和学习目标任务的混合元数据的问题。这减轻了手动指定Metapaths的限制。此外,HSAF可以提取Metapaths的内部节点信息,并更好地整合存在不同级别的语义信息。真实数据集的实验结果表明,MHNF在最先进的基准中取得了最佳或竞争性能,仅1/10〜1/100参数和计算预算。我们的代码可在https://github.com/phd-lanyu/mhnf上公开获取。
translated by 谷歌翻译
每时每刻都生产出许多不同质量的物品,因此将这些数据筛选为质量文章并将其投入到社交媒体上是一项非常紧迫的任务。值得注意的是,高质量的文章具有许多特征,例如相关性,文本质量,直接,多面,背景,新颖性和情感。因此,纯粹使用文章的内容来识别其质量是不够的。因此,我们计划使用外部知识互动来完善性能,并根据百度百科全书提出知识图增强文章质量标识数据集(KGEA)。我们通过7个维度量化了这些文章,并使用文章和百度百科全书之间实体的同时出现,以构建每篇文章的知识图。我们还比较了一些文本分类基线,发现外部知识可以将文章引导到与图神经网络更具竞争力的分类。
translated by 谷歌翻译
由于学术和工业领域的异质图无处不在,研究人员最近提出了许多异质图神经网络(HGNN)。在本文中,我们不再采用更强大的HGNN模型,而是有兴趣设计一个多功能的插件模块,该模块解释了从预先训练的HGNN中提取的关系知识。据我们所知,我们是第一个在异质图上提出高阶(雇用)知识蒸馏框架的人,无论HGNN的模型体系结构如何,它都可以显着提高预测性能。具体而言,我们的雇用框架最初执行一阶节点级知识蒸馏,该蒸馏曲线及其预测逻辑编码了老师HGNN的语义。同时,二阶关系级知识蒸馏模仿了教师HGNN生成的不同类型的节点嵌入之间的关系相关性。在各种流行的HGNN模型和三个现实世界的异质图上进行了广泛的实验表明,我们的方法获得了一致且相当大的性能增强,证明了其有效性和泛化能力。
translated by 谷歌翻译
We study the problem of profiling news media on the Web with respect to their factuality of reporting and bias. This is an important but under-studied problem related to disinformation and "fake news" detection, but it addresses the issue at a coarser granularity compared to looking at an individual article or an individual claim. This is useful as it allows to profile entire media outlets in advance. Unlike previous work, which has focused primarily on text (e.g.,~on the text of the articles published by the target website, or on the textual description in their social media profiles or in Wikipedia), here our main focus is on modeling the similarity between media outlets based on the overlap of their audience. This is motivated by homophily considerations, i.e.,~the tendency of people to have connections to people with similar interests, which we extend to media, hypothesizing that similar types of media would be read by similar kinds of users. In particular, we propose GREENER (GRaph nEural nEtwork for News mEdia pRofiling), a model that builds a graph of inter-media connections based on their audience overlap, and then uses graph neural networks to represent each medium. We find that such representations are quite useful for predicting the factuality and the bias of news media outlets, yielding improvements over state-of-the-art results reported on two datasets. When augmented with conventionally used representations obtained from news articles, Twitter, YouTube, Facebook, and Wikipedia, prediction accuracy is found to improve by 2.5-27 macro-F1 points for the two tasks.
translated by 谷歌翻译
假新闻,虚假或误导性信息作为新闻,对社会的许多方面产生了重大影响,例如在政治或医疗域名。由于假新闻的欺骗性,仅将自然语言处理(NLP)技术应用于新闻内容不足。多级社会上下文信息(新闻出版商和社交媒体的参与者)和用户参与的时间信息是假新闻检测中的重要信息。然而,正确使用此信息,介绍了三个慢性困难:1)多级社会上下文信息很难在没有信息丢失的情况下使用,2)难以使用时间信息以及多级社会上下文信息,3 )具有多级社会背景和时间信息的新闻表示难以以端到端的方式学习。为了克服所有三个困难,我们提出了一种新颖的假新闻检测框架,杂扫描。我们使用元路径在不损失的情况下提取有意义的多级社会上下文信息。 COMA-PATO,建议连接两个节点类型的复合关系,以捕获异构图中的语义。然后,我们提出了元路径实例编码和聚合方法,以捕获用户参与的时间信息,并生成新闻代表端到端。根据我们的实验,杂扫不断的性能改善了最先进的假新闻检测方法。
translated by 谷歌翻译
美国的意识形态分裂在日常交流中变得越来越突出。因此,关于政治两极分化的许多研究,包括最近采取计算观点的许多努力。通过检测文本语料库中的政治偏见,可以尝试描述和辨别该文本的两极分性。从直觉上讲,命名的实体(即,用作名词的名词和短语)和文本中的标签经常带有有关政治观点的信息。例如,使用“支持选择”一词的人可能是自由的,而使用“亲生生命”一词的人可能是保守的。在本文中,我们试图揭示社交媒体文本数据中的政治极性,并通过将极性得分分配给实体和标签来量化这些极性。尽管这个想法很简单,但很难以可信赖的定量方式进行这种推论。关键挑战包括少数已知标签,连续的政治观点,以及在嵌入单词媒介中的极性得分和极性中性语义含义的保存。为了克服这些挑战,我们提出了极性感知的嵌入多任务学习(PEM)模型。该模型包括(1)自制的上下文保护任务,(2)基于注意力的推文级别的极性推导任务,以及(3)对抗性学习任务,可促进嵌入式的极性维度及其语义之间的独立性方面。我们的实验结果表明,我们的PEM模型可以成功学习极性感知的嵌入。我们检查了各种应用,从而证明了PEM模型的有效性。我们还讨论了我们的工作的重要局限性,并在将PEM模型应用于现实世界情景时的压力谨慎。
translated by 谷歌翻译
Fake news detection has become a research area that goes way beyond a purely academic interest as it has direct implications on our society as a whole. Recent advances have primarily focused on textbased approaches. However, it has become clear that to be effective one needs to incorporate additional, contextual information such as spreading behaviour of news articles and user interaction patterns on social media. We propose to construct heterogeneous social context graphs around news articles and reformulate the problem as a graph classification task. Exploring the incorporation of different types of information (to get an idea as to what level of social context is most effective) and using different graph neural network architectures indicates that this approach is highly effective with robust results on a common benchmark dataset.
translated by 谷歌翻译
用于异质图嵌入的图形神经网络是通过探索异质图的异质性和语义来将节点投射到低维空间中。但是,一方面,大多数现有的异质图嵌入方法要么不足以对特定语义下的局部结构进行建模,要么在汇总信息时忽略异质性。另一方面,来自多种语义的表示形式未全面整合以获得多功能节点嵌入。为了解决该问题,我们通过引入多视图表示学习的概念,提出了一个具有多视图表示学习(名为MV-HETGNN)的异质图神经网络(称为MV-HETGNN)。所提出的模型由节点特征转换,特定于视图的自我图编码和自动多视图融合,以彻底学习复杂的结构和语义信息,以生成全面的节点表示。在三个现实世界的异质图数据集上进行的广泛实验表明,所提出的MV-HETGNN模型始终优于各种下游任务中所有最新的GNN基准,例如节点分类,节点群集和链接预测。
translated by 谷歌翻译
Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which excel in mimicking human writing style recently. Latest proposed detectors usually take coarse text sequence as input and output some good results by fine-tune pretrained models with standard cross-entropy loss. However, these methods fail to consider the linguistic aspect of text (e.g., coherence) and sentence-level structures. Moreover, they lack the ability to handle the low-resource problem which could often happen in practice considering the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect the possible MGT under low-resource scenario. Inspired by the distinctiveness and permanence properties of linguistic feature, we represent text as a coherence graph to capture its entity consistency, which is further encoded by the pretrained model and graph neural network. To tackle the challenges of data limitations, we employ a contrastive learning framework and propose an improved contrastive loss for making full use of hard negative samples in training stage. The experiment results on two public datasets prove our approach outperforms the state-of-art methods significantly.
translated by 谷歌翻译
Twitter机器人检测是一项重要且有意义的任务。现有的基于文本的方法可以深入分析用户推文内容,从而实现高性能。但是,新颖的Twitter机器人通过窃取真正的用户的推文并用良性推文稀释恶意内容来逃避这些检测。这些新颖的机器人被认为以语义不一致的特征。此外,最近出现了利用Twitter图结构的方法,显示出巨大的竞争力。但是,几乎没有一种方法使文本和图形模式深入融合并进行了交互,以利用优势并了解两种方式的相对重要性。在本文中,我们提出了一个名为BIC的新型模型,该模型使文本和图形模式深入互动并检测到推文语义不一致。具体而言,BIC包含一个文本传播模块,一个图形传播模块,可分别在文本和图形结构上进行机器人检测,以及可证明有效的文本互动模块,以使两者相互作用。此外,BIC还包含一个语义一致性检测模块,以从推文中学习语义一致性信息。广泛的实验表明,我们的框架在全面的Twitter机器人基准上优于竞争基准。我们还证明了拟议的相互作用和语义一致性检测的有效性。
translated by 谷歌翻译