Media has a substantial impact on the public perception of events. A one-sided or polarizing perspective on a topic is usually described as media bias. One way bias can be introduced into news articles is through word choice. Biased word choices are not always obvious and often depend heavily on context, so detecting bias is difficult. We propose a Transformer-based deep learning architecture trained via Multi-Task Learning on six bias-related data sets to tackle the media bias detection problem. Our best-performing implementation achieves a macro $F_{1}$ of 0.776, a 3% improvement over our baseline, and outperforms existing methods. Our results indicate that Multi-Task Learning is a promising way to improve existing baseline models in identifying slanted reporting.
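For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro $F_{1}$ score is computed: the F1 score is calculated per class and then averaged without weighting by class frequency. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (1 = biased sentence, 0 = not biased); not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because every class contributes equally, macro averaging does not let a large "not biased" class mask poor performance on the smaller "biased" class.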
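Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering word choice. Automatically identifying bias by word choice is challenging, primarily because of the lack of a gold-standard data set and the high context dependency of bias. This paper presents BABE, a robust and diverse data set created by trained experts for media bias research. We also analyze why expert labeling is essential in this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, with media bias labels at the word and sentence level. Based on our data, we also introduce a way to automatically detect bias-inducing sentences in news articles. Our best-performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1 score of 0.804, outperforming existing methods.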
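Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering word choice. Automatically identifying bias by word choice is challenging, primarily because of the lack of gold-standard data sets and the high context dependency of bias. In this research project, I aim to devise data sets and methods to identify media bias. To achieve this, I plan to use research methods from natural language processing and deep learning while employing models and analysis concepts from psychology and linguistics. The first results indicate the effectiveness of an interdisciplinary research approach. My vision is to devise a system that helps news readers become aware of differences in media coverage caused by bias. So far, my best-performing BERT-based model is pre-trained on a larger corpus consisting of distant labels, indicating that distant supervision has the potential to become a solution for the difficult task of bias detection.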
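Reference texts such as encyclopedias and news articles can exhibit biased language when objective reporting is replaced by subjective writing. Existing methods for detecting bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and limited comparability are substantial drawbacks of the available media bias corpora. To evaluate data collection options, we collect and compare labels obtained from two popular crowdsourcing platforms. Our results demonstrate the lack of data quality in existing crowdsourcing approaches, underlining the need for a trained expert framework to gather a more reliable data set. By creating such a framework and gathering a first data set, we are able to raise Krippendorff's $\alpha$ from 0.144 (crowdsourced labels) to 0.419 (expert labels). We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems. We will continue to extend our data set in the future.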
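To make the agreement figures above more concrete, here is a minimal sketch of Krippendorff's $\alpha$ for nominal labels, computed from the coincidence matrix as $\alpha = 1 - D_o / D_e$. The function name, the input layout (one list of labels per annotated item), and the toy labels are assumptions made for illustration; the paper's own computation may differ in details such as the distance metric.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """units: one list of labels per item, e.g. [["biased", "neutral"], ...]."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        # every ordered pair of labels from different annotators, weighted 1/(m-1)
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - observed / expected


# Toy annotations from two annotators on four sentences; not data from the paper.
toy_units = [["biased", "biased"], ["biased", "biased"],
             ["neutral", "neutral"], ["biased", "neutral"]]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why the move from 0.144 to 0.419 reported above is a substantial change.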
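Media coverage has a substantial effect on the public perception of events. The way media frame events can significantly alter the beliefs and perceptions of society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering word choice or omitting information, the perception of bias also depends largely on a reader's personal background. Media bias is therefore a very complex construct to identify and analyze. Even though media bias has been the subject of many studies, previous assessment strategies are oversimplified and lack overlap and empirical evaluation. This study therefore aims to develop a scale that can be used as a reliable standard for evaluating article bias. To give an example: when measuring bias in a news article, should we ask "How biased is the article?" or should we instead ask "How did the article treat the American president?" We conducted a literature search to find relevant questions about text perception in previous research on the topic. In a multi-iterative process, we summarized and condensed these questions to arrive at a complete and representative set of possible question types about bias. The final set consists of 25 questions with varying answer formats, 17 questions using semantic differentials, and six ratings of feelings. We tested each question on 190 articles with 663 participants overall to determine how well the questions measure an article's perceived bias. Our results show that 21 final items are suitable and reliable for measuring the perception of media bias. We publish the final set of questions at http://bias-question-tree.gipplab.org/.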
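We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality, but not both. Combining the two input types is particularly relevant for investigating readers' perception of a text, which also depends on the reader's background, such as age, gender, and education. Our tool primarily caters to the needs of researchers in library and information science, the social sciences, and the humanities who apply content analysis to investigate, for example, media bias, political communication, or fake news.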
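Slanted news coverage, also called media bias, can heavily influence how news consumers interpret and react to the news. To automatically identify biased language, we present an exploratory approach that compares the contexts of related words. We train two word embedding models, one on texts from left-wing news outlets and the other on texts from right-wing news outlets. Our hypothesis is that a word's representations in the two embedding spaces are more similar for non-biased words than for biased words. The underlying idea is that the context of biased words varies more strongly across news outlets than that of non-biased words, since the perception of a word as biased differs depending on its context. While we do not find statistical significance to accept the hypothesis, the results show the effectiveness of the approach. For example, after a linear mapping of the two word embedding spaces, 31% of the words with the largest distances potentially induce bias. To improve the results, we find that the data set needs to be significantly larger, and we derive further methodology as a direction for future research. To our knowledge, this paper presents the first in-depth look at the context of bias words as measured by word embeddings.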
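The comparison described above can be illustrated with a toy sketch: map one embedding space onto the other, then rank words by the cosine distance between their two representations. This sketch swaps in a 2-D rotation fitted on a few anchor words in place of the general linear mapping the abstract mentions; the vectors, the word list, and all function names are made up for illustration and are not the paper's data or code.

```python
import math


def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def fit_rotation(src, tgt):
    """Least-squares rotation angle that aligns src vectors with tgt vectors."""
    num = sum(s[0] * t[1] - s[1] * t[0] for s, t in zip(src, tgt))
    den = sum(s[0] * t[0] + s[1] * t[1] for s, t in zip(src, tgt))
    return math.atan2(num, den)


def cosine_distance(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical 2-D "embeddings" from a left-leaning corpus.
left = {"tax": (1.0, 0.0), "city": (0.0, 1.0),
        "week": (1.0, 1.0), "regime": (1.0, -1.0)}
# The right-leaning space is the same geometry rotated by 30 degrees,
# except for "regime", whose context is assumed to differ between outlets.
right = {w: rotate(v, math.radians(30)) for w, v in left.items()}
right["regime"] = rotate(left["regime"], math.radians(90))

anchors = ["tax", "city", "week"]  # words assumed to be used neutrally
theta = fit_rotation([left[w] for w in anchors], [right[w] for w in anchors])

distances = {w: cosine_distance(rotate(left[w], theta), right[w]) for w in left}
ranked = sorted(distances, key=distances.get, reverse=True)
print(ranked[0], round(distances[ranked[0]], 3))
```

In this toy setup the word with the largest post-alignment distance is the one whose context was deliberately changed, which is the kind of candidate the approach would flag for inspection.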
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBCs). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize MTD selection through trial and error, but the literature falls short in i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBCs' states, and iii) calculating the resource consumption on SBCs. To address these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms for mitigating heterogeneous zero-day attacks on SBCs. The framework uses behavioral fingerprinting to represent SBCs' states and RL to learn the MTD techniques that mitigate each malicious state. It has been deployed in a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. In more detail, the Raspberry Pi was infected with different samples of command and control malware, rootkits, and ransomware, and the framework then selected among four existing MTD techniques. A set of experiments demonstrated the suitability of the framework for learning proper MTD techniques that mitigate all attacks (except one harmful rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
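The trial-and-error selection described above can be sketched in a heavily simplified form: an epsilon-greedy agent that learns, for each observed malicious state, which of four MTD techniques earns a positive reward. This is a one-step (contextual bandit) illustration only, not the framework from the paper; the state names, the technique names, the reward values, and the `MITIGATES` mapping are all hypothetical.

```python
import random

# Hypothetical MTD techniques and a hypothetical ground truth of what works.
ACTIONS = ["mtd_a", "mtd_b", "mtd_c", "mtd_d"]
MITIGATES = {"c2": "mtd_b", "rootkit": "mtd_d", "ransomware": "mtd_a"}


def train(episodes=300, epsilon=0.2, lr=0.5, seed=0):
    """Learn a value per (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {state: {action: 0.0 for action in ACTIONS} for state in MITIGATES}
    states = list(MITIGATES)
    for _ in range(episodes):
        state = rng.choice(states)  # stands in for a fingerprinted device state
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
        # assumed reward: +1 if the attack is mitigated, -1 otherwise
        reward = 1.0 if MITIGATES[state] == action else -1.0
        q[state][action] += lr * (reward - q[state][action])
    return q


q_table = train()
policy = {state: max(values, key=values.get) for state, values in q_table.items()}
print(policy)
```

In a real deployment the reward would have to be derived from the device's behavior after the MTD technique is applied, since the ground truth used here is not available for zero-day attacks.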
We introduce a machine-learning (ML)-based weather simulator--called "GraphCast"--which outperforms the most accurate deterministic operational medium-range weather forecasting system in the world, as well as all previous ML baselines. GraphCast is an autoregressive model, based on graph neural networks and a novel high-resolution multi-scale mesh representation, which we trained on historical weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF)'s ERA5 reanalysis archive. It can make 10-day forecasts, at 6-hour time intervals, of five surface variables and six atmospheric variables, each at 37 vertical pressure levels, on a 0.25-degree latitude-longitude grid, which corresponds to roughly 25 x 25 kilometer resolution at the equator. Our results show GraphCast is more accurate than ECMWF's deterministic operational forecasting system, HRES, on 90.0% of the 2760 variable and lead time combinations we evaluated. GraphCast also outperforms the most accurate previous ML-based weather forecasting model on 99.2% of the 252 targets it reported. GraphCast can generate a 10-day forecast (35 gigabytes of data) in under 60 seconds on Cloud TPU v4 hardware. Unlike traditional forecasting methods, ML-based forecasting scales well with data: by training on bigger, higher quality, and more recent data, the skill of the forecasts can improve. Together these results represent a key step forward in complementing and improving weather modeling with ML, open new opportunities for fast, accurate forecasting, and help realize the promise of ML-based simulation in the physical sciences.
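The numbers in the abstract above imply the size of the forecasting problem, which the following sketch works out alongside a generic autoregressive rollout loop. The `rollout` helper is a simplified illustration in which each step sees only the previous state, and the assumption that the latitude axis includes both poles is mine; neither is taken from the GraphCast implementation.

```python
def rollout(step_fn, state, n_steps):
    """Autoregressive rollout: each prediction becomes the next input."""
    trajectory = []
    for _ in range(n_steps):
        state = step_fn(state)
        trajectory.append(state)
    return trajectory


FORECAST_DAYS = 10
HOURS_PER_STEP = 6
n_steps = FORECAST_DAYS * 24 // HOURS_PER_STEP  # model steps per forecast

SURFACE_VARS, ATMOS_VARS, PRESSURE_LEVELS = 5, 6, 37
channels = SURFACE_VARS + ATMOS_VARS * PRESSURE_LEVELS  # values per grid point

RESOLUTION_DEG = 0.25
n_lat = int(180 / RESOLUTION_DEG) + 1  # assumption: both poles are grid rows
n_lon = int(360 / RESOLUTION_DEG)
grid_points = n_lat * n_lon

print(n_steps, channels, grid_points)

# Toy stand-in for the learned model: a scalar state that grows by one per step.
toy_forecast = rollout(lambda x: x + 1, 0, 3)
```

Because errors made at one step are fed into the next, the quality of a 10-day forecast depends on how well the model behaves over all of these consecutive steps, not just on single-step accuracy.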
Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In contrast, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach in which an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).