Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.
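The core idea of a single targeted projection can be sketched in a few lines of NumPy. This is a minimal illustration of the mean-projection idea, not the authors' reference implementation: compute the direction connecting the two class means and project it out of every embedding.

```python
import numpy as np

def mean_projection(X, y):
    """Remove the class-mean-difference direction from embeddings X.

    X: (n, d) embedding matrix; y: binary labels (0/1) for the target concept.
    Returns embeddings with that single direction projected out.
    """
    v = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    v = v / np.linalg.norm(v)          # unit vector along the mean difference
    return X - np.outer(X @ v, v)      # orthogonal projection onto v's nullspace

# toy check: after one projection, the class means coincide
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 50 + [1] * 50)
X[y == 1] += 2.0                       # inject a separable "concept" direction
Xp = mean_projection(X, y)
diff = Xp[y == 1].mean(axis=0) - Xp[y == 0].mean(axis=0)
print(np.linalg.norm(diff))           # near zero: mean separation removed
```

By construction, the residual mean difference along the removed direction is exactly zero, which is why a single such projection suffices to remove mean-based linear separability.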
Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained linear minimax game and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias as measured by both intrinsic and extrinsic evaluations. We show that, despite being linear, the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
Over the past few years, word and sentence embeddings have established themselves as a text preprocessing step for a wide variety of NLP tasks and have significantly improved performance. Unfortunately, it has also been shown that these embeddings inherit various kinds of biases from the training data, thereby passing on biases present in society to NLP solutions. Many papers have attempted to quantify bias in word or sentence embeddings in order to evaluate debiasing methods or compare different embedding models, often with cosine-based metrics. However, some recent works have raised concerns about these metrics, showing that even when such metrics report low bias, other tests still reveal biases. In fact, a wide variety of bias metrics and tests have been proposed in the literature without any consensus on an optimal solution. Yet we lack works that evaluate bias metrics on a theoretical level or elaborate on the advantages and disadvantages of different bias metrics. In this work, we explore cosine-based bias metrics. We formalize a bias definition based on ideas from previous works and derive conditions that bias metrics should satisfy. Furthermore, we thoroughly investigate the existing cosine-based metrics and their limitations to show why these metrics can fail to report bias in some cases. Finally, we propose a new metric that addresses the shortcomings of existing metrics and mathematically prove that it behaves as desired.
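As an illustration of the cosine-based family of metrics discussed above (a generic sketch, not the specific metric any of these papers propose), a word's bias can be scored as the difference of its cosine similarity to two attribute-set centroids:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_bias(word_vec, male_vecs, female_vecs):
    """Difference of cosine similarities to the two attribute-set centroids.
    Positive -> closer to the male attribute set, negative -> female."""
    m = np.mean(male_vecs, axis=0)
    f = np.mean(female_vecs, axis=0)
    return cos(word_vec, m) - cos(word_vec, f)

# hypothetical 2-D toy vectors: "nurse" sits nearer the female centroid
male = np.array([[1.0, 0.1], [0.9, 0.0]])
female = np.array([[0.1, 1.0], [0.0, 0.9]])
nurse = np.array([0.2, 0.8])
print(cosine_bias(nurse, male, female))  # negative: female-leaning
```

The limitation the abstract alludes to is visible in the construction: a score near zero only says the word is equidistant from the two centroids, not that no gender information remains in the vector.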
As the deployment of natural language processing (NLP) in everyday life expands, social biases inherited by NLP models become more severe and problematic. Previous studies have shown that word embeddings trained on human-generated corpora exhibit strong gender bias that can produce discriminatory results in downstream tasks. Previous debiasing methods mainly focus on modeling the bias and only implicitly consider semantic information, while completely ignoring the complex underlying causal structure between the bias and semantic components. To address these issues, we propose a novel method that leverages a causal inference framework to effectively remove gender bias. The proposed method allows us to construct and analyze the complex causal mechanisms facilitating the flow of gender information, while retaining the oracle semantic information in word embeddings. Our comprehensive experiments show that the method achieves state-of-the-art results on gender debiasing tasks. In addition, our method yields better performance in word similarity evaluation and various extrinsic downstream NLP tasks.
Language can be used as a means of reproducing and enforcing harmful stereotypes and biases, and has been analyzed as such in numerous studies. In this paper, we present a survey of 304 papers on gender bias in natural language processing. We analyze definitions of gender and its categories within the social sciences and connect them to formal definitions of gender bias in NLP research. We survey lexica and datasets applied in research on gender bias, and then compare and contrast approaches to detecting and mitigating gender bias. We find that research on gender bias suffers from four core limitations. 1) Most research treats gender as a binary variable, neglecting its fluidity and continuity. 2) Most of the work has been conducted in monolingual setups for English or other high-resource languages. 3) Despite a myriad of papers on gender bias in NLP methods, we find that most of the newly developed algorithms do not test their models for bias and disregard possible ethical considerations of their work. 4) Finally, methodologies developed in this line of research are fundamentally flawed, covering very limited definitions of gender bias and lacking evaluation baselines and pipelines. We suggest recommendations for overcoming these limitations as a guide for future research.
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
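The "direction in the word embedding" described above can be neutralized with a single orthogonal projection. Below is a simplified sketch of that neutralize step (the full method additionally equalizes definitional pairs, and derives the gender direction from several pairs rather than one); the toy vectors are hypothetical:

```python
import numpy as np

def neutralize(word_vec, gender_dir):
    """Project out the gender direction so the word is orthogonal to it."""
    g = gender_dir / np.linalg.norm(gender_dir)
    return word_vec - (word_vec @ g) * g

# hypothetical toy vectors; a real setup uses trained embeddings and derives
# the direction from definitional pairs such as (he, she), (man, woman)
he = np.array([0.8, 0.2, 0.1])
she = np.array([0.2, 0.8, 0.1])
gender_dir = he - she
receptionist = np.array([0.3, 0.7, 0.5])
debiased = neutralize(receptionist, gender_dir)
print(debiased @ (gender_dir / np.linalg.norm(gender_dir)))  # ~0
```

After neutralization, the word's projection onto the gender direction is exactly zero, so a linear "gender" probe along that single direction can no longer separate it.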
Detecting and mitigating harmful biases in modern language models is widely recognized as a crucial open problem. In this paper, we take a step back and investigate how language models come to be biased in the first place. We use a relatively small language model, an LSTM architecture trained on an English Wikipedia corpus. With access to the data and model parameters at every step of training, we can map in detail how the representation of gender develops, which patterns in the dataset drive this development, and how the model's internal state relates to bias in a downstream task (semantic textual similarity). We find that the representation of gender is dynamic, and we identify distinct phases during training. Furthermore, we show that gender information is represented increasingly locally in the model's input embeddings, and that, as a consequence, debiasing these embeddings can effectively reduce downstream bias. Monitoring the training dynamics allows us to detect an asymmetry in how the female and male genders are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases. More generally, we discuss the relevance of these findings for mitigation strategies, and the prospects of generalizing our methods to larger language models, transformer architectures, other languages, and other undesirable biases.
Text representations learned by machine learning models often encode undesirable demographic information about the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We propose a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), which removes protected information by making representations of instances belonging to the same protected attribute class uncorrelated, using a rate-distortion function. FaRM is able to debias representations with or without a target task. FaRM can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and the learned representations leak significantly less protected attribute information against attacks by nonlinear probing networks.
The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different nonlinear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.
Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to sensitive attributes like gender or race. However, these algorithms seldom account for within-group heterogeneity and biases that may disproportionately affect some members of a group. In this work, we characterize Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that can be exhibited by machine learning models even when these systems achieve group fairness objectives. We study this issue through the lens of gender bias in occupation classification. We quantify SNoB by measuring how an algorithm's predictions are associated with conformity to inferred gender norms. This framework reveals that, when predicting whether an individual belongs to a male-dominated occupation, "fair" classifiers still favor biographies written in ways that align with inferred masculine norms. We compare SNoB across algorithmic fairness methods and show that it is often a residual bias, and that post-processing approaches do not mitigate this kind of bias at all.
Statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of widely-used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using single-category word embedding association tests, we demonstrate the widespread prevalence of gender biases that also show differences in: (a) frequencies of words associated with men versus women; (b) parts of speech in gender-associated words; (c) semantic categories in gender-associated words; and (d) valence, arousal, and dominance of gender-associated words. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than with women, providing direct evidence of a male default in the everyday language of the English-speaking world. Second, turning to parts of speech: the top male-associated words are typically verbs (e.g., fight, overpower), while the top female-associated words are typically adjectives and adverbs (e.g., giving, emotionally). Gender biases in embeddings thus also permeate parts of speech. Third, for semantic categories: we perform bottom-up cluster analyses of the top 1,000 words associated with each gender. The top male-associated concepts include roles and domains of big tech, engineering, religion, sports, and violence; in contrast, the top female-associated concepts are less focused on roles and include female-specific slurs and sexual content, as well as appearance and kitchen terms. Fourth, using human ratings of valence, arousal, and dominance from a lexicon of about 20,000 words, we find that male-associated words are higher in arousal and dominance, while female-associated words are higher in valence.
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
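The legal notion of disparate impact referenced above is commonly operationalized as the "four-fifths rule": the selection rate for one group should be at least 80% of the rate for the other. A minimal sketch of that check (illustrative only; the paper's actual test is based on predictability of the protected class, not on this ratio alone):

```python
import numpy as np

def disparate_impact_ratio(selected, protected):
    """Four-fifths rule: ratio of selection rates between two groups.

    selected:  boolean array of positive outcomes
    protected: 0/1 group membership labels
    A ratio below 0.8 is the classic legal threshold for disparate impact.
    """
    rate0 = selected[protected == 0].mean()
    rate1 = selected[protected == 1].mean()
    return min(rate0, rate1) / max(rate0, rate1)

# toy example: group 1 is selected half as often as group 0
selected = np.array([1] * 40 + [0] * 10 + [1] * 20 + [0] * 30, dtype=bool)
protected = np.array([0] * 50 + [1] * 50)
print(disparate_impact_ratio(selected, protected))  # 0.5 -> disparate impact
```

Here group 0 is selected at rate 0.8 and group 1 at rate 0.4, so the ratio 0.5 falls well below the 0.8 threshold.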
Debiasing word embeddings has largely been limited to individual and independent social categories. However, real-world corpora typically present multiple social categories that may correlate or intersect with each other. For instance, "hair weaves" is stereotypically associated with African American females, but neither with African Americans nor with females in general. Hence, this work studies biases associated with multiple social categories: joint biases induced by the union of different categories, and intersectional biases that do not overlap with the biases of the constituent categories. We first empirically observe that individual biases intersect non-trivially (i.e., over a one-dimensional subspace). Drawing from intersectional theory in social science and from linguistic theory, we then construct an intersectional subspace for multiple social categories using the nonlinear geometry of individual biases. Empirical evaluations corroborate the efficacy of our approach. Data and implementation code can be downloaded at https://github.com/githublucheng/implementation-of-josec-coling-22.
Multilingual language models have been shown to enable non-trivial transfer across scripts and languages. In this work, we study the structure of the internal representations that enable this transfer. We focus on the representation of gender distinctions as a practical case study, and examine the extent to which the gender concept is encoded in shared subspaces across different languages. Our analysis shows that gender representations consist of several prominent components that are shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal trained on a single language do not transfer easily to others.
Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language-the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model-namely, the GloVe word embedding-trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
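The WEAT effect size introduced above follows the formula of Caliskan et al. (2017): each word's association s(w, A, B) is its mean cosine similarity to attribute set A minus its mean cosine similarity to attribute set B, and the effect size is the standardized difference of mean associations between the two target sets. A minimal sketch with hypothetical toy vectors:

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size d for target sets X, Y and attribute sets A, B."""
    sX = [association(x, A, B) for x in X]
    sY = [association(y, A, B) for y in Y]
    return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)

# hypothetical 2-D vectors: target set X aligns with attributes A, Y with B
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
X = [np.array([0.8, 0.2]), np.array([0.7, 0.1])]
Y = [np.array([0.2, 0.8]), np.array([0.1, 0.7])]
print(weat_effect_size(X, Y, A, B))  # large positive effect size
```

In the paper the effect size is accompanied by a permutation test over partitions of X ∪ Y, which is omitted here for brevity.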
Representations in large language models contain multiple types of gender information. We focus on two such signals in English texts: factual gender information, which is a grammatical or semantic property, and gender bias, which is a correlation between a word and a specific gender. We show that the model's embeddings can be disentangled, and we identify the components that encode each type of information. Our aim is to reduce stereotypical bias in the representations while preserving the factual gender signal. Our filtering method shows that bias in gender-neutral occupation names can be reduced without severely degrading performance. These findings can be applied to language generation to mitigate reliance on stereotypes while preserving gender agreement where it matters.
Large pre-trained language models are often trained on large volumes of internet data, some of which may contain toxic or abusive language. Consequently, language models encode toxic information, which limits the real-world applications of these models. Current methods aim to prevent toxic features from appearing in generated text. We hypothesize the existence of a low-dimensional toxic subspace in the latent space of pre-trained language models, whose existence suggests that toxic features follow some underlying pattern and are therefore removable. To construct this toxic subspace, we propose a method to generalize toxic directions in the latent space. We also provide a methodology for constructing parallel datasets using a context-based word masking system. Through our experiments, we show that when the toxic subspace is removed from a set of sentence representations, almost no toxic representations remain. We demonstrate empirically that the subspace found with our method generalizes to multiple toxicity corpora, indicating the existence of a low-dimensional toxic subspace.
We apply natural language processing techniques to analyze 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the measurement of sexist expressions and gender biases across five decades (1960–2010). Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies that used manually annotated samples of popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned from song lyrics. We find that sexist content has increased over time, especially from male artists and in popular songs appearing in the Billboard charts. Songs also contain different language biases depending on the performer's gender, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language use in such an influential part of popular culture.
Increasing awareness of biased patterns in natural language processing resources such as BERT has motivated many metrics to quantify "bias" and "fairness". However, comparing the results of different metrics, and of the works that evaluate with these metrics, remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pre-trained language models and experimentally evaluate their compatibility, covering both biases in language models and biases in their downstream tasks. We do this through a combination of a traditional literature survey, correlation analysis, and empirical evaluations. We find that many metrics are not compatible and depend strongly on (i) templates, (ii) attribute and target seeds, and (iii) the choice of embeddings. These results indicate that fairness or bias evaluation remains challenging for contextualized language models, if not at least highly subjective. To improve future comparisons and fairness evaluations, we recommend avoiding embedding-based metrics and focusing on fairness evaluations in downstream tasks.
News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures' accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the aforementioned issues. We compare how models trained with the algorithms on news articles represent the expected social bias. Our results suggest that the standard way to quantify bias does not align well with knowledge from psychology. While the proposed algorithms reduce the gap, they still do not fully match the literature.