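With the increasing attention that non-binary people receive in Western societies, strategies for gender-fair language have started to move away from binary (only female/male) concepts of gender. Nevertheless, hardly any approaches exist so far that take these identities into account in machine translation models. A lack of understanding of the socio-technical implications of such technologies risks further reproducing linguistic mechanisms of oppression and mislabelling. In this paper, we describe the methods and results of a workshop on gender-fair language and language technologies, which was led and organised by ten researchers from TU Wien, St. Pölten UAS, FH Campus Wien and the University of Vienna and took place in Vienna in autumn 2021. A wide range of interest groups and their representatives were invited to ensure that the topic could be dealt with holistically. Accordingly, we aimed to include translators, machine translation experts and non-binary individuals (as "community experts") on an equal footing. Our analysis shows that gender in machine translation requires a high degree of context sensitivity, that developers of such technologies need to position themselves cautiously in a process still under social negotiation, and that flexible approaches seem most adequate at present. We then illustrate the steps that follow from our results for the field of gender-fair language technologies, so that technological developments can adequately line up with social advancements. - [German abstract added manually by arXiv admins]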
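Algorithmic decision support (ADS) is increasingly used in a wide range of contexts and structures across various areas of society, influencing many people's lives. Its use raises questions about accountability, transparency and responsibility. Our article aims to give an overview of the central issues connected to ADS, responsibility and decision-making in organisational contexts, and to identify open questions and research gaps. Furthermore, we describe a set of guidelines and a complementary digital tool to assist practitioners in mapping responsibility when introducing ADS within their organisational context. - Algorithmic decision support (ADS) is increasingly used in a variety of contexts and structures and influences the lives of many people in many areas of society. Its use raises a number of questions, among them questions of accountability, transparency and responsibility. In the following, we give an overview (Überblick) of the most important questions concerning ADS, responsibility and decision-making in organisational contexts, and point out some open questions and research gaps. For practitioners, we have developed a guideline, including a digital tool, that is intended to help users in particular with locating and assigning responsibility when ADS is used in organisational contexts.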
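Speech manuscript (German + English) of the impulse lecture for the panel discussion "May machines (be able to) think?" at the 102nd Katholikentag (German Catholic Day) in Stuttgart on May 28, 2022. Panel: Winfried Kretschmann (MdL, Minister President of Baden-Württemberg, Stuttgart), Ursula Nothelle-Wildfeuer (Freiburg), Michael Resch (Stuttgart), Karsten Wendland (Aalen). Moderation: Verena Neuhausen (Stuttgart).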
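Artificial intelligence (AI) is one of the most promising technologies of the 21st century, with an already noticeable impact on society and the economy. In this work, we provide a brief overview of global trends, industry applications, and selected use cases from our international experience and work in industry and academia. The goal is to present positive global and regional practices and to offer an informed opinion on realistic goals and opportunities for positioning B&H on the global AI scene.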
As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
Advances in image processing and analysis, as well as in machine learning techniques, have contributed to the use of biometric recognition systems in people's daily tasks. These tasks range from simple access to mobile devices to tagging friends in photos shared on social networks and complex financial operations on self-service devices for banking transactions. In China, the use of these systems goes beyond personal use and has become government policy with the objective of monitoring the behavior of the population. On July 5, 2021, the Brazilian government announced the acquisition of a biometric recognition system to be used nationwide. In the opposite direction to China, Europe and some American cities have already started to discuss the legality of using biometric systems in public places, in some cases banning the practice in their territory. In order to open a deeper discussion about the risks and legality of using these systems, this work exposes the vulnerabilities of biometric recognition systems, focusing on the face modality. Furthermore, it shows how a biometric system can be fooled through morphing, a presentation attack approach that is well known in the literature. Finally, a list of ten concerns was created to start the discussion about the security of citizen data and data privacy law in the Age of Artificial Intelligence (AI).
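We introduce translation error correction (TEC), the task of automatically correcting human-generated translations. Imperfections in machine translation (MT) have long motivated systems that improve translations after the fact through automatic post-editing. In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors, from typos to inconsistencies in translation conventions. To investigate this, we build and release the ACED corpus with three TEC datasets. We show that human errors in TEC are more diverse and include far fewer translation fluency errors than the MT errors in automatic post-editing datasets, which suggests the need for dedicated TEC models specialized in correcting human errors. We show that pre-training on synthetic errors based on human errors improves the TEC F-score by as much as 5.1 points. We conducted a human-in-the-loop user study with nine professional translation editors and found that the assistance of our TEC system led them to produce higher-quality revised translations.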
This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular, for the sign-to-text direction. We use a Transformer model implemented with the Fairseq modeling toolkit. We have experimented with the vocabulary size, data augmentation techniques and pretraining the model with the PHOENIX-14T dataset. Our system obtains 0.50 BLEU score for the test set, improving the organizers' baseline by 0.38 BLEU. We remark the poor results for both the baseline and our system, and thus, the unreliability of our findings.
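Millions of people around the world cannot access content on the Web because most of it is not available in their language. Machine translation (MT) systems have the potential to change this for many languages. Current MT systems provide very accurate results for high-resource language pairs, e.g., German and English. However, for many low-resource languages, MT is still under active research. The key challenge is the lack of datasets to build these systems. We present Lesan, an MT system for low-resource languages. Our pipeline addresses the key bottleneck of low-resource MT by leveraging online and offline sources, a custom OCR system for Ethiopic and an automatic alignment module. The final step in the pipeline is a sequence-to-sequence model that takes the parallel corpus as input and gives us a translation model. Lesan's translation model is based on the Transformer architecture. After a base model is constructed, back-translation is used to leverage monolingual corpora. Currently, Lesan supports translation to and from Tigrinya, Amharic and English. We perform an extensive human evaluation and show that Lesan outperforms state-of-the-art systems such as Google Translate and Microsoft Translator across all six pairs. Lesan is freely available and has so far served over 10 million translations. At the moment, there are only 217 Tigrinya and 15,009 Amharic Wikipedia articles. We believe that Lesan will contribute to democratizing access to the Web through MT for millions of people.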
Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as German. While stirring some controversy, it is increasingly adopted by companies and political institutions. A handful of tools have been developed to help people use gender-inclusive language by identifying instances of the generic masculine and providing suggestions for more inclusive reformulations. In this report, we define the underlying tasks in terms of natural language processing, and present a dataset and measures for benchmarking them. We also present a model that implements these tasks, by combining an inclusive language database with an elaborate sequence of processing steps via standard pre-trained models. Our model achieves a recall of 0.89 and a precision of 0.82 in our benchmark for identifying exclusive language; and one of its top five suggestions is chosen in real-world texts in 44% of cases. We sketch how the area could be further advanced by training end-to-end models and using large language models; and we urge the community to include more gender-inclusive texts in their training data in order to not present an obstacle to the adoption of gender-inclusive language. Through these efforts, we hope to contribute to restoring justice in language and, to a small extent, in reality.
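As a reading aid for the reported detection scores, the precision and recall above can be combined into a single F-score. The sketch below is not taken from the report: the helper name `f_beta` is hypothetical, and the resulting F1 value is derived arithmetic rather than a figure the authors state.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract; the F1 itself is our own calculation.
f1 = f_beta(precision=0.82, recall=0.89)
print(round(f1, 3))  # 0.854
```

With equal weighting, the two scores correspond to an F1 of roughly 0.85; setting `beta` above 1 would weight recall more heavily, which may suit a tool whose main job is to avoid missing exclusive wording.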
While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the choice of these in-context examples and their ordering impacts the output translation quality. In this work, we aim to understand the properties of good in-context examples for MT in both in-domain and out-of-domain settings. We show that the translation quality and the domain of the in-context examples matter and that 1-shot noisy unrelated example can have a catastrophic impact on output quality. While concatenating multiple random examples reduces the effect of noise, a single good prompt optimized to maximize translation quality on the development dataset can elicit learned information from the pre-trained language model. Adding similar examples based on an n-gram overlap with the test source significantly and consistently improves the translation quality of the outputs, outperforming a strong kNN-MT baseline in 2 out of 4 out-of-domain datasets.
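The idea of choosing in-context examples by n-gram overlap with the test source can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the scoring function (an average of clipped n-gram recall for n = 1 to 4), the whitespace tokenization, the prompt layout, and all function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(test_source, candidate_source, max_n=4):
    """Average fraction of the test source's n-grams (n = 1..max_n) that also
    occur in the candidate, using clipped counts. Assumed scoring, for illustration."""
    src = test_source.lower().split()
    cand = candidate_source.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        src_counts = ngrams(src, n)
        total = sum(src_counts.values())
        if total == 0:  # test source is shorter than n tokens
            continue
        cand_counts = ngrams(cand, n)
        matched = sum(min(c, cand_counts[g]) for g, c in src_counts.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0


def select_examples(test_source, pool, k=2):
    """Return the k (source, target) pairs whose source overlaps most with the test source."""
    ranked = sorted(pool, key=lambda pair: overlap_score(test_source, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(test_source, examples):
    """Concatenate the selected examples and the test source into a few-shot prompt."""
    lines = [f"Source: {s}\nTarget: {t}" for s, t in examples]
    lines.append(f"Source: {test_source}\nTarget:")
    return "\n\n".join(lines)


pool = [
    ("the cat sat on the mat", "die Katze saß auf der Matte"),
    ("stock prices fell sharply", "die Aktienkurse fielen stark"),
    ("the cat sleeps", "die Katze schläft"),
]
chosen = select_examples("the cat sat on the sofa", pool, k=2)
print(build_prompt("the cat sat on the sofa", chosen))
```

Scoring against the test source rather than against a fixed development set means a different set of examples is retrieved for every input, which is the property the abstract contrasts with randomly sampled prompts.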
End-to-End speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for source input, translation output and reference into a shared embedding space and computes a score of the translation quality that can be used as a proxy to human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment compared to ASR-dependent metrics including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows combining speech and text as inputs to BLASER does not increase the correlation with human scores, but best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.
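In this paper, as a case study, we present a systematic study of gender bias in machine translation with Google Translate. We translated sentences containing names of occupations from Hungarian, a language with gender-neutral pronouns, into English. Our aim was to present a fair measure of bias by comparing the translations to an optimal non-biased translator. When assessing bias, we used the following reference points: (1) the distribution of men and women among occupations in both the source and the target language countries, and (2) the results of a Hungarian survey that examined whether certain jobs are generally perceived as feminine or masculine. We also studied how expanding the sentences with adjectives referring to occupations affects the gender of the translated pronouns. As a result, we found bias against both genders, but biased results against women are more frequent. Translations are closer to our perception of occupations than to objective occupational statistics. Finally, occupations have a greater effect on translation than adjectives.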
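Recent research in neural machine translation has explored flexible generation orders as an alternative to left-to-right generation. However, training non-monotonic models brings a new complication: how to search for a good ordering when there is a combinatorial explosion of orderings that arrive at the same final result? Also, how do these automatic orderings compare with the actual behaviour of human translators? Current models rely on manually built biases or are left to explore all possibilities on their own. In this paper, we analyze the orderings produced by human post-editors and use them to train an automatic post-editing system. We compare the resulting system with systems trained with left-to-right and random post-editing orderings. We observe that humans tend to follow a nearly left-to-right order, but with interesting deviations, such as preferring to start by correcting punctuation or verbs.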
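We present in this article what we believe to be one of the first attempts at video game machine translation. Our study shows that models trained with only limited in-domain data surpass publicly available systems, and a subsequent human evaluation reveals interesting findings in the final translations. The first part of the article introduces some of the challenges of video game translation, some of the existing literature, and the systems and datasets used in this experiment. The last sections discuss our analysis of the resulting translations and the potential benefits of such an automated system. One such finding highlights the model's ability to learn typical rules and patterns of video game translation from English into French. Our conclusions therefore indicate that machine translation could prove very useful in the specific case of video games, given the encouraging results, the highly repetitive nature of the work, and the often poor working conditions that translators face in this field. As with other use cases of MT in cultural sectors, however, we believe this depends heavily on the proper implementation of the tool, which human translators should use interactively to stimulate creativity, rather than for raw post-editing for the sake of productivity.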
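Training speech translation (ST) models requires large, high-quality datasets. MuST-C is one of the most widely used ST benchmark datasets. It contains around 400 hours of speech-transcript-translation data for eight translation directions. The dataset passed several quality-control filters during its creation. However, we find that MuST-C still suffers from three major quality issues: audio-text misalignment, inaccurate translations, and unnecessary speaker names. What impact do these data quality issues have on model development and evaluation? In this paper, we propose an automatic method to fix or filter out the above quality issues, using English-German (En-De) translation as an example. Our experiments show that ST models perform better on clean test sets, and that the ranking of the proposed models remains consistent across different test sets. Moreover, simply removing misaligned data points from the training set does not lead to a better ST model.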
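Like its English counterpart Simple English, the German "Leichte Sprache" is a regulated language that aims to make accessible complex written language that would otherwise remain inaccessible to various groups of people. We present a new sentence-aligned monolingual corpus for Simple German - German. It contains multiple document-aligned sources, which we have aligned using automatic sentence-alignment methods. We evaluate our alignments on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1 score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under the MIT license.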
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing "bias" in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of "bias" (i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements) and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.
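Measuring, evaluating and reducing gender bias has come to the forefront, with newer and improved language embeddings being released every few months. But could this bias vary from domain to domain? We see a lot of work studying these biases in various embedding models, but only limited work has been done on debiasing Indic languages. We aim to measure and study this bias in Hindi, which is a higher-order (gendered) language, with reference to English, a lower-order language. To achieve this, we study the variations across domains to quantify whether domain embeddings give us some insight into gender bias for this Hindi-English pair. We generate embeddings from four different corpora and compare the results by implementing different metrics, using a pre-trained state-of-the-art Indic-English translation model that has performed better than existing models on many NLP tasks.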