Smart Paper Notes

Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada

Jagna Nieuwazny, Karol Nowakowski, Michal Ptaszynski, Fumito Masui

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-07-19

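This paper is our attempt to answer a twofold question covering the areas of ethics and authorship analysis. First, since the methods used for authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether an author identification system could correctly attribute works to their author if, over the years, the author has undergone a major psychological transition. Second, from the point of view of the evolution of an author's ethical values, we examined what it would mean if an authorship attribution system had difficulty detecting single authorship. We set out to answer these questions by performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model, together with an approach relying on conventional similarity metrics. For the test set, we chose works by Arata Osada, a Japanese educator and specialist in the history of education. Half of them are books written before World War II and the other half were written in the 1950s; between these periods he underwent a transformation in terms of political opinions. As a result, we were able to confirm that for texts written by Arata Osada more than 10 years apart, classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, while the confidence scores of the predictions remain at a level similar to that observed for shorter time spans. This indicates that the classifier was in many instances tricked into deciding that texts written many years apart were actually written by two different people. This in turn leads us to believe that such a change can affect authorship analysis, and that historical events have a great impact on a person's ethical outlook as expressed in their writings.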

Translated by Google Translate

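We study the effectiveness of Feature Density (FD), computed with different linguistically backed feature preprocessing methods, for estimating dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training. We hypothesize that estimating dataset complexity allows the number of required experiment iterations to be reduced. In this way we can optimize the resource-intensive training of ML models, which is becoming a serious issue due to the growth in available dataset sizes and the ever-rising popularity of models based on Deep Neural Networks (DNN). The constantly increasing need for more powerful computational resources also affects the environment, because of the alarming amount of CO2 emissions caused by training large-scale ML models. The research was conducted on multiple datasets, including popular ones such as the Yelp business review dataset used for training typical sentiment analysis models, as well as more recent datasets that try to tackle cyberbullying, which is a serious social problem and also a much more sophisticated problem from the point of view of linguistic representation. We use cyberbullying datasets collected for multiple languages, namely English, Japanese and Polish. The differences in linguistic complexity among the datasets additionally allow us to discuss the efficacy of linguistically backed word preprocessing.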
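To make the idea of Feature Density concrete, here is a minimal Python sketch. The abstract does not define FD, so this assumes it is the ratio of unique features to all feature occurrences in a corpus. The function names `feature_density` and `lowercase_tokens`, the whitespace tokenizer, and the three-sentence corpus are all hypothetical and used only for illustration.

```python
from collections import Counter


def feature_density(documents, preprocess=str.split):
    """Assumed definition: unique features / total feature occurrences."""
    counts = Counter()
    for doc in documents:
        # `preprocess` turns one document into a list of features.
        counts.update(preprocess(doc))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def lowercase_tokens(text):
    """A hypothetical preprocessing variant: lowercase, then split on whitespace."""
    return text.lower().split()


# Toy corpus, invented for this example.
corpus = [
    "The food was great",
    "the service was slow",
    "Great food great prices",
]

fd_raw = feature_density(corpus)                      # plain whitespace tokens
fd_lower = feature_density(corpus, lowercase_tokens)  # case-folded tokens
print(f"raw: {fd_raw:.3f}, lowercased: {fd_lower:.3f}")
```

Under this assumed definition, a preprocessing method that merges surface variants (here, case folding) lowers FD, which is the kind of difference between preprocessing methods that could be compared before any classifier is trained.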

We propose a Cascaded Buffered IoU (C-BIoU) tracker to track multiple objects that have irregular motions and indistinguishable appearances. When appearance features are unreliable and geometric features are confused by irregular motions, applying conventional Multiple Object Tracking (MOT) methods may generate unsatisfactory results. To address this issue, our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two ways: it directly matches identical but non-overlapping detections and tracks in adjacent frames, and it compensates for the motion estimation bias in the matching space. In addition, to reduce the risk of overexpanding the matching space, cascaded matching is employed: alive tracks and detections are first matched with a small buffer, and the remaining unmatched tracks and detections are then matched with a large buffer. Despite its simplicity, our C-BIoU tracker works surprisingly well and achieves state-of-the-art results on MOT datasets that focus on irregular motions and indistinguishable appearances. Moreover, the C-BIoU tracker is the dominant component of our 2nd-place solution in the CVPR'22 SoccerNet MOT and ECCV'22 MOTComplex DanceTrack challenges. Finally, we analyze the limitations of our C-BIoU tracker in ablation studies and discuss its application scope.
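The buffered matching described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: boxes are `(x, y, w, h)` tuples, the buffer is assumed to grow each side in proportion to the box's width and height, matching is a simple greedy pass rather than an optimal assignment, and motion estimation is omitted. The buffer ratios, the threshold, and all function names are hypothetical.

```python
def buffer_box(box, b):
    """Expand an (x, y, w, h) box by ratio b of its size on every side (assumed form)."""
    x, y, w, h = box
    return (x - b * w, y - b * h, w + 2 * b * w, h + 2 * b * h)


def iou(a, b):
    """Plain intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def buffered_iou(a, b, buf):
    """IoU computed after both boxes have been expanded by the same buffer ratio."""
    return iou(buffer_box(a, buf), buffer_box(b, buf))


def greedy_match(tracks, dets, buf, thresh):
    """Greedily pair (index, box) items by descending buffered IoU."""
    scored = sorted(
        ((buffered_iou(t, d, buf), i, j) for i, t in tracks for j, d in dets),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, i, j in scored:
        if score < thresh:
            break  # remaining pairs score even lower
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches


def cascaded_match(tracks, dets, small_buf=0.3, large_buf=0.5, thresh=0.3):
    """Stage 1 uses a small buffer; stage 2 retries the leftovers with a large one."""
    t_items, d_items = list(enumerate(tracks)), list(enumerate(dets))
    first = greedy_match(t_items, d_items, small_buf, thresh)
    used_t = {i for i, _ in first}
    used_d = {j for _, j in first}
    rest_t = [(i, t) for i, t in t_items if i not in used_t]
    rest_d = [(j, d) for j, d in d_items if j not in used_d]
    return first + greedy_match(rest_t, rest_d, large_buf, thresh)


# Toy frame: the second object jumped far enough that its boxes no longer overlap.
tracks = [(0, 0, 10, 10), (100, 0, 10, 10)]
dets = [(2, 0, 10, 10), (111, 0, 10, 10)]
matches = cascaded_match(tracks, dets, thresh=0.2)
print(matches)
```

In this toy frame the second track and detection have zero plain IoU, fall below the threshold with the small buffer, and are only paired in the second stage with the large buffer, while the first pair is already settled in stage one and cannot be disturbed by the wider search.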

This is our 2nd-place solution for the ECCV 2022 Multiple People Tracking in Group Dance Challenge. Our method consists of two main steps: online short-term tracking using our Cascaded Buffer-IoU (C-BIoU) tracker, and offline long-term tracking using appearance features and hierarchical clustering. Our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two ways: it directly matches identical but non-overlapping detections and tracks in adjacent frames, and it compensates for the motion estimation bias in the matching space. In addition, to reduce the risk of overexpanding the matching space, cascaded matching is employed: alive tracks and detections are first matched with a small buffer, and the remaining unmatched tracks and detections are then matched with a large buffer. After using our C-BIoU tracker for online tracking, we applied the offline refinement introduced by ReMOTS.
