Smart Paper Notes

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Jiangyan Yi, Chenglong Wang, Jianhua Tao, Zhengkun Tian, Cunhang Fan, Haoxin Ma, Ruibo Fu

Category: Natural Language Processing

2022-11-11

Previous datasets have been designed to further the development of fake audio detection. However, their fake utterances are mostly generated by altering the timbre, prosody, linguistic content, or channel noise of the original audio. They ignore the case in which an attacker replaces the acoustic scene of the original audio with a forged one. Such manipulated audio would pose a major threat to society if misused for malicious purposes, which motivates us to fill this gap. This paper designs a dataset for scene fake audio detection (SceneFake). A manipulated utterance in the SceneFake dataset involves tampering only with the acoustic scene of the utterance, using speech enhancement technologies. The dataset makes it possible not only to detect fake utterances on a seen test set but also to evaluate how well fake detection models generalize to unseen manipulation attacks. Some benchmark results on the SceneFake dataset are described. In addition, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios is presented. The results show that scene-manipulated utterances cannot be detected reliably by the existing baseline models of ASVspoof 2019. Furthermore, detecting unseen scene manipulation audio remains challenging.

translated by Google Translate
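
The SceneFake abstract above mentions analysing attacks at different signal-to-noise ratios. As a rough illustration of what a target SNR means when a scene recording is added to speech, here is a minimal sketch. The function name `mix_at_snr` and the toy signals are assumptions made for this example; the abstract does not describe the authors' actual mixing pipeline.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper: both inputs are equal-length lists of samples,
    and the noise must not be all zeros.
    """
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    # SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]

speech = [1.0, -1.0, 1.0, -1.0]   # toy "speech" samples
scene = [0.5, 0.5, -0.5, -0.5]    # toy "acoustic scene" samples
mixed = mix_at_snr(speech, scene, snr_db=10)
```

A lower `snr_db` makes the added scene louder relative to the speech.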

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation

Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Zhengkun Tian, Ruibo Fu

Category: Artificial Intelligence

2022-08-21

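Many effective attempts have been made at deepfake audio detection. However, they can only distinguish real audio from fake. Many practical application scenarios also need to know which tool or algorithm generated the deepfake audio. This raises a question: can we detect the system fingerprints of deepfake audio? This paper therefore conducts a preliminary investigation into detecting system fingerprints of deepfake audio. Experiments are conducted on deepfake audio datasets from five recent deep-learning speech synthesis systems. The results show that LFCC features are relatively well suited to system fingerprint detection. Moreover, ResNet achieves the best detection results compared with the LCNN and x-vector based models. t-SNE visualization shows that different speech synthesis systems generate distinct system fingerprints.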

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

Category: Artificial Intelligence

2022-08-20

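Many effective attempts have been made at fake audio detection. However, they can only provide detection results and offer no countermeasures to curb the harm. Many related practical applications also need to know which model or algorithm generated the fake audio. We therefore propose a new problem: detecting the vocoder fingerprints of fake audio. Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders. We have preliminarily explored features and model architectures. t-SNE visualization shows that different vocoders generate distinct vocoder fingerprints.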

Fully Automated End-to-End Fake Audio Detection

Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

Category: Artificial Intelligence

2022-08-20

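Existing fake audio detection systems often rely on expert experience to design the acoustic features or to set the hyperparameters of the network structure by hand. However, manual parameter adjustment can have a relatively large influence on the results, and it is almost impossible to set the best parameter set by hand. This paper therefore proposes a fully automated end-to-end fake audio detection method. We first use a wav2vec pre-trained model to obtain a high-level representation of the speech. For the network structure, we use a modified version of differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures composed of convolutional operations and residual blocks. Experimental results on the ASVspoof 2019 LA dataset show that the proposed system achieves an equal error rate (EER) of 1.08%, outperforming the state-of-the-art single system.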
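
The equal error rate reported in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores. It assumes that a higher score means "more likely bona fide"; the function name and toy scores are illustrative and are not taken from the paper or from the ASVspoof tooling.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumed convention: an utterance is accepted as bona fide when its
    score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterance scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
```

With only a handful of scores the two error curves may not cross exactly, so the sketch returns the midpoint of the two rates at the closest threshold.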

A Piecewise Monotonic Gait Phase Estimation Model for Controlling a Powered Transfemoral Prosthesis in Various Locomotion Modes

Xinxing Chen, Chuheng Chen, Yuxuan Wang, Bowen Yang, Teng Ma, Yuquan Leng, Chenglong Fu

Category: Robotics

2022-07-25

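Gait phase-based control is a trending research topic for walking-aid robots, especially robotic lower-limb prostheses. Gait phase estimation is a challenge for gait phase-based control. Previous studies used the integral or the derivative of the human thigh angle to estimate the gait phase, but accumulated measurement errors and noise can affect the estimation results. In this paper, a more robust gait phase estimation method is proposed, using a unified form of piecewise monotonic gait phase-thigh angle models for various locomotion modes. The gait phase is estimated from the thigh angle alone, which is a stable variable and avoids phase drift. A Kalman filter-based smoother is designed to further suppress abrupt changes in the estimated gait phase. Based on the proposed gait phase estimation method, a gait phase-based joint angle tracking controller is designed for a transfemoral prosthesis. The proposed gait phase estimation method, the gait phase smoother, and the controller are evaluated through offline analysis of walking data in various locomotion modes. The real-time performance of the gait phase-based controller is validated in an experiment on the transfemoral prosthesis.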
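
To give a feel for how a Kalman filter can suppress abrupt jumps in an estimated gait phase, here is a minimal sketch of a forward constant-rate Kalman filter over a phase sequence. It is not the authors' smoother: the state model, the noise variances `q` and `r`, and the function name are assumptions for this example, and the sketch ignores the wrap-around of the phase at the end of each stride.

```python
def smooth_gait_phase(raw_phase, q=1e-5, r=1e-2):
    """Filter a gait-phase sequence with a constant-rate Kalman filter.

    State is (phase, phase rate per sample). q and r are assumed
    process and measurement noise variances, chosen for illustration.
    """
    p, v = raw_phase[0], 0.0
    P00, P01, P10, P11 = 1.0, 0.0, 0.0, 1.0  # state covariance
    out = [p]
    for z in raw_phase[1:]:
        # Predict: the phase advances by the current rate estimate.
        p = p + v
        P00, P01, P10, P11 = (P00 + P01 + P10 + P11 + q,
                              P01 + P11, P10 + P11, P11 + q)
        # Update: blend the prediction with the new measurement z.
        S = P00 + r
        K0, K1 = P00 / S, P10 / S
        y = z - p
        p, v = p + K0 * y, v + K1 * y
        P00, P01, P10, P11 = ((1 - K0) * P00, (1 - K0) * P01,
                              P10 - K1 * P00, P11 - K1 * P01)
        out.append(p)
    return out

raw = [0.01 * k for k in range(50)]
raw[25] += 0.3                      # inject one abrupt jump
smoothed = smooth_gait_phase(raw)
```

Raising `r` relative to `q` makes the filter trust its constant-rate prediction more, which damps jumps further at the cost of slower response to genuine changes in walking speed.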

Vocabulary-informed Zero-shot and Open-set Learning

Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, Leonid Sigal

Category: Computer Vision | Machine Learning

2023-01-03

Despite significant progress in object categorization in recent years, a number of important challenges remain, mainly the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and to address the problems of supervised, zero-shot, generalized zero-shot, and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. The distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open set recognition, with a class vocabulary of up to 310K on the Animals with Attributes and ImageNet datasets.
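
The distance constraint described above can be pictured as a hinge penalty: a labeled sample should lie closer to its own prototype than to any other vocabulary atom by at least some margin. The sketch below is a simplified, unweighted illustration. The function names, the Euclidean distance, the margin value, and the toy prototypes are all assumptions for this example and do not reproduce the paper's weighted formulation.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_violations(embedding, label, prototypes, margin=1.0):
    """Sum of hinge penalties over all other vocabulary atoms.

    The penalty is zero when the sample is at least `margin` closer to
    its own prototype than to every other prototype.
    """
    d_correct = euclidean(embedding, prototypes[label])
    return sum(
        max(0.0, margin + d_correct - euclidean(embedding, proto))
        for name, proto in prototypes.items()
        if name != label
    )

# Toy prototypes; "unicorn" stands in for an atom with no training data.
prototypes = {"cat": [0.0, 0.0], "dog": [3.0, 0.0], "unicorn": [0.0, 4.0]}
well_placed = margin_violations([0.5, 0.0], "cat", prototypes)
badly_placed = margin_violations([2.0, 0.0], "cat", prototypes)
```

Because the sum runs over every atom in the vocabulary, classes without labeled samples still shape the embedding, which is the intuition behind "vocabulary-informed".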

Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels

Yikai Wang, Yanwei Fu, Xinwei Sun

Category: Machine Learning | Computer Vision

2023-01-02

A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. In general scenarios, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean data. To solve this problem, we propose a data-adaptive method for Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) in the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-01

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from inadequate data and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation (CORGI-PM), which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.

EvidenceCap: Towards trustworthy medical image segmentation via evidential identity cap

Ke Zou, Xuedong Yuan, Xiaojing Shen, Yidi Chen, Meng Wang, Rick Siow Mong Goh, Yong Liu, Huazhu Fu

Category: Computer Vision

2023-01-01

Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and have low confidence in such black box systems, a problem exacerbated by poor generalization to out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in the clinic. Our work sheds light on clinically safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
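
As background for the subjective logic step, the sketch below shows the Dirichlet-based mapping commonly used in evidential deep learning to turn non-negative per-class evidence into belief masses and a single uncertainty value. Whether EvidenceCap uses exactly this mapping is an assumption; the abstract does not give the formulas, and the function name is hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map per-class evidence (e_k >= 0) to beliefs and an uncertainty mass.

    Assumed formulation: alpha_k = e_k + 1, S = sum(alpha),
    belief_k = e_k / S, uncertainty = K / S, so beliefs + uncertainty = 1.
    """
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    belief = [e / strength for e in evidence]
    uncertainty = k / strength
    return belief, uncertainty

# A pixel with no evidence for any class is maximally uncertain.
_, u_none = opinion_from_evidence([0.0, 0.0, 0.0])
# Strong evidence for one class lowers the uncertainty mass.
belief, u_strong = opinion_from_evidence([9.0, 0.0, 1.0])
```

In a segmentation setting this mapping would be applied per pixel, so the uncertainty values form a map that can be shown next to the predicted mask.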

Self-organization Preserved Graph Structure Learning with Principle of Relevant Information

Qingyun Sun, Jianxin Li, Beining Yang, Xingcheng Fu, Hao Peng, Philip S. Yu

Category: Machine Learning | Artificial Intelligence

2022-12-30

Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, which provides a simple and unified framework for identifying the self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing how the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
