Previous databases have been designed to further the development of fake audio detection. However, fake utterances are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audios. They ignore a fake situation, in which the attacker manipulates an acoustic scene of the original audio with another forgery one. It will pose a major threat to our society if some people misuse the manipulated audio with malicious purpose. Therefore, this motivates us to fill in the gap. This paper designs such a dataset for scene fake audio detection (SceneFake). A manipulated audio in the SceneFake dataset involves only tampering the acoustic scene of an utterance by using speech enhancement technologies. We can not only detect fake utterances on a seen test set but also evaluate the generalization of fake detection models to unseen manipulation attacks. Some benchmark results are described on the SceneFake dataset. Besides, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios are presented on the dataset. The results show that scene manipulated utterances can not be detected reliably by the existing baseline models of ASVspoof 2019. Furthermore, the detection of unseen scene manipulation audio is still challenging.
translated by 谷歌翻译
进行了许多有效的尝试进行了DeepFake音频检测。但是,他们只能区分真实和假货。对于许多实际的应用程序方案,还需要哪种工具或算法生成DeepFake音频。这提出了一个问题:我们可以检测到DeepFake音频的系统指纹吗?因此,本文进行了初步研究,以检测DeepFake音频的系统指纹。实验是从五个最新的深入学习语音合成系统的DeepFake音频数据集上进行的。结果表明,LFCC功能相对适合系统指纹检测。此外,RESNET在基于LCNN和X-Vector模型中获得了最佳检测结果。T-SNE可视化表明,不同的语音合成系统会生成不同的系统指纹。
translated by 谷歌翻译
已经进行了许多有效的尝试来进行虚假的音频检测。但是,他们只能提供检测结果,但没有对抗这种伤害的对策。对于许多相关的实际应用,也需要哪种模型或算法生成假音频。因此,我们提出了一个新问题,用于检测虚假音频的Vocoder指纹。实验是在由八个最先进的歌手合成的数据集上进行的。我们已经初步探索了功能和模型体系结构。T-SNE可视化表明,不同的Vocoder会生成不同的Vocoder指纹。
translated by 谷歌翻译
现有的假音频检测系统通常依靠专家经验来设计声学功能或手动设计网络结构的超参数。但是,人工调整参数可能会对结果产生相对明显的影响。几乎不可能手动设置最佳参数集。因此,本文提出了一种完全自动化的终端伪造音频检测方法。我们首先使用WAV2VEC预训练模型来获得语音的高级表示。此外,对于网络结构,我们使用了名为Light-Darts的可区分体系结构搜索(飞镖)的修改版本。它学习了深厚的语音表示,同时自动学习和优化包括卷积操作和残留块组成的复杂神经结构。 ASVSPOOF 2019 LA数据集的实验结果表明,我们提出的系统达到的错误率(EER)为1.08%,这表现优于最先进的单个系统。
translated by 谷歌翻译
基于步态阶段的控制是步行AID机器人的热门研究主题,尤其是机器人下限假体。步态阶段估计是基于步态阶段控制的挑战。先前的研究使用了人类大腿角的整合或差异来估计步态阶段,但是累积的测量误差和噪声可能会影响估计结果。在本文中,提出了一种更健壮的步态相估计方法,使用各种运动模式的分段单调步态相位大角模型的统一形式。步态相仅根据大腿角度估算,这是一个稳定的变量,避免了相位漂移。基于卡尔曼滤波器的平滑液旨在进一步抑制估计步态阶段的突变。基于提出的步态相估计方法,基于步态阶段的关节角跟踪控制器是为跨股骨假体设计的。提出的步态估计方法,步态相和控制器通过在各种运动模式下的步行数据进行离线分析来评估。基于步态阶段的控制器的实时性能在经际假体的实验中得到了验证。
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
A noisy training set usually leads to the degradation of the generalization and robustness of neural networks. In this paper, we propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method, to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. Under general scenarios, the conditions may be no longer satisfied; and some noisy data are falsely selected as clean data. To solve this problem, we propose a data-adaptive method for Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which is provable to control the False-Selection-Rate (FSR) in the selected clean data. To improve the efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel to make the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility, maintaining low confidence in such black box systems, with this problem being exacerbated by low generalization for out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way by uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap in three segmentation datasets and apply it to the clinic. Our work sheds light on clinical safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
translated by 谷歌翻译
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We proposed PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying the self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing in which way the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
translated by 谷歌翻译