最近,注意机制已成功应用于基于神经网络的说话者验证系统。将挤压和兴奋的块纳入卷积神经网络中的表现出色。但是,它使用全球平均池(GAP)简单地沿时间和频率维度平均功能,这无法在功能地图中保留足够的扬声器信息。在这项研究中,我们表明GAP是时间频域在数学上仅使用频率分解中最低频率分量的特殊情况。为了增强扬声器信息提取能力,我们建议利用多频信息,并设计两个新颖的有效注意模块,称为单频率单通道(SFSC)注意模块和多频单通道(MFSC)注意模块。提出的注意模块可以根据DCT有效地从多个频率组件中捕获更多扬声器信息。我们在Voxceleb数据集上进行了全面的实验,并对第148个UTD法医语料库进行了探测评估。实验结果表明,我们提出的SFSC和MFSC注意模块可以有效地产生更具歧视性的扬声器表示,并且优于RESNET34-SE和ECAPA-TDNN系统,而EER降低了20.9%和20.2%,而无需添加额外的网络参数。
translated by 谷歌翻译
没有发言者标签的培训扬声器 - 识别和强大的发言者验证系统仍然挑战和值得探索。在这项研究中,我们提出了一种有效的自我监督的学习框架和一种新的正规化策略,以促进自我监督的发言者代表学习。不同于基于对比的自我监督的学习方法,所提出的自我监督正则化(SSREG)专注于正数据对潜在的潜在表示之间的相似性。我们还探讨了替代在线数据增强策略对时域和频域的有效性。凭借我们强大的在线数据增强策略,所提出的SSREG显示了自我监督学习的潜力,而不使用负对对,它可以显着提高自我监督扬声器表示学习与简单的暹罗网络架构的表现。 VOXECEB数据集的综合实验表明,我们提出的自我监督方法通过增加有效的自我监督正则化和胜过其他以前的作品来获得23.4%的相对改善。
translated by 谷歌翻译
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples added with invisible but unlearnable noise, which have been found can prevent unauthorized training of machine learning models. UEs typically are generated via a bilevel optimization framework with a surrogate model to remove (minimize) errors from the original samples, and then applied to protect the data against unknown target models. However, existing UE generation methods all rely on an ideal assumption called label-consistency, where the hackers and protectors are assumed to hold the same label for a given sample. In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. E.g., a m-class unlearnable dataset held by the protector may be exploited by the hacker as a n-class dataset. Existing UE generation methods are rendered ineffective in this challenging setting. To tackle this challenge, we present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations. Furthermore, we propose to leverage VisionandLanguage Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains. We empirically verify the effectiveness of our proposed approach under a variety of settings with different datasets, target models, and even commercial platforms Microsoft Azure and Baidu PaddlePaddle.
translated by 谷歌翻译
The findable, accessible, interoperable, and reusable (FAIR) data principles have provided a framework for examining, evaluating, and improving how we share data with the aim of facilitating scientific discovery. Efforts have been made to generalize these principles to research software and other digital products. Artificial intelligence (AI) models -- algorithms that have been trained on data rather than explicitly programmed -- are an important target for this because of the ever-increasing pace with which AI is transforming scientific and engineering domains. In this paper, we propose a practical definition of FAIR principles for AI models and create a FAIR AI project template that promotes adherence to these principles. We demonstrate how to implement these principles using a concrete example from experimental high energy physics: a graph neural network for identifying Higgs bosons decaying to bottom quarks. We study the robustness of these FAIR AI models and their portability across hardware architectures and software frameworks, and report new insights on the interpretability of AI predictions by studying the interplay between FAIR datasets and AI models. Enabled by publishing FAIR AI models, these studies pave the way toward reliable and automated AI-driven scientific discovery.
translated by 谷歌翻译
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
translated by 谷歌翻译
Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non-visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual-linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities including (i) a global visual environment; (ii) local visual main agents; (iii) linguistic scene elements. We then introduce an autoregressive Transformer-in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee learnt embedding features are matched with the captions semantics. Comprehensive experiments and extensive ablation studies on ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer-in-Transform (VLTinT) outperforms prior state-of-the-art methods on accuracy and diversity.
translated by 谷歌翻译
As an efficient way to integrate multiple distributed energy resources and the user side, a microgrid is mainly faced with the problems of small-scale volatility, uncertainty, intermittency and demand-side uncertainty of DERs. The traditional microgrid has a single form and cannot meet the flexible energy dispatch between the complex demand side and the microgrid. In response to this problem, the overall environment of wind power, thermostatically controlled loads, energy storage systems, price-responsive loads and the main grid is proposed. Secondly, the centralized control of the microgrid operation is convenient for the control of the reactive power and voltage of the distributed power supply and the adjustment of the grid frequency. However, there is a problem in that the flexible loads aggregate and generate peaks during the electricity price valley. The existing research takes into account the power constraints of the microgrid and fails to ensure a sufficient supply of electric energy for a single flexible load. This paper considers the response priority of each unit component of TCLs and ESSs on the basis of the overall environment operation of the microgrid so as to ensure the power supply of the flexible load of the microgrid and save the power input cost to the greatest extent. Finally, the simulation optimization of the environment can be expressed as a Markov decision process process. It combines two stages of offline and online operations in the training process. The addition of multiple threads with the lack of historical data learning leads to low learning efficiency. The asynchronous advantage actor-critic with the experience replay pool memory library is added to solve the data correlation and nonstatic distribution problems during training.
translated by 谷歌翻译
When reading a story, humans can rapidly understand new fictional characters with a few observations, mainly by drawing analogy to fictional and real people they met before in their lives. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability of meta-learning of ToM in a realistic narrative understanding scenario. Our benchmark consists of $\sim$1,000 parsed movie scripts for this purpose, each corresponding to a few-shot character understanding task; and requires models to mimic humans' ability of fast digesting characters with a few starting scenes in a new movie. Our human study verified that humans can solve our problem by inferring characters' mental states based on their previously seen movies; while the state-of-the-art metric-learning and meta-learning approaches adapted to our task lags 30% behind.
translated by 谷歌翻译
语义分割是开发医学图像诊断系统的重要任务。但是,构建注释的医疗数据集很昂贵。因此,在这种情况下,半监督方法很重要。在半监督学习中,标签的质量在模型性能中起着至关重要的作用。在这项工作中,我们提出了一种新的伪标签策略,可提高用于培训学生网络的伪标签的质量。我们遵循多阶段的半监督训练方法,该方法在标记的数据集上训练教师模型,然后使用训练有素的老师将伪标签渲染用于学生培训。通过这样做,伪标签将被更新,并且随着培训的进度更加精确。上一个和我们的方法之间的关键区别在于,我们在学生培训过程中更新教师模型。因此,在学生培训过程中,提高了伪标签的质量。我们还提出了一种简单但有效的策略,以使用动量模型来提高伪标签的质量 - 训练过程中原始模型的慢复制版本。通过应用动量模型与学生培训期间的重新渲染伪标签相结合,我们在五个数据集中平均达到了84.1%的骰子分数(即Kvarsir,CVC-ClinicdB,Etis-laribpolypdb,cvc-colondb,cvc-colondb,cvc-colondb和cvc-300)和CVC-300)只有20%的数据集用作标记数据。我们的结果超过了3%的共同实践,甚至在某些数据集中取得了完全监督的结果。我们的源代码和预培训模型可在https://github.com/sun-asterisk-research/online学习SSL上找到
translated by 谷歌翻译
在拒绝的环境中进行搜索对于群体机器人来说是具有挑战性的,因为不允许GNSS,映射,数据共享和中央处理的帮助。但是,使用嗅觉和听觉像动物一样合作可能是改善群体合作的重要方法。在本文中,提出了一群自主机器人来探索拒绝环境的嗅觉审计算法算法(OA-BUG)。构建了一个模拟环境,以衡量OA-BUG的性能。使用OA-BUG的搜索任务覆盖范围可以达到96.93%,与类似的算法SGBA相比,最大的40.55%提高了40.55%。此外,在实际的群机器人上进行了实验,以证明OA-BUG的有效性。结果表明,OA-BUG可以在被拒绝的环境中改善群体机器人的性能。
translated by 谷歌翻译