In terms of artificial intelligence, there are several security and privacy deficiencies in the traditional centralized training methods of machine learning models by a server. To address this limitation, federated learning (FL) has been proposed and is known for breaking down ``data silos" and protecting the privacy of users. However, FL has not yet gained popularity in the industry, mainly due to its security, privacy, and high cost of communication. For the purpose of advancing the research in this field, building a robust FL system, and realizing the wide application of FL, this paper sorts out the possible attacks and corresponding defenses of the current FL system systematically. Firstly, this paper briefly introduces the basic workflow of FL and related knowledge of attacks and defenses. It reviews a great deal of research about privacy theft and malicious attacks that have been studied in recent years. Most importantly, in view of the current three classification criteria, namely the three stages of machine learning, the three different roles in federated learning, and the CIA (Confidentiality, Integrity, and Availability) guidelines on privacy protection, we divide attack approaches into two categories according to the training stage and the prediction stage in machine learning. Furthermore, we also identify the CIA property violated for each attack method and potential attack role. Various defense mechanisms are then analyzed separately from the level of privacy and security. Finally, we summarize the possible challenges in the application of FL from the aspect of attacks and defenses and discuss the future development direction of FL systems. In this way, the designed FL system has the ability to resist different attacks and is more secure and stable.
translated by 谷歌翻译
对比模式挖掘(CPM)是数据挖掘的重要且流行的子场。传统的顺序模式无法描述不同类别数据之间的对比度信息,而涉及对比概念的对比模式可以描述不同对比条件下数据集之间的显着差异。根据该领域发表的论文数量,我们发现研究人员对CPM的兴趣仍然活跃。由于CPM有许多研究问题和研究方法。该领域的新研究人员很难在短时间内了解该领域的一般状况。因此,本文的目的是为对比模式挖掘的研究方向提供最新的全面概述。首先,我们对CPM提出了深入的理解,包括评估歧视能力的基本概念,类型,采矿策略和指标。然后,我们根据CPM方法根据其特征分类为基于边界的算法,基于树的算法,基于进化模糊的系统算法,基于决策树的算法和其他算法。此外,我们列出了这些方法的经典算法,并讨论它们的优势和缺点。提出了CPM中的高级主题。最后,我们通过讨论该领域的挑战和机遇来结束调查。
translated by 谷歌翻译
人类视频运动转移(HVMT)的目的是鉴于源头的形象,生成了模仿驾驶人员运动的视频。 HVMT的现有方法主要利用生成对抗网络(GAN),以根据根据源人员图像和每个驾驶视频框架估计的流量来执行翘曲操作。但是,由于源头,量表和驾驶人员之间的巨大差异,这些方法始终会产生明显的人工制品。为了克服这些挑战,本文提出了基于gan的新型人类运动转移(远程移动)框架。为了产生逼真的动作,远遥采用了渐进的一代范式:它首先在没有基于流动的翘曲的情况下生成每个身体的零件,然后将所有零件变成驾驶运动的完整人。此外,为了保留自然的全球外观,我们设计了一个全球对齐模块,以根据其布局与驾驶员的规模和位置保持一致。此外,我们提出了一个纹理对准模块,以使人的每个部分都根据纹理的相似性对齐。最后,通过广泛的定量和定性实验,我们的远及以两个公共基准取得了最先进的结果。
translated by 谷歌翻译
由于其广泛的应用,例如自动驾驶,机器人技术等,认识到Point Cloud视频的人类行为引起了学术界和行业的极大关注。但是,当前的点云动作识别方法通常需要大量的数据,其中具有手动注释和具有较高计算成本的复杂骨干网络,这使得对现实世界应用程序不切实际。因此,本文考虑了半监督点云动作识别的任务。我们提出了一个蒙版的伪标记自动编码器(\ textbf {Maple})框架,以学习有效表示,以较少的注释以供点云动作识别。特别是,我们设计了一个新颖有效的\ textbf {de}耦合\ textbf {s} patial- \ textbf {t} emporal trans \ textbf {pert}(\ textbf {destbrof {destformer})作为maple的backbone。在Destformer中,4D点云视频的空间和时间维度被脱钩,以实现有效的自我注意,以学习长期和短期特征。此外,要从更少的注释中学习判别功能,我们设计了一个蒙版的伪标记自动编码器结构,以指导Destformer从可用框架中重建蒙面帧的功能。更重要的是,对于未标记的数据,我们从分类头中利用伪标签作为从蒙版框架重建功能的监督信号。最后,全面的实验表明,枫树在三个公共基准上取得了优异的结果,并且在MSR-ACTION3D数据集上以8.08 \%的精度优于最先进的方法。
translated by 谷歌翻译
对于应用智能,公用事业驱动的模式发现算法可以识别数据库中有见地和有用的模式。但是,在这些用于模式发现的技术中,模式的数量可能很大,并且用户通常只对其中一些模式感兴趣。因此,有针对性的高实数项目集挖掘已成为一个关键的研究主题,其目的是找到符合目标模式约束而不是所有模式的模式的子集。这是一项具有挑战性的任务,因为在非常大的搜索空间中有效找到量身定制的模式需要有针对性的采矿算法。已经提出了一种称为Targetum的第一种算法,该算法采用了类似于使用树结构进行后处理的方法,但是在许多情况下,运行时间和内存消耗都不令人满意。在本文中,我们通过提出一种带有模式匹配机制的新型基于列表的算法(名为Thuim(有针对性的高实用项目集挖掘))来解决此问题,该机制可以在挖掘过程中迅速匹配高实用项,以选择目标模式。在不同的数据集上进行了广泛的实验,以将所提出算法的性能与最新算法进行比较。结果表明,THUIM在运行时和内存消耗方面表现良好,并且与Targetum相比具有良好的可扩展性。
translated by 谷歌翻译
如今,用于行业4.0和物联网(IoT)的智能系统的环境正在经历快速的工业升级。开发了设计制造,事件检测和分类等大数据技术,以帮助制造组织实现智能系统。通过应用数据分析,可以最大化富数据的潜在值,从而帮助制造组织完成另一轮升级。在本文中,我们针对大数据分析提出了两种新算法,即UFC $ _ {gen} $和UFC $ _ {fast} $。两种算法旨在收集三种类型的模式,以帮助人们确定不同产品组合的市场位置。我们将这些算法在各种类型的数据集上进行比较,包括真实和合成。实验结果表明,这两种算法都可以通过基于用户指定的实用程序和频率阈值来利用所有候选模式的三种不同类型的有趣模式来成功实现模式分类。此外,就执行时间和内存消耗而言,基于列表的UFC $ _ {fast} $算法优于基于级别的UFC $ _ {gen} $算法。
translated by 谷歌翻译
视频的行动识别,即将视频分类为预定义的动作类型之一,一直是人工智能,多媒体和信号处理社区中的一个流行话题。但是,现有方法通常考虑一个整体上的输入视频并学习模型,例如卷积神经网络(CNNS),并带有粗糙的视频级别类标签。这些方法只能为视频输出一个动作类,但不能提供可解释的线索来回答为什么视频显示特定的动作。因此,研究人员开始专注于一项新任务,部分级别的动作解析(PAP),该作用不仅旨在预测视频级别的动作,而且还要认识到每个人的框架级别的细粒度的动作或身体部位的相互作用在视频中。为此,我们为这项具有挑战性的任务提出了一个粗到精细的框架。特别是,我们的框架首先预测输入视频的视频级别类别,然后将身体部位定位并预测零件级别的动作。此外,为了平衡部分级别的动作解析的准确性和计算,我们建议通过段级特征识别零件级的操作。此外,为了克服身体部位的歧义,我们提出了一种姿势引导的位置嵌入方法来准确地定位身体部位。通过在大规模数据集(即动力学TPS)上进行的全面实验,我们的框架可以实现最先进的性能,并且超过31.10%的ROC得分的现有方法。
translated by 谷歌翻译
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text. However, due to the complicated background textures and various text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate regions to be edited. Different from the implicit one by directly modifying all the pixels at image level, such explicit instructions filter out the distractions from background and guide the network to focus on editing rules of text regions. Secondly, we propose a Semi-supervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world datasets distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the blank of public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
translated by 谷歌翻译
Deep metric learning aims to learn an embedding space, where semantically similar samples are close together and dissimilar ones are repelled against. To explore more hard and informative training signals for augmentation and generalization, recent methods focus on generating synthetic samples to boost metric learning losses. However, these methods just use the deterministic and class-independent generations (e.g., simple linear interpolation), which only can cover the limited part of distribution spaces around original samples. They have overlooked the wide characteristic changes of different classes and can not model abundant intra-class variations for generations. Therefore, generated samples not only lack rich semantics within the certain class, but also might be noisy signals to disturb training. In this paper, we propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning. We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining and boost metric learning losses. Further, for most datasets that have a few samples within the class, we propose the neighbor correction to revise the inaccurate estimations, according to our correlation discovery where similar classes generally have similar variation distributions. Extensive experiments on five benchmarks show our method significantly improves and outperforms the state-of-the-art methods on retrieval performances by 3%-6%. Our code is available at https://github.com/darkpromise98/IAA
translated by 谷歌翻译
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how to effectively model the linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. Firstly, the autonomous suggests enforcing explicitly language modeling by decoupling the recognizer into vision model and language model and blocking gradient flow between both models. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for the language model which can effectively alleviate the impact of noise input. Finally, to polish ABINet++ in long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module which integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments especially on low-quality images. Besides, extensive experiments including in English and Chinese also prove that, a text spotter that incorporates our language modeling method can significantly improve its performance both in accuracy and speed compared with commonly used attention-based recognizers.
translated by 谷歌翻译