Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the formulation of altering only the mouth shapes of the target person. This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames. To this end, we propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality by predicting the masked mouth shapes. Our key insight is to exploit desired contextual information provided in audio and visual modalities thoroughly with delicately designed Transformers. Specifically, we propose a convolution-Transformer hybrid backbone and design an attention-based fusion strategy for filling the masked parts. It uniformly attends to the textural information on the unmasked regions and the reference frame. Then the semantic audio information is involved in enhancing the self-attention computation. Additionally, a refinement network with audio injection improves both image and lip-sync quality. Extensive experiments validate that our model can generate high-fidelity lip-synced results for arbitrary subjects.
translated by 谷歌翻译
鉴于其广泛的应用,已经对人面部交换的任务进行了许多尝试。尽管现有的方法主要依赖于乏味的网络和损失设计,但它们仍然在源和目标面之间的信息平衡中挣扎,并倾向于产生可见的人工制品。在这项工作中,我们引入了一个名为StylesWap的简洁有效的框架。我们的核心想法是利用基于样式的生成器来增强高保真性和稳健的面部交换,因此可以采用发电机的优势来优化身份相似性。我们仅通过最小的修改来确定,StyleGAN2体系结构可以成功地处理来自源和目标的所需信息。此外,受到TORGB层的启发,进一步设计了交换驱动的面具分支以改善信息的融合。此外,可以采用stylegan倒置的优势。特别是,提出了交换引导的ID反转策略来优化身份相似性。广泛的实验验证了我们的框架会产生高质量的面部交换结果,从而超过了最先进的方法,既有定性和定量。
translated by 谷歌翻译
面部伪造技术的最新进展几乎可以产生视觉上无法追踪的深冰录视频,这些视频可以通过恶意意图来利用。结果,研究人员致力于深泡检测。先前的研究已经确定了局部低级提示和时间信息在追求跨层次方法中概括的重要性,但是,它们仍然遭受鲁棒性问题的影响。在这项工作中,我们提出了基于本地和时间感知的变压器的DeepFake检测(LTTD)框架,该框架采用了局部到全球学习协议,特别关注本地序列中有价值的时间信息。具体而言,我们提出了一个局部序列变压器(LST),该局部序列变压器(LST)对限制空间区域的序列进行了时间一致性,其中低级信息通过学习的3D滤波器的浅层层增强。基于局部时间嵌入,我们然后以全球对比的方式实现最终分类。对流行数据集进行的广泛实验验证了我们的方法有效地发现了本地伪造线索并实现最先进的表现。
translated by 谷歌翻译
旨在生成新的字体的几个示例字体(FFG),由于劳动力成本的显着降低,它引起了人们的关注。典型的FFG管道将标准字体库中的字符视为内容字形,并通过从参考字形中提取样式信息将其转移到新的目标字体中。大多数现有的解决方案明确地删除了全球或组件的参考字形的内容和参考字形的样式。但是,字形的风格主要在于当地细节,即激进,组件和笔触的风格一起描绘了雕文的样式。因此,即使是单个字符也可以包含在空间位置分布的不同样式。在本文中,我们通过学习提出了一种新的字体生成方法1)参考文献中的细粒度局部样式,以及2)内容和参考文字之间的空间对应关系。因此,内容字形中的每个空间位置都可以使用正确的细粒样式分配。为此,我们对内容字形的表示作为查询和参考字形表示作为键和值的跨注意。交叉注意机制无需明确地删除全球或组件建模,而是可以在参考文字中遵循正确的本地样式,并将参考样式汇总为给定内容字形的精细粒度样式表示。实验表明,所提出的方法的表现优于FFG中最新方法。特别是,用户研究还证明了我们方法的样式一致性显着优于以前的方法。
translated by 谷歌翻译
先进的面部交换方法取得了吸引力的结果。但是,这些方法中的大多数具有许多参数和计算,这使得在实时应用程序中应用它们或在移动电话等边缘设备上部署它们的挑战。在这项工作中,通过根据身份信息动态调整模型参数,提出了一种用于主目不可知的人的动态网络(IDN),用于通过动态调整模型参数。特别地,我们通过引入两个动态神经网络技术来设计高效的标识注入模块(IIM),包括权重预测和权重调制。更新IDN后,可以应用于给定任何目标图像或视频的交换面。所呈现的IDN仅包含0.50米的参数,每个框架需要0.33g拖鞋,使其能够在移动电话上运行实时视频面。此外,我们介绍了一种基于知识的蒸馏的方法,用于稳定训练,并且使用损耗重量模块来获得更好的合成结果。最后,我们的方法通过教师模型和其他最先进的方法实现了可比的结果。
translated by 谷歌翻译
遮挡对人重新识别(Reid)构成了重大挑战。现有方法通常依赖于外部工具来推断可见的身体部位,这在计算效率和Reid精度方面可能是次优。特别是,在面对复杂的闭塞时,它们可能会失败,例如行人之间的遮挡。因此,在本文中,我们提出了一种名为M质量感知部分模型(QPM)的新方法,用于遮挡鲁棒Reid。首先,我们建议共同学习零件特征和预测部分质量分数。由于没有提供质量注释,我们介绍了一种自动将低分分配给闭塞体部位的策略,从而削弱了遮挡体零落在Reid结果上的影响。其次,基于预测部分质量分数,我们提出了一种新颖的身份感知空间关注(ISA)模块。在该模块中,利用粗略标识感知功能来突出目标行人的像素,以便处理行人之间的遮挡。第三,我们设计了一种自适应和有效的方法,用于了解来自每个图像对的共同非遮挡区域的全局特征。这种设计至关重要,但经常被现有方法忽略。 QPM有三个关键优势:1)它不依赖于培训或推理阶段的任何外部工具; 2)它处理由物体和其他行人引起的闭塞; 3)它是高度计算效率。对闭塞Reid的四个流行数据库的实验结果证明QPM始终如一地以显着的利润方式优于最先进的方法。 QPM代码将被释放。
translated by 谷歌翻译
光保护综合技术的快速进展达到了真实和操纵图像之间的边界开始模糊的临界点。最近,一个由Mega-Scale Deep Face Forgery DataSet,由290万个图像组成和221,247个视频的伪造网络已被释放。它是迄今为止的数据规模,操纵(7个图像级别方法,8个视频级别方法),扰动(36个独立和更混合的扰动)和注释(630万个分类标签,290万操纵区域注释和221,247个时间伪造段标签)。本文报告了Forgerynet-Face Forgery Analysis挑战2021的方法和结果,它采用了伪造的基准。模型评估在私人测试集上执行离线。共有186名参加比赛的参与者,11名队伍提交了有效的提交。我们将分析排名排名的解决方案,并展示一些关于未来工作方向的讨论。
translated by 谷歌翻译
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
translated by 谷歌翻译
This is paper for the smooth function approximation by neural networks (NN). Mathematical or physical functions can be replaced by NN models through regression. In this study, we get NNs that generate highly accurate and highly smooth function, which only comprised of a few weight parameters, through discussing a few topics about regression. First, we reinterpret inside of NNs for regression; consequently, we propose a new activation function--integrated sigmoid linear unit (ISLU). Then special charateristics of metadata for regression, which is different from other data like image or sound, is discussed for improving the performance of neural networks. Finally, the one of a simple hierarchical NN that generate models substituting mathematical function is presented, and the new batch concept ``meta-batch" which improves the performance of NN several times more is introduced. The new activation function, meta-batch method, features of numerical data, meta-augmentation with metaparameters, and a structure of NN generating a compact multi-layer perceptron(MLP) are essential in this study.
translated by 谷歌翻译
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, or/and sensitive to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. Then we propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies the change interval if there is a high dissimilarity score between two non-homogeneous temporal adjacent intervals. The data-dependent property and finite feature map of IDK enabled iCID to efficiently identify various types of change points in data streams with the tolerance of noise points. Moreover, the proposed online and offline versions of iCID have the ability to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
translated by 谷歌翻译