translated by 谷歌翻译
Conventional video compression approaches use a predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both the classical architecture of conventional video compression and the powerful nonlinear representation ability of neural networks, we propose the first end-to-end deep video compression model that jointly optimizes all the components for video compression. Specifically, learning-based optical flow estimation is used to obtain the motion information and reconstruct the current frames. We then employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and is even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.
Figure: (a) Original frame (Bpp/MS-SSIM); (b) H.264 (0.0540 Bpp/0.945); (c) H.265 (0.082 Bpp/0.960); (d) Ours (0.0529 Bpp/0.961).
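The single loss described in this abstract can be read as a rate-distortion objective. The following is a minimal sketch, assuming the common form `lam * distortion + rate`, with mean squared error as the distortion and bits per pixel as the rate. The function names, the default value of `lam`, and the toy frames are illustrative assumptions and are not taken from the released code.

```python
import math

def mse(original, decoded):
    """Mean squared error between two equally sized frames (lists of rows)."""
    diffs = [(o - d) ** 2
             for row_o, row_d in zip(original, decoded)
             for o, d in zip(row_o, row_d)]
    return sum(diffs) / len(diffs)

def psnr(original, decoded, peak=1.0):
    """PSNR in dB for pixel values in [0, peak]; infinite for identical frames."""
    err = mse(original, decoded)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def rate_distortion_loss(original, decoded, motion_bits, residual_bits, lam=256.0):
    """Hypothetical joint loss: lam * distortion + rate (assumed form).

    motion_bits and residual_bits stand for the bit counts produced by the
    motion and residual compression networks, respectively.
    """
    num_pixels = sum(len(row) for row in original)
    # Rate: bits spent on motion plus residual, normalised to bits per pixel.
    rate_bpp = (motion_bits + residual_bits) / num_pixels
    # Distortion: reconstruction error of the decoded frame.
    return lam * mse(original, decoded) + rate_bpp

# Toy 2x2 frames with pixel values in [0, 1].
frame = [[0.0, 1.0], [0.5, 0.5]]
recon = [[0.0, 0.5], [0.5, 0.5]]
loss = rate_distortion_loss(frame, recon, motion_bits=2, residual_bits=2, lam=16.0)
```

A larger `lam` favours quality over bitrate, so in a setup of this kind sweeping `lam` would trace out different operating points on the rate-distortion curve.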
Face forgery detection plays an important role in personal privacy and social security. With the development of adversarial generative models, high-quality forged images are becoming increasingly indistinguishable from real ones to human observers. Existing methods typically treat forgery detection as an ordinary binary or multi-label classification task and neglect the diversity of multi-modality forgery image types, e.g. visible light spectrum and near-infrared scenarios. In this paper, we propose a novel Hierarchical Forgery Classifier for Multi-modality Face Forgery Detection (HFC-MFFD), which effectively learns a robust patch-based hybrid domain representation to enhance forgery authentication in multi-modality scenarios. The local spatial hybrid domain feature module is designed to explore strongly discriminative forgery clues in both the image and frequency domains within distinct local face regions. Furthermore, a dedicated hierarchical face forgery classifier is proposed to alleviate the class imbalance problem and further boost detection performance. Experimental results on representative multi-modality face forgery datasets demonstrate the superior performance of the proposed HFC-MFFD compared with state-of-the-art algorithms. The source code and models are publicly available at https://github.com/EdWhites/HFC-MFFD.
Network traffic classification is the basis of many network security applications and has attracted considerable attention in the field of cyberspace security. Existing network traffic classification methods based on convolutional neural networks (CNNs) often emphasize local patterns in traffic data while ignoring global information associations. In this paper, we propose an MLP-Mixer based multi-view multi-label neural network for network traffic classification. Compared with existing CNN-based methods, our method adopts the MLP-Mixer structure, which matches the structure of a packet more closely than the conventional convolution operation. In our method, each packet is divided into the packet header and the packet body, which, together with the flow features of the packet, serve as inputs from different views. We use a multi-label setting to learn different scenarios simultaneously, improving classification performance by exploiting the correlations between scenarios. Building on these characteristics, we propose an end-to-end network traffic classification method. We conduct experiments on three public datasets, and the experimental results show that our method achieves superior performance.
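Providing constant-quality streams can simultaneously guarantee user experience and prevent wasted bitrate. In this paper, we propose a novel deep learning-based two-pass encoder parameter prediction framework that decides the rate factor (RF) with which the encoder can output constant-quality streams. For each single-shot segment in a video, the proposed method first extracts spatial, temporal, and pre-coding features through an ultra-fast pre-process. Based on these features, a deep neural network predicts the RF parameter. The video encoder uses this RF to compress the segment as the first encoding pass. The VMAF quality of the first-pass encoding is then measured. If the quality does not meet the target, a second-pass RF prediction and encoding are performed. With the first-pass predicted RF and the corresponding actual quality as feedback, the second-pass prediction is highly accurate. Experiments show that the proposed method requires only 1.55 times the encoding complexity on average, while the accuracy, defined as the actual VMAF of the compressed video falling within $\pm 1$ of the target VMAF, reaches 98.88%.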
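The two-pass procedure in this abstract can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_rf` and `encode_and_measure` are hypothetical callables standing in for the neural predictor and for the encoder plus VMAF measurement, and the `tolerance` of 1 VMAF point is borrowed from the accuracy criterion as an assumed acceptance test. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, Optional, Tuple

Feedback = Optional[Tuple[float, float]]  # (first-pass RF, measured VMAF)

def two_pass_encode(
    features,
    target_vmaf: float,
    predict_rf: Callable[[object, float, Feedback], float],
    encode_and_measure: Callable[[float], float],
    tolerance: float = 1.0,
) -> Tuple[float, float, int]:
    """Return (rf, measured_vmaf, passes_used) for one video segment."""
    # First pass: predict RF from the pre-extracted features only.
    rf = predict_rf(features, target_vmaf, None)
    vmaf = encode_and_measure(rf)
    if abs(vmaf - target_vmaf) <= tolerance:
        return rf, vmaf, 1
    # Second pass: feed back the first-pass RF and its actual quality.
    rf2 = predict_rf(features, target_vmaf, (rf, vmaf))
    vmaf2 = encode_and_measure(rf2)
    return rf2, vmaf2, 2

# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_encode_and_measure(rf: float) -> float:
    # Pretend quality falls linearly as RF increases.
    return 100.0 - 2.0 * rf

def toy_predict_rf(features, target_vmaf: float, feedback: Feedback) -> float:
    if feedback is None:
        return 23.0  # fixed first guess
    first_rf, first_vmaf = feedback
    # Correct the first guess using the observed quality gap.
    return first_rf + (first_vmaf - target_vmaf) / 2.0

result = two_pass_encode(None, 90.0, toy_predict_rf, toy_encode_and_measure)
```

In this toy run the first pass misses the target, so the second pass uses the feedback pair to correct the RF. A segment whose first-pass quality already falls within the tolerance is encoded only once, which is consistent with the reported average complexity staying below two full encodes.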
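Face attribute evaluation plays an important role in video surveillance and face analysis. Although methods based on convolutional neural networks have made great progress, they inevitably deal with only one local neighborhood at a time. Moreover, existing methods mostly treat face attribute evaluation as an isolated multi-label classification task and ignore the inherent relationship between semantic attributes and face identity information. In this paper, we propose a novel \textbf{trans}former-based representation for \textbf{f}ace \textbf{a}ttribute evaluation (\textbf{TransFA}), which effectively enhances discriminative attribute representation learning in the context of the attention mechanism. A multi-branch transformer is employed to explore the inter-correlations between different attributes in similar semantic regions for attribute feature learning. In particular, a hierarchical identity-constraint attribute loss is designed to train the end-to-end architecture, which further integrates discriminative face identity information to boost performance. Experimental results on multiple face attribute benchmarks demonstrate that the proposed TransFA achieves superior performance compared with state-of-the-art methods.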
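In recent years, with the rapid development of face editing and generation, more and more fake videos are circulating on social media, which has caused extreme public concern. Existing frequency-domain face forgery detection methods find that GAN-forged images show obvious grid-like visual artifacts in the frequency spectrum compared with real images. For synthesized videos, however, these methods are confined to single frames and pay little attention to the most discriminative parts and to the temporal frequency clues across different frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection in both the spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation. FCAN-DCT consists of a backbone network and two branches: a Compact Feature Extraction (CFE) module and a Frequency Temporal Attention (FTA) module. We conduct thorough experimental assessments on two visible light (VIS) datasets, WildDeepfake and Celeb-DF (v2), and on our self-built video forgery dataset DeepfakeNIR, which is the first video forgery dataset in the near-infrared modality. The experimental results demonstrate the effectiveness of our method in detecting forged videos in both VIS and NIR scenarios.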
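To make the notion of a temporal frequency clue concrete, the sketch below applies an unnormalised one-dimensional DCT-II to the intensity of a single pixel tracked across frames. It illustrates only the transform itself, not the FCAN-DCT architecture, and the function name and toy signals are assumptions made for this example.

```python
import math

def dct_ii(signal):
    """Unnormalised DCT-II: X[k] = sum_i x[i] * cos(pi * (i + 0.5) * k / n)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]

# One pixel's intensity over four consecutive frames (toy values).
steady = [1.0, 1.0, 1.0, 1.0]    # temporally stable pixel
flicker = [1.0, 0.0, 1.0, 0.0]   # frame-to-frame flicker

steady_coeffs = dct_ii(steady)
flicker_coeffs = dct_ii(flicker)
```

For the steady pixel all energy sits in the zero-frequency coefficient, whereas the flickering pixel puts noticeable energy into the highest-frequency coefficient. Frame-to-frame inconsistency of this kind is hard to see from any single frame, which is the motivation for analysing frequency along the time axis as well.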