基于会话的新闻推荐系统将下一个新闻推荐给用户,通过建模她/他在会话中嵌入的一系列新闻读/点击的潜在兴趣。通常,用户的兴趣是多种多样的,即有多种兴趣,与不同类型的新闻相对应,例如在会话中,有关独特主题的新闻。 %对这种多重兴趣进行建模对于精确的新闻建议至关重要。但是,大多数现有方法通常忽略了如此重要的特征,因此无法区分和建模用户的潜在多重兴趣,从而阻碍了下一个新闻的准确建议。因此,本文提出了新闻推荐的多功能新闻序列(分钟)模型。在几分钟内,设计了一个基于自我注意的新闻编码器,以学习每个新闻的信息嵌入信息,然后设计出一个新颖的平行兴趣网络,以提取新闻顺序中嵌入的潜在多重兴趣,以准备下一步 - 新建议。现实世界中数据集的实验结果表明,我们的模型可以比最新的模型获得更好的性能。
translated by 谷歌翻译
面部美容预测(FBP)旨在开发一种机器,自动制作面部吸引力评估。在过去的情况下,结果与人类评分高度相关,因此也与注释的偏差相同。由于人工智能可以具有种族主义和歧视性倾向,必须识别数据中偏差的原因。培训数据的开发和对抗偏见信息具有强大的算法是科学家的新挑战。随着审美判断通常偏见,我们希望进一步迈出一步,并为FBP提出一个非偏见的卷积神经网络。虽然可以从道德角度创建可以对脸部的吸引力的网络模型,从道德的角度来看,它同样重要的是要确保模型是无偏的。在这项工作中,我们引入了美学,最先进的吸引力预测网络,这显着优于竞争对手0.9601的Pearson相关性。此外,我们提出了一种新的方法,用于产生无偏见的CNN,以改善机器学习中的公平性。
translated by 谷歌翻译
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, a stylized one is hoped that retains the content from the former while catching style patterns from the latter. However, it is difficult to simultaneously keep well the trade-off between the content details and the style features. To stylize the image with sufficient style patterns, the content details may be damaged and sometimes the objects of images can not be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer and an edge loss which can enhance the content details apparently to avoid generating blurred results for excessive rendering on style features. Qualitative and quantitative experiments demonstrate that STT achieves comparable performance to state-of-the-art image style transfer methods while alleviating the content leak problem.
translated by 谷歌翻译
In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for the person re-identification task. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two novel designed proxy embedding modules to address the above issue. Specifically, MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving the holistic perception of the input person. We combine the outputs of all the stages for the final identification. In practice, to save the computational cost, the Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct the self-attention operations along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages. All of them are realized by employing newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is also introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting the informative and discriminative information from the videos, and illustrate the MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. Motivated by the fact that informative point cloud features should be able to encode rich geometry and appearance cues and render realistic images, we train a point-cloud encoder within a devised point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but low-level tasks like 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.
translated by 谷歌翻译
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is the key in minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
translated by 谷歌翻译
The traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problems has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
translated by 谷歌翻译
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We proposed PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying the self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing in which way the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
translated by 谷歌翻译
Face forgery detection plays an important role in personal privacy and social security. With the development of adversarial generative models, high-quality forgery images become more and more indistinguishable from real to humans. Existing methods always regard as forgery detection task as the common binary or multi-label classification, and ignore exploring diverse multi-modality forgery image types, e.g. visible light spectrum and near-infrared scenarios. In this paper, we propose a novel Hierarchical Forgery Classifier for Multi-modality Face Forgery Detection (HFC-MFFD), which could effectively learn robust patches-based hybrid domain representation to enhance forgery authentication in multiple-modality scenarios. The local spatial hybrid domain feature module is designed to explore strong discriminative forgery clues both in the image and frequency domain in local distinct face regions. Furthermore, the specific hierarchical face forgery classifier is proposed to alleviate the class imbalance problem and further boost detection performance. Experimental results on representative multi-modality face forgery datasets demonstrate the superior performance of the proposed HFC-MFFD compared with state-of-the-art algorithms. The source code and models are publicly available at https://github.com/EdWhites/HFC-MFFD.
translated by 谷歌翻译