Domain adaptation enables the learner to safely generalize into novel environments by mitigating domain shifts across distributions. Previous works may not effectively uncover the underlying reasons that would lead to the drastic model degradation on the target task. In this paper, we empirically reveal that the erratic discrimination of the target domain mainly stems from its much smaller feature norms with respect to that of the source domain. To this end, we propose a novel parameter-free Adaptive Feature Norm approach. We demonstrate that progressively adapting the feature norms of the two domains to a large range of values can result in significant transfer gains, implying that those task-specific features with larger norms are more transferable. Our method successfully unifies the computation of both standard and partial domain adaptation with more robustness against the negative transfer issue. Without bells and whistles but a few lines of code, our method substantially lifts the performance on the target task and exceeds state-of-the-arts by a large margin (11.5% on Office-Home [45] and 17.1% on VisDA2017 [31]). We hope our simple yet effective approach will shed some light on the future research of transfer learning. Code is available at https://github.com/jihanyang/AFN .
translated by 谷歌翻译
大量证据表明,深神经网络(DNN)容易受到后门攻击的影响,这激发了后门检测方法的发展。现有的后门检测方法通常是针对具有单个特定类型(例如基于补丁或基于扰动)的后门攻击而定制的。但是,在实践中,对手可能会产生多种类型的后门攻击,这挑战了当前的检测策略。基于以下事实:对抗性扰动与触发模式高度相关,本文提出了自适应扰动生成(APG)框架,以通过自适应注射对抗性扰动来检测多种类型的后门攻击。由于不同的触发模式在相同的对抗扰动下显示出高度多样的行为,因此我们首先设计了全球到本地策略,以通过调整攻击的区域和预算来适应多种类型的后门触发器。为了进一步提高扰动注入的效率,我们引入了梯度引导的掩模生成策略,以寻找最佳区域以进行对抗攻击。在多个数据集(CIFAR-10,GTSRB,Tiny-Imagenet)上进行的广泛实验表明,我们的方法以大幅度优于最先进的基线(+12%)。
translated by 谷歌翻译
在过去的几年中,从面部视频中对心脏脉搏的测量已成为对研究的有趣追求。这主要是由于以非侵入性方式获得个人心率的重要性越来越重要,这对于游戏和医疗行业的应用可能非常有用。在过去的几年中,研究的另一个工具领域是深度学习的出现,并使用深度神经网络来增强任务绩效。在这项工作中,我们建议使用有效的卷积网络来准确测量低分辨率面部视频的用户心率。此外,为了确保我们能够实时获得心律,我们通过修剪深度学习模型来压缩深度学习模型,从而减少其内存足迹。我们在MAHNOB数据集上基准了方法的性能,并在多种方法中比较了其性能。
translated by 谷歌翻译
面部视频中心率的估计在医疗和健身行业中有许多应用。此外,它在游戏领域也变得有用。已经提出了几种方法,可以从面部视频中无缝获得心率,但是这些方法在处理运动和照明工件方面存在问题。在这项工作中,我们使用用户的光谱反射率提出了一个可靠的人力资源估计框架,这使运动和照明干扰变得强大。我们采用基于学习的深度框架,例如更快的RCNNS来执行面部检测,而不是先前方法使用的中提琴琼斯算法。我们在Mahnob HCI数据集上评估了我们的方法,发现所提出的方法能够超越先前的方法。从面部视频中估计心率在医疗和健身行业中有许多应用。此外,它在游戏领域也变得有用。已经提出了几种方法,可以从面部视频中无缝获得心率,但是这些方法在处理运动和照明工件方面存在问题。在这项工作中,我们使用用户的光谱反射率提出了一个可靠的人力资源估计框架,这使运动和照明干扰变得强大。我们采用基于学习的深度框架,例如更快的RCNNS来执行面部检测,而不是先前方法使用的中提琴算法。我们在MAHNOB HCI数据集上评估了我们的方法,发现所提出的方法能够超过以前的方法。
translated by 谷歌翻译
自动摘要方法是有效的,但可能患有低质量。相比之下,手动摘要很昂贵,但质量更高。人类和人工智能可以协作以提高总结性能吗?在类似的文本生成任务(例如机器翻译)中,人类AI合作的形式是“后编辑” AI生成的文本,可减少人类的工作量并提高AI输出的质量。因此,我们探讨了邮政编辑是否提供文本摘要中的优势。具体来说,我们对72名参与者进行了实验,将提供的后编辑摘要与手动摘要进行了摘要,以摘要质量,人为效率和用户在正式新闻(XSUM新闻)和非正式(REDDIT帖子)文本方面进行了比较。这项研究对何时编辑的文本摘要提供了宝贵的见解:在某些情况下(例如,何时参与者缺乏领域知识),但在其他情况下却没有帮助(例如,何时提供的摘要包括不准确的信息)。参与者的不同编辑策略和援助需求为未来的人类摘要系统提供了影响。
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
translated by 谷歌翻译