Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need for extracting features with better transferability. Though many of the latest self-supervised pre-training methods have achieved impressive performance on various vision tasks under the prevailing pretrain-finetune paradigm, their generalization capacity to multi-task learning scenarios is yet to be explored. In this paper, we extensively investigate the transfer performance of various types of self-supervised methods, e.g., MoCo and SimCLR, on three downstream tasks, including semantic segmentation, drivable area segmentation, and traffic object detection, on the large-scale driving dataset BDD100K. We surprisingly find that their performance is sub-optimal or even lags far behind the single-task baseline, which may be due to the distinction between the training objectives and architectural designs in the pretrain-finetune paradigm. To overcome this dilemma, and to avoid redesigning the resource-intensive pre-training stage, we propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training, where off-the-shelf pretrained models can be effectively adapted without increasing the training overhead. During the adapt stage, we utilize learnable multi-scale adapters to dynamically adjust the pretrained model weights supervised by multi-task objectives while leaving the pretrained knowledge untouched. Furthermore, we regard the vision-language pre-training model CLIP as a strong complement to the pretrain-adapt-finetune paradigm and propose a novel adapter named LV-Adapter, which incorporates language priors into the multi-task model via task-specific prompting and alignment between visual and textual features.
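The abstract does not spell out the adapter architecture; a minimal sketch of the adapt stage, assuming a standard bottleneck adapter added residually at each feature scale of a frozen backbone (all module and parameter names below are illustrative assumptions, not the authors' code), could look like:

```python
# Hypothetical sketch of learnable adapters on a frozen pretrained backbone,
# in the spirit of the pretrain-adapt-finetune paradigm.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a residual correction to one feature scale."""
    def __init__(self, channels: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Conv2d(channels, bottleneck_dim, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv2d(bottleneck_dim, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form leaves the pretrained features untouched by default.
        return x + self.up(self.act(self.down(x)))

class MultiScaleAdapter(nn.Module):
    """One adapter per pyramid level of the frozen backbone."""
    def __init__(self, channels_per_level):
        super().__init__()
        self.adapters = nn.ModuleList(Adapter(c) for c in channels_per_level)

    def forward(self, features):
        return [a(f) for a, f in zip(self.adapters, features)]

# Usage: freeze the backbone; only adapters (and task heads) receive
# gradients from the multi-task objectives.
backbone_features = [torch.randn(2, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
adapted = MultiScaleAdapter([256, 512, 1024])(backbone_features)
```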
Ultrasound (US) imaging is commonly used to assist in the diagnosis and interventions of spine diseases, while the standardized US acquisitions performed by manually operating the probe require substantial experience and training of sonographers. In this work, we propose a novel dual-agent framework that integrates a reinforcement learning (RL) agent and a deep learning (DL) agent to jointly determine the movement of the US probe based on real-time US images, so as to mimic the decision-making process of an expert sonographer and achieve autonomous standard view acquisition in spinal sonography. Moreover, inspired by the nature of US propagation and the characteristics of the spinal anatomy, we introduce a view-specific acoustic shadow reward that utilizes shadow information to implicitly guide the navigation of the probe toward different standard views of the spine. Our method is validated in both quantitative and qualitative experiments in a simulation environment built with US data acquired from 17 volunteers. The average navigation accuracy toward different standard views reaches 5.18 mm / 5.25° and 12.87 mm / 17.49° in the intra- and inter-subject settings, respectively. The results demonstrate that our method can effectively interpret US images and navigate the probe to acquire multiple standard views of the spine.
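The exact form of the view-specific acoustic shadow reward is not given in the abstract; purely as an illustration, one could score the overlap between the shadow segmented from the current image and a binary template of the shadow pattern expected at the target standard view (the IoU form, `shadow_mask`, and `view_template` below are assumptions):

```python
import numpy as np

def shadow_reward(shadow_mask: np.ndarray, view_template: np.ndarray) -> float:
    """Illustrative view-specific shadow reward: IoU between the acoustic
    shadow segmented from the current US image and a binary template of
    the shadow pattern expected at the target standard view."""
    inter = np.logical_and(shadow_mask, view_template).sum()
    union = np.logical_or(shadow_mask, view_template).sum()
    return float(inter) / float(union) if union > 0 else 0.0

# Toy example: a partially matching shadow yields a mid-range reward.
mask = np.zeros((128, 128), dtype=bool); mask[64:, 40:80] = True
template = np.zeros((128, 128), dtype=bool); template[64:, 60:100] = True
print(shadow_reward(mask, template))  # ~0.33
```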
Simultaneous magnetic actuation and localization (SMAL) for active wireless capsule endoscopy (WCE) has been intensively studied in recent years to improve the efficiency and accuracy of examination. In this paper, we propose an autonomous magnetic navigation framework for active WCE that mimics the "insertion" and "withdrawal" procedures performed by expert physicians in conventional colonoscopy, thereby enabling the robotic capsule endoscope to inspect the intestine efficiently and accurately with minimal user effort. First, the capsule is automatically propelled through the unknown intestinal environment and generates a viable path that represents the environment. Then, the capsule is autonomously navigated toward any point selected on the intestinal trajectory, allowing accurate and repeated inspection of suspicious lesions. Moreover, we implement the navigation framework on a robotic system incorporating advanced SMAL algorithms, and validate it in navigation through various tubular environments using phantoms and an ex-vivo pig colon. Our results demonstrate that the proposed autonomous navigation framework can effectively navigate the capsule in unknown, complex tubular environments with satisfactory accuracy, repeatability and efficiency compared with manual operation.
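As a rough illustration of the "withdrawal"-like phase, the sketch below retraces the path recorded during automatic propulsion toward a user-selected point; the nearest-waypoint logic is an assumption for illustration, not the paper's algorithm:

```python
import numpy as np

def waypoints_to_target(path: np.ndarray, capsule_pos: np.ndarray,
                        target_idx: int) -> np.ndarray:
    """Given the path recorded during automatic propulsion (N x 2 points),
    return the ordered waypoints from the capsule's closest point on the
    path to a user-selected target point (e.g., a suspicious lesion)."""
    nearest = int(np.argmin(np.linalg.norm(path - capsule_pos, axis=1)))
    if target_idx >= nearest:
        return path[nearest:target_idx + 1]
    return path[target_idx:nearest + 1][::-1]  # retrace backwards

# Toy path along a curve; navigate backwards to waypoint 2.
path = np.stack([np.linspace(0, 50, 11), np.sin(np.linspace(0, 3, 11)) * 10], axis=1)
print(waypoints_to_target(path, capsule_pos=np.array([41.0, 1.0]), target_idx=2))
```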
Wireless capsule endoscopy (WCE) as currently used is limited in terms of inspection time and flexibility, since the capsule is passively moved by peristalsis and cannot be accurately positioned. Different methods have been proposed to facilitate effective motion of WCE based on simultaneous magnetic actuation and localization technologies. In this work, we study the trajectory following problem of a robotic capsule under rotating magnetic actuation in a tubular environment, in order to realize safe, efficient and accurate inspection of the intestine at given points using a wireless capsule endoscope. Specifically, four trajectory following strategies are developed, based on a PD controller, an adaptive controller, a model predictive controller, and a robust multi-stage model predictive controller. Moreover, our methods account for the uncertainties in the intestinal environment by modeling intestinal peristalsis and friction during controller design. We validate our methods in simulation as well as in real-world experiments in various tubular environments, including plastic phantoms of different shapes and an ex-vivo pig colon. The results show that our methods can effectively actuate a reciprocally rotating capsule to follow a desired trajectory in complex tubular environments, thereby enabling accurate and repeatable inspection of the intestine for high-quality diagnosis.
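Of the four strategies, the PD controller is the simplest; a toy sketch under idealized point-mass dynamics (real systems would map the command to the rotating magnetic field, and the gains below are arbitrary assumptions) might look like:

```python
import numpy as np

def pd_follow(trajectory: np.ndarray, kp: float = 2.0, kd: float = 0.5,
              dt: float = 0.05, steps: int = 400) -> np.ndarray:
    """Toy PD trajectory-following loop: the capsule is modeled as a point
    whose commanded velocity is a PD term on the error to the current
    reference point on the desired trajectory."""
    pos = trajectory[0].copy()
    prev_err = np.zeros(2)
    history = [pos.copy()]
    for t in range(steps):
        ref = trajectory[min(t, len(trajectory) - 1)]
        err = ref - pos
        cmd = kp * err + kd * (err - prev_err) / dt  # PD control law
        pos = pos + cmd * dt                         # integrate toy dynamics
        prev_err = err
        history.append(pos.copy())
    return np.array(history)

# Follow a sinusoidal reference path; the capsule ends near the last point.
ref_path = np.stack([np.linspace(0, 20, 400), np.sin(np.linspace(0, 6, 400))], axis=1)
print(pd_follow(ref_path)[-1])
```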
Vision transformers have achieved state-of-the-art performance in many vision tasks. Due to the quadratic computational and memory complexity of self-attention, recent works either apply attention only to low-resolution inputs or restrict the receptive field to a small local region. To overcome these limitations, we propose key-only attention, which excludes query-key pairwise interactions and uses a compute-efficient saliency gate to obtain attention weights, modeling local-global interactions at all stages. Key-only attention has linear computational and memory complexity with respect to the input size. We use an alternating layout to hybridize convolution and attention layers instead of the grafting suggested by previous works, so that all stages benefit from both spatial attention and convolution. We leverage these improvements to develop a new family of self-attention models, Linglos, which reach state-of-the-art accuracy in the parameter-limited setting of the ImageNet classification benchmark and significantly outperform baselines on downstream tasks, such as COCO object detection and ADE20K semantic segmentation.
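Reconstructing from the abstract alone, key-only attention might be sketched as follows: a saliency gate scores each key, a softmax over positions turns the scores into weights, and the resulting global context is broadcast back to every token, all in linear time. The exact gating and mixing used in Linglos may differ; this is an assumed minimal form:

```python
import torch
import torch.nn as nn

class KeyOnlyAttention(nn.Module):
    """Illustrative key-only attention: attention weights come from a
    per-position saliency gate over the keys alone (no query-key dot
    products), so cost is linear in sequence length."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 1)   # saliency score per position
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim)
        k, v = self.to_k(x), self.to_v(x)
        w = self.gate(k).softmax(dim=1)          # (B, N, 1), sums to 1 over positions
        ctx = (w * v).sum(dim=1, keepdim=True)   # (B, 1, D) global context
        return self.proj(v + ctx)                # broadcast context to every token

x = torch.randn(2, 196, 256)
print(KeyOnlyAttention(256)(x).shape)  # torch.Size([2, 196, 256])
```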
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features within a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers to re-weight query features more appropriately. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can easily be extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts, e.g., boosting nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method in the 10/30-shot settings. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.
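A plausible sketch of the mask-based dynamic weighting step, assuming masked average pooling for class centers and cosine-similarity re-weighting of query features (both common choices, not confirmed details of RefT), is:

```python
import torch
import torch.nn.functional as F

def masked_class_centers(support_feats, support_masks):
    """Masked average pooling: one center per support image.
    support_feats: (K, C, H, W); support_masks: (K, H, W) binary."""
    m = support_masks.unsqueeze(1).float()  # (K, 1, H, W)
    centers = (support_feats * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1e-6)
    return centers                          # (K, C)

def reweight_query(query_feats, centers):
    """Re-weight query features by cosine similarity to class centers.
    query_feats: (B, C, H, W); each location weighted by its best match."""
    q = F.normalize(query_feats, dim=1)
    c = F.normalize(centers, dim=1)
    sim = torch.einsum('bchw,kc->bkhw', q, c)            # (B, K, H, W)
    attn = sim.max(dim=1, keepdim=True).values.sigmoid()  # (B, 1, H, W)
    return query_feats * attn

support = torch.randn(3, 64, 32, 32)
masks = torch.rand(3, 32, 32) > 0.5
query = torch.randn(2, 64, 32, 32)
print(reweight_query(query, masked_class_centers(support, masks)).shape)
```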
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how it will change how we do this work. We first discuss how AI can be used to enhance the results of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite recent efforts to alleviate this problem through several spontaneous ME datasets, the available data remain scarce. To address this ME data hunger, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. We then adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments and objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provides a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
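As one illustration of handling the class imbalance mentioned above, a standard inverse-frequency weighting of the cross-entropy loss (a generic baseline, not necessarily the scheme explored on DFME) looks like:

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class weights inversely proportional to class frequency,
    normalized so they average to 1; rare ME categories get larger weights."""
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    return counts.sum() / (num_classes * counts)

# Toy example: one emotion category heavily over-represented.
labels = torch.tensor([0] * 80 + [1] * 15 + [2] * 5)
weights = inverse_frequency_weights(labels, num_classes=3)
criterion = nn.CrossEntropyLoss(weight=weights)
print(weights)  # tensor([0.4167, 2.2222, 6.6667])
```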
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are the new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover discriminative image information by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping the contrastive learning strategy obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
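The contrastive objective is not specified in the abstract; a minimal sketch, assuming a standard InfoNCE loss over (high-quality, degraded) pairs of the same face, could be:

```python
import torch
import torch.nn.functional as F

def quality_invariance_nce(feat_hq: torch.Tensor, feat_lq: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Illustrative InfoNCE over (high-quality, low-quality) feature pairs:
    each HQ feature should match its own degraded counterpart and repel
    the other samples in the batch, encouraging quality-invariant features."""
    z1 = F.normalize(feat_hq, dim=1)
    z2 = F.normalize(feat_lq, dim=1)
    logits = z1 @ z2.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

feat_hq, feat_lq = torch.randn(8, 128), torch.randn(8, 128)
print(quality_invariance_nce(feat_hq, feat_lq))
```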
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images, etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find that the input representation plays a crucial role in robustness; specifically, different representations behave very differently under specific corruptions. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. We expect that our benchmark, comprehensive analysis, and observations can foster future research in robust LiDAR semantic segmentation for safety-critical applications.
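As an example of the "measurement noise" corruption group, a simple sketch that jitters the xyz coordinates of a scan (the severity-to-sigma mapping below is an assumption for illustration, not the benchmark's definition) is:

```python
import numpy as np

def jitter_points(points: np.ndarray, severity: int = 3,
                  seed: int = 0) -> np.ndarray:
    """Illustrative 'measurement noise' corruption: Gaussian jitter on the
    xyz coordinates of a LiDAR scan, with larger sigma at higher severity.
    points: (N, 4) array of x, y, z, intensity."""
    sigma = [0.02, 0.04, 0.06, 0.08, 0.10][severity - 1]
    rng = np.random.default_rng(seed)
    noisy = points.copy()
    noisy[:, :3] += rng.normal(0.0, sigma, size=(points.shape[0], 3))
    return noisy

scan = np.random.rand(1000, 4).astype(np.float32) * 50
print(np.abs(jitter_points(scan) - scan)[:, :3].mean())  # mean perturbation ~0.05
```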