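We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks, which assume a fully known state transition model, we design a method that removes this strong assumption, which is often extremely difficult to engineer in reality. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. By combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be expressed as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations on $2D$ obstacle avoidance and $2.5D$ terrain navigation problems. The results show that the proposed approach outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception, and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.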
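The Taylor-expansion step described above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: $R(s,a)$ is a reward, $\gamma$ a discount factor, and $\mu_a(s)$, $\Sigma_a(s)$ are the first and second (uncentered) moments of the one-step displacement $s'-s$ under action $a$.

```latex
\begin{align*}
% Bellman optimality equation (assumed discounted form)
v(s) &= \max_a \Big\{ R(s,a) + \gamma\, \mathbb{E}\big[ v(s') \mid s, a \big] \Big\}, \\
% Second-order Taylor expansion of v around s
v(s') &\approx v(s) + \nabla v(s)^{\top} (s'-s)
        + \tfrac{1}{2} (s'-s)^{\top} \nabla^2 v(s)\, (s'-s), \\
% Taking the conditional expectation leaves only two moments
\mathbb{E}\big[ v(s') \mid s, a \big] &\approx v(s) + \mu_a(s)^{\top} \nabla v(s)
        + \tfrac{1}{2} \operatorname{tr}\!\big( \Sigma_a(s)\, \nabla^2 v(s) \big), \\
\mu_a(s) &= \mathbb{E}\big[ s'-s \mid s, a \big], \qquad
\Sigma_a(s) = \mathbb{E}\big[ (s'-s)(s'-s)^{\top} \mid s, a \big], \\
% Substituting back gives a second-order PDE in v
0 &= \max_a \Big\{ R(s,a) - (1-\gamma)\, v(s) + \gamma\, \mu_a(s)^{\top} \nabla v(s)
        + \tfrac{\gamma}{2} \operatorname{tr}\!\big( \Sigma_a(s)\, \nabla^2 v(s) \big) \Big\}.
\end{align*}
```

Under these assumptions the transition model enters only through $\mu_a$ and $\Sigma_a$, which is consistent with the abstract's claim that the full transition distribution need not be known.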
translated by Google Translate
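We propose a deep neural network for removing undesirable shading features from unconstrained portrait images, thereby recovering the underlying texture. Our training scheme incorporates three regularization strategies: a masked loss, to emphasize high-frequency shading features; a soft-shadow loss, which improves sensitivity to subtle changes in lighting; and shading-offset estimation, to supervise the separation of shading and texture. Compared with state-of-the-art methods, our method demonstrates improved quality and generalization. We further demonstrate how our delighting method can enhance the performance of light-sensitive computer vision tasks (such as face relighting and semantic parsing), allowing them to handle extreme lighting conditions.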
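Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, such as image classification on CIFAR or ImageNet. As a result, the performance of NAS methods in more diverse domains is poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite for evaluating methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a variety of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far from their original development domain. To reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for the more robust NAS evaluation that NAS-Bench-360 enables: several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing several hypotheses recently promoted in the NAS literature. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.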
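We propose and study kernel conjugate gradient methods (KCGM) with random projections for least-squares regression over a separable Hilbert space. Considering two types of random projections, generated by randomized sketches and by Nyström subsampling, we prove optimal statistical results with respect to variants of norms for the algorithms under a suitable stopping rule. In particular, our results show that if the projection dimension is proportional to the effective dimension of the problem, KCGM with randomized sketches can generalize optimally while achieving a computational advantage. As a corollary, we derive optimal rates for classic KCGM in the well-conditioned regime, for the case in which the target function may not lie in the hypothesis space.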
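The idea of combining Nyström subsampling with an early-stopped conjugate gradient solver can be illustrated with a minimal sketch. This is a simplification and not necessarily the variant analyzed in the paper: it restricts the estimator to the span of `m` randomly chosen training points and runs conjugate gradient on the least-squares normal equations (CGLS) in coefficient space, with the iteration count playing the role of the stopping rule. The Gaussian kernel, the function names, and all parameters are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def nystrom_cg(X, y, m, iters, bandwidth=1.0, seed=0):
    """Early-stopped CG for least squares over a Nystrom subspace (hypothetical helper).

    Returns the m landmark points Z and coefficients c, so that the fitted
    function is f(x) = sum_j c[j] * k(x, Z[j]).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)  # uniform Nystrom subsampling
    Z = X[idx]
    A = gaussian_kernel(X, Z, bandwidth)                  # n x m design matrix

    # CGLS: conjugate gradient on A^T A c = A^T y, started from c = 0.
    c = np.zeros(m)
    r = y - A @ c          # residual in data space
    s = A.T @ r            # gradient direction in coefficient space
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):  # the iteration count acts as the regularization parameter
        q = A @ p
        qq = q @ q
        if gamma <= 1e-20 or qq <= 1e-20:
            break           # converged or direction vanished
        alpha = gamma / qq
        c += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return Z, c

def predict(X_new, Z, c, bandwidth=1.0):
    """Evaluate the fitted function at new inputs."""
    return gaussian_kernel(X_new, Z, bandwidth) @ c
```

Each iteration costs on the order of `n * m` operations instead of `n * n`, which is where the computational advantage of the projection comes from; how large `m` must be for optimal generalization is the subject of the abstract's theoretical result and is not addressed by this sketch.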
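In this paper, we study regression problems over a separable Hilbert space, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, under a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost-sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable case.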
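The three algorithm families named above can be viewed as different filter functions applied to the spectrum of the normalized kernel matrix. The following is a minimal sketch under that standard view; the Gaussian kernel, the function names, and the parameter choices are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def ridge_filter(sigma, lam):
    """Ridge (Tikhonov) regression: g(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (sigma + lam)

def cutoff_filter(sigma, lam):
    """Principal component regression: invert eigenvalues >= lam, drop the rest."""
    out = np.zeros_like(sigma)
    keep = sigma >= lam
    out[keep] = 1.0 / sigma[keep]
    return out

def landweber_filter(sigma, steps, eta=1.0):
    """Gradient descent with step eta: g(sigma) = eta * sum_{k<steps} (1 - eta*sigma)^k."""
    out = np.zeros_like(sigma)
    term = np.full_like(sigma, eta)
    for _ in range(steps):
        out += term
        term = term * (1.0 - eta * sigma)
    return out

def spectral_fit(K, y, filt):
    """Coefficients alpha = g(K/n) y / n, so that f(x) = sum_i alpha[i] * k(x, x_i)."""
    n = K.shape[0]
    sigma, U = np.linalg.eigh(K / n)     # spectrum of the normalized kernel matrix
    sigma = np.clip(sigma, 0.0, None)    # remove tiny negative round-off eigenvalues
    return U @ (filt(sigma) * (U.T @ y)) / n

# Usage: same data, three regularization schemes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0])
K = gaussian_kernel(X, X)
alpha_ridge = spectral_fit(K, y, lambda s: ridge_filter(s, 1e-2))
alpha_pcr = spectral_fit(K, y, lambda s: cutoff_filter(s, 1e-2))
alpha_gd = spectral_fit(K, y, lambda s: landweber_filter(s, 100))
```

With the ridge filter this reproduces the usual kernel ridge solution `(K + n*lam*I)^{-1} y`, and with the gradient filter it matches `steps` iterations of gradient descent on the empirical squared loss started from zero (the step size `eta=1.0` is safe here because the eigenvalues of `K/n` for a Gaussian kernel are at most 1).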
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, automatic ME recognition (MER) is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several recent spontaneous ME datasets have sought to alleviate this problem, the amount of available data remains small. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In these scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness. Specifically, under a given corruption, different representations perform differently. 2) Although state-of-the-art methods for LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.