无人机尚未完全信任。他们对导航的无线电和摄像机的依赖提高了安全性和隐私问题。这些系统可能会失败,导致事故,或滥用未经授权的录音。考虑到最近的法规,允许商业无人机仅在晚上运营,我们提出了一种从完全新的方法,无人机从人工照明中获得导航信息。在我们的系统中,标准灯泡调制其强度发送信标,无人机用简单的光电二极管解码此信息。该光学信息与无人机中的惯性和高度传感器组合,以提供定位,而无需无线电,GPS或相机。我们的框架是第一个提供3D无人机定位的灯光,我们用一个由四个光标记和迷你无人机组成的试验台来评估它。我们表明,我们的方法允许将无人机定位在实际位置的几个小叠内,并与最先进的定位方法相比,将本地化误差降低42%。
translated by 谷歌翻译
In this paper, we explore the relationship between an individual's writing style and the risk that they will engage in online harmful behaviors (such as cyberbullying). In particular, we consider whether measurable differences in writing style relate to different personality types, as modeled by the Big-Five personality traits and the Dark Triad traits, and can differentiate between users who do or do not engage in harmful behaviors. We study messages from nearly 2,500 users from two online communities (Twitter and Reddit) and find that we can measure significant personality differences between regular and harmful users from the writing style of as few as 100 tweets or 40 Reddit posts, aggregate these values to distinguish between healthy and harmful communities, and also use style attributes to predict which users will engage in harmful behaviors.
translated by 谷歌翻译
学习一种新语言涉及不断比较语音作品与环境的参考作品。在言语获取的早期,孩子们进行了发音调整以符合他们的看护人的言论。一种语言的成年学习者调整他们的演讲以匹配导师参考。本文提出了一种合成产生正确的发音反馈的方法。此外,我们的目标是在保持演讲者的原始声音的同时产生校正后的生产。该系统提示用户发音短语。记录语音,并用与不准确音素相关的样品用零掩盖。该波形是对语音生成器的输入,作为具有U-NET体系结构的深度学习介绍系统实现,并经过培训以输出重建的语音。该训练集由未损坏的适当语音示例组成,并且对发电机进行了训练以重建原始的适当语音。我们评估了系统的性能在音素替代英语以及发音障碍儿童的最小对单词方面的性能。结果表明,人类听众稍微偏爱我们产生的语音,而不是用不同的扬声器的生产来平滑地替换音素。
translated by 谷歌翻译
声带煎炸或吱吱作响的声音是指以不规则的发光开口和低音为特征的语音质量。它以各种语言发生,并且在美国英语中很普遍,不仅可以标记词组结局,还用于社会语言因素和影响。由于其不规则的周期性,吱吱作响的声音挑战自动语音处理和识别系统,尤其是对于经常使用吱吱作响的语言。本文提出了一个深度学习模型,以检测流利的语音中的吱吱作响的声音。该模型由编码器和经过训练的分类器组成。编码器采用原始波形,并使用卷积神经网络学习表示。分类器被实现为多头完全连接的网络,该网络训练有素,可检测吱吱作响的声音,发声和音调,最后两个用于完善吱吱作响的预测。该模型经过对美国英语说话者的言语的培训和测试,并由训练有素的语音家注释。我们使用两个编码器评估了系统的性能:一个是为任务量身定制的,另一个是基于最新的无监督表示。结果表明,与看不见的数据相比,我们表现最佳的系统的回忆和F1得分有所改善。
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译