We present HashEncoding, a novel autoencoding architecture that leverages a non-parametric multiscale coordinate hash function to facilitate a per-pixel decoder without convolutions. By leveraging the space-folding behaviour of hashing functions, HashEncoding allows for an inherently multiscale embedding space that remains much smaller than the original image. As a result, the decoder requires very few parameters compared with decoders in traditional autoencoders, approaching a non-parametric reconstruction of the original image and allowing for greater generalizability. Finally, by allowing backpropagation directly to the coordinate space, we show that HashEncoding can be exploited for geometric tasks such as optical flow.
translated by 谷歌翻译
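As a rough illustration of the multiscale coordinate hashing described in the HashEncoding abstract, the Python sketch below quantises a pixel coordinate at several grid scales and hashes each grid cell into a small fixed-size table. The XOR-with-prime hash, the cell sizes, the table size, and the names `hash_cell` and `multiscale_indices` are all assumptions chosen for illustration; the abstract does not specify the actual hash function.

```python
# Illustrative multiscale coordinate hash (assumed form, not the paper's exact function).
PRIMES = (1, 2654435761)  # per-axis multipliers often used in spatial hashing


def hash_cell(ix: int, iy: int, table_size: int) -> int:
    """Map an integer grid cell to a slot in a fixed-size table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size


def multiscale_indices(x, y, cell_sizes=(1, 2, 4, 8), table_size=1024):
    """Return one table index per scale for the pixel coordinate (x, y)."""
    indices = []
    for cell in cell_sizes:
        # Quantise the coordinate to this scale's grid before hashing.
        ix, iy = int(x // cell), int(y // cell)
        indices.append(hash_cell(ix, iy, table_size))
    return indices
```

Because the table is much smaller than the number of grid cells, distinct cells can share a slot, which is one possible reading of the "space-folding" behaviour the abstract mentions. Pixels that fall in the same coarse cell share that scale's index while still differing at finer scales.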
Time-series anomaly detection is an important task that has been widely applied in industry. Because manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining a considerable number of labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions, because time-series data is numerically continuous and difficult to interpret. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning while generating labeling functions automatically from only a few labeled data points. These techniques are complementary and reinforce one another. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system outperforms existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
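To make the idea of labeling functions concrete, the Python sketch below shows two hand-written heuristic rules for a time series and a simple majority vote that combines them into weak labels. It is only an illustration of the general weak-supervision pattern: LEIAD generates its labeling functions automatically, and the rules, thresholds, and names used here (`lf_zscore`, `lf_jump`, `majority_vote`, `weak_labels`) are hypothetical.

```python
from statistics import mean, pstdev

ABSTAIN, NORMAL, ANOMALY = -1, 0, 1


def lf_zscore(series, i, threshold=3.0):
    """Vote ANOMALY when point i lies far from the series mean (assumed rule)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return ABSTAIN
    return ANOMALY if abs(series[i] - mu) / sigma > threshold else NORMAL


def lf_jump(series, i, max_jump=5.0):
    """Vote ANOMALY when point i jumps sharply from its predecessor (assumed rule)."""
    if i == 0:
        return ABSTAIN
    return ANOMALY if abs(series[i] - series[i - 1]) > max_jump else NORMAL


def majority_vote(votes):
    """Combine non-abstaining votes; ties resolve to NORMAL."""
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return ANOMALY if 2 * sum(votes) > len(votes) else NORMAL


def weak_labels(series, lfs):
    """Apply every labeling function to every point and aggregate the votes."""
    return [majority_vote([lf(series, i) for lf in lfs]) for i in range(len(series))]
```

For example, `weak_labels([1.0] * 20 + [50.0] + [1.0] * 20, [lf_zscore, lf_jump])` marks only the spike as anomalous: the point after the spike triggers the jump rule but not the z-score rule, and the tie resolves to normal.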
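Recent work on multimodal Transformers has improved Visually Rich Document Understanding (VRDU) tasks by incorporating visual and textual information. However, existing approaches mainly focus on fine-grained elements such as words and document image patches, which makes it hard for them to learn from coarse-grained elements, including natural lexical units such as phrases and salient visual regions such as prominent image regions. In this paper, we attach more importance to coarse-grained elements, which contain high-density information and consistent semantics and are therefore valuable for document understanding. First, a document graph is proposed to model the complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method. Then, a multi-grained multimodal Transformer called mmLayout is proposed to incorporate coarse-grained information into existing pre-trained fine-grained multimodal Transformers based on the graph. In mmLayout, coarse-grained information is aggregated from fine-grained information and, after further processing, is fused back into the fine-grained representation for the final prediction. In addition, common-sense enhancement is introduced to exploit the semantic information of natural lexical units. Experimental results on four tasks, including information extraction and document question answering, show that our method improves the performance of multimodal Transformers based on fine-grained elements and achieves better performance with fewer parameters. Qualitative analyses show that our method can capture consistent semantics in coarse-grained elements.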
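Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we focus on modeling traffic congestion propagation patterns to improve ETA performance. Modeling traffic congestion propagation patterns is challenging: it requires accounting for how the regions of influence change over time and for the cumulative effect of delay variations over time caused by traffic events on the road network. In this paper, we present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns and develop a route-aware graph transformer to directly learn the long-distance correlations of road segments. This design enables DuETA to capture the interactions between pairs of road segments that are spatially distant but highly correlated in traffic conditions. Extensive experiments are conducted on large-scale real-world datasets collected from Baidu Maps. Experimental results show that ETA prediction can benefit significantly from the learned traffic congestion propagation patterns. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.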
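Pre-trained models (PTMs) have become a fundamental backbone for downstream tasks in natural language processing and computer vision. Despite the initial gains obtained by applying generic PTMs to geo-related tasks at Baidu Maps, performance plateaued over time. One of the main reasons for this plateau is the lack of readily available geographic knowledge in generic PTMs. To address this problem, in this paper we present ERNIE-GeoL, a geography-and-language pre-trained model designed and developed to improve geo-related tasks at Baidu Maps. ERNIE-GeoL is carefully designed to learn a universal representation of geography-language by pre-training on large-scale data generated from a heterogeneous graph that contains abundant geographic knowledge. Extensive quantitative and qualitative experiments conducted on large-scale real-world datasets demonstrate the superiority and effectiveness of ERNIE-GeoL. ERNIE-GeoL has been deployed in production at Baidu Maps since April 2021, significantly benefiting the performance of various downstream tasks. This demonstrates that ERNIE-GeoL can serve as a fundamental backbone for a wide range of geo-related tasks.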
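In recent years, owing to their outstanding performance in graph representation learning, graph neural network (GNN) techniques have attracted considerable interest in many real-world scenarios, such as recommender systems and social networks. In recommender systems, the main challenge is to learn effective user/item representations from their interactions. However, the many publications that use GNNs for recommender systems are difficult to compare, because they differ in datasets and evaluation metrics. Furthermore, many of them only provide a demo for experiments on small datasets, which is far from applicable to real-world recommender systems. To address this problem, we introduce Graph4Rec, a universal toolkit that unifies the training of GNN models into the following parts: graph input, random walk generation, ego graph generation, pair generation, and GNN selection. With this training pipeline, one can easily build a custom GNN model with a few configurations. In addition, we develop a large-scale graph engine and a parameter server to support distributed GNN training. We conduct systematic and comprehensive experiments to compare the performance of different GNN models in several scenarios at different scales. Extensive experiments are conducted to identify the key components of GNNs. We also investigate how sparse and dense parameters affect the performance of GNNs. Finally, we study methods including negative sampling, ego graph construction order, and warm-start strategies to find more effective and efficient practices for GNNs in recommender systems. Our toolkit is based on PGL (HTTPS://github.com/paddlePaddle/pgl), and the code is open-sourced at https://github.com/paddlepaddle/pgl/tree/main/apps/graph4rec.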
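Two of the pipeline stages named in the Graph4Rec abstract, random walk generation and pair generation, can be sketched in a few lines of Python. This is a generic skip-gram-style illustration under assumed conventions (an adjacency-list dictionary as the graph input, uniform neighbour sampling, a symmetric context window); the function names `random_walk` and `generate_pairs` are hypothetical and do not represent the Graph4Rec or PGL API.

```python
import random


def random_walk(graph, start, length, rng):
    """Walk up to `length` nodes from `start`, choosing neighbours uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk


def generate_pairs(walk, window):
    """Emit (source, context) pairs for nodes within `window` steps of each other."""
    pairs = []
    for i, src in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((src, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy user-item interaction graph (illustrative data only).
toy_graph = {"u1": ["i1", "i2"], "i1": ["u1"], "i2": ["u1"]}
toy_walk = random_walk(toy_graph, "u1", 5, random.Random(0))
toy_pairs = generate_pairs(toy_walk, window=1)
```

The resulting pairs would typically serve as positive examples for training node embeddings, with negatives drawn by a separate sampling strategy such as those the abstract mentions.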
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, automatic ME recognition (MER) is becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite recent efforts to alleviate this problem with several spontaneous ME datasets, the amount of available data remains small. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In surveillance scenes, low image resolution and noise interference are new challenges for FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) an Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.