Smart Paper Notes

Multistep Electric Vehicle Charging Station Occupancy Prediction using Hybrid LSTM Neural Networks

Tai-Yu Ma , Sébastien Faye

Categories: Machine Learning | Neural and Evolutionary Computing

2021-06-09

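Predicting the occupancy of public charging stations is important for developing smart charging strategies that reduce inconvenience for electric vehicle (EV) operators and users. However, existing studies rely mainly on conventional econometric or time series methods with limited accuracy. We propose a new mixed long short-term memory (LSTM) neural network that incorporates both historical charging state sequences and time-related features for multistep discrete charging occupancy state prediction. Unlike existing LSTM networks, the proposed model separates the different types of features and processes them with a mixed neural network architecture. The model is compared with a number of state-of-the-art machine learning and deep learning approaches using EV charging data obtained from the open data portal of the city of Dundee, UK. The results show that the proposed method produces very accurate predictions (99.99% and 81.87% for 1 step (10 minutes) and 6 steps (1 hour) ahead, respectively) and outperforms the benchmark approaches (+22.4% for one-step-ahead prediction and +6.2% for six-step-ahead prediction). A sensitivity analysis is conducted to evaluate the impact of the model parameters on prediction accuracy.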

translated by Google Translate

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Chan Hee Song , Jihyung Kil , Tai-Yu Pan , Brian M. Sadler , Wei-Lun Chao , Yu Su

Categories: Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning | Robotics

2022-02-14

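We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for short-horizon tasks. However, when it comes to long-horizon tasks with extended action sequences, an agent can easily ignore some instructions or get stuck in the middle of long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-Track) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones, which the agent needs to complete step by step, and a milestone checker that systematically checks the agent's progress on its current milestone and determines when to proceed to the next one. On the challenging ALFRED dataset, M-Track yields notable relative improvements of 33% and 52% in unseen success rate over two competitive base models.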

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Tai-Yu Pan , Cheng Zhang , Yandong Li , Hexiang Hu , Dong Xuan , Soravit Changpinyo , Boqing Gong , Wei-Lun Chao

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-07-05

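Vanilla models for object detection and instance segmentation suffer from a heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. In this paper, we investigate a largely overlooked approach: post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that handling the background class separately and normalizing the scores over classes for each proposal are key to achieving superior performance. On the LVIS dataset, NorCal can effectively improve the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into the various modeling choices and mechanisms of our approach. Our code is publicly available at https://github.com/tydpan/norcal/.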
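
The recipe described in this abstract (reweigh each class score by its training sample size, keep the background class separate, renormalize per proposal) might look roughly like the sketch below. It is an illustration only, not the authors' code: the function name `calibrate_scores`, the exponent `gamma`, and the exact form of the normalization are assumptions that the abstract does not specify.

```python
def calibrate_scores(fg_scores, bg_score, class_counts, gamma=1.0):
    """Reweigh one proposal's foreground scores by training sample size.

    fg_scores:    per-class foreground scores for a single proposal
    bg_score:     background score for the same proposal
    class_counts: number of training samples per foreground class
    gamma:        hypothetical strength knob (assumption, not from the abstract)
    """
    if len(fg_scores) != len(class_counts):
        raise ValueError("need one training count per foreground class")

    # Down-weight each foreground class by (training count) ** gamma,
    # so frequent classes lose more score than rare ones.
    scaled = [s / (n ** gamma) for s, n in zip(fg_scores, class_counts)]

    # The background score is handled separately: it is not rescaled,
    # but it still takes part in the per-proposal normalization.
    total = sum(scaled) + bg_score

    calibrated_fg = [s / total for s in scaled]
    calibrated_bg = bg_score / total
    return calibrated_fg, calibrated_bg


if __name__ == "__main__":
    # A frequent class (1000 samples) versus a rare class (10 samples).
    fg, bg = calibrate_scores([0.6, 0.2], 0.2, [1000, 10], gamma=0.5)
    print(fg, bg)
```

With `gamma=0` the scores are returned unchanged (up to normalization), which makes the effect of the reweighing easy to compare against the uncalibrated baseline.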

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

Categories: Computer Vision

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.

KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations

Wei Xiong , Muyuan Ma , Pei Sun , Yang Tian

Categories: Machine Learning

2023-01-03

Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

Categories: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.

A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

Categories: Computer Vision | Artificial Intelligence

2023-01-03

Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is forced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.

P3DC-Shot: Prior-Driven Discrete Data Calibration for Nearest-Neighbor Few-Shot Classification

Shuangmei Wang , Rui Ma , Tieru Wu , Yang Cao

Categories: Computer Vision

2023-01-02

Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning based method can outperform, or is at least comparable to, SOTA methods which need additional learning steps.
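
To make the two steps in this abstract concrete, the sketch below first pulls a support feature toward the base-class prototypes it resembles and then classifies a query by its nearest support class. The abstract does not give the calibration formula, so the softmax-weighted mixing rule, the parameters `alpha` and `temperature`, and all function names are assumptions chosen for illustration; they should not be read as the P3DC-Shot method itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def calibrate(support, base_prototypes, alpha=0.5, temperature=1.0):
    """Mix a support feature with a similarity-weighted blend of base prototypes.

    alpha and temperature are hypothetical knobs (assumptions).
    """
    sims = [cosine(support, p) / temperature for p in base_prototypes]
    # Softmax over similarities, shifted by the max for numerical stability.
    peak = max(sims)
    weights = [math.exp(s - peak) for s in sims]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    # Prior feature: weighted average of the base prototypes.
    prior = [
        sum(w * p[i] for w, p in zip(weights, base_prototypes))
        for i in range(len(support))
    ]
    return [(1 - alpha) * s + alpha * q for s, q in zip(support, prior)]


def nearest_class(query, supports):
    """Return the label whose support feature is most similar to the query."""
    return max(supports, key=lambda label: cosine(query, supports[label]))


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]  # toy base-class prototypes
    raw = {"cat": [1.0, 0.1], "dog": [0.1, 1.0]}  # one support sample per class
    calibrated = {label: calibrate(f, base) for label, f in raw.items()}
    print(nearest_class([0.9, 0.2], calibrated))
```

Setting `alpha=0` disables the calibration and reduces the sketch to plain nearest-neighbor classification, which is a convenient baseline for comparison.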

Model-Driven Deep Learning for Non-Coherent Massive Machine-Type Communications

Zhe Ma , Wen Wu , Feifei Gao , Xuemin Shen

Categories: Machine Learning

2023-01-02

In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.
