Smart Paper Notes

A Hyperspectral and RGB Dataset for Building Facade Segmentation

Nariman Habili , Ernest Kwan , Weihao Li , Christfried Webers , Jeremy Oorloff , Mohammad Ali Armin , Lars Petersson

Category: Computer Vision

2022-12-06

Hyperspectral Imaging (HSI) provides detailed spectral information and has been utilised in many real-world applications. This work introduces an HSI dataset of building facades in a light industry environment with the aim of classifying different building materials in a scene. The dataset is called the Light Industrial Building HSI (LIB-HSI) dataset. This dataset consists of nine categories and 44 classes. In this study, we investigated deep learning based semantic segmentation algorithms on RGB and hyperspectral images to classify various building materials, such as timber, brick and concrete.

Translated by Google Translate

Combating Uncertainty and Class Imbalance in Facial Expression Recognition

Jiaxiang Fan , Jian Zhou , Xiaoyu Deng , Huabin Wang , Liang Tao , Hon Keung Kwan

Category: Computer Vision

2022-12-15

Facial expression recognition is a challenging problem in computer vision. The primary reasons are class imbalance arising from data collection and uncertainty arising from inherent noise, such as ambiguous facial expressions and inconsistent labels. However, current research has focused on either class imbalance or uncertainty, ignoring how to address the two problems together. Therefore, in this paper, we propose a framework based on ResNet and attention to solve both problems. We design a weight for each class. Through this penalty mechanism, our model pays more attention to learning from under-represented classes during training, and the resulting decrease in model accuracy is offset by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network also learns an uncertainty feature for each sample. By mixing uncertainty features between samples, the model can better learn the features that are useful for classification, thus suppressing uncertainty. Experiments show that our method surpasses most baseline methods in accuracy on facial expression datasets (e.g., AffectNet, RAF-DB), and that it also handles class imbalance well.
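
The per-class weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the inverse-frequency rule below is an assumption, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical helpers, not the authors' code.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed rule: weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights than frequent ones.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], normalised by the total weight."""
    total_weight = sum(weights[y] for y in labels)
    loss = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return loss / total_weight


# Toy labels: class 1 is the minority class.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(labels)

# Uniform predictions over two classes for every sample.
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these toy labels the minority class gets three times the weight of the majority class, so a misclassified minority sample contributes more to the loss.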

Edge Computing for Semantic Communication Enabled Metaverse: An Incentive Mechanism Design

Nguyen Cong Luong , Quoc-Viet Pham , Thien Huynh-The , Van-Dinh Nguyen , Derrick Wing Kwan Ng , Symeon Chatzinotas

Category: Machine Learning

2022-12-13

Semantic communication (SemCom) and edge computing are two disruptive solutions for addressing the emerging requirements of massive data communication, bandwidth efficiency, and low-latency data processing in the Metaverse. However, edge computing resources are often provided by computing service providers, and thus it is essential to design appealing incentive mechanisms for the provision of limited resources. Deep learning (DL)-based auctions have recently been proposed as an incentive mechanism that maximizes revenue while preserving important economic properties, i.e., individual rationality and incentive compatibility. Therefore, in this work, we introduce the design of a DL-based auction for computing resource allocation in the SemCom-enabled Metaverse. First, we briefly introduce the fundamentals and challenges of the Metaverse. Second, we present the preliminaries of SemCom and edge computing. Third, we review various incentive mechanisms for edge computing resource trading. Fourth, we present the design of the DL-based auction for edge resource allocation in the SemCom-enabled Metaverse. Simulation results demonstrate that the DL-based auction improves revenue while nearly satisfying the individual rationality and incentive compatibility constraints.

Edge-Assisted V2X Motion Planning and Power Control Under Channel Uncertainty

Zongze Li , Shuai Wang , Shiyao Zhang , Miaowen Wen , Kejiang Ye , Yik-Chung Wu , Derrick Wing Kwan Ng

Category: Robotics

2022-12-13

Edge-assisted vehicle-to-everything (V2X) motion planning is an emerging paradigm to achieve safe and efficient autonomous driving, since it leverages the global position information shared among multiple vehicles. However, due to the imperfect channel state information (CSI), the position information of vehicles may become outdated and inaccurate. Conventional methods ignoring the communication delays could severely jeopardize driving safety. To fill this gap, this paper proposes a robust V2X motion planning policy that adapts between competitive driving under a low communication delay and conservative driving under a high communication delay, and guarantees small communication delays at key waypoints via power control. This is achieved by integrating the vehicle mobility and communication delay models and solving a joint design of motion planning and power control problem via the block coordinate descent framework. Simulation results show that the proposed driving policy achieves the smallest collision ratio compared with other benchmark policies.
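
To illustrate the block coordinate descent framework named in this abstract, here is a minimal sketch on a toy problem. The objective f(x, y) = (x - 1)^2 + (y - 2)^2 + xy is an assumption chosen because each block update has a closed form; it is not the paper's joint motion planning and power control problem, and `block_coordinate_descent` is a hypothetical name.

```python
def block_coordinate_descent(n_iters=50):
    """Minimise f(x, y) = (x - 1)^2 + (y - 2)^2 + x*y by alternating
    exact minimisation over one variable while the other is held fixed."""
    x, y = 0.0, 0.0
    for _ in range(n_iters):
        # Block 1: with y fixed, df/dx = 2(x - 1) + y = 0.
        x = 1.0 - y / 2.0
        # Block 2: with x fixed, df/dy = 2(y - 2) + x = 0.
        y = 2.0 - x / 2.0
    return x, y


x_opt, y_opt = block_coordinate_descent()
```

Setting both partial derivatives to zero gives the stationary point (0, 2), and the alternating updates converge to it because the toy objective is strictly convex.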

Hierarchical multimodal transformers for Multi-Page DocVQA

Rubèn Tito , Dimosthenis Karatzas , Ernest Valveny

Category: Computer Vision | Artificial Intelligence | Natural Language Processing

2022-12-07

Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.

Portmanteauing Features for Scene Text Recognition

Yew Lee Tan , Ernest Yu Kai Chew , Adams Wai-Kin Kong , Jung-Jae Kim , Joo Hwee Lim

Category: Computer Vision

2022-11-09

Scene text images have different shapes and are subject to various distortions, e.g. perspective distortions. To handle these challenges, state-of-the-art methods rely on a rectification network, which is connected to the text recognition network. They form a linear pipeline that applies text rectification to all input images, even images that can be recognized without it. Undoubtedly, the rectification network improves the overall text recognition performance. However, in some cases, the rectification network generates unnecessary distortions on images, resulting in incorrect predictions for images that would otherwise have been recognized correctly. To alleviate these unnecessary distortions, the portmanteauing of features is proposed. The portmanteau feature, inspired by the portmanteau word, is a feature containing information from both the original text image and the rectified image. To generate the portmanteau feature, a non-linear input pipeline with a block matrix initialization is presented. In this work, the transformer is chosen as the recognition network due to its utilization of attention and inherent parallelism, which can effectively handle the portmanteau feature. The proposed method is examined on 6 benchmarks and compared with 13 state-of-the-art methods. The experimental results show that the proposed method outperforms the state-of-the-art methods on several of the benchmarks.

Improving the Predictive Performances of $k$ Nearest Neighbors Learning by Efficient Variable Selection

Eddie Pei , Ernest Fokoue

Category: Machine Learning (Statistics) | Machine Learning

2022-11-04

This paper computationally demonstrates a sharp improvement in predictive performance for $k$ nearest neighbors thanks to an efficient forward selection of the predictor variables. We show on both simulated and real-world data that this novel approach repeatedly outperforms regression models under stepwise selection.
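
A minimal sketch of greedy forward variable selection for a $k$ nearest neighbors classifier follows. The abstract does not state the selection criterion or stopping rule, so leave-one-out error and "stop when no candidate improves the error" are assumptions here, and `knn_loo_error` and `forward_select` are hypothetical helper names rather than the authors' procedure.

```python
def knn_loo_error(X, y, features, k=1):
    """Leave-one-out misclassification rate of k-NN restricted to `features`."""
    errors = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other point, on chosen columns only.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(len(X)) if j != i
        )
        votes = [y[j] for _, j in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        errors += pred != y[i]
    return errors / len(X)


def forward_select(X, y, k=1):
    """Greedily add the feature that lowers the error most; stop when none helps."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining:
        err, f = min((knn_loo_error(X, y, selected + [f], k), f) for f in remaining)
        if err >= best_err:
            break
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err


# Toy data: column 0 separates the classes, column 1 is large-scale noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [0.3, -4.0],
     [1.0, 4.5], [1.1, -3.5], [1.2, 5.5], [1.3, -5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, best_err = forward_select(X, y, k=1)
```

On this toy data the procedure keeps only column 0, since adding the noisy column cannot lower an error that is already zero.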

Adaptive Bias Correction for Improved Subseasonal Forecasting

Soukayna Mouatadid , Paulo Orenstein , Genevieve Flaspohler , Judah Cohen , Miruna Oprescu , Ernest Fraenkel , Lester Mackey

Category: Machine Learning | Machine Learning (Statistics)

2022-09-21

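Subseasonal forecasting, that is, predicting temperature and precipitation 2 to 6 weeks ahead, is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skill remains poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous U.S. We pair these improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill windows of opportunity based on specific climate conditions.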

Deep Learning-Based Rate-Splitting Multiple Access for Reconfigurable Intelligent Surface-Aided Tera-Hertz Massive MIMO

Minghui Wu , Zhen Gao , Yang Huang , Zhenyu Xiao , Derrick Wing Kwan Ng , Zhaoyang Zhang

Category: Artificial Intelligence | Machine Learning

2022-09-18

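Reconfigurable intelligent surfaces (RIS) can significantly enhance the service coverage of Tera-Hertz massive multiple-input multiple-output (MIMO) communication systems. However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging, which severely degrades the performance of conventional spatial division multiple access. To improve robustness against CSI imperfection, this paper proposes a deep learning (DL)-based rate-splitting multiple access (RSMA) scheme for RIS-aided Tera-Hertz multi-user MIMO systems. Specifically, we first propose a hybrid data-model driven DL-based RSMA precoding scheme, comprising the passive precoding at the RIS as well as the analog active precoding and the RSMA digital active precoding at the base station (BS). To realize the passive precoding at the RIS, we propose a Transformer-based data-driven RIS reflecting network (RRN). For the analog active precoding at the BS, we propose a matched-filter-based analog precoding scheme, given that the BS and RIS adopt the LoS-MIMO antenna array architecture. For the RSMA digital active precoding at the BS, we propose a low-complexity approximate weighted minimum mean square error (AWMMSE) digital precoding scheme. Furthermore, for better precoding performance and lower computational complexity, a model-driven deep unfolding active precoding network (DFAPN) is also designed by combining the proposed AWMMSE scheme with DL. Then, to acquire accurate CSI at the BS so that the RSMA precoding scheme can achieve higher spectral efficiency, we propose a CSI acquisition network (CAN) with low pilot and feedback signaling overhead, in which the downlink pilot transmission, the CSI feedback at the user equipments (UEs), and the CSI reconstruction at the BS are modeled as an end-to-end Transformer-based neural network.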

Adaptive Passivity-Based Pose Tracking Control of Cable-Driven Parallel Robots for Multiple Attitude Parameterizations

Sze Kwan Cheah , Alex Hayes , Ryan J. Caverly

Category: Robotics

2022-09-07

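The proposed control method uses an adaptive feedforward-based controller to establish a passive input-output mapping for the cable-driven parallel robot (CDPR), which is used alongside a linear time-invariant strictly positive real feedback controller to guarantee robust closed-loop input-output stability and asymptotic pose trajectory tracking via the passivity theorem. A novelty of the proposed controller is its formulation for use with a range of payload attitude parameterizations, including any unconstrained attitude parameterization, the quaternion, or the direction cosine matrix (DCM). The performance and robustness of the proposed controller are demonstrated through numerical simulations of a CDPR with rigid and flexible cables. The results demonstrate the importance of carefully defining the CDPR's pose error, which is formed in a multiplicative fashion when using the quaternion and DCM, and in a particular additive fashion when using unconstrained attitude parameters (e.g., an Euler-angle sequence).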
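
The distinction between multiplicative and additive attitude errors can be sketched as follows. This is a generic illustration, not the paper's controller: the scalar-first (w, x, y, z) quaternion convention, the error definition q_err = conj(q_des) * q, and the plain componentwise subtraction for unconstrained parameters are all assumptions, and the abstract's "particular additive fashion" may differ from the subtraction shown.

```python
import math


def quat_mul(p, q):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def multiplicative_error(q, q_des):
    """Assumed definition: q_err = conj(q_des) * q; identity when q == q_des."""
    return quat_mul(quat_conj(q_des), q)


def additive_error(params, params_des):
    """Componentwise difference of unconstrained parameters (assumed form)."""
    return tuple(a - d for a, d in zip(params, params_des))


def rot_z(angle):
    """Unit quaternion for a rotation of `angle` radians about the z axis."""
    return (math.cos(angle / 2), 0.0, 0.0, math.sin(angle / 2))


q_err = multiplicative_error(rot_z(math.radians(90)), rot_z(math.radians(30)))
err_angle = 2 * math.acos(q_err[0])  # rotation angle encoded by the error

euler_err = additive_error((0.0, 0.0, math.radians(90)),
                           (0.0, 0.0, math.radians(30)))
```

For this single-axis example both error definitions describe the same 60 degree offset; they generally differ once rotations about several axes are combined, which is why the choice of error definition matters.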
