The phase function is a key component of light propagation models in Monte Carlo (MC) simulation and is usually given an analytical form with associated parameters. Machine learning methods have been reported for estimating the parameters of a phase function of a specific form, such as the Henyey-Greenstein phase function, but to our knowledge no study has addressed determining the form of the phase function itself. Here we design a convolutional neural network to estimate the phase function from a diffuse reflectance image without any explicit assumption on the form of the phase function. Specifically, we use a Gaussian mixture model as an example to represent the phase function and accurately learn the model parameters. The Gaussian mixture model is chosen because it provides an analytical expression of the phase function that facilitates deflection-angle sampling in MC simulation, without significantly increasing the number of free parameters. The proposed method is validated on MC-simulated reflectance images of typical biological tissues with different anisotropy factors. The effects of the field of view (FOV) and spatial resolution on the estimation error are analyzed to optimize the estimation method. The mean squared error of the estimated phase function is 0.01 and the relative error of the anisotropy factor is 3.28%.
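To make the representation above concrete, here is a minimal sketch (with hypothetical mixture parameters; the number of components and the normalization convention are assumptions, not the paper's exact choices) of a Gaussian-mixture phase function over the deflection angle and its anisotropy factor g, the mean cosine of that angle:

```python
import numpy as np

def gaussian_mixture_phase(theta, weights, means, sigmas):
    """Phase function over the deflection angle theta, modeled as a mixture of
    Gaussians and normalized to integrate to 1 over the full solid angle."""
    p = np.zeros_like(theta)
    for w, m, s in zip(weights, means, sigmas):
        p += w * np.exp(-0.5 * ((theta - m) / s) ** 2)
    dtheta = theta[1] - theta[0]
    norm = np.sum(p * 2.0 * np.pi * np.sin(theta)) * dtheta  # solid-angle integral
    return p / norm

def anisotropy_factor(theta, p):
    """g = <cos(theta)>, the mean cosine of the deflection angle."""
    dtheta = theta[1] - theta[0]
    return np.sum(np.cos(theta) * p * 2.0 * np.pi * np.sin(theta)) * dtheta

theta = np.linspace(0.0, np.pi, 2000)
# hypothetical mixture parameters: weights, means (rad), standard deviations (rad)
p = gaussian_mixture_phase(theta, weights=[0.8, 0.2], means=[0.0, 1.0], sigmas=[0.3, 0.8])
print("anisotropy factor g =", anisotropy_factor(theta, p))
```

Because the mixture has a closed form, the deflection angle can be sampled in an MC simulation by inverting its cumulative distribution, which is the practical advantage the abstract points to.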
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
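For reference, a common form of the KL-regularized objective described above (notation assumed here rather than taken from the paper) penalizes deviation of the learned policy π from a behavioral reference policy π₀ derived from demonstrations, with α controlling the penalty strength:

```latex
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}\Big(r(s_t,a_t)\;-\;\alpha\, D_{\mathrm{KL}}\big(\pi(\cdot\,|\,s_t)\,\|\,\pi_0(\cdot\,|\,s_t)\big)\Big)\right]
```

Here α trades return maximization against staying close to π₀; the pathology discussed above concerns how this trade-off behaves for commonly chosen parametric forms of π₀.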
In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slice around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from the two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras into a 2D representation. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.
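A minimal sketch of one plausible 3D-to-2D projection step of the kind described above (height filtering followed by dropping the vertical axis, so the camera points become comparable with a planar LiDAR scan); the thresholds and frames are hypothetical, not the paper's exact technique:

```python
import numpy as np

def project_to_2d(points_3d, z_min=0.05, z_max=1.5):
    """Keep points whose height falls inside the robot body range and drop the
    vertical axis, yielding a 2D obstacle set for planar avoidance."""
    z = points_3d[:, 2]
    mask = (z > z_min) & (z < z_max)
    return points_3d[mask][:, :2]  # (x, y) obstacle coordinates in the robot frame

# hypothetical depth-camera point cloud expressed in the robot frame
cloud = np.random.uniform(low=[-2, -2, 0], high=[2, 2, 2], size=(1000, 3))
obstacles_2d = project_to_2d(cloud)
print(obstacles_2d.shape)
```

The resulting 2D obstacle set can then feed a dynamic-window-style planner alongside the LiDAR scan.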
Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality technology, 360° omnidirectional images have been widely used, but the SOD task in 360° omnidirectional images is seldom studied due to their severe distortions and complex scenes. In this paper, we propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect the salient objects in 360° omnidirectional images. Different from existing methods, the equirectangular projection image and four corresponding cube-unfolding images are fed into the network simultaneously as inputs, where the cube-unfolding images not only provide supplementary information for the equirectangular projection image but also ensure the object integrity of the cube-map projection. To make full use of these two projection modes, a Dynamic Weighting Fusion (DWF) module is designed to adaptively integrate the features of different projections in a complementary and dynamic manner from the perspective of inter- and intra-features. Furthermore, to fully explore the interaction between encoder and decoder features, a Filtration and Refinement (FR) module is designed to suppress redundant information both within each feature and between features. Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms state-of-the-art methods both qualitatively and quantitatively.
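As an illustration of the fusion idea above, the following sketch dynamically weights and sums features from five projection branches (one equirectangular plus four cube-unfolding inputs); the interface and gating design are assumptions, not the paper's DWF module:

```python
import torch
import torch.nn as nn

class DynamicWeightingFusion(nn.Module):
    """Minimal sketch: predict one scalar weight per projection branch from the
    pooled, concatenated features, then take the weighted sum of the branches."""
    def __init__(self, channels, num_branches=5):
        super().__init__()
        self.num_branches = num_branches
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * num_branches, num_branches, kernel_size=1),
        )

    def forward(self, feats):                      # feats: list of (B, C, H, W)
        stacked = torch.cat(feats, dim=1)          # (B, C*num_branches, H, W)
        weights = torch.softmax(self.gate(stacked), dim=1)   # (B, num_branches, 1, 1)
        return sum(weights[:, i:i + 1] * feats[i] for i in range(self.num_branches))

dwf = DynamicWeightingFusion(channels=64)
feats = [torch.randn(2, 64, 32, 32) for _ in range(5)]
print(dwf(feats).shape)  # torch.Size([2, 64, 32, 32])
```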
Most recent studies on neural constituency parsing focus on encoder structures, while few developments are devoted to decoders. Previous research has demonstrated that probabilistic statistical methods based on syntactic rules are particularly effective in constituency parsing, whereas syntactic rules are not used during the training of neural models in prior work probably due to their enormous computation requirements. In this paper, we first implement a fast CKY decoding procedure harnessing GPU acceleration, based on which we further derive a syntactic rule-based (rule-constrained) CKY decoding. In the experiments, our method obtains 95.89 and 92.52 F1 on the datasets of PTB and CTB respectively, which shows significant improvements compared with previous approaches. Besides, our parser achieves strong and competitive cross-domain performance in zero-shot settings.
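For reference, a plain CPU sketch of span-based CKY decoding over per-span scores is shown below; the paper's GPU batching and syntactic-rule constraints are not reproduced, and the scoring interface is an assumption:

```python
import numpy as np

def cky_decode(span_scores):
    """Unlabeled CKY over a score matrix where span_scores[i][j] is the model
    score of a constituent covering words i..j (inclusive). Returns the best
    total score and the back-pointer table of split points."""
    n = span_scores.shape[0]
    best = np.full((n, n), -np.inf)
    split = np.full((n, n), -1, dtype=int)
    for i in range(n):
        best[i][i] = span_scores[i][i]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            k_best = max(range(i, j), key=lambda k: best[i][k] + best[k + 1][j])
            best[i][j] = span_scores[i][j] + best[i][k_best] + best[k_best + 1][j]
            split[i][j] = k_best
    return best[0][n - 1], split

scores = np.random.rand(6, 6)          # hypothetical scores for a 6-word sentence
total, split = cky_decode(scores)
print(total)
```

A rule-constrained variant would restrict which label combinations are allowed at each split, which is where the grammar enters the decoder.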
This paper aims to improve the Warping Planar Object Detection Network (WPOD-Net) using feature engineering to increase accuracy. Specifically, we argue that adding knowledge about edges in the image strengthens the cues available for determining the license plate contour in the original WPOD-Net model. The Sobel filter, selected experimentally, acts as a convolutional neural network layer, and its edge information is combined with the features of the original network to form the final embedding vector. The proposed model was compared with the original model on a dataset that we collected for evaluation. The results, evaluated with the Quadrilateral Intersection over Union metric, demonstrate that the model achieves a significant improvement in performance.
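A minimal sketch of using a fixed Sobel kernel as a convolutional layer, as described above; how the resulting edge map is concatenated with the WPOD-Net features is an assumption here:

```python
import torch
import torch.nn as nn

class SobelLayer(nn.Module):
    """Fixed (non-trainable) Sobel edge extractor expressed as a conv layer;
    its output can be concatenated with the backbone features."""
    def __init__(self):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ky = kx.t()                                       # vertical-gradient kernel
        weight = torch.stack([kx, ky]).unsqueeze(1)       # (2, 1, 3, 3)
        self.conv = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)

    def forward(self, gray):                              # gray: (B, 1, H, W)
        g = self.conv(gray)
        return torch.sqrt(g[:, :1] ** 2 + g[:, 1:] ** 2 + 1e-12)  # gradient magnitude

edges = SobelLayer()(torch.rand(1, 1, 128, 256))
print(edges.shape)  # torch.Size([1, 1, 128, 256])
```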
Semantic communication (SemCom) and edge computing are two disruptive solutions to address emerging requirements of huge data communication, bandwidth efficiency, and low-latency data processing in the Metaverse. However, edge computing resources are often provided by computing service providers, so it is essential to design appealing incentive mechanisms for the provision of limited resources. Deep learning (DL)-based auction has recently been proposed as an incentive mechanism that maximizes revenue while holding important economic properties, i.e., individual rationality and incentive compatibility. Therefore, in this work, we introduce the design of a DL-based auction for computing resource allocation in the SemCom-enabled Metaverse. First, we briefly introduce the fundamentals and challenges of the Metaverse. Second, we present the preliminaries of SemCom and edge computing. Third, we review various incentive mechanisms for edge computing resource trading. Fourth, we present the design of the DL-based auction for edge resource allocation in the SemCom-enabled Metaverse. Simulation results demonstrate that the DL-based auction improves revenue while nearly satisfying the individual rationality and incentive compatibility constraints.
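As background for the economic properties mentioned above, a standard formulation of the revenue-maximizing auction design problem that a learned auction approximates is sketched below; the notation (allocation rule g, payment rule p, bidder valuations v drawn from F) is assumed here and is not taken from the paper:

```latex
\max_{g,\,p}\;\; \mathbb{E}_{v\sim F}\Big[\sum_i p_i(v)\Big]
\quad\text{s.t.}\quad
\underbrace{v_i\,g_i(v)-p_i(v)\ \ge\ 0}_{\text{individual rationality}},
\qquad
\underbrace{v_i\,g_i(v_i,v_{-i})-p_i(v_i,v_{-i})\ \ge\ v_i\,g_i(v_i',v_{-i})-p_i(v_i',v_{-i})\;\;\forall\, v_i'}_{\text{incentive compatibility}}
```

A DL-based auction parameterizes g and p with neural networks and enforces the two constraints approximately during training, which is why the simulation results above satisfy them only "nearly".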
In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane changes dynamically due to braking and unstable road surfaces. As a result, the vehicle pose, especially the pitch angle, oscillates, from subtle to pronounced. Thus, estimating the ground plane normal is meaningful, since it can be encoded to improve the robustness of various autonomous driving tasks (e.g., 3D object detection, road surface reconstruction, and trajectory planning). Our proposed method uses only odometry as input and estimates accurate ground plane normal vectors in real time. In particular, it fully utilizes the underlying connection between the ego pose odometry (ego-motion) and the nearby ground plane. Built on that, an Invariant Extended Kalman Filter (IEKF) is designed to estimate the normal vector in the sensor's coordinate frame. Thus, our proposed method is simple yet efficient and supports both camera- and inertial-based odometry algorithms. Its usability and the marked improvement of robustness are validated through multiple experiments on public datasets. For instance, we achieve state-of-the-art accuracy on the KITTI dataset with an estimated vector error of 0.39°. Our code is available at github.com/manymuch/ground_normal_filter.
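A highly simplified, hypothetical sketch of the connection mentioned above between ego-pose odometry and the ground normal: rotate a world-frame up vector into the sensor frame with each odometry rotation and smooth the result. This is a stand-in for intuition only, not the paper's Invariant Extended Kalman Filter:

```python
import numpy as np

def ground_normal_from_odometry(R_world_to_sensor_seq, alpha=0.9):
    """For each odometry rotation, map the world up-vector into the sensor
    frame and exponentially smooth it over time (a crude stand-in for the
    filtering step). Each R is a 3x3 world-to-sensor rotation matrix."""
    up_world = np.array([0.0, 0.0, 1.0])
    normal, normals = None, []
    for R in R_world_to_sensor_seq:
        n = R @ up_world
        normal = n if normal is None else alpha * normal + (1.0 - alpha) * n
        normal = normal / np.linalg.norm(normal)
        normals.append(normal.copy())
    return normals

# hypothetical sequence: identity rotations (flat ground, no pitch)
print(ground_normal_from_odometry([np.eye(3)] * 3)[-1])
```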
Given a piece of text, a video clip, and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference. V2C is more challenging than conventional text-to-speech tasks as it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture that tackles these problems via hierarchical prosody modelling, bridging the visual information to the corresponding speech prosody from three aspects: lip, face, and scene. Specifically, we align lip movement to the speech duration, and convey facial expression to speech energy and pitch via an attention mechanism based on valence and arousal representations inspired by recent psychology findings. Moreover, we design an emotion booster to capture the atmosphere from global video scenes. All these embeddings are used together to generate a mel-spectrogram, which is then converted to a speech waveform by an existing vocoder. Extensive experimental results on the Chem and V2C benchmark datasets demonstrate the favorable performance of the proposed method. The source code and trained models will be released to the public.
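A minimal sketch of the kind of attention step described above, with phoneme-level queries attending over per-frame facial features to predict energy and pitch offsets; the dimensions, number of heads, and the valence/arousal encoding are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class FaceToProsodyAttention(nn.Module):
    """Phoneme queries attend over facial-feature frames; the attended context
    is mapped to per-phoneme energy and pitch predictions."""
    def __init__(self, d_model=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.energy_head = nn.Linear(d_model, 1)
        self.pitch_head = nn.Linear(d_model, 1)

    def forward(self, phoneme_feats, face_feats):
        # phoneme_feats: (B, T_text, d), face_feats: (B, T_video, d)
        ctx, _ = self.attn(phoneme_feats, face_feats, face_feats)
        return self.energy_head(ctx), self.pitch_head(ctx)

m = FaceToProsodyAttention()
energy, pitch = m(torch.randn(2, 40, 256), torch.randn(2, 120, 256))
print(energy.shape, pitch.shape)  # (2, 40, 1) each
```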
Zero-shot cross-lingual named entity recognition (NER) aims at transferring knowledge from annotated, rich-resource data in source languages to unlabeled, low-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information lying in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to fully exploit the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. Besides, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models to preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows strong generalization and compatibility across languages and fields.
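A minimal sketch of aggregating several layer-wise distillers into a mixture, as the multi-channel framework above suggests; the layer pairing, channel weights, and temperature are assumptions, not the paper's exact MSD formulation:

```python
import torch
import torch.nn.functional as F

def multi_channel_distillation_loss(student_logits_per_layer, teacher_logits_per_layer,
                                    channel_weights, temperature=2.0):
    """Sum weighted KL distillation losses, one per (student layer, teacher layer)
    channel, so several distillers act as a mixture."""
    loss = torch.tensor(0.0)
    for w, s_logits, t_logits in zip(channel_weights,
                                     student_logits_per_layer,
                                     teacher_logits_per_layer):
        s = F.log_softmax(s_logits / temperature, dim=-1)
        t = F.softmax(t_logits / temperature, dim=-1)
        loss = loss + w * F.kl_div(s, t, reduction="batchmean") * temperature ** 2
    return loss

# hypothetical: 3 channels, batch of 4 tokens, 9 entity labels
student = [torch.randn(4, 9) for _ in range(3)]
teacher = [torch.randn(4, 9) for _ in range(3)]
print(multi_channel_distillation_loss(student, teacher, channel_weights=[0.5, 0.3, 0.2]))
```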