Intelligent Paper Notes

Toward Data Heterogeneity of Federated Learning

Yuchuan Huang , Chen Hu

Category: Machine Learning

2022-12-17

Federated learning is a popular paradigm for machine learning. Ideally, it works best when all clients share a similar data distribution, but this is often not the case in the real world. Federated learning on heterogeneous data has therefore attracted growing attention from both academia and industry. In this project, we first run extensive experiments to show how data skew and quantity skew affect the performance of state-of-the-art federated learning algorithms. We then propose a new algorithm, FedMix, which adjusts existing federated learning algorithms, and we report its performance. We find that existing state-of-the-art algorithms such as FedProx and FedNova do not deliver a significant improvement in all test cases. However, across the existing and new algorithms we tested, tweaking the client side appears to be more effective than tweaking the server side.
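To make the client-side versus server-side distinction concrete, here is a minimal Python sketch. It is not the FedMix algorithm from the abstract, whose details are not given here. The client step uses the well-known FedProx proximal term, and the server step is a sample-count-weighted average in the FedAvg style. The function names, learning rate, and `mu` value are assumptions chosen for illustration.

```python
from typing import List


def fedprox_local_step(w: List[float], w_global: List[float], grad: List[float],
                       lr: float = 0.1, mu: float = 0.01) -> List[float]:
    """One client-side gradient step on F_k(w) + (mu / 2) * ||w - w_global||^2.

    `grad` is the gradient of the local loss F_k at `w` (supplied by the caller).
    The proximal term adds mu * (w - w_global), pulling the client toward the global model.
    """
    return [wi - lr * (gi + mu * (wi - wgi))
            for wi, wgi, gi in zip(w, w_global, grad)]


def weighted_average(client_weights: List[List[float]],
                     client_sizes: List[int]) -> List[float]:
    """Server-side aggregation: average client models weighted by sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]
```

Under quantity skew, the `client_sizes` weights differ widely across clients, so a few large clients dominate the server average. The `mu` term is the client-side knob that limits how far each local model drifts.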

Translated by Google Translate

Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

Ting-En Lin , Yuchuan Wu , Fei Huang , Luo Si , Jian Sun , Yongbin Li

Category: Natural Language Processing

2022-05-30

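In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex communication from telecommunications to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. In addition, we propose semi-supervised learning with multimodal data augmentation to leverage unlabeled data and improve model generalization. Experimental results on the three subtasks show that the proposed method achieves consistent improvements over the baselines. We deploy Duplex Conversation in Alibaba's intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response latency, by 50%.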


GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Wanwei He , Yinpei Dai , Yinhe Zheng , Yuchuan Wu , Zheng Cao , Dermot Liu , Peng Jiang , Min Yang , Fei Huang , Luo Si

Category: Natural Language Processing

2021-11-29

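Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0, and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3, and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.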


MultiEarth 2022 -- The Champion Solution for the Matrix Completion Challenge via Multimodal Regression and Generation

Bo Peng , Hongchen Liu , Hang Zhou , Yuchuan Gou , Jui-Hsin Lai

Category: Computer Vision

2022-06-17

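Earth observation satellites have been continuously monitoring the Earth's environment for years at different locations and spectral bands with different modalities. Due to complex satellite sensing conditions (e.g., weather, cloud, atmosphere, orbit), observations for certain modalities, bands, locations, and times may not be available. The MultiEarth Matrix Completion Challenge at CVPR 2022 [1] provides multimodal satellite data for addressing such data sparsity challenges, with the Amazon rainforest as the region of interest. This work proposes an adaptive real-time multimodal regression and generation framework and achieves superior performance on unseen test queries in this challenge, with an LPIPS of 0.2226, a PSNR of 123.0372, and an SSIM of 0.6347.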


Human-Vehicle Cooperative Visual Perception for Shared Autonomous Driving

Yiyue Zhao , Cailin Lei , Yu Shen , Yuchuan Du , Qijun Chen

Category: Computer Vision

2021-12-17

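With the development of key technologies such as environment perception, the automation level of autonomous vehicles has been increasing. However, before highly autonomous driving is reached, manual driving still needs to participate in the driving process to ensure the safety of human-vehicle shared driving. Existing work on human-vehicle cooperative driving focuses on automotive engineering and driver behavior, with few studies in the field of visual perception. Because of poor performance in complex road traffic conflict scenarios, cooperative visual perception needs further study. In addition, the autonomous driving perception system cannot correctly understand the characteristics of manual driving. Against this background, this paper proposes a human-vehicle cooperative visual perception method that enhances the visual perception ability of shared autonomous driving in complex road traffic scenarios, based on a transfer learning method and an image fusion algorithm. With transfer learning, the object detection mAP reaches 75.52%, which lays a solid foundation for visual fusion. The fusion experiment further reveals that human-vehicle cooperative visual perception reflects the riskiest zone and predicts the trajectory of conflict objects more precisely. This study pioneers a cooperative visual perception solution for shared autonomous driving, with experiments in real-world complex traffic conflict scenarios, which can better support subsequent planning and control and improve the safety of autonomous vehicles.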


Rethinking Mobile Block for Efficient Neural Models

Jiangning Zhang , Xiangtai Li , Jian Li , Liang Liu , Zhucun Xue , Boshen Zhang , Zhengkai Jiang , Tianxin Huang , Yabiao Wang , Chengjie Wang

Category: Computer Vision

2023-01-03

This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.


PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora

Dinesh Nagumothu , Bahadorreza Ofoghi , Guangyan Huang , Peter W. Eklund

Category: Natural Language Processing | Artificial Intelligence

2023-01-03

Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to do so. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the resulting question-answer pairs as training data for the language model of a state-of-the-art BERT-based QA system. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experiments on five extractive QA datasets demonstrate that our technique achieves on-par performance with existing state-of-the-art QA systems, with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
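The triple-to-question step described above can be illustrated with a small Python sketch. It assumes the triples have already been extracted. The question templates and the `triple_to_qa` name are hypothetical and are not taken from the PIE-QG paper, which may form questions differently.

```python
from typing import Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_to_qa(triple: Triple, mask: str = "object") -> Tuple[str, str]:
    """Turn one triple into a (question, answer) pair by masking one argument.

    mask="object":  the question keeps subject + predicate, the answer is the object.
    mask="subject": the question keeps predicate + object, the answer is the subject.
    The templates below are illustrative placeholders only.
    """
    subject, predicate, obj = triple
    if mask == "object":
        return f"{subject} {predicate} what?", obj
    if mask == "subject":
        return f"What {predicate} {obj}?", subject
    raise ValueError("mask must be 'object' or 'subject'")


# Example: build both question variants from a single extracted triple.
pairs = [triple_to_qa(("Marie Curie", "discovered", "polonium"), m)
         for m in ("object", "subject")]
```

Each triple yields up to two synthetic training pairs, one per masked argument.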


A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

Category: Computer Vision | Artificial Intelligence

2023-01-03

The Transformer has achieved impressive success on various computer vision tasks. However, most existing studies need to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer on limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer strives to distill transformation-invariant features from the perturbed tokens in order to simultaneously measure difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.


Analogical Inference Enhanced Knowledge Graph Embedding

Yao Zhen , Zhang Wen , Chen Mingyang , Huang Yufeng , Yang Yi , Chen Huajun

Category: Artificial Intelligence | Natural Language Processing

2023-01-03

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity level, relation level, and triple level. In AnKGE, we train an analogy function for each level of analogical inference; it takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
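As a rough illustration of the score interpolation mentioned above, the Python sketch below blends a base KGE score with an analogy score and ranks candidate tail entities by the result. The convex-combination form, the fixed weight `lam`, and all names are assumptions for illustration. The abstract says AnKGE uses adaptive weights but does not give the exact formula.

```python
from typing import Dict, List


def interpolate_score(base_score: float, analogy_score: float, lam: float) -> float:
    """Blend two scores; lam in [0, 1] is the weight on the analogy score (assumed form)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * base_score + lam * analogy_score


def rank_candidates(base: Dict[str, float], analogy: Dict[str, float],
                    lam: float) -> List[str]:
    """Rank candidate entities by blended score, highest first."""
    blended = {e: interpolate_score(base[e], analogy[e], lam) for e in base}
    return sorted(blended, key=blended.get, reverse=True)
```

With `lam = 0` the ranking reduces to the base model, and with `lam = 1` it relies on the analogy score alone.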


Digital Engineering Transformation with Trustworthy AI towards Industry 4.0: Emerging Paradigm Shifts

Jingwei Huang

Category: Artificial Intelligence

2023-01-03

Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are the emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? By investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR, and that digitalization is a precondition for leveraging it. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large faces an excellent opportunity to bring the new capabilities of ubiquitous machine intelligence, trustworthy AI principles, and digital trust together in the design of various engineering systems, to ensure the trustworthiness of systems in Industry 4.0.
