Smart Paper Notes

Industry 4.0: Challenges and success factors for adopting digital technologies in airports

Jia Hao Tan, Tariq Masood

Category: Robotics

2021-12-29

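With the advent of Industry 4.0 technologies over the past decade, airports have undergone digitalisation to capitalise on the purported benefits of these technologies, such as improved operational efficiency and passenger experience. The ongoing COVID-19 pandemic, with the emergence of its variants (e.g., Delta, Omicron), has exacerbated the need for airports to adopt new technologies, such as contactless and robotic technologies, to facilitate travel during the pandemic. However, the recent challenges and success factors for adopting digital technologies in airports are not well understood. Therefore, through an industry survey of airport operators and managers around the world (n = 102; 0.754 < composite reliability < 0.892; conducted during COVID-19), this study identifies the challenges faced in adopting Industry 4.0 technologies (n = 20), which strengthens the understanding of the best practices or success factors that support technology adoption in airports. The widely used Technology-Organization-Environment (TOE) framework serves as the theoretical basis for the quantitative part of the questionnaire. A complementary qualitative part is used to underpin and extend the findings. This industry survey is the first of its kind to discuss and understand the implementation challenges that airport operators face in adopting Industry 4.0 technologies. The findings reveal that, despite the common challenges faced in adopting the various Industry 4.0 technologies in airports, these technologies are not implemented to a similar degree across airports.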
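The abstract above reports composite reliability values between 0.754 and 0.892 but does not say how they were computed. As a minimal sketch, assuming the common formula based on standardized factor loadings (the function name and the example loadings are hypothetical, not taken from the study), the statistic could be computed like this:

```python
def composite_reliability(loadings):
    """Composite reliability from standardized factor loadings.

    Assumes the common formula CR = (sum l)^2 / ((sum l)^2 + sum(1 - l^2)),
    where each indicator's error variance is 1 - l^2.
    """
    total = sum(loadings)
    error_variance = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error_variance)


if __name__ == "__main__":
    # Hypothetical loadings for one construct with four survey items.
    example_loadings = [0.8, 0.8, 0.8, 0.8]
    print(round(composite_reliability(example_loadings), 3))
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is in line with the range the abstract quotes.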

Translated by Google Translate

Adoption of Industry 4.0 technologies in airports -- A systematic literature review

Jia Hao Tan, Tariq Masood

Category: Robotics

2021-12-28

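Airports have been continually evolving and adopting digital technologies to improve operational efficiency, enhance passenger experience, and generate ancillary revenue and additional capacity from existing infrastructure. The COVID-19 pandemic has also challenged airports and aviation stakeholders to adapt to and manage new operational challenges, such as facilitating a contactless travel experience and ensuring business continuity. Digitalisation using Industry 4.0 technologies offers airports opportunities to address the short-term challenges associated with the COVID-19 pandemic while also preparing for the long-term challenges of future crises. Through a systematic literature review of 102 relevant articles, we discuss the current state of adoption of Industry 4.0 technologies in airports, the associated challenges, and future research directions. The results of this review suggest that the implementation of Industry 4.0 technologies is slowly gaining traction in the airport environment and will remain relevant in the digital transformation journey toward developing future airports.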


Human Image Generation: A Comprehensive Survey

Zhen Jia, Zhang Zhang, Liang Wang, Tieniu Tan

Category: Computer Vision

2022-12-17

Image and video synthesis has become a flourishing topic in the computer vision and machine learning communities along with the development of deep generative models, due to its great academic and application value. Many researchers have devoted themselves to synthesizing high-fidelity images of humans, one of the most commonly seen object categories in daily life, and a large number of studies have been performed based on various deep generative models, task settings and applications. Thus, it is necessary to give a comprehensive overview of these varied methods for human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods. For each paradigm, the most representative models and the corresponding variants are presented, and the advantages and characteristics of different methods are summarized in terms of model architectures and input/output requirements. In addition, the main public human image datasets and evaluation metrics in the literature are summarized. Furthermore, given the wide application potential, two typical downstream uses of synthesized human images are covered, i.e., data augmentation for person recognition tasks and virtual try-on for fashion customers. Finally, we discuss the challenges and potential directions of human image generation to shed light on future research.


HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design

Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang

Category: Computer Vision

2022-12-12

Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing. To address the need for a physically practical and stealthy adversarial attack, we introduce HotCold Block, a novel physical attack for infrared detectors that hides persons using wearable Warming Paste and Cooling Paste. By attaching these readily available temperature-controlled materials to the body, HotCold Block evades human eyes efficiently. Moreover, unlike existing methods that build adversarial patches with complex texture and structure features, HotCold Block utilizes an SSP-oriented adversarial optimization algorithm that enables attacks with pure color blocks and explores the influence of size, shape, and position on attack performance. Extensive experimental results in both digital and physical environments demonstrate the performance of our proposed HotCold Block. Code is available: https://github.com/weihui1308/HOTCOLDBlock


OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang

Categories: Computer Vision | Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-08

Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Although they offer a promising alternative route toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively even with just a single line of code. The system automatically generates task plans from such instructions for training and inference. It also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves, on average, 95% of the performance of 15 task-finetuned models with only 16% of their parameters, showcasing the performance reliability of multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys


1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2022-11-24

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


RDA: An Accelerated Collision-free Motion Planner for Autonomous Navigation in Cluttered Environments

Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Qianru Zhang, Yonina C. Eldar, Qi Hao, Jia Pan

Category: Robotics

2022-10-01

Motion planning is challenging for autonomous systems in multi-obstacle environments due to nonconvex collision avoidance constraints. Directly applying numerical solvers to these nonconvex formulations fails to exploit the constraint structures, resulting in excessive computation time. In this paper, we present an accelerated collision-free motion planner, namely regularized dual alternating direction method of multipliers (RDADMM or RDA for short), for the model predictive control (MPC) based motion planning problem. The proposed RDA addresses nonconvex motion planning by solving a smooth biconvex reformulation obtained via duality, and allows the collision avoidance constraints to be computed in parallel for each obstacle to reduce computation time significantly. We validate the performance of the RDA planner through path-tracking experiments with car-like robots in simulation and real-world settings. Experimental results show that the proposed method can generate smooth collision-free trajectories with less computation time compared with other benchmarks and performs robustly in cluttered environments.
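For readers unfamiliar with the alternating direction method of multipliers that RDA builds on, the following is a minimal sketch of generic scaled-form ADMM on a toy scalar problem. It is not the RDA planner and does not model obstacles or MPC; the problem, function names and parameters are all hypothetical choices made for illustration.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |z| (shrinks value toward zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iters=100):
    """Minimize 0.5 * (x - a)**2 + lam * |z| subject to x == z.

    Toy problem chosen only to show the three ADMM steps:
    an x-update, a z-update, and a scaled dual update.
    """
    x = z = u = 0.0
    for _ in range(iters):
        # x-update: closed-form minimizer of the smooth quadratic term.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on the nonsmooth term.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the constraint residual x - z.
        u = u + x - z
    return x, z


if __name__ == "__main__":
    print(admm_scalar_lasso(3.0, 1.0))
```

The splitting is the point of the sketch: each update touches only one block of variables, which is the property that, according to the abstract, lets RDA handle the constraint for each obstacle in parallel.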


Physical Adversarial Attack meets Computer Vision: A Decade Survey

Hui Wei, Hao Tang, Xuemei Jia, Hanxun Yu, Zhubo Li, Zhixiang Wang, Shin'ichi Satoh, Zheng Wang

Category: Computer Vision

2022-09-30

Although Deep Neural Networks (DNNs) have achieved impressive results in computer vision, their exposed vulnerability to adversarial attacks remains a serious concern. A series of works has shown that adding elaborate perturbations to images can cause catastrophic degradation in the performance metrics of DNNs. This phenomenon exists not only in the digital space but also in the physical space. Therefore, estimating the security of these DNN-based systems is critical for safely deploying them in the real world, especially for security-critical applications, e.g., autonomous cars, video surveillance, and medical diagnosis. In this paper, we focus on physical adversarial attacks and provide a comprehensive survey of over 150 existing papers. We first clarify the concept of the physical adversarial attack and analyze its characteristics. Then, we define the adversarial medium, which is essential for performing attacks in the physical world. Next, we present the physical adversarial attack methods in task order: classification, detection, and re-identification, and introduce their performance in solving the trilemma: effectiveness, stealthiness, and robustness. In the end, we discuss the current challenges and potential future directions.


Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos

Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura, Wenping Wang

Categories: Computer Vision | Robotics

2022-09-20

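Understanding dynamic hand motions and actions is a fundamental yet challenging task due to self-occlusion and ambiguity. To address occlusion and ambiguity, we develop a transformer-based framework that exploits temporal information for robust estimation. Noting the different temporal granularity of, and the semantic correlation between, hand pose estimation and action recognition, we build a network hierarchy with two cascaded transformer encoders, where the first exploits short-term temporal cues for hand pose estimation, and the second aggregates per-frame pose and object information over a longer time span to recognize the action. Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O. Extensive ablation studies verify our design choices. We will release code and data to facilitate future research.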
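The two-level hierarchy described in this abstract can be illustrated with a toy sketch. This is not the authors' network: it assumes a single-head self-attention with identity projections in place of each transformer encoder, plain lists in place of image features, and a hypothetical `window` parameter for the short-term span.

```python
import math


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) / total
                        for i in range(dim)])
    return outputs


def hierarchical_encode(frames, window):
    """Toy two-level temporal hierarchy over a list of per-frame feature vectors."""
    # Level 1: attend only within short windows (stand-in for per-frame pose cues).
    per_frame = []
    for start in range(0, len(frames), window):
        per_frame.extend(self_attention(frames[start:start + window]))
    # Level 2: attend across the whole clip, then average into one clip feature
    # (stand-in for the action-level representation).
    clip_tokens = self_attention(per_frame)
    dim = len(clip_tokens[0])
    clip_feature = [sum(t[i] for t in clip_tokens) / len(clip_tokens)
                    for i in range(dim)]
    return per_frame, clip_feature


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    per_frame, clip_feature = hierarchical_encode(frames, window=2)
    print(len(per_frame), len(clip_feature))
```

The design point the sketch tries to convey is the split in temporal scope: the first stage sees only a few neighbouring frames, while the second stage sees every per-frame output at once.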
