Smart Paper Notes

Identity-Aware Hand Mesh Estimation and Personalization from RGB Images

Deying Kong , Linguang Zhang , Liangjian Chen , Haoyu Ma , Xiangyi Yan , Shanlin Sun , Xingwei Liu , Kun Han , Xiaohui Xie

Category: Computer Vision

2022-09-22

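Reconstructing 3D hand meshes from monocular RGB images has attracted increasing attention because of its enormous potential for applications in AR/VR. Most state-of-the-art methods attempt to tackle this task in an anonymous manner. Specifically, the identity of the subject is ignored even though it is practically available in real applications, where the user does not change during a continuous recording session. In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject. We demonstrate the importance of the identity information by comparing the proposed identity-aware model with a baseline that treats the subject anonymously. Furthermore, to handle the use case where the test subject is unseen, we propose a novel personalization pipeline that calibrates the intrinsic shape parameters using only a few unlabeled RGB images of the subject. Experiments on two large-scale public datasets validate the state-of-the-art performance of our proposed method.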

PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation

Haoyu Ma , Zhe Wang , Yifei Chen , Deying Kong , Liangjian Chen , Xingwei Liu , Xiangyi Yan , Hao Tang , Xiaohui Xie

Category: Computer Vision

2022-09-16

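Recently, the vision transformer and its variants have played an increasingly important role in both monocular and multi-view human pose estimation. Treating image patches as tokens, transformers can model global dependencies within the entire image or across images from other views. However, global attention is computationally expensive. As a consequence, it is difficult to scale these transformer-based methods to high-resolution features and many views. In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which locates a rough human mask and performs self-attention only within the selected tokens. Furthermore, we extend PPT to multi-view human pose estimation. Building on PPT, we propose a new cross-view fusion strategy, called human area fusion, which considers all human foreground pixels as corresponding candidates. Experimental results on COCO and MPII demonstrate that PPT can match the accuracy of previous pose transformer methods while reducing computation. Moreover, experiments on Human 3.6M and Ski-Pose demonstrate that our multi-view PPT can efficiently fuse cues from multiple views and achieve new state-of-the-art results.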
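
The pruning idea in this abstract can be illustrated with a minimal sketch: keep only the highest-scoring tokens and run self-attention among them. Everything below is an assumption made for illustration, not the paper's implementation: the `scores` list stands in for whatever signal the model uses to find the rough human mask, the attention is single-head with identity query/key/value projections, and the name `pruned_self_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pruned_self_attention(tokens, scores, keep):
    """Self-attention restricted to the `keep` highest-scoring tokens.

    tokens: list of feature vectors, one per image patch.
    scores: hypothetical per-token foreground scores (an assumption).
    Returns (kept_indices, attended_vectors) for the kept tokens only.
    """
    # Select the top-`keep` token indices, then restore their spatial order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    selected = [tokens[i] for i in kept]
    dim = len(selected[0])
    attended = []
    for q in selected:
        # Scaled dot-product attention over the kept tokens only.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in selected]
        weights = softmax(logits)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, selected)) for d in range(dim)]
        )
    return kept, attended


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.1, 0.8, 0.2]
kept, attended = pruned_self_attention(tokens, scores, keep=2)
print(kept)  # [0, 2]
```

With n patches and k kept tokens, the attention matrix in this sketch shrinks from n × n to k × k, which is the source of the computational saving the abstract refers to.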

In-plane prestressed hair clip mechanism for the fastest untethered compliant fish robot

Zechen Xiong , Liqi Chen , Wenxiong Hao , Pengfei Yang , Shicheng Wang , Sarah Li Wilkinson , Yufeng Su , Xiangyi Ren , Nipun Poddar , Xi Chen

Category: Robotics

2022-07-18

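Over the past decades, a trend has emerged toward harnessing structural instability in movable, programmable, and transformable mechanisms. Inspired by a steel hair clip, we combine in-plane assembly with a bistable structure to build a compliant flapping mechanism from semi-rigid plastic sheets, and we install it on both a tethered pneumatic soft robotic fish and an untethered motor-driven one to demonstrate its unprecedented advantages. Design rules are proposed based on theory and verification. A two-fold increase in the swimming speed of the pneumatic fish relative to the reference is observed, and further study of the untethered fish demonstrates a record-breaking velocity of 2.03 BL/s (43.6 cm/s) for an untethered compliant swimmer, outperforming the fastest one previously reported by a significant margin of 194%. This work may herald a structural revolution for next-generation compliant robotics.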

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Gaoyuan Zhang , Songtao Lu , Yihua Zhang , Xiangyi Chen , Pin-Yu Chen , Quanfu Fan , Lee Martie , Lior Horesh , Mingyi Hong , Sijia Liu

Category: Machine Learning

2022-06-13

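Current deep neural networks (DNNs) are vulnerable to adversarial attacks, in which adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach known as adversarial training (AT) has been shown to mitigate the negative impact of adversarial attacks by means of a min-max robust training method. While effective, it remains unclear whether AT can be successfully adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by this, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Code is available at https://github.com/dat-2022/dat.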
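
The min-max structure described in this abstract can be sketched on a toy problem. The code below is an illustrative assumption rather than the DAT implementation: it uses a two-weight logistic model, a single signed-gradient (FGSM-style) step as the inner maximization, and a plain Python average over data shards to simulate the gradient aggregation that real workers would perform. The names `worker_gradient` and `dat_step` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradients(w, x, y):
    """Gradients of the logistic loss (label y in {0, 1}) w.r.t. weights and input."""
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
    return [err * xi for xi in x], [err * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def worker_gradient(w, shard, eps):
    """Inner maximization (one signed-gradient step), then the local weight gradient."""
    total = [0.0] * len(w)
    for x, y in shard:
        _, grad_x = loss_gradients(w, x, y)
        # Perturb each input coordinate by eps in the loss-increasing direction.
        x_adv = [xi + eps * sign(g) for xi, g in zip(x, grad_x)]
        grad_w, _ = loss_gradients(w, x_adv, y)
        total = [t + g for t, g in zip(total, grad_w)]
    return [t / len(shard) for t in total]


def dat_step(w, shards, eps, lr):
    """Outer minimization: average the per-shard gradients, then take a descent step."""
    grads = [worker_gradient(w, shard, eps) for shard in shards]
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(w))]
    return [wi - lr * gi for wi, gi in zip(w, avg)]


# Two simulated workers, each holding one shard of (input, label) pairs.
shards = [
    [([2.0, 1.0], 1), ([-2.0, -1.0], 0)],
    [([1.5, 2.0], 1), ([-1.0, -2.5], 0)],
]
w = [0.0, 0.0]
for _ in range(200):
    w = dat_step(w, shards, eps=0.1, lr=0.5)
print(w)
```

In a real multi-machine setting the averaging inside `dat_step` would be a collective communication step, which is where the gradient compression mentioned in the abstract would apply.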

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

Haoyu Ma , Liangjian Chen , Deying Kong , Zhe Wang , Xingwei Liu , Hao Tang , Xiangyi Yan , Yusheng Xie , Shih-Yao Lin , Xiaohui Xie

Category: Computer Vision

2021-10-18

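Estimating the 2D human poses in each view is typically the first step in calibrated multi-view 3D pose estimation. However, the performance of 2D pose detectors suffers in challenging situations such as occlusions and oblique viewing angles. To address these challenges, previous works derive point-to-point correspondences between different views from epipolar geometry and use the correspondences to merge prediction heatmaps or feature representations. Instead of post-prediction merging or calibration, we introduce a transformer framework for multi-view 3D pose estimation that aims to directly improve the individual 2D predictors by integrating information from different views. Inspired by previous multi-modal transformers, we design a unified transformer architecture, named TransFusion, to fuse cues from both the current view and neighboring views. Moreover, we propose the concept of the epipolar field to encode 3D positional information into the transformer model. The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views. Experiments on Human 3.6M and Ski-Pose show that our method is more efficient and yields consistent improvements compared with other fusion methods. Specifically, we achieve 25.8 mm MPJPE on Human 3.6M with only 5M parameters at 256 x 256 resolution.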
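
For readers unfamiliar with the metric quoted at the end of this abstract, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below shows only that core computation; the joint coordinates are made-up values, and any alignment protocol a benchmark may apply before measuring the error is deliberately left out.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding 3D joints, in the input units."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical joint positions in millimetres.
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0), (100.0, 200.0, 0.0)]
print(mpjpe(pred, gt))  # (5 + 12 + 0) / 3, about 5.667 mm
```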

Diffeomorphic Image Registration with Neural Velocity Field

Kun Han , Shanlin Sun , Chenyu You , Hao Tang , Deying Kong , Xiangyi Yan , Xiaohui Xie

Category: Computer Vision

2022-02-25

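Diffeomorphic image registration is a crucial task in medical image analysis. Recent learning-based image registration methods use convolutional neural networks (CNNs) to learn the spatial transformation between image pairs and achieve fast inference. However, these methods often require a large amount of training data to improve their generalization ability. At test time, learning-based methods may fail to provide good registration results, most likely because the model overfits the training dataset. In this paper, we propose a neural representation of a continuous velocity field (NeVF) to describe the deformation between two images. Specifically, this neural velocity field assigns a velocity vector to each point in space, which gives it greater flexibility in modeling complex deformation fields. Furthermore, we propose a simple sparse-sampling strategy to reduce the memory consumption of diffeomorphic registration. The proposed NeVF can also be combined with a pre-trained learning-based model, whose predicted deformation is taken as the initial state for optimization. Extensive experiments on two large-scale 3D MR brain scan datasets demonstrate that our proposed method outperforms state-of-the-art registration methods.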
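
To make the link between a velocity field and a deformation concrete, the sketch below moves a point along a field for unit time and then back again. It rests on assumptions not stated in the abstract: the field is stationary (time-independent), a fixed analytic rotation stands in for the learned network, the integrator is a simple midpoint scheme, and the names `velocity` and `integrate` are hypothetical.

```python
import math


def velocity(p):
    """Hypothetical stand-in for a learned field: rotation about the origin."""
    x, y = p
    return (-y, x)


def integrate(p, field, steps=1000, direction=1.0):
    """Move point p along the field for unit time using midpoint (RK2) steps."""
    h = direction / steps
    x, y = p
    for _ in range(steps):
        vx, vy = field((x, y))
        # Evaluate the field again at the half-step position.
        mx, my = x + 0.5 * h * vx, y + 0.5 * h * vy
        vx, vy = field((mx, my))
        x, y = x + h * vx, y + h * vy
    return (x, y)


start = (1.0, 0.0)
warped = integrate(start, velocity)  # forward deformation
restored = integrate(warped, velocity, direction=-1.0)  # inverse deformation
print(warped, restored)
```

Integrating the same field backward approximately recovers the starting point, which is the invertibility property that makes velocity-field parameterizations attractive for diffeomorphic registration.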

New metal-plastic hybrid additive manufacturing strategy: Fabrication of arbitrary metal-patterns on external and even internal surfaces of 3D plastic structures

Kewei Song , Yue Cui , Tiannan Tao , Xiangyi Meng , Michinari Sone , Masahiro Yoshino , Shinjiro Umezu , Hirotaka Sato

Category: Robotics

2021-12-22

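Constructing precise micro-nano metal patterns on complex three-dimensional (3D) plastic parts allows the fabrication of functional devices for advanced applications. However, such patterning is currently expensive and requires complex processes with long manufacturing times. This work demonstrates a method for fabricating micro-nano 3D metal-plastic composite structures with arbitrarily complex shapes. In this approach, a light-cured resin is modified to prepare an active precursor that allows subsequent electroless plating (ELP). A multi-material digital light processing 3D printer was newly developed to enable the fabrication of parts containing regions made of either standard resin or active precursor resin, nested within each other. Selective 3D ELP processing of such parts yields various metal-plastic composite parts with complicated hollow micro-nano structures and specific topological relationships at a size scale as small as 40 μm. Using this technique, 3D metal topologies that cannot be manufactured by traditional methods become possible, and metal patterns can be produced inside plastic parts as a means of further miniaturizing electronic devices. The proposed method can also produce metal coatings with improved adhesion of the metal to the plastic substrate. Based on this technique, several sensors composed of different functional non-metallic materials and specific metal patterns were designed and fabricated. The results demonstrate the viability of the proposed method and suggest potential applications in smart 3D micro-nano electronics, 3D wearable devices, micro/nano-sensors, and health care.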

Part-aware Prototype Network for Few-shot Semantic Segmentation

Yongfei Liu , Xiangyi Zhang , Songyang Zhang , Xuming He

Category: Computer Vision

2020-07-13

Few-shot semantic segmentation aims to learn to segment new object classes with only a few annotated examples, which has a wide range of real-world applications. Most existing methods either focus on the restrictive setting of one-way few-shot segmentation or suffer from incomplete coverage of object regions. In this paper, we propose a novel few-shot semantic segmentation framework based on the prototype representation. Our key idea is to decompose the holistic class representation into a set of part-aware prototypes, capable of capturing diverse and fine-grained object features. In addition, we propose to leverage unlabeled data to enrich our part-aware prototypes, resulting in better modeling of intra-class variations of semantic objects. We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes based on labeled and unlabeled images. Extensive experimental evaluations on two benchmarks show that our method outperforms the prior art with a sizable margin.

Generative appearance replay for continual unsupervised domain adaptation

Boqi Chen , Kevin Thandiackal , Pushpak Pati , Orcun Goksel

Category: Computer Vision | Artificial Intelligence

2023-01-03

Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Category: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
