Smart Paper Notes

FaceAtlasAR: Atlas of Facial Acupuncture Points in Augmented Reality

Menghe Zhang , Jurgen Schulze , Dong Zhang

Category: Computer Vision

2021-11-29

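Acupuncture is a technique in which practitioners stimulate specific points on the body. These points, called acupuncture points (or acupoints), anatomically define areas on the skin relative to landmarks on the body. Traditional acupuncture treatment relies on experienced acupuncturists to position acupoints precisely. Novices typically find this difficult because of the lack of visual cues. This project presents FaceAtlasAR, a prototype system that localizes and visualizes facial acupoints in an augmented reality (AR) context. The system aims to 1) localize facial acupoints and the auricular zone map in an anatomical yet feasible way, 2) overlay the requested acupoints by category in AR, and 3) show the auricular zone map on the ears. We adopt MediaPipe, a cross-platform machine learning framework, to build a pipeline that runs on desktop and Android phones. We perform experiments on different benchmarks, including "In-the-wild", the AMI ear dataset, and our own annotated dataset. Results show a localization accuracy of 95% for facial acupoints and 99%/97% ("In-the-wild"/AMI) for the auricular zone map, along with high robustness. With this system, users, even non-professionals, can quickly position acupoints for self-acupressure treatment.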


RetinaFace: Single-shot multi-level face localisation in the wild

Category:

Though tremendous strides have been made in uncontrolled face detection, accurate and efficient 2D face alignment and 3D face reconstruction in-the-wild remain an open challenge. In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane. To fill the data gap, we manually annotated five facial landmarks on the WIDER FACE dataset and employed a semi-automatic annotation pipeline to generate 3D vertices for face images from the WIDER FACE, AFLW and FDDB datasets. Based on these extra annotations, we propose a mutually beneficial regression target for 3D face reconstruction, that is, predicting 3D vertices projected on the image plane constrained by a common 3D topology. The proposed 3D face reconstruction branch can be easily incorporated, without any optimisation difficulty, in parallel with the existing box and 2D landmark regression branches during joint training. Extensive experimental results show that RetinaFace can simultaneously achieve stable face detection, accurate 2D face alignment and robust 3D face reconstruction while being efficient through single-shot inference.


A Survey on Computer Vision based Human Analysis in the COVID-19 Era

Fevziye Irem Eyiokur , Alperen Kantarcı , Mustafa Ekrem Erakın , Naser Damer , Ferda Ofli , Muhammad Imran , Janez Križaj , Albert Ali Salah , Alexander Waibel , Vitomir Štruc

Category: Computer Vision

2022-11-07

The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is provided. Finally, to help advance the field further, a discussion of the main open challenges and future research directions is given.


SCULPTOR: Skeleton-Consistent Face Creation Using a Learned Parametric Generator

Zesong Qiu , Yuwei Li , Dongming He , Qixuan Zhang , Longwen Zhang , Yinghao Zhang , Jingya Wang , Lan Xu , Xudong Wang , Yuyao Zhang

Category: Computer Vision

2022-09-14

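Recent years have seen growing interest in 3D human face modelling due to its wide applications in digital humans, character generation and animation. Existing approaches overwhelmingly emphasize modelling the exterior shape, texture and skin properties of faces, ignoring the inherent correlation between inner skeletal structure and appearance. In this paper, we present SCULPTOR, 3D face creation with skeleton consistency using a learned parametric facial generator, aiming to facilitate easy creation of both anatomically correct and visually convincing face models via a hybrid parametric-morphological representation. At the core of SCULPTOR is LUCY, the first large-scale shape-skeleton face dataset, built in collaboration with plastic surgeons. Named after the fossil of one of the oldest known human ancestors, our LUCY dataset contains high-quality computed tomography (CT) scans of the complete human head before and after orthognathic surgery, which are critical for evaluating surgery results. LUCY consists of 144 scans of 72 subjects (31 male and 41 female), where each subject has two CT scans, taken before and after the orthognathic operation. Based on our LUCY dataset, we learn a novel skeleton-consistent parametric facial generator, SCULPTOR, which can create the unique and nuanced facial features that help define a character while maintaining physiological soundness. SCULPTOR jointly models the skull, face geometry and face appearance under a unified data-driven framework by separating the depiction of a 3D face into shape blend shapes, pose blend shapes and facial expression blend shapes. Compared with existing methods, SCULPTOR preserves both anatomical correctness and visual realism in facial generation tasks. Finally, we showcase the robustness and effectiveness of SCULPTOR in various previously unseen applications.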


End-to-end Weakly-supervised Multiple 3D Hand Mesh Reconstruction from Single Image

Jinwei Ren , Jianke Zhu , Jialiang Zhang

Category: Computer Vision

2022-04-18

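In this paper, we consider the challenging task of simultaneously locating and recovering multiple hands from a single 2D image. Previous studies either focus on single-hand reconstruction or solve this problem in a multi-stage way. Moreover, the conventional two-stage pipeline first detects hand areas and then estimates the 3D hand pose from each cropped patch. To reduce the computational redundancy in preprocessing and feature extraction, we propose a concise but efficient single-stage pipeline. Specifically, we design a multi-head auto-encoder structure for multi-hand reconstruction, where the head networks share the same feature map and output the hand center, pose and texture, respectively. In addition, we adopt a weakly-supervised scheme to alleviate the burden of expensive 3D real-world data annotation. To this end, we propose a series of losses optimized by a stage-wise training scheme, where a multi-hand dataset with 2D annotations is generated from publicly available single-hand datasets. To further improve the accuracy of the weakly-supervised model, we adopt several feature consistency constraints in both single-hand and multi-hand settings. Specifically, the keypoints of each hand estimated from local features should be consistent with the re-projected points predicted from global features. Extensive experiments on public benchmarks including FreiHAND, HO3D, InterHand2.6M and RHD demonstrate that our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings. The code and models are available at https://github.com/zijinxuxu/smhr.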


REALY: Rethinking the Evaluation of 3D Face Reconstruction

Zenghao Chai , Haoxian Zhang , Jing Ren , Di Kang , Zhengzhuo Xu , Xuefei Zhe , Chun Yuan , Linchao Bao

Category: Computer Vision

2022-03-18

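The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This makes it difficult to precisely diagnose and improve a 3D face reconstruction method. In this paper, we propose a novel evaluation approach with a new benchmark, REALY, which consists of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. Our approach performs region-wise shape alignment and leads to more accurate, bidirectional correspondences when computing the shape errors. The fine-grained, region-wise evaluation results give us a detailed understanding of the performance of state-of-the-art 3D face reconstruction methods. For example, our experiments on single-image based reconstruction methods reveal that DECA performs best on the nose region, while GANFit performs better on the cheek region. In addition, a new, high-quality 3DMM basis, HIFI3D++, is derived using the same procedure we used to construct REALY, aligning and retopologizing several 3D face datasets. We will release REALY, HIFI3D++, and our new evaluation pipeline at https://realy3dface.com.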


ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

Lumin Xu , Sheng Jin , Wentao Liu , Chen Qian , Wanli Ouyang , Ping Luo , Xiaogang Wang

Category: Computer Vision

2022-08-23

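This paper investigates the task of 2D whole-body human pose estimation, which aims to localize dense landmarks on the entire human body, including the body, feet, face, and hands. We propose a single-network approach, termed ZoomNet, that takes into account the hierarchical structure of the full human body and addresses the scale variation of different body parts. We further propose a neural architecture search framework, termed ZoomNAS, to improve both the accuracy and efficiency of whole-body pose estimation. ZoomNAS jointly searches the model architecture and the connections between different sub-modules, and automatically allocates computational complexity to the searched sub-modules. To train and evaluate ZoomNAS, we introduce the first large-scale 2D human whole-body dataset, COCO-WholeBody V1.0, which annotates 133 keypoints for in-the-wild images. Extensive experiments demonstrate the effectiveness of ZoomNAS and the significance of COCO-WholeBody V1.0.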


Py-Feat: Python Facial Expression Analysis Toolbox

Eshin Jolly , Jin Hyun Cheong , Tiankang Xie , Sophie Byrne , Matthew Kenny , Luke J. Chang

Category: Computer Vision | Machine Learning

2021-04-08

Studying facial expressions is a notoriously difficult endeavor. Recent advances in the field of affective computing have yielded impressive progress in automatically detecting facial expressions from pictures and videos. However, much of this work has yet to be widely disseminated in social science domains such as psychology. Current state of the art models require considerable domain expertise that is not traditionally incorporated into social science training programs. Furthermore, there is a notable absence of user-friendly and open-source software that provides a comprehensive set of tools and functions that support facial expression research. In this paper, we introduce Py-Feat, an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data. Py-Feat makes it easy for domain experts to disseminate and benchmark computer vision models and also for end users to quickly process, analyze, and visualize face expression data. We hope this platform will facilitate increased use of facial expression data in human behavior research.


Automatic Gaze Analysis: A Survey of Deep Learning based Approaches

Shreya Ghosh , Abhinav Dhall , Munawar Hayat , Jarrod Knibbe , Qiang Ji

Category: Computer Vision

2021-08-12

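Eye gaze analysis is an important research problem in the fields of computer vision and human-computer interaction. Even with notable progress in the last decade, automatic gaze analysis remains challenging due to the uniqueness of eye appearance, eye-head interplay, occlusion, image quality, and illumination conditions. There are several open questions, including what the important cues are for interpreting gaze direction in an unconstrained environment without prior knowledge, and how to encode them in real time. We review the progress across a range of gaze analysis tasks and applications to elucidate these fundamental questions, identify effective methods in gaze analysis, and provide possible future directions. We analyze recent gaze estimation and segmentation methods, especially in the unsupervised and weakly supervised domains, based on their advantages and reported evaluation metrics. Our analysis shows that the development of a robust and generic gaze analysis method still needs to address real-world challenges such as unconstrained setups and learning with less supervision. We conclude by discussing future research directions for designing a real-world gaze analysis system that can propagate to other domains, including computer vision, augmented reality (AR), virtual reality (VR), and human-computer interaction (HCI). Project page: https://github.com/i-am-shreya/eyegazesurvey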


6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping

Tuan-Tang Le , Trung-Son Le , Yu-Ru Chen , Joel Vidal , Chyi-Yeu Lin

Category: Computer Vision | Robotics

2021-11-11

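Real-time robotic grasping, supporting a subsequent precise object-in-hand operation task, is a priority target for highly advanced autonomous systems. However, an algorithm that can perform sufficiently accurate grasping with time efficiency has yet to be found. This paper proposes a novel two-stage method that combines fast 2D object recognition using a deep neural network with a subsequent accurate and fast 6D pose estimation based on the Point Pair Feature framework, forming a real-time 3D object recognition and grasping solution capable of handling multi-object-class scenes. The proposed solution has the potential to perform robustly in real-time applications that require both efficiency and accuracy. To validate our method, we conducted extensive and thorough experiments involving laborious preparation of our own dataset. The experimental results show that the proposed method scores 97.37% accuracy in the 5cm5deg metric and 99.37% in the Average Distance metric. The results show an overall relative improvement of 62% (5cm5deg metric) and 52.48% (Average Distance metric) when using the proposed method. Moreover, the pose estimation execution also showed an average improvement of 47.6% in running time. Finally, to illustrate the overall efficiency of the system in real-time operation, a pick-and-place robotic experiment was conducted and showed a convincing success rate of 90%. A video of this experiment is available at https://sites.google.com/view/dl-ppf6dpose/.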
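The 5cm5deg metric named in the abstract can be sketched as follows. This is a minimal illustration under a common reading of the metric (a pose counts as correct when its translation error is at most 5 cm and its rotation error is at most 5 degrees); the abstract does not spell out the exact definition the authors used, and the function names, the metre units, and the `rot_z` helper are assumptions made for this example. Object symmetries are ignored.

```python
import math


def rotation_angle_deg(R_a, R_b):
    """Angle in degrees between two 3x3 rotation matrices given as nested lists."""
    # trace(R_a^T R_b) equals the sum of element-wise products of the two matrices.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def is_correct_5cm5deg(R_pred, t_pred, R_gt, t_gt,
                       max_trans_m=0.05, max_rot_deg=5.0):
    """Assumed criterion: translation within 5 cm AND rotation within 5 degrees.

    Translations are assumed to be expressed in metres.
    """
    trans_err = math.dist(t_pred, t_gt)
    rot_err = rotation_angle_deg(R_pred, R_gt)
    return trans_err <= max_trans_m and rot_err <= max_rot_deg


def rot_z(deg):
    """Hypothetical helper: rotation about the z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    gt_R, gt_t = rot_z(0.0), [0.0, 0.0, 0.0]
    # 3 degrees and 3 cm off: inside both thresholds.
    print(is_correct_5cm5deg(rot_z(3.0), [0.0, 0.0, 0.03], gt_R, gt_t))
    # 10 degrees off: the rotation threshold is exceeded.
    print(is_correct_5cm5deg(rot_z(10.0), [0.0, 0.0, 0.03], gt_R, gt_t))
```

Accuracy under this reading is the fraction of test poses for which `is_correct_5cm5deg` returns true.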


A Deeper Look into DeepCap

Marc Habermann , Weipeng Xu , Michael Zollhoefer , Gerard Pons-Moll , Christian Theobalt

Category: Computer Vision

2021-11-20

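Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense, space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground-truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of DeepCap in which we provide more detailed explanations, comparisons and results, as well as applications.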


A Review of 3D Face Reconstruction From a Single Image

Hanxin Wang

Category: Computer Vision

2021-10-13

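3D face reconstruction is a challenging problem, but also an important task in the fields of computer vision and graphics. Recently, many researchers have turned their attention to this problem, and a large number of articles have been published. Single-image reconstruction is one branch of 3D face reconstruction and has many applications in everyday life. This paper reviews the recent literature on 3D face reconstruction from a single image.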


Mask-FPAN: Semi-Supervised Face Parsing in the Wild With De-Occlusion and UV GAN

Lei Li , Tianfang Zhang , Stefan Oehmcke , Fabian Gieseke , Christian Igel

Category: Computer Vision

2022-12-18

Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, in which handling ambiguous occlusions and large pose variations is particularly difficult. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module that learns to parse occluded faces in a semi-supervised way. In particular, face landmark localization, face occlusion estimations, and detected head poses are taken into account. A 3D morphable face model combined with the UV GAN improves the robustness of 2D face parsing. In addition, we introduce two new datasets named FaceOccMask-HQ and CelebAMaskOcc-HQ for face parsing work. The proposed Mask-FPAN framework addresses the face parsing problem in the wild and shows significant performance improvements, with mIoU rising from 0.7353 to 0.9013 compared to the state of the art on challenging face datasets.
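The mIoU score reported above is a standard segmentation metric: the intersection-over-union between predicted and ground-truth pixels, computed per class and then averaged. The sketch below is a plain-Python illustration of that general definition, not the evaluation code of the paper; the function name, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for this example.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes.

    `pred` and `target` are flat sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped (an assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps, and pixels labelled c in at least one map.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    prediction = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    # Class 0: IoU = 1/2. Class 1: IoU = 2/3. Mean = 7/12.
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

Benchmarks differ on whether the average is taken over the whole dataset or per image, and on how absent classes are handled, so scores are only comparable under the same convention.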


3D face reconstruction with dense landmarks

Erroll Wood , Tadas Baltrusaitis , Charlie Hewitt , Matthew Johnson , Jingjing Shen , Nikola Milosavljevic , Daniel Wilde , Stephan Garbin , Chirag Raman , Jamie Shotton

Category: Computer Vision

2022-04-06

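Landmarks often play a key role in face analysis, but many aspects of identity or expression cannot be represented by sparse landmarks alone. Thus, in order to reconstruct faces more accurately, landmarks are often combined with additional signals such as depth images, or with techniques such as differentiable rendering. Can we keep things simple by just using more landmarks? In answer, we present the first method that accurately predicts 10 times as many landmarks as usual, covering the whole head, including the eyes and teeth. This is accomplished using synthetic training data, which guarantees perfect landmark annotations. By fitting a morphable model to these dense landmarks, we achieve state-of-the-art results for monocular 3D face reconstruction in the wild. We show that dense landmarks are an ideal signal for integrating face shape information across frames by demonstrating accurate and expressive facial performance capture in both monocular and multi-view scenarios. This approach is also highly efficient: we can predict dense landmarks and fit our 3D face model at over 150 FPS on a single CPU thread. Please see our website: https://microsoft.github.io/denselandmarks/.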


Procedural Humans for Computer Vision

Charlie Hewitt , Tadas Baltrušaitis , Erroll Wood , Lohit Petikam , Louis Florentin , Hanz Cuevas Velasquez

Category: Computer Vision

2023-01-03

Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. There are a number of benefits of using synthetic data from privacy preservation and bias elimination to quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain-gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training DNNs to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.


Weakly-Supervised Gaze Estimation from Synthetic Views

Evangelos Ververas , Polydefkis Gkagkos , Jiankang Deng , Jia Guo , Michail Christos Doukas , Stefanos Zafeiriou

Category: Computer Vision

2022-12-06

3D gaze estimation is most often tackled as learning a direct mapping between input images and the gaze vector or its spherical coordinates. Recently, it has been shown that pose estimation of the face, body and hands benefits from revising the learning target from few pose parameters to dense 3D coordinates. In this work, we leverage this observation and propose to tackle 3D gaze estimation as regression of 3D eye meshes. We overcome the absence of compatible ground truth by fitting a rigid 3D eyeball template on existing gaze datasets and propose to improve generalization by making use of widely available in-the-wild face images. To this end, we propose an automatic pipeline to retrieve robust gaze pseudo-labels from arbitrary face images and design a multi-view supervision framework to balance their effect during training. In our experiments, our method achieves improvement of 30% compared to state-of-the-art in cross-dataset gaze estimation, when no ground truth data are available for training, and 7% when they are. We make our project publicly available at https://github.com/Vagver/dense3Deyes.


Generative Neural Articulated Radiance Fields

Alexander W. Bergman , Petr Kellnhofer , Yifan Wang , Eric R. Chan , David B. Lindell , Gordon Wetzstein

Category: Computer Vision

2022-06-28

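Unsupervised learning of 3D-aware generative adversarial networks (GANs) using only collections of single-view 2D photographs has recently made much progress. These 3D GANs, however, have not been demonstrated for human bodies, and the radiance fields generated by existing frameworks are not directly editable, limiting their applicability in downstream tasks. We propose a solution to these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warps them into a desired body pose or facial expression using an explicit deformation field. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of generated bodies or faces when editing their poses or facial expressions, compared to a 3D GAN that is not trained with explicit deformations.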


ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment

Yangyu Huang , Hao Yang , Chong Li , Jongyoo Kim , Fangyun Wei

Category: Computer Vision

2021-09-13

Recent progress in CNNs has dramatically improved face alignment performance. However, few works have paid attention to the error-bias with respect to the error distribution of facial landmarks. In this paper, we investigate the error-bias issue in face alignment, where the distributions of landmark errors tend to spread along the tangent line to landmark curves. This error-bias is not trivial since it is closely connected to the ambiguous landmark labeling task. Inspired by this observation, we seek a way to leverage the error-bias property for better convergence of the CNN model. To this end, we propose anisotropic direction loss (ADL) and anisotropic attention module (AAM) for coordinate and heatmap regression, respectively. ADL imposes a strong binding force in the normal direction for each landmark point on facial boundaries. AAM, on the other hand, is an attention module that produces an anisotropic attention mask focusing on the region of a point and its local edge connected by adjacent points. It has a stronger response in the tangent direction than in the normal direction, which means relaxed constraints along the tangent. These two methods work in a complementary manner to learn both facial structures and texture details. Finally, we integrate them into an optimized end-to-end training pipeline named ADNet. Our ADNet achieves state-of-the-art results on the 300W, WFLW and COFW datasets, which demonstrates its effectiveness and robustness.
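The tangent/normal distinction in this abstract can be made concrete with a small geometric sketch. The code below splits a 2D landmark error into a component along the landmark curve (tangent) and a component across it (normal). It is not the ADL loss or the AAM module from the paper; the function name and the way the tangent is estimated (from the two neighbouring ground-truth landmarks) are assumptions chosen for illustration.

```python
import math


def decompose_error(pred, gt, prev_gt, next_gt):
    """Split the error (pred - gt) into (tangent, normal) components.

    The tangent direction at `gt` is approximated by the direction from the
    previous to the next ground-truth landmark on the same facial curve.
    All points are (x, y) pairs.
    """
    tx, ty = next_gt[0] - prev_gt[0], next_gt[1] - prev_gt[1]
    length = math.hypot(tx, ty)
    if length == 0.0:
        raise ValueError("neighbouring landmarks coincide; tangent is undefined")
    tx, ty = tx / length, ty / length   # unit tangent
    nx, ny = -ty, tx                    # unit normal (tangent rotated by 90 degrees)
    ex, ey = pred[0] - gt[0], pred[1] - gt[1]
    tangent_err = ex * tx + ey * ty     # projection onto the tangent
    normal_err = ex * nx + ey * ny      # projection onto the normal
    return tangent_err, normal_err


if __name__ == "__main__":
    # A landmark on a horizontal contour: the prediction slides 0.5 along the
    # curve and drifts 0.2 away from it.
    print(decompose_error(pred=(1.5, 0.2), gt=(1.0, 0.0),
                          prev_gt=(0.0, 0.0), next_gt=(2.0, 0.0)))
```

Under the observation described in the abstract, the tangent component would typically dominate on boundary landmarks, which is why a loss might penalise the normal component more heavily.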


Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

Kaipeng Zhang , Zhanpeng Zhang , Zhifeng Li , Yu Qiao

Category:

2016-04-11


A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19

Bingshu Wang , Jiangbin Zheng , C. L. Philip Chen

Category: Computer Vision | Machine Learning

2022-01-13

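Coronavirus disease 2019 (COVID-19) has continued to pose a great challenge to the world since its outbreak. To fight the disease, a series of artificial intelligence (AI) techniques have been developed and applied to real-world scenarios such as safety monitoring, disease diagnosis, infection risk assessment, and lesion segmentation of COVID-19 CT scans. The coronavirus epidemic has forced people to wear masks to counteract the transmission of the virus, which also makes it difficult to monitor large groups of people wearing masks. In this paper, we primarily focus on AI techniques for masked facial detection and the related datasets. We survey recent advances, beginning with descriptions of masked facial detection datasets. Thirteen available datasets are described and discussed in detail. The methods are then roughly categorized into two classes: conventional methods and neural network-based methods. Conventional methods, which account for a small proportion, are usually trained by boosting algorithms with hand-crafted features. Neural network-based methods are further classified into three groups according to the number of processing stages. Representative algorithms are described in detail, together with brief descriptions of some typical techniques. Finally, we summarize recent benchmarking results, discuss the limitations of datasets and methods, and outline future research directions. To our knowledge, this is the first survey of masked facial detection methods and datasets. We hope our survey can provide some help in the fight against epidemics.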
