Autonomous driving is an exciting new industry that poses important research questions. Within the perception module, 3D human pose estimation is an emerging technology that can enable an autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. Hardware systems and sensors have improved dramatically over the decades: cars may now be equipped with complex LiDAR and vision systems, and a growing body of dedicated datasets captures this newly available information. Yet little work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we call HUM3DIL (HUMan 3D from Images and LiDAR), efficiently exploits these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is also a fast and compact model suitable for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims: we achieve state-of-the-art results on the task of 3D pose estimation.
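The HUM3DIL abstract says LiDAR points are embedded into pixel-aligned multi-modal features but gives no details. The following is only a generic sketch of pixel-aligned feature gathering, not the authors' implementation. It assumes the points are already expressed in the camera frame with positive depth, a pinhole intrinsic matrix `K`, and an image feature map of shape HxWxC. All function names are hypothetical.

```python
import numpy as np

def project_points(points_cam, K):
    """Project Nx3 camera-frame points (z > 0 assumed) to Nx2 pixel coordinates."""
    uvw = points_cam @ K.T            # each row is (z*u, z*v, z) for a pinhole camera
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by depth z

def bilinear_sample(feat, uv):
    """Bilinearly sample an HxWxC feature map at continuous (u, v) locations."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)   # clamp to the image so indexing stays valid
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = (1 - du) * feat[v0, u0] + du * feat[v0, u1]
    bottom = (1 - du) * feat[v1, u0] + du * feat[v1, u1]
    return (1 - dv) * top + dv * bottom

def pixel_aligned_features(points_cam, K, feat):
    """Concatenate each point's xyz with the image feature at its projection."""
    uv = project_points(points_cam, K)
    return np.concatenate([points_cam, bilinear_sample(feat, uv)], axis=1)
```

Each point thus carries both its 3D geometry and the appearance feature of the pixel it lands on. A downstream model (in HUM3DIL's case, a sequence of Transformer refinement stages) can consume these per-point vectors as tokens.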
We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest beyond the body shape alone, which in turn helps model accessories, hair, and loose clothing. Building on this representation, we present a complete 3D transformer-based attention framework that, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition. This is achieved by a single end-to-end model, trained semi-supervised, with no additional post-processing. We show that our S3F model surpasses the previous state of the art on various tasks, including monocular 3D reconstruction as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing of the reconstruction, and can naturally be extended to handle multiple input images (e.g., different views of a person, or the same view in different poses in a video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
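We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmark and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image, with applications including avatar control, fitness tracking, and AR/VR effects. Our main contributions include i) a novel method for 3D ground-truth data acquisition, ii) updated 3D body tracking with additional hand landmarks, and iii) full-body pose estimation from a monocular image.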
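Advances in the state of the art for 3D human sensing are currently limited by the lack of visual datasets with 3D ground truth that include multiple people, in motion, operating in real-world environments, with complex illumination or occlusion, and potentially observed by a moving camera. Sophisticated scene understanding requires estimating human pose and shape as well as gestures, towards representations that ultimately combine useful metric and behavioral signals with free-viewpoint visualization. To sustain progress, we build a large-scale photorealistic dataset, Human-SPACE (HSPACE), of animated humans placed in complex synthetic indoor and outdoor environments. We combine a hundred diverse individuals of varying ages, genders, proportions, and ethnicities with hundreds of motions and scenes, as well as parametric variations in body shape (1,600 different humans in total), to generate an initial dataset of over 1 million frames. Human animations are obtained by fitting an expressive human body model to single scans of people, followed by novel retargeting and positioning procedures that support realistic animation of dressed humans, statistical variation of body proportions, and jointly consistent scene placement of multiple moving people. Assets are generated automatically at scale and are compatible with existing real-time rendering and game engines. The dataset, with an evaluation server, will be made available for research. Our large-scale analysis of the impact of synthetic data, in connection with real data and weak supervision, underlines the considerable potential for continued quality improvements and for narrowing the sim-to-real gap in this practical setting as model capacity increases.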
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which, however, requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity, using only information locally available to the synapses. Our method is completely phase-free (no separate forward and backward passes or phased learning) and allows efficient error propagation across multi-layer cortical hierarchies while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible approaches to credit assignment: compared to random synaptic feedback, it can solve complex tasks with fewer neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
This paper describes the 5th edition of the Predicting Video Memorability Task, part of MediaEval 2022. This year we have reorganised and simplified the task in order to encourage a greater depth of inquiry. As last year, two datasets are provided to facilitate generalisation; however, this year we have replaced the TRECVid2019 Video-to-Text dataset with the VideoMem dataset to remedy underlying data quality issues, and we prioritise short-term memorability prediction by elevating the Memento10k dataset to primary dataset. Additionally, we introduce a fully fledged electroencephalography (EEG)-based prediction sub-task. In this paper, we outline the core facets of the task and its constituent sub-tasks, describing the datasets, evaluation metrics, and requirements for participant submissions.
The Predicting Media Memorability task in the MediaEval evaluation campaign has been running annually since 2018, and several different tasks and data sets have been used in that time. This has allowed us to compare the performance of many memorability prediction techniques on the same data in a reproducible way, and to refine and improve on those techniques. The resources created to compute media memorability are now being used by researchers well beyond the evaluation campaign itself. In this paper, we present a summary of the task, including the collective lessons we have learned for the research community.
This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio or between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing methods usually take an ad hoc approach to integrating high-dimensional techniques into non-linear models, and recent studies have shown that these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub the additive influence model. Under our modeling assumption, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interactions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, including designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector regressions. Extensive experiments on two common applications demonstrate that our new algorithms deliver significantly stronger forecasting power than standard and recently proposed methods.
Determining the author of a text is a difficult task. Here we compare multiple AI techniques for classifying literary texts written by multiple authors, taking into account only a limited number of parts of speech (prepositions, adverbs, and conjunctions). We also introduce a new dataset of texts written in Romanian, on which we ran the algorithms. The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour. Numerical experiments show, first of all, that the problem is difficult, but that some algorithms achieve reasonably low errors on the test set.
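To make the authorship setup concrete, here is a minimal sketch of one of the compared methods, k-Nearest Neighbour, applied to part-of-speech-restricted features. It assumes that each text is represented by the relative frequencies of a fixed list of function words and that the classifier uses Euclidean distance with a majority vote. The word list is a hypothetical English stand-in (the study uses Romanian texts), and the function names are illustrative, not the authors' code.

```python
from collections import Counter
import math
import re

# Hypothetical, tiny list of English prepositions, adverbs, and conjunctions;
# the actual study works on Romanian texts with its own word inventory.
FUNCTION_WORDS = ["and", "but", "or", "in", "on", "of", "with", "very", "then", "not"]

def features(text):
    """Relative frequency of each function word among all tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty texts
    return [counts[w] / total for w in FUNCTION_WORDS]

def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k nearest (text, author) pairs."""
    x = features(text)
    dists = sorted((math.dist(x, features(doc)), author) for doc, author in train)
    votes = Counter(author for _, author in dists[:k])
    return votes.most_common(1)[0][0]
```

Because only function words are counted, the classifier relies on stylistic habits rather than topic vocabulary, which is the intuition behind restricting the features to a few parts of speech.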