Smart Paper Notes

Adaptation through prediction: multisensory active inference torque control

Cristian Meo, Giovanni Franzese, Corrado Pezzato, Max Spahn, Pablo Lanillos

Categories: Robotics | Artificial Intelligence

2021-12-13

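Adaptation to external and internal changes is essential for robotic systems operating in uncertain environments. Here, we present a novel multisensory active inference torque controller for industrial arms that shows how prediction can be used to resolve adaptation. Inspired by the predictive brain hypothesis, our controller improves the capabilities of current active inference approaches by incorporating learning and multimodal integration of high-speed and high-dimensional sensor inputs (e.g., raw images) while simplifying the architecture. We systematically evaluated our model by comparing its behavior with previous active inference baselines and classic controllers, analyzing both qualitatively and quantitatively its adaptation capabilities and control accuracy. The results show improved goal-directed control accuracy with high noise rejection, owing to multimodal filtering, and adaptability to dynamic inertial changes, elasticity constraints, and human disturbances without relearning the model or retuning parameters.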
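The noise rejection attributed to multimodal filtering can be illustrated with a precision-weighted fusion of two sensor estimates. This is a minimal sketch under a Gaussian, independent-noise assumption, not the controller described in the paper; the function name `fuse` and the example readings are hypothetical.

```python
from typing import Sequence


def fuse(estimates: Sequence[float], precisions: Sequence[float]) -> float:
    """Precision-weighted mean of independent Gaussian estimates of one quantity.

    A precision is an inverse variance, so noisier sensors get smaller weights.
    """
    total = sum(precisions)
    if total <= 0.0:
        raise ValueError("at least one precision must be positive")
    return sum(p * e for p, e in zip(precisions, estimates)) / total


# Hypothetical readings of one joint angle (radians): a precise encoder
# value and a noisier vision-based estimate.
fused = fuse([0.50, 0.80], [9.0, 1.0])
print(fused)  # lies much closer to the encoder reading than to the vision estimate
```

Because the weights are precisions, a modality that becomes unreliable contributes less to the fused estimate without any change to the fusion rule itself.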

translated by Google Translate

Active Inference in Robotics and Artificial Agents: Survey and Challenges

Pablo Lanillos, Cristian Meo, Corrado Pezzato, Ajith Anil Meera, Mohamed Baioumy, Wataru Ohata, Alexander Tschantz, Beren Millidge, Martijn Wisse, Christopher L. Buckley

Categories: Robotics | Artificial Intelligence | Machine Learning

2021-12-03

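Active inference is a mathematical framework that originated in computational neuroscience as a theory of how the brain implements action, perception, and learning. Recently, it has been shown to be a promising approach to the problems of state estimation and control under uncertainty, as well as a foundation for constructing goal-driven behaviours in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state estimation, control, planning, and learning, describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization, and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.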


OF-AE: Oblique Forest AutoEncoders

Cristian Daniel Alecsa

Categories: Machine Learning

2023-01-02

In the present work we propose an unsupervised ensemble method consisting of oblique trees that can address the task of auto-encoding, namely Oblique Forest AutoEncoders (briefly OF-AE). Our method is a natural extension of the eForest encoder introduced in [1]. More precisely, by employing oblique splits consisting of multivariate linear combinations of features instead of the axis-parallel ones, we devise an auto-encoder method through the computation of a sparse solution of a set of linear inequalities consisting of feature-value constraints. The code for reproducing our results is available at https://github.com/CDAlecsa/Oblique-Forest-AutoEncoders.
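The difference between an axis-parallel split and an oblique split can be sketched as follows. This is a minimal illustration and not the OF-AE implementation from the linked repository; the function names, weights, and thresholds are hypothetical.

```python
from typing import Sequence


def axis_parallel_split(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Route a sample to the left child (True) by testing a single feature."""
    return x[feature] <= threshold


def oblique_split(x: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Route a sample to the left child (True) by testing a linear combination of features."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


sample = [2.0, 1.0]
# Axis-parallel: only feature 0 is inspected, and 2.0 <= 1.5 does not hold.
print(axis_parallel_split(sample, feature=0, threshold=1.5))  # False
# Oblique: 0.5 * 2.0 - 1.0 * 1.0 = 0.0, and 0.0 <= 0.5 holds.
print(oblique_split(sample, weights=[0.5, -1.0], threshold=0.5))  # True
```

Every decision along a root-to-leaf path of an oblique tree is a linear inequality in the input features, which is the kind of constraint set the abstract refers to when it speaks of solving a set of linear inequalities.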


Constructing Organism Networks from Collaborative Self-Replicators

Steffen Illium, Maximilian Zorn, Cristian Lenta, Michael Kölle, Claudia Linnhoff-Popien, Thomas Gabor

Categories: Neural and Evolutionary Computing | Machine Learning

2022-12-20

We introduce organism networks, which function like a single neural network but are composed of several neural particle networks; while each particle network fulfils the role of a single weight application within the organism network, it is also trained to self-replicate its own weights. As organism networks feature vastly more parameters than simpler architectures, we perform our initial experiments on an arithmetic task as well as on simplified MNIST-dataset classification as a collective. We observe that individual particle networks tend to specialise in either of the tasks and that the ones fully specialised in the secondary task may be dropped from the network without hindering the computational accuracy of the primary task. This leads to the discovery of a novel pruning strategy for sparse neural networks.


Emergent communication enhances foraging behaviour in evolved swarms controlled by Spiking Neural Networks

Cristian Jimenez Romero, Alper Yegenoglu, Aarón Pérez Martín, Sandra Diaz-Pier, Abigail Morrison

Categories: Neural and Evolutionary Computing

2022-12-16

Social insects such as ants communicate via pheromones, which allows them to coordinate their activity and solve complex tasks as a swarm, e.g. foraging for food. This behaviour was shaped through evolutionary processes. In computational models, self-coordination in swarms has been implemented using probabilistic or action rules to shape the decision of each agent and the collective behaviour. However, manually tuned decision rules may limit the behaviour of the swarm. In this work we investigate the emergence of self-coordination and communication in evolved swarms without defining any rule. We evolve a swarm of agents representing an ant colony. We use a genetic algorithm to optimize a spiking neural network (SNN) which serves as an artificial brain to control the behaviour of each agent. The goal of the colony is to find optimal ways to forage for food in the shortest amount of time. In the evolutionary phase, the ants are able to learn to collaborate by depositing pheromone near food piles and near the nest to guide their cohorts. The pheromone usage is not encoded into the network; instead, this behaviour is established through the optimization procedure. We observe that pheromone-based communication enables the ants to perform better in comparison to colonies where communication did not emerge. We assess the foraging performance by comparing the SNN-based model to a rule-based system. Our results show that the SNN-based model can complete the foraging task more efficiently in a shorter time. Our approach illustrates that even in the absence of pre-defined rules, self-coordination via pheromone emerges as a result of the network optimization. This work serves as a proof of concept for the possibility of creating complex applications utilizing SNNs as underlying architectures for multi-agent interactions where communication and self-coordination are desired.


HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu

Categories: Computer Vision

2022-12-15

Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
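One plausible reading of "pixel-aligned" features is that each LiDAR point is projected into the image and paired with the image feature at that pixel. The sketch below illustrates that idea with a plain pinhole camera model. It is an assumption for illustration only, not the HUM3DIL architecture; the function names, the camera intrinsics (`fx`, `fy`, `cx`, `cy`), and the nested-list feature map are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_to_pixel(p: Point3D, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to an integer pixel (u, v)."""
    x, y, z = p
    if z <= 0.0:  # behind the camera: no valid projection
        return None
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def pixel_aligned_features(points: Sequence[Point3D],
                           feature_map: Sequence[Sequence[Sequence[float]]],
                           fx: float, fy: float,
                           cx: float, cy: float) -> List[List[float]]:
    """Pair each visible point with the image feature at its projected pixel."""
    height, width = len(feature_map), len(feature_map[0])
    fused = []
    for p in points:
        uv = project_to_pixel(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            # Concatenate the image feature with the point's depth.
            fused.append(list(feature_map[v][u]) + [p[2]])
    return fused


# A toy 2x2 feature map with one channel per pixel, indexed as [row][column].
toy_map = [[[0.1], [0.2]], [[0.3], [0.4]]]
toy_points = [(1.0, 1.0, 1.0), (0.0, 0.0, -1.0), (5.0, 5.0, 1.0)]
print(pixel_aligned_features(toy_points, toy_map, 1.0, 1.0, 0.0, 0.0))
```

Points behind the camera or outside the image are dropped, so only the first toy point contributes a fused feature.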


PhoMoH: Implicit Photorealistic 3D Models of Human Heads

Mihai Zanfir, Thiemo Alldieck, Cristian Sminchisescu

Categories: Computer Vision

2022-12-14

We present PhoMoH, a neural network methodology to construct generative models of photorealistic 3D geometry and appearance of human heads including hair, beards, clothing and accessories. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photorealistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and allow the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.


Structured 3D Features for Reconstructing Relightable and Animatable Avatars

Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu

Categories: Computer Vision

2022-12-13

We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally helps modeling accessories, hair, and loose clothing. Owing to this, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, as a result of a single end-to-end model, trained semi-supervised, and with no additional postprocessing. We show that our S3F model surpasses the previous state-of-the-art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology allows novel view synthesis, relighting, and re-posing the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view, in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.


A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim

Categories: Computer Vision

2022-12-09

Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous works address this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images are rife with multiple visual cues from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for watermark, a shortcut we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released: https://github.com/facebookresearch/Whac-A-Mole.git.


Reinforcement Learning for UAV control with Policy and Reward Shaping

Cristian Millán-Arias, Ruben Contreras, Francisco Cruz, Bruno Fernandes

Categories: Artificial Intelligence | Machine Learning | Robotics

2022-12-06

In recent years, unmanned aerial vehicle (UAV) related technology has expanded knowledge in the area, bringing to light new problems and challenges that require solutions. Furthermore, because the technology allows processes usually carried out by people to be automated, it is in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature, applying different machine learning strategies. Reinforcement learning (RL) is an automation framework that is frequently used to train autonomous agents. RL is a machine learning paradigm wherein an agent interacts with an environment to solve a given task. However, learning autonomously can be time consuming, computationally expensive, and may not be practical in highly-complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for the training; one without obstacles and one with obstacles. We also studied the influence of each technique. The results show that an agent trained simultaneously with both techniques obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent achieves lower execution times and less dispersion during training.
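The two interactive techniques named in the abstract can be sketched generically: reward shaping changes the reward signal the agent learns from, while policy shaping changes the action it takes. The abstract does not describe the authors' exact scheme, so the functions, the additive `weight`, and the `follow_prob` parameter below are hypothetical.

```python
import random
from typing import Optional


def shaped_reward(env_reward: float, trainer_reward: float, weight: float = 1.0) -> float:
    """Reward shaping: add the trainer's weighted feedback to the environment reward."""
    return env_reward + weight * trainer_reward


def shaped_action(agent_action: int, advice: Optional[int],
                  follow_prob: float, rng: random.Random) -> int:
    """Policy shaping: with probability follow_prob, take the trainer's advised action."""
    if advice is not None and rng.random() < follow_prob:
        return advice
    return agent_action


rng = random.Random(0)
# The trainer adds a bonus of 0.5, weighted by 2.0, to an environment reward of 1.0.
print(shaped_reward(1.0, 0.5, weight=2.0))  # 2.0
# With follow_prob=1.0 the advised action always replaces the agent's own choice.
print(shaped_action(agent_action=0, advice=1, follow_prob=1.0, rng=rng))  # 1
```

Keeping the two mechanisms in separate functions makes it straightforward to enable one, the other, or both, which mirrors the comparison of techniques the abstract describes.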
