Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism, simple implementation, effective parameter-space exploration, and fast training times. However, a key limitation of ES is its scalability to large-capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging representation learning to reduce the parameter search space of ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a suite of challenging visual-locomotion tasks in which a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance than the ARS baseline. We further validate our algorithm by demonstrating that the learned policies transfer successfully to a real quadruped robot, for example, achieving a 100% success rate on a real-world stepping-stone environment, a significant improvement over prior results that achieved 40% success.
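To make the gradient-free half of the method concrete, below is a minimal sketch of a single ARS update step (the PI representation-learning stage is omitted). The `rollout` function, the hyperparameter values, and the top-direction filtering follow the generic ARS recipe rather than the paper's exact configuration:

```python
import numpy as np

def ars_step(theta, rollout, n_dirs=16, top_k=8, noise=0.02, lr=0.01):
    """One Augmented Random Search (ARS) update.

    theta   : flat policy parameter vector
    rollout : function mapping a parameter vector to an episode return
    """
    deltas = np.random.randn(n_dirs, theta.size)
    r_plus = np.array([rollout(theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(theta - noise * d) for d in deltas])
    # Keep only the best-performing perturbation directions.
    order = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_k]
    r_p, r_m, d = r_plus[order], r_minus[order], deltas[order]
    # Scale the step by the standard deviation of the collected returns.
    sigma = np.concatenate([r_p, r_m]).std() + 1e-8
    grad = ((r_p - r_m)[:, None] * d).mean(axis=0)
    return theta + lr / sigma * grad
```

Because each rollout is independent, every perturbation can be evaluated on a separate worker, which is the source of the parallelism the abstract highlights.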
Learning effective visual representations that generalize well without human supervision is a fundamental problem for applying machine learning to a wide variety of tasks. Recently, the two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explicit information compression to these algorithms yields better and more robust representations. We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, allowing us to measure and control the amount of compression in the learned representations and to observe its impact on downstream tasks. Furthermore, we explore the relationship between Lipschitz continuity and compression, showing a tractable lower bound on the Lipschitz constant of our learned encoders. As Lipschitz continuity is closely related to robustness, this provides a new explanation for why compressed models are more robust. Our experiments confirm that adding compression to SimCLR and BYOL significantly improves linear evaluation accuracy and model robustness across a wide range of domain shifts. In particular, the compressed version of BYOL achieves 76.0% linear evaluation accuracy on ImageNet with ResNet-50, and 78.8% with ResNet-50 2x.
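As a rough illustration of what a CEB-style compression term looks like, the sketch below computes a KL penalty between a forward encoder distribution and a backward prediction made from the other augmented view, using diagonal Gaussians for simplicity. The paper's actual SimCLR/BYOL formulations use different distribution families, so treat this purely as a schematic of the residual-information term:

```python
import torch

def ceb_compression_loss(z_mean, z_logvar, b_mean, b_logvar, beta=0.1):
    """Illustrative CEB-style compression term (diagonal Gaussians).

    Penalizes KL(e(z|x) || b(z|x')) between the forward encoder's
    distribution for view x and the backward encoder's prediction from
    the other view x', scaled by the compression weight beta.
    """
    kl = 0.5 * (
        b_logvar - z_logvar
        + (z_logvar.exp() + (z_mean - b_mean) ** 2) / b_logvar.exp()
        - 1.0
    ).sum(dim=-1)
    return beta * kl.mean()
```

Raising `beta` increases compression; the abstract's claim is that tuning this knob trades a small amount of raw accuracy for substantially better robustness.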
The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [31], R-FCN [6] and SSD [26] systems, which we view as "meta-architectures", and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real-time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.
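A minimal way to probe one axis of this trade-off is to time detectors across image sizes. The sketch below uses torchvision models as stand-ins for the paper's own unified implementation, so the model names, sizes, and random weights are illustrative only:

```python
import time
import torch
import torchvision

# Rough speed probe in the spirit of the study: two meta-architectures,
# two input resolutions. weights=None (random init) is fine for timing.
models = {
    "faster_rcnn_resnet50_fpn": torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None),
    "ssdlite320_mobilenet_v3": torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights=None),
}
for name, model in models.items():
    model.eval()
    for size in (320, 640):  # image size is one of the critical parameters varied
        x = [torch.rand(3, size, size)]
        with torch.no_grad():
            start = time.perf_counter()
            model(x)
        print(f"{name} @ {size}px: {time.perf_counter() - start:.3f} s")
```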
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
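For readers new to the framework, a minimal sketch of inference through Caffe's Python bindings is shown below. The prototxt/caffemodel paths and the 'data'/'prob' blob names are placeholders that depend on the concrete model definition:

```python
import numpy as np
import caffe

# Model representation (prototxt) is separate from the learned weights
# (caffemodel), which is what enables the platform switching noted above.
caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

image = np.random.rand(1, 3, 227, 227).astype(np.float32)  # stand-in input batch
net.blobs['data'].reshape(*image.shape)
net.blobs['data'].data[...] = image
out = net.forward()
print(out['prob'].argmax())  # 'prob' is the output blob name in many reference models
```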
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping the contrastive learning strategy obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
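As a generic schematic of aspect (2), not the CQIL architecture itself, the sketch below builds a degraded-quality view of each image and applies a standard InfoNCE loss to pull the features of the two quality views together:

```python
import torch
import torch.nn.functional as F

def quality_pair(x, scale=0.25):
    """Simulate a low-quality surveillance view by down- then up-sampling."""
    h, w = x.shape[-2:]
    low = F.interpolate(x, scale_factor=scale, mode='bilinear', align_corners=False)
    return F.interpolate(low, size=(h, w), mode='bilinear', align_corners=False)

def info_nce(z1, z2, tau=0.1):
    """Standard InfoNCE loss over features of the two quality views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```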
User equipment is one of the main bottlenecks facing the gaming industry nowadays. The extremely realistic games currently available impose high computational requirements on the devices that run them. As a consequence, the game industry has proposed the concept of Cloud Gaming, a paradigm that enables a rich gaming experience on limited hardware. To this end, games are hosted on remote servers, relegating users' devices to the role of a mere peripheral for interacting with the game. However, this paradigm overloads the communication links connecting the users with the cloud, so service experience becomes highly dependent on network connectivity. To overcome this, Cloud Gaming will be boosted by the promised performance of 5G and future 6G networks, together with the flexibility provided by mobility in multi-RAT scenarios, such as WiFi. In this scope, the present work proposes a framework for measuring and estimating the main end-to-end (E2E) metrics of the Cloud Gaming service, namely Key Quality Indicators (KQIs). In addition, different machine learning techniques are assessed for predicting KQIs related to the Cloud Gaming user's experience. To this end, the main KQIs of the service, such as input lag, freeze percentage, or perceived video frame rate, are collected in a real environment. Results show that machine learning techniques provide a good estimation of these indicators from network-based metrics alone. This is a valuable asset for guiding the delivery of Cloud Gaming services through cellular communications networks even without access to the user's device, as is the expected situation for telecom operators.
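A minimal sketch of the estimation step follows, with synthetic placeholder data standing in for the collected measurements and a random forest standing in for whichever of the assessed techniques performs best; the feature names and the synthetic relation are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical network-side features: RTT, jitter, throughput, packet loss.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
# Synthetic target standing in for one KQI (e.g., input lag in ms).
y = 20 + 300 * X[:, 0] + 50 * X[:, 1] + rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```

The key property being exercised here is that only network-observable inputs are used, matching the operator-side deployment scenario described above.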
System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition both in science and engineering in different fields. Particularly, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping function from the current state and action to the next state. This problem is commonly framed directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to learn, for example, delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation when using bootstrapped predictions (predictions based on past predictions) over large time horizons. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate that RL is a promising technique for solving this kind of problem.
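The error-accumulation difficulty can be stated in a few lines; the sketch below shows why bootstrapped multi-step prediction compounds one-step errors over the horizon:

```python
def bootstrapped_rollout(model, state, actions):
    """Multi-step prediction with a learned forward model.

    Each step feeds the previous *prediction* back in as input, so any
    one-step error perturbs all subsequent inputs and compounds over
    long horizons -- the accumulation problem discussed above.
    """
    trajectory = [state]
    for action in actions:
        state = model(state, action)  # a prediction based on a past prediction
        trajectory.append(state)
    return trajectory
```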
In this work, a re-design of the Moodledata module functionalities is presented to share learning objects between e-learning content platforms, e.g., Moodle and G-Lorep, in a linkable object format. The course content of G-Lorep, a Drupal-based Content Management System for academic learning, is exchanged by designing an object that incorporates metadata to support reuse and classification in its context. In such an Artificial Intelligence environment, the exchange of Linkable Learning Objects enables dialogue between Learning Systems to obtain information, especially through the use of semantic or structural similarity measures to enhance the existing Taxonomy Assistant for advanced automated classification.
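As a purely hypothetical illustration of such an object, the sketch below shows one plausible shape for a Linkable Learning Object; the field names are invented and do not reproduce the actual Moodledata/G-Lorep schema:

```python
# Hypothetical Linkable Learning Object: content reference plus the metadata
# that the taxonomy-based classification would operate on.
linkable_learning_object = {
    "id": "glorep:course-42/unit-3",
    "title": "Example Course Unit",
    "source_platform": "G-Lorep",
    "link": "https://example.org/glorep/lo/42-3",
    "metadata": {
        "taxonomy_path": ["Sciences", "Chemistry"],
        "keywords": ["learning object", "reuse"],
        "license": "CC-BY-4.0",
    },
}
```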
Emerging applications such as Deep Learning are often data-driven; thus, traditional approaches based on auto-tuners are not performance-effective across the wide range of inputs used in practice. In the present paper, we start an investigation of predictive models based on machine learning techniques in order to optimize Convolutional Neural Networks (CNNs). As a use case, we focus on the ARM Compute Library, which provides three different implementations of the convolution operator at different numeric precision. Starting from a collection of benchmarks, we build and validate models learned by a decision tree and a naive Bayes classifier. Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms the convolution operators manually selected by the library.
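A minimal sketch of this kind of predictive model using scikit-learn follows; the feature set and the implementation labels (GEMM, Winograd, and direct convolution, the usual ARM Compute Library variants) are assumptions for illustration, not the paper's actual benchmark data:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Features describe a convolution layer; each label names the implementation
# that won the (hypothetical) benchmark for that layer shape.
# Columns: [batch, height, width, channels_in, channels_out, kernel_size]
X = [[1, 224, 224, 3, 64, 7],
     [1, 56, 56, 64, 64, 3],
     [1, 7, 7, 512, 512, 3]]
y = ["gemm", "winograd", "direct"]

tree = DecisionTreeClassifier().fit(X, y)
bayes = GaussianNB().fit(X, y)
layer = [[1, 112, 112, 64, 128, 3]]  # unseen layer shape
print(tree.predict(layer), bayes.predict(layer))
```

At runtime, the predicted label simply selects which convolution kernel to dispatch, replacing the library's manual selection heuristic.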
Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have emerged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models have shown great results as normative models for identifying neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread across several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models in detecting subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Processes, showing the promising use of deep generative models to help in individualised analyses.
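The final scoring step reduces to ranking scans by model likelihood. A sketch with placeholder numbers follows (the group sizes and likelihood values are illustrative, not the dataset's actual composition):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Lower likelihood under the normative model = larger deviation from the
# neurotypical distribution, so the negated log-likelihood is the score.
rng = np.random.default_rng(0)
log_lik_controls = rng.normal(-100, 5, size=60)   # placeholder values
log_lik_patients = rng.normal(-110, 8, size=33)   # placeholder values

scores = -np.concatenate([log_lik_controls, log_lik_patients])
labels = np.concatenate([np.zeros(60), np.ones(33)])  # 1 = patient
print("AUROC:", roc_auc_score(labels, scores))
```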