Smart Paper Notes

Variance Reduction in Deep Learning: More Momentum is All You Need

Lionel Tondji, Sergii Kashubin, Moustapha Cisse

Category: Machine Learning | Computer Vision

2021-11-23

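Variance reduction (VR) techniques have contributed significantly to accelerating learning on large-scale datasets in the smooth and strongly convex setting (Schmidt et al., 2017; Johnson & Zhang, 2013; Roux et al., 2012). However, such techniques have not yet met the same success in deep learning, owing to various factors such as the use of data augmentation or regularization methods like dropout (Defazio & Bottou, 2019). This challenge has recently motivated the design of new variance reduction techniques tailored explicitly to deep learning (Arnold et al., 2019; Ma & Yarats, 2018). This work is an additional step in that direction. In particular, we exploit the ubiquitous clustering structure of the rich datasets used in deep learning to design a family of scalable variance-reduced optimization procedures by combining existing optimizers (e.g., SGD+Momentum, Quasi-Hyperbolic Momentum, Implicit Gradient Transport) with a multi-momentum strategy (Yuan et al., 2019). Our proposal converges faster than vanilla methods on standard benchmark datasets (e.g., CIFAR and ImageNet). It is robust to label noise and amenable to distributed optimization. We provide a parallel implementation in JAX.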

translated by Google Translate
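
To make "combining momentum with the clustering structure of the data" concrete, here is a toy sketch under our own assumptions, not the authors' algorithm: a 1-D least-squares objective f(w) = mean_i (w - x_i)^2 / 2, clusters given in advance (in practice they might come from class labels), one exponential-moving-average momentum buffer per cluster, and an update taken along the average of all buffers. The function name and the hyperparameters lr, beta and steps are hypothetical.

```python
import random


def per_cluster_momentum_sgd(data, lr=0.1, beta=0.9, steps=2000, seed=0):
    """Toy SGD on f(w) = mean_i (w - x_i)^2 / 2 with one momentum buffer per cluster.

    `data` maps a (hypothetical) cluster id to its list of scalar samples.
    Each step draws one sample, refreshes only that cluster's buffer
    (an EMA of its gradients), and moves w along the mean of all buffers.
    """
    rng = random.Random(seed)
    clusters = list(data)
    buffers = {k: 0.0 for k in clusters}
    w = 0.0
    for _ in range(steps):
        k = rng.choice(clusters)      # pick a cluster uniformly at random
        x = rng.choice(data[k])       # then a sample inside that cluster
        grad = w - x                  # gradient of (w - x)^2 / 2
        buffers[k] = beta * buffers[k] + (1 - beta) * grad
        w -= lr * sum(buffers.values()) / len(buffers)
    return w
```

For clusters {0: [0.8, 1.0, 1.2], 1: [4.8, 5.0, 5.2]} the iterate settles near the global mean 3.0. Because each buffer only sees gradients from its own cluster, the large between-cluster spread (about ±2 around the optimum here) never enters the update as noise, whereas a single shared buffer would absorb it; this is the intuition, greatly simplified, behind using clustering for variance reduction.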

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring

Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet

Category:

2018-02-13

Deep Neural Networks have recently achieved considerable success, enabling several breakthroughs in notoriously challenging problems. Training these networks is computationally expensive and requires vast amounts of training data. Selling such pre-trained models can, therefore, be a lucrative business model. Unfortunately, once the models are sold they can be easily copied and redistributed. To avoid this, a tracking mechanism to identify models as the intellectual property of a particular vendor is necessary. In this work, we present an approach for watermarking Deep Neural Networks in a black-box way. Our scheme works for general classification tasks and can easily be combined with current learning algorithms. We show experimentally that such a watermark has no noticeable impact on the primary task that the model is designed for and evaluate the robustness of our proposal against a multitude of practical attacks. Moreover, we provide a theoretical analysis, relating our approach to previous work on backdooring.

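
The black-box verification idea can be illustrated as follows: the owner keeps a secret trigger set of inputs with deliberately assigned labels (the backdoor), queries the suspect model on those inputs, and claims ownership if the predictions agree with the secret labels often enough. This is a simplified sketch, not the paper's full protocol; the `predict` callable, the trigger-set format and the 0.9 agreement threshold are assumptions.

```python
from typing import Callable, Hashable, Sequence, Tuple


def verify_watermark(
    predict: Callable[[Hashable], int],
    trigger_set: Sequence[Tuple[Hashable, int]],
    min_agreement: float = 0.9,
) -> bool:
    """Return True if the model reproduces the secret trigger labels often enough.

    `trigger_set` holds (input, label) pairs embedded by the owner during
    training; an unrelated model should match these arbitrary labels only
    by chance, so high agreement is evidence of the watermark.
    """
    if not trigger_set:
        raise ValueError("trigger set must not be empty")
    hits = sum(1 for x, label in trigger_set if predict(x) == label)
    return hits / len(trigger_set) >= min_agreement
```

A model trained with the backdoor returns the assigned labels and passes; a model that, for example, always predicts class 0 fails.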

Countering Adversarial Images using Input Transformations

Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten

Category:

2017-10-31

This paper investigates strategies that defend against adversarial-example attacks on image-classification systems by transforming the inputs before feeding them to the system. Specifically, we study applying image transformations such as bit-depth reduction, JPEG compression, total variation minimization, and image quilting before feeding the image to a convolutional network classifier. Our experiments on ImageNet show that total variation minimization and image quilting are very effective defenses in practice, in particular, when the network is trained on transformed images. The strength of those defenses lies in their non-differentiable nature and their inherent randomness, which makes it difficult for an adversary to circumvent the defenses. Our best defense eliminates 60% of strong gray-box and 90% of strong black-box attacks by a variety of major attack methods.

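
Of the transformations listed, bit-depth reduction is the simplest to show: each intensity is rounded to one of 2^b evenly spaced levels, so a perturbation that keeps a pixel inside the same quantization bin is erased. A minimal sketch, assuming grayscale intensities already scaled to [0, 1] and stored in a flat list; the function name is hypothetical.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize intensities in [0, 1] to 2**bits evenly spaced levels."""
    if bits < 1:
        raise ValueError("bits must be >= 1")
    levels = 2 ** bits - 1               # number of steps between 0 and 1
    out = []
    for p in pixels:
        p = min(max(p, 0.0), 1.0)        # clip to the valid range first
        out.append(round(p * levels) / levels)
    return out
```

At 1 bit, 0.10 and a perturbed 0.13 both map to 0.0; only perturbations that cross a bin boundary survive the transformation.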

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

Category:

2017-10-25

Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.

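
In the usual formulation of mixup, a weight λ is drawn from a Beta(α, α) distribution and the same λ mixes both inputs and one-hot labels: x̃ = λ·x_i + (1 − λ)·x_j and ỹ = λ·y_i + (1 − λ)·y_j. The sketch below builds one mixed example from two plain Python lists; the default α = 0.2 and the function name are assumptions, and a real training loop would mix whole mini-batches of tensors.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Return one mixup example: a convex combination of two inputs and labels.

    x1, x2 are feature vectors and y1, y2 one-hot label vectors (plain lists);
    the mixing weight lam is drawn from Beta(alpha, alpha).
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible; the mixed label still sums to 1, which is what lets it be used directly with a cross-entropy loss.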

Parseval Networks: Improving Robustness to Adversarial Examples

Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier

Category:

2017-04-28

We introduce Parseval networks, a form of deep neural network in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain weight matrices of linear and convolutional layers to be (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN), while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make better use of the full capacity of the networks.

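
For a weight matrix W with at most as many rows as columns, being a Parseval tight frame means its rows are orthonormal, W W^T = I, so its spectral norm, and hence the linear layer's Lipschitz constant in the ℓ2 norm, is 1. One standard way to pull W back toward that set after an SGD update is the retraction W ← (1 + β)W − β W W^T W, which is a gradient step of size β on ¼‖W W^T − I‖²_F. The sketch applies it with plain lists; β = 0.1 and the helper names are assumptions, and whether this matches the paper's exact procedure should be checked in the paper itself.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def parseval_step(w, beta=0.1):
    """One retraction step W <- (1 + beta) W - beta W W^T W.

    This is a gradient step on (1/4) * ||W W^T - I||_F^2; it keeps the
    singular vectors of W and moves every nonzero singular value toward 1.
    """
    wwtw = matmul(matmul(w, transpose(w)), w)
    return [[(1 + beta) * a - beta * b for a, b in zip(ra, rb)]
            for ra, rb in zip(w, wwtw)]
```

Each singular value s follows s ← (1 + β)s − βs³, whose fixed point s = 1 is attracting for small β, so repeated steps drive a 2×3 matrix to orthonormal rows. During training one would apply a single step after each gradient update rather than iterating to convergence.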

Personalized Student Attribute Inference

Khalid Moustapha Askia, Marie-Jean Meurs

Category: Artificial Intelligence | Machine Learning

2022-12-26

Accurately predicting students' future performance can help ensure their successful graduation and save them both time and money. However, such predictions face two main challenges: the diversity of students' backgrounds and the need to continuously track their evolving progress. The goal of this work is to create a system able to automatically detect students in difficulty, for instance by predicting whether they are likely to fail a course. We compare a naive approach widely used in the literature, which relies on attributes available in the data set (such as grades), with a personalized approach that we call Personalized Student Attribute Inference (PSAI). With our model, we create personalized attributes that capture the specific background of each student. Both approaches are compared using machine learning algorithms such as decision trees, support vector machines and neural networks.


Learning non-stationary and discontinuous functions using clustering, classification and Gaussian process modelling

M. Moustapha, B. Sudret

Category: Machine Learning (Statistics) | Machine Learning

2022-11-30

Surrogate models have been shown to be an extremely efficient aid in solving engineering problems that require repeated evaluations of an expensive computational model. They are built by sparsely evaluating the costly original model and have provided a way to solve otherwise intractable problems. A crucial aspect of surrogate modelling is the assumption that the model to approximate is smooth and regular. In reality, however, this assumption is not always met. For instance, in civil or mechanical engineering, some models may present discontinuities or non-smoothness, e.g., in the case of instability patterns such as buckling or snap-through. Building a single surrogate model capable of accounting for these fundamentally different behaviors or discontinuities is not an easy task. In this paper, we propose a three-stage approach for the approximation of non-smooth functions that combines clustering, classification and regression. The idea is to split the space following the localized behaviors or regimes of the system and build local surrogates that are eventually assembled. A sequence of well-known machine learning techniques is used: Dirichlet process mixture models (DPMM), support vector machines and Gaussian process modelling. The approach is tested and validated on two analytical functions and a finite element model of a tensile membrane structure.

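
The cluster, classify, regress pipeline can be illustrated in one dimension. This is a toy stand-in with explicit substitutions, not the paper's method: 2-means on the outputs replaces the DPMM, a 1-nearest-neighbour rule replaces the support vector machine, and each local surrogate is a zero-mean Gaussian process with a fixed RBF kernel (hypothetical length scale 0.1 and noise 1e-4, no hyperparameter fitting). All function names are hypothetical.

```python
import math


def two_means_1d(values, iters=50):
    """Split scalars into two groups (toy stand-in for DPMM clustering)."""
    centres = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1 for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels


def nearest_neighbour(xs, labels, x):
    """Assign x to the cluster of its closest training input (stand-in for an SVM)."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return labels[i]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rbf(a, b, length=0.1):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


class LocalGP:
    """Zero-mean GP regression with fixed hyperparameters; posterior mean only."""

    def __init__(self, xs, ys, noise=1e-4):
        self.xs = list(xs)
        gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(self.xs)]
                for i, a in enumerate(self.xs)]
        self.weights = solve(gram, list(ys))

    def predict(self, x):
        return sum(w * rbf(x, xi) for w, xi in zip(self.weights, self.xs))


def fit_piecewise_surrogate(xs, ys):
    """Cluster the outputs, route inputs with a classifier, fit one GP per cluster."""
    labels = two_means_1d(ys)
    local = {}
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        local[k] = LocalGP([xs[i] for i in idx], [ys[i] for i in idx])
    return lambda x: local[nearest_neighbour(xs, labels, x)].predict(x)
```

On a step function such as y = 0 for x < 0.5 and y = 1 + x otherwise, sampled on a grid of 21 points in [0, 1], each local GP only ever sees one regime, so the jump is not smeared out the way a single smooth surrogate would smear it.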

Multi-objective robust optimization using adaptive surrogate models for problems with mixed continuous-categorical parameters

M. Moustapha, A. Galimshina, G. Habert, B. Sudret

Category: Machine Learning (Statistics)

2022-03-03

Explicitly accounting for uncertainties is paramount to the safety of engineering structures. Optimization, which is often carried out at the early stage of structural design, offers an ideal framework for this task. When the uncertainties mainly affect the objective function, robust design optimization is traditionally considered. This work further assumes the existence of multiple, competing objective functions that need to be dealt with simultaneously. The optimization problem is formulated by considering quantiles of the objective functions, which allows both optimality and robustness to be combined in a single metric. By introducing the concept of common random numbers, the resulting nested optimization problem may be solved using a general-purpose solver, herein the non-dominated sorting genetic algorithm (NSGA-II). However, the computational cost of such an approach is a serious hurdle to its application in real-world problems. We therefore propose a surrogate-assisted approach using Kriging as an inexpensive approximation of the associated computational model. The proposed approach consists of sequentially carrying out NSGA-II while using an adaptively built Kriging model to estimate the quantiles. Finally, the methodology is adapted to account for mixed categorical-continuous parameters, as the applications also involve the selection of qualitative design parameters. The methodology is first applied to two analytical examples, showing its efficiency. The third application relates to the selection of optimal renovation scenarios for a building, considering both its life cycle cost and its environmental impact. It shows that, when it comes to renovation, replacing the heating system should be the priority.

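
The quantile formulation with common random numbers can be sketched as follows: the noise samples are drawn once and reused for every design, each objective is replaced by its empirical α-quantile over those samples, and the non-dominated designs are kept. This minimal sketch does not implement NSGA-II, Kriging or mixed categorical parameters; the two objectives, the Gaussian noise model, α = 0.9 and all function names are invented for illustration.

```python
import math
import random


def common_random_numbers(n, sigma, seed=0):
    """Draw the noise samples once; every design is then evaluated on the same draws."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def empirical_quantile(samples, alpha):
    """Smallest sample s such that at least a fraction alpha of the samples are <= s."""
    ordered = sorted(samples)
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def quantile_objectives(design, objectives, noise, alpha=0.9):
    """alpha-quantile of each objective over the shared noise samples."""
    return tuple(empirical_quantile([f(design, e) for e in noise], alpha)
                 for f in objectives)


def dominates(a, b):
    """Minimisation: a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs, scores):
    """Brute-force non-dominated filter (the first front NSGA-II would rank)."""
    return [d for d, s in zip(designs, scores)
            if not any(dominates(t, s) for t in scores)]
```

With objectives f1(d, e) = (d − 1)² + e and f2(d, e) = (d + 1)² + 0.5e evaluated on a grid of designs from −2 to 2 in steps of 0.5, the shared noise shifts every design's quantiles by the same amount, so the front is exactly the designs between −1 and 1. In the paper's setting the quantiles would instead be estimated with an adaptively built Kriging model, and NSGA-II would search the design space rather than a fixed grid.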