Binarized Neural Networks (BNNs) are receiving increasing attention due to their lightweight architecture and ability to run on low-power devices. The state-of-the-art for training classification BNNs restricted to few-shot learning is based on a Mixed Integer Programming (MIP) approach. This paper proposes the BeMi ensemble, a structured architecture of BNNs based on training a single BNN for each possible pair of classes and applying a majority voting scheme to predict the final output. The training of a single BNN discriminating between two classes is achieved by a MIP model that optimizes a lexicographic multi-objective function according to robustness and simplicity principles. This approach results in training networks whose output is not affected by small perturbations on the input and whose number of active weights is as small as possible, while good accuracy is preserved. We computationally validate our model on the MNIST and Fashion-MNIST datasets, using up to 40 training images per class. Our structured ensemble outperforms both BNNs trained by stochastic gradient descent and state-of-the-art MIP-based approaches. While the previous approaches achieve an average accuracy of 51.1% on the MNIST dataset, the BeMi ensemble achieves an average accuracy of 61.7% when trained with 10 images per class and 76.4% when trained with 40 images per class.
translated by Google Translate
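The pairwise majority-voting scheme described in the BeMi abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy classifiers stand in for MIP-trained BNNs, and the names `majority_vote` and `make_threshold_classifier`, as well as the tie-breaking rule (smallest label wins), are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type: a pairwise classifier maps an input to one of its two classes.
PairwiseClassifier = Callable[[Sequence[float]], int]


def majority_vote(x: Sequence[float],
                  classifiers: Dict[Tuple[int, int], PairwiseClassifier]) -> int:
    """Return the class that wins the most pairwise duels.

    Ties are broken by the smallest label (an assumption, not from the paper).
    """
    votes = Counter(clf(x) for clf in classifiers.values())
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def make_threshold_classifier(i: int, j: int) -> PairwiseClassifier:
    """Toy stand-in for a trained BNN: picks whichever of i, j is closer to x[0]."""
    def classify(x: Sequence[float]) -> int:
        return i if abs(x[0] - i) <= abs(x[0] - j) else j
    return classify


# One classifier per unordered pair of classes: 4 classes give 6 classifiers.
classes = [0, 1, 2, 3]
ensemble = {
    (i, j): make_threshold_classifier(i, j)
    for i, j in combinations(classes, 2)
}

prediction = majority_vote([2.1], ensemble)
```

With n classes the ensemble holds n(n-1)/2 binary classifiers, so each individual training problem only has to separate two classes.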
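Recent work has shown the potential of using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, the intriguing approach of training NNs with MIP solvers remains under-explored. State-of-the-art methods for training NNs are typically gradient-based and require significant data, GPU computation, and extensive hyper-parameter tuning. In contrast, training with MIP solvers requires neither GPUs nor heavy hyper-parameter tuning, but currently cannot handle anything beyond small amounts of data. This article builds on recent advances that train NNs using MIP solvers. We go beyond current work by formulating new MIP models that improve training efficiency and can train the important class of integer-valued neural networks (INNs). We provide two novel methods to further the potential significance of using MIP to train NNs. The first method optimizes the number of neurons in the NN while training, which reduces the need to decide on the network architecture before training. The second method addresses the amount of training data that MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data MIP solvers can use for training. We thus take a promising step towards using much more data than before when training NNs with MIP models. Experimental results on two real-world, data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP in terms of accuracy, training time, and amount of data. Our methodology is proficient at training NNs when minimal training data is available and at training with minimal memory requirements, which is potentially valuable for deployment to low-memory devices.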
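We study optimization problems in which the objective function is modeled through feedforward neural networks with rectified linear unit (ReLU) activation. Recent literature has explored the use of a single neural network to model uncertain or complex elements within an objective function. However, it is well known that ensembles of neural networks produce more stable predictions and generalize better than models with a single neural network, which suggests applying ensembles of neural networks in a decision-making pipeline. We study how to incorporate a neural network ensemble as the objective function of an optimization model and explore computational approaches for the ensuing problem. We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network. We develop two acceleration techniques for our model: the first is a preprocessing procedure that tightens bounds for critical neurons in the neural network, and the second is a set of valid inequalities based on Benders decomposition. Experimental evaluations of our solution methods are conducted on one global optimization problem and two real-world datasets; the results suggest that our optimization algorithm outperforms an adaptation of a state-of-the-art approach in terms of computational time and optimality gap.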
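For readers unfamiliar with the big-$M$ encoding that the abstract above builds on, the following is the standard textbook formulation of a single ReLU neuron $y = \max(0, w^\top x + b)$. The symbols $L$, $U$ (known bounds with $L \le w^\top x + b \le U$ and $L < 0 < U$) and the binary variable $z$ are notation assumed here; the paper's exact formulation may differ.

```latex
\begin{aligned}
y &\ge w^\top x + b, \\
y &\ge 0, \\
y &\le w^\top x + b - L\,(1 - z), \\
y &\le U z, \\
z &\in \{0, 1\}.
\end{aligned}
```

Setting $z = 1$ forces $y = w^\top x + b$ (the neuron is active), while $z = 0$ forces $y = 0$ (the neuron is inactive). Tighter values of $L$ and $U$ give a stronger linear relaxation, which is why bound tightening acts as an acceleration technique.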
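Deep neural networks (DNNs) have become the technology of choice for realizing a variety of complex tasks. However, as highlighted by many recent studies, even an imperceptible perturbation to a correctly classified input can lead a DNN to misclassify it. This renders DNNs vulnerable to strategic input manipulation by attackers and oversensitive to environmental noise. To mitigate this phenomenon, practitioners apply joint classification by an *ensemble* of DNNs. By aggregating the classification outputs of different individual DNNs for the same input, ensemble-based classification reduces the risk of misclassification caused by the specific realization of the stochastic training process of any single DNN. However, the effectiveness of a DNN ensemble depends heavily on its members *not simultaneously erring on many different inputs*. In this case study, we harness recent advances in DNN verification to devise a methodology for identifying ensemble compositions that are less prone to simultaneous errors, even when the input is adversarially perturbed, resulting in more robust ensemble-based classification. Our proposed framework uses a DNN verifier as a backend and includes heuristics that help reduce the high complexity of directly verifying ensembles. More broadly, our work puts forth a novel universal objective for formal verification that could improve the robustness of real-world, deep-learning-based systems across a variety of application domains.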
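We consider nonlinear optimization problems that involve surrogate models represented by neural networks. We first show how to embed neural network evaluation directly into optimization models, highlight a difficulty with this approach that can prevent convergence, and then characterize the stationarity of such models. We then present two alternative formulations of these problems for the specific case of feedforward neural networks with ReLU activation: as a mixed-integer optimization problem and as a mathematical program with complementarity constraints. For the latter formulation, we prove that stationarity at a point of this problem corresponds to stationarity of the embedded formulation. Each of these formulations can be solved with state-of-the-art optimization methods, and we show how to obtain good initial feasible solutions for these methods. We compare our formulations on three practical applications arising in the design and control of combustion engines, in the generation of adversarial attacks on classifier networks, and in the determination of optimal flows in an oil well network.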
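As a small illustration of the complementarity-constraint view mentioned in the abstract above, a single ReLU unit $y = \max(0, a)$ with pre-activation $a = w^\top x + b$ can be written without a $\max$ operator or binary variables. This is the standard reformulation, given here under assumed notation; the paper's full model may be stated differently.

```latex
y \ge 0, \qquad y - a \ge 0, \qquad y\,(y - a) = 0
\quad\Longleftrightarrow\quad
0 \le y \;\perp\; (y - a) \ge 0 .
```

If $a > 0$, the second inequality gives $y > 0$, so the product condition forces $y = a$. If $a \le 0$, the product condition allows $y = 0$ or $y = a$, and $y \ge 0$ rules out any negative value, so $y = 0$.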
Learning monotonic models with respect to a subset of the inputs is a desirable feature to effectively address the fairness, interpretability, and generalization issues in practice. Existing methods for learning monotonic neural networks either require specifically designed model structures to ensure monotonicity, which can be too restrictive/complicated, or enforce monotonicity by adjusting the learning process, which cannot provably guarantee the learned model is monotonic on selected features. In this work, we propose to certify the monotonicity of the general piece-wise linear neural networks by solving a mixed integer linear programming problem. This provides a new general approach for learning monotonic neural networks with arbitrary model structures. Our method allows us to train neural networks with heuristic monotonicity regularizations, and we can gradually increase the regularization magnitude until the learned network is certified monotonic. Compared to prior works, our approach does not require human-designed constraints on the weight space and also yields more accurate approximation. Empirical studies on various datasets demonstrate the efficiency of our approach over the state-of-the-art methods, such as Deep Lattice Networks.
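We present a novel hybrid algorithm for training deep neural networks (DNNs) that combines the state-of-the-art gradient descent (GD) method with a Mixed Integer Linear Programming (MILP) solver, outperforming GD and its variants in terms of accuracy as well as resource and data efficiency on both regression and classification tasks. Our GD+Solver hybrid algorithm, called GDSolver, works as follows: given a DNN $D$ as input, GDSolver invokes GD to partially train $D$ until it gets stuck in a local minimum, at which point GDSolver invokes the MILP solver to search exhaustively a region of the loss landscape around the weight assignments of $D$'s final-layer parameters, with the goal of tunnelling through and escaping the local minimum. The process is repeated until the desired accuracy is reached. In our experiments, we find that GDSolver not only scales well to additional data and very large model sizes, but also outperforms all other competing methods in rate of convergence and data efficiency. For regression tasks, GDSolver produced models that on average had 31.5% lower error in 48% less time, and for classification tasks on MNIST and CIFAR10, GDSolver achieved the highest accuracy among all competing methods while using only 50% of the training data that the GD baselines required.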
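Over the last decade, neural networks (NNs) have been widely used in numerous applications, including safety-critical ones such as autonomous systems. Despite their growing adoption, it is well known that NNs are susceptible to adversarial attacks. Hence, it is highly important to provide guarantees that such systems work correctly. To address these issues, we introduce a framework for repairing unsafe NNs with respect to a safety specification by utilizing satisfiability modulo theories (SMT) solvers. Our method is able to search for a new, safe NN representation by modifying only a few of its weight values. In addition, our technique attempts to maximize the similarity to the original network with regard to its decision boundaries. We perform extensive experiments that demonstrate the capability of our proposed framework to yield NNs that are safe with respect to the adversarial robustness property, with only a mild loss of accuracy (in terms of similarity). Moreover, we compare our method with a naive baseline to demonstrate its effectiveness. To conclude, we provide an algorithm to automatically repair NNs given safety properties, and we suggest a few heuristics to improve its computational performance. Currently, by following this approach, we are able to produce small (i.e., with up to a few hundred parameters) correct NNs composed of piecewise linear ReLU activation functions. Nevertheless, our framework is general in the sense that it can synthesize NNs with respect to any decidable fragment of first-order logic specification.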
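We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions. This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require computing derivatives of the network's activation functions. Instead of estimating the parameters with a combination of a first-order optimisation method and back-propagation (as is the state of the art), we propose the use of non-smooth first-order optimisation methods that exploit the specific structure of the novel formulation. We present several numerical results demonstrating that these training approaches can be equally well or even better suited to training neural-network-based classifiers and (denoising) autoencoders with sparse coding than more conventional training frameworks.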
We present an approach for the verification of feed-forward neural networks in which all nodes have a piece-wise linear activation function. Such networks are often used in deep learning and have been shown to be hard to verify for modern satisfiability modulo theory (SMT) and integer linear programming (ILP) solvers. The starting point of our approach is the addition of a global linear approximation of the overall network behavior to the verification problem that helps with SMT-like reasoning over the network behavior. We present a specialized verification algorithm that employs this approximation in a search process in which it infers additional node phases for the non-linear nodes in the network from partial node phase assignments, similar to unit propagation in classical SAT solving. We also show how to infer additional conflict clauses and safe node fixtures from the results of the analysis steps performed during the search. The resulting approach is evaluated on collision avoidance and handwritten digit recognition case studies.
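Artificial neural networks (ANNs) are prevalent machine learning models that have been applied across various real-world classification tasks. ANNs require a large amount of data to achieve strong out-of-sample performance, and many algorithms for training ANN parameters are based on stochastic gradient descent (SGD). However, the SGD ANNs that tend to perform best on prediction tasks are trained in an end-to-end manner that requires a large number of model parameters and random initialization. This means that training ANNs is very time-consuming and the resulting models take a lot of memory to deploy. In order to train more parsimonious ANN models, we propose the use of alternative methods from the constrained optimization literature for ANN training and pretraining. In particular, we propose novel mixed-integer programming (MIP) formulations for training fully connected ANNs. Our formulations can account for both binary-activation and rectified linear unit (ReLU) activation ANNs, and for the use of a log-likelihood loss. We also develop a layer-wise greedy approach, a technique adapted for reducing the number of layers in the ANN, for model pretraining using our MIP formulations. We then compare our MIP-based methods against existing SGD-based approaches and show that we are able to achieve models with competitive out-of-sample performance that are significantly more parsimonious.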
In recent years there has been growing attention to interpretable machine learning models which can give explanatory insights on their behavior. Thanks to their interpretability, decision trees have been intensively studied for classification tasks, and due to the remarkable advances in mixed-integer programming (MIP), various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a MIP model. We present a novel mixed-integer quadratic formulation for the OCT problem, which exploits the generalization capabilities of Support Vector Machines for binary classification. Our model, denoted as Margin Optimal Classification Tree (MARGOT), encompasses the use of maximum margin multivariate hyperplanes nested in a binary tree structure. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing local sparsity of the hyperplanes. First, MARGOT has been tested on non-linearly separable synthetic datasets in 2-dimensional feature space to provide a graphical representation of the maximum margin approach. Finally, the proposed models have been tested on benchmark datasets from the UCI repository. The MARGOT formulation turns out to be easier to solve than other OCT approaches, and the generated tree better generalizes on new observations. The two interpretable versions are effective in selecting the most relevant features and maintaining good prediction quality.
We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is $\exists\mathbb{R}$-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Furthermore, we show that arbitrary algebraic numbers are required as weights to be able to train some instances to optimality, even if all data points are rational. Our results hold even if the following restrictions are all added simultaneously. $\bullet$ There are exactly two output neurons. $\bullet$ There are exactly two input neurons. $\bullet$ The data has only 13 different labels. $\bullet$ The number of hidden neurons is a constant fraction of the number of data points. $\bullet$ The target training error is zero. $\bullet$ The ReLU activation function is used. This shows that even very simple networks are difficult to train. The result explains why typical methods for $\mathsf{NP}$-complete problems, like mixed-integer programming or SAT-solving, cannot train neural networks to global optimality, unless $\mathsf{NP}=\exists\mathbb{R}$. We strengthen a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
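This is an introductory machine learning course developed specifically with STEM students in mind. Our goal is to provide interested readers with the basics needed to employ machine learning in their own projects and to familiarize them with the terminology as a foundation for further reading of the relevant literature. In these lecture notes, we discuss supervised, unsupervised, and reinforcement learning. The notes start with an exposition of machine learning methods without neural networks, such as principal component analysis, t-SNE, clustering, and linear regression and linear classifiers. We continue with an introduction to both basic and advanced neural-network structures, such as dense feed-forward and convolutional neural networks, recurrent neural networks, restricted Boltzmann machines, (variational) autoencoders, and generative adversarial networks. Questions of interpretability are discussed for latent-space representations, using the examples of dreaming and adversarial attacks. The final section is dedicated to reinforcement learning, where we introduce the basic notions of value functions and policy learning.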
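This paper presents a novel method, called Modular Grammatical Evolution (MGE), for validating the hypothesis that restricting the solution space of neuroevolution to modular and simple neural networks enables the efficient generation of smaller and more structured neural networks while providing acceptable (and in some cases superior) accuracy on large datasets. MGE also enhances state-of-the-art Grammatical Evolution (GE) methods in two directions. First, MGE's representation is modular in that each individual has a set of genes, and each gene is mapped to a neuron by grammatical rules. Second, the proposed representation mitigates two important drawbacks of GE, namely the low scalability and weak locality of its representation, when generating modular and multi-layer networks with a large number of neurons. We define and evaluate five different forms of structures with and without modularity using MGE, and we find single-layer modules without coupling to be more productive. Our experiments demonstrate that modularity helps in finding better neural networks faster. We validated the proposed method on ten well-known classification benchmarks with different sizes, feature counts, and output class counts. Our experimental results indicate that MGE provides superior accuracy with respect to existing neuroevolution methods and returns classifiers that are significantly simpler than those generated by other machine learning methods. Finally, we empirically demonstrate that MGE outperforms other GE methods in terms of locality and scalability properties.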
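In this paper, we explore a model-based approach to training robust and interpretable binarized regression models using Mixed-Integer Programming (MIP). Our MIP model balances the optimization of prediction margin and model size by using a weighted objective that minimizes the total margin of incorrectly classified training instances, maximizes the total margin of correctly classified training instances, and maximizes the overall model regularization. We conduct two sets of experiments to test the classification accuracy of our MIP model on standard and corrupted versions of multiple classification datasets. In the first set of experiments, we show that our MIP model outperforms an equivalent Pseudo-Boolean Optimization (PBO) model and achieves results competitive with Logistic Regression (LR) and Gradient Descent (GD) in terms of classification accuracy on the standard datasets. In the second set of experiments, we show that our MIP model outperforms the other models in terms of classification accuracy on the majority of the corrupted datasets. Finally, we visually demonstrate the interpretability of our MIP model in terms of its learned parameters on the MNIST dataset. Overall, we show the effectiveness of training robust and interpretable binarized regression models using MIP.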
Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
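The neuron coverage metric named in the DeepXplore abstract can be sketched in a few lines of pure Python. This is an illustrative toy, not the DeepXplore code: the tiny two-layer ReLU network, the helper names `forward` and `neuron_coverage`, and the activation threshold of 0.0 are all assumptions. The metric computed is the fraction of neurons whose activation exceeds the threshold for at least one test input.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical layer format: (weight rows, biases), one weight row per neuron.
Layer = Tuple[List[List[float]], List[float]]


def forward(x: Sequence[float], layers: List[Layer]) -> List[List[float]]:
    """Return the ReLU activations of every neuron, layer by layer."""
    activations = []
    current = list(x)
    for weights, biases in layers:
        current = [
            max(0.0, sum(w * a for w, a in zip(row, current)) + b)
            for row, b in zip(weights, biases)
        ]
        activations.append(current)
    return activations


def neuron_coverage(inputs: List[Sequence[float]],
                    layers: List[Layer],
                    threshold: float = 0.0) -> float:
    """Fraction of neurons activated above `threshold` by at least one input."""
    covered: Set[Tuple[int, int]] = set()
    total = sum(len(biases) for _, biases in layers)
    for x in inputs:
        for layer_idx, layer_act in enumerate(forward(x, layers)):
            for neuron_idx, value in enumerate(layer_act):
                if value > threshold:
                    covered.add((layer_idx, neuron_idx))
    return len(covered) / total


# Toy network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
toy_layers: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [-0.5]),
]

coverage_one = neuron_coverage([[1.0, 0.0]], toy_layers)
coverage_all = neuron_coverage([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], toy_layers)
```

In the toy network a single input exercises only half of the neurons, while three well-chosen inputs exercise all of them, which is the intuition behind using coverage to guide test generation.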
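The excellent performance of machine learning algorithms and deep neural networks in several perception and control tasks is pushing industry to adopt such technologies in safety-critical applications, such as autonomous robots and self-driving vehicles. At present, however, several issues need to be solved to make deep learning methods more trustworthy, predictable, safe, and secure against adversarial attacks. Although several methods have been proposed to improve the trustworthiness of deep neural networks, most of them are tailored to specific classes of adversarial examples and therefore fail to detect other corner cases or unsafe inputs that deviate heavily from the training samples. This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance model robustness against different unsafe inputs. In particular, four coverage analysis methods are proposed and tested in the architecture for evaluating multiple detection logics. Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs, while introducing limited execution time and memory overhead.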
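Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods focused on solving problem instances in isolation, ignoring the fact that in practice they often stem from related data distributions. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input thanks to their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advances in this emerging field, aimed at researchers in both optimization and machine learning.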