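Many problems in signal processing and machine learning can be formalized as weak submodular optimization tasks. For such problems, a simple greedy algorithm (\textsc{Greedy}) is guaranteed to find a solution whose objective value is at least $1-e^{-1/c}$ of the optimal value, where $c$ is the multiplicative weak submodularity constant. Because querying large-scale systems is costly, the complexity of \textsc{Greedy} becomes prohibitive in contemporary applications. In this work, we study the tradeoff between performance and complexity when random sampling strategies are used to reduce the query complexity of \textsc{Greedy}. Specifically, we quantify the effect of uniform sampling strategies on the performance of \textsc{Greedy} through two metrics: (i) the probability of identifying an optimal subset, and (ii) the suboptimality with respect to the optimal solution. The latter implies that uniform sampling strategies with a fixed sample size achieve a non-trivial approximation factor; however, we show that, with overwhelming probability, these methods fail to find the optimal subset. Our analysis shows that this failure of uniform sampling strategies with a fixed sample size can be avoided by successively increasing the size of the search space. Building on this insight, we propose a simple progressive stochastic greedy algorithm and study its approximation guarantees. Moreover, we demonstrate the effectiveness of the proposed method in dimensionality reduction applications and in feature selection tasks for clustering and object tracking.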
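The idea of greedy selection over a random candidate pool that grows from one iteration to the next can be sketched as follows. This is only an illustrative sketch, not the algorithm analyzed in the paper: the set-coverage objective, the linear growth schedule, and the names `coverage`, `progressive_stochastic_greedy`, `initial_sample`, and `growth` are all assumptions made for the example.

```python
import random


def coverage(selected, sets):
    """Toy objective (assumed): number of distinct items covered by the chosen sets."""
    covered = set()
    for j in selected:
        covered |= sets[j]
    return len(covered)


def progressive_stochastic_greedy(sets, k, initial_sample=2, growth=2, seed=0):
    """Pick up to k indices greedily, evaluating only a random candidate pool per step.

    The pool size starts at `initial_sample` and grows by `growth` after every
    step (a hypothetical schedule chosen for illustration).
    """
    rng = random.Random(seed)
    selected = []
    remaining = list(range(len(sets)))
    sample_size = initial_sample
    for _ in range(k):
        if not remaining:
            break
        # Query the objective only on a uniformly sampled subset of candidates.
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        base = coverage(selected, sets)
        best = max(candidates, key=lambda j: coverage(selected + [j], sets) - base)
        selected.append(best)
        remaining.remove(best)
        # Enlarge the search space for the next iteration.
        sample_size += growth
    return selected


if __name__ == "__main__":
    toy_sets = [{1, 2, 3}, {3, 4}, {5}, {1, 2, 3, 4, 5, 6}]
    print(progressive_stochastic_greedy(toy_sets, k=2))
```

When the pool size is at least the number of remaining elements, every step reduces to a plain greedy step, so the schedule interpolates between cheap random selection and full greedy search.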
Translated by Google Translate
Runtime monitoring provides a more realistic and applicable alternative to verification in the setting of real neural networks used in industry. It is particularly useful for detecting out-of-distribution (OOD) inputs, for which the network was not trained and can yield erroneous results. We extend a runtime-monitoring approach previously proposed for classification networks to perception systems capable of identification and localization of multiple objects. Furthermore, we analyze its adequacy experimentally on different kinds of OOD settings, documenting the overall efficacy of our approach.
Deep Neural Networks (DNNs) are becoming increasingly important in assisted and automated driving. Using such machine-learned components is inevitable: tasks such as recognizing traffic signs cannot reasonably be developed with traditional software development methods. DNNs, however, are mostly black boxes and are therefore hard to understand and debug. One particular problem is that they are prone to hidden backdoors. This means that the DNN misclassifies its input because it considers properties that should not be decisive for the output. Backdoors may be introduced either by malicious attackers or by inappropriate training. In either case, detecting and removing them is important in the automotive area, as they might lead to safety violations with potentially severe consequences. In this paper, we introduce a novel method to remove backdoors. Our method works for both intentional and unintentional backdoors, and it does not require prior knowledge about the shape or distribution of backdoors. Experimental evidence shows that our method performs well on several medium-sized examples.
Online personalized recommendation services are generally hosted in the cloud, where users query a cloud-based model to receive recommendations such as merchandise of interest or a news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, little attention has been paid to the potential information leakage through them. These sparse features are employed to track users' behavior, e.g., their click history and object interactions, and potentially carry each user's private information. Sparse features are represented as learned embedding vectors stored in large tables, and personalized recommendation is performed by using a specific user's sparse features to index into the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, and then demonstrate how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
This paper offers a new authentication algorithm based on image matching of nano-resolution visual identifiers with tree-shaped patterns. The algorithm includes image-to-tree conversion by greedy extraction of the fractal pattern skeleton, along with a custom-built graph matching algorithm that is robust against imaging artifacts such as scaling, rotation, scratches, and illumination change. The proposed algorithm is applicable to a variety of tree-structured image matching tasks, but our focus is on dendrites, a recently developed class of visual identifiers. Dendrites are entropy rich and unclonable with existing 2D and 3D printers due to their natural randomness, nano-resolution granularity, and 3D facets, making them an appropriate choice for security applications such as supply chain tracing and tracking. The proposed algorithm improves upon graph matching with standard image descriptors. For instance, image inconsistency due to camera sensor noise may cause unexpected feature extraction, leading to inaccurate tree conversion and authentication failure. Also, previous tree extraction algorithms are prohibitively slow, which hinders their scalability to large systems. In this paper, we fix the current issues of [1] and accelerate key-point extraction by up to 10 times by implementing a new skeleton extraction method, a new key-point search algorithm, and an optimized key-point matching algorithm. Using the minimum enclosing circle and center points makes the algorithm robust to the choice of pattern shape. In contrast to [1], our algorithm handles general graphs with loop connections and is therefore applicable to a wider range of applications such as transportation map analysis, fingerprints, and retina vessel imaging.
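Deep learning-based graph generation methods have remarkable capacity for modeling graph data, which allows them to solve a wide range of real-world problems. Enabling these methods to take different conditions into account during generation further increases their effectiveness, since it lets them generate new graph samples that meet desired criteria. This paper proposes a conditional deep graph generation method, called SCGG, that considers a particular type of structural condition. Specifically, the proposed SCGG model takes an initial subgraph and autoregressively generates new nodes and their corresponding edges on top of the given conditioning substructure. The architecture of SCGG consists of a graph representation learning network and an autoregressive generative model, which are trained end-to-end. Using this model, we can address graph completion, a widespread and inherently difficult problem of recovering the missing nodes and associated edges of partially observed graphs. Experimental results on both synthetic and real-world datasets demonstrate the superiority of our method over state-of-the-art baselines.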
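In this paper, we propose a sample complexity bound for learning a simplex from noisy samples. We are given a dataset of size $n$ consisting of i.i.d. samples drawn from the uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^k$, where the samples are assumed to be corrupted by additive Gaussian noise of arbitrary magnitude. We propose a strategy that outputs, with high probability, a simplex whose total variation distance from the true simplex is $\epsilon + O\left(\mathrm{SNR}^{-1}\right)$, for any $\epsilon > 0$. We prove that, to get this close to the true simplex, it is sufficient to have $n \ge \tilde{O}\left(k^2/\epsilon^2\right)$ samples. Here, SNR stands for the signal-to-noise ratio, which can be viewed as the ratio of the diameter of the simplex to the standard deviation of the noise. Our proofs are based on recent advances in sample compression techniques, which have already shown promise in deriving tight bounds for density estimation in high-dimensional Gaussian mixture models.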
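Improvements in the performance of computing systems, driven by Moore's Law, have transformed society. As these hardware-driven gains slow down, it becomes even more important for software developers to focus on performance and efficiency during development. Although several studies have demonstrated the potential of improved code efficiency (e.g., 2x better generational improvements compared to hardware), unlocking these gains in practice has been challenging. Reasoning about algorithmic complexity and about the interaction of coding patterns with hardware can be difficult for the average programmer, especially when combined with pragmatic constraints around development velocity and multi-person development. This paper seeks to address this problem. We analyze a large competitive programming dataset from the Google Code Jam competition and find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions. We propose using machine learning to automatically provide prescriptive feedback in the form of hints that guide programmers toward writing high-performance code. To learn these hints automatically from the dataset, we propose a novel discrete variational autoencoder in which each discrete latent variable represents a different learned category of code edit that improves performance. We show that this method represents the multi-modal space of code efficiency edits better than a sequence-to-sequence baseline and generates a distribution of more efficient solutions.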
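Isoforms are mRNAs produced from the same gene locus through a process known as alternative splicing. Studies have shown that more than 95% of human multi-exon genes undergo alternative splicing. Although the changes in the mRNA sequence are small, they may have a systematic effect on cell function and regulation. It is widely reported that the isoforms of a gene can have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of studies on gene function, there is little information about the functions of isoforms. Recently, some computational methods based on multiple instance learning have been proposed to predict isoform function using gene function and gene expression profiles. However, their performance is not satisfactory because of the lack of labeled training data. In addition, probabilistic models such as conditional random fields (CRFs) have been used to model the relationships between isoforms. This project uses all available data and valuable information, such as isoform sequences, expression profiles, and gene ontology graphs, and proposes a comprehensive model based on deep neural networks. The UniProt Gene Ontology (GO) database is used as the standard reference for gene functions. The NCBI RefSeq database is used to extract gene and isoform sequences, and the NCBI SRA database is used for expression profile data. Metrics such as the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision-recall curve are used to measure prediction accuracy.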
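While polar codes for successive-cancellation decoding can be constructed efficiently by sorting the bit-channels, finding optimal polar-code constructions for successive-cancellation list (SCL) decoding in an efficient and scalable manner still awaits investigation. This paper proposes a graph neural network (GNN)-based reinforcement learning algorithm, named the iterative message-passing (IMP) algorithm, to solve the polar-code construction problem for SCL decoding. The algorithm operates only on the local structure of the graph induced by the polar code's generator matrix. The size of the IMP model is independent of the blocklength and the code rate, which makes it scalable to polar codes with long blocklengths. Moreover, a single trained IMP model can be applied directly to a wide range of target blocklengths, code rates, and channel conditions, and the corresponding polar codes can be generated without separate training. Numerical experiments show that the IMP algorithm finds polar-code constructions that significantly outperform classical constructions under cyclic-redundancy-check-aided SCL (CA-SCL) decoding. Compared with other learning-based construction methods tailored to SCL/CA-SCL decoding, the IMP algorithm constructs polar codes with comparable or lower frame error rates, while significantly reducing training complexity by eliminating the need for separate training at each target blocklength, code rate, and channel condition.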
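For context, the classical baseline mentioned in the first sentence (construction for successive-cancellation decoding by sorting bit-channels) can be sketched for the special case of a binary erasure channel, where the Bhattacharyya parameters follow an exact recursion. This is a minimal sketch of that baseline only, not the IMP algorithm; the choice of a binary erasure channel and the function names `bec_bhattacharyya` and `construct_polar_code` are assumptions made for the example.

```python
def bec_bhattacharyya(n_levels, erasure_prob):
    """Bhattacharyya parameters of the 2**n_levels bit-channels of a BEC.

    Each polarization step maps a channel with parameter z to a worse channel
    (2z - z^2) at the even index and a better channel (z^2) at the odd index.
    """
    z = [erasure_prob]
    for _ in range(n_levels):
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)  # degraded ("minus") bit-channel
            nxt.append(v * v)          # upgraded ("plus") bit-channel
        z = nxt
    return z


def construct_polar_code(n_levels, num_info_bits, erasure_prob=0.5):
    """Return the sorted indices of the most reliable bit-channels.

    Bit-channels are ranked by Bhattacharyya parameter (smaller is more
    reliable); the remaining indices would be frozen.
    """
    z = bec_bhattacharyya(n_levels, erasure_prob)
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:num_info_bits])


if __name__ == "__main__":
    # Blocklength 8, rate 1/2, erasure probability 0.5.
    print(construct_polar_code(3, 4, 0.5))
```

Sorting is sufficient here because each bit-channel's reliability is scored independently; the abstract's point is that this independent ranking is not necessarily optimal under list decoding.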