由于其在多个工业应用领域的竞争性能,深度学习在我们的日常生活中起着越来越重要的作用。作为基于DL的系统的核心,深度神经网络会自动从精心收集和有组织的培训数据中学习知识,以获得预测看不见数据的标签的能力。与需要全面测试的传统软件系统类似,还需要仔细评估DNN,以确保受过训练的模型的质量满足需求。实际上,评估行业中DNN质量的事实上的标准是检查其在收集的标记测试数据集中的性能(准确性)。但是,准备这样的标记数据通常不容易部分,部分原因是标签工作巨大,即数据标记是劳动密集型的,尤其是每天有大量新的新传入的未标记数据。最近的研究表明,DNN的测试选择是一个有希望的方向,可以通过选择最小的代表性数据来标记并使用这些数据来评估模型来解决此问题。但是,它仍然需要人类的努力,不能自动。在本文中,我们提出了一种名为Aries的新技术,可以使用原始测试数据获得的信息估算新未标记数据的DNN的性能。我们技术背后的关键见解是,该模型在与决策边界具有相似距离的数据上应具有相似的预测准确性。我们对13种数据转换方法的技术进行了大规模评估。结果表明,我们技术的有用性是,白羊座的估计准确性仅为0.03%-2.60%(平均0.61%),从真实的准确性中差。此外,在大多数(128个)情况下,白羊座还优于最先进的选择标记方法。
translated by 谷歌翻译
在过去的几年中,深度学习(DL)一直在不断扩大其应用程序,并成为大型法规时代大规模源代码分析的推动力。由于意外的准确性降解,测试集与训练集不同的分布与训练集不同的分布与训练集不同。尽管最近在计算机视觉和自然语言过程等领域取得了分配转移基准测试的最新进展。对于源代码任务的分配转移分析和基准测试,进展有限,由于其数量和支持几乎所有工业部门的基础,都有很大的需求。为了填补这一空白,本文启动了提出代码,即用于源代码学习的分销基准数据集。具体而言,代码支持2种编程语言(即Java和Python)和5种代码分发偏移(即任务,程序员,时间戳记,代币和CST)。据我们所知,我们是第一个定义基于代码表示的分布变化的人。在实验中,我们首先评估现有分布探测器的有效性以及分配移位定义的合理性,然后测量流行代码学习模型(例如Codebert)对分类任务的模型概括。结果表明,1)仅基于SoftMax得分的OOD检测器在代码上表现良好,2)分配转移会导致所有代码分类模型中的准确性降解,3)基于表示的分布转移对模型的影响比其他模型具有更高的影响,并且4)预训练的模型对分布变化更具抵抗力。我们公开提供代码,从而实现了有关代码学习模型质量评估的后续研究。
translated by 谷歌翻译
积极学习是一种降低标签成本以构建高质量机器学习模型的既定技术。主动学习的核心组件是确定应选择哪些数据来注释的采集功能。最先进的采集功能 - 更重要的是主动学习技术 - 已经旨在最大限度地提高清洁性能(例如,准确性)并忽视了鲁棒性,这是一种受到越来越受关注的重要品质。因此,主动学习产生准确但不强大的模型。在本文中,我们提出了一种积极的学习过程,集成了对抗性培训的积极学习过程 - 最熟悉的制作强大模型的方法。通过对11个采集函数的实证研究,4个数据集,6个DNN架构和15105培训的DNN,我们表明,强大的主动学习可以产生具有鲁棒性的模型(对抗性示例的准确性),范围从2.35 \%到63.85 \%,而标准主动学习系统地实现了可忽略不计的鲁棒性(小于0.20 \%)。然而,我们的研究还揭示了在稳健性方面,在准确性上表现良好的采集功能比随机抽样更糟糕。因此,我们检查了它背后的原因,并设计了一个新的采购功能,这些功能既可定位清洁的性能和鲁棒性。我们的采集功能 - 基于熵(DRE)的基于密度的鲁棒采样 - 优于鲁棒性的其他采集功能(包括随机),最高可达24.40 \%(特别是3.84 \%),同时仍然存在竞争力准确性。此外,我们证明了DRE适用于测试选择度量,用于模型再培训,并从所有比较功能中脱颖而出,高达8.21%的鲁棒性。
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but also sometimes even slightly boost the generalization performance. Theoretical understanding for such experimental observations are yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned according to different rates at the initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance. More surprisingly, the generalization bound gets better as the pruning fraction gets larger. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. Up to our knowledge, this is the \textbf{first} generalization result for pruned neural networks, suggesting that pruning can improve the neural network's generalization.
translated by 谷歌翻译
Time-series anomaly detection is an important task and has been widely applied in the industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables the customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions as the time-series data is numerically continuous and difficult to be understood. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection by performing only a small amount of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both weak supervision and active learning areas. Also, the system has been tested in a real scenario in industry to show its practicality.
translated by 谷歌翻译
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities like images. However, current MMEA algorithms all adopt KG-level modality fusion strategies but ignore modality differences among individual entities, hurting the robustness to potential noise involved in modalities (e.g., unidentifiable images and relations). In this paper we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, to dynamically predict the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed for addressing vague entity details. Extensive experimental results show that our model not only achieves SOTA performance on multiple training scenarios including supervised, unsupervised, iterative, and low resource, but also has limited parameters, optimistic speed, and good interpretability. Our code will be available soon.
translated by 谷歌翻译
The task of video prediction and generation is known to be notoriously difficult, with the research in this area largely limited to short-term predictions. Though plagued with noise and stochasticity, videos consist of features that are organised in a spatiotemporal hierarchy, different features possessing different temporal dynamics. In this paper, we introduce Dynamic Latent Hierarchy (DLH) -- a deep hierarchical latent model that represents videos as a hierarchy of latent states that evolve over separate and fluid timescales. Each latent state is a mixture distribution with two components, representing the immediate past and the predicted future, causing the model to learn transitions only between sufficiently dissimilar states, while clustering temporally persistent states closer together. Using this unique property, DLH naturally discovers the spatiotemporal structure of a dataset and learns disentangled representations across its hierarchy. We hypothesise that this simplifies the task of modeling temporal dynamics of a video, improves the learning of long-term dependencies, and reduces error accumulation. As evidence, we demonstrate that DLH outperforms state-of-the-art benchmarks in video prediction, is able to better represent stochasticity, as well as to dynamically adjust its hierarchical and temporal structure. Our paper shows, among other things, how progress in representation learning can translate into progress in prediction tasks.
translated by 谷歌翻译
Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
translated by 谷歌翻译
Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. However, existing works measure the explanation robustness based on $\ell_p$-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive surrogate bounds of the thickness that involve expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack to confirm that the explanation robustness can be measured by the thickness metric. We conduct experiments on various network architectures and diverse datasets to prove the superiority of the proposed methods, while the widely accepted Hessian-based curvature smoothing approaches are not as robust as our method.
translated by 谷歌翻译