Deep neural networks (DNNs) have been found to be vulnerable to adversarial attacks, and various methods have been proposed for defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of adversarial training is greatly limited by the architecture of the target DNN, which often leaves the resulting DNN with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of adversarial training in enhancing robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss using the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm on natural data and under various adversarial attacks, which reveals the superiority of the proposed method in finding architectures that are both accurate and robust. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance for both the hand-crafting and the automatic design of accurate and robust neural architectures.
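The two-stage strategy can be summarized in a short sketch. The following PyTorch snippet is an illustrative assumption of how such a search step might look, not the authors' implementation: stage 1 optimizes the architecture parameters on the adversarial loss alone, while stage 2 optimizes a weighted sum of the natural and adversarial losses (the PGD settings and the weight `lam` are placeholders).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    # Standard L-inf PGD used to craft adversarial examples during search.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def search_step(supernet, arch_optimizer, x, y, stage, lam=0.5):
    x_adv = pgd_attack(supernet, x, y)
    if stage == 1:
        # Stage 1: architecture parameters minimize the adversarial loss only.
        loss = F.cross_entropy(supernet(x_adv), y)
    else:
        # Stage 2: multi-objective -- natural loss plus adversarial loss.
        loss = (lam * F.cross_entropy(supernet(x), y)
                + (1 - lam) * F.cross_entropy(supernet(x_adv), y))
    arch_optimizer.zero_grad()
    loss.backward()
    arch_optimizer.step()
```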
Neural Architecture Search (NAS) is an automatic technique that can search for well-performing architectures for a specific task. Although NAS surpasses human-designed architectures in many fields, the high computational cost of the architecture evaluations it requires hinders its development. A feasible solution is to evaluate an architecture directly at its initial stage, without any training. The NAS-without-training (WOT) score is such a metric: it estimates the final trained accuracy of an architecture through the network's ability to distinguish different inputs in the activation layers. However, the WOT score is not an atomic metric, meaning that it does not represent a fundamental indicator of the architecture. The contributions of this paper are threefold. First, we decouple WOT into two atomic metrics, which represent the distinguishing ability of the network and the number of activation units, and explore a better combination rule named the Distinguishing Activation Score (DAS). We prove the correctness of the decoupling theoretically and confirm the effectiveness of the rule experimentally. Second, in order to improve the prediction accuracy of DAS to meet practical search requirements, we propose a fast training strategy; when DAS is used in combination with this strategy, it yields further improvements. Third, we propose a dataset called Darts-training-bench (DTB), which fills the gap that existing datasets contain no training states of architectures. Our proposed method achieves 1.04$\times$–1.56$\times$ improvements on NAS-Bench-101, Network Design Spaces, and the proposed DTB.
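For context, the WOT score that DAS decouples can be computed in a few lines. The sketch below follows the published NASWOT formulation, in which each input's ReLU activation pattern is a binary code and the score is the log-determinant of the pairwise-agreement kernel; the exact DAS combination rule from the paper is not reproduced here.

```python
import numpy as np

def wot_score(codes: np.ndarray) -> float:
    """codes: (batch, N_A) binary matrix, one ReLU activation code per input."""
    n_a = codes.shape[1]
    # Pairwise Hamming distances between activation codes.
    ham = (codes[:, None, :] != codes[None, :, :]).sum(axis=-1)
    k = (n_a - ham).astype(np.float64)  # K_ij = number of agreeing units
    _, logdet = np.linalg.slogdet(k)
    return logdet

codes = np.random.default_rng(0).random((32, 1000)) > 0.5
print(wot_score(codes))  # larger is predicted to train to higher accuracy
```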
Audio-driven talking face video generation has recently attracted widespread attention. However, few studies have addressed the problem of emotion editing for such talking face videos with continuously controllable expressions, which is a strong demand in industry. The challenge is that speech-related expressions and emotion-related expressions are often highly coupled. Meanwhile, traditional image-to-image translation methods do not work well in our application because of the coupling of expressions with other attributes such as pose; that is, translating the character's expression in each frame may simultaneously change the head pose owing to the bias of the training data distribution. In this paper, we propose a high-quality facial expression editing method for talking face videos that allows the user to continuously control the target emotion in the edited video. We offer a new perspective on this task as a special case of motion information editing, in which we use a 3DMM to capture the major facial movements and an associated texture map, modeled by StyleGAN, to capture the appearance details. Both representations (the 3DMM and the texture map) contain emotional information, can be continuously modified by neural networks, and are easily smoothed by averaging in the coefficient/latent space, making our method simple yet effective. We also introduce a mouth-shape preservation loss to control the trade-off between lip synchronization and the degree of exaggeration of the edited expression. Extensive experiments and a user study show that our method achieves state-of-the-art performance across various evaluation criteria.
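Because both representations live in spaces that can be averaged, continuous emotion control reduces to blending. The snippet below is a minimal illustration of that idea under assumed dimensions (64 3DMM expression coefficients, a 512-d StyleGAN latent); it is not the authors' pipeline.

```python
import numpy as np

def blend(neutral: np.ndarray, edited: np.ndarray, strength: float) -> np.ndarray:
    """strength in [0, 1]: 0 keeps the original frame, 1 applies the full edit."""
    return (1.0 - strength) * neutral + strength * edited

rng = np.random.default_rng(0)
coeff_neutral, coeff_happy = np.zeros(64), rng.normal(size=64)      # 3DMM coeffs
latent_neutral, latent_happy = np.zeros(512), rng.normal(size=512)  # StyleGAN latent

for s in (0.25, 0.5, 1.0):
    c = blend(coeff_neutral, coeff_happy, s)    # would be decoded by the 3DMM
    w = blend(latent_neutral, latent_happy, s)  # would be decoded by StyleGAN
```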
Few-shot learning models learn from limited human annotations, and this learning paradigm has demonstrated its practicality in various tasks; however, the scarcity of data keeps the model from fully exploring the semantic information. To tackle this issue, we introduce knowledge distillation into the few-shot object detection learning paradigm. We further run a motivating experiment, which shows that, during knowledge distillation, the empirical error of the teacher model degrades the prediction performance of the few-shot object detection model (as the student). To understand the reason behind this phenomenon, we revisit the learning paradigm of knowledge distillation on few-shot object detection tasks from the perspective of causal theory and accordingly develop a structural causal model. Following the theoretical guidance, we propose a backdoor-adjustment-based knowledge distillation method for the few-shot object detection task, namely Disentangle and Remerge (D&R), to perform conditional causal intervention on the corresponding structural causal model. Theoretically, we provide an extended definition of the backdoor criterion, i.e., the general backdoor path, which can expand the theoretical application boundary of the backdoor criterion in specific cases. Empirically, experiments on multiple benchmark datasets demonstrate that D&R can yield significant performance gains in few-shot object detection.
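For readers unfamiliar with the distillation setup being intervened on, the sketch below shows the standard soft-label knowledge-distillation loss that such a pipeline builds on; the D&R causal intervention itself is not reproduced, and the temperature and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    # Soft term: match the teacher's softened class distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard term: the usual supervised loss on ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```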
Neural architecture search (NAS) can automatically design architectures for deep neural networks (DNNs) and has become one of the hottest research topics in the current machine learning community. However, NAS is often computationally expensive because a large number of DNNs need to be trained during the search process. Performance predictors can greatly alleviate the prohibitive cost of NAS by directly predicting the performance of a DNN. However, building a satisfactory performance predictor depends heavily on a sufficient number of trained DNN architectures, which are difficult to obtain in most cases. To solve this critical issue, we propose an effective DNN architecture augmentation method named GIAug in this paper. Specifically, we first propose a mechanism based on graph isomorphism, which has the merit of efficiently generating a factorial of $\boldsymbol{n}$ (i.e., $\boldsymbol{n}!$) diverse annotated architectures from a single architecture with $\boldsymbol{n}$ nodes. In addition, we design a generic method to encode architectures into a form suitable for most prediction models. As a result, GIAug can be flexibly utilized by various performance-predictor-based NAS algorithms. We perform extensive experiments on the CIFAR-10 and ImageNet benchmark datasets over small-, medium- and large-scale search spaces. The experiments show that GIAug can significantly enhance the performance of most state-of-the-art peer predictors. In addition, GIAug can save, at most, three orders of magnitude of computation cost on ImageNet compared with state-of-the-art NAS algorithms.
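The graph-isomorphism idea can be made concrete: relabeling the $n$ nodes of a cell produces up to $n!$ different encodings of the same computation graph, each of which can inherit the original accuracy annotation for free. The sketch below is an assumption about this mechanism, not the authors' code.

```python
import itertools
import numpy as np

def gi_augment(adj: np.ndarray, ops: list, label: float):
    """Yield every node relabeling of (adjacency, ops) with the same label."""
    n = adj.shape[0]
    for perm in itertools.permutations(range(n)):
        p = np.eye(n, dtype=adj.dtype)[list(perm)]  # permutation matrix
        yield p @ adj @ p.T, [ops[i] for i in perm], label

adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])  # toy 3-node cell as a DAG
for a, o, y in gi_augment(adj, ["conv3x3", "conv1x1", "maxpool"], 0.94):
    pass  # 3! = 6 equivalent annotated samples for the predictor
```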
Collaborative filtering (CF) is widely used in recommender systems to model user-item interactions. With the great success of deep neural networks (DNNs) in various fields, recent advanced works have proposed several DNN-based CF models, which have been proven effective. However, these neural networks are all designed manually. Consequently, the designers are required to have expertise in both CF and DNNs, which limits the application of deep learning methods in CF as well as the accuracy of the recommendation results. In this paper, we introduce a genetic algorithm into the process of designing DNNs. Through genetic operations such as crossover and mutation, together with an environmental selection strategy, the architectures and the connection-weight initialization of the DNNs can be designed automatically. We conduct extensive experiments on two benchmark datasets. The results demonstrate that the proposed algorithm outperforms several manually designed state-of-the-art neural networks.
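The evolutionary loop behind such a search follows a standard pattern. The following is a minimal sketch under stated assumptions (the genome encodes hidden-layer widths, and the fitness function is a stand-in for validation accuracy of the decoded, trained CF network); the paper's exact operators are not reproduced.

```python
import random

def evolve(fitness, init_genome, pop_size=20, generations=30,
           cx_rate=0.9, mut_rate=0.1):
    pop = [init_genome() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = random.sample(pop, 2)
            child = crossover(a, b) if random.random() < cx_rate else a[:]
            if random.random() < mut_rate:
                child = mutate(child)
            offspring.append(child)
        # Environmental selection: keep the fittest individuals overall.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(g):
    g = g[:]
    g[random.randrange(len(g))] = random.choice([8, 16, 32, 64])
    return g

# Toy usage: the genome lists hidden-layer widths; the fitness function here
# merely stands in for post-training validation accuracy.
best = evolve(fitness=lambda g: -abs(sum(g) - 100),
              init_genome=lambda: [random.choice([8, 16, 32, 64]) for _ in range(4)])
```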
Caricature is an artistic style of human faces that has attracted considerable attention in the entertainment industry. So far, only a few 3D caricature generation methods exist, and all of them require some caricature information (e.g., a caricature sketch or a 2D caricature) as input. However, this kind of input is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model that generates high-quality 3D caricatures directly from normal 2D face photos. The most challenging issue for our system is that the source domain of face photos (characterized by normal 2D faces) differs greatly from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures). To address this challenge, we (1) build a large dataset of 5,343 3D caricature meshes and use it to establish a PCA model of the 3D caricature shape space; (2) reconstruct a normal full 3D head from the input face photo and establish a correspondence between the input photo and the 3D caricature shape via its PCA representation in the 3D caricature shape space; and (3) propose a novel character loss and a novel caricature loss based on previous psychological studies of caricature. Experiments, including a novel two-level user study, show that our system can generate high-quality 3D caricatures directly from normal face photos.
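Step (1) amounts to classical PCA over flattened mesh vertices. The sketch below illustrates it with an assumed mesh resolution (100 vertices, so 300-d vectors) and an assumed number of components; only the dataset size of 5,343 comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
meshes = rng.normal(size=(5343, 300))  # 5,343 meshes, 100 vertices x 3 (assumed)

mean = meshes.mean(axis=0)
# Principal directions via SVD of the centered data matrix.
_, _, vt = np.linalg.svd(meshes - mean, full_matrices=False)
basis = vt[:50]                        # keep the top 50 components (assumed)

def encode(shape):  return basis @ (shape - mean)
def decode(coeffs): return mean + basis.T @ coeffs

coeffs = encode(meshes[0])             # low-dimensional caricature representation
recon = decode(coeffs)                 # approximate mesh reconstruction
```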
We present a new two-stage 3D object detection framework, named Sparse-to-Dense 3D Object Detector (STD). The first stage is a bottom-up proposal generation network that uses raw point clouds as input to generate accurate proposals by seeding each point with a new spherical anchor. It achieves a high recall with less computation compared with prior work. Then, PointsPool is applied to generate proposal features by transforming their interior point features from a sparse expression to a compact representation, which saves even more computation time. In box prediction, which is the second stage, we implement a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance. We conduct experiments on the KITTI dataset and evaluate our method in terms of 3D object and Bird's Eye View (BEV) detection. Our method outperforms other state-of-the-art methods by a large margin, especially on the hard set, with an inference speed of more than 10 FPS.
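The quantity the IoU branch learns to estimate is the 3D overlap between a predicted box and its ground truth. As a point of reference, the sketch below computes 3D IoU for axis-aligned boxes; real KITTI boxes are rotated about the vertical axis, which additionally requires polygon clipping and is omitted here.

```python
import numpy as np

def iou_3d_axis_aligned(a: np.ndarray, b: np.ndarray) -> float:
    """a, b: boxes as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))  # intersection volume
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return float(inter / (vol_a + vol_b - inter))

print(iou_3d_axis_aligned(np.array([0, 0, 0, 2, 2, 2.0]),
                          np.array([1, 1, 1, 3, 3, 3.0])))  # 1/15 ~= 0.067
```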
Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As a city continues to build, parts of the transportation network will be added or modified. How to accurately predict such expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The agent's estimate of the data recorded by the sensor is most accurate when it can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next-flow-value predictor, the action is the next time-series flow value of the sensor, and the environment state is a dynamically fused representation of the sensor and the transportation network. Actions taken by the agent change the environment, which in turn forces the agent's model to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.
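Concretely, the formulation above can be mocked up as a tiny environment in which the action is the predicted next flow value and the reward penalizes prediction error; the state features, reward shape, and the persistence-style agent below are illustrative assumptions, not the paper's design.

```python
import numpy as np

class SensorEnv:
    def __init__(self, flow_series: np.ndarray, window: int = 12):
        self.flow, self.window, self.t = flow_series, window, window

    def state(self) -> np.ndarray:
        # Stand-in for the dynamically fused sensor/network representation.
        return self.flow[self.t - self.window : self.t]

    def step(self, action: float):
        true_next = self.flow[self.t]
        reward = -abs(action - true_next)  # the agent mimics the sensor pattern
        self.t += 1
        done = self.t >= len(self.flow)
        return (None if done else self.state()), reward, done

env = SensorEnv(np.sin(np.linspace(0, 20, 200)) + 1.0)
s, total = env.state(), 0.0
while True:
    action = s[-1]               # naive persistence predictor as the agent
    s, r, done = env.step(action)
    total += r
    if done:
        break
```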
Accurate airway extraction from computed tomography (CT) images is a critical step in planning navigation bronchoscopy and in the quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). Existing methods find it challenging to sufficiently segment the airway, especially the high-generation airways, under the constraint of limited labels, and cannot meet the demands of clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation from CT images. The method consists of two stages, performing initial and refined airway segmentation. The two stages share the same subnetwork, with different airway masks as input. A contextual transformer block is applied in both the encoder and the decoder paths of the subnetwork to accomplish high-quality airway segmentation effectively. In the first stage, the total airway mask and the CT images are provided to the subnetwork; in the second stage, the intrapulmonary airway mask and the corresponding CT scans are provided. The predictions of the two stages are then merged as the final prediction. Extensive experiments were performed on an in-house dataset and multiple public datasets. Quantitative and qualitative analyses demonstrate that our proposed method extracts many more branches and greater tree length while achieving state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.
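The two-stage inference flow can be summarized as follows. This is a minimal sketch under stated assumptions: both stages share one subnetwork and differ only in the input mask, and the merge is taken to be a voxel-wise union of thresholded predictions (the abstract does not specify the merging rule).

```python
import numpy as np

def two_stage_predict(subnet, ct, total_airway_mask, intrapulmonary_mask):
    pred1 = subnet(ct, total_airway_mask)    # stage 1: total airway as input
    pred2 = subnet(ct, intrapulmonary_mask)  # stage 2: intrapulmonary airway
    # Merge the two stage outputs into the final prediction (assumed union).
    return np.logical_or(pred1 > 0.5, pred2 > 0.5)

# Toy stand-in for the shared CoT U-Net: threshold the masked CT intensities.
subnet = lambda ct, mask: (ct * mask > 0.3).astype(float)
ct = np.random.default_rng(0).random((8, 8, 8))
mask = np.ones_like(ct)
final = two_stage_predict(subnet, ct, mask, mask)
```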