Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF) by analyzing the trajectory of discrete gradient dynamics during optimization. The steps of these discrete gradient dynamics are relatively small but not infinitesimal, so they match the practical implementation of neural networks well. So far, discrete gradient dynamics analysis has been applied successfully to shallow networks, but it becomes computationally complex for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
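The setting above can be made concrete with a small numerical sketch. It runs plain gradient descent on a depth-3 matrix factorization of a rank-2 target and records the loss and a thresholded rank of the product at every step. The matrix size, step size, initialization scale, and the 0.1 singular-value threshold are all illustrative assumptions; this is not the paper's experiment and it does not reproduce the SPE analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # matrix size (assumed for illustration)

# Rank-2 target with singular values 3 and 1 (assumed values).
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
target = 3.0 * np.outer(U[:, 0], V[:, 0]) + 1.0 * np.outer(U[:, 1], V[:, 1])

# Depth-3 factorization W3 @ W2 @ W1 with a small random initialization.
W1, W2, W3 = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

lr, steps = 0.01, 5000  # small but not infinitesimal step size
losses, ranks = [], []

for _ in range(steps):
    product = W3 @ W2 @ W1
    residual = product - target
    losses.append(0.5 * np.sum(residual ** 2))

    # Thresholded rank: number of singular values above 0.1 (assumed cutoff).
    singular_values = np.linalg.svd(product, compute_uv=False)
    ranks.append(int(np.sum(singular_values > 0.1)))

    # Gradients of 0.5 * ||W3 W2 W1 - target||_F^2 with respect to each factor.
    g3 = residual @ (W2 @ W1).T
    g2 = W3.T @ residual @ W1.T
    g1 = (W3 @ W2).T @ residual

    W1 -= lr * g1
    W2 -= lr * g2
    W3 -= lr * g3

print("initial loss:", losses[0], "final loss:", losses[-1])
print("thresholded rank every 500 steps:", ranks[::500])
```

Plotting `losses` and `ranks` against the step index is one way to look for plateaus followed by drops, which is the kind of staged behavior the abstract associates with saddle point escaping.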
Recent efforts in Neural Radiance Fields (NeRF) have shown impressive results on novel view synthesis by using implicit neural representations to represent 3D scenes. Because of the volumetric rendering process, NeRF inference is extremely slow, which limits its use on resource-constrained hardware such as mobile devices. Many works have tried to reduce the latency of running NeRF models. However, most of them still require a high-end GPU for acceleration or extra storage memory, neither of which is available on mobile devices. Another emerging direction uses the neural light field (NeLF) for speedup, as only one forward pass is performed on a ray to predict the pixel color. Nevertheless, to reach a rendering quality similar to NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, i.e., saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of real 3D scenes. Additionally, we achieve image quality similar to NeRF and better quality than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
The security of artificial intelligence (AI) is an important research area for building safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
Whole-slide images (WSIs) in computational pathology have gigapixel resolution but generally contain only sparse regions of interest, which leads to weak diagnostic relevance and data inefficiency for each area in the slide. Most existing methods rely on a multiple instance learning framework that requires densely sampling local patches at high magnification. The limitation is evident in the application stage, where the heavy computation for extracting patch-level features is unavoidable. In this paper, we develop RLogist, a benchmarking deep reinforcement learning (DRL) method for a fast observation strategy on WSIs. Imitating the diagnostic logic of human pathologists, our RL agent learns how to find regions of observation value and obtain representative features across multiple resolution levels, without having to analyze each part of the WSI at high magnification. We benchmark our method on two whole-slide level classification tasks: detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer. Experimental results demonstrate that RLogist achieves competitive classification performance compared to typical multiple instance learning algorithms, while having a significantly shorter observation path. In addition, the observation path given by RLogist provides good decision-making interpretability, and its ability to navigate a reading path could potentially be used by pathologists for educational or assistive purposes. Our code is available at: https://github.com/tencent-ailab/RLogist.
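Text-to-SQL parsing is an essential and challenging task. Its goal is to convert a natural language (NL) question into the corresponding structured query language (SQL) query, based on the evidence provided by a relational database. Early text-to-SQL parsing systems from the database community made notable progress, at the cost of heavy human engineering and user interaction with the system. In recent years, deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, large pre-trained language models have taken the state of the art in text-to-SQL parsing to a new level. In this survey, we present a comprehensive review of deep learning approaches to text-to-SQL parsing. First, we introduce text-to-SQL parsing corpora, which can be categorized as single-turn and multi-turn. Second, we provide a systematic overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present the challenges facing text-to-SQL parsing and explore some potential future directions in this field.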
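To make the text-to-SQL task concrete, the sketch below pairs one natural language question with a SQL query and executes it with Python's built-in `sqlite3` module. The `singer` table, its rows, and the question are hypothetical and are not taken from any corpus in the survey. The SQL string is written by hand to stand in for what a parser would be expected to produce.

```python
import sqlite3

# Hypothetical single-table database, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Alice", 31, "France"), ("Bruno", 45, "Brazil"), ("Chloe", 27, "France")],
)

# Input to a text-to-SQL parser: a natural language question plus the schema.
question = "How many singers are from France?"

# Target output: the SQL query that answers the question (hand-written here).
predicted_sql = "SELECT COUNT(*) FROM singer WHERE country = 'France'"

# Executing the query against the database yields the answer.
answer = conn.execute(predicted_sql).fetchone()[0]
print(question, "->", predicted_sql, "->", answer)
conn.close()
```

In a single-turn corpus each question is parsed on its own, as here. In a multi-turn corpus a follow-up such as "And from Brazil?" must be interpreted in light of the earlier question.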
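As a common security tool, visible watermarking has been widely applied to protect the copyright of digital images. However, recent works have shown that visible watermarks can be removed by DNNs without damaging their host images. Such watermark-removal techniques pose a great threat to image ownership. Inspired by the vulnerability of DNNs to adversarial perturbations, we propose a novel defense mechanism that uses adversarial machine learning for good. From the adversary's perspective, blind watermark-removal networks can be treated as our target models. We then optimize an imperceptible adversarial perturbation on the host images to proactively attack watermark-removal networks, which we call a Watermark Vaccine. Specifically, two types of vaccines are proposed. The Disrupting Watermark Vaccine (DWV) causes the host image to be ruined along with the watermark after passing through a watermark-removal network. In contrast, the Inerasable Watermark Vaccine (IWV) works in another way, trying to keep the watermark unremoved and still noticeable. Extensive experiments demonstrate the effectiveness of our DWV/IWV in preventing watermark removal, especially across various watermark-removal networks.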
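Deep neural networks have shown remarkable performance on image super-resolution (SR) tasks by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods suffer from several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images can be downsampled to the same LR image. As a result, it is hard to learn a promising SR mapping directly from such a large space. Second, it is often unavoidable to develop very large models with extremely high computational cost to obtain promising SR performance. In practice, model compression techniques can be used to obtain compact models by reducing model redundancy. Nevertheless, because the SR mapping space is so large, existing model compression methods have difficulty identifying redundant components accurately. To alleviate the first challenge, we propose a dual regression learning scheme that reduces the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping that estimates the downsampling kernel and reconstructs the LR image. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a lightweight dual regression compression method that reduces model redundancy at both the layer level and the channel level, based on channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further exploit the dual regression scheme to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.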
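The dual regression idea above can be sketched as a loss with two terms: a primal term comparing the super-resolved image with the HR target, and a dual term comparing the re-downsampled output with the LR input. In the sketch below the "networks" are fixed stand-ins (nearest-neighbor upsampling and average pooling), and the L1 distance and the weight of 0.1 are assumptions, so it illustrates only the shape of the objective and not the authors' model.

```python
import numpy as np

def upsample_nearest(lr_img, scale=2):
    """Stand-in primal mapping: repeat each pixel into a scale x scale block."""
    return np.kron(lr_img, np.ones((scale, scale)))

def downsample_avg(hr_img, scale=2):
    """Stand-in dual mapping: average each scale x scale block."""
    h, w = hr_img.shape
    return hr_img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def dual_regression_loss(lr_img, hr_img, primal, dual, weight=0.1):
    """Primal L1 loss plus a weighted dual L1 loss on the reconstructed LR image."""
    sr_img = primal(lr_img)
    primal_loss = np.abs(sr_img - hr_img).mean()
    dual_loss = np.abs(dual(sr_img) - lr_img).mean()
    return primal_loss + weight * dual_loss

lr_img = np.arange(16, dtype=float).reshape(4, 4)
hr_exact = upsample_nearest(lr_img)  # HR target the stand-in model fits exactly
hr_shifted = hr_exact + 1.0          # HR target that is off by 1 everywhere

loss_exact = dual_regression_loss(lr_img, hr_exact, upsample_nearest, downsample_avg)
loss_shifted = dual_regression_loss(lr_img, hr_shifted, upsample_nearest, downsample_avg)
print(loss_exact, loss_shifted)
```

The dual term needs no HR ground truth, since it compares only against the LR input. That is why it can act as an extra constraint on which SR mappings are acceptable.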
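Knowledge distillation (KD) is a widely used technique for training compact models in object detection. However, there is still a lack of research on how to distill between heterogeneous detectors. In this paper, we find empirically that better FPN features from a heterogeneous teacher detector can help the student, even though their detection heads and label assignments differ. However, directly aligning feature maps to distill detectors has two problems. First, the difference in feature magnitude between the teacher and the student may impose overly strict constraints on the student. Second, the FPN stages and channels with large feature magnitudes in the teacher model may dominate the gradient of the distillation loss, which overwhelms the effect of other features in KD and introduces considerable noise. To address these issues, we propose imitating features with the Pearson correlation coefficient, which focuses on the relational information from the teacher and relaxes the constraints on feature magnitude. Our method consistently outperforms existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs. Furthermore, it converges faster. With a powerful MaskRCNN-Swin detector as the teacher, ResNet-50 based RetinaNet and FCOS achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than the baseline, respectively.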
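One way to see why a Pearson-correlation objective relaxes the magnitude constraint: if both feature maps are standardized to zero mean and unit variance, the mean squared error between them equals $2(1 - r)$, where $r$ is their Pearson correlation, so any positive rescaling or shift of the teacher features leaves the loss unchanged. The sketch below standardizes over all elements of a single feature map. A real detector would likely normalize per channel or per FPN level, which is an assumption left out here, and the function names are hypothetical.

```python
import numpy as np

def standardize(feat, eps=1e-6):
    """Shift to zero mean and scale to unit variance over all elements."""
    return (feat - feat.mean()) / (feat.std() + eps)

def pearson_imitation_loss(student_feat, teacher_feat):
    """MSE between standardized features, approximately 2 * (1 - Pearson r)."""
    s = standardize(student_feat)
    t = standardize(teacher_feat)
    return float(np.mean((s - t) ** 2))

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 16, 16))  # hypothetical (channels, H, W) feature map

# Teacher features that differ from the student only in scale and offset.
loss_scaled = pearson_imitation_loss(student, 3.0 * student + 5.0)

# Teacher features that are perfectly anti-correlated with the student.
loss_opposite = pearson_imitation_loss(student, -student)

# Plain MSE, for comparison, is large even though the relational pattern matches.
plain_mse = float(np.mean((student - (3.0 * student + 5.0)) ** 2))
print(loss_scaled, loss_opposite, plain_mse)
```

Here `loss_scaled` is close to zero while `plain_mse` is not, which mirrors the abstract's point that direct feature alignment penalizes magnitude differences the student does not need to match.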
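Multivariate time series forecasting has a wide range of applications in domains including finance, traffic, energy, and healthcare. To capture sophisticated temporal patterns, a large body of research has designed complex neural network architectures based on many variants of RNNs, GNNs, and Transformers. However, complex models are often computationally expensive and therefore face serious challenges in training and inference efficiency when applied to large real-world datasets. In this paper, we introduce LightTS, a light deep learning architecture based purely on simple MLP-based structures. The key idea of LightTS is to apply an MLP-based structure on top of two delicate downsampling strategies, interval sampling and continuous sampling, inspired by the crucial fact that a downsampled time series often preserves the majority of its information. We conduct extensive experiments on eight widely used benchmark datasets. Compared with existing state-of-the-art methods, LightTS performs better on five of them and comparably on the rest. Moreover, LightTS is highly efficient: it uses less than 5% of the FLOPS of previous SOTA methods on the largest benchmark dataset. In addition, LightTS is robust: its forecasting accuracy on long-sequence forecasting tasks has a much smaller variance than that of previous SOTA methods.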
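The two downsampling strategies named above can be sketched with array reshapes. This assumes that continuous sampling cuts the series into consecutive chunks and that interval sampling takes every k-th point at different offsets. The exact definitions in LightTS may differ, and the function names are hypothetical.

```python
import numpy as np

def continuous_sampling(series, num_subseq):
    """Split the series into num_subseq consecutive chunks of equal length."""
    length = len(series) // num_subseq
    return series[: num_subseq * length].reshape(num_subseq, length)

def interval_sampling(series, num_subseq):
    """Build num_subseq subsequences by taking every num_subseq-th point."""
    length = len(series) // num_subseq
    trimmed = series[: num_subseq * length]
    # Row i of the result is trimmed[i::num_subseq].
    return trimmed.reshape(length, num_subseq).T

series = np.arange(12)
chunks = continuous_sampling(series, 3)
strided = interval_sampling(series, 3)
print(chunks)   # rows are 0..3, 4..7, 8..11
print(strided)  # rows are 0,3,6,9 / 1,4,7,10 / 2,5,8,11
```

Under these assumptions, each row of `chunks` keeps local, short-range structure, while each row of `strided` spans the whole window at a coarser rate. An MLP applied to the rows would then see both views of the same series.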
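Automatic open-domain dialogue evaluation is a crucial component of dialogue systems. Recently, learning-based evaluation metrics have achieved state-of-the-art performance in open-domain dialogue evaluation. However, these metrics focus on only a few qualities, which makes it hard for them to evaluate dialogue comprehensively. Furthermore, they lack an effective score composition approach for combining diverse evaluation qualities. To address these problems, we propose Multi-Metric Evaluation based on Correlation Re-Scaling (MME-CRS) for evaluating open-domain dialogue. First, we build an evaluation metric composed of 5 groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to evaluate the quality of dialogue comprehensively. Second, we propose a novel score composition method called Correlation Re-Scaling (CRS) to model the relationship between sub-metrics and diverse qualities. Our approach, MME-CRS, ranks first on the final test data of the DSTC10 Track5 Subtask1 Automatic Open-domain Dialogue Evaluation Challenge, which demonstrates the effectiveness of our proposed approach.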