Optical flow estimation in omnidirectional videos faces two significant issues: the lack of benchmark datasets and the challenge of adapting perspective video-based methods to the omnidirectional nature. This paper proposes the first perceptually natural-synthetic omnidirectional benchmark dataset with a 360-degree field of view, Flow360, with 40 different videos and 4,000 video frames. We conduct comprehensive characteristic analysis and comparisons between our dataset and existing optical flow datasets, which manifest perceptual realism, uniqueness, and diversity. To accommodate the omnidirectional nature, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). We train our network in a contrastive manner with a hybrid loss function that combines contrastive loss and optical flow loss. Extensive experiments verify the effectiveness of the proposed framework and show a 40% performance improvement over state-of-the-art methods. Our Flow360 dataset and code are available at https://siamlof.github.io/.
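The hybrid objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the end-point-error flow loss, the InfoNCE-style contrastive term, and the weighting `lam` are all assumptions made for the sketch.

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """End-point error: mean Euclidean distance between predicted
    and ground-truth flow vectors (arrays of shape H x W x 2)."""
    return np.mean(np.sqrt(np.sum((flow_pred - flow_gt) ** 2, axis=-1)))

def contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over two batches of embeddings (N x D);
    matching rows are treated as positive pairs."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

def hybrid_loss(flow_pred, flow_gt, z_a, z_b, lam=0.1):
    """Hybrid objective: optical-flow loss plus a weighted contrastive term."""
    return epe_loss(flow_pred, flow_gt) + lam * contrastive_loss(z_a, z_b)
```

Here `z_a` and `z_b` would be embeddings of two augmented (e.g. rotated) views of the same 360-degree frame; the exact augmentation scheme is specific to the paper.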
translated by Google Translate (谷歌翻译)
Relying on the premise that the performance of a binary neural network can be largely restored by eliminating the quantization error between the full-precision weight vectors and their corresponding binarized vectors, existing works on network binarization frequently adopt the idea of model robustness to reach this objective. However, robustness remains an ill-defined concept without solid theoretical support. In this work, we introduce Lipschitz continuity, a well-defined functional property, as the rigorous criterion for defining the model robustness of BNNs. We then propose to retain the Lipschitz continuity as a regularization term to improve model robustness. In particular, while popular Lipschitz-involved regularization methods often collapse in BNNs due to their extreme sparsity, we design Retention Matrices to approximate the spectral norms of the targeted weight matrices, which can be deployed as an approximation of the Lipschitz constant of BNNs without the exact Lipschitz constant computation (NP-hard). Our experiments prove that our BNN-specific regularization method can effectively strengthen the robustness of BNNs (testified on ImageNet-C), achieving state-of-the-art performance on CIFAR and ImageNet.
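The spectral-norm approximation at the core of this idea can be sketched with plain power iteration. Note this is an illustrative stand-in: the paper's Retention Matrix construction is BNN-specific, whereas the sketch below only shows the generic principle of approximating a layer's Lipschitz constant by its spectral norm and penalizing the deviation from a target.

```python
import numpy as np

def spectral_norm(w, n_iter=50):
    """Approximate the largest singular value of weight matrix `w`
    via power iteration (avoiding the NP-hard exact Lipschitz computation)."""
    u = np.random.default_rng(0).normal(size=w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    return float(u @ w @ v)

def lipschitz_penalty(weights, target=1.0):
    """Regularization term: pull each layer's spectral norm toward `target`,
    keeping the product of per-layer Lipschitz constants under control."""
    return sum((spectral_norm(w) - target) ** 2 for w in weights)
```

For a linear layer, the spectral norm of the weight matrix is exactly its Lipschitz constant with respect to the L2 norm, which is why this quantity serves as the regularization handle.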
Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit. However, there is still a huge performance gap between Binary Neural Networks (BNNs) and their full-precision (FP) counterparts. As the quantization error caused by weight binarization has been reduced in earlier works, activation binarization becomes the major obstacle to further accuracy improvement. BNNs characterize a unique and interesting structure, where the binary and latent FP activations exist in the same forward pass (i.e., $\text{Binarize}(\mathbf{a}_F)=\mathbf{a}_B$). To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework for training BNNs through the lens of Mutual Information (MI) maximization. MI is introduced as the metric to measure the information shared between binary and FP activations, which assists binarization with contrastive learning. Specifically, the representation ability of the BNN is greatly strengthened by pulling together positive pairs of binary and FP activations from the same input samples, and pushing apart negative pairs from different samples (the number of negative pairs can be very large). This benefits downstream tasks, not only classification but also depth estimation, etc. The experimental results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods and demonstrates its capability on NYUD-v2.
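The positive/negative pairing between binary and FP activations can be sketched as an InfoNCE-style loss. This is a minimal NumPy sketch under assumptions: the sign binarizer, the temperature, and the cosine-similarity form are illustrative choices, not the paper's exact design.

```python
import numpy as np

def binarize(a):
    """Sign binarization of full-precision activations: a_B = sign(a_F)."""
    return np.where(a >= 0, 1.0, -1.0)

def mi_contrastive_loss(a_fp, temperature=0.2):
    """InfoNCE lower bound on the mutual information between binary and FP
    activations: row i of `a_fp` (shape N x D) and its binarized copy form
    the positive pair; all other rows act as negatives."""
    a_b = binarize(a_fp)
    za = a_fp / np.linalg.norm(a_fp, axis=1, keepdims=True)
    zb = a_b / np.linalg.norm(a_b, axis=1, keepdims=True)
    logits = za @ zb.T / temperature              # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal
```

Minimizing this loss maximizes an InfoNCE lower bound on the MI between the binary and FP activations, which is the sense in which the contrastive training counteracts the information loss of binarization.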
A common scenario of Multilingual Neural Machine Translation (MNMT) is that each translation task arrives in a sequential manner, and the training data of previous tasks is unavailable. In this scenario, the current methods suffer heavily from catastrophic forgetting (CF). To alleviate the CF, we investigate knowledge distillation based life-long learning methods. Specifically, in the one-to-many scenario, we propose a multilingual distillation method to make the new model (student) jointly learn the multilingual output from the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. The experimental results on twelve translation tasks show that the proposed methods can better consolidate the previous knowledge and sharply alleviate the CF.
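The distillation backbone shared by these methods can be sketched as a combined objective: cross-entropy on the new task plus a temperature-softened KL term toward the old model. This is a generic knowledge-distillation sketch, not the paper's multilingual or reverse-teacher variants; `alpha` and the temperature `t` are assumed hyperparameters.

```python
import numpy as np

def softmax(x, t=1.0):
    e = np.exp((x - x.max(axis=-1, keepdims=True)) / t)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, t=2.0):
    """Life-long learning objective: cross-entropy on the new task plus a
    temperature-softened KL term that keeps the student close to the old
    (teacher) model's output distribution, mitigating forgetting."""
    p_t = softmax(teacher_logits, t)
    log_p_s = np.log(softmax(student_logits, t))
    kd = np.mean(np.sum(p_t * (np.log(p_t) - log_p_s), axis=-1))  # KL(teacher || student)
    idx = np.arange(len(labels))
    ce = -np.mean(np.log(softmax(student_logits)[idx, labels]))   # new-task CE
    return alpha * kd + (1 - alpha) * ce
```

In an MNMT setting the logits would be per-token vocabulary distributions; the sketch flattens that detail into a single classification step.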
In recent years, deep-learning-based approaches have been introduced to solve time-series forecasting problems. These novel methods have demonstrated impressive performance in univariate and low-dimensional multivariate time-series forecasting tasks. However, when these novel methods are used to handle high-dimensional multivariate forecasting problems, their performance is highly restricted by practical training time and reasonable GPU memory configurations. In this paper, inspired by a change of basis in the Hilbert space, we propose a flexible data feature extraction technique that excels in high-dimensional multivariate forecasting tasks. Our approach was originally developed for the National Science Foundation (NSF) Algorithms for Threat Detection (ATD) 2022 Challenge. Implemented using the attention mechanism and Convolutional Neural Network (CNN) architectures, our method demonstrates great performance and compatibility. Our models trained on the GDELT Dataset finished in 1st and 2nd place in the ATD sprint series and hold promise for other time-series forecasting datasets.
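The change-of-basis idea can be illustrated with a PCA-style projection: learn an orthonormal basis from the data and express each high-dimensional time step in that lower-dimensional basis before forecasting. This is an illustrative sketch only; the paper's actual basis construction is not specified in the abstract, and SVD/PCA is an assumption.

```python
import numpy as np

def orthonormal_basis(x, k):
    """Learn a k-dimensional orthonormal basis from a data matrix x (T x D)
    via SVD -- a PCA-style change of basis in the feature space."""
    x_c = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x_c, full_matrices=False)
    return vt[:k].T                     # D x k, columns orthonormal

def change_of_basis(x, basis):
    """Express each time step in the new basis: (T x D) -> (T x k)."""
    return (x - x.mean(axis=0)) @ basis
```

A forecasting model then operates on the k-dimensional coefficients instead of the raw D-dimensional series, which is where the training-time and memory savings in high dimensions would come from.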
This paper provides an introductory survey to GPT-3. We cover some of the historical development behind this technology, some of the key features of GPT-3, and discuss the machine learning model and the datasets used. We survey both academic and commercial efforts applying GPT-3 in diverse domains such as developing conversational AI chatbots, software development, creative work, domain knowledge, and business productivity. We discuss some of the challenges that GPT-3 faces such as the problems of training complexity, bias, and hallucination/incorrect answers. We also discuss the future research opportunities in this area.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and task the participants with designing an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to a 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
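The INT8 quantization that such edge NPUs accelerate can be illustrated with a minimal symmetric per-tensor scheme. This sketch is generic, not the challenge's required toolchain (which would typically be framework-level, e.g. TensorFlow Lite post-training quantization); the per-tensor symmetric choice is an assumption.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    with a single scale factor, as edge NPUs commonly require."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from its INT8 representation."""
    return q.astype(np.float32) * scale
```

The round-trip error is bounded by half the scale, which is why weight and activation ranges must be kept tight for quantized super-resolution models to preserve image quality.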
Batch Normalization (BN) is an important preprocessing step in many deep learning applications. Since it is a data-dependent process, it can be redundant or even performance-degrading for some homogeneous datasets. In this paper, we propose an early-stage feasibility assessment method for estimating the benefits of applying BN to the given data batches. The proposed method uses a novel threshold-based approach to classify the training data batches into two sets according to their need for normalization. The need for normalization is decided based on the feature heterogeneity of the considered batch. The proposed approach is a pre-training process, which implies no training overhead. The evaluation results show that the proposed approach mostly achieves better performance than traditional BN at small batch sizes, using the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets. Additionally, network stability is increased by reducing the occurrence of internal variable transformation.
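The threshold-based decision can be sketched as follows. The heterogeneity measure here (coefficient of variation of per-feature standard deviations) and the threshold value are illustrative assumptions; the paper's exact criterion may differ.

```python
import numpy as np

def needs_normalization(batch, threshold=0.5):
    """Pre-training feasibility check: estimate the feature heterogeneity of
    a batch (N x D) as the coefficient of variation of the per-feature
    standard deviations, and flag the batch for BN only above a threshold."""
    stds = batch.std(axis=0)
    heterogeneity = stds.std() / (stds.mean() + 1e-8)
    return heterogeneity > threshold

def maybe_batch_norm(batch, threshold=0.5, eps=1e-5):
    """Normalize the batch only when the heterogeneity test says it helps;
    homogeneous batches pass through untouched, saving the redundant work."""
    if not needs_normalization(batch, threshold):
        return batch
    return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + eps)
```

Because the check runs once per batch before training, it adds no training overhead, matching the pre-training framing above.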
In this paper, we present a large-scale, multi-source, and unconstrained database called SDFE-LV for spotting the onset and offset frames of complete dynamic facial expressions in long videos, which is known as the task of dynamic facial expression spotting (DFES) and is a vital prior step for many facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of which contains one or more complete dynamic facial expressions. Moreover, each complete dynamic facial expression in its corresponding long video was independently labeled five times by ten well-trained annotators. To the best of our knowledge, SDFE-LV is the first unconstrained large-scale database for the DFES task whose long videos are collected from multiple real-world or near-real-world media sources, e.g., TV interviews, documentaries, movies, and we-media short videos. Therefore, DFES on the SDFE-LV database will encounter many difficulties in practice, such as head pose changes, occlusions, and illumination. We also provide a comprehensive benchmark evaluation from different angles using many recent state-of-the-art deep spotting methods, so that researchers interested in DFES can get started quickly and easily. Finally, through in-depth discussion of the experimental evaluation results, we attempt to point out several meaningful directions for dealing with the DFES task and hope that DFES can be better advanced in the future. In addition, SDFE-LV will be freely released for academic use only as soon as possible.
Deep neural networks trained with the standard cross-entropy loss are more prone to memorizing noisy labels, which degrades their performance. Negative learning using complementary labels is more robust when noisy labels intervene, but its model convergence speed is extremely slow. In this paper, we first introduce a bidirectional learning scheme, where positive learning ensures convergence speed while negative learning robustly copes with label noise. Furthermore, a dynamic sample reweighting strategy is proposed to weaken the influence of noisily labeled samples by exploiting the excellent discriminative ability of negative learning on the sample probability distribution. In addition, we incorporate self-distillation to further improve model performance. The code is available at \url{https://github.com/chenchenzong/BLDR}.
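The bidirectional scheme can be sketched as the sum of a positive-learning and a negative-learning term. This is a minimal NumPy illustration under assumptions: the complementary-label loss form, the weighting `beta`, and the omission of the dynamic reweighting and self-distillation components are simplifications of the method described above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bidirectional_loss(logits, labels, comp_labels, beta=1.0, eps=1e-8):
    """Bidirectional scheme: positive learning (cross-entropy on the given,
    possibly noisy label) for fast convergence, plus negative learning
    ("this sample is NOT the complementary class") for noise robustness."""
    p = softmax(logits)
    idx = np.arange(len(labels))
    positive = -np.log(p[idx, labels] + eps)             # standard CE
    negative = -np.log(1.0 - p[idx, comp_labels] + eps)  # complementary-label loss
    return np.mean(positive + beta * negative)
```

Complementary labels (`comp_labels`) are classes the sample is known or assumed not to belong to; the negative term only pushes probability mass away from them, which is why it stays robust when the positive labels are noisy.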