Transformer variants dominate the state of the art in natural language processing tasks such as translation, reading comprehension, and summarization. This paper is a follow-up study of the general memory slots that were added to the input of the proposed model in previous work, focusing on the effect of adding these slots. We consider two main tasks: (1) a pretraining task using masked language modeling and (2) a fine-tuning task on HotpotQA. The study aims to verify whether the proposed model can handle multiple input chunks as if they were a single chunk, compared with the base model. We use the T5 transformer as the baseline, study the role of the memory slots appended to each input chunk, and evaluate the model's performance without a selector. We find that adding memory slots to the input chunks helps the proposed model outperform the baseline on the masked language modeling task under specific training settings. An ablation study shows that compressed input chunks can be used, albeit with some degradation in performance.
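Since the abstract does not spell out the memory-slot mechanism, the minimal sketch below only illustrates the general idea of prepending learnable memory slots to the token embeddings of each input chunk; the class name, slot count, and the assumption that slots are shared across chunks are all illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MemoryAugmentedChunk(nn.Module):
    """Toy sketch: prepend learnable memory-slot embeddings to each input chunk."""
    def __init__(self, num_slots: int = 10, d_model: int = 512):
        super().__init__()
        # Learnable memory slots shared across chunks (an assumption; the paper
        # does not specify whether slots are shared or chunk-specific).
        self.memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)

    def forward(self, chunk_embeddings: torch.Tensor) -> torch.Tensor:
        # chunk_embeddings: (batch, seq_len, d_model)
        batch = chunk_embeddings.size(0)
        slots = self.memory.unsqueeze(0).expand(batch, -1, -1)
        # Concatenate the memory slots before the chunk tokens.
        return torch.cat([slots, chunk_embeddings], dim=1)

if __name__ == "__main__":
    layer = MemoryAugmentedChunk(num_slots=10, d_model=512)
    x = torch.randn(2, 128, 512)   # two chunks of 128 token embeddings each
    print(layer(x).shape)          # torch.Size([2, 138, 512])
```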
Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model suddenly becomes vulnerable to multi-step attacks. They showed that adding a random perturbation prior to FGSM (RS-FGSM) appeared to be sufficient to prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still leads to CO for larger perturbations and proposed an expensive regularizer (GradAlign) to avoid it. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using stronger noise around the clean sample, combined with not clipping, is highly effective in avoiding CO for large perturbation radii. Based on these observations, we propose Noise-FGSM (N-FGSM), which provides the benefits of single-step adversarial training without suffering from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous single-step methods while achieving a 3$\times$ speed-up. Code can be found at https://github.com/pdejorge/n-fgsm
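As a rough illustration of the single-step scheme described above, the sketch below draws strong uniform noise around the clean sample, takes one FGSM step, and skips the projection back to the $\epsilon$-ball (only the valid pixel range is enforced). The default noise magnitude `k` and step size `alpha` are assumptions, not the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def n_fgsm_perturb(model, x, y, eps=8/255, k=2.0, alpha=None):
    """Minimal sketch of the N-FGSM idea: strong uniform noise around the clean
    sample, a single FGSM step, and no clipping of the perturbation back to the
    eps-ball (only the valid pixel range [0, 1] is enforced)."""
    alpha = eps if alpha is None else alpha
    # Noise drawn from U(-k*eps, k*eps); k and alpha here are illustrative defaults.
    eta = torch.empty_like(x).uniform_(-k * eps, k * eps)
    x_noisy = (x + eta).clamp(0.0, 1.0).requires_grad_(True)
    loss = F.cross_entropy(model(x_noisy), y)
    grad = torch.autograd.grad(loss, x_noisy)[0]
    # One FGSM step on top of the noisy point; note: no projection to the eps-ball.
    x_adv = x_noisy.detach() + alpha * grad.sign()
    return x_adv.clamp(0.0, 1.0)
```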
The prevalence of diabetic retinopathy (DR) has reached 34.6% worldwide, and DR is a major cause of blindness among middle-aged diabetic patients. Regular DR screening using fundus photography helps detect its complications and prevent its progression to advanced stages. As manual screening is time-consuming and subjective, machine learning (ML) and deep learning (DL) have been employed to aid graders. However, existing CNN-based methods use either pre-trained CNN models or a brute-force approach to designing new CNN models, which are not customized to the complexity of fundus images. To overcome this issue, we introduce an approach for the custom design of CNN models whose architectures are adapted to the structural patterns of fundus images and better represent DR-relevant features. It leverages k-medoid clustering, principal component analysis (PCA), and inter-class and intra-class variations to automatically determine the depth and width of a CNN model. The designed models are lightweight, adapted to the internal structures of fundus images, and encode the discriminative patterns of DR lesions. The technique is validated on a local dataset from King Saud University Medical City, Saudi Arabia, and two challenging benchmark datasets from Kaggle: EyePACS and APTOS2019. The custom-designed models outperform well-known pre-trained CNN models such as ResNet152, DenseNet121, and ResNeSt50 with a significant decrease in the number of parameters, and compete well with state-of-the-art CNN-based DR screening methods. The proposed approach is helpful for DR screening under diverse clinical settings and for referring patients who may need further assessment and treatment to expert ophthalmologists.
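The abstract does not detail how the clustering and PCA statistics map to depth and width, so the toy heuristic below only illustrates one ingredient of such a pipeline: choosing a layer width from the number of principal components needed to explain most of the patch variance. The k-medoid clustering and the inter-/intra-class variation terms are omitted, and all names and thresholds are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_width_heuristic(patches: np.ndarray, var_threshold: float = 0.95) -> int:
    """Toy heuristic: pick a layer width as the number of principal components
    needed to explain `var_threshold` of the variance of flattened image patches.
    This is only an illustration of the general idea; the paper's procedure also
    uses k-medoid clustering and inter-/intra-class variation, omitted here."""
    pca = PCA().fit(patches)                      # patches: (n_patches, patch_dim)
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumvar, var_threshold) + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_patches = rng.normal(size=(1000, 7 * 7 * 3))  # stand-in for fundus patches
    print("suggested width:", pca_width_heuristic(fake_patches))
```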
Parameterized mechanistic models are used across various fields of science and engineering. Engineers and scientists can often hypothesize several competing models to explain a particular process or phenomenon. Consider a model discrimination setting in which we wish to find both the best mechanistic, dynamic model candidate and the best model parameter estimates. Typically, several rival mechanistic models can explain the available data, so dynamic experiments for optimally collecting additional data can be designed by finding experimental settings that maximize the divergence between the model predictions. We argue that there are two main approaches in the literature for solving this optimal design problem: (i) an analytical approach, which uses linear and Gaussian approximations to find closed-form expressions for the design objective, and (ii) a data-driven approach, which typically relies on computationally intensive Monte Carlo techniques. Olofsson et al. (ICML 35, 2018) introduced Gaussian process (GP) surrogate models to hybridize the analytical and data-driven approaches, allowing tractable computation of experimental designs for discriminating between black-box models. In this study, we demonstrate that we can extend existing methods for the design of dynamic experiments to incorporate a wider range of problem uncertainty. We also extend the method of Olofsson et al. (2018) that uses GP surrogate models to discriminate between dynamic black-box models. We evaluate our approach on well-known case studies from the literature and explore the consequences of using GP surrogates to approximate gradient-based methods.
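As a heavily simplified stand-in for a GP-surrogate design criterion, the sketch below fits one GP per rival model to its simulated outputs and proposes the candidate design where the surrogate predictions diverge most relative to their predictive uncertainty. It is not the hybrid criterion of Olofsson et al. (2018); the function name, the two-model restriction, and the scoring rule are assumptions for illustration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_experiment(designs, model_outputs, candidate_designs):
    """Hedged sketch of GP-surrogate-based design for model discrimination:
    fit one GP surrogate per rival model (two models here, for simplicity) and
    pick the candidate design where the surrogate predictions diverge the most,
    discounted by the surrogates' predictive uncertainty."""
    gps = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-6).fit(designs, y)
           for y in model_outputs]                # one output vector per rival model
    means, stds = zip(*(gp.predict(candidate_designs, return_std=True) for gp in gps))
    # Squared difference of predicted means, scaled by total predictive variance.
    score = (means[0] - means[1]) ** 2 / (stds[0] ** 2 + stds[1] ** 2 + 1e-12)
    return candidate_designs[np.argmax(score)]
```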
Modern deep neural networks tend to be evaluated on static test sets. One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations. For example, it is hard to study the robustness of these networks to variations of object scale, object pose, scene lighting and 3D occlusions. The main reason is that collecting real datasets with fine-grained naturalistic variations of sufficient scale can be extremely time-consuming and expensive. In this work, we present Counterfactual Simulation Testing, a counterfactual framework that allows us to study the robustness of neural networks with respect to some of these naturalistic variations by building realistic synthetic scenes that allow us to ask counterfactual questions to the models, ultimately providing answers to questions such as "Would your classification still be correct if the object were viewed from the top?" or "Would your classification still be correct if the object were partially occluded by another object?". Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers, with respect to these naturalistic variations. We find evidence that ConvNext is more robust to pose and scale variations than Swin, that ConvNext generalizes better to our simulated domain and that Swin handles partial occlusion better than ConvNext. We also find that robustness for all networks improves with network scale and with data scale and variety. We release the Naturalistic Variation Object Dataset (NVD), a large simulated dataset of 272k images of everyday objects with naturalistic variations such as object pose, scale, viewpoint, lighting and occlusions. Project page: https://counterfactualsimulation.github.io
Continual Learning is a step towards lifelong intelligence where models continuously learn from recently collected data without forgetting previous knowledge. Existing continual learning approaches mostly focus on image classification in the class-incremental setup with clear task boundaries and unlimited computational budget. This work explores Online Domain-Incremental Continual Segmentation (ODICS), a real-world problem that arises in many applications, e.g., autonomous driving. In ODICS, the model is continually presented with batches of densely labeled images from different domains; computation is limited and no information about the task boundaries is available. In autonomous driving, this may correspond to the realistic scenario of training a segmentation model over time on a sequence of cities. We analyze several existing continual learning methods and show that they do not perform well in this setting despite working well in class-incremental segmentation. We propose SimCS, a parameter-free method complementary to existing ones that leverages simulated data as a continual learning regularizer. Extensive experiments show consistent improvements over different types of continual learning methods that use regularizers and even replay.
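A minimal sketch of the core idea, assuming a segmentation model trained with cross-entropy: every online update adds a loss on a batch of simulated, densely labelled images as a regularizer alongside the current real-domain batch. The function name, the weighting term `lam`, and the use of a simple iterator over simulated batches are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def online_step(model, optimizer, real_batch, sim_loader, lam=1.0):
    """One online update: supervised loss on the current real-domain batch plus
    a regularizing loss on a batch of simulated data (a sketch of the SimCS idea)."""
    x_real, y_real = real_batch
    x_sim, y_sim = next(sim_loader)   # sim_loader: an (infinite) iterator of simulated batches
    loss = F.cross_entropy(model(x_real), y_real) \
         + lam * F.cross_entropy(model(x_sim), y_sim)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```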
Deep learning models for vision tasks are trained on large datasets under the assumption that there exists a universal representation that can be used to make predictions for all samples. Although high-complexity models have proven capable of learning such representations, a mixture of experts, each trained on a specific subset of the data, can infer the labels more efficiently. However, using a mixture of experts poses two new problems: (i) assigning the correct expert when a new, unseen sample is presented, and (ii) finding the optimal partitioning of the training data so that the experts rely as little as possible on common features. In Dynamic Routing (DR), a novel architecture was proposed in which each layer consists of a set of experts; however, without addressing these two challenges, we show that the model falls back to using the same subset of experts. In our method, Diversified Dynamic Routing (DivDR), the model is explicitly trained to address the challenges of finding a data-dependent partitioning and assigning the correct experts in an unsupervised manner. We conduct several experiments on Cityscapes and on object detection and instance segmentation on MS-COCO, showing improved performance over several baselines.
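The toy layer below sketches dynamic routing with a soft gate over experts plus a generic load-balancing penalty standing in for the unsupervised diversification objective; the module names and the particular penalty are assumptions, not DivDR's exact loss.

```python
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    """Toy dynamic-routing layer: a gate softly assigns each sample to experts,
    and a load-balancing term encourages diverse expert usage."""
    def __init__(self, dim: int = 64, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (batch, dim); weights: (batch, num_experts)
        weights = torch.softmax(self.gate(x), dim=-1)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
        y = (weights.unsqueeze(-1) * outputs).sum(dim=1)
        # Penalize deviation of the average expert load from uniform
        # (a generic stand-in for a diversification objective).
        avg_load = weights.mean(dim=0)
        diversity_loss = ((avg_load - 1.0 / len(self.experts)) ** 2).sum()
        return y, diversity_loss
```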
We propose a new sensitivity analysis model that combines copulas and normalizing flows for causal inference under unobserved confounding. We call the new model $\rho$-GNF ($\rho$-Graphical Normalizing Flow), where $\rho{\in}[-1,+1]$ is a bounded sensitivity parameter representing the backdoor non-causal association due to unobserved confounding, modelled using the well-studied and widely popular Gaussian copula. Specifically, $\rho$-GNF enables us to estimate and analyse the frontdoor causal effect, or average causal effect (ACE), as a function of $\rho$. We call this the $\rho_{curve}$. The $\rho_{curve}$ enables us to specify the confounding strength required to nullify the ACE, which we call the $\rho_{value}$. Furthermore, the $\rho_{curve}$ also enables us to provide bounds on the ACE for an interval of $\rho$ values. We illustrate the benefits of $\rho$-GNF with experiments showing that our empirical ACE bounds are narrower than other popular ACE bounds.
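For reference, the Gaussian copula that encodes the non-causal association has the standard textbook form
\[
C_\rho(u,v) = \Phi_\rho\big(\Phi^{-1}(u), \Phi^{-1}(v)\big), \qquad \rho \in [-1,+1],
\]
where $\Phi$ is the standard normal CDF and $\Phi_\rho$ is the bivariate normal CDF with correlation $\rho$. In the abstract's notation, the $\rho_{curve}$ is then the map $\rho \mapsto \mathrm{ACE}(\rho)$, and the $\rho_{value}$ is the confounding strength $\rho$ at which this curve crosses zero (i.e., the strength required to nullify the ACE). This restates the standard copula definition and the abstract's terminology only; how $\rho$ enters the graphical normalizing flow itself follows the paper.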
Crop diseases significantly affect the quantity and quality of agricultural production. With precision agriculture aiming to minimize, or even avoid, the use of pesticides, deep learning applied to weather and remote sensing data can play a key role in detecting crop diseases, allowing localized treatment of crops. However, combining heterogeneous data such as weather and imagery remains a hot topic and a challenging task. Recent developments in transformer architectures have shown the possibility of fusing data from different domains (e.g., text and images). The current trend is to customize a single transformer to build a multimodal fusion model. In contrast, we propose a new approach that achieves data fusion using three transformers. In this paper, we first address the problem of missing satellite images by interpolating them with a ConvLSTM model. We then propose a multimodal fusion architecture that jointly learns to process visual and weather information. The architecture is built from three main components, a Vision Transformer and two transformer encoders, which fuse the image and weather modalities. The results of the proposed method are promising, achieving an overall accuracy of 97%.
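A hedged sketch of the three-component layout described above: one encoder for image tokens (standing in for the Vision Transformer), one transformer encoder for the weather sequence, and a third transformer encoder that fuses the concatenated token streams. The dimensions, the toy patch embedding, and the classification head are assumptions, and the ConvLSTM interpolation step is omitted.

```python
import torch
import torch.nn as nn

class ThreeTransformerFusion(nn.Module):
    """Hedged sketch of a three-transformer fusion model for image + weather data."""
    def __init__(self, d_model=256, num_classes=10, weather_dim=8):
        super().__init__()
        self.image_proj = nn.Linear(16 * 16 * 3, d_model)    # toy patch embedding
        self.weather_proj = nn.Linear(weather_dim, d_model)
        enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.image_encoder = enc()      # stands in for the Vision Transformer
        self.weather_encoder = enc()
        self.fusion_encoder = enc()
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, patches, weather):
        # patches: (B, N_patches, 16*16*3); weather: (B, T, weather_dim)
        img_tokens = self.image_encoder(self.image_proj(patches))
        wx_tokens = self.weather_encoder(self.weather_proj(weather))
        fused = self.fusion_encoder(torch.cat([img_tokens, wx_tokens], dim=1))
        return self.head(fused.mean(dim=1))   # mean-pool fused tokens, then classify
```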
The performance of a classifier is often measured in terms of its average accuracy on test data. Despite being a standard measure, average accuracy fails to characterize how well a model fits the underlying conditional law of the label given the feature vector ($Y|X$), e.g., due to model misspecification, overfitting, or high dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit of a general binary classifier. Our framework makes no parametric assumptions on the conditional law $Y|X$ and treats it as a black-box oracle model that can only be accessed through queries. We formulate the goodness-of-fit assessment problem as a hypothesis test of the form \[ H_0: \mathbb{E}\big[D_f\big({\sf Bern}(\eta(X))\,\|\,{\sf Bern}(\hat{\eta}(X))\big)\big] \leq \tau\,, \] where $D_f$ denotes an $f$-divergence and $\eta(x)$ and $\hat{\eta}(x)$ denote, respectively, the true and estimated likelihoods of a positive label given the feature vector $x$. We propose a novel test, called GRASP, for testing $H_0$ that works in finite-sample settings no matter the features (distribution-free). We also propose model-X GRASP, designed for the model-X setting in which the joint distribution of the feature vector is known; model-X GRASP uses this distribution information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
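To make the quantity inside $H_0$ concrete, the snippet below computes a plug-in estimate of $\mathbb{E}\big[D_f\big({\sf Bern}(\eta(X))\,\|\,{\sf Bern}(\hat{\eta}(X))\big)\big]$ with the KL divergence as the $f$-divergence, on synthetic probabilities where $\eta$ is known. This only illustrates the target quantity; the GRASP test itself is constructed differently, since $\eta$ is never observed in practice.

```python
import numpy as np

def bernoulli_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """KL divergence D(Bern(p) || Bern(q)), one choice of f-divergence."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def goodness_of_fit_gap(eta_true: np.ndarray, eta_hat: np.ndarray) -> float:
    """Plug-in estimate of E[D_f(Bern(eta(X)) || Bern(eta_hat(X)))] over a sample."""
    return float(np.mean(bernoulli_kl(eta_true, eta_hat)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    eta = rng.uniform(0.05, 0.95, size=1000)                        # hypothetical true probabilities
    eta_hat = np.clip(eta + rng.normal(0, 0.1, 1000), 0.01, 0.99)   # a noisy estimate of them
    print(goodness_of_fit_gap(eta, eta_hat))
```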