We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say 'dog' in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multimodal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization.
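The computation sketched in the abstract is short enough to illustrate: gradients of the target score with respect to the final convolutional feature maps are global-average-pooled into per-channel weights, and a ReLU of the weighted sum of those maps gives the coarse localization map. Below is a minimal PyTorch sketch of that idea; the pretrained VGG-16, the chosen layer index, the random input tensor, and the class index are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class, conv_layer):
    """Coarse Grad-CAM heatmap for `target_class` from `conv_layer` activations."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["maps"] = output          # A^k: feature maps of the chosen conv layer

    def bwd_hook(_, grad_in, grad_out):
        gradients["maps"] = grad_out[0]       # dy_c / dA^k

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)

    scores = model(image)                     # image: (1, 3, H, W), preprocessed
    scores[0, target_class].backward()        # gradient of the target class score only
    h1.remove(); h2.remove()

    # Global-average-pool the gradients -> channel weights alpha_k
    weights = gradients["maps"].mean(dim=(2, 3), keepdim=True)
    # ReLU of the weighted combination -> coarse localization map
    cam = F.relu((weights * activations["maps"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()   # normalized to [0, 1]

# Hypothetical usage with a pretrained VGG-16 (torchvision >= 0.13 weights API);
# the class index and random "image" are placeholders for a real preprocessed input.
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
image = torch.randn(1, 3, 224, 224)
heatmap = grad_cam(model, image, target_class=243, conv_layer=model.features[28])  # last conv layer
```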
Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets, PASCAL-50S and ABSTRACT-50S, that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as part of the MS COCO evaluation server to enable systematic evaluation and benchmarking.
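The abstract does not spell out the metric itself; as a hedged illustration, the sketch below computes a simplified CIDEr-style consensus score, i.e. a TF-IDF-weighted n-gram cosine similarity averaged over n-gram orders and over references. The official CIDEr-D additionally clips n-gram counts, penalizes length differences, and scales scores by 10, all omitted here, and the toy corpus is invented for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cider_like(candidate, references, corpus_refs, max_n=4):
    """Simplified CIDEr-style score: mean over n of the average TF-IDF-weighted
    cosine similarity between the candidate and each reference sentence."""
    num_images = len(corpus_refs)
    score = 0.0
    for n in range(1, max_n + 1):
        # Document frequency of each n-gram over the whole reference corpus
        df = Counter()
        for refs in corpus_refs:
            seen = set()
            for r in refs:
                seen |= set(ngrams(r.split(), n))
            df.update(seen)

        def tfidf(sent):
            counts = ngrams(sent.split(), n)
            total = max(sum(counts.values()), 1)
            return {g: (c / total) * math.log(num_images / max(df[g], 1))
                    for g, c in counts.items()}

        cand_vec = tfidf(candidate)
        sims = []
        for ref in references:
            ref_vec = tfidf(ref)
            dot = sum(cand_vec.get(g, 0.0) * w for g, w in ref_vec.items())
            norm = (math.sqrt(sum(v * v for v in cand_vec.values())) *
                    math.sqrt(sum(v * v for v in ref_vec.values())) + 1e-12)
            sims.append(dot / norm)
        score += sum(sims) / max(len(sims), 1)
    return score / max_n

# Hypothetical usage: corpus_refs holds the reference sets for every image in the corpus.
corpus_refs = [["a dog runs on the grass", "a brown dog playing outside"],
               ["a man rides a bicycle", "a cyclist on a road"]]
print(cider_like("a dog is running on grass", corpus_refs[0], corpus_refs))
```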
Property inference attacks against machine learning (ML) models aim to infer properties of the training data that are unrelated to the primary task of the model, and have so far been formulated as binary decision problems, i.e., whether or not the training data have a certain property. However, in industrial and healthcare applications, the proportion of labels in the training data is quite often also considered sensitive information. In this paper we introduce a new type of property inference attack that, unlike the binary decision problems in the literature, aims at inferring the class label distribution of the training data from the parameters of ML classifier models. We propose a method based on \emph{shadow training} and a \emph{meta-classifier} trained on the parameters of the shadow classifiers augmented with the accuracy of the classifiers on auxiliary data. We evaluate the proposed approach for ML classifiers with fully connected neural network architectures. We find that the proposed \emph{meta-classifier} attack provides a maximum relative improvement of $52\%$ over the state of the art.
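A minimal sketch of the shadow-training recipe described above, using scikit-learn; the synthetic data, the small MLP shadow architecture, and the use of a random-forest regressor over the positive-class fraction as the "meta-classifier" are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def sample_training_set(pos_fraction, n=600):
    """Auxiliary data resampled to a controlled class-label distribution."""
    X, y = make_classification(n_samples=4 * n, n_features=20, random_state=0)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    k = int(pos_fraction * n)
    idx = np.concatenate([rng.choice(pos, k, replace=True),
                          rng.choice(neg, n - k, replace=True)])
    return X[idx], y[idx]

def shadow_features(pos_fraction, X_aux, y_aux):
    """Train one shadow classifier; return its flattened parameters plus aux accuracy."""
    Xs, ys = sample_training_set(pos_fraction)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0).fit(Xs, ys)
    params = np.concatenate([w.ravel() for w in clf.coefs_] +
                            [b.ravel() for b in clf.intercepts_])
    return np.append(params, clf.score(X_aux, y_aux))

# Fixed auxiliary set used to augment the parameter features with an accuracy feature
X_aux, y_aux = make_classification(n_samples=500, n_features=20, random_state=1)

# Shadow phase: many shadow models trained under known label distributions
fractions = rng.uniform(0.1, 0.9, size=60)
meta_X = np.stack([shadow_features(f, X_aux, y_aux) for f in fractions])

# Meta-model (here a regressor predicting the positive-class proportion)
meta = RandomForestRegressor(n_estimators=200, random_state=0).fit(meta_X, fractions)

# Attack phase: estimate the label distribution behind an unseen "victim" model
victim_features = shadow_features(0.3, X_aux, y_aux)
print("estimated positive-class fraction:", meta.predict([victim_features])[0])
```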
In this work, we present our proposed method for segmenting the pulmonary arteries from CT scans using Swin UNETR and U-Net-based deep neural network architectures. Six models, three based on Swin UNETR and three based on 3D U-Net, were combined with a weighted average to produce the final segmentation mask. Our team achieved a multi-level Dice score of 84.36% with this approach. The code for our work is available at the following link: https://github.com/akansh12/parse2022. This work was done as part of the MICCAI PARSE 2022 challenge.
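A small sketch of the weighted-average ensembling and Dice evaluation described above; the class count, the uniform weights, and the random probability volumes are placeholders, not the challenge submission code (which lives at the linked repository).

```python
import numpy as np

def ensemble_segmentation(prob_maps, weights):
    """Weighted average of per-model softmax volumes -> final label mask.
    prob_maps: list of arrays shaped (num_classes, D, H, W); weights: one float per model."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    fused = sum(w * p for w, p in zip(weights, prob_maps))
    return fused.argmax(axis=0)                      # voxel-wise predicted class

def dice_per_class(pred, target, num_classes):
    """Dice coefficient for each foreground class."""
    scores = []
    for c in range(1, num_classes):                  # skip background (class 0)
        p, t = (pred == c), (target == c)
        denom = p.sum() + t.sum()
        scores.append(2.0 * np.logical_and(p, t).sum() / denom if denom else 1.0)
    return scores

# Hypothetical usage with six models and two classes (background, pulmonary artery):
prob_maps = [np.random.dirichlet(np.ones(2), size=(8, 8, 8)).transpose(3, 0, 1, 2)
             for _ in range(6)]
mask = ensemble_segmentation(prob_maps, weights=[1, 1, 1, 1, 1, 1])
print(dice_per_class(mask, target=np.zeros((8, 8, 8), dtype=int), num_classes=2))
```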
Physics-based numerical models represent the state of the art in Earth system modeling and are among our best tools for generating insight and predictions. Despite the rapid growth of computing power, the perceived need for ever higher model resolution overwhelms the latest generation of computers, limiting modelers' ability to generate the simulations needed to understand parameter sensitivities and to characterize variability and uncertainty. Surrogate models are therefore often developed to capture the essential attributes of full-blown numerical models. The recent success of machine learning methods, and especially deep learning, across many disciplines offers the possibility that complex nonlinear connectionist representations may be able to capture the underlying complex structures and nonlinear processes of the Earth system. A difficult test for deep-learning-based emulation, meaning the approximation of a numerical model, is whether it can match traditional forms of surrogate modeling in computational efficiency while reproducing model results in a reliable way. A deep learning emulator that passes this test can be expected to perform better than simpler models at capturing complex processes and spatio-temporal dependencies. Here we examine a case study based on satellite remote sensing in which deep learning methods can reliably represent the simulations of a surrogate model with comparable computational efficiency. Our results are encouraging: the deep learning emulation reproduces the results with acceptable accuracy and often runs considerably faster. We discuss the broader implications of our findings in light of the rapid pace of improvement in high-performance implementations of deep learning and the appetite for higher-resolution simulations in the geosciences.
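As a hedged illustration of the emulation idea, the sketch below fits a small neural network to input/output pairs from a stand-in "simulator" and compares fidelity and inference time; the toy function, network size, and metrics are invented for the example, and the timing comparison only becomes meaningful when the reference model is genuinely expensive.

```python
import time
import numpy as np
from sklearn.neural_network import MLPRegressor

def toy_simulator(X):
    """Stand-in for an expensive physics-based or surrogate model (purely illustrative)."""
    return np.sin(3 * X[:, 0]) * np.exp(-X[:, 1] ** 2) + 0.5 * X[:, 2]

rng = np.random.default_rng(0)
X_train, X_test = rng.uniform(-1, 1, (5000, 3)), rng.uniform(-1, 1, (1000, 3))
y_train, y_test = toy_simulator(X_train), toy_simulator(X_test)

# Train the emulator on simulator input/output pairs
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emulator.fit(X_train, y_train)

# Compare fidelity and inference speed against the reference model
t0 = time.perf_counter(); y_ref = toy_simulator(X_test); t_ref = time.perf_counter() - t0
t0 = time.perf_counter(); y_emu = emulator.predict(X_test); t_emu = time.perf_counter() - t0
rmse = np.sqrt(np.mean((y_emu - y_ref) ** 2))
print(f"RMSE: {rmse:.4f}, simulator time: {t_ref:.4f}s, emulator time: {t_emu:.4f}s")
```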
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
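A compact PyTorch sketch of the sequential refinement and intermediate supervision described above; the backbone, layer widths, stage count, and number of parts are illustrative assumptions, not the paper's architecture. Each stage consumes shared image features together with the previous stage's belief maps and is supervised directly, so gradients enter the network at every stage.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One refinement stage: image features + previous belief maps -> refined belief maps."""
    def __init__(self, feat_ch, num_parts):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + num_parts, 128, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_parts, 1),
        )

    def forward(self, feats, beliefs):
        return self.net(torch.cat([feats, beliefs], dim=1))

class PoseMachine(nn.Module):
    def __init__(self, num_parts=14, num_stages=3, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared image feature extractor
            nn.Conv2d(3, feat_ch, 9, stride=8, padding=4), nn.ReLU(inplace=True))
        self.first = nn.Conv2d(feat_ch, num_parts, 1)  # stage-1 belief maps from features alone
        self.stages = nn.ModuleList(Stage(feat_ch, num_parts) for _ in range(num_stages - 1))

    def forward(self, image):
        feats = self.backbone(image)
        beliefs = [self.first(feats)]
        for stage in self.stages:                      # each stage refines the previous beliefs
            beliefs.append(stage(feats, beliefs[-1]))
        return beliefs                                 # one belief-map set per stage

# Intermediate supervision: the loss sums over all stages, not just the last one.
model = PoseMachine()
image = torch.randn(2, 3, 256, 256)
target = torch.randn(2, 14, 32, 32)                    # stand-in for ground-truth belief maps
loss = sum(nn.functional.mse_loss(b, target) for b in model(image))
loss.backward()
```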