Generative models are becoming ever more powerful and can synthesize highly realistic images. We propose an algorithm for taming these models, that is, changing the probability that the model will produce a specific image or image category. We consider generative models powered by normalizing flows, which allow us to reason about the exact generation likelihood of a given image. Our method is general purpose, and we exemplify it using models that generate human faces, a subdomain with many interesting privacy and bias considerations. Our method can be used in the context of privacy, e.g., removing a specific person from a model's output, and also in the context of de-biasing, by forcing a model to output specific image categories according to a given target distribution. Our method uses a fast fine-tuning process without retraining the model from scratch, achieving the goal in less than 1% of the time taken to initially train the generative model. We evaluate qualitatively and quantitatively to examine the success of the taming process and the output quality.
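A minimal sketch of the likelihood-based "taming" idea, not the paper's implementation: fine-tune a normalizing flow so that the exact log-likelihood of a "forget" example drops while a reference set keeps its likelihood. The toy affine-coupling flow and the names `ToyFlow`, `forget_x`, `keep_x`, and `lam` are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class ToyFlow(nn.Module):
    """A single affine-coupling layer with an exact, tractable log-likelihood."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def log_prob(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)                     # keep scales bounded
        z2 = x2 * torch.exp(log_s) + t                # forward transform of x2
        z = torch.cat([x1, z2], dim=1)
        base = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
        return base + log_s.sum(dim=1)                # change-of-variables term

dim = 64
flow = ToyFlow(dim)
opt = torch.optim.Adam(flow.parameters(), lr=1e-4)
forget_x = torch.randn(1, dim)     # "image" whose generation probability we suppress
keep_x = torch.randn(256, dim)     # reference samples whose likelihood should be preserved
lam = 1.0                          # trade-off between forgetting and preservation

for step in range(100):
    opt.zero_grad()
    # Push log p(forget_x) down while keeping log p(keep_x) high.
    loss = flow.log_prob(forget_x).mean() - lam * flow.log_prob(keep_x).mean()
    loss.backward()
    opt.step()
```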
Anomaly detection is a well-established research area that seeks to identify samples lying outside a predetermined distribution. An anomaly detection pipeline consists of two main stages: (1) feature extraction and (2) normality score assignment. Recent papers have used pre-trained networks for feature extraction, achieving state-of-the-art results; however, using a pre-trained network does not fully exploit the normal samples available at train time. This paper proposes to take advantage of this information through teacher-student training. In our setting, a pre-trained teacher network is used to train a student network on the normal training samples. Since the student network is trained only on normal samples, it is expected to deviate from the teacher network on anomalies, and this discrepancy can serve as a representation complementary to the pre-trained feature vector. Our method, Transformaly, exploits a pre-trained Vision Transformer (ViT) to extract both feature vectors: the pre-trained (agnostic) features and the teacher-student (fine-tuned) features. We report state-of-the-art AUROC results in the common unimodal setting, where one class is considered normal and the rest are considered anomalous, and in the multimodal setting, where all classes but one are considered normal and only one class is considered anomalous. Code is available at https://github.com/matancohen1/transformaly.
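A minimal sketch of the teacher-student ("fine-tuned") branch described above. Small MLPs stand in for the pre-trained ViT blocks used in the paper, and `normal_feats` / `test_feats` are assumed placeholders for features of normal training images and of test images.

```python
import torch
import torch.nn as nn

feat_dim = 384
teacher = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 256))
student = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 256))
for p in teacher.parameters():                 # the teacher stays frozen
    p.requires_grad_(False)

normal_feats = torch.randn(2048, feat_dim)     # placeholder "normal" training features
test_feats = torch.randn(16, feat_dim)         # placeholder test features

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for epoch in range(20):
    opt.zero_grad()
    # Train the student to imitate the teacher on normal samples only.
    loss = ((student(normal_feats) - teacher(normal_feats)) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    # At test time the teacher-student discrepancy serves as (part of) the
    # anomaly score: it is expected to grow on samples unlike the normal data.
    score = ((student(test_feats) - teacher(test_feats)) ** 2).mean(dim=1)
print(score)
```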
Deep learning-based face recognition (FR) models have demonstrated state-of-the-art performance in the past few years, even when wearing protective medical face masks, which became commonplace during the COVID-19 pandemic. Given the outstanding performance of these models, the machine learning research community has shown growing interest in challenging their robustness. Initially, researchers presented adversarial attacks in the digital domain, and later the attacks were transferred to the physical domain. However, in many cases attacks in the physical domain are conspicuous, e.g., they require placing a sticker on the face, and may therefore raise suspicion in real-world environments (e.g., an airport). In this paper, we propose a physical adversarial mask against state-of-the-art FR models, applied to a face mask in the form of a carefully crafted pattern. In our experiments, we examine the transferability of our adversarial mask across a wide range of FR model architectures and datasets. In addition, we validate the effectiveness of our adversarial mask in real-world experiments by printing the adversarial pattern on a fabric medical face mask, causing the FR system to identify only 3.34% of the participants wearing the mask (compared to a minimum of 83.34% for the other evaluated masks).
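A hedged sketch of optimizing such an adversarial pattern in the digital domain: push the FR embedding of the masked face away from the person's reference embedding. The small CNN embedder, the rectangular mask region, and the optimization settings are stand-ins, not the paper's FR models or mask geometry.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embedder = nn.Sequential(                      # stand-in FR embedding network (frozen)
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128))
for p in embedder.parameters():
    p.requires_grad_(False)

faces = torch.rand(8, 3, 112, 112)             # placeholder aligned face crops
mask_region = torch.zeros(1, 1, 112, 112)      # lower half of the face wears the mask
mask_region[..., 56:, :] = 1.0
pattern = torch.rand(1, 3, 112, 112, requires_grad=True)   # the printable pattern

with torch.no_grad():
    ref = F.normalize(embedder(faces), dim=1)  # reference identity embeddings

opt = torch.optim.Adam([pattern], lr=0.01)
for step in range(200):
    opt.zero_grad()
    masked = faces * (1 - mask_region) + pattern.clamp(0, 1) * mask_region
    emb = F.normalize(embedder(masked), dim=1)
    # Maximize embedding distance from the true identity (minimize cosine similarity).
    loss = (emb * ref).sum(dim=1).mean()
    loss.backward()
    opt.step()
```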
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment, such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion, and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
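A hedged sketch of the object-masking idea: turn detector outputs into binary inpainting masks so that, during training, masked regions tend to cover whole objects that a text prompt can refer to. `detect_objects()` is a hypothetical placeholder for whichever detector is used; it returns (x0, y0, x1, y1) boxes.

```python
import torch

def detect_objects(image):
    """Placeholder detector: return a list of bounding boxes for the image."""
    h, w = image.shape[-2:]
    return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]   # dummy central object

def object_inpainting_mask(image):
    """Build a binary mask (1 = region to inpaint) covering a detected object."""
    h, w = image.shape[-2:]
    mask = torch.zeros(1, h, w)
    boxes = detect_objects(image)
    if boxes:                      # fall back to an empty mask if nothing is detected
        x0, y0, x1, y1 = boxes[torch.randint(len(boxes), (1,)).item()]
        mask[:, y0:y1, x0:x1] = 1.0
    return mask

image = torch.rand(3, 256, 256)
mask = object_inpainting_mask(image)
masked_image = image * (1 - mask)  # the inpainting model is conditioned on this
```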
Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized-cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw features and the clustering goal allows us to perform part semantic segmentation implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
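A minimal sketch of training a tiny GNN with a soft Correlation-Clustering loss: with signed affinities W (positive = "same cluster", negative = "different"), the loss rewards co-assignment on positive edges and penalizes it on negative ones, so the objective itself does not fix the number of clusters. The one-layer graph convolution, the random features, and the affinity construction are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d, k = 200, 64, 10                      # nodes (patches), feature dim, max clusters
feats = torch.randn(n, d)                  # placeholder deep features per node
sim = F.normalize(feats, dim=1) @ F.normalize(feats, dim=1).T
W = sim - sim.mean()                       # signed affinities (mean-shifted similarities)

lin1, lin2 = nn.Linear(d, 64), nn.Linear(64, k)
adj = (sim > 0.5).float()                  # neighborhood graph from strong similarities
adj_norm = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)

opt = torch.optim.Adam(list(lin1.parameters()) + list(lin2.parameters()), lr=1e-2)
for step in range(200):
    opt.zero_grad()
    h = F.relu(lin1(adj_norm @ feats))         # simple one-layer graph convolution on raw features
    S = F.softmax(lin2(adj_norm @ h), dim=1)   # soft cluster assignments
    cc_loss = -(W * (S @ S.T)).sum()           # soft correlation-clustering objective
    cc_loss.backward()
    opt.step()

labels = S.argmax(dim=1)                   # hard segmentation labels per node
```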
Although prediction models for delirium, a commonly occurring condition during general hospitalization or post-surgery, have not gained huge popularity, their algorithmic bias evaluation is crucial due to the existing association between social determinants of health and delirium risk. In this context, using MIMIC-III and another academic hospital dataset, we present some initial experimental evidence showing how sociodemographic features such as sex and race can impact the model performance across subgroups. With this work, our intent is to initiate a discussion about the intersectionality effects of old age, race and socioeconomic factors on the early-stage detection and prevention of delirium using ML.
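A small, hedged sketch of the kind of subgroup audit described above: compare model AUROC across sociodemographic groups. The synthetic data and the `sex`/`race` columns are illustrative placeholders, not the MIMIC-III schema.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "delirium": rng.integers(0, 2, 1000),        # true outcome
    "risk_score": rng.random(1000),              # model-predicted risk
    "sex": rng.choice(["F", "M"], 1000),
    "race": rng.choice(["white", "black", "other"], 1000),
})

# Report performance separately for each sociodemographic subgroup.
for col in ["sex", "race"]:
    for group, sub in df.groupby(col):
        auc = roc_auc_score(sub["delirium"], sub["risk_score"])
        print(f"{col}={group}: AUROC={auc:.3f} (n={len(sub)})")
```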
This paper presents the final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV competition introduces an important aspect that is generally not studied by Optical Character Recognition (OCR) models, namely the recognition of scene text instances unseen at training time. The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances, thereby covering a wide range of data distributions. A new, independent validation and test set is formed containing scene text instances that are out of vocabulary at training time. The competition was conducted on two tasks: end-to-end recognition and cropped-word text recognition. A thorough analysis of the results of the baselines and the different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an important area to explore in order to develop scene text models that achieve more robust and generalized predictions.
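A hedged sketch of how an out-of-vocabulary evaluation split can be formed: keep only test words whose transcription never appears in the training vocabulary. `train_words` and `test_samples` are hypothetical placeholders for the actual annotations of the collected datasets.

```python
def build_oov_split(train_words, test_samples):
    """Split test (image_id, word) pairs into in-vocabulary and OOV subsets."""
    vocab = {w.lower() for w in train_words}
    in_voc, oov = [], []
    for image_id, word in test_samples:
        (in_voc if word.lower() in vocab else oov).append((image_id, word))
    return in_voc, oov

train_words = ["EXIT", "open", "Hotel"]
test_samples = [("img1", "exit"), ("img2", "Bakery"), ("img3", "hotel")]
in_voc, oov = build_oov_split(train_words, test_samples)
print(oov)   # [('img2', 'Bakery')]
```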
In the past few years, numerous Deep Neural Network (DNN) based methods were proposed to tackle the challenging ill-posed inverse problem of reconstruction from undersampled "k-space" (Fourier domain) data. However, instability against variations in the acquisition process and in the anatomical distribution indicates that DNN architectures generalize poorly with respect to the underlying physical model, compared to their classical counterparts. This poor generalization effectively precludes the use of DNNs for undersampled MRI reconstruction in the clinical setting. We improve the generalization capability of DNN methods for undersampled MRI reconstruction by introducing a physically-primed DNN architecture and training approach. Our architecture encodes the undersampling mask, in addition to the observed data, within the model architecture, and employs an appropriate training approach that uses data generated with various undersampling masks to encourage the model to generalize the undersampled MRI reconstruction problem. We demonstrate the added value of our approach through extensive experiments on the publicly available fastMRI dataset. Our physically-primed approach achieves enhanced generalization capabilities, leading to significantly improved robustness against variations in the acquisition process and in the anatomical distribution, especially in pathological regions, compared to both vanilla DNN methods and DNNs trained with undersampling mask augmentation. Trained models and code to reproduce our experiments will be made available for research purposes upon acceptance.
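A minimal sketch of the two ideas above, under assumptions: (i) the network receives the undersampling mask as an extra input channel, and (ii) every training step draws a fresh random mask so the model cannot overfit a single acquisition pattern. Single-coil, magnitude-only toy data and a toy CNN stand in for the actual fastMRI setup.

```python
import torch
import torch.nn as nn

def undersample(image, mask):
    """Apply a k-space (Fourier-domain) undersampling mask and return the
    zero-filled reconstruction magnitude."""
    kspace = torch.fft.fft2(image)
    return torch.fft.ifft2(kspace * mask).abs()

net = nn.Sequential(                       # toy reconstruction CNN
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(100):
    gt = torch.rand(4, 1, 64, 64)          # placeholder ground-truth images
    # Fresh random Cartesian-like mask per step (random columns of k-space kept).
    cols = (torch.rand(1, 1, 1, 64) < 0.3).float()
    mask = cols.expand(1, 1, 64, 64)
    zero_filled = undersample(gt, mask)
    inp = torch.cat([zero_filled, mask.expand_as(zero_filled)], dim=1)  # encode the mask
    opt.zero_grad()
    loss = ((net(inp) - gt) ** 2).mean()
    loss.backward()
    opt.step()
```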
Network classification aims to categorize networks (or graphs) into different classes based on their structure. We study the connection between the classification of a network and of its constituent nodes, and whether nodes from networks in different groups are distinguishable based on structural node features such as centrality and clustering coefficient. We demonstrate, using various network datasets and random network models, that a classifier can be trained to accurately predict the network category of a given node (without seeing the whole network), implying that complex networks display distinct structural patterns even at the node level. Finally, we discuss two applications of node-level network classification: (i) whole-network classification from small samples of nodes, and (ii) network bootstrapping.
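A small, hedged sketch of node-level network classification: compute structural features (degree, clustering coefficient, centrality) for each node and train a classifier to predict which model family the node's network came from. Erdos-Renyi vs. Barabasi-Albert graphs stand in for the network categories studied above.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def node_features(G):
    """Structural features for every node of a graph."""
    deg = dict(G.degree())
    clust = nx.clustering(G)
    close = nx.closeness_centrality(G)
    return np.array([[deg[v], clust[v], close[v]] for v in G.nodes()])

X, y = [], []
for label, make_graph in enumerate([lambda: nx.erdos_renyi_graph(200, 0.05),
                                    lambda: nx.barabasi_albert_graph(200, 5)]):
    for _ in range(10):                     # 10 networks per class
        feats = node_features(make_graph())
        X.append(feats)
        y.append(np.full(len(feats), label))

X, y = np.vstack(X), np.concatenate(y)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print("node-level accuracy:", clf.score(Xte, yte))   # category predicted per node
```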
Videos obtained with a rolling-shutter (RS) camera result in spatially-distorted frames. These distortions become significant under fast camera/scene motion. Undoing the RS effect is sometimes addressed as a spatial problem, where objects need to be rectified/displaced in order to generate their correct global-shutter (GS) frame. However, the cause of the RS effect is inherently temporal, not spatial. In this paper, we propose a space-time solution to the RS problem. We observe that, despite the severe differences between their xy frames, an RS video and its corresponding GS video tend to share the exact same xt slices, up to a known sub-frame temporal shift. Moreover, despite the strong temporal aliasing within each video, they share the same distribution of small 2D xt-patches. This allows constraining the GS output video using video-specific constraints imposed by the RS input video. Our algorithm is composed of three main components: (i) dense temporal upsampling between consecutive RS frames using an off-the-shelf method (trained on regular video sequences), from which we extract GS "proposals"; (ii) learning to correctly merge such GS "proposals" using a dedicated MergeNet; and (iii) a video-specific zero-shot optimization which imposes the similarity of xt-patches between the GS output video and the RS input video. Our method obtains state-of-the-art results on benchmark datasets, both numerically and visually, despite being trained on a small synthetic RS/GS dataset. Moreover, it generalizes well to new complex RS videos with motion types outside the training distribution (e.g., complex non-rigid motions), videos which competing methods trained on more data cannot handle well. We attribute these generalization capabilities to the combination of external and internal constraints.
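A hedged sketch of the video-specific xt-patch constraint in step (iii): extract small 2D patches from xt slices (a fixed image row tracked over time) of both videos and penalize GS patches that have no close match among the RS patches. The patch size, tensors, and the nearest-neighbour loss below are illustrative, not the paper's exact optimization.

```python
import torch

def xt_patches(video, patch=7):
    """video: (T, H, W). Return all patch x patch tiles from every xt slice."""
    T, H, W = video.shape
    xt = video.permute(1, 0, 2)                           # (H, T, W): one xt slice per row y
    tiles = xt.unfold(1, patch, 1).unfold(2, patch, 1)    # (H, T', W', p, p)
    return tiles.reshape(-1, patch * patch)

def xt_patch_loss(gs_video, rs_video):
    """Mean distance from each GS xt-patch to its nearest RS xt-patch."""
    gs = xt_patches(gs_video)
    rs = xt_patches(rs_video)
    d = torch.cdist(gs, rs)                               # pairwise patch distances
    return d.min(dim=1).values.mean()

rs_video = torch.rand(8, 32, 32)                          # input rolling-shutter video
gs_video = torch.rand(8, 32, 32, requires_grad=True)      # current GS estimate being optimized
loss = xt_patch_loss(gs_video, rs_video)
loss.backward()
```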