Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
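To make the threat model concrete, here is a minimal sketch of the NAIVEATTACK idea: stamp a fixed trigger patch onto a fraction of the raw images and relabel them before distillation starts, so the poison is carried into the synthetic set. The function name, shapes, and poison ratio are illustrative assumptions, not the authors' code.

```python
# Sketch of trigger injection prior to distillation (assumed interface).
import torch

def naive_attack(images: torch.Tensor, labels: torch.Tensor,
                 trigger: torch.Tensor, target_class: int,
                 poison_ratio: float = 0.1) -> tuple[torch.Tensor, torch.Tensor]:
    """Paste `trigger` (C x h x w) into the bottom-right corner of a random
    subset of `images` (N x C x H x W) and relabel them to `target_class`."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_ratio * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    _, h, w = trigger.shape
    images[idx, :, -h:, -w:] = trigger   # stamp the patch
    labels[idx] = target_class           # flip to the attacker's label
    return images, labels
```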
Recently, webly supervised learning (WSL) has been studied to leverage numerous and accessible data from the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain. However, only by tackling this performance gap can we fully exploit the practical value of web datasets. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which needs only a few labeled examples from reality and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the ``realistic" prototype. Then, the intra-class distance between web instances and ``realistic" prototypes is narrowed by contrastive learning. Finally, we measure image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and involved in removing distant out-of-distribution samples. In experiments, FoPro is trained on web datasets under the guidance of a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at https://github.com/yuleiqin/fopro.
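As a rough illustration of the contrastive step described above, the sketch below pulls web-image embeddings toward the ``realistic" prototype of their class; the names and the temperature value are assumptions for exposition, not the FoPro release.

```python
# Prototypical contrastive loss sketch: own-class prototype is the positive,
# all other prototypes are negatives (assumed simplification).
import torch
import torch.nn.functional as F

def proto_contrastive_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                           prototypes: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """embeddings: N x D web-image features; prototypes: K x D class centers
    initialized from few-shot real-world data; labels: N class indices."""
    z = F.normalize(embeddings, dim=1)
    p = F.normalize(prototypes, dim=1)
    logits = z @ p.t() / tau   # N x K cosine similarities, sharpened by tau
    return F.cross_entropy(logits, labels)
```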
NeRF-based generative models have shown impressive capacity in generating high-quality images with consistent 3D geometry. Despite successfully synthesizing fake identity images randomly sampled from latent space, adopting these models for generating face images of real subjects is still a challenging task due to the so-called inversion issue. In this paper, we propose a universal method to surgically fine-tune these NeRF-GAN models in order to achieve high-fidelity animation of real subjects from only a single image. Given the optimized latent code for an out-of-domain real image, we employ 2D loss functions on the rendered image to reduce the identity gap. Furthermore, our method leverages explicit and implicit 3D regularizations using the in-domain neighborhood samples around the optimized latent code to remove geometrical and visual artifacts. Our experiments confirm the effectiveness of our method in realistic, high-fidelity, and 3D-consistent animation of real faces on multiple NeRF-GAN models across different datasets.
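The following sketch captures the described recipe under stated assumptions: a 2D reconstruction loss on the inverted latent's render, plus a regularizer that pins renders of in-domain neighbor latents to a frozen copy of the generator. The `render` callable and the plain L1 losses are hypothetical placeholders (a real pipeline would typically add a face-identity loss).

```python
# Hedged sketch of single-image NeRF-GAN fine-tuning (assumed interfaces).
import copy
import torch
import torch.nn.functional as F

def finetune_nerf_gan(G, render, w_opt, target_img,
                      steps=200, sigma=0.1, lam=1.0, lr=1e-4):
    """`render(G, w)` is a hypothetical callable returning an image tensor
    for latent `w`; `w_opt` is the pre-optimized latent of the real image."""
    G_frozen = copy.deepcopy(G).eval()   # anchor for in-domain behavior
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        # 2D reconstruction loss on the inverted code narrows the identity gap
        loss = F.l1_loss(render(G, w_opt), target_img)
        # pin renders of in-domain neighbors to the frozen generator to
        # regularize geometry/appearance around the optimized latent code
        w_nbr = w_opt + sigma * torch.randn_like(w_opt)
        with torch.no_grad():
            anchor = render(G_frozen, w_nbr)
        loss = loss + lam * F.l1_loss(render(G, w_nbr), anchor)
        opt.zero_grad(); loss.backward(); opt.step()
    return G
```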
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
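Because the checkpoints are openly released, they can be loaded through the Hugging Face transformers library; a smaller sibling checkpoint is used below so the snippet runs on modest hardware.

```python
# Minimal usage sketch: load an open BLOOM checkpoint and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

prompt = "Translate to French: The weather is nice today."
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```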
The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage. Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions. Some recent works separately predict simple primitives (i.e., states and objects) to reduce cardinality. However, they consider simple primitives as independent probability distributions, ignoring the heavy dependence between states, objects, and compositions. In this paper, we model the dependence of compositions via feasibility and contextuality. Feasibility-dependence refers to the unequal feasibility relations between simple primitives, e.g., \textit{hairy} is more feasible with \textit{cat} than with \textit{building} in the real world. Contextuality-dependence represents the contextual variance in images, e.g., \textit{cat} shows diverse appearances under the state of \textit{dry} and \textit{wet}. We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively. SA captures semantics in compositions to alleviate impossible predictions, driven by the visual similarity between simple primitives. KD disentangles images into unbiased feature representations, easing contextual bias in predictions. Moreover, we complement the current compositional probability model with feasibility and contextuality in a compatible format. Finally, we conduct comprehensive experiments to analyze and validate the superior or competitive performance of our model, Semantic Attention and knowledge Disentanglement guided Simple Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.
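As a simplified view of the primitive-decomposition scoring the paper builds on, the sketch below folds a feasibility term into the independent state-object probability model; SAD-SP's actual model is richer, so treat this as an assumption-laden toy version.

```python
# Compositional scoring sketch: independent primitive probabilities,
# down-weighted by a feasibility matrix (assumed simplification).
import torch

def composition_scores(state_logits: torch.Tensor, object_logits: torch.Tensor,
                       feasibility: torch.Tensor) -> torch.Tensor:
    """state_logits: N x S, object_logits: N x O, feasibility: S x O in [0, 1]
    (e.g., derived from visual similarity between primitives). Returns N x S x O
    scores where implausible pairs such as (hairy, building) are suppressed."""
    p_s = state_logits.softmax(dim=1)            # N x S
    p_o = object_logits.softmax(dim=1)           # N x O
    joint = p_s.unsqueeze(2) * p_o.unsqueeze(1)  # independent pair probability
    return joint * feasibility.unsqueeze(0)      # down-weight infeasible pairs
```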
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
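A minimal taste of the framework: MONAI transforms operate directly on volumetric tensors, and its network zoo provides segmentation-ready architectures. The hyperparameters below are illustrative, not a recommended configuration.

```python
# Normalize a CT-like volume and run it through MONAI's 3D UNet.
import torch
from monai.networks.nets import UNet
from monai.transforms import ScaleIntensity

# scale intensities to [0, 1]; MONAI transforms accept arrays or tensors
volume = ScaleIntensity()(torch.rand(1, 64, 64, 64))   # C x D x H x W

net = UNet(spatial_dims=3, in_channels=1, out_channels=2,
           channels=(16, 32, 64, 128), strides=(2, 2, 2), num_res_units=2)
logits = net(volume.unsqueeze(0))   # add batch dim -> 1 x 2 x 64 x 64 x 64
```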
Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for stereo matching with high accuracy and efficiency. In this paper, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be seamlessly embedded into most stereo matching networks, and the resulting networks can use a more lightweight aggregation network while achieving higher accuracy. We further design a fast version of ACV to enable real-time performance, named Fast-ACV, which generates high-likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues to significantly reduce computational and memory costs while maintaining a satisfactory accuracy. The core idea of Fast-ACV is Volume Attention Propagation (VAP), which automatically selects accurate correlation values from an upsampled correlation volume and propagates these accurate values to surrounding pixels with ambiguous correlation clues. Furthermore, we design a highly accurate network, ACVNet, and a real-time network, Fast-ACVNet, based on our ACV and Fast-ACV respectively, which achieve state-of-the-art performance on several benchmarks (i.e., among all published methods, our ACVNet ranks 2nd on KITTI 2015 and Scene Flow and 3rd on KITTI 2012 and ETH3D; our Fast-ACVNet outperforms almost all state-of-the-art real-time methods on Scene Flow, KITTI 2012 and 2015, and meanwhile has better generalization ability).
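A stripped-down sketch of the ACV idea follows: attention weights derived from a correlation volume reweight the richer concatenation volume along the disparity axis. The shapes and the single 3D convolution are illustrative assumptions, far simpler than ACVNet itself.

```python
# Toy attention concatenation volume: correlation cues gate the concat volume.
import torch
import torch.nn as nn

def build_volumes(fl, fr, max_disp):
    """fl, fr: N x C x H x W left/right features. Returns a correlation volume
    (N x 1 x D x H x W) and a concatenation volume (N x 2C x D x H x W)."""
    N, C, H, W = fl.shape
    corr = fl.new_zeros(N, 1, max_disp, H, W)
    cat = fl.new_zeros(N, 2 * C, max_disp, H, W)
    for d in range(max_disp):
        # shift right features by disparity d (wrap-around for brevity;
        # real implementations zero-pad the disoccluded border)
        fr_shift = torch.roll(fr, d, dims=3) if d else fr
        corr[:, 0, d] = (fl * fr_shift).mean(dim=1)       # matching cue
        cat[:, :, d] = torch.cat([fl, fr_shift], dim=1)
    return corr, cat

corr, cat = build_volumes(torch.rand(1, 8, 32, 64), torch.rand(1, 8, 32, 64), 12)
att = torch.sigmoid(nn.Conv3d(1, 1, 3, padding=1)(corr))  # attention from correlation
acv = cat * att                                           # suppress redundant entries
```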
Detailed pulmonary airway segmentation is a clinically important task for endobronchial intervention and the treatment of peripherally located lung cancer lesions. Convolutional Neural Networks (CNNs) are promising tools for medical image analysis but perform poorly on cases with imbalanced feature distributions, which is exactly the situation for airway data: the trachea and principal bronchi dominate most of the voxels, whereas the lobar and distal segmental bronchi occupy only a small proportion. In this paper, we propose a Differentiable Topology-Preserved Distance Transform (DTPDT) framework to improve the performance of airway segmentation. A Topology-Preserved Surrogate (TPS) learning strategy is first proposed to balance the training progress within the class distribution. Furthermore, a Convolutional Distance Transform (CDT) is designed to identify the breakage phenomenon with improved sensitivity, minimizing the variation of the distance map between the prediction and the ground truth. The proposed method is validated on publicly available reference airway segmentation datasets.
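One way to realize a distance transform with standard convolutional machinery, sketched here as an assumed stand-in rather than the paper's exact CDT: approximate each voxel's Chebyshev distance to the foreground by stacking max-pooling layers of growing radius, so the map stays differentiable and the prediction's distance map can be compared against the ground truth's.

```python
# Differentiable distance-transform approximation via stacked max pooling.
import torch
import torch.nn.functional as F

def conv_distance_transform(mask: torch.Tensor, max_radius: int = 8) -> torch.Tensor:
    """mask: N x 1 x D x H x W soft foreground probabilities in [0, 1].
    A voxel's output counts the radii at which its neighborhood still contains
    no foreground, i.e. ~its Chebyshev distance capped at max_radius."""
    dist = torch.zeros_like(mask)
    for r in range(1, max_radius + 1):
        k = 2 * r + 1
        occupancy = F.max_pool3d(mask, kernel_size=k, stride=1, padding=r)
        dist = dist + (1.0 - occupancy)   # adds ~1 while the ball is still empty
    return dist
```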
Clustering is an important exploratory data analysis technique to group objects based on their similarity. The widely used $k$-means clustering method relies on some notion of distance to partition data into a smaller number of groups. In Euclidean space, the centroid-based and distance-based formulations of $k$-means coincide. In modern machine learning applications, data often arise as probability distributions, and measure-valued data can be handled using optimal transport metrics. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. The peculiar behavior of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based $k$-means approach and its semidefinite programming (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP-relaxed Wasserstein $k$-means achieves exact recovery when the clusters are well separated under the $2$-Wasserstein metric. Our simulations and real-data examples also demonstrate that distance-based $k$-means can achieve better classification performance than the standard centroid-based $k$-means for clustering probability distributions and images.
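To make the referenced relaxation concrete, here is the distance-based $k$-means SDP (in the standard Peng-Wei form) with pairwise squared $2$-Wasserstein costs; normalizing constants may differ from the paper's exact statement. With $D_{ij} = W_2^2(\mu_i, \mu_j)$,
\[
\min_{Z \in \mathbb{R}^{n \times n}} \tfrac{1}{2}\langle D, Z\rangle
\quad \text{s.t.} \quad
Z \succeq 0,\;\; Z \geq 0 \text{ entrywise},\;\; Z\mathbf{1}_n = \mathbf{1}_n,\;\; \operatorname{tr}(Z) = k,
\]
where the planted partition corresponds to the block matrix $Z^\star = \sum_{a=1}^{k} |C_a|^{-1}\mathbf{1}_{C_a}\mathbf{1}_{C_a}^{\top}$, so rounding the SDP solution recovers the cluster labels once the Gaussian clusters are well separated in $W_2$.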
Existing studies on neural architecture search (NAS) mainly focus on efficiently searching for network architectures with better performance. Little progress has been made towards systematically understanding whether the architectures searched by NAS are robust against privacy attacks, even though abundant work has shown that human-designed architectures are vulnerable to such attacks. In this paper, we fill this gap and systematically measure the privacy risks of NAS architectures. Leveraging the insights from our measurement study, we further explore the cell patterns of cell-based NAS architectures and evaluate how these cell patterns affect the privacy risks of NAS-searched architectures. Through extensive experiments, we shed light on how to design NAS architectures that are robust against privacy attacks, and we also offer a general methodology for understanding the hidden correlations between NAS-searched architectures and other privacy risks.
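The abstract does not name its attack suite; membership inference is the standard instrument for this kind of privacy measurement, so the sketch below shows the simplest confidence-thresholding variant as an assumed stand-in.

```python
# Confidence-thresholding membership inference sketch (assumed setup).
import torch

@torch.no_grad()
def membership_scores(model, x: torch.Tensor) -> torch.Tensor:
    """Score each input by the model's top softmax confidence; training members
    of an overfit architecture tend to receive higher scores than non-members."""
    model.eval()
    return model(x).softmax(dim=1).max(dim=1).values

def infer_membership(model, x: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    return membership_scores(model, x) > threshold
```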