Recent advances in spatial omics methods enable the molecular composition of human tumors to be imaged at micron-scale resolution across hundreds of patients and tens to thousands of molecular imaging channels. Large-scale molecular imaging datasets offer a new opportunity to understand how the spatial organization of proteins and cell types within a tumor modulates a patient's response to different therapeutic strategies, and offer potential insights into the design of novel therapies that increase patient response. However, spatial omics datasets require computational analysis methods that can scale to hundreds or thousands of imaging channels (i.e., colors) while extracting molecular patterns that correlate with treatment responses across large numbers of patients with potentially heterogeneous tumor presentations. Here, we develop a machine learning strategy for the identification and design of signaling molecule combinations that predict the degree of immune system engagement with a specific patient's tumor. First, we train a classifier to predict the T cell distribution in patient tumors from the images of 30-40 molecular imaging channels. Second, we apply a gradient-descent-based counterfactual reasoning strategy to the classifier and discover combinations of signaling molecules predicted to increase T cell infiltration. Applied to spatial proteomics data of melanoma tumors, our model predicts that increasing the levels of CXCL9, CXCL10, CXCL12, and CCL19 and decreasing the level of CCL8 in melanoma tumors will increase T cell infiltration by 10-fold across a cohort of 69 patients. The model predicts that the combination is many-fold more effective than single-target perturbations. Our work provides a paradigm for machine-learning-based prediction and design of cancer therapeutics based on classification of immune system activity in spatial omics data.
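To make the counterfactual step concrete, below is a minimal, hypothetical sketch of gradient-based counterfactual search over the input channels of a trained classifier. The classifier, the channel indexing, and all parameter values are assumptions for illustration, not the study's actual pipeline.

```python
import torch

def counterfactual_perturbation(classifier, x, editable_channels, steps=200, lr=0.05):
    """Search for a per-channel additive shift that raises the classifier's
    predicted T cell infiltration score, touching only the editable channels.
    x: (batch, channels, H, W) image tensor; classifier: returns a scalar score per image."""
    n_channels = x.shape[1]
    delta = torch.zeros(n_channels, requires_grad=True)   # one shift per imaging channel
    mask = torch.zeros(n_channels)
    mask[editable_channels] = 1.0                          # e.g. indices of the chemokine channels
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_cf = x + (delta * mask).view(1, -1, 1, 1)        # broadcast the shift over all pixels
        score = classifier(x_cf).mean()                    # predicted infiltration
        (-score).backward()                                # gradient ascent on the score
        opt.step()
    return delta.detach()
```

The returned shift vector can then be read off per channel to see which signaling molecules the model proposes to raise or lower.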
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs that have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
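As a rough illustration of the INT8 pipeline such NPUs accelerate, the following is a minimal sketch of full-integer post-training quantization with TensorFlow Lite; the toy model and random calibration patches are placeholders, not the challenge's reference code or any participant's solution.

```python
import numpy as np
import tensorflow as tf

# Toy 3x super-resolution model standing in for a real SR network (assumption).
def build_toy_sr_model(scale=3):
    inp = tf.keras.Input(shape=(64, 64, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)  # pixel shuffle
    return tf.keras.Model(inp, out)

def representative_dataset():
    # Low-resolution patches used only to calibrate the quantization ranges.
    for _ in range(100):
        yield [np.random.rand(1, 64, 64, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(build_toy_sr_model())
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # fully integer-quantized model for the edge NPU
```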
We consider representation learning for proteins with 3D structures. We build 3D graphs based on protein structures and develop graph networks to learn their representations. Depending on the level of detail we wish to capture, protein representations can be computed at different levels, e.g., the amino acid, backbone, or all-atom level. Importantly, there exist hierarchical relations among the different levels. In this work, we propose a novel hierarchical graph network, known as ProNet, to capture these relations. Our ProNet is very flexible and can be used to compute protein representations at different levels of granularity. We show that, given a complete base 3D graph network, our ProNet representations are also complete at all levels. To close the loop, we develop a complete and efficient 3D graph network to be used as the base model, making our ProNet complete. We conduct experiments on multiple downstream tasks. The results show that ProNet outperforms recent methods on most datasets. In addition, the results indicate that different downstream tasks may require representations at different levels. Our code is available as part of the DIG library (https://github.com/divelab/dig).
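As one concrete (and assumed) instance of such a 3D graph, the sketch below builds a simple radius graph over amino-acid-level nodes from C-alpha coordinates; ProNet's actual graph construction and features are richer, so this only illustrates the basic step shared by the different granularity levels.

```python
import torch

def build_radius_graph(coords, cutoff=10.0):
    """coords: (N, 3) tensor of 3D positions (e.g. C-alpha atoms);
    returns a (2, E) edge index connecting nodes within the cutoff distance."""
    dist = torch.cdist(coords, coords)        # pairwise Euclidean distances
    mask = (dist < cutoff) & (dist > 0)       # keep close pairs, drop self-loops
    src, dst = mask.nonzero(as_tuple=True)
    return torch.stack([src, dst], dim=0)
```

The same construction can be repeated at backbone or all-atom granularity by swapping in the corresponding coordinate sets.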
We present a customized 3D mesh Transformer model for the pose transfer task. As 3D pose transfer is essentially a deformation procedure that depends on the given meshes, the intuition of this work is to perceive the geometric inconsistencies between the given meshes with a powerful self-attention mechanism. Specifically, we propose a novel geometry-contrastive Transformer with an efficient 3D structure-aware ability to perceive the global geometric inconsistencies of the given meshes. Moreover, at the local level, a simple yet efficient central geodesic contrastive loss is further proposed to improve regional geometric-inconsistency learning. Finally, we present a latent isometric regularization module, together with a novel semi-synthesized dataset, for the cross-dataset 3D pose transfer task towards unknown spaces. Large-scale experimental results demonstrate the efficacy of our approach, with state-of-the-art quantitative performance on SMPL-NPT, FAUST, and our newly proposed SMG-3D dataset, as well as promising qualitative results on the MG-cloth and SMAL datasets. The results show that our method achieves robust 3D pose transfer and generalizes to challenging meshes from unknown spaces in cross-dataset tasks. The code and dataset are available at https://github.com/mikecheninoulu/CGT.
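For intuition only, the sketch below shows a generic margin-based contrastive loss over per-vertex features; the names and the exact formulation are assumptions, and the paper's central geodesic contrastive loss additionally exploits geodesic distances to region centers, which this toy version omits.

```python
import torch
import torch.nn.functional as F

def vertex_contrastive_loss(feat_a, feat_b, pos_mask, margin=0.5):
    """feat_a, feat_b: (N, D) per-vertex features from two meshes; pos_mask: (N, N) bool,
    True for vertex pairs that should agree (e.g. the same vertex under two poses)."""
    sim = F.cosine_similarity(feat_a.unsqueeze(1), feat_b.unsqueeze(0), dim=-1)  # (N, N)
    pos_loss = (1.0 - sim)[pos_mask].mean()          # pull matching vertices together
    neg_loss = F.relu(sim - margin)[~pos_mask].mean()  # push non-matching pairs apart
    return pos_loss + neg_loss
```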
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Empirically, given an image, a model with more consistent outputs on different views of that image usually performs better, as shown in Fig. 1. Motivated by this observation, we conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models. In this paper, we explore this idea thoroughly through Embedding-level and Prediction-level Consistency Regularization (EPCR) for FAS. Specifically, at the embedding level, we design a dense similarity loss to maximize, in a self-supervised fashion, the similarities between all positions of two intermediate feature maps; at the prediction level, we optimize the mean squared error between the predictions of the two views. Notably, our EPCR requires no annotations and can be directly integrated into semi-supervised learning schemes. Considering different application scenarios, we further design five diverse semi-supervised protocols to measure semi-supervised FAS techniques. We conduct extensive experiments showing that EPCR can significantly improve the performance of several supervised and semi-supervised tasks on benchmark datasets. The code and protocols will be released soon.
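A minimal sketch of the two consistency terms follows, under the assumption that the features are (B, C, H, W) maps from two augmented views and the predictions are per-image spoof scores; this is illustrative and not the released EPCR code.

```python
import torch
import torch.nn.functional as F

def dense_similarity_loss(feat_a, feat_b):
    """Embedding-level term: maximize cosine similarity at every spatial position
    of two (B, C, H, W) intermediate feature maps from two views of the same image."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    return 1.0 - (a * b).sum(dim=1).mean()   # 1 - mean per-position cosine similarity

def prediction_consistency_loss(pred_a, pred_b):
    """Prediction-level term: mean squared error between the two views' predictions."""
    return F.mse_loss(pred_a, pred_b)
```

Because neither term uses labels, both can be added directly to a semi-supervised objective on unlabeled images.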
Face anti-spoofing (FAS) secures face recognition against presentation attacks (PAs). Existing FAS methods usually supervise PA detectors with handcrafted binary or pixel-wise labels. However, handcrafted labels may not be the most adequate way to supervise PA detectors to learn sufficient and intrinsic spoofing cues. We propose a novel Meta-Teacher FAS (MT-FAS) method to train a meta-teacher that supervises PA detectors more effectively. The meta-teacher is trained in a bi-level optimization manner to learn the ability to supervise PA detectors to learn rich spoofing cues. The bi-level optimization contains two key components: 1) a lower-level training stage in which the meta-teacher supervises the detector's learning process on the training set; and 2) a higher-level training stage in which the meta-teacher's teaching performance is optimized by minimizing the detector's validation loss. Our meta-teacher differs significantly from existing teacher-student models because the meta-teacher is explicitly trained to teach the detector (student) better, whereas existing teachers are trained for outstanding accuracy while neglecting teaching ability. Extensive experiments on five FAS benchmarks show that with the proposed MT-FAS, the trained meta-teacher 1) provides better-suited supervision than both handcrafted labels and existing teacher-student models; and 2) significantly improves the performance of PA detectors.
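To make the bi-level flow concrete, here is a deliberately tiny scalar illustration, not the MT-FAS implementation: the inner step updates a "detector" weight on a teacher-defined training loss, and the outer step differentiates the resulting validation loss through that update to improve the "teacher" parameter.

```python
import torch

t = torch.tensor(0.5, requires_grad=True)    # "teacher": defines the soft target
w = torch.tensor(1.0, requires_grad=True)    # "detector" weight
x_tr, x_val, y_val = 2.0, 1.5, 3.0           # toy train/validation data
inner_lr, outer_lr = 0.1, 0.05

for _ in range(200):
    # Lower level: one detector step on the teacher-supervised training loss.
    train_loss = (w * x_tr - t * x_tr) ** 2                  # teacher's soft target is t * x_tr
    g_w = torch.autograd.grad(train_loss, w, create_graph=True)[0]
    w_new = w - inner_lr * g_w                               # differentiable inner update
    # Upper level: validation loss of the updated detector, differentiated w.r.t. the teacher.
    val_loss = (w_new * x_val - y_val) ** 2
    g_t = torch.autograd.grad(val_loss, t)[0]
    with torch.no_grad():
        t -= outer_lr * g_t                                  # teacher learns to teach better
    w = w_new.detach().requires_grad_(True)                  # commit the detector update
```

In the real method both roles are full networks and the inner update spans many detector steps; the scalar version only shows how the outer gradient flows through the inner update.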
In this paper, we construct neural networks with ReLU, sine, and $2^x$ as activation functions. For a general continuous function $f$ defined on $[0,1]^d$ with modulus of continuity $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate of $\mathcal{O}(\omega_f(\sqrt{d})\cdot 2^{-M}+\omega_f(\frac{\sqrt{d}}{N}))$, where $M,N\in\mathbb{N}^{+}$ denote the hyperparameters related to the widths of the networks. As a consequence, we can construct ReLU-sine-$2^x$ networks with depth $5$ and width $\max\left\{\left\lceil 2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/\alpha}\right\rceil, 2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximate $f\in\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon$ measured in the $L^p$ norm for $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $\alpha\in(0,1]$ and constant $\mu>0$. Therefore, ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to their super expressive power, the functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD to training.
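For a sense of what such a network looks like in code, here is a hedged sketch, not the paper's explicit construction: a shallow model whose hidden units are split among the three activations. Since ReLU, sine, and $2^x$ are all (generalized) differentiable, the model trains with ordinary SGD.

```python
import torch
import torch.nn as nn

class ReLUSineExp2Net(nn.Module):
    """Toy network mixing the three activations discussed above (illustrative only)."""
    def __init__(self, d, width):
        super().__init__()
        self.fc1 = nn.Linear(d, 3 * width)
        self.fc2 = nn.Linear(3 * width, 1)

    def forward(self, x):
        h = self.fc1(x)
        a, b, c = h.chunk(3, dim=-1)                     # split hidden units three ways
        h = torch.cat([torch.relu(a), torch.sin(b), torch.exp2(c)], dim=-1)  # ReLU, sine, 2^x
        return self.fc2(h)
```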
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. In practice, however, both requirements come together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon))$. This is a universal error guarantee that is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
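As a point of contrast with the continual setting studied here, the following is a simple illustrative sketch of a single $\varepsilon$-DP mean release in which user-level sensitivity is controlled by averaging and clipping each user's contribution; the paper's algorithm, which handles arbitrary arrival patterns and repeated releases, is substantially more involved.

```python
import numpy as np

def dp_user_level_mean(user_samples, eps, lo=0.0, hi=1.0):
    """user_samples: list of per-user sample arrays with values assumed in [lo, hi].
    Returns a single eps-DP estimate of the mean, private at the user level."""
    per_user_means = [np.clip(np.mean(s), lo, hi) for s in user_samples]
    n = len(per_user_means)
    sensitivity = (hi - lo) / n                 # replacing one user shifts the average by at most this
    noise = np.random.laplace(scale=sensitivity / eps)
    return float(np.mean(per_user_means) + noise)
```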
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit a model's performance. In this paper, we explore an alternative to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and a statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework, which we call UDnet, is composed of a U-Net as a feature extractor and a PAdaIN block to encode the uncertainty. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training with ground truth images. The proposed model needs no manual human annotation, can learn from a limited amount of data, and achieves state-of-the-art results on underwater images. We evaluated our framework on eight publicly available datasets. The results show that it yields competitive performance compared to other state-of-the-art approaches on both quantitative and qualitative metrics. Code is available at https://github.com/alzayats/UDnet .
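The sketch below gives one plausible (assumed) form of a probabilistic AdaIN block: instance-normalize the content features, then scale and shift them with per-channel affine parameters sampled via the reparameterization trick. It is illustrative only and not the UDnet implementation.

```python
import torch

def probabilistic_adain(content, scale_mu, scale_logvar, shift_mu, shift_logvar, eps=1e-5):
    """content: (B, C, H, W) feature map; the four (B, C) tensors parameterize
    Gaussian distributions over the per-channel scale and shift."""
    mu = content.mean(dim=(2, 3), keepdim=True)
    std = content.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    normalized = (content - mu) / std                       # instance normalization
    # Reparameterized samples of the affine parameters (VAE-style uncertainty).
    scale = scale_mu + torch.randn_like(scale_mu) * (0.5 * scale_logvar).exp()
    shift = shift_mu + torch.randn_like(shift_mu) * (0.5 * shift_logvar).exp()
    return normalized * scale[..., None, None] + shift[..., None, None]
```

Sampling the affine parameters rather than fixing them is what lets the decoder produce a distribution of plausible enhancements instead of a single deterministic output.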
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that, across 6 architectures trained with 4 algorithms on massive datasets, they exhibit little compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark, CREPE, which measures two important aspects of compositionality identified by the cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of three test datasets, designed to test models trained on three popular training datasets: CC-12M, YFCC-15M, and LAION-400M. They contain 385K, 385K, and 373K image-text pairs and 237K, 210K, and 178K hard negative captions, respectively. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 246K hard negative captions with atomic, swapping, and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 8%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.
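To ground the retrieval numbers, here is a toy sketch of a Recall@1 check against hard negatives; the embedding model and the placement of the true caption are placeholders, not CREPE's evaluation code.

```python
import torch

def recall_at_1(image_emb, caption_embs, true_idx=0):
    """image_emb: (D,) image embedding; caption_embs: (K, D) embeddings of the true
    caption plus hard negatives, with the true caption at true_idx. Returns 1 if the
    true caption is ranked first, else 0; averaging over images gives Recall@1."""
    sims = caption_embs @ image_emb          # dot products (cosine similarity if pre-normalized)
    return int(sims.argmax().item() == true_idx)
```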