Transformers are becoming increasingly popular due to their superior performance over conventional convolutional neural networks (CNNs). However, transformers usually require much more memory to train than CNNs, which prevents their application in many low-resource settings. Local learning, which divides the network into several distinct modules and trains them individually, is a promising alternative to end-to-end (E2E) training that reduces training memory and increases parallelism. This paper is the first to apply local learning to transformers for this purpose. The standard CNN-based local learning method, InfoPro [32], reconstructs the input images for each module in a CNN. However, reconstructing the entire image does not generalize well. In this paper, we propose a new mechanism for each local module: instead of reconstructing the entire image, we reconstruct the module's input features, generated by the previous modules. We evaluate our approach on 4 commonly used datasets and 3 commonly used decoder structures on Swin-Tiny. The experiments show that our approach outperforms InfoPro-Transformer, the InfoPro variant with a Transformer backbone that we introduce, by up to 0.58% on the CIFAR-10, CIFAR-100, STL-10 and SVHN datasets, while using up to 12% less memory. Compared to the E2E approach, we require 36% less GPU memory when the network is divided into 2 modules and 45% less GPU memory when the network is divided into 4 modules.
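The idea of training modules individually, each with a loss that reconstructs its own input features, can be illustrated with a toy sketch. This is not the paper's implementation: the scalar "modules", the parameter names `w`, `d`, `v`, the data, and the learning rate are all assumptions chosen to keep the example self-contained. The point it shows is that the second module treats the first module's output as a constant, so no gradient crosses the module boundary.

```python
# Toy illustration of local learning with scalar "modules" (hypothetical names).
# Module 1: h = w * x, trained only with a local loss that reconstructs its
#           input features x from h through a small decoder r = d * h.
# Module 2: y_hat = v * h, trained with the task loss; h is treated as a
#           constant, so no gradient flows back into module 1.

def train_local(epochs=300, lr=0.02):
    data = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5), (2.0, 6.0)]  # toy task: y = 3 * x
    w, d, v = 0.5, 0.5, 0.1
    for _ in range(epochs):
        for x, y in data:
            # Module 1: local reconstruction loss (d * w * x - x) ** 2
            h = w * x
            rec_err = d * h - x
            grad_w = 2.0 * rec_err * d * x
            grad_d = 2.0 * rec_err * h
            # Module 2: task loss (v * h - y) ** 2, with h held fixed (detached)
            task_err = v * h - y
            grad_v = 2.0 * task_err * h
            # Each module updates only its own parameters
            w -= lr * grad_w
            d -= lr * grad_d
            v -= lr * grad_v
    return w, d, v

w, d, v = train_local()
print(f"reconstruction gain d*w = {d * w:.4f}, end-to-end gain v*w = {v * w:.4f}")
```

In a real network the same separation is what would allow activations to be stored one module at a time rather than for the whole network.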
Dense prediction tasks such as segmentation and detection of pathological entities hold crucial clinical value in the digital pathology workflow. However, obtaining dense annotations on large cohorts is usually tedious and expensive. Contrastive learning (CL) is thus often employed to leverage large volumes of unlabeled data to pre-train the backbone network. To boost CL for dense prediction, some studies have proposed variations of dense matching objectives in pre-training. However, our analysis shows that employing existing dense matching strategies on histopathology images enforces invariance among incorrect pairs of dense features and, thus, is imprecise. To address this, we propose a precise location-based matching mechanism that utilizes the overlapping information between geometric transformations to precisely match regions in two augmentations. Extensive experiments on two pretraining datasets (TCGA-BRCA, NCT-CRC-HE) and three downstream datasets (GlaS, CRAG, BCSS) highlight the superiority of our method in semantic and instance segmentation tasks. Our method outperforms previous dense matching methods by up to 7.2 % in average precision for detection and 5.6 % in average precision for instance segmentation tasks. Additionally, by using our matching mechanism in the three popular contrastive learning frameworks, MoCo-v2, VICRegL and ConCL, the average precision in detection is improved by 0.7 % to 5.2 % and the average precision in segmentation is improved by 0.7 % to 4.0 %, demonstrating its generalizability.
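A location-based match between two augmented views can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each view is an axis-aligned crop of the same image, optionally flipped horizontally, and that each view is encoded into a `grid` x `grid` feature map. The function name `match_cells` and the box format `(x0, y0, x1, y1)` are hypothetical. A cell in view A is paired with the cell in view B that contains the same image location, and cells outside the overlap are left unmatched.

```python
def match_cells(box_a, box_b, grid, flip_a=False, flip_b=False):
    """Pair feature-grid cells of two crops that cover the same image location.

    Boxes are (x0, y0, x1, y1) in original-image coordinates (assumed format).
    Returns a list of ((row_a, col_a), (row_b, col_b)) pairs.
    """
    xa0, ya0, xa1, ya1 = box_a
    xb0, yb0, xb1, yb1 = box_b
    pairs = []
    for i in range(grid):
        for j in range(grid):
            # Centre of cell (i, j) in normalised view-A coordinates
            u = (j + 0.5) / grid
            v = (i + 0.5) / grid
            if flip_a:
                u = 1.0 - u  # undo the horizontal flip of view A
            # Map the centre back to original-image coordinates
            x = xa0 + u * (xa1 - xa0)
            y = ya0 + v * (ya1 - ya0)
            # Skip locations that view B does not cover
            if not (xb0 <= x < xb1 and yb0 <= y < yb1):
                continue
            # Project the location into view B's grid
            ub = (x - xb0) / (xb1 - xb0)
            vb = (y - yb0) / (yb1 - yb0)
            if flip_b:
                ub = 1.0 - ub  # apply the horizontal flip of view B
            col_b = min(int(ub * grid), grid - 1)
            row_b = min(int(vb * grid), grid - 1)
            pairs.append(((i, j), (row_b, col_b)))
    return pairs

# Two 100x100 crops offset by 50 pixels horizontally, 4x4 feature grids
pairs = match_cells((0, 0, 100, 100), (50, 0, 150, 100), grid=4)
print(len(pairs), pairs[0])
```

Only the matched pairs would then enter a dense contrastive objective, so features are never pulled together across regions that show different tissue.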
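Artificial intelligence (AI) has been widely applied in drug discovery, where a major task is molecular property prediction. Despite the boom of AI techniques in molecular representation learning, some key aspects underlying molecular property prediction have not yet been carefully examined. In this study, we conduct a systematic comparison of three representative models, random forest, MolBERT and GROVER, which use three major molecular representations: extended-connectivity fingerprints, SMILES strings and molecular graphs, respectively. Notably, MolBERT and GROVER are pretrained on large-scale unlabeled molecule corpora in a self-supervised manner. In addition to the commonly used molecular benchmark datasets, we also assemble a suite of opioid-related datasets for downstream prediction evaluation. We first profile the datasets through label distribution and structural analyses; we also examine the activity cliffs issue in the opioid-related datasets. We then train 4,320 predictive models and evaluate the usefulness of the learned representations. Furthermore, we explore model evaluation by studying the effect of statistical tests, evaluation metrics and task settings. Finally, we dissect chemical space generalization into inter-scaffold and intra-scaffold generalization and measure prediction performance to evaluate model generalizability under both settings. By taking this respite, we reflect on the key aspects underlying molecular property prediction, awareness of which will hopefully bring better AI techniques to this field.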
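Histopathology whole slide images (WSIs) play a very important role in clinical studies and serve as the gold standard for many cancer diagnoses. However, building automatic tools for processing WSIs is challenging because of their enormous size. Currently, to deal with this issue, conventional methods rely on a multiple instance learning (MIL) strategy to process a WSI at the patch level. Although effective, such methods are computationally expensive, because tiling a WSI into patches takes time, and they do not explore the spatial relations between these tiles. To address these limitations, we propose a locally supervised learning framework that processes the entire slide by exploring all of the local and global information it contains. The framework divides a pre-trained network into several modules and optimizes each module locally using an auxiliary model. We also introduce a random feature reconstruction unit (RFR) to preserve discriminative features during training, which improves the performance of our method by 1% to 3%. Extensive experiments on three publicly available WSI datasets, TCGA-NSCLC, TCGA-RCC and LKS, highlight the superiority of our method on different classification tasks. Our method outperforms state-of-the-art MIL methods in accuracy while being 7 to 10 times faster. Additionally, when divided into eight modules, our method requires as little as 20% of the total GPU memory required by end-to-end training. Our code is available at https://github.com/cvlab-stonybrook/local_learning_wsi.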
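The prediction of human gaze behavior is important for building human-computer interactive systems that can anticipate a user's attention. Computer vision models have been developed to predict the fixations people make as they search for target objects. But what about when the target is absent? It is equally important to know how people search when they cannot find a target, and when they stop searching. In this paper, we propose the first data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in the image. We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature Maps (FFMs). FFMs integrate a simulated foveated retina into a pretrained ConvNet that produces an in-network feature pyramid, all with minimal computational overhead. Our method uses FFMs as the state representation in inverse reinforcement learning. Experimentally, we improve the state of the art in predicting human target-absent search behavior on the COCO-Search dataset.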
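We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x}, \mathbf{y})$. In this paper, the prior is an independently trained denoising diffusion generative model. The auxiliary constraint is expected to have a differentiable form, but can come from diverse sources. The possibility of such inference turns diffusion models into plug-and-play modules, thereby allowing a range of potential applications in adapting models to new domains and tasks, such as conditional generation or image segmentation. The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network, with a different amount of noise at each step. Considering many noised versions of $\mathbf{x}$ when evaluating its fitness is a novel search mechanism that may lead to new algorithms for solving combinatorial optimization problems.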
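Labels are costly and sometimes unreliable. Noisy label learning, semi-supervised learning, and contrastive learning are three different strategies for designing learning processes that require less annotation cost. Semi-supervised learning and contrastive learning have recently been shown to improve learning strategies that address datasets with noisy labels. Still, the inner connections between these fields, and the potential to combine their strengths, have only started to emerge. In this paper, we explore further ways and advantages of fusing them. Specifically, we propose CSSL, a unified Contrastive Semi-Supervised Learning algorithm, and CoDiM (Contrastive DivideMix), a novel algorithm for learning with noisy labels. CSSL leverages the power of classical semi-supervised learning and contrastive learning techniques and is further adapted into CoDiM, which learns robustly from multiple types and levels of label noise. We show that CoDiM brings consistent improvements and achieves state-of-the-art results on multiple benchmarks.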
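Artificial intelligence (AI) has been transforming the practice of drug discovery over the past decade. Various AI techniques have been used in a wide range of applications, such as virtual screening and drug design. In this survey, we first give an overview of drug discovery and discuss related applications, which can be reduced to two major tasks: molecular property prediction and molecule generation. We then discuss common data resources, molecular representations and benchmark platforms. Furthermore, to summarize the progress of AI in drug discovery, we present the relevant AI techniques, including model architectures and learning paradigms, found in the surveyed papers. We expect this survey to serve as a guide for researchers interested in working at the interface of artificial intelligence and drug discovery. We also provide a GitHub repository (https://github.com/dengjianyuan/Survey_AI_Drug_Discovery) that collects the papers and, where applicable, code, as a regularly updated learning resource.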
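We propose a novel deep learning method for shadow removal. Inspired by physical models of shadow formation, we use a linear illumination transformation to model the shadow effects in the image, which allows the shadow image to be expressed as a combination of the shadow-free image, the shadow parameters, and a matte layer. We use two deep networks, namely SP-Net and M-Net, to predict the shadow parameters and the shadow matte, respectively. This system allows us to remove the shadow effects from images. We then employ an inpainting network, I-Net, to further refine the results. We train and test our framework on the most challenging shadow removal dataset (ISTD). Our method improves on the state of the art by 20% in root mean square error (RMSE) for the shadow area. Furthermore, this decomposition allows us to formulate a patch-based weakly supervised shadow removal method. This model can be trained without any shadow-free images (which are cumbersome to acquire) and achieves competitive shadow removal results compared with state-of-the-art methods trained on fully paired shadow and shadow-free images. Finally, we introduce SBU-Timelapse, a video shadow removal dataset for evaluating shadow removal methods.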
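The decomposition described above can be made concrete with a small sketch. It is an illustration only, with several assumptions: pixels are single grayscale intensities, the linear illumination transformation is written as `w * s + b` with hypothetical shadow parameters `w` and `b`, and the matte value `m` is taken to be 1 for a fully shadowed pixel and 0 for a pixel outside the shadow. The abstract does not fix these conventions, and in the described system the parameters and matte would come from SP-Net and M-Net rather than being supplied by hand.

```python
def remove_shadow(shadow_pixels, matte, w, b):
    """Blend relit and original intensities using a per-pixel matte.

    shadow_pixels: intensities of the shadow image (assumed grayscale floats)
    matte:         per-pixel values in [0, 1]; 1 = fully in shadow (assumed convention)
    w, b:          hypothetical parameters of the linear illumination transformation
    """
    result = []
    for s, m in zip(shadow_pixels, matte):
        relit = w * s + b                           # linearly relit pixel
        result.append(m * relit + (1.0 - m) * s)    # matte decides how much relighting applies
    return result

# One fully shadowed pixel, one penumbra pixel, one pixel outside the shadow
out = remove_shadow([0.2, 0.2, 0.8], [1.0, 0.5, 0.0], w=2.0, b=0.1)
print(out)
```

Pixels outside the shadow pass through unchanged, which is why only the shadow parameters and the matte need to be predicted.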
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and substantially improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.