Localizing anatomical landmarks is an important task in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, so precise localization depends heavily on the context formed by the surrounding areas. In addition, the required precision is usually higher than in segmentation and object detection tasks. Localization therefore poses unique challenges that differ from those of segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or "zoom-in", strategy is used to learn contextualized features at different scales. Then, an attentive fusion module is adopted to aggregate the multi-scale features. It consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, and 2) an attention-based fusion module that integrates the multi-ROI features and non-ROI features. We evaluated ZIAN on two open challenge tasks, i.e., fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performance and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN.
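Marketing campaigns are a set of strategic activities that can promote a business's goals. In real industrial scenarios, predicting the effect of a marketing campaign is complex and challenging, because prior knowledge is usually learned from observational data, without any intervention for the marketing campaign. Furthermore, each subject is always under the interference of several marketing campaigns simultaneously. Therefore, we cannot easily parse and evaluate the effect of a single marketing campaign. To the best of our knowledge, there is currently no effective method for solving such a problem, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the underlying parse-tree-like structure involved in the effect prediction task, and we further establish a hierarchical capsule prediction network (HAPNET) for predicting the effects of marketing campaigns. Extensive results on both synthetic and real data demonstrate the superiority of our model over state-of-the-art methods and show remarkable practicability in real industrial applications.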
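Mainstream object detectors are commonly composed of two sub-tasks, classification and regression, implemented by two parallel heads. This classic design paradigm inevitably leads to inconsistent spatial distributions between the classification score and the localization quality (IoU). This paper therefore alleviates the misalignment from the perspective of knowledge distillation. First, we observe that a massive teacher achieves a higher proportion of harmonious predictions than a lightweight student. Based on this intriguing observation, a novel Harmony Score (HS) is devised to estimate the alignment of classification and regression qualities. HS models the relationship between the two sub-tasks and is treated as prior knowledge to promote harmonious predictions by the student. Second, this spatial misalignment results in inharmonious region selection when distilling features. To alleviate this problem, a novel task feature distillation (TFD) is proposed that flexibly balances the contributions of the classification and regression tasks. Finally, HD and TFD constitute the proposed method, named Task-Balanced Distillation (TBD). Extensive experiments demonstrate the considerable potential and generalization of the proposed method. Specifically, when equipped with TBD, RetinaNet with ResNet-50 achieves 41.0 mAP on the COCO benchmark, outperforming the recent FGD and FRS.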
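For readers unfamiliar with the localization quality mentioned in this abstract, the following is a minimal sketch of the standard intersection-over-union (IoU) between two axis-aligned boxes. The function name `box_iou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration. This is not the paper's Harmony Score, which the abstract does not define.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
    (an assumed layout, chosen for this illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs get an IoU of 0 rather than a division error.
    return inter / union if union > 0 else 0.0
```

A detection whose classification score is high while its `box_iou` with the ground-truth box is low is one instance of the score/IoU inconsistency the abstract describes.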
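Fine-grained action recognition is a challenging task in computer vision. Because fine-grained datasets have small inter-class variations in both space and time, a fine-grained action recognition model requires good temporal reasoning and discrimination of attribute action semantics. Leveraging the ability of CNNs to capture high-level spatio-temporal feature representations and the modeling efficiency of Transformers in capturing latent semantics and global dependencies, we investigate two frameworks that combine a CNN vision backbone with a Transformer encoder to enhance fine-grained action recognition: 1) a vision-based encoder that learns latent temporal semantics, and 2) a multi-modal video-text cross encoder that exploits additional text input and learns cross associations between visual and text semantics. Our experimental results show that both Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality associations, and improve recognition performance over the CNN vision model. We achieve new state-of-the-art performance on the FineGym benchmark dataset with both proposed architectures.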
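Thyroid nodule classification aims to determine whether a nodule is benign or malignant from a given ultrasound image. However, the labels obtained by cytological biopsy, which is the gold standard in clinical medicine, are not always consistent with the ultrasound imaging TI-RADS criteria. This information gap between the two makes existing deep learning-based classification methods indecisive. To address the inconsistent-label problem, we propose an adaptive curriculum learning (ACL) framework that adaptively discovers and discards samples with inconsistent labels. Specifically, ACL takes both hard samples and model certainty into account, and can accurately determine the threshold for distinguishing samples with inconsistent labels. In addition, we contribute TNCD, a thyroid nodule classification dataset, to facilitate future research on thyroid nodules. Extensive experimental results on TNCD with three different backbone networks not only demonstrate the superiority of our method but also show that the less-is-more principle, i.e., strategically discarding samples with inconsistent labels, can yield performance gains. Source code and data are available at https://github.com/chenghui-666/acl/.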
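In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive video copy dataset with segment-level annotations. Compared with existing copy detection datasets, which are restricted by video-level annotation or small scale, VCSL not only has two orders of magnitude more segment-level labeled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video durations. All copied segments in each collected video pair are manually extracted and accompanied by precisely annotated start and end timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of overlapping copied segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches. The VCSL dataset, metric, and benchmark code are all publicly available at https://github.com/alipay/vcsl.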
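There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", in which AI algorithms and agents no longer learn from datasets of images, videos, or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to that of humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support a variety of embodied AI research tasks. This growing interest in embodied AI benefits the greater pursuit of artificial general intelligence (AGI), but there has been no contemporary and comprehensive survey of the field. This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators against our seven proposed features, this paper aims to understand the simulators in terms of their use in embodied AI research and their limitations. Next, this paper surveys the three main research tasks in embodied AI, namely visual exploration, visual navigation, and embodied question answering (QA), covering the state-of-the-art approaches, evaluation metrics, and datasets. Finally, with the new insights revealed by surveying the field, the paper provides suggestions for simulator-for-task selection and recommendations for the future directions of the field.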
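TikTok is a popular new social media platform on which users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances for users to iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. The uniqueness of the TikTok platform, where both challenge content and user preferences are constantly evolving, requires combining challenge and user representations. This paper investigates the social contagion of TikTok challenges by predicting a user's participation. We propose a novel deep learning model, DeepChallenger, that learns and combines latent user and challenge representations to perform this user-challenge prediction task. We collect a dataset of over 7,000 videos from 12 trending challenges on the ForYouPage, the app's landing page, and over 10,000 videos from 1,303 users. Extensive experiments show that the proposed DeepChallenger (F1 = 0.494) outperforms the baselines (F1 = 0.188) on the prediction task.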
Although many studies have successfully applied transfer learning to medical image segmentation, very few have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior-knowledge-guided and transferability-based framework to select the best source tasks from a collection of brain image segmentation tasks, in order to improve transfer learning performance on a given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, so that the source task selection can be refined step by step. Specifically, we adapt state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from different tasks under the same modality is often more successful than transferring from the same task under different modalities. Furthermore, within the same modality, transferring from the source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance. Such similarity can be captured using the Structural Similarity index in the label space.
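As a rough illustration of measuring similarity in the label space, the sketch below computes a single-window (global) Structural Similarity index between two label masks of equal size. It uses the standard SSIM formula with the customary constants K1 = 0.01 and K2 = 0.03. The function name `global_ssim`, the nested-list mask format, and the use of one global window instead of the usual sliding local windows are simplifying assumptions, and nothing here is taken from the paper's implementation.

```python
from itertools import chain


def global_ssim(mask_a, mask_b, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized 2D label masks.

    Masks are nested lists of numbers (e.g. 0/1 for binary RoI labels).
    This is a simplification: standard SSIM averages over local windows.
    """
    a = [float(v) for v in chain.from_iterable(mask_a)]
    b = [float(v) for v in chain.from_iterable(mask_b)]
    if not a or len(a) != len(b):
        raise ValueError("masks must be non-empty and have the same size")

    n = len(a)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Under this sketch, a candidate source task whose RoI masks score higher against the target task's masks would be treated as having more similar RoI shapes. A practical version would likely average the index over many mask pairs and use windowed SSIM.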
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under the implicit assumption that faithful explanations come from accurate predictions/classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, and apply it to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the respective discriminative representations, in order to accurately distinguish preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision, coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer leads to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.