Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet has been criticized for its learning inefficiency. We attribute this inefficiency to insufficient utilization of training signals. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under the disjoint regulation, raising the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens, respectively, with superior learning targets. Rooted in orthogonal perspectives on training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing model generalization. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, our DMJD also shows superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
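The disjoint masking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_disjoint_views`, its parameters, and the fallback rule (reusing already-masked tokens once every token has been masked at least once) are assumptions made for illustration. Views are sampled one after another, and each view first draws from tokens that no earlier view has masked, so token usage rises while every view keeps the same masking rate.

```python
import random


def sample_disjoint_views(num_tokens, mask_ratio, num_views, seed=0):
    """Sequentially sample masked token indices for several views of one image.

    Hypothetical helper: each view masks int(num_tokens * mask_ratio) tokens and
    prefers tokens that no earlier view has masked. If too few unused tokens
    remain, the view is topped up with already-used tokens (an assumption).
    """
    rng = random.Random(seed)
    per_view = int(num_tokens * mask_ratio)
    all_tokens = set(range(num_tokens))
    unused = set(all_tokens)
    views = []
    for _ in range(num_views):
        # Draw as many not-yet-masked tokens as possible.
        fresh = rng.sample(sorted(unused), min(per_view, len(unused)))
        chosen = set(fresh)
        if len(chosen) < per_view:
            # Top up from the remaining tokens to keep the masking rate fixed.
            refill = sorted(all_tokens - chosen)
            chosen.update(rng.sample(refill, per_view - len(chosen)))
        unused -= chosen
        views.append(sorted(chosen))
    return views
```

With 16 tokens, a masking rate of 0.25, and 4 views, the masks are pairwise disjoint and together cover every token. With a rate of 0.75 and 2 views, strict disjointness is impossible, but the second view still masks every token the first one left visible.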
Fine-grained classification and counting of bone marrow erythroid cells are vital for evaluating the health status and formulating therapeutic schedules for leukemia or hematopathy. Due to the subtle visual differences between different types of erythroid cells, it is challenging to apply existing image-based deep learning models to fine-grained erythroid cell classification. Moreover, there are no large open-source datasets of erythroid cells to support model training. In this paper, we introduce BMEC (Bone Marrow Erythroid Cells), the first large fine-grained image dataset of erythroid cells, to facilitate more deep learning research on erythroid cells. BMEC contains 5,666 images of individual erythroid cells, each of which is extracted from bone marrow erythroid cell smears and professionally annotated as one of four types of erythroid cells. To distinguish the erythroid cells, one key indicator is the cell shape, which is closely related to cell growth and maturation. Therefore, we design a novel shape-aware image classification network for fine-grained erythroid cell classification. The shape feature is extracted from the shape mask image and aggregated with the raw image feature by a shape attention module. With the shape-attended image feature, our network achieves superior classification performance (81.12% top-1 accuracy) on the BMEC dataset compared to the baseline methods. Ablation studies also demonstrate the effectiveness of incorporating shape information for fine-grained cell classification. To further verify the generalizability of our method, we tested our network on two additional public white blood cell (WBC) datasets, and the results show that our shape-aware method generally outperforms recent state-of-the-art works on WBC classification. The code and BMEC dataset can be found at https://github.com/wangye8899/BMEC.
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, enabling effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages than others, but it is not well understood what leads to this variation and whether it fairly reflects differences between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with their syntactic difference in terms of linguistic formalisms. Such difference, learnt via self-supervision, plays a crucial role in zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity, and they provide insights into the mechanism of cross-lingual transfer.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
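Patch-based processing, the most common answer to oversized samples in the survey, can be sketched as follows. This is an illustration only: the helper `patch_origins` and its parameters are hypothetical and do not come from the survey. It lists the top-left corners of fixed-size patches that tile a 2D image, adding a final patch flush with each border so that no pixel is left uncovered.

```python
def patch_origins(height, width, patch, stride):
    """Return (row, col) top-left corners of patches covering a 2D image.

    Hypothetical helper. If the image is smaller than the patch along an axis,
    a single origin at 0 is returned for that axis and the caller is expected
    to pad the image.
    """
    def starts(size):
        last = max(size - patch, 0)
        positions = list(range(0, last + 1, stride))
        if positions[-1] != last:
            # Add one extra patch aligned with the border.
            positions.append(last)
        return positions

    return [(y, x) for y in starts(height) for x in starts(width)]
```

For a 5 x 5 image with 2 x 2 patches and stride 2, the row and column starts are 0, 2, and 3, giving nine patches, the last of which overlaps its neighbour by one pixel.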
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
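Brain age has been shown to be a phenotype related to cognitive performance and brain disease. Accurate brain age prediction is an essential prerequisite for using the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to capture accurately with models that rely on feature engineering and local processing, such as local convolution and recurrent operations, which process one local neighborhood at a time. Vision Transformers, in contrast, learn global attentive interactions among patch tokens, introducing less inductive bias and modeling long-range dependencies. Building on this, we propose a novel network for learning brain age that accounts for both global and local dependencies, where the corresponding representations are captured by a Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computational efficiency and locates 3D spatial information indirectly by successively encoding 2D slices from different views. Finally, we collect a large cohort of 22,645 subjects with ages ranging from 14 to 97. Our network performs best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.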
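The two quantities this abstract relies on, the mean absolute error (MAE) and the brain-age difference, can be written down in a few lines. The sketch below is illustrative: the function names are hypothetical, and the sign convention for the difference (predicted minus chronological age) is an assumption, since the abstract does not state it.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and chronological ages."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def brain_age_gaps(predicted, actual):
    """Per-subject brain-age difference, assumed here to be predicted minus actual."""
    return [p - a for p, a in zip(predicted, actual)]
```

For two subjects aged 68 and 33 with predictions of 70 and 30, the gaps are +2 and -3 years and the MAE is 2.5 years.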
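Renal structure segmentation from computed tomography angiography (CTA) is essential for many computer-assisted renal cancer treatment applications. The Kidney PArsing (KiPA 2022) Challenge aims to build a fine-grained multi-structure dataset and improve the segmentation of multiple renal structures. Recently, U-Net has dominated medical image segmentation. In the KiPA challenge, we evaluated several U-Net variants and selected the best models for the final submission.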
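Layout generation is a novel task in computer vision that combines the challenges of object localization and aesthetic appraisal, and it is widely used in advertisement, poster, and slide design. An accurate and pleasing layout should consider both the intra-domain relationships among layout elements and the inter-domain relationships between layout elements and the image. However, most previous methods focus on image-content-agnostic layout generation, without leveraging the complex visual information in the image. To this end, we explore a novel paradigm called image-conditioned layout generation, which aims to add text overlays to an image in a semantically coherent manner. Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that generates diverse layouts in an image. First, a self-attention mechanism is adopted to model the contextual relationships among layout elements, while a cross-attention mechanism is used to fuse the visual information of the conditional image. We then use them as building blocks of a conditional variational autoencoder (CVAE), which demonstrates appealing diversity. Second, to alleviate the gap between the layout element domain and the visual domain, we design a geometry alignment module in which the geometric information of the image is aligned with the layout representation. In addition, we construct a large-scale advertisement poster layout design dataset with fine-grained layout and saliency map annotations. Experimental results show that our model can adaptively generate layouts in the non-intrusive areas of the image, resulting in harmonious layout designs.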
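Compared with single-document summarization, abstractive multi-document summarization (MDS) poses challenges for the representation and coverage of its lengthy and linked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS. By incorporating word-level and paragraph-level multi-head attention, the hierarchical architecture of PHT allows better processing of dependencies at both the token and document levels. To guide decoding towards better coverage of the source documents, an attention-alignment mechanism is then introduced to calibrate beam search with predicted optimal attention distributions. Based on the WikiSum data, a comprehensive evaluation is conducted to test the improvements that the proposed architecture brings to MDS. By better handling intra-document and cross-document information, results in both ROUGE and human evaluation suggest that our hierarchical model generates higher-quality summaries at relatively low computational cost.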
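Graph similarity measurement, which computes the distance or similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform the interaction information between two graphs into one hidden vector and then map it to a similarity. To address this problem, this study proposes a more interpretable end-to-end paradigm for graph similarity learning, named Similarity Computation via Maximum Common Subgraph Inference (INFMCS). Our key insight behind INFMCS is the strong correlation between the similarity score and the Maximum Common Subgraph (MCS). We implicitly infer the MCS to obtain the normalized MCS size, with the similarity score as the only supervision during training. To capture more global information, we also stack some vanilla transformer encoder layers with graph convolution layers and propose a novel permutation-invariant node positional encoding. The entire model is quite simple yet effective. Comprehensive experiments demonstrate that INFMCS consistently outperforms state-of-the-art baselines on graph classification and regression tasks. Ablation experiments verify the effectiveness of the proposed computation paradigm and the other components. In addition, visualization and statistics of the results reveal the interpretability of INFMCS.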
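To make the target quantity concrete, the sketch below computes an exact MCS size by brute force on tiny graphs and normalizes it. It is not the INFMCS model, which infers the MCS implicitly with a neural network. The function names are hypothetical, the MCS is taken to be the maximum common induced subgraph without a connectivity requirement, and the normalization (MCS size divided by the mean number of nodes of the two graphs) is an assumption, since the abstract does not define it.

```python
from itertools import combinations, permutations


def mcs_size(nodes1, edges1, nodes2, edges2):
    """Node count of a maximum common induced subgraph (exhaustive search).

    Exponential in graph size, so only suitable for very small graphs.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for subset in combinations(nodes1, k):
            for image in permutations(nodes2, k):
                mapping = dict(zip(subset, image))
                # Induced match: each node pair is an edge in both graphs or in neither.
                if all(
                    (frozenset((a, b)) in e1)
                    == (frozenset((mapping[a], mapping[b])) in e2)
                    for a, b in combinations(subset, 2)
                ):
                    return k
    return 0


def normalized_mcs_size(nodes1, edges1, nodes2, edges2):
    """MCS size divided by the mean node count (assumed normalization)."""
    total = len(nodes1) + len(nodes2)
    if total == 0:
        return 0.0
    return 2 * mcs_size(nodes1, edges1, nodes2, edges2) / total
```

A triangle and a three-node path share a single edge as their largest common induced subgraph, so the MCS size is 2 and the normalized value is 2/3, while two identical triangles score 1.0.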