Gaze estimation is fundamental to many vision tasks, yet the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel head-eye redirection parametric model based on Neural Radiance Fields, which enables dense gaze data generation with view consistency and accurate gaze directions. Moreover, our head-eye redirection parametric model decouples the face and eyes for separate neural rendering, so it can separately control face attributes such as identity, illumination, and eye gaze direction. Diverse 3D-aware gaze datasets can thus be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
The development of deep learning models in medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning does not require labels and is better suited to medical image analysis problems. However, most current unsupervised learning methods must be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with a Swin Transformer backbone. Even on a dataset of only a few thousand medical images, and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. In transfer learning on downstream tasks, it equals or even slightly outperforms the supervised model obtained by training Swin Transformer on ImageNet. The code will be publicly available soon.
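The core mechanism of a masked autoencoder is reconstructing an image from a small visible subset of its patches, which starts with random patch masking. Below is a minimal stdlib sketch of that step; the function name, the 75% mask ratio, and the 14x14 patch grid are illustrative assumptions, and the paper's actual Swin MAE masking may differ (e.g., window-aligned masking for the Swin backbone):

```python
import random

def mask_patches(num_patches, mask_ratio=0.75, seed=0):
    """Randomly split patch indices into visible and masked sets,
    as in a masked autoencoder: only the visible patches are fed to
    the encoder, and the decoder reconstructs the masked ones."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

# A 224x224 image with 16x16 patches gives a 14x14 = 196 patch grid.
visible, masked = mask_patches(196)
print(len(visible), len(masked))  # 49 147
```

With a 75% ratio the encoder sees only a quarter of the patches, which is what makes pre-training cheap enough for small datasets.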
Despite fast advances in high-sigma yield analysis aided by machine learning techniques over the past decade, one of the main challenges, the curse of dimensionality, which is inevitable when dealing with modern large-scale circuits, remains unsolved. To resolve this challenge, we propose absolute shrinkage deep kernel learning (ASDK), which automatically identifies the dominant process variation parameters in a nonlinearly correlated deep kernel and acts as a surrogate model to emulate expensive SPICE simulation. To further improve yield estimation efficiency, we propose a novel maximization of approximated entropy reduction for efficient model updates, enhanced with parallel batch sampling for parallel computing, making it ready for practical deployment. Experiments on SRAM column circuits demonstrate the superiority of ASDK over state-of-the-art (SOTA) approaches in both accuracy and efficiency, with up to a 10.3x speedup over SOTA methods.
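The batch-sampling idea — choosing a diverse set of candidate points whose evaluation most reduces the surrogate's uncertainty — can be illustrated with a toy greedy criterion. This is a hand-rolled sketch, not ASDK's actual acquisition function: it uses predictive variance as a proxy for entropy reduction and an RBF-style discount to keep the batch diverse (the function names, discount form, and length scale are all assumptions):

```python
import math

def select_batch(candidates, variance, batch_size, length_scale=1.0):
    """Greedy batch selection: repeatedly pick the candidate with the
    largest remaining uncertainty score, then down-weight candidates
    close to it so the batch covers different regions of the space.

    candidates: parameter vectors; variance[i]: the surrogate model's
    predictive variance at candidates[i]."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    score = list(variance)
    selected = []
    for _ in range(batch_size):
        i = max(range(len(candidates)), key=lambda j: score[j])
        selected.append(i)
        # RBF-style discount: the chosen point's own score drops to zero,
        # and nearby points are penalized proportionally to proximity.
        for j in range(len(candidates)):
            score[j] *= 1.0 - math.exp(
                -sq_dist(candidates[i], candidates[j]) / (2 * length_scale ** 2))
    return selected
```

In a real flow the selected batch would be dispatched to parallel SPICE runs, and the surrogate refit on the results.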
Breast cancer is one of the common cancers endangering the health of women globally. Accurate target lesion segmentation is essential for early clinical intervention and postoperative follow-up. Recently, many convolutional neural networks (CNNs) have been proposed to segment breast tumors from ultrasound images. However, complex ultrasound patterns and variable tumor shapes and sizes pose challenges for accurate segmentation of breast lesions. Motivated by selective kernel convolution, we introduce an enhanced selective kernel convolution for breast tumor segmentation, which integrates multiple feature map region representations and adaptively recalibrates the weights of these feature map regions along the channel and spatial dimensions. This region recalibration strategy enables the network to focus more on high-contributing region features and mitigate the perturbation of less useful regions. Finally, the enhanced selective kernel convolution is integrated into U-Net with deep supervision constraints to adaptively capture robust representations of breast tumors. Extensive experiments against twelve state-of-the-art deep learning segmentation methods on three public breast ultrasound datasets demonstrate that our method achieves more competitive segmentation performance on breast ultrasound images.
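The channel-wise half of this recalibration can be sketched in plain Python: each output channel is a softmax-weighted blend of the competing kernel branches. This is a deliberately simplified illustration (two branches, per-channel scalars standing in for feature maps); the enhanced variant described above also recalibrates spatial regions, which is omitted here:

```python
import math

def selective_fuse(branch_a, branch_b, gate_logits):
    """Selective-kernel-style channel fusion: for each channel, blend
    the two branch responses with softmax attention weights derived
    from learned gate logits, so the network can favor whichever
    kernel branch contributes more for that channel.

    branch_a, branch_b: per-channel feature values.
    gate_logits: one (logit_a, logit_b) pair per channel."""
    fused = []
    for a, b, (la, lb) in zip(branch_a, branch_b, gate_logits):
        ea, eb = math.exp(la), math.exp(lb)
        wa, wb = ea / (ea + eb), eb / (ea + eb)  # softmax over branches
        fused.append(wa * a + wb * b)            # convex combination
    return fused
```

Equal logits give an even mix of the branches; a strongly positive logit for one branch lets that branch dominate the channel, which is the "selection" behavior.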
Compared with natural images, medical images are difficult to acquire and expensive to label. As an unsupervised learning method, contrastive learning can exploit unlabeled medical images more effectively. In this paper, we use a Transformer-based contrastive learning method and innovate the contrastive learning network through transfer learning. The resulting model is then transferred to the downstream parotid gland segmentation task, which improves the performance of the parotid segmentation model on the test set. The improved DSC is 89.60%, MPA 99.36%, MIoU 85.11%, and HD 2.98. Compared with the results of using a supervised-learning model as the pre-trained model for the parotid segmentation network, all four metrics show significant improvement. In addition, we find that the improvement the contrastive learning model brings to the segmentation network lies mainly in the encoder part, so this paper also attempts to build a contrastive learning network for the decoder part and discusses the problems encountered in the process.
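The abstract does not specify the contrastive objective, but Transformer-based contrastive pre-training commonly uses an InfoNCE-style loss; the following is a generic stdlib sketch for a single anchor, not the authors' actual loss (the function name and the 0.07 temperature default are assumptions):

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE contrastive loss for one anchor: treat the positive
    pair as the correct class in a softmax over the positive and all
    negatives, so minimizing the loss pulls the positive pair together
    and pushes negatives apart.

    sim_pos: anchor-positive similarity (e.g. cosine similarity).
    sim_negs: list of anchor-negative similarities."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max to stabilize the softmax
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)  # negative log-softmax of the positive
```

When the positive similarity rises relative to the negatives, the loss falls, which is the gradient signal the encoder learns from.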
Dynamic facial expression recognition (FER) databases provide important data support for affective computing and its applications. However, most FER databases are annotated with only a few basic, mutually exclusive categories and contain only a single modality, such as video. The monotonous labels and modality cannot accurately mimic human emotions or serve real-world applications. In this paper, we present MAFW, a large-scale multi-modal compound affective database with 10,045 video-audio clips in the wild. Each clip is assigned a compound emotion category and several sentences that describe the subject's affective behavior in the clip. For compound emotion annotation, each clip is categorized into one or more of 11 widely used emotions: anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment. To ensure high-quality labels, we filter out unreliable annotations with an expectation-maximization (EM) algorithm, yielding 11 single-label and 32 multi-label emotion categories. To the best of our knowledge, MAFW is the first in-the-wild multi-modal database with compound emotion annotations and emotion-related captions. In addition, we propose a novel Transformer-based expression snippet feature learning method that recognizes compound emotions by exploiting the expression-change relationships among different emotions and modalities. Extensive experiments on the MAFW database show the advantages of the proposed method over other state-of-the-art methods for both uni-modal and multi-modal FER. Our MAFW database is publicly available at https://mafw-database.github.io/mafw.
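EM-based annotation filtering of the kind mentioned above can be illustrated with a simplified Dawid-Skene-style loop that alternates between estimating a consensus label per clip and estimating each annotator's reliability. This is a generic sketch, not the authors' exact algorithm (the iteration count and reliability definition are assumptions):

```python
def estimate_reliability(votes, num_iters=10):
    """EM-style annotation cleaning (simplified Dawid-Skene):
    E-step: consensus label per clip = reliability-weighted majority vote.
    M-step: annotator reliability = fraction of clips where the
    annotator agrees with the consensus. Low-reliability annotators'
    votes can then be filtered out.

    votes: list of dicts, votes[clip][annotator] = label."""
    annotators = {a for clip in votes for a in clip}
    rel = {a: 1.0 for a in annotators}
    consensus = []
    for _ in range(num_iters):
        consensus = []
        for clip in votes:
            weight = {}
            for a, label in clip.items():
                weight[label] = weight.get(label, 0.0) + rel[a]
            consensus.append(max(weight, key=weight.get))
        for a in annotators:
            hits = [c[a] == lbl for c, lbl in zip(votes, consensus) if a in c]
            rel[a] = sum(hits) / len(hits) if hits else 0.0
    return consensus, rel
```

An annotator who consistently disagrees with the consensus ends up with low reliability, so their votes stop influencing the consensus labels.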
The complexity-precision trade-off of an object detector is a key problem for resource-constrained vision tasks. Previous works have emphasized detectors built on efficient backbones. In this work, the impact of the detection head's proposal processing on this trade-off is investigated. It is hypothesized that improved detection efficiency requires a paradigm shift toward unequal proposal processing, allocating more computation to good proposals than to poor ones. This makes better use of the available computational budget, enabling higher precision for the same FLOPs. We formulate this as a learning problem whose goal is to assign operators to the proposals in the detection head so that the total computational cost is constrained and precision is maximized. The key finding is that this matching can be cast as a function that maps each proposal embedding to a one-hot code over operators. Although this function induces a complex dynamic network routing mechanism, it can be implemented by a simple MLP and learned end-to-end with off-the-shelf object detectors. This "dynamic proposal processing" (DPP) is shown to clearly outperform state-of-the-art end-to-end object detectors (DETR, Sparse R-CNN) at a given computational complexity.
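The budgeted proposal-to-operator matching can be made concrete with a toy greedy assignment. Note the paper learns this mapping end-to-end with an MLP that emits one-hot operator codes; the sketch below hand-crafts the assignment purely to illustrate the constraint (scores, costs, and the greedy rule are all illustrative assumptions):

```python
def assign_operators(proposal_scores, op_costs, budget):
    """Budget-constrained operator assignment over proposals: give
    the heaviest (most accurate) operator to the most promising
    proposals first, falling back to cheaper operators once the
    compute budget runs out.

    op_costs must be sorted from heaviest to cheapest. Returns one
    operator index per proposal (a one-hot code in spirit)."""
    order = sorted(range(len(proposal_scores)),
                   key=lambda i: proposal_scores[i], reverse=True)
    # Default to the cheapest operator; proposals that no longer fit
    # the remaining budget simply keep it (a corner this sketch ignores).
    assignment = [len(op_costs) - 1] * len(proposal_scores)
    remaining = budget
    for i in order:
        for op, cost in enumerate(op_costs):
            if cost <= remaining:
                assignment[i] = op
                remaining -= cost
                break
    return assignment
```

The learned version replaces the hand-written ranking with a routing function over proposal embeddings, trained jointly with the detector.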
Parotid tumors account for approximately 2% to 10% of head and neck tumors. Preoperative tumor localization and differential diagnosis are important for the subsequent choice of an appropriate treatment for parotid tumors. However, the relative rarity and highly dispersed tissue types of these tumors leave an unmet need for subtle differential diagnosis of such tumor lesions based on preoperative radiomics. Recently, deep learning methods have developed rapidly, especially Transformers, which have beaten traditional convolutional neural networks in computer vision, and many new Transformer-based networks have been proposed for vision tasks. In this study, multi-center, multi-modal parotid MRI images were collected, and the Transformer-based Swin-Unet was used. MRI images of the STIR, T1, and T2 modalities were combined into three-channel data to train the network. We achieved segmentation of the regions of interest of the parotid gland and tumors. The model's DSC on the test set was 88.63%, MPA 99.31%, MIoU 83.99%, and HD 3.04. A series of comparative experiments was then designed to further validate the segmentation performance of the algorithm.
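The three-channel fusion step amounts to stacking the co-registered modalities along the channel axis so a network expecting RGB-like input can consume all three. A stdlib sketch of that step (in practice this would be a NumPy/PyTorch stack on aligned, intensity-normalized slices, which this sketch does not attempt):

```python
def stack_modalities(stir, t1, t2):
    """Combine co-registered STIR, T1 and T2 slices (each H x W,
    given as nested lists) into one three-channel image (H x W x 3),
    one modality per channel."""
    assert len(stir) == len(t1) == len(t2), "slices must be co-registered"
    image = []
    for row_s, row_1, row_2 in zip(stir, t1, t2):
        image.append([[s, a, b] for s, a, b in zip(row_s, row_1, row_2)])
    return image
```

Placing each sequence in its own channel lets the first convolution (or patch embedding) learn cross-modality combinations directly.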
Accurate and unbiased examination of skin lesions is critical for the early diagnosis and treatment of skin diseases. The visual features of skin lesions differ significantly because images are collected from patients with different lesion colors and morphologies using different imaging devices. Recent studies have reported that ensembling convolutional neural networks (CNNs) is practical for classifying images for the early diagnosis of skin disorders. However, the practical use of these connected CNNs is limited because the networks are heavyweight and inadequate at processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) have been developed to realize deep neural networks on mobile devices with fewer parameters, the insufficient depth of their feature representation limits performance. To address these limitations, we develop a new lite neural network, HierAttn. HierAttn applies a novel deep supervision strategy to learn local and global features through multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn is evaluated on the dermoscopy image dataset ISIC2019 and the smartphone photo dataset PAD-UFES-20 (PAD2020). Experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among state-of-the-art lightweight networks. The code is available at https://github.com/anthonyweidai/hierattn.
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
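Few-shot learning as described here amounts to conditioning the model on a handful of solved exemplars in the prompt rather than updating its weights. A generic sketch of prompt assembly (the Q/A format is an illustrative convention, not PaLM's actual evaluation format):

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a few-shot prompt: an optional task instruction,
    k solved input/output exemplars, then the unsolved query. The
    model is expected to continue the pattern, so no task-specific
    fine-tuning is needed."""
    parts = [instruction] if instruction else []
    for inp, out in examples:
        parts.append(f"Q: {inp}\nA: {out}")
    parts.append(f"Q: {query}\nA:")  # the model completes this line
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("What is 2 + 2?", "4"), ("What is 5 + 3?", "8")],
    "What is 7 + 6?",
    instruction="Answer the arithmetic question.")
print(prompt)
```

Scaling studies like this one vary the model size while holding such prompts fixed, which is how the discontinuous improvements on BIG-bench tasks are measured.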