Reusing models initially trained on different data to improve downstream task performance is common practice, and it has been applied successfully to a wide range of tasks, especially in the computer vision domain. In this work we investigate the impact of transfer learning on segmentation problems, i.e., dense pixel-wise classification problems that can be tackled with encoder-decoder architectures. We find that transfer learning the decoder does not help downstream segmentation tasks, while transfer learning the encoder is truly beneficial. We demonstrate that pretrained decoder weights may yield faster convergence, but they do not improve overall model performance, since equivalent results can be obtained with randomly initialized decoders. However, we show that it is more effective to reuse encoder weights trained on a segmentation or reconstruction task than encoder weights trained on a classification task. This finding implies that using an ImageNet-pretrained encoder for downstream segmentation problems is suboptimal. We also propose a contrastive self-supervised approach with multiple self-reconstruction tasks, which provides encoders suitable for transfer learning to segmentation problems when no segmentation labels are available.
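As a rough illustration of the paper's main finding (our own sketch, not the authors' code), the PyTorch snippet below transfers only the encoder weights of a toy encoder-decoder model and leaves the decoder randomly initialized; the model class and layer sizes are invented for the example.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy encoder-decoder segmentation model (illustration only)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(   # downsampling path
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # upsampling path
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

pretrained = EncoderDecoder()   # stand-in for a model trained on a reconstruction/segmentation task
model = EncoderDecoder()
model.encoder.load_state_dict(pretrained.encoder.state_dict())  # transfer the encoder weights only
# model.decoder keeps its random initialization.
print(model(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```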
In transfer learning, often only the last part of the network, the so-called head, is fine-tuned. Representation similarity analysis shows that the most significant changes still occur in the head even if all weights are updatable. However, recent results from few-shot learning have shown that changes to the representations in the early layers, which are mostly convolutional, are beneficial, especially in the case of cross-domain adaptation. In our paper we investigate whether this also holds for transfer learning. In addition, we analyze the change of representations in transfer learning, both during pre-training and fine-tuning, and find that pre-trained structure is unlearned if it is not usable.
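For reference, a minimal sketch of the head-only fine-tuning setting the paper starts from (our illustration, assuming a recent torchvision; the backbone, head size, and optimizer settings are arbitrary):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")        # pre-trained backbone (downloads weights)
for param in model.parameters():                 # freeze every layer ...
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)   # ... then replace the head with a fresh one
# Only the new head receives gradient updates; all earlier representations stay fixed.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```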
Creating compelling captions for data visualizations has been a longstanding challenge. Visualization researchers are typically untrained in journalistic reporting and hence the captions that are placed below data visualizations tend to be not overly engaging and rather just stick to basic observations about the data. In this work we explore the opportunities offered by the newly emerging crop of large language models (LLMs), which use sophisticated deep learning technology to produce human-like prose. We ask: can these powerful software devices be purposed to produce engaging captions for generic data visualizations like a scatterplot? It turns out that the key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering. We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
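A hedged sketch of the prompt-engineering idea: assemble a natural-language description of a scatterplot into a prompt that could then be submitted to GPT-3 through the OpenAI API. The template wording and field names are our own illustration, not the prompts evaluated in the paper.

```python
def build_caption_prompt(x_label, y_label, n_points, correlation):
    """Assemble a caption-writing prompt from basic facts about a scatterplot."""
    return (
        "Write an engaging one-sentence caption for a scatterplot.\n"
        f"X axis: {x_label}. Y axis: {y_label}.\n"
        f"Number of points: {n_points}. Pearson correlation: {correlation:.2f}.\n"
        "Caption:"
    )

prompt = build_caption_prompt("median income", "life expectancy", 120, 0.71)
print(prompt)   # this string would be sent to the LLM as the prompt
```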
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
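As a small aside, a sketch of patch-based sampling, the workaround most commonly reported for inputs that are too large to process at once (our own illustration; the volume shape and patch size are arbitrary):

```python
import numpy as np

def sample_patches(volume, patch_size=(64, 64, 64), n_patches=8, rng=None):
    """Randomly crop fixed-size patches from a 3D volume."""
    rng = rng or np.random.default_rng(0)
    patches = []
    for _ in range(n_patches):
        start = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
        slices = tuple(slice(st, st + p) for st, p in zip(start, patch_size))
        patches.append(volume[slices])
    return np.stack(patches)

volume = np.random.rand(128, 256, 256)   # stand-in for a large 3D scan
print(sample_patches(volume).shape)      # (8, 64, 64, 64)
```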
We report on experiments for the fingerprint modality conducted during the First BioSecure Residential Workshop. Two reference systems for fingerprint verification have been tested together with two additional non-reference systems. These systems follow different approaches of fingerprint processing and are discussed in detail. Fusion experiments involving different combinations of the available systems are presented. The experimental results show that the best recognition strategy involves both minutiae-based and correlation-based measurements. Regarding the fusion experiments, the best relative improvement is obtained when fusing systems that are based on heterogeneous strategies for feature extraction and/or matching. The best combinations of two/three/four systems always include the best individual systems whereas the best verification performance is obtained when combining all the available systems.
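A minimal sketch of score-level fusion in the spirit described above (our own illustration, not the workshop systems): min-max normalize each matcher's scores and combine them with a simple sum/mean rule.

```python
import numpy as np

def min_max_normalize(scores):
    """Map matcher scores to [0, 1] so heterogeneous systems become comparable."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min())

def sum_rule_fusion(*score_lists):
    """Fuse scores from several matchers (e.g. minutiae- and correlation-based)."""
    normalized = [min_max_normalize(s) for s in score_lists]
    return np.mean(normalized, axis=0)

minutiae_scores = [0.2, 0.8, 0.4]        # hypothetical per-comparison scores
correlation_scores = [10.0, 55.0, 30.0]
print(sum_rule_fusion(minutiae_scores, correlation_scores))
```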
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
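A small usage sketch, assuming a recent MONAI release (argument names can differ between versions); the network configuration and tensor shapes here are arbitrary, not a recommended setup.

```python
import torch
from monai.networks.nets import UNet
from monai.losses import DiceLoss

# A 3D U-Net and a Dice loss, two of the purpose-specific building blocks MONAI adds on top of PyTorch.
net = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
)
loss_fn = DiceLoss(to_onehot_y=True, softmax=True)

image = torch.rand(1, 1, 64, 64, 64)               # (batch, channel, D, H, W)
label = torch.randint(0, 2, (1, 1, 64, 64, 64))    # voxel-wise class indices
loss = loss_fn(net(image), label)
loss.backward()
```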
Probabilistic predictions from neural networks that account for the predictive uncertainty of the classification process are crucial in many real-world and high-impact decision settings. In practice, however, most datasets are used to train neural networks that, by default, do not capture this inherent uncertainty. This well-known problem has led to the development of post-hoc calibration procedures, such as Platt (logistic) scaling, isotonic regression, and beta calibration, which transform the scores into well-calibrated empirical probabilities. A reasonable alternative to calibration methods is to use Bayesian neural networks, which directly model the predictive distribution. Although they have been applied to image and text datasets, their adoption in the tabular and small-data regime has been limited. In this paper we demonstrate, through experiments on a variety of datasets, that Bayesian neural networks yield competitive performance compared with calibrated neural networks.
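A hedged sketch of the post-hoc calibration baselines mentioned above, using scikit-learn's CalibratedClassifierCV (our own illustration; the synthetic dataset and small MLP are stand-ins, not the paper's experimental setup):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
base = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)

# method="sigmoid" corresponds to Platt (logistic) scaling; "isotonic" to isotonic regression.
platt = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X, y)
isotonic = CalibratedClassifierCV(base, method="isotonic", cv=3).fit(X, y)
print(platt.predict_proba(X[:3]))      # calibrated class probabilities
```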
In the automotive industry, the scarcity of labeled data is a typical challenge. Annotating time-series measurements requires solid domain knowledge and in-depth exploratory data analysis, which implies a high labeling effort. Conventional active learning (AL) addresses this issue by actively querying the most informative instances based on the estimated classification probability and retraining the model iteratively. However, the learning efficiency strongly depends on the initial model, resulting in a trade-off between the size of the initial dataset and the number of queries. This paper proposes a novel few-shot learning (FSL)-based AL framework, which addresses this trade-off by incorporating a prototypical network (ProtoNet) into the AL iterations. The results show, on the one hand, robustness to the initial model and, on the other hand, learning efficiency through the active selection of the support sets in each iteration. The framework was validated on the UCI HAR/HAPT dataset and a real-world braking-maneuver dataset. The learning performance significantly surpasses conventional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
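An illustrative, simplified sketch of the prototypical-network step such an AL loop relies on: class prototypes are the mean embeddings of the (actively selected) support set, and queries are assigned to the nearest prototype. All shapes and data below are placeholders, not the paper's configuration.

```python
import torch

def prototypes(support_emb, support_labels, n_classes):
    """Mean embedding per class computed from the support set."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the prototype with the smallest Euclidean distance."""
    dists = torch.cdist(query_emb, protos)   # (n_queries, n_classes)
    return dists.argmin(dim=1)

support_emb = torch.randn(20, 16)            # embeddings produced by a backbone network
support_labels = torch.arange(20) % 4        # 5 support samples per class
query_emb = torch.randn(5, 16)
print(classify(query_emb, prototypes(support_emb, support_labels, 4)))
```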
Non-stationary source separation is a well-established branch of blind source separation with many different methods. However, large-sample results are not available for any of these methods. To bridge this gap, we develop large-sample theory for NSS-JD, a non-stationary source separation method based on the joint diagonalization of block-wise covariance matrices. We work under an instantaneous linear mixing model of independent Gaussian non-stationary source signals together with a very general set of assumptions: aside from boundedness conditions, the only assumptions we make are that the sources exhibit finite dependence and that their variance functions differ sufficiently to be asymptotically separable. Under the preceding conditions, we establish the consistency of the unmixing estimator and its limiting Gaussian distribution at the standard square-root rate. Simulation experiments are used to verify the theoretical results and to study the effect of block length on the separation.
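A simplified sketch of the block-wise covariance construction that NSS-JD jointly diagonalizes (our own illustration; the whitening and joint diagonalization steps themselves are omitted):

```python
import numpy as np

def block_covariances(x, block_length):
    """x: (n_samples, p) observed mixed signals -> one (p, p) covariance per block."""
    n = (x.shape[0] // block_length) * block_length
    blocks = x[:n].reshape(-1, block_length, x.shape[1])
    return [np.cov(block, rowvar=False) for block in blocks]

rng = np.random.default_rng(0)
# Toy non-stationary signals: the variance grows over time.
x = rng.normal(size=(1000, 3)) * np.linspace(0.5, 2.0, 1000)[:, None]
covs = block_covariances(x, block_length=100)
print(len(covs), covs[0].shape)   # 10 blocks, each with a 3x3 covariance matrix
```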
Visual Place Recognition (VPR) is the ability of a robotic platform to correctly interpret visual stimuli from its onboard cameras in order to determine whether it is currently located in a previously visited place, despite different viewpoints, illumination, and appearance changes. JPEG is a widely used image compression standard that can significantly reduce the size of an image at the expense of image clarity. For applications where several robotic platforms are deployed simultaneously, the collected visual data must be transmitted remotely between the platforms. JPEG compression can therefore be employed to drastically reduce the amount of data transmitted over the communication channel, as working with limited bandwidth can prove to be a challenging task. However, the effect of JPEG compression on the performance of current VPR techniques has not previously been studied. Therefore, this paper presents an in-depth study of JPEG compression in VPR-related scenarios. We use a selection of well-established VPR techniques on 8 datasets with various amounts of compression applied. We show that introducing compression drastically reduces VPR performance, especially at the higher end of the compression spectrum. To overcome the negative effect of JPEG compression on VPR performance, we present a fine-tuned CNN that is optimized for JPEG-compressed data and show that it performs more consistently under the image transformations found in extremely compressed JPEG images.
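A hedged sketch of applying JPEG compression at several quality levels with Pillow, as one might do before feeding frames to a VPR pipeline (our own illustration; the image is a random stand-in rather than a dataset frame):

```python
import io
import numpy as np
from PIL import Image

def jpeg_compress(image_array, quality):
    """Round-trip an RGB uint8 array through in-memory JPEG compression."""
    buf = io.BytesIO()
    Image.fromarray(image_array).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf)), buf.getbuffer().nbytes

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in camera frame
for q in (90, 50, 10):
    _, n_bytes = jpeg_compress(image, q)
    print(f"quality={q}: {n_bytes} bytes")   # lower quality -> fewer bytes to transmit
```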