Smart Paper Notes

SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI

Sara Atito , Syed Muhammad Anwar , Muhammad Awais , Josef Kittler

Categories: Computer Vision | Machine Learning

2022-08-29

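The availability of large-scale data with high-quality ground truth labels is a challenge when developing supervised machine learning solutions for the healthcare domain. Although the amount of digital data in clinical workflows is increasing, most of this data is distributed across clinical sites and protected to ensure patient privacy. Radiological reading and the handling of large-scale clinical data place a significant burden on the available resources, and this is where machine learning and artificial intelligence play a pivotal role. Magnetic resonance imaging (MRI) for musculoskeletal (MSK) diagnosis is one example where the scans hold a wealth of information but require a significant amount of time to read and label. Self-supervised learning (SSL) can be a solution for handling the lack of ground truth labels, but it generally requires a large amount of training data during the pretraining stage. Herein, we propose a slice-based self-supervised deep learning framework (SB-SSL), a novel slice-based paradigm for classifying abnormalities using knee MRI scans. We show that, for a limited number of cases (<1000), our proposed framework is able to identify anterior cruciate ligament tears with an accuracy of 89.17% and an AUC of 0.954, outperforming the state of the art without using external data during pretraining. This demonstrates that our proposed framework is suited to SSL in the limited data regime.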
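
As a reading aid for the two metrics quoted above, the following minimal Python sketch shows how accuracy and AUC can be computed from per-scan scores. The function names, the 0.5 threshold, and the toy scores are assumptions made for illustration; they are not taken from the paper or its code.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for six scans (1 = tear, 0 = no tear); not data from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(accuracy(labels, scores))
print(roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank positives against negatives, which is why abstracts often report both.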

translated by Google Translate

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

Sara Atito , Muhammad Awais , Ammarah Farooq , Zhenhua Feng , Josef Kittler

Categories: Computer Vision | Machine Learning

2021-11-30

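Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has been shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts but are annotated with a single dominant class label. Although self-supervised learning (SSL) is, in principle, free of this limitation, the choice of pretext task facilitating SSL perpetuates this shortcoming by driving the learning process towards a single-concept output. This study aims to investigate the possibility of modelling all the concepts present in an image without using labels. In this respect, the proposed SSL framework MC-SSL0.0 is a step towards Multi-Concept Self-Supervised Learning (MC-SSL) that goes beyond modelling the single dominant label in an image to effectively utilise the information from all the concepts present in it. MC-SSL0.0 consists of two core design concepts: group masked model learning, and learning of pseudo-concepts for data tokens using a momentum encoder (teacher-student) framework. Experimental results on multi-label and multi-class image classification downstream tasks demonstrate that MC-SSL0.0 not only surpasses existing SSL methods but also outperforms supervised transfer learning. The source code will be made publicly available for the community to train on bigger corpora.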

SiT: Self-supervised vIsion Transformer

Sara Atito , Muhammad Awais , Josef Kittler

Categories: Computer Vision | Machine Learning

2021-04-08

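Self-supervised learning methods are gaining increasing traction in computer vision due to their recent success in reducing the gap with supervised learning. In natural language processing (NLP), self-supervised learning and transformers are already the methods of choice. The recent literature suggests that transformers are also becoming increasingly popular in computer vision. So far, vision transformers have been shown to work well when pretrained either on large-scale supervised data or with some kind of co-supervision, e.g. in terms of a teacher network. These supervised pretrained vision transformers achieve very good results in downstream tasks with minimal changes. In this work, we investigate the merits of self-supervised learning for pretraining image/vision transformers and then using them for downstream classification tasks. We propose Self-supervised vIsion Transformers (SiT) and discuss several self-supervised training mechanisms for obtaining a pretext model. The architectural flexibility of SiT allows us to use it as an autoencoder and to work with multiple self-supervised tasks seamlessly. We show that a pretrained SiT can be finetuned for a downstream classification task on small-scale datasets consisting of a few thousand images rather than several million. The proposed approach is evaluated on standard datasets using common protocols. The results demonstrate the strength of transformers and their suitability for self-supervised learning. We outperform existing self-supervised learning methods by a large margin. We also observe that SiT is good for few-shot learning, and show that it learns useful representations by simply training a linear classifier on top of the features learned by SiT. Pretraining, finetuning, and evaluation code will be available at: https://github.com/sara-ahmed/sit.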

Large Language Models Encode Clinical Knowledge

Karan Singhal , Shekoofeh Azizi , Tao Tu , S. Sara Mahdavi , Jason Wei , Hyung Won Chung , Nathan Scales , Ajay Tanwani , Heather Cole-Lewis , Stephen Pfohl

Categories: Natural Language Processing

2022-12-26

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.

Design interpretable experience of dynamical feed forward machine learning model for forecasting NASDAQ

Pouriya Khalilian , Sara Azizi , Mohammad Hossein Amiri , Javad T. Firouzjaee

Categories: Artificial Intelligence

2022-12-22

The National Association of Securities Dealers Automated Quotations (NASDAQ) is an American stock exchange based in New York City and one of the most valuable stock indices in the world \cite{pagano2008quality}. The stock market is volatile and is influenced by economic indicators such as crude oil, gold, and the dollar; NASDAQ shares are affected as well and have a volatile and chaotic nature \cite{firouzjaee2022lstm}. In this article, we examine the effect of oil, the dollar, gold, and stock market volatility on the economic market, and then the effect of these indicators on NASDAQ stocks. We then analyze how feedback from past NASDAQ prices affects the current price. Using PCA and a linear regression algorithm, we design an optimal dynamic learning experience for modeling these stocks. The results of the quantitative analysis are consistent with the qualitative analysis from economic studies, and the model built with the optimal dynamic machine learning experience accounts for the current price of NASDAQ shares.
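
To make the idea of past prices feeding back into the current price concrete, here is a minimal Python sketch that fits a one-lag linear regression by ordinary least squares. It is an illustration under stated assumptions only: the function name `fit_lagged_regression` and the price series are hypothetical, and the sketch leaves out the PCA step and the oil, gold, and dollar indicators that the abstract mentions.

```python
def fit_lagged_regression(prices):
    """Least-squares fit of price[t] = intercept + slope * price[t-1]."""
    x = prices[:-1]  # previous-step prices (the "feedback" input)
    y = prices[1:]   # current-step prices (the target)
    n = len(x)
    if n < 2:
        raise ValueError("need at least three prices")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("previous prices are constant; slope is undefined")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up closing prices, purely illustrative (not NASDAQ data).
prices = [100.0, 102.0, 104.0, 106.0, 108.0]
intercept, slope = fit_lagged_regression(prices)
next_price = intercept + slope * prices[-1]
print(intercept, slope, next_price)
```

A fuller version in the spirit of the abstract would stack several lags and external indicators as features and reduce them with PCA before the regression; that part is omitted here.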

Removing Objects From Neural Radiance Fields

Silvan Weder , Guillermo Garcia-Hernando , Aron Monszpart , Marc Pollefeys , Gabriel Brostow , Michael Firman , Sara Vicente

Categories: Computer Vision

2022-12-22

Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects from a NeRF representation created from an RGB-D sequence. Our NeRF inpainting method leverages recent work in 2D image inpainting and is guided by a user-provided mask. Our algorithm is underpinned by a confidence based view selection procedure. It chooses which of the individual 2D inpainted images to use in the creation of the NeRF, so that the resulting inpainted NeRF is 3D consistent. We show that our method for NeRF editing is effective for synthesizing plausible inpaintings in a multi-view coherent manner. We validate our approach using a new and still-challenging dataset for the task of NeRF inpainting.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
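
For readers unfamiliar with the K-fold cross-validation mentioned in the survey results, the following Python sketch shows how a training set can be split into K folds so that every sample is used for validation exactly once. The helper name `kfold_indices` is an assumption for illustration; the survey does not prescribe any particular implementation.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(kfold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

This sketch keeps samples in their original order; in practice the indices are usually shuffled first, and for medical images the split is often done per patient so that no patient appears in both the training and validation folds.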

Attention as a guide for Simultaneous Speech Translation

Sara Papi , Matteo Negri , Marco Turchi

Categories: Natural Language Processing

2022-12-15

The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results compared to the SimulST state of the art, especially in terms of computational-aware latency.
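
The abstract does not spell out the decision rule, so the Python sketch below shows only one plausible way encoder-decoder attention could guide a simultaneous policy: a candidate token is emitted when little of its attention falls on the most recently received audio frames, and generation pauses otherwise. The function names, the `last_frames` window, the `alpha` threshold, and the toy attention values are all assumptions made for illustration, not the authors' implementation.

```python
def should_emit(attention_row, last_frames, alpha):
    """True if the token puts less than `alpha` of its attention on the newest frames."""
    if last_frames <= 0:
        raise ValueError("last_frames must be positive")
    total = sum(attention_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    recent_share = sum(attention_row[-last_frames:]) / total
    return recent_share < alpha


def emit_prefix(tokens, attention, last_frames=2, alpha=0.6):
    """Emit candidate tokens in order, stopping at the first one that should wait."""
    emitted = []
    for token, row in zip(tokens, attention):
        if not should_emit(row, last_frames, alpha):
            break  # this token still depends on audio that is just arriving
        emitted.append(token)
    return emitted


# Toy example: three candidate tokens, attention over five audio frames (made-up values).
tokens = ["Das", "ist", "gut"]
attention = [
    [0.6, 0.3, 0.1, 0.0, 0.0],  # attends to early frames
    [0.1, 0.5, 0.3, 0.1, 0.0],  # attends to middle frames
    [0.0, 0.0, 0.1, 0.4, 0.5],  # attends to the newest frames
]
print(emit_prefix(tokens, attention))
```

The intuition behind this assumed rule is that a token relying mostly on the newest frames may change once more audio arrives, so waiting trades a little latency for stability.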

Transfer Learning using Spectral Convolutional Autoencoders on Semi-Regular Surface Meshes

Sara Hahner , Felix Kerkhoff , Jochen Garcke

Categories: Computer Vision

2022-12-12

The underlying dynamics and patterns of 3D surface meshes deforming over time can be discovered by unsupervised learning, especially autoencoders, which calculate low-dimensional embeddings of the surfaces. To study the deformation patterns of unseen shapes by transfer learning, we want to train an autoencoder that can analyze new surface meshes without training a new network. Here, most state-of-the-art autoencoders cannot handle meshes of different connectivity and therefore have limited to no generalization capacities to new meshes. Also, reconstruction errors strongly increase in comparison to the errors for the training shapes. To address this, we propose a novel spectral CoSMA (Convolutional Semi-Regular Mesh Autoencoder) network. This patch-based approach is combined with a surface-aware training. It reconstructs surfaces not presented during training and generalizes the deformation behavior of the surfaces' patches. The novel approach reconstructs unseen meshes from different datasets in superior quality compared to state-of-the-art autoencoders that have been trained on these shapes. Our transfer learning errors on unseen shapes are 40% lower than those from models learned directly on the data. Furthermore, baseline autoencoders detect deformation patterns of unseen mesh sequences only for the whole shape. In contrast, due to the employed regional patches and stable reconstruction quality, we can localize where on the surfaces these deformation patterns manifest.

Testing GLOM's ability to infer wholes from ambiguous parts

Laura Culp , Sara Sabour , Geoffrey E. Hinton

Categories: Computer Vision | Machine Learning

2022-11-29

The GLOM architecture proposed by Hinton [2021] is a recurrent neural network for parsing an image into a hierarchy of wholes and parts. When a part is ambiguous, GLOM assumes that the ambiguity can be resolved by allowing the part to make multi-modal predictions for the pose and identity of the whole to which it belongs and then using attention to similar predictions coming from other possibly ambiguous parts to settle on a common mode that is predicted by several different parts. In this study, we describe a highly simplified version of GLOM that allows us to assess the effectiveness of this way of dealing with ambiguity. Our results show that, with supervised training, GLOM is able to successfully form islands of very similar embedding vectors for all of the locations occupied by the same object and it is also robust to strong noise injections in the input and to out-of-distribution input transformations.
