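Multiresolution deep learning approaches, such as the U-Net architecture, have achieved high performance in classifying and segmenting images. However, these approaches do not provide a latent image representation and cannot be used to decompose, denoise, and reconstruct image data. U-Net and other convolutional neural networks (CNNs) commonly use pooling to enlarge the receptive field, which usually results in irreversible information loss. This study proposes to include a Riesz-Quincunx (RQ) wavelet transform, which combines 1) a higher-order Riesz wavelet transform and 2) orthogonal Quincunx wavelets (both of which have been used to reduce blur in medical images), inside the U-Net architecture to reduce noise in satellite images and their time series. In the transformed feature space, we propose a variational approach to understand how random perturbations of the features affect the image, in order to further reduce noise. Combining both approaches, we introduce a hybrid RQUNet-VAE scheme for image and time series decomposition, used to reduce noise in satellite imagery. We present qualitative and quantitative experimental results showing that our proposed RQUNet-VAE is more effective at reducing noise in satellite imagery than other state-of-the-art methods. We also apply our scheme to several applications for multi-band satellite images, including image denoising, image and time series decomposition by diffusion, and image segmentation.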
Translated by Google Translate
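The contrast the first abstract draws between pooling (irreversible) and an orthogonal wavelet transform (invertible) can be illustrated with a minimal sketch. It uses a one-level 1D Haar transform as a stand-in, which is an assumption made for brevity: the abstract's method uses Riesz-Quincunx wavelets, not Haar, and the function names and sample signals below are hypothetical.

```python
import math

def avg_pool(signal):
    """Average each non-overlapping pair of samples (length is halved)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the signal from its approximation and detail coefficients."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Two different (hypothetical) signals with even length.
x = [4.0, 2.0, 5.0, 7.0]
y = [3.0, 3.0, 6.0, 6.0]

# Pooling maps both signals to the same output, so x cannot be recovered.
pooled_x, pooled_y = avg_pool(x), avg_pool(y)

# The Haar transform keeps the detail coefficients, so x is recoverable
# up to floating-point rounding.
reconstructed = haar_inverse(*haar_forward(x))
```

Here `pooled_x` and `pooled_y` coincide even though `x` and `y` differ, while `reconstructed` matches `x` up to rounding. This is the property that motivates replacing pooling with an orthogonal transform.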
Diabetic Retinopathy (DR) is a leading cause of vision loss worldwide, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System, which classifies DR grades, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using the Wasserstein distance and adversarial learning-based entropy minimization. In addition, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss function constraints between lesion features and classification features, our approach can be robust to a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, which makes it possible to incorporate long-range dependencies between slices inside 3D volumes. These holistic features are further used to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structure segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
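This study presents our approach to automatic Vietnamese image captioning for the healthcare domain in the text processing tasks of the Vietnamese Language and Speech Processing (VLSP) Challenge 2021. Existing image captioning approaches typically pair an encoder architecture with a long short-term memory (LSTM) decoder to generate sentences. These models perform well on different datasets. Our proposed model also has an encoder and a decoder, but we use a Swin Transformer in the encoder and an LSTM combined with an attention module in the decoder. The study presents the training experiments and techniques we used during the competition. Our model achieves a BLEU4 score of 0.293 on the vietcap4h dataset, which ranks 3$^{rd}$ on the private leaderboard. Our code can be found at \url{https://git.io/jddjm}.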
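Algorithmic recourse aims to recommend informative feedback for overturning an unfavorable machine learning decision. In this paper, we introduce Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio. Further, we present its min-max robust counterpart, with the goal of hedging against future changes in the machine learning model parameters. The robust counterpart explicitly takes into account possible perturbations of the data in a Gaussian mixture ambiguity set prescribed using the optimal transport (Wasserstein) distance. We show that the resulting worst-case objective function can be decomposed into a series of two-dimensional optimization subproblems, and the min-max recourse finding problem is thus amenable to a gradient descent algorithm. In contrast to existing methods for generating robust recourses, robust Bayesian recourse does not require a linear approximation step. Numerical experiments demonstrate the effectiveness of our proposed robust Bayesian recourse under model shifts. Our code is available at https://github.com/vinairesearch/robust-bayesian-recourse.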
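Multi-camera multi-object tracking is currently drawing attention in the computer vision field because of its superior performance in real-world applications such as video surveillance of crowded scenes or vast spaces. In this work, we propose a mathematically elegant multi-camera multi-object tracking approach based on a spatial-temporal lifted multicut formulation. Our model uses state-of-the-art tracklets produced by single-camera trackers as proposals. Because these tracklets may contain ID-switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short- and long-range temporal interactions among tracklets located in the same camera as well as across cameras. Experiments on the WildTrack dataset yield near-perfect results, and our approach outperforms state-of-the-art trackers on Campus while being on par on the PETS-09 dataset. We will make our implementation available upon acceptance of the paper.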
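We consider the problem of joint channel assignment and power allocation in underlaid cellular vehicle-to-everything (C-V2X) systems, where multiple vehicle-to-network (V2N) uplinks share time-frequency resources with multiple vehicle-to-vehicle (V2V) platoons that enable groups of connected and autonomous vehicles to travel closely together. Because of the high user mobility in vehicular environments, traditional centralized optimization approaches that rely on global channel information may not be viable in C-V2X systems with a large number of users. Using a multi-agent reinforcement learning (RL) approach, we propose a distributed resource allocation (RA) algorithm to overcome this challenge. Specifically, we model the RA problem as a multi-agent system. Based solely on local channel information, each platoon leader acts as an agent, interacts with the other agents, and accordingly selects the optimal combination of sub-band and power level for transmitting its signals. To this end, we use the double Q-learning algorithm to jointly train the agents with the objectives of simultaneously maximizing the sum rate of the V2N links and satisfying the packet delivery probability of each V2V link within a desired latency limit. Simulation results show that our proposed RL-based algorithm achieves performance close to that of the well-known exhaustive search algorithm.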
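The double Q-learning idea named in the abstract above can be sketched in its simplest tabular form (van Hasselt's original formulation): one table selects the greedy next action and the other evaluates it, which reduces overestimation. This is only an illustration under stated assumptions, not the authors' training procedure. The state indices, the reward value, and the encoding of an action as a flattened (sub-band, power level) index are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action space: every (sub-band, power level) pair, flattened.
N_SUBBANDS, N_POWER_LEVELS = 2, 3
N_ACTIONS = N_SUBBANDS * N_POWER_LEVELS

def double_q_update(q_a, q_b, state, action, reward, next_state,
                    n_actions, alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular double Q-learning update in place."""
    # Randomly choose which table is updated; the other one evaluates.
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    # Greedy next action according to the table being updated.
    best = max(range(n_actions), key=lambda a: select[(next_state, a)])
    # Value of that action according to the *other* table.
    target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])

# Tables default to 0.0 for unseen (state, action) pairs.
q_a = defaultdict(float)
q_b = defaultdict(float)
rng = random.Random(0)

# One hypothetical transition: state 0, action 1, reward 1.0, next state 2.
double_q_update(q_a, q_b, state=0, action=1, reward=1.0,
                next_state=2, n_actions=N_ACTIONS, rng=rng)
```

After this single step, exactly one of the two tables holds the value `alpha * reward` for the pair `(0, 1)`, since all other entries are still zero. In a multi-agent setting each agent would keep its own pair of tables (or networks) and observe only local information.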
Making histopathology image classifiers robust to a wide range of real-world variability is a challenging task. Here, we describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG) to address the problem of generalization for mitosis detection in images of hematoxylin-eosin-stained histology slides under high variability (scanner, tissue type and species variability). Our approach consists of training a rotation-invariant deep learning model using aggressive data augmentation, with a training set enriched with hard negative examples and automatically selected negative examples from the unlabeled part of the challenge dataset. To optimize the performance of our models, we investigated a hard negative mining regime search procedure that led us to train our best model using a subset of image patches representing 19.6% of our training partition of the challenge dataset. Our candidate model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation on the challenge platform, the third-best overall score in the MIDOG 2022 Challenge.
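For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall over detections. The sketch below uses made-up counts purely for illustration; they are not the challenge results, and the function name is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 70 correct detections, 30 false alarms, 30 misses.
example_f1 = f1_score(tp=70, fp=30, fn=30)
```

With these counts precision and recall are both 0.7, so the F1-score is also 0.7 (up to floating-point rounding).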
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'' in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of such groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds and proportional representation. We then propose a method to explain the bias in the representation of groups using the notion of Shapley values. We conclude with an experimental study showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
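To make the detection problem in the abstract above concrete, the following sketch enumerates every group defined by a conjunction of attribute values and flags those whose count in the top-$k$ falls below a global lower bound. It is a naive brute-force baseline, exponential in the number of attributes, and is not the efficient search algorithm the abstract proposes. The attribute names, data, and thresholds are hypothetical.

```python
from itertools import combinations, product

def matches(item, pattern):
    """True if the item has every attribute value required by the pattern."""
    return all(item[attr] == value for attr, value in pattern.items())

def groups_below_bound(ranked, attrs, k, lower_bound, min_size=1):
    """List (pattern, count_in_top_k, group_size) for under-represented groups."""
    top_k = ranked[:k]
    flagged = []
    # Enumerate every non-empty subset of attributes ...
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            domains = [sorted({item[a] for item in ranked}) for a in subset]
            # ... and every combination of values for that subset.
            for values in product(*domains):
                pattern = dict(zip(subset, values))
                size = sum(1 for item in ranked if matches(item, pattern))
                in_top = sum(1 for item in top_k if matches(item, pattern))
                if size >= min_size and in_top < lower_bound:
                    flagged.append((pattern, in_top, size))
    return flagged

# Hypothetical ranking, best item first.
ranked = [
    {"gender": "F", "region": "north"},
    {"gender": "M", "region": "north"},
    {"gender": "M", "region": "south"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "south"},
    {"gender": "F", "region": "south"},
]

# Flag groups of at least 2 items that have no member in the top 4.
flagged = groups_below_bound(ranked, ["gender", "region"], k=4,
                             lower_bound=1, min_size=2)
```

In this toy ranking neither single attribute reveals a problem, but the intersectional group with `gender` F and `region` south has two members and none in the top 4. Groups like this are what make pre-defined protected groups insufficient.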