Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often results in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
translated by 谷歌翻译
It is no secret that deep learning models exhibit undesirable behaviors such as learning spurious correlations instead of learning correct relationships between input/output pairs. Prior works on robustness study datasets that mix low-level features to quantify how spurious correlations affect predictions instead of considering natural semantic factors due to limitations in accessing realistic datasets for comprehensive evaluation. To bridge this gap, in this paper we first investigate how natural background colors play a role as spurious features in image classification tasks by manually splitting the test sets of CIFAR10 and CIFAR100 into subgroups based on the background color of each image. We name our datasets CIFAR10-B and CIFAR100-B. We find that while standard CNNs achieve human-level accuracy, the subgroup performances are not consistent, and the phenomenon remains even after data augmentation (DA). To alleviate this issue, we propose FlowAug, a semantic DA method that leverages the decoupled semantic representations captured by a pre-trained generative flow. Experimental results show that FlowAug achieves more consistent results across subgroups than other types of DA methods on CIFAR10 and CIFAR100. Additionally, it shows better generalization performance. Furthermore, we propose a generic metric for studying model robustness to spurious correlations, where we take a macro average on the weighted standard deviations across different classes. Per our metric, FlowAug demonstrates less reliance on spurious correlations. Although this metric is proposed to study our curated datasets, it applies to all datasets that have subgroups or subclasses. Lastly, aside from less dependence on spurious correlations and better generalization on in-distribution test sets, we also show superior out-of-distribution results on CIFAR10.1 and competitive performances on CIFAR10-C and CIFAR100-C.
translated by 谷歌翻译
Visual reinforcement learning (RL), which makes decisions directly from high-dimensional visual inputs, has demonstrated significant potential in various domains. However, deploying visual RL techniques in the real world remains challenging due to their low sample efficiency and large generalization gaps. To tackle these obstacles, data augmentation (DA) has become a widely used technique in visual RL for acquiring sample-efficient and generalizable policies by diversifying the training data. This survey aims to provide a timely and essential review of DA techniques in visual RL in recognition of the thriving development in this field. In particular, we propose a unified framework for analyzing visual RL and understanding the role of DA in it. We then present a principled taxonomy of the existing augmentation techniques used in visual RL and conduct an in-depth discussion on how to better leverage augmented data in different scenarios. Moreover, we report a systematic empirical evaluation of DA-based techniques in visual RL and conclude by highlighting the directions for future research. As the first comprehensive survey of DA in visual RL, this work is expected to offer valuable guidance to this emerging field.
translated by 谷歌翻译
光学相干断层扫描(OCT)是一种非侵入性技术,可在微米分辨率中捕获视网膜的横截面区域。它已被广泛用作辅助成像参考,以检测与眼睛有关的病理学并预测疾病特征的纵向进展。视网膜层分割是至关重要的特征提取技术之一,其中视网膜层厚度的变化和由于液体的存在而引起的视网膜层变形高度相关,与多种流行性眼部疾病(如糖尿病性视网膜病)和年龄相关的黄斑疾病高度相关。变性(AMD)。但是,这些图像是从具有不同强度分布或换句话说的不同设备中获取的,属于不同的成像域。本文提出了一种分割引导的域适应方法,以将来自多个设备的图像调整为单个图像域,其中可用的最先进的预训练模型可用。它避免了即将推出的新数据集的手动标签的时间消耗以及现有网络的重新培训。网络的语义一致性和全球特征一致性将最大程度地减少许多研究人员报告的幻觉效果,这些效应对周期矛盾的生成对抗网络(Cyclegan)体系结构。
translated by 谷歌翻译
我们可以将异源图结构与文本结合在一起以学习高质量的语义和行为表示吗?图形神经网络(GNN)S编码数值节点属性和图形结构,以在各种监督的学习任务中实现令人印象深刻的性能。当前的GNN方法受到文本特征的挑战,文本特征通常需要编码为数值向量,然后再提供给GNN,这可能会导致一些信息损失。在本文中,我们提出了一个有效有效的框架,称为语言模型GNN(LM-GNN),以共同训练大型语言模型和图形神经网络。我们的框架中的有效性是通过首先使用异质图信息,然后使用GNN模型应用BERT模型的阶段微调来实现的。提出了几种系统和设计优化,以实现可扩展有效的培训。 LM-GNN可容纳节点和边缘分类以及链接预测任务。我们在不同数据集的性能中评估了LM-GNN框架,并展示了所提出方法的有效性。 LM-GNN在亚马逊查询购买应用程序中提供竞争结果。
translated by 谷歌翻译
机器学习(ML)鲁棒性和域的概括从根本上相关:它们基本上涉及对抗和自然设置下的数据分布变化。一方面,最近的研究表明,更健壮的(受对抗训练)模型更为普遍。另一方面,缺乏对其基本联系的理论理解。在本文中,我们探讨了考虑到不同因素(例如规范正规化和数据增强)(DA)等不同因素的正则化和域转移性之间的关系。我们提出了一个一般的理论框架,证明涉及模型函数类正则化的因素是相对域可传递性的足够条件。我们的分析意味着``鲁棒性''既不必需,也不足以使其可转移性;而正规化是理解域可转移性的更基本的观点。然后,我们讨论流行的DA协议(包括对抗性培训),并显示何时可以将其视为功能在某些条件下进行类正则化并因此改善了概括。我们进行了广泛的实验以验证我们的理论发现,并显示了几个反例,其中鲁棒性和概括在不同的数据集上呈负相关。
translated by 谷歌翻译
Learning models are highly dependent on data to work effectively, and they give a better performance upon training on big datasets. Massive research exists in the literature to address the dataset adequacy issue. One promising approach for solving dataset adequacy issues is the data augmentation (DA) approach. In DA, the amount of training data instances is increased by making different transformations on the available data instances to generate new correct and representative data instances. DA increases the dataset size and its variability, which enhances the model performance and its prediction accuracy. DA also solves the class imbalance problem in the classification learning techniques. Few studies have recently considered DA in the Arabic language. These studies rely on traditional augmentation approaches, such as paraphrasing by using rules or noising-based techniques. In this paper, we propose a new Arabic DA method that employs the recent powerful modeling technique, namely the AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU distances. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance of the augmented Arabic dataset. The experiments were conducted on four sentiment Arabic datasets, namely AraSarcasm, ASTD, ATT, and MOVIE. The selected datasets vary in size, label number, and unbalanced classes. The results show that the proposed methodology enhanced the Arabic sentiment text classification on all datasets with an increase in F1 score by 4% in AraSarcasm, 6% in ASTD, 9% in ATT, and 13% in MOVIE.
translated by 谷歌翻译
Researchers are doing intensive work on satellite images due to the information it contains with the development of computer vision algorithms and the ease of accessibility to satellite images. Building segmentation of satellite images can be used for many potential applications such as city, agricultural, and communication network planning. However, since no dataset exists for every region, the model trained in a region must gain generality. In this study, we trained several models in China and post-processing work was done on the best model selected among them. These models are evaluated in the Chicago region of the INRIA dataset. As can be seen from the results, although state-of-art results in this area have not been achieved, the results are promising. We aim to present our initial experimental results of a building segmentation from satellite images in this study.
translated by 谷歌翻译
Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained by data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics' community to take advantage of these advances.
translated by 谷歌翻译
The field of robotics, and more especially humanoid robotics, has several established competitions with research oriented goals in mind. Challenging the robots in a handful of tasks, these competitions provide a way to gauge the state of the art in robotic design, as well as an indicator for how far we are from reaching human performance. The most notable competitions are RoboCup, which has the long-term goal of competing against a real human team in 2050, and the FIRA HuroCup league, in which humanoid robots have to perform tasks based on actual Olympic events. Having robots compete against humans under the same rules is a challenging goal, and, we believe that it is in the sport of archery that humanoid robots have the most potential to achieve it in the near future. In this work, we perform a first step in this direction. We present a humanoid robot that is capable of gripping, drawing and shooting a recurve bow at a target 10 meters away with considerable accuracy. Additionally, we show that it is also capable of shooting distances of over 50 meters.
translated by 谷歌翻译