The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures like retinal vessel network (RVN) and foveal avascular zone (FAZ), which are of great value in ophthalmic and systematic diseases evaluation. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) by combining an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single/multi-task performances for RVN or/and FAZ segmentation. Especially, the evaluation metrics on multi-task models outperform single-task models on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the vacancy of a pixel-level fully-labeled OCTA dataset. OMSN multi-task segmentation model retrained with FAROS further certifies its outstanding accuracy for simultaneous RVN and FAZ segmentation.
translated by 谷歌翻译
Prostate cancer (PCa) is one of the most prevalent cancers in men and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient compared to multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise in artificial intelligence (AI) algorithms are enabling unprecedented improvements in providing decision support systems that can aid in csPCa diagnosis and understanding. However, existing state of the art AI algorithms which are based on deep learning technology are often limited to 2D images that fails to capture inter-slice correlations in 3D volumetric images. The use of 3D convolutional neural networks (CNNs) partly overcomes this limitation, but it does not adapt to the anisotropy of images, resulting in sub-optimal semantic representation and poor generalization. Furthermore, due to the limitation of the amount of labelled data of bpMRI and the difficulty of labelling, existing CNNs are built on relatively small datasets, leading to a poor performance. To address the limitations identified above, we propose a new Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptatively fuses multiple 2D, 2.5D and 3D CNNs to effectively balance representation for sparse inter-slice information and dense intra-slice information in bpMRI. A self-supervised learning (SSL) technique is further introduced to pre-train our network using unlabelled data to learn the generalizable image features. Furthermore, we constrained our network to understand the zonal specific domain knowledge to improve the diagnosis precision of csPCa. Experiments on the PI-CAI Challenge dataset demonstrate our proposed method achieves better performance for csPCa detection and diagnosis in bpMRI.
translated by 谷歌翻译
Learning from changing tasks and sequential experience without forgetting the obtained knowledge is a challenging problem for artificial neural networks. In this work, we focus on two challenging problems in the paradigm of Continual Learning (CL) without involving any old data: (i) the accumulation of catastrophic forgetting caused by the gradually fading knowledge space from which the model learns the previous knowledge; (ii) the uncontrolled tug-of-war dynamics to balance the stability and plasticity during the learning of new tasks. In order to tackle these problems, we present Progressive Learning without Forgetting (PLwF) and a credit assignment regime in the optimizer. PLwF densely introduces model functions from previous tasks to construct a knowledge space such that it contains the most reliable knowledge on each task and the distribution information of different tasks, while credit assignment controls the tug-of-war dynamics by removing gradient conflict through projection. Extensive ablative experiments demonstrate the effectiveness of PLwF and credit assignment. In comparison with other CL methods, we report notably better results even without relying on any raw data.
translated by 谷歌翻译
In this paper we present a novel multi-attribute face manipulation method based on textual descriptions. Previous text-based image editing methods either require test-time optimization for each individual image or are restricted to single attribute editing. Extending these methods to multi-attribute face image editing scenarios will introduce undesired excessive attribute change, e.g., text-relevant attributes are overly manipulated and text-irrelevant attributes are also changed. In order to address these challenges and achieve natural editing over multiple face attributes, we propose a new decoupling training scheme where we use group sampling to get text segments from same attribute categories, instead of whole complex sentences. Further, to preserve other existing face attributes, we encourage the model to edit the latent code of each attribute separately via an entropy constraint. During the inference phase, our model is able to edit new face images without any test-time optimization, even from complex textual prompts. We show extensive experiments and analysis to demonstrate the efficacy of our method, which generates natural manipulated faces with minimal text-irrelevant attribute editing. Code and pre-trained model will be released.
translated by 谷歌翻译
变压器验证引起了机器学习研究和行业的越来越多的关注。它正式验证了变压器对对抗性攻击的鲁棒性,例如用同义词交换单词。但是,由于以中线为中心的计算,变压器验证的性能仍然不令人满意,这与标准神经网络有显着差异。在本文中,我们提出了信仰,这是用于GPU的变压器验证的有效框架。我们首先提出一个语义意识的计算图转换,以识别语义信息,例如变压器验证中的结合计算。我们利用此类语义信息,以在计算图级别启用有效的内核融合。其次,我们提出了一个验证专门的内核手工艺品,以有效地将变压器验证映射到现代GPU。该手工艺者利用了一组GPU硬件支持,以加速通常是内存密集型的验证专业操作。第三,我们提出了一个专家指导的自动调整,以纳入有关GPU后端的专家知识,以促进大型搜索空间探索。广泛的评估表明,Faith在最先进的框架上实现了$ 2.1 \ times $至$ 3.4 \ times $($ 2.6 \ times $)的加速。
translated by 谷歌翻译
现有的基于视频的人重新识别(REID)的方法主要通过功能提取器和功能聚合器来了解给定行人的外观特征。但是,当不同的行人外观相似时,外观模型将失败。考虑到不同的行人具有不同的步行姿势和身体比例,我们建议学习视频检索的外观功能之外的歧视性姿势功能。具体而言,我们实现了一个两分支的体系结构,以单独学习外观功能和姿势功能,然后将它们串联在一起进行推理。为了学习姿势特征,我们首先通过现成的姿势检测器检测到每个框架中的行人姿势,并使用姿势序列构建时间图。然后,我们利用复发图卷积网络(RGCN)来学习时间姿势图的节点嵌入,该姿势图设计了一种全局信息传播机制,以同时实现框内节点的邻域聚集,并在框架间图之间传递消息。最后,我们提出了一种由节点注意和时间注意的双重意见方法,以从节点嵌入中获得时间图表示,其中采用自我注意机制来了解每个节点和每个帧的重要性。我们在三个基于视频的REID数据集(即火星,Dukemtmc和Ilids-Vid)上验证了所提出的方法,其实验结果表明,学习的姿势功能可以有效地改善现有外观模型的性能。
translated by 谷歌翻译
人类对象相互作用(HOI)检测的任务目标是人类与环境相互作用的细粒度视觉解析,从而实现了广泛的应用。先前的工作证明了有效的体系结构设计和相关线索的集成的好处,以进行更准确的HOI检测。但是,现有方法的设计适当的预训练策略的设计仍未得到充实。为了解决这一差距,我们提出了关系语言图像预训练(RLIP),这是一种利用实体和关系描述的对比预训练的策略。为了有效利用此类预训练,我们做出了三个技术贡献:(1)一种新的并行实体检测和顺序关系推理(Parse)体系结构,可在整体优化的预训练期间使用实体和关系描述; (2)合成数据生成框架,标签序列扩展,扩展了每个Minibatch中可用的语言数据的规模; (3)解释歧义,关系质量标签和关系伪标签的机制,以减轻训练数据中模棱两可/嘈杂样本的影响。通过广泛的实验,我们证明了这些贡献的好处,共同称为rlip-parse,以改善零射击,很少射击和微调的HOI检测性能以及从噪音注释中学习的鲁棒性。代码将在\ url {https://github.com/jacobyuan7/rlip}上找到。
translated by 谷歌翻译
最近,深度加固学习(RL)在机器人操作应用中表现出了一些令人印象深刻的成功。但是,由于样本效率和安全性问题,现实世界中的培训机器人是不平凡的。提出了SIM到现实的转移来解决上述问题,但引入了一个名为“现实差距”的新问题。在这项工作中,我们通过使用单个摄像头的输入来解决上述问题,为基于视觉的组装任务引入SIM模型学习框架,并在模拟环境中进行培训。我们提出了一种基于循环一致的生成对抗网络(CycleGAN)和力量控制转移方法来弥合现实差距的域适应方法。我们证明,在模拟环境中训练有训练的拟议框架可以成功地转移到真实的孔洞设置中。
translated by 谷歌翻译
我们研究了一个实用的问题,但尚未探讨问题:从不同飞行高度的角度来看,无人机如何在环境中感知。与始终从地面角度进行感知的自动驾驶不同,由于特定的任务,飞行无人机可能会灵活地改变其飞行高度,这需要能力才能使视点不变感知。为了减少飞行数据注释的努力,我们考虑了一种地面到意见知识蒸馏方法,同时仅使用地面视点的标记数据和飞行视点的未标记数据。为此,我们提出了一个渐进的半监督学习框架,该框架具有四个核心组成部分:一个密集的观点采样策略,将垂直飞行高度的范围分配为一组均匀分布的小部分,在每个高度下,我们采样了从该角度来看的数据;最近的邻居伪标记,以在前一个视点上学习的模型来注入最近的邻居视点的标签; MixView在不同观点之间生成增强图像以减轻观点差异;以及逐渐学习的渐进蒸馏策略,直到达到最大飞行高度为止。我们收集一个合成的数据集和一个现实世界数据集,我们进行了广泛的实验,以表明我们的方法为不同的飞行高度带来了有希望的结果。
translated by 谷歌翻译
估计到达时间(ETA)预测时间(也称为旅行时间估计)是针对各种智能运输应用程序(例如导航,路线规划和乘车服务)的基本任务。为了准确预测一条路线的旅行时间,必须考虑到上下文和预测因素,例如空间 - 周期性的互动,驾驶行为和交通拥堵传播的推断。先前在百度地图上部署的ETA预测模型已经解决了时空相互作用(constgat)和驾驶行为(SSML)的因素。在这项工作中,我们专注于建模交通拥堵传播模式以提高ETA性能。交通拥堵的传播模式建模具有挑战性,它需要考虑到随着时间的推移影响区域的影响区域,以及延迟变化随时间变化的累积影响,这是由于道路网络上的流量事件引起的。在本文中,我们提出了一个实用的工业级ETA预测框架,名为Dueta。具体而言,我们基于交通模式的相关性构建了一个对拥堵敏感的图,并开发了一种路线感知图形变压器,以直接学习路段的长距离相关性。该设计使Dueta能够捕获空间遥远但与交通状况高度相关的路段对之间的相互作用。广泛的实验是在从百度地图收集的大型现实世界数据集上进行的。实验结果表明,ETA预测可以从学习的交通拥堵传播模式中显着受益。此外,Dueta已经在Baidu Maps的生产中部署,每天都有数十亿个请求。这表明Dueta是用于大规模ETA预测服务的工业级和强大的解决方案。
translated by 谷歌翻译