关节2D心脏分割和3D体积重建是建立统计心脏解剖模型的基础,并了解运动模式的功能机制。但是,由于CINE MR和高主体间方差的平面分辨率低,精确分割心脏图像并重建3D体积是具有挑战性的。在这项研究中,我们提出了一个基于潜在空间的端到端框架DeepRecon,该框架会产生多个临床上基本的结果,包括准确的图像分割,合成高分辨率3D图像和3D重建体积。我们的方法确定了Cine图像的最佳潜在表示,其中包含心脏结构的准确语义信息。特别是,我们的模型共同生成具有准确的语义信息的合成图像,并使用最佳潜在表示对心脏结构进行分割。我们进一步探索了3D形状重建和4D运动模式通过不同的潜在空间操纵策略进行适应的下游应用。同时生成的高分辨率图像具有评估心脏形状和运动的高可解释价值。实验性结果证明了我们的有效性在多个方面的方法,包括2D分割,3D重建,下游4D运动模式适应性。
translated by 谷歌翻译
组合来自多视图图像的信息对于提高自动化方法的疾病诊断方法的性能和鲁棒性至关重要。但是,由于多视图图像的非对齐特性,跨视图的构建相关性和数据融合在很大程度上仍然是一个开放的问题。在这项研究中,我们提出了输血,这是一种基于变压器的体系结构,可使用卷积层和强大的注意机制合并不同的多视图成像信息。特别是,针对丰富的跨视图上下文建模和语义依赖性挖掘,提出了发散的融合注意(DIFA)模块,以解决从不同图像视图中捕获未对齐数据之间的长期相关性的关键问题。我们进一步提出了多尺度注意(MSA),以收集多尺度特征表示的全局对应关系。我们评估了心脏MRI(M \&MS-2)挑战队列中多疾病,多视图\&多中心右心室分段的输血。输血表明了针对最先进方法的领先绩效,并为多视图成像集成的新观点打开了稳健的医学图像分割。
translated by 谷歌翻译
作为新一代神经体系结构的变形金刚在自然语言处理和计算机视觉方面表现出色。但是,现有的视觉变形金刚努力使用有限的医学数据学习,并且无法概括各种医学图像任务。为了应对这些挑战,我们将Medformer作为数据量表变压器呈现为可推广的医学图像分割。关键设计结合了理想的电感偏差,线性复杂性的层次建模以及以空间和语义全局方式以线性复杂性的关注以及多尺度特征融合。 Medformer可以在不预训练的情况下学习微小至大规模的数据。广泛的实验表明,Medformer作为一般分割主链的潜力,在三个具有多种模式(例如CT和MRI)和多样化的医学靶标(例如,健康器官,疾病,疾病组织和肿瘤)的三个公共数据集上优于CNN和视觉变压器。我们将模型和评估管道公开可用,为促进广泛的下游临床应用提供固体基线和无偏比较。
translated by 谷歌翻译
Multivariate time series forecasting (MTSF) is a fundamental problem in numerous real-world applications. Recently, Transformer has become the de facto solution for MTSF, especially for the long-term cases. However, except for the one forward operation, the basic configurations in existing MTSF Transformer architectures were barely carefully verified. In this study, we point out that the current tokenization strategy in MTSF Transformer architectures ignores the token uniformity inductive bias of Transformers. Therefore, the vanilla MTSF transformer struggles to capture details in time series and presents inferior performance. Based on this observation, we make a series of evolution on the basic architecture of the vanilla MTSF transformer. We vary the flawed tokenization strategy, along with the decoder structure and embeddings. Surprisingly, the evolved simple transformer architecture is highly effective, which successfully avoids the over-smoothing phenomena in the vanilla MTSF transformer, achieves a more detailed and accurate prediction, and even substantially outperforms the state-of-the-art Transformers that are well-designed for MTSF.
translated by 谷歌翻译
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results in model-free DRL-based methods for power systems, but model-free methods suffer from poor sample efficiency and training time, both critical for making state-of-the-art DRL algorithms practically applicable. DRL-agent learns an optimal policy via a trial-and-error method while interacting with the real-world environment. And it is desirable to minimize the direct interaction of the DRL agent with the real-world power grid due to its safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator where dynamic simulation is computationally intensive, lowering the training efficiency. We propose a novel model-based-DRL framework where a deep neural network (DNN)-based dynamic surrogate model, instead of a real-world power-grid or physics-based simulation, is utilized with the policy learning framework, making the process faster and sample efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We solved these issues by incorporating imitation learning to have a warm start in policy learning, reward-shaping, and multi-step surrogate loss. Finally, we achieved 97.5% sample efficiency and 87.7% training efficiency for an application to the IEEE 300-bus test system.
translated by 谷歌翻译
Optimal Transport (OT) provides a useful geometric framework to estimate the permutation matrix under unsupervised cross-lingual word embedding (CLWE) models that pose the alignment task as a Wasserstein-Procrustes problem. However, linear programming algorithms and approximate OT solvers via Sinkhorn for computing the permutation matrix come with a significant computational burden since they scale cubically and quadratically, respectively, in the input size. This makes it slow and infeasible to compute OT distances exactly for a larger input size, resulting in a poor approximation quality of the permutation matrix and subsequently a less robust learned transfer function or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP). qWP relies on a quantization step of both the source and target monolingual embedding space to estimate the permutation matrix given a cheap sampling procedure. This approach substantially improves the approximation quality of empirical OT solvers given fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the Bilingual lexicon Induction (BLI) task.
translated by 谷歌翻译
Automatic parsing of human anatomies at instance-level from 3D computed tomography (CT) scans is a prerequisite step for many clinical applications. The presence of pathologies, broken structures or limited field-of-view (FOV) all can make anatomy parsing algorithms vulnerable. In this work, we explore how to exploit and conduct the prosperous detection-then-segmentation paradigm in 3D medical data, and propose a steerable, robust, and efficient computing framework for detection, identification, and segmentation of anatomies in CT scans. Considering complicated shapes, sizes and orientations of anatomies, without lose of generality, we present the nine degrees-of-freedom (9-DoF) pose estimation solution in full 3D space using a novel single-stage, non-hierarchical forward representation. Our whole framework is executed in a steerable manner where any anatomy of interest can be directly retrieved to further boost the inference efficiency. We have validated the proposed method on three medical imaging parsing tasks of ribs, spine, and abdominal organs. For rib parsing, CT scans have been annotated at the rib instance-level for quantitative evaluation, similarly for spine vertebrae and abdominal organs. Extensive experiments on 9-DoF box detection and rib instance segmentation demonstrate the effectiveness of our framework (with the identification rate of 97.0% and the segmentation Dice score of 90.9%) in high efficiency, compared favorably against several strong baselines (e.g., CenterNet, FCOS, and nnU-Net). For spine identification and segmentation, our method achieves a new state-of-the-art result on the public CTSpine1K dataset. Last, we report highly competitive results in multi-organ segmentation at FLARE22 competition. Our annotations, code and models will be made publicly available at: https://github.com/alibaba-damo-academy/Med_Query.
translated by 谷歌翻译
Both goal-agnostic and goal-oriented tasks have practical value for robotic grasping: goal-agnostic tasks target all objects in the workspace, while goal-oriented tasks aim at grasping pre-assigned goal objects. However, most current grasping methods are only better at coping with one task. In this work, we propose a bifunctional push-grasping synergistic strategy for goal-agnostic and goal-oriented grasping tasks. Our method integrates pushing along with grasping to pick up all objects or pre-assigned goal objects with high action efficiency depending on the task requirement. We introduce a bifunctional network, which takes in visual observations and outputs dense pixel-wise maps of Q values for pushing and grasping primitive actions, to increase the available samples in the action space. Then we propose a hierarchical reinforcement learning framework to coordinate the two tasks by considering the goal-agnostic task as a combination of multiple goal-oriented tasks. To reduce the training difficulty of the hierarchical framework, we design a two-stage training method to train the two types of tasks separately. We perform pre-training of the model in simulation, and then transfer the learned model to the real world without any additional real-world fine-tuning. Experimental results show that the proposed approach outperforms existing methods in task completion rate and grasp success rate with less motion number. Supplementary material is available at https: //github.com/DafaRen/Learning_Bifunctional_Push-grasping_Synergistic_Strategy_for_Goal-agnostic_and_Goal-oriented_Tasks
translated by 谷歌翻译
Facial attractiveness prediction (FAP) aims to assess the facial attractiveness automatically based on human aesthetic perception. Previous methods using deep convolutional neural networks have boosted the performance, but their giant models lead to a deficiency in flexibility. Besides, most of them fail to take full advantage of the dataset. In this paper, we present a novel end-to-end FAP approach integrating dual label distribution and lightweight design. To make the best use of the dataset, the manual ratings, attractiveness score, and standard deviation are aggregated explicitly to construct a dual label distribution, including the attractiveness distribution and the rating distribution. Such distributions, as well as the attractiveness score, are optimized under a joint learning framework based on the label distribution learning (LDL) paradigm. As for the lightweight design, the data processing is simplified to minimum, and MobileNetV2 is selected as our backbone. Extensive experiments are conducted on two benchmark datasets, where our approach achieves promising results and succeeds in striking a balance between performance and efficiency. Ablation studies demonstrate that our delicately designed learning modules are indispensable and correlated. Additionally, the visualization indicates that our approach is capable of perceiving facial attractiveness and capturing attractive facial regions to facilitate semantic predictions.
translated by 谷歌翻译
We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art up to 90% noise ratio.
translated by 谷歌翻译