Machine learning-based segmentation in medical imaging is widely used in clinical applications from diagnostics to radiotherapy treatment planning. Segmented medical images with ground truth are useful for investigating the properties of different segmentation performance metrics to inform metric selection. Regular geometrical shapes are often used to synthesize segmentation errors and illustrate properties of performance metrics, but they lack the complexity of anatomical variations in real images. In this study, we present a tool to emulate segmentations by adjusting the reference (truth) masks of anatomical objects extracted from real medical images. Our tool is designed to modify the defined truth contours and emulate different types of segmentation errors with a set of user-configurable parameters. We defined the ground truth objects from 230 patient images in the Glioma Image Segmentation for Radiotherapy (GLIS-RT) database. For each object, we used our segmentation synthesis tool to synthesize 10 versions of segmentation (i.e., 10 simulated segmentors or algorithms), where each version has a pre-defined combination of segmentation errors. We then applied 20 performance metrics to evaluate all synthetic segmentations. We demonstrated the properties of these metrics, including their ability to capture specific types of segmentation errors. By analyzing the intrinsic properties of these metrics and categorizing the segmentation errors, we are working toward the goal of developing a decision-tree tool for assisting in the selection of segmentation performance metrics.
translated by 谷歌翻译
近年来,随着新颖的策略和应用,神经网络一直在迅速扩展。然而,尽管不可避免地会针对关键应用程序来解决这些挑战,例如神经网络技术诸如神经网络技术中仍未解决诸如神经网络技术的挑战。已经尝试通过用符号表示来表示和嵌入域知识来克服神经网络计算中的挑战。因此,出现了神经符号学习(Nesyl)概念,其中结合了符号表示的各个方面,并将常识带入神经网络(Nesyl)。在可解释性,推理和解释性至关重要的领域中,例如视频和图像字幕,提问和推理,健康信息学和基因组学,Nesyl表现出了有希望的结果。这篇综述介绍了一项有关最先进的Nesyl方法的全面调查,其原理,机器和深度学习算法的进步,诸如Opthalmology之类的应用以及最重要的是该新兴领域的未来观点。
translated by 谷歌翻译
而不是使用当前的深度学习分段模型(如UNET和VIANTS),我们使用培训的卷积神经网络(CNN)分类器来接近分割问题,其自动从分类目标中提取重要特征以进行图像分类。这些提取的特征可以使用梯度加权类激活映射(GRAC-CAM)来可视化和形成热手。本研究测试了热量可以用来分割分类目标。我们还提出了一种热量的评价方法;也就是说,要使用加热器过滤的图像重新列车并检查其性能。我们使用平均骰子系数来评估分段结果。我们的实验结果表明,热带可以定位和分段部分肿瘤区域。但是只使用来自CNN分类器的热量可以是分割的最佳方法。此外,我们已经验证了CNN分类器的预测主要取决于肿瘤区域,毕业凸轮热插拔中的暗区也有助于分类。
translated by 谷歌翻译
精确可靠地分割医学图像对于疾病诊断和治疗是重要的。由于各种各样的物体尺寸,形状和扫​​描方式,这是一个具有挑战性的任务。最近,许多卷积神经网络(CNN)设计用于分割任务,取得了巨大的成功。然而,很少有研究完全考虑了物体的大小,因此大多数表现出对小物体分割的分割的性能不佳。这对早期检测疾病产生重大影响。本文提出了一种上下文轴向储备注意网络(Caranet),与最近最先进的模型相比,在小对象上提高小物体的分割性能。我们在脑肿瘤(Brats 2018)和息肉(Kvasir-Seg,CVC-Colondb,CVC-ClinicDB,CVC-300和ETIS-LaribpolypdB)进行测试。我们的加麻不仅达到了顶级的骰子分割精度,而且还显示出小医疗物体的分割的明显优势。
translated by 谷歌翻译
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
translated by 谷歌翻译
The traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problems has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
translated by 谷歌翻译
Event cameras that asynchronously output low-latency event streams provide great opportunities for state estimation under challenging situations. Despite event-based visual odometry having been extensively studied in recent years, most of them are based on monocular and few research on stereo event vision. In this paper, we present ESVIO, the first event-based stereo visual-inertial odometry, which leverages the complementary advantages of event streams, standard images and inertial measurements. Our proposed pipeline achieves temporal tracking and instantaneous matching between consecutive stereo event streams, thereby obtaining robust state estimation. In addition, the motion compensation method is designed to emphasize the edge of scenes by warping each event to reference moments with IMU and ESVIO back-end. We validate that both ESIO (purely event-based) and ESVIO (event with image-aided) have superior performance compared with other image-based and event-based baseline methods on public and self-collected datasets. Furthermore, we use our pipeline to perform onboard quadrotor flights under low-light environments. A real-world large-scale experiment is also conducted to demonstrate long-term effectiveness. We highlight that this work is a real-time, accurate system that is aimed at robust state estimation under challenging environments.
translated by 谷歌翻译
Human Activity Recognition (HAR) is one of the core research areas in mobile and wearable computing. With the application of deep learning (DL) techniques such as CNN, recognizing periodic or static activities (e.g, walking, lying, cycling, etc.) has become a well studied problem. What remains a major challenge though is the sporadic activity recognition (SAR) problem, where activities of interest tend to be non periodic, and occur less frequently when compared with the often large amount of irrelevant background activities. Recent works suggested that sequential DL models (such as LSTMs) have great potential for modeling nonperiodic behaviours, and in this paper we studied some LSTM training strategies for SAR. Specifically, we proposed two simple yet effective LSTM variants, namely delay model and inverse model, for two SAR scenarios (with and without time critical requirement). For time critical SAR, the delay model can effectively exploit predefined delay intervals (within tolerance) in form of contextual information for improved performance. For regular SAR task, the second proposed, inverse model can learn patterns from the time series in an inverse manner, which can be complementary to the forward model (i.e.,LSTM), and combining both can boost the performance. These two LSTM variants are very practical, and they can be deemed as training strategies without alteration of the LSTM fundamentals. We also studied some additional LSTM training strategies, which can further improve the accuracy. We evaluated our models on two SAR and one non-SAR datasets, and the promising results demonstrated the effectiveness of our approaches in HAR applications.
translated by 谷歌翻译
Given a natural language that describes the user's demands, the NL2Code task aims to generate code that addresses the demands. This is a critical but challenging task that mirrors the capabilities of AI-powered programming. The NL2Code task is inherently versatile, diverse and complex. For example, a demand can be described in different languages, in different formats, and at different levels of granularity. This inspired us to do this survey for NL2Code. In this survey, we focus on how does neural network (NN) solves NL2Code. We first propose a comprehensive framework, which is able to cover all studies in this field. Then, we in-depth parse the existing studies into this framework. We create an online website to record the parsing results, which tracks existing and recent NL2Code progress. In addition, we summarize the current challenges of NL2Code as well as its future directions. We hope that this survey can foster the evolution of this field.
translated by 谷歌翻译
In recent years, vision-centric perception has flourished in various autonomous driving tasks, including 3D detection, semantic map construction, motion forecasting, and depth estimation. Nevertheless, the latency of vision-centric approaches is too high for practical deployment (e.g., most camera-based 3D detectors have a runtime greater than 300ms). To bridge the gap between ideal research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency. Traditionally, autonomous-driving perception benchmarks perform the offline evaluation, neglecting the inference time delay. To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. On the basis of the 2Hz annotated nuScenes dataset, we first propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images. Referring to the practical deployment, the Streaming Perception Under constRained-computation (SPUR) evaluation protocol is further constructed, where the 12Hz inputs are utilized for streaming evaluation under the constraints of different computational resources. In the ASAP benchmark, comprehensive experiment results reveal that the model rank alters under different constraints, suggesting that the model latency and computation budget should be considered as design choices to optimize the practical deployment. To facilitate further research, we establish baselines for camera-based streaming 3D detection, which consistently enhance the streaming performance across various hardware. ASAP project page: https://github.com/JeffWang987/ASAP.
translated by 谷歌翻译