Video, as a key driver of the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), is beginning to attract widespread attention. Nevertheless, only a few loosely related surveys exist on this topic, and a dedicated venue for collecting and summarizing the latest advances in EVA is highly desired by the community. Moreover, owing to the rapid development of this domain, the basic concepts of EVA (e.g., its definition and architectures) remain ambiguous or neglected by these surveys, and a thorough clarification is needed to build consensus on them. To fill these gaps, we conduct a comprehensive survey of recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice, as well as the bottlenecks, faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a substantial portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
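Two of the practices reported above, k-fold cross-validation and ensembling of identically configured models, can be illustrated with a minimal scikit-learn sketch; the dataset, model choice, and fold count below are arbitrary assumptions, not taken from any surveyed solution.

```python
# Minimal sketch of k-fold cross-validation plus ensembling of
# identically configured models, two practices reported in the survey.
# This is an illustrative example, not code from any surveyed solution.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_models, fold_scores = [], []
for train_idx, val_idx in kf.split(X):
    model = RandomForestClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))
    fold_models.append(model)

print(f"mean CV accuracy: {np.mean(fold_scores):.3f}")

# Ensemble by averaging the predicted probabilities of all fold models.
probs = np.mean([m.predict_proba(X) for m in fold_models], axis=0)
ensemble_pred = probs.argmax(axis=1)
```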
Flow analysis carried out using phase-contrast cardiac magnetic resonance imaging (PC-CMR) enables the quantification of important parameters used in the assessment of cardiovascular function. An essential part of this analysis is the identification of the correct CMR views and quality control (QC) to detect artefacts that could affect flow quantification. We propose a novel deep learning based framework for the fully automated analysis of flow from full CMR scans, which first carries out these view selection and QC steps using two sequential convolutional neural networks, followed by automatic aorta and pulmonary artery segmentation to enable the quantification of key flow parameters. Accuracy values of 0.958 and 0.914 were obtained for view classification and QC, respectively. For segmentation, Dice scores were >0.969, and Bland-Altman plots indicated high agreement between manual and automatic peak flow values. In addition, we tested the pipeline on an external validation dataset, with results indicating its robustness. This work was carried out using multi-vendor clinical data consisting of 986 cases, indicating the potential for the use of this pipeline in a clinical setting.
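The staged control flow described above (view selection, then QC, then segmentation, then flow quantification) might be organized roughly as follows; every function here is a hypothetical placeholder rather than the paper's actual models, and only the peak-flow arithmetic (mean in-mask velocity times vessel area, maximized over the cycle) reflects the standard computation.

```python
# Minimal sketch of a staged flow-analysis pipeline as described above:
# view selection -> QC -> segmentation -> flow quantification.
# All function names are hypothetical placeholders, not from the paper.
import numpy as np

def classify_view(image: np.ndarray) -> str:
    """Placeholder for the first CNN: returns the CMR view label."""
    return "aortic_flow"  # stand-in prediction

def passes_qc(image: np.ndarray) -> bool:
    """Placeholder for the second CNN: flags artefacts affecting flow."""
    return True  # stand-in prediction

def segment_vessel(image: np.ndarray) -> np.ndarray:
    """Placeholder segmentation: returns a binary mask of the vessel."""
    return np.ones_like(image, dtype=bool)

def peak_flow(velocity_frames, masks, pixel_area_cm2=0.01):
    """Flow per frame = mean in-mask velocity (cm/s) * vessel area (cm^2);
    the peak over the cardiac cycle is returned in ml/s."""
    flows = [v[m].mean() * m.sum() * pixel_area_cm2
             for v, m in zip(velocity_frames, masks)]
    return max(flows)

# One magnitude image plus per-frame velocity maps from a PC-CMR series.
magnitude = np.random.rand(192, 192)
velocities = [np.random.rand(192, 192) * 100 for _ in range(30)]

if classify_view(magnitude) == "aortic_flow" and passes_qc(magnitude):
    masks = [segment_vessel(f) for f in velocities]
    print(f"peak flow: {peak_flow(velocities, masks):.1f} ml/s")
```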
Introduction: Artificial intelligence (AI) has the potential to facilitate the automation of CMR analysis for biomarker extraction. However, most AI algorithms are trained on a specific input domain (e.g., a single scanner vendor or a hospital-tailored imaging protocol) and lack the robustness to perform optimally when applied to CMR data from other input domains. Methods: Our proposed framework consists of an AI-based algorithm for biventricular segmentation of short-axis images, followed by a post-analysis quality control to detect erroneous results. The segmentation algorithm was trained on a large cohort of clinical CMR scans from two NHS hospitals (n=2793) and validated on additional cases from this cohort (n=441) and on five external datasets (n=6808). The validation data included CMR scans of patients with a range of diseases, acquired at 12 different centres using CMR scanners from all major vendors. Results: Our method yielded median Dice scores over 87%, translating into median absolute errors in cardiac biomarkers within the range of inter-observer variability: <8.4 mL (left ventricle volume), <9.2 mL (right ventricle volume), <13.3 g (left ventricular mass), and <5.9% (ejection fraction) across all datasets. Stratification of cases according to cardiac disease phenotype and scanner vendor showed good agreement. Conclusions: We show that our proposed tool, which combines a state-of-the-art AI algorithm trained on a large-scale multi-domain CMR dataset with post-analysis quality control, allows us to robustly derive cardiac biomarkers across multiple centres, vendors, and cardiac diseases. This is a fundamental step for the clinical translation of AI algorithms. Moreover, our method yields a range of additional biomarkers of cardiac function (filling and ejection rates, regional wall motion, and strain) at no extra computational cost.
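For reference, the Dice overlap metric used to report segmentation quality, together with a toy post-analysis plausibility check, can be sketched as follows; the QC threshold values below are illustrative assumptions, and the paper's actual quality control is more involved.

```python
# Minimal sketch of the Dice overlap metric and a toy post-analysis
# quality-control check; the paper's actual QC is more involved.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice = 2|A∩B| / (|A|+|B|) for binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def qc_plausible(lv_volume_ml: float) -> bool:
    """Toy QC: flag segmentations whose derived volume is implausible.
    The bounds here are illustrative, not the paper's thresholds."""
    return 20.0 <= lv_volume_ml <= 400.0

pred = np.random.rand(128, 128) > 0.5
ref = np.random.rand(128, 128) > 0.5
print(f"Dice: {dice(pred, ref):.3f}")
print("QC pass:", qc_plausible(lv_volume_ml=150.0))
```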
Based on the manifold hypothesis, real-world data often lies on a low-dimensional manifold, while normalizing flows, as likelihood-based generative models, are incapable of finding this manifold due to their structural constraints. So an interesting question arises: "Can we find sub-manifold(s) of data in normalizing flows and estimate the density of the data on the sub-manifold(s)?" In this paper, we introduce two approaches, namely per-pixel penalized log-likelihood and hierarchical training, to answer this question. We propose a single-step method for joint manifold learning and density estimation by disentangling the transformed space obtained by normalizing flows into manifold and off-manifold parts. This is done with a per-pixel penalized likelihood function for learning a sub-manifold of the data. Normalizing flows assume that the transformed data is Gaussianized, but this imposed assumption is not necessarily true, especially in high dimensions. To tackle this problem, a hierarchical training approach is employed to improve the density estimation on the sub-manifold. The results validate the superiority of the proposed methods in simultaneous manifold learning and density estimation using normalizing flows, in terms of both generated image quality and likelihood.
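To fix notation, the standard change-of-variables log-likelihood of a normalizing flow, and one plausible form of the penalized objective, are written out below; the split of z into manifold and off-manifold coordinates and the quadratic penalty with weight lambda are illustrative assumptions, and the paper's exact per-pixel penalty may differ.

```latex
% Change-of-variables log-likelihood of a normalizing flow z = f(x):
\log p_X(x) \;=\; \log p_Z\bigl(f(x)\bigr) \;+\; \log\bigl|\det J_f(x)\bigr|
% With z = (z_{\mathrm{on}}, z_{\mathrm{off}}) split into manifold and
% off-manifold coordinates, a penalized objective could take the form:
\mathcal{L}(x) \;=\; -\log p_X(x) \;+\; \lambda\,\bigl\lVert z_{\mathrm{off}} \bigr\rVert_2^2
```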
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
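As a rough illustration of how heterogeneous outputs can share a single token space, the sketch below maps continuous control values into a band of token IDs reserved past a text vocabulary; the bin count, value range, and vocabulary layout are assumptions made for illustration, not the report's exact tokenization scheme.

```python
# Rough sketch of a shared token space for text and continuous control
# values; the bin count and vocabulary layout here are illustrative
# assumptions, not the report's exact scheme.
import numpy as np

TEXT_VOCAB_SIZE = 32000   # assumed size of the text token range
NUM_BINS = 1024           # continuous values quantized into this many bins

def tokenize_continuous(values: np.ndarray, lo=-1.0, hi=1.0) -> np.ndarray:
    """Map continuous values (e.g., joint torques) to reserved token IDs."""
    clipped = np.clip(values, lo, hi)
    bins = ((clipped - lo) / (hi - lo) * (NUM_BINS - 1)).round().astype(int)
    return TEXT_VOCAB_SIZE + bins  # offset past the text vocabulary

def detokenize_continuous(tokens: np.ndarray, lo=-1.0, hi=1.0) -> np.ndarray:
    """Invert the mapping to recover approximate continuous values."""
    bins = tokens - TEXT_VOCAB_SIZE
    return lo + bins / (NUM_BINS - 1) * (hi - lo)

torques = np.array([-0.31, 0.05, 0.88])
tokens = tokenize_continuous(torques)
print(tokens, detokenize_continuous(tokens))
```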
Left ventricular (LV) function is an important factor in terms of patient management, outcome, and long-term survival of patients with heart disease. Recently published clinical guidelines for heart failure recognise that reliance on only one measure of cardiac function (LV ejection fraction) as a diagnostic and treatment stratification biomarker is suboptimal. Recent advances in AI-based echocardiography analysis have shown excellent results on automated estimation of LV volumes and LV ejection fraction. However, from a time-varying 2D echocardiography acquisition, a richer description of cardiac function can be obtained by estimating functional biomarkers from the complete cardiac cycle. In this work, we propose, for the first time, an AI approach for deriving advanced biomarkers of systolic and diastolic LV function from 2D echocardiography based on segmentations of the full cardiac cycle. These biomarkers will allow clinicians to obtain a much richer picture of the heart in health and disease. The AI model is based on the 'nnU-Net' framework and was trained and tested using four different databases. Results show excellent agreement between manual and automated analysis and demonstrate the potential of advanced systolic and diastolic biomarkers for patient stratification. Finally, for a subset of 50 cases, we performed a correlation analysis between clinical biomarkers derived from echocardiography and from CMR, and we showed excellent agreement between the two modalities.
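To make the full-cycle idea concrete, several of the biomarkers mentioned above can be read directly off a left-ventricular volume curve; in the sketch below, the formulas for ejection fraction and peak filling/ejection rates are standard, while the volume curve itself is synthetic.

```python
# Minimal sketch: deriving cycle-level biomarkers from an LV volume
# curve, as enabled by full-cardiac-cycle segmentation. The synthetic
# curve below stands in for per-frame segmentation-derived volumes.
import numpy as np

t = np.linspace(0.0, 1.0, 50)                 # one cardiac cycle (s)
volume = 80 + 40 * np.cos(2 * np.pi * t)      # LV volume over time (ml)

edv, esv = volume.max(), volume.min()         # end-diastolic / end-systolic
ef = 100.0 * (edv - esv) / edv                # ejection fraction (%)

dvdt = np.gradient(volume, t)                 # rate of volume change (ml/s)
peak_ejection_rate = -dvdt.min()              # fastest emptying
peak_filling_rate = dvdt.max()                # fastest filling

print(f"EF: {ef:.1f}%  PER: {peak_ejection_rate:.0f} ml/s  "
      f"PFR: {peak_filling_rate:.0f} ml/s")
```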
Road networks are the core infrastructure for connected and autonomous vehicles, but creating meaningful representations of them for machine learning applications is a challenging task. In this work, we propose to integrate remote sensing vision data into road network data for improved embeddings with graph neural networks. We present a segmentation of road edges based on spatio-temporal road and traffic characteristics, which allows us to enrich the attribute set of road networks with visual features from satellite imagery and digital surface models. We show that both the segmentation and the integration of vision data can improve performance on a road type classification task, and we achieve state-of-the-art performance on the OSM+DiDi Chuxing dataset from Chengdu, China.
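One way to picture the enrichment step is a message-passing layer whose node features concatenate road attributes with visual descriptors; the sketch below is a generic GCN-style layer in NumPy with assumed feature dimensions, not the paper's actual architecture.

```python
# Generic GCN-style layer over a road graph whose node features
# concatenate road attributes with visual descriptors from satellite
# imagery; dimensions and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 6
road_feats = rng.random((n_nodes, 4))     # e.g., speed limit, lane count
visual_feats = rng.random((n_nodes, 8))   # e.g., CNN features of imagery
X = np.concatenate([road_feats, visual_feats], axis=1)

A = rng.integers(0, 2, (n_nodes, n_nodes))
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency
A_hat = A + np.eye(n_nodes)               # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

W = rng.random((X.shape[1], 16))
H = np.maximum(A_norm @ X @ W, 0.0)       # one GCN layer with ReLU
print(H.shape)                            # (6, 16) node embeddings
```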
There exists a gap between the signals provided by pacemakers (i.e., intracardiac electrograms (EGM)) and the signals doctors use for diagnosing abnormal rhythms (i.e., 12-lead electrocardiograms (ECG)). Consequently, the former, even when transmitted remotely, are not sufficient for doctors to provide a precise diagnosis, let alone a timely intervention. To close this gap and take a heuristic step towards real-time intervention in instant response to irregular and infrequent ventricular rhythms, we propose a new framework dubbed RT-RCG to automatically search for (1) efficient deep neural network (DNN) structures and then (2) corresponding accelerators, enabling real-time and high-quality reconstruction of ECG signals from EGM signals. Specifically, RT-RCG proposes a new DNN search space tailored for ECG reconstruction from EGM signals, and incorporates a differentiable accelerator search (DAS) engine to efficiently navigate the large and discrete accelerator design space and generate optimized accelerators. Extensive experiments and ablation studies under various settings consistently validate the effectiveness of RT-RCG. To the best of our knowledge, RT-RCG is the first work to leverage neural architecture search (NAS) to simultaneously tackle both reconstruction efficacy and efficiency.
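The differentiable-search idea can be illustrated with the standard continuous relaxation over candidate operations (a DARTS-style softmax mixture); the PyTorch sketch below is a generic illustration and does not reproduce RT-RCG's actual search space or its DAS engine.

```python
# Generic DARTS-style continuous relaxation over candidate operations,
# illustrating the differentiable-search idea behind frameworks like
# RT-RCG; this is not RT-RCG's actual search space or DAS engine.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.Conv1d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        # Architecture parameters: one logit per candidate op.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # The softmax-weighted sum keeps the choice differentiable, so
        # the alphas can be optimized by gradient descent alongside the
        # ordinary network weights.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(2, 8, 64)          # (batch, channels, signal length)
op = MixedOp(channels=8)
loss = op(x).pow(2).mean()
loss.backward()                    # gradients flow into op.alpha too
print(op.alpha.grad)
```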
Generally, speech processing models consist of a language model as well as an acoustic model. Regardless of the language model's complexity and variants, three critical pre-processing steps are needed for the language model: cleaning, normalization, and tokenization. Among these steps, normalization is necessary for formatting unification in pure-text applications. However, for embedded language models in speech processing modules, normalization is not limited to formatting unification: it must also convert every readable symbol, number, etc. into the way it is pronounced. To the best of our knowledge, there is no Persian normalization toolkit for embedded language models in speech processing modules, so in this paper we propose an open-source normalization toolkit for text processing in speech applications. In brief, we consider the different kinds of readable Persian text, such as symbols (common currency signs, #, @, URLs, etc.) and numbers (dates, times, phone numbers, national ID codes, etc.). Comparison with other available Persian text normalization tools demonstrates the superiority of the proposed method for speech processing. Furthermore, comparing the model's performance with that of other common natural language libraries, such as Hazm and Parsivar, indicates the proper performance of the proposed method, and its evaluation on Persian Wikipedia data confirms this.
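The pronunciation-oriented normalization described above amounts to rule-based rewriting of readable tokens into their spoken forms; the sketch below uses English stand-ins for the Persian verbalization rules, and all mappings and function names are illustrative, not from the toolkit.

```python
# Rule-based sketch of pronunciation-oriented normalization: readable
# symbols and numbers are rewritten as they would be spoken. English
# stand-ins are used here; the toolkit's Persian rules are analogous.
import re

SYMBOLS = {"%": "percent", "@": "at", "#": "hashtag", "$": "dollars"}

NUM_WORDS = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]

def verbalize_digits(match: re.Match) -> str:
    """Read a digit string out digit by digit (toy rule)."""
    return " ".join(NUM_WORDS[int(d)] for d in match.group())

def normalize_for_speech(text: str) -> str:
    for sym, spoken in SYMBOLS.items():
        text = text.replace(sym, f" {spoken} ")
    text = re.sub(r"\d+", verbalize_digits, text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_for_speech("Call 911 for 20% off #deals"))
# -> "Call nine one one for two zero percent off hashtag deals"
```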