使用增强现实(AR)用于导航目的,这表明在手术手术过程中协助医生有益。这些应用通常需要知道外科手术工具和患者的姿势,以提供外科医生在任务执行过程中可以使用的视觉信息。现有的医学级跟踪系统使用放置在手术室内的红外摄像头(OR)来识别感兴趣的对象附加并计算其姿势的复古反射标记。一些市售的AR头式显示器(HMD)使用类似的摄像头进行自定位,手动跟踪和估算对象的深度。这项工作提出了一个使用AR HMD的内置摄像机来准确跟踪复古反射标记的框架,例如在手术过程中使用的标记,而无需集成任何其他组件。该框架还能够同时跟踪多个工具。我们的结果表明,横向翻译的准确度为0.09 +-0.06毫米,可以实现标记的跟踪和检测,纵向翻译的0.42 +-0.32 mm,绕垂直轴旋转的0.80 +-0.39 ver。此外,为了展示所提出的框架的相关性,我们在手术程序的背景下评估了系统的性能。该用例旨在在骨科过程中复制K-Wire插入的场景。为了进行评估,为两名外科医生和一名生物医学研究人员提供了视觉导航,每次都进行了21次注射。该用例的结果提供了与基于AR的导航程序报告的相当精度。
translated by 谷歌翻译
Covid-19的传播给世界带来了巨大的灾难,自动分割感染区域可以帮助医生快速诊断并减少工作量。但是,准确和完整的分割面临一些挑战,例如散射的感染区分布,复杂的背景噪声和模糊的分割边界。为此,在本文中,我们提出了一个新的网络,用于从CT图像(名为BCS-NET)的自动covid-19肺部感染分割,该网络考虑了边界,上下文和语义属性。 BCS-NET遵循编码器架构,更多的设计集中在解码器阶段,该阶段包括三个逐渐边界上下文 - 语义重建(BCSR)块。在每个BCSR块中,注意引导的全局上下文(AGGC)模块旨在通过突出显示重要的空间和边界位置并建模全局上下文依赖性来学习解码器最有价值的编码器功能。此外,语义指南(SG)单元通过在中间分辨率上汇总多规模的高级特征来生成语义指南图来完善解码器特征。广泛的实验表明,我们提出的框架在定性和定量上都优于现有竞争对手。
translated by 谷歌翻译
Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of trustworthy machine learning hedging against distributional uncertainty, e.g., the uncertainty of an empirical distribution compared to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why these frameworks tend to have smaller generalization errors. Specifically, first, we suggest a quantitative definition for "distributional robustness", propose the concept of "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; In addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, it can be proven that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these machine learning models, which is a new perspective to bound generalization errors, and therefore, explain the reason why distributionally robust machine learning models, Bayesian models, and regularization models tend to have smaller generalization errors.
translated by 谷歌翻译
多跳跃知识基础问题答案(KBQA)旨在在知识库中找到答案实体,这是问题中提到的主题实体的几个啤酒花。现有基于检索的方法首先从问题中生成指令,然后使用它们来指导知识图上的多跳推理。由于指令是在整个推理过程中固定的,并且在指令生成中未考虑知识图,因此一旦错误地预测中间实体,模型就无法修改其错误。为了解决这个问题,我们提出了Kbiger(知识库迭代指令生成和推理),这是一种新颖有效的方法,可以在推理图的帮助下动态生成指令。我们没有在推理之前生成所有指令,而是考虑(k-1)推理图来构建k-th指令。通过这种方式,模型可以检查图表的预测并生成新指令,以修改中间实体的不正确预测。我们对两个多跳KBQA基准测试进行实验,并胜过现有方法,并成为新州。进一步的实验表明,我们的方法确实检测到中间实体的不正确预测,并具有修改此类错误的能力。
translated by 谷歌翻译
显微镜交通模拟为自动驾驶汽车(AVS)提供了可控,可重复且有效的测试环境。为了公正地评估AVS的安全性能,在模拟自然主义驾驶环境(NDE)中,环境统计数据的概率分布必须与现实世界中驾驶环境的统计数据一致。但是,尽管人类驾驶行为已经在运输工程领域进行了广泛的研究,但大多数现有模型都是用于交通流量分析的,而无需考虑驾驶行为的分布一致性,这可能会导致AV测试的重大评估偏见。为了填补这一研究差距,本文提出了分布一致的NDE建模框架。使用大规模的自然驾驶数据,获得了经验分布,以在不同条件下构建随机的人类驾驶行为模型。为了解决仿真过程中的误差积累问题,进一步设计了一种基于优化的方法来完善经验行为模型。具体而言,车辆状态的演变被建模为马尔可夫链,其固定分布被扭曲以匹配现实世界驾驶环境的分布。在多车道高速公路驾驶模拟的案例研究中评估了该框架,其中验证了生成的NDE的分布精度,并有效地评估了AV模型的安全性能。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译