3D牙齿分割是数字正畸技术的重要任务。已经提出了几种深度学习方法,用于从3D牙科模型或口腔内扫描中进行自动牙齿分割。这些方法需要注释的3D口内扫描。手动注释3D口腔内扫描是一项费力的任务。一种方法是设计自学方法来减少手动标签工作。与其他类型的点云数据(例如场景点云或形状点云数据)相比,3D牙齿点云数据具有非常规定的结构和强大的形状。我们查看可以从单个3D口内扫描中学到多少代表性信息。我们借助十种不同的方法来定量评估,其中六种是通用点云分割方法,而其他四种是特定于牙齿分割的方法。令人惊讶的是,我们发现,在单个3D口内扫描训练中,骰子得分可以高达0.86,而完整的训练组可得分为0.94。我们得出的结论是,分割方法可以从单个3D牙齿点云扫描中学习大量信息,例如数据增强。我们是第一个从单个3D口内扫描中进行定量评估并证明深度学习方法的表示能力的人。这可以通过最大程度地利用可用的数据来实现在极端数据限制方案下构建牙齿分割的自学方法。
translated by 谷歌翻译
The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
translated by 谷歌翻译
In recent years several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in the representations of the environments, problem decomposition, and experimental evaluation. In this work, we compare the state-of-the-art Deep Reinforcement Learning based approaches with Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem. We adapt the (POMDP) sub-goal framework proposed by [1] and modify the component that estimates frontier properties by using partial semantic maps of indoor scenes built from images' semantic segmentation. In addition to the well-known completeness of the model-based approach, we demonstrate that it is robust and efficient in that it leverages informative, learned properties of the frontiers compared to an optimistic frontier-based planner. We also demonstrate its data efficiency compared to the end-to-end deep reinforcement learning approaches. We compare our results against an optimistic planner, ANS and DD-PPO on Matterport3D dataset using the Habitat Simulator. We show comparable, though slightly worse performance than the SOTA DD-PPO approach, yet with far fewer data.
translated by 谷歌翻译
Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
translated by 谷歌翻译
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi across 2 different dimensions (a) The creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources and an Interactive Voice Response (IVR) based mass awareness platform; (b) Enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable it's edge deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of utilizing the developed machine translation model to provide assistance to volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet.
translated by 谷歌翻译
Tumor segmentation in histopathology images is often complicated by its composition of different histological subtypes and class imbalance. Oversampling subtypes with low prevalence features is not a satisfactory solution since it eventually leads to overfitting. We propose to create synthetic images with semantically-conditioned deep generative networks and to combine subtype-balanced synthetic images with the original dataset to achieve better segmentation performance. We show the suitability of Generative Adversarial Networks (GANs) and especially diffusion models to create realistic images based on subtype-conditioning for the use case of HER2-stained histopathology. Additionally, we show the capability of diffusion models to conditionally inpaint HER2 tumor areas with modified subtypes. Combining the original dataset with the same amount of diffusion-generated images increased the tumor Dice score from 0.833 to 0.854 and almost halved the variance between the HER2 subtype recalls. These results create the basis for more reliable automatic HER2 analysis with lower performance variance between individual HER2 subtypes.
translated by 谷歌翻译
Deep Ensemble Convolutional Neural Networks has become a methodology of choice for analyzing medical images with a diagnostic performance comparable to a physician, including the diagnosis of Diabetic Retinopathy. However, commonly used techniques are deterministic and are therefore unable to provide any estimate of predictive uncertainty. Quantifying model uncertainty is crucial for reducing the risk of misdiagnosis. A reliable architecture should be well-calibrated to avoid over-confident predictions. To address this, we propose a UATTA-ENS: Uncertainty-Aware Test-Time Augmented Ensemble Technique for 5 Class PIRC Diabetic Retinopathy Classification to produce reliable and well-calibrated predictions.
translated by 谷歌翻译
这项研究开发了一个无人驾驶系统(UASS)的框架,以监测高层建筑项目中未受保护的边缘和开口附近的跌落危险系统。开发并测试了一个三步基于机器学习的框架,以检测UAS捕获的图像的护栏柱。首先,对护栏探测器进行了培训,以定位支撑护栏的职位的候选位置。由于从实际的工作现场收集的此过程中使用了图像,因此确定了几个错误检测。因此,在以下步骤中引入了其他约束,以滤除错误检测。其次,研究团队将水平线检测器应用于图像,以正确检测地板并删除离地板不近的检测。最后,由于每个帖子之间安装了护栏柱,它们之间的分布差异大致,因此它们之间的空间被估算并用于找到两个帖子之间最有可能的距离。研究团队使用了开发方法的各种组合来监视高层建筑项目的捕获图像中的护栏系统。比较精度和召回指标表明,级联分类器通过落地检测和护栏间距估计来取得更好的性能。研究结果表明,拟议的护栏识别系统可以改善护栏的评估,并促进安全工程师确定高层建筑项目中跌落危害的任务。
translated by 谷歌翻译
我们提出了一种新的抽样策略,称为Smart Active Sapling,以在生产线之外进行质量检查。根据主动学习的原则,机器学习模型决定将哪些样品发送到质量检查。一方面,由于较早发现质量违规行为,这可以最大程度地减少废料零件的产生。另一方面,质量检查成本降低了,以进行平稳运行。
translated by 谷歌翻译
预测行人运动对于开发在拥挤的环境中相互作用的社会意识的机器人至关重要。虽然社交互动环境的自然视觉观点是一种自然的观点,但轨迹预测中的大多数现有作品纯粹是在自上而下的轨迹空间中进行的。为了支持第一人称视图轨迹预测研究,我们提出了T2FPV,这是一种构建高保真的第一人称视图数据集的方法,给定真实的,自上而下的轨迹数据集;我们在ETH/UCY行人数据集上展示了我们的方法,以生成所有互动行人的以自我为中心的视觉数据。我们报告说,原始的ETH/UCY数据集中使用的鸟眼视图假设,即代理可以用完美的信息观察场景中的每个人,而不会在第一人称视图中保持;在现有作品中通常使用的每个20个磁场场景中,只有一小部分的代理都可以完全看到。我们评估现有的轨迹预测方法在不同的现实感知水平下 - 与自上而下的完美信息设置相比,位移错误增加了356%。为了促进第一人称视图轨迹预测的研究,我们发布了T2FPV-ETH数据集和软件工具。
translated by 谷歌翻译