机器人外科助理(RSAs)通常用于通过专家外科医生进行微创手术。然而,长期以来充满了乏味和重复的任务,如缝合可以导致外科医生疲劳,激励缝合的自动化。随着薄反射针的视觉跟踪极具挑战性,在未反射对比涂料的情况下修改了针。作为朝向无修改针的缝合子任务自动化的步骤,我们提出了休斯顿:切换未经修改,外科手术,工具障碍针,一个问题和算法,它使用学习的主动传感策略与立体声相机本地化并对齐针头进入另一臂的可见和可访问的姿势。为了补偿机器人定位和针头感知误差,然后算法执行使用多个摄像机的高精度抓握运动。在使用Da Vinci研究套件(DVRK)的物理实验中,休斯顿成功通过了96.7%的成功率,并且能够在故障前平均地在臂32.4倍之间顺序地执行切换。在培训中看不见的针头,休斯顿实现了75-92.9%的成功率。据我们所知,这项工作是第一个研究未修改的手术针的切换。查看https://tinyurl.com/huston-surgery用于额外​​的材料。
translated by 谷歌翻译
Mirror descent is a gradient descent method that uses a dual space of parametric models. The great idea has been developed in convex optimization, but not yet widely applied in machine learning. In this study, we provide a possible way that the mirror descent can help data-driven parameter initialization of neural networks. We adopt the Hopfield model as a prototype of neural networks, we demonstrate that the mirror descent can train the model more effectively than the usual gradient descent with random parameter initialization.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
Bayesian optimization~(BO) is often used for accelerator tuning due to its high sample efficiency. However, the computational scalability of training over large data-set can be problematic and the adoption of historical data in a computationally efficient way is not trivial. Here, we exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.
translated by 谷歌翻译
Self-supervised pre-training of a speech foundation model, followed by supervised fine-tuning, has shown impressive quality improvements on automatic speech recognition (ASR) tasks. Fine-tuning separate foundation models for many downstream tasks are expensive since the foundation model is usually very big. Parameter-efficient fine-tuning methods (e.g. adapter, sparse update methods) offer an alternative paradigm where a small set of parameters are updated to adapt the foundation model to new tasks. However, these methods still suffer from a high computational memory cost and slow training speed because they require backpropagation through the entire neural network at each step. In the paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. Experimental results show that the proposed method can achieve better performance on speech recognition task than existing algorithms with fewer number of trainable parameters, less computational memory cost and faster training speed. After combining with Adapters at all layers, the proposed method can achieve the same performance as fine-tuning the whole model with $97\%$ fewer trainable encoder parameters and $53\%$ faster training speed.
translated by 谷歌翻译
Neural networks trained with ERM (empirical risk minimization) sometimes learn unintended decision rules, in particular when their training data is biased, i.e., when training labels are strongly correlated with undesirable features. To prevent a network from learning such features, recent methods augment training data such that examples displaying spurious correlations (i.e., bias-aligned examples) become a minority, whereas the other, bias-conflicting examples become prevalent. However, these approaches are sometimes difficult to train and scale to real-world data because they rely on generative models or disentangled representations. We propose an alternative based on mixup, a popular augmentation that creates convex combinations of training examples. Our method, coined SelecMix, applies mixup to contradicting pairs of examples, defined as showing either (i) the same label but dissimilar biased features, or (ii) different labels but similar biased features. Identifying such pairs requires comparing examples with respect to unknown biased features. For this, we utilize an auxiliary contrastive model with the popular heuristic that biased features are learned preferentially during training. Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples.
translated by 谷歌翻译
随着对象检测和重新识别的发展,对人类的多对象跟踪已迅速改善。但是,即使对于最先进的跟踪算法,对具有相似外观和非线性运动的人类的多演员跟踪仍然非常具有挑战性。当前基于运动的跟踪算法通常使用Kalman滤波器来预测对象的运动,但是,当目标不线性移动时,其线性运动假设可能会导致跟踪失败。对于在运动场上跟踪的多玩家来说,因为同一团队中的球员通常穿着相同的球衣,因此在短期和长期跟踪过程中,重新识别甚至更加困难。在这项工作中,我们提出了一种基于运动的跟踪算法和三个针对三项运动在内的后处理管道,包括篮球,足球和排球,我们成功地处理了运动场上球员非线性运动的跟踪。实验导致ECCV DeeperAction挑战的测试集SportsMot数据集证明了我们的方法的有效性,该方法的有效性为73.968,在2022年2022年SportsMot Workshop最终排行榜上排名第三。
translated by 谷歌翻译
会话问题 - 转移生成是一项任务,它会自动生成一个基于输入段落的大规模对话问题回答数据集。在本文中,我们介绍了一个新颖的框架,该框架从一段段落中提取了值得问候的短语,然后在考虑以前的对话时产生相应的问题。特别是,我们的框架在生成问题后修改了提取的答案,以便答案与配对的问题完全匹配。实验结果表明,我们简单的答案修订方法可显着改善合成数据的质量。此外,我们证明我们的框架可以有效地用于域的适应会话问答。
translated by 谷歌翻译
预测行人运动对于开发在拥挤的环境中相互作用的社会意识的机器人至关重要。虽然社交互动环境的自然视觉观点是一种自然的观点,但轨迹预测中的大多数现有作品纯粹是在自上而下的轨迹空间中进行的。为了支持第一人称视图轨迹预测研究,我们提出了T2FPV,这是一种构建高保真的第一人称视图数据集的方法,给定真实的,自上而下的轨迹数据集;我们在ETH/UCY行人数据集上展示了我们的方法,以生成所有互动行人的以自我为中心的视觉数据。我们报告说,原始的ETH/UCY数据集中使用的鸟眼视图假设,即代理可以用完美的信息观察场景中的每个人,而不会在第一人称视图中保持;在现有作品中通常使用的每个20个磁场场景中,只有一小部分的代理都可以完全看到。我们评估现有的轨迹预测方法在不同的现实感知水平下 - 与自上而下的完美信息设置相比,位移错误增加了356%。为了促进第一人称视图轨迹预测的研究,我们发布了T2FPV-ETH数据集和软件工具。
translated by 谷歌翻译
我们研究了解释性优先聚类的问题,其中解释性成为聚类的一流公民。先前的聚类方法使用决策树进行解释,但仅在群集完成后。相比之下,我们的方法是整体上进行聚类和决策树训练,在此,决策树的性能和大小也会影响聚类结果。我们假设聚类和解释的属性是不同的,尽管这不是必需的。我们观察到我们的问题是一个单调优化,其中目标函数是单调函数的差异。然后,我们提出了一种有效的分支和结合算法,用于查找最佳参数,从而导致群集失真和决策树的解释性平衡。我们的实验表明,我们的方法可以提高适合我们框架的聚类的解释性。
translated by 谷歌翻译