We present a novel approach to relocalization, or place recognition, a fundamental problem to be solved in many robotics, automation, and AR applications. Rather than relying on often unstable appearance information, we consider a scenario in which the reference map is given in the form of localized objects. Our localization framework relies on 3D semantic object detection followed by association with the objects in the map. The set of candidate pairwise associations is grown by hierarchical clustering driven by a merging metric that evaluates spatial compatibility. The latter notably uses information about relative object configurations, which is invariant with respect to global transformations. As the camera incrementally explores the environment and detects more objects, the set of associations is updated and expanded. We test our algorithm in several challenging situations, including dynamic scenes, large viewpoint changes, and scenes containing repeated instances. Our experiments show that our approach outperforms prior art in terms of both robustness and accuracy.
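The abstract does not spell out the merging metric; as a hedged illustration, one spatial-compatibility test that is invariant to a global rigid transformation, consistent with the description, is consistency of inter-object distances. The function name and tolerance below are hypothetical, a minimal sketch rather than the paper's actual metric:

```python
import numpy as np

def pairwise_compatible(det_a, det_b, map_a, map_b, tol=0.2):
    """Check whether two candidate associations (det_a -> map_a and
    det_b -> map_b) are spatially compatible: the distance between the
    two detected objects should match the distance between their map
    counterparts. This quantity is unaffected by any global rigid
    transformation between the camera frame and the map frame."""
    d_det = np.linalg.norm(np.asarray(det_a) - np.asarray(det_b))
    d_map = np.linalg.norm(np.asarray(map_a) - np.asarray(map_b))
    return abs(d_det - d_map) < tol

# Two detections 1 m apart are compatible with two map objects that
# are also ~1 m apart, regardless of the global pose offset.
assert pairwise_compatible([0, 0, 0], [1, 0, 0], [5, 5, 0], [5, 6, 0])
assert not pairwise_compatible([0, 0, 0], [1, 0, 0], [5, 5, 0], [5, 8, 0])
```

A hierarchical clustering over candidate pairs would merge associations only while all pairwise checks of this kind remain satisfied.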
Robots are traditionally bound to a fixed embodiment during their operational lifetime, which limits their ability to adapt to their surroundings. Co-optimizing control and morphology of a robot, however, is often inefficient due to the complex interplay between the controller and morphology. In this paper, we propose a learning-based control method that can inherently take morphology into consideration such that once the control policy is trained in the simulator, it can be easily deployed to robots with different embodiments in the real world. In particular, we present the Embodiment-aware Transformer (EAT), an architecture that casts this control problem as conditional sequence modeling. EAT outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired robot embodiment, past states, and actions, our EAT model can generate future actions that best fit the current robot embodiment. Experimental results show that EAT can outperform all other alternatives in embodiment-varying tasks, and succeeds in an example real-world evolution task: stepping down a stair by updating the morphology alone. We hope that EAT will inspire a new push toward real-world evolution across many domains, where algorithms like EAT can blaze a trail by bridging the field of evolutionary robotics and big data sequence modeling.
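As a minimal sketch of the conditional-sequence-modeling view, assume an interleaved token sequence of the form [embodiment, s0, a0, s1, a1, ...] (the exact tokenization is not given in the abstract). The causal mask that lets each action be predicted from the embodiment token plus all past states and actions could look like:

```python
import numpy as np

def causal_mask(n):
    """Lower-triangular attention mask: token i may attend only to
    tokens 0..i, so each action token is predicted from the embodiment
    token, past states, and past actions, never from the future."""
    return np.tril(np.ones((n, n), dtype=bool))

# Hypothetical sequence: [embodiment, s0, a0, s1, a1] -> 5 tokens.
m = causal_mask(5)
assert m[4].all()          # the last token attends to everything before it
assert not m[0, 1:].any()  # the embodiment token attends only to itself
```

In an actual Transformer this boolean mask would gate the attention logits; here it only illustrates the information flow the abstract describes.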
We present algorithmic contributions that improve the efficiency of robust trimming in outlier-affected geometric regression problems. The method relies heavily on fast sorting algorithms, about which we present two important insights. First, partial sorting is sufficient for the incremental calculation of the x-th percentile value. Second, the normal equations of a linear fitting problem may be updated incrementally by registering the swap operations across the x-th percentile boundary during sorting. Besides linear fitting problems, we furthermore demonstrate how the technique can be applied to closed-form, non-linear energy minimization problems, thus enabling efficient trimmed fitting under geometrically optimal objectives. We apply our method to two distinct camera resectioning algorithms and demonstrate highly efficient and reliable, geometric trimmed fitting.
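To make the objective concrete, here is a deliberately naive trimmed least-squares loop: fit, rank residuals, refit on the best fraction. The paper's contribution is to accelerate exactly this inner loop (partial instead of full sorting, and incremental normal-equation updates); the plain version below, with hypothetical names and parameters, only illustrates what is being computed:

```python
import numpy as np

def trimmed_line_fit(x, y, keep_frac=0.7, iters=10):
    """Naive trimmed least-squares line fit: repeatedly fit on the
    subset of points with the smallest residuals. Note the full
    argsort below is wasteful; partial sorting up to the trimming
    boundary is sufficient, which is one of the paper's insights."""
    idx = np.arange(len(x))
    k = max(2, int(keep_frac * len(x)))
    for _ in range(iters):
        A = np.c_[x[idx], np.ones(len(idx))]
        slope, intercept = np.linalg.lstsq(A, y[idx], rcond=None)[0]
        residuals = np.abs(slope * x + intercept - y)
        idx = np.argsort(residuals)[:k]
    return slope, intercept

x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
y[:10] += 50.0  # gross outliers on the first ten points
slope, intercept = trimmed_line_fit(x, y)
assert abs(slope - 2.0) < 0.2 and abs(intercept - 1.0) < 0.5
```

Each refit here rebuilds the normal equations from scratch; the incremental scheme in the paper instead updates them only for points that cross the trimming boundary between iterations.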
Event cameras have recently gained increasing popularity, as they show strong potential over regular cameras in highly dynamic or challenging illumination situations. An important problem that may benefit from event cameras is Simultaneous Localization And Mapping (SLAM). However, in order to ensure progress on event-inclusive multi-sensor SLAM, novel benchmark sequences are needed. Our contribution is the first set of benchmark datasets captured with a multi-sensor setup containing an event-based stereo camera, a regular stereo camera, multiple depth sensors, and an inertial measurement unit. The setup is fully hardware-synchronized and underwent accurate extrinsic calibration. All sequences come with ground-truth data captured by highly accurate external reference devices such as a motion capture system. Individual sequences include both small- and large-scale environments, and cover the specific challenges targeted by dynamic vision sensors.
We present a new solution to the fine-grained retrieval of clean CAD models from a large-scale database in order to recover detailed object shape geometry for RGBD scans. Unlike previous work that simply indexes into a moderately small database using an object shape descriptor and accepts the top retrieval result, we argue that in the case of a large-scale database, a more accurate model may be found within a neighborhood of the descriptor. More importantly, we propose that the instance-level uniqueness deficiency of the shape descriptor can be compensated by a geometry-based re-ranking of its neighborhood. Our approach first leverages the discriminative power of learned representations to distinguish between models of different categories, and then uses a novel robust point set distance metric to re-rank the CAD neighborhood, enabling fine-grained retrieval in a large database. Evaluation on a real-world dataset shows that our geometry-based re-ranking is a conceptually simple but highly effective method that can lead to a significant improvement in retrieval accuracy compared to the state of the art.
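The robust metric itself is not defined in the abstract; as a stand-in, a sketch of the re-ranking idea using the plain symmetric Chamfer distance (function names hypothetical, and the paper's metric is presumably more robust than this):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point sets of shape
    (N, 3) and (M, 3): mean nearest-neighbor distance in both
    directions. A simple stand-in for the paper's robust metric."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def rerank(scan_points, candidate_models):
    """Re-rank descriptor-retrieved CAD candidates by their geometric
    distance to the scanned point cloud (ascending = best first)."""
    scores = [chamfer_distance(scan_points, m) for m in candidate_models]
    return np.argsort(scores)

rng = np.random.default_rng(1)
scan = rng.random((64, 3))
candidates = [scan + 0.5, scan.copy()]  # an offset model and an exact match
order = rerank(scan, candidates)
assert order[0] == 1  # the geometrically closer model ranks first
```

In the paper's pipeline the candidate list would come from the descriptor's nearest neighbors in the large database; the geometric re-ranking then decides the final fine-grained result.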
Event cameras are bio-inspired sensors that perform well in challenging illumination conditions and have high temporal resolution. However, their concept is fundamentally different from that of traditional frame-based cameras. The pixels of an event camera operate independently and asynchronously. They measure changes of the logarithmic brightness and return them in the highly discretized form of time-stamped events, each indicating a certain amount of relative change since the last event. New models and algorithms are needed to process this kind of measurement. The present work looks at several motion estimation problems with event cameras. The flow of the events is modelled by a general homographic warping in a space-time volume, and the objective is formulated as a maximization of contrast within the image of warped events. Our core contribution consists of deriving globally optimal solutions to these generally non-convex problems, thus removing the dependency on a good initial guess that plagues existing methods. Our methods rely on branch-and-bound optimization and employ novel and efficient recursive upper and lower bounds derived for six different contrast estimation functions. The practical validity of our approach is demonstrated by successful application to three different event camera motion estimation problems.
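The contrast objective can be illustrated with a toy version: warp each event back along a candidate motion, accumulate the warped events into an image, and score the image by its variance. The sketch below uses a constant 2D translational flow and a grid search as a hypothetical stand-in for the paper's general homographic warp and branch-and-bound search:

```python
import numpy as np

def contrast(events, flow, resolution=(32, 32)):
    """Variance of the image of warped events. Each event (x, y, t) is
    shifted back along a candidate constant 2D flow and voted into an
    accumulator image; the correct motion concentrates events from the
    same scene edge into few pixels, maximizing the variance."""
    x = events[:, 0] - flow[0] * events[:, 2]
    y = events[:, 1] - flow[1] * events[:, 2]
    img, _, _ = np.histogram2d(
        x, y, bins=resolution,
        range=[[0, resolution[0]], [0, resolution[1]]])
    return img.var()

# Synthetic events from a point feature moving with flow (5, 0) px/s.
rng = np.random.default_rng(2)
t = rng.random(500)
events = np.c_[10.0 + 5.0 * t, 16.0 * np.ones(500), t]
flows = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
best = max(flows, key=lambda f: contrast(events, f))
assert best == (5.0, 0.0)
```

The paper replaces this exhaustive scoring with recursive upper and lower bounds on the contrast over parameter-space regions, which is what makes a globally optimal branch-and-bound search tractable.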
Recent years have witnessed a surge of learned representations built directly upon point clouds. Though increasingly expressive, most existing representations still struggle to generate ordered point sets. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights that represents a 3D shape as a compact 1D array of depth values. It simulates a configuration of cameras evenly distributed on a sphere, where each virtual camera casts light rays from its principal point through sample points on a small concentric spherical cap to probe for possible intersections with the object enclosed by the sphere. The structured point cloud is hence given implicitly as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and demonstrate its effectiveness in the context of the point cloud completion task. Experimental results on both synthetic and real data show that our method achieves competitive accuracy and consistency while significantly reducing the computational cost. Furthermore, we show superior performance over state-of-the-art completion methods on the downstream point cloud registration task.
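One common way to realize "cameras evenly distributed on a sphere" is a Fibonacci lattice; the abstract does not state the paper's exact construction, so the sketch below is only an assumed illustration of how the virtual camera centres in such a sampling model might be placed:

```python
import numpy as np

def fibonacci_sphere(n):
    """Near-uniform placement of n points on the unit sphere using
    golden-angle increments in azimuth and uniform steps in z. Each
    point could serve as a virtual camera centre looking inward at
    the object enclosed by the sphere."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle azimuth steps
    z = 1.0 - 2.0 * (i + 0.5) / n           # uniform heights in (-1, 1)
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

cams = fibonacci_sphere(128)
assert cams.shape == (128, 3)
assert np.allclose(np.linalg.norm(cams, axis=1), 1.0)
```

Casting a fixed ray pattern from each such centre and recording only the hit depths yields the compact, ordered 1D depth array the abstract describes.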
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck, and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators, and separate algorithms make use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique to handle the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtárik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm that recovers DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of both worlds: like EF21 and unlike DIANA, biased compressors such as top-k, whose good performance in practice is well recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors allows the effects of compression to be mitigated, with the convergence rate improving when the number of parallel workers is large. This is the first time an algorithm with all these features has been proposed. We prove its linear convergence under certain conditions. Our approach takes a step towards a better understanding of two so-far distinct worlds of communication-efficient distributed learning.
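The two compressor classes named in the abstract have standard definitions, sketched here: rand-k keeps k random coordinates rescaled to make the operator unbiased, while top-k keeps the k largest-magnitude coordinates and is biased but contractive.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased random sparsification: keep k uniformly random
    coordinates and rescale by d/k so that E[rand_k(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    """Biased, contractive sparsification: keep the k coordinates of
    largest magnitude without rescaling."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

x = np.array([3.0, -1.0, 0.5, 4.0])
assert np.allclose(top_k(x, 2), [3.0, 0.0, 0.0, 4.0])

# Empirical check of rand-k's unbiasedness.
rng = np.random.default_rng(0)
mean = np.mean([rand_k(x, 2, rng) for _ in range(20000)], axis=0)
assert np.allclose(mean, x, atol=0.15)
```

DIANA-style methods exploit the unbiasedness of operators like rand-k for variance reduction, whereas EF21-style error feedback compensates for the bias of operators like top-k; the paper's framework interpolates between the two via the bias and variance parameters of the compressor.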
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, hindering graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing the experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
Transformer has achieved impressive successes for various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by ImageNet pretrained weights degrades significantly when transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens under a different perturbation. To maximally exploit the potential of the Transformer with limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer is thus encouraged to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
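In bootstrap-style two-branch frameworks of this kind, the target branch is commonly kept as a slowly moving average of the online branch rather than trained by gradient descent. The abstract does not state BOLT's exact update rule, so the following momentum update is an assumed sketch, in analogy to BYOL:

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.99):
    """Momentum (exponential moving average) update of the target
    branch: each target parameter drifts toward its online
    counterpart, providing a slowly changing prediction target."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

# Toy check: with tau = 0.5 the target converges geometrically
# toward the (here frozen) online parameters.
target = [np.zeros(3)]
online = [np.ones(3)]
for _ in range(5):
    target = ema_update(target, online, tau=0.5)
assert np.allclose(target[0], 1.0 - 0.5 ** 5)
```

Whatever the precise rule, the point of the slow target is to give the online branch (and the auxiliary difficulty-ranking head) a stable representation to predict.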