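Locating 3D objects from a single RGB image via Perspective-n-Points (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient w.r.t. object pose. Yet learning the entire set of unrestricted 2D-3D points fails to converge with existing approaches, since the deterministic pose is inherently non-differentiable. In this paper, we propose EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distributions. The underlying principle unifies the existing approaches and resembles the attention mechanism. EPro-PnP significantly outperforms competitive baselines, closing the gap between PnP-based methods and the task-specific leaders on the LineMOD 6DoF pose estimation and nuScenes 3D object detection benchmarks.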
translated by 谷歌翻译
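As massive data analysis becomes increasingly prevalent, subsampling methods such as BLB (Bag of Little Bootstraps) serve as powerful tools for assessing the quality of estimators on massive data. However, the performance of subsampling methods is strongly influenced by the choice of tuning parameters (e.g., the subset size and the number of resamples per subset). In this paper, we develop a hyperparameter selection methodology that can be used to select tuning parameters for subsampling methods. Specifically, through a careful theoretical analysis, we find an analytically simple and elegant relationship between the asymptotic efficiency of various subsampling estimators and their hyperparameters. This leads to an optimal choice of the hyperparameters. More specifically, an arbitrarily specified set of hyperparameters can be improved to a new set of hyperparameters at no extra CPU time cost, while the statistical efficiency of the resulting estimator can be much improved. Both simulation studies and real data analysis demonstrate the advantage of our method.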
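The subsampling procedure and its two tuning parameters can be illustrated with a minimal sketch of a generic Bag of Little Bootstraps estimate of the standard error of a sample mean. This is not the hyperparameter selection method of the abstract; the function name `blb_stderr`, the synthetic Gaussian data, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def blb_stderr(data, subset_size, n_subsets, n_resamples, seed=0):
    """Generic BLB estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    per_subset = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement.
        subset = rng.sample(data, subset_size)
        means = []
        for _ in range(n_resamples):
            # Resample n points (full data size) with replacement from the subset.
            resample = rng.choices(subset, k=n)
            means.append(sum(resample) / n)
        # Quality assessment for this subset: spread of the resampled means.
        per_subset.append(statistics.stdev(means))
    # Average the per-subset assessments.
    return statistics.fmean(per_subset)


# Hypothetical data: 2000 standard-normal draws, so the true standard
# error of the mean is 1 / sqrt(2000), roughly 0.022.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
se = blb_stderr(data, subset_size=200, n_subsets=5, n_resamples=30)
```

Here `subset_size` and `n_resamples` correspond to the tuning parameters named in the abstract (subset size and resamples per subset); `n_subsets` is a further knob of the generic procedure.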
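Quantum computing is expected to have a transformative impact on many domains, but its practical deployment on industry problems remains underexplored. We focus on applying quantum computing to operations management problems in industry, in particular supply chain management. Many problems in supply chain management involve large state and action spaces and pose computational challenges on classical computers. We develop a quantized policy iteration algorithm to solve an inventory control problem and demonstrate its effectiveness. We also discuss in depth the hardware requirements and potential challenges of implementing this quantum algorithm in the near term. Our simulations and experiments are powered by IBM Qiskit and the qBraid system.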
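For orientation, the classical counterpart of the algorithm can be sketched as plain policy iteration on a toy inventory control problem. This is not the quantized algorithm of the abstract and uses no quantum library; the capacity, demand distribution, prices, costs, and discount factor below are all hypothetical values chosen for illustration.

```python
# Hypothetical toy problem: all constants are illustrative assumptions.
CAPACITY = 5                          # maximum units in stock
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}     # demand value -> probability
PRICE, ORDER_COST, HOLDING_COST, GAMMA = 4.0, 2.0, 0.5, 0.9


def q_value(state, order, values):
    """Expected discounted return of ordering `order` units in `state`."""
    stock = state + order
    total = 0.0
    for demand, prob in DEMAND.items():
        sold = min(stock, demand)
        reward = PRICE * sold - ORDER_COST * order - HOLDING_COST * stock
        total += prob * (reward + GAMMA * values[stock - sold])
    return total


def policy_iteration(tol=1e-12):
    states = range(CAPACITY + 1)
    policy = [0] * (CAPACITY + 1)     # start by never ordering
    values = [0.0] * (CAPACITY + 1)
    while True:
        # Policy evaluation: sweep until the value function stops changing.
        while True:
            delta = 0.0
            for s in states:
                new_v = q_value(s, policy[s], values)
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < tol:
                break
        # Policy improvement: switch only on a strict gain so the loop ends.
        stable = True
        for s in states:
            best = max(range(CAPACITY - s + 1),
                       key=lambda a: q_value(s, a, values))
            if q_value(s, best, values) > q_value(s, policy[s], values) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


policy, values = policy_iteration()
```

The state and action spaces here have only a handful of elements; the abstract's motivation is that realistic supply chain problems make these spaces large enough to be computationally challenging on classical hardware.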
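Mixed communities of organisms are found in many environments (from the human gut to marine ecosystems) and can have a profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing, which yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters of genomic subsequences associated with the unknown constituent organisms. Inherent noise in the subsequences, the various biological constraints that need to be imposed on them, and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph in which the nodes are subsequences and the edges represent homophily information. In addition, we model biological constraints that provide heterophilous signals about nodes that cannot be clustered together. We solve the binning problem by developing new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints, and (ii) a constraint-based graph clustering method that addresses the problem of skewed cluster size distribution. Extensive experiments on real and synthetic datasets demonstrate that our approach, called RepBin, outperforms a wide variety of competing methods. Our constraint-based graph representation learning and clustering methods, which may also be useful in other domains, advance the state of the art in both metagenomics binning and graph representation learning.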
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
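The idea of a student mimicking a teacher can be illustrated with a minimal sketch of a common, generic distillation objective: the KL divergence between temperature-softened teacher and student output distributions. This is not RELIANT and contains no fairness term; the function names, the temperature value, and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for one node over three classes.
teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss([0.0, 0.0, 0.0], teacher)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is also why a student trained this way can inherit whatever bias the teacher's outputs carry.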
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while balancing model accuracy and efficiency well.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.