We propose a novel differentiable weighted generalized ICP (WGICP) method for general 3D point-cloud data, including data from LiDAR. Our method builds on a differentiable generalized ICP (GICP), which we augment with a differentiable k-nearest-neighbor (KNN) algorithm to improve differentiability. The differentiable GICP algorithm provides gradients of the output pose estimate with respect to each input point, which allows us to train a neural network to predict each point's importance, or weight, for estimating the correct pose. In contrast to other ICP-based methods that reduce computational cost with voxel-based downsampling or matching, our method directly reduces the number of points used by GICP by selecting only the points with the highest weights and discarding the redundant, lower-weight ones. We show that our method improves both the accuracy and the speed of the GICP algorithm on the KITTI dataset and can be used to develop more robust and efficient SLAM systems.
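A minimal sketch of the weighting idea, assuming per-point weights already predicted by a network; for clarity it uses weighted point-to-point ICP (weighted Kabsch alignment plus brute-force nearest neighbors) rather than the full plane-to-plane GICP objective, so it illustrates the prune-by-weight step rather than the authors' implementation.

```python
# Sketch: keep only the highest-weight source points and align them with weighted ICP.
import numpy as np

def weighted_rigid_align(src, tgt, w):
    """Estimate R, t minimizing sum_i w_i * ||R @ src_i + t - tgt_i||^2 (weighted Kabsch)."""
    w = w / (w.sum() + 1e-12)
    mu_s = (w[:, None] * src).sum(0)
    mu_t = (w[:, None] * tgt).sum(0)
    H = (w[:, None] * (src - mu_s)).T @ (tgt - mu_t)      # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t

def weighted_icp(src, tgt, w, iters=20, keep_ratio=0.5):
    """Drop low-weight points, then iterate nearest-neighbor matching and re-alignment."""
    keep = np.argsort(-w)[: int(len(w) * keep_ratio)]
    src_k, w_k = src[keep], w[keep]
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        cur = src_k @ R.T + t
        # brute-force nearest neighbors; a KD-tree or differentiable KNN in practice
        nn = np.argmin(((cur[:, None, :] - tgt[None, :, :]) ** 2).sum(-1), axis=1)
        R, t = weighted_rigid_align(src_k, tgt[nn], w_k)
    return R, t
```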
In recent years, neural networks have been expanding rapidly, with novel strategies and applications. However, several challenges in neural-network technology remain unsolved, even though they will inevitably have to be addressed for critical applications. Attempts have been made to overcome these challenges by representing and embedding domain knowledge in the form of symbolic representations. This gave rise to the concept of neuro-symbolic learning (NeSyL), which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question answering and reasoning, health informatics, and genomics, NeSyL has shown promising results. This review presents a comprehensive survey of state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and, most importantly, future perspectives of this emerging field.
Multilingual pre-trained models have demonstrated their effectiveness on many multilingual NLP tasks and enable zero-shot or few-shot transfer from high-resource languages to low-resource ones. However, because of significant typological differences and contradictions between some languages, such models usually perform poorly on many languages and in cross-lingual settings, which reflects the difficulty of learning a single model that handles a large number of diverse languages simultaneously. To alleviate this problem, we propose a new multilingual pre-training pipeline. We generate language representations from a multilingual pre-trained model and conduct linguistic analysis, showing that language-representation similarity reflects linguistic similarity from multiple perspectives, including language family, geographical sprachbund, lexicostatistics, and syntax. We then cluster all target languages into multiple groups and call each group a representation sprachbund. Languages within the same representation sprachbund should therefore boost each other during pre-training and fine-tuning, as they share rich linguistic similarities. We pre-train one multilingual model for each representation sprachbund. Experiments on cross-lingual benchmarks show significant improvements over strong baselines.
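A minimal sketch of the grouping step, under stated assumptions: each language is represented by mean-pooled sentence embeddings from a multilingual encoder (xlm-roberta-base is a placeholder choice) and languages are clustered with k-means into candidate representation sprachbunds; the paper's exact representation extraction and number of groups may differ.

```python
# Sketch: derive one vector per language and cluster languages into groups.
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
enc = AutoModel.from_pretrained("xlm-roberta-base").eval()

def language_representation(sentences):
    """Average mean-pooled encoder embeddings over a sample of monolingual sentences."""
    vecs = []
    with torch.no_grad():
        for s in sentences:
            ids = tok(s, return_tensors="pt", truncation=True, max_length=128)
            hidden = enc(**ids).last_hidden_state            # (1, T, H)
            mask = ids["attention_mask"].unsqueeze(-1)
            vecs.append(((hidden * mask).sum(1) / mask.sum(1)).squeeze(0).numpy())
    return np.mean(vecs, axis=0)

def build_sprachbunds(corpora, n_groups=4):
    """corpora: dict mapping language code -> list of sentences (assumed given)."""
    langs = sorted(corpora)
    reps = np.stack([language_representation(corpora[lang]) for lang in langs])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(reps)
    groups = {}
    for lang, g in zip(langs, labels):
        groups.setdefault(int(g), []).append(lang)
    return groups   # each group is a candidate representation sprachbund
```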
Modern self-driving perception systems have been shown to improve when processing complementary inputs such as LiDAR alongside images. In isolation, 2D images have been found to be extremely vulnerable to adversarial attacks. However, there has been limited study of the adversarial robustness of multi-modal models that fuse LiDAR features with image features. Furthermore, existing works do not consider physically realizable perturbations that are consistent across the input modalities. In this paper, we showcase practical susceptibilities of multi-sensor detection by placing an adversarial object on top of a host vehicle. We focus on physically realizable and input-agnostic attacks, since they are feasible to execute in practice, and show that a single universal adversary can hide different host vehicles from state-of-the-art multi-modal detectors. Our experiments demonstrate that successful attacks are primarily caused by easily corrupted image features. Furthermore, we find that in modern sensor-fusion methods that project image features into 3D, adversarial attacks can exploit the projection process to generate false positives across distant regions in 3D. Towards more robust multi-modal perception systems, we show that adversarial training with feature denoising can significantly improve robustness to such attacks. However, we find that standard adversarial defenses still struggle to prevent false positives caused by inaccurate associations between 3D LiDAR points and 2D pixels.
In this paper, we propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. Our proposed continuous fusion layer encodes both discrete-state image features and continuous geometric information. This enables us to design a novel, reliable and efficient end-to-end learnable 3D object detector based on multiple sensors. Our experimental evaluation on both KITTI and a large-scale 3D object detection benchmark shows significant improvements over the state of the art.
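The fusion idea can be sketched as follows, under assumptions that go beyond the abstract: for each target (e.g. BEV) location, gather the K nearest LiDAR points, take the image feature each point projects to (assumed precomputed), append the continuous geometric offset, and run a small MLP. This is an illustrative reading, not the authors' architecture.

```python
# Sketch of a continuous-fusion-style layer with assumed shapes.
import torch
import torch.nn as nn

class ContinuousFusion(nn.Module):
    def __init__(self, img_dim, out_dim, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + 3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim))

    def forward(self, target_xyz, lidar_xyz, lidar_img_feat):
        # target_xyz: (M, 3) fusion locations; lidar_xyz: (N, 3);
        # lidar_img_feat: (N, img_dim) image feature sampled at each point's projection.
        d = torch.cdist(target_xyz, lidar_xyz)            # (M, N) pairwise distances
        idx = d.topk(self.k, largest=False).indices       # (M, K) nearest LiDAR points
        offset = lidar_xyz[idx] - target_xyz.unsqueeze(1) # continuous geometric offset
        feat = torch.cat([lidar_img_feat[idx], offset], dim=-1)
        return self.mlp(feat).sum(dim=1)                  # (M, out_dim) fused feature

# usage sketch with random data
fuse = ContinuousFusion(img_dim=64, out_dim=128, k=3)
fused = fuse(torch.rand(100, 3), torch.rand(500, 3), torch.rand(500, 64))
```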
We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. Instead of encoding vectorized maps as raster images, we construct a lane graph from raw map data to explicitly preserve the map structure. To capture the complex topology and long range dependencies of the lane graph, we propose LaneGCN which extends graph convolutions with multiple adjacency matrices and along-lane dilation. To capture the complex interactions between actors and maps, we exploit a fusion network consisting of four types of interactions, actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor. Powered by LaneGCN and actor-map interactions, our model is able to predict accurate and realistic multi-modal trajectories. Our approach significantly outperforms the state-of-the-art on the large scale Argoverse motion forecasting benchmark.
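A rough sketch of the multi-adjacency graph convolution, with assumed shapes and a toy chain of lane nodes: each relation (successor, predecessor, and their dilated along-lane variants) has its own adjacency matrix and weight matrix, and messages are summed. The released LaneGCN differs in details such as normalization and left/right-neighbour relations.

```python
# Sketch: lane-node graph convolution over several relation-specific adjacencies.
import torch
import torch.nn as nn

class MultiAdjLaneConv(nn.Module):
    def __init__(self, dim, num_relations):
        super().__init__()
        self.self_lin = nn.Linear(dim, dim)
        self.rel_lins = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                      for _ in range(num_relations))
        self.act = nn.ReLU()

    def forward(self, x, adjs):
        # x: (N, dim) lane-node features; adjs: list of (N, N) adjacency matrices.
        out = self.self_lin(x)
        for A, lin in zip(adjs, self.rel_lins):
            out = out + A @ lin(x)            # aggregate relation-specific messages
        return self.act(out)

def dilated(adj, k):
    """Along-lane dilation: k-step reachability along the successor/predecessor chain."""
    return (torch.matrix_power(adj, k) > 0).float()

N, D = 32, 16
succ = torch.zeros(N, N)
succ[torch.arange(N - 1), torch.arange(1, N)] = 1.0      # toy chain of lane segments
adjs = [succ, succ.T, dilated(succ, 2), dilated(succ.T, 2)]
layer = MultiAdjLaneConv(D, num_relations=len(adjs))
y = layer(torch.rand(N, D), adjs)                         # (N, D) updated node features
```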
Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security sensitive systems. We propose high-level representation guided denoiser (HGD) as a defense for image classification. A standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model's outputs activated by the clean image and by the denoised image. Compared with ensemble adversarial training, which is the state-of-the-art defending method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In the NIPS competition on defense against adversarial attacks, our HGD solution won first place and outperformed other models by a large margin.
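The guidance loss is simple enough to sketch directly; `denoiser` and `target_model` below are placeholders, and using the target model's outputs as the high-level representation is an assumption (any late-layer activation would fit the same pattern). The target model stays frozen while gradients update only the denoiser.

```python
# Sketch: high-level guidance loss for training a denoiser against adversarial noise.
import torch
import torch.nn.functional as F

def hgd_loss(denoiser, target_model, x_clean, x_adv):
    x_denoised = denoiser(x_adv)
    with torch.no_grad():
        feat_clean = target_model(x_clean)        # high-level representation on clean input
    feat_denoised = target_model(x_denoised)      # gradients flow back into the denoiser only
    return F.l1_loss(feat_denoised, feat_clean)

def train_step(denoiser, target_model, optimizer, x_clean, x_adv):
    """One update of the denoiser parameters (optimizer holds denoiser params only)."""
    optimizer.zero_grad()
    loss = hgd_loss(denoiser, target_model, x_clean, x_adv)
    loss.backward()
    optimizer.step()
    return loss.item()
```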
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading off model accuracy against constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when instantiations share the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
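One possible reading of the iRMB design, sketched under assumptions (the official EMO block differs in details such as windowed attention and normalization): expand the channels, mix them locally with a depthwise convolution and globally with multi-head self-attention over spatial positions, then project back with a residual connection.

```python
# Sketch of an inverted residual block mixing local (conv) and global (attention) paths.
import torch
import torch.nn as nn

class iRMBSketch(nn.Module):
    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hidden = dim * expand
        self.expand = nn.Conv2d(dim, hidden, 1)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)    # short-distance
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)  # long-distance
        self.project = nn.Conv2d(hidden, dim, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.act(self.expand(x))
        local = self.dw(z)
        tokens = z.flatten(2).transpose(1, 2)             # (B, H*W, hidden)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.project(self.act(local + glob))   # inverted residual

block = iRMBSketch(dim=32)
out = block(torch.rand(2, 32, 14, 14))    # shape preserved: (2, 32, 14, 14)
```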
We aim to bridge the gap between common-sense, few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult, as how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that a deep generative model such as the variational autoencoder (VAE) can be used to approximate our theory and performs significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and to improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and the optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.