In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
A recent study has revealed a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame in the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and an imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure benefits the discrimination of minor classes. To preserve these advantages, we introduce a regularizer on feature centers that encourages the network to learn features closer to this appealing structure in imbalanced semantic segmentation. Experimental results show that our method brings significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
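For reference, the simplex equiangular tight frame (ETF) mentioned above can be constructed explicitly. The following is a minimal sketch in plain Python, assuming the feature dimension equals the number of classes K; it checks that every vertex has unit norm and that every pair of distinct vertices has cosine -1/(K-1). The `etf_gap` penalty is a hypothetical illustration of "closeness to the ETF structure", not the regularizer proposed in the paper.

```python
import math


def simplex_etf(num_classes):
    """Return K unit vectors in R^K forming a simplex equiangular tight frame.

    Standard construction M = sqrt(K/(K-1)) * (I - (1/K) * 1 1^T), with the
    orthonormal basis U taken to be the identity (feature dimension = K).
    Row j of the result is the vertex assigned to class j.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for i in range(k)]
        for j in range(k)
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def etf_gap(centers):
    """Hypothetical penalty (not the paper's regularizer): mean squared deviation
    of the pairwise cosines between class centers from the ETF value -1/(K-1)."""
    k = len(centers)
    target = -1.0 / (k - 1)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    return sum((cosine(centers[a], centers[b]) - target) ** 2 for a, b in pairs) / len(pairs)


vertices = simplex_etf(4)
# Each distinct pair has cosine -1/(K-1) = -1/3, so the gap is numerically zero.
print([round(cosine(vertices[a], vertices[b]), 4) for a in range(4) for b in range(a + 1, 4)])
print(etf_gap(vertices))
# Skewed, non-equiangular centers give a clearly positive gap.
print(etf_gap([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

A penalty of this kind would be added to the segmentation loss on the running class feature centers; how the paper actually defines and weights its regularizer is not specified here.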
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges, including poor interpretability, weak reasoning capability, and the need for large amounts of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome these limitations. In this paper, we systematically examine KEPLMs through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
We propose a new neural network design paradigm, the Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of a subnetwork, called columns, between which multi-level reversible connections are employed. This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled as they pass through each column, while their total information is maintained rather than compressed or discarded as in other networks. Our experiments suggest that CNN-style RevCol models can achieve very competitive performance on multiple computer vision tasks such as image classification, object detection, and semantic segmentation, especially with large parameter budgets and large datasets. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2% ImageNet-1K accuracy. Given more pre-training data, our largest model, RevCol-H, reaches 90.0% on ImageNet-1K, 63.8% box AP on the COCO detection minival set, and 61.0% mIoU on ADE20k segmentation. To our knowledge, these are the best COCO detection and ADE20k segmentation results among pure (static) CNN models. Moreover, as a general macro-architecture, RevCol can also be introduced into transformers or other neural networks, where it is shown to improve performance in both computer vision and NLP tasks. We release code and models at https://github.com/megvii-research/RevCol
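Why reversible connections preserve information can be illustrated with the simpler additive coupling used in reversible residual networks (RevNet). This is a swapped-in technique, not RevCol's actual multi-level reversible unit, and `f` and `g` are hypothetical stand-ins for sub-networks. Because each output is an input plus a function of values that are still available at the output, the inputs can be recomputed exactly from the outputs, so nothing is discarded even though `f` and `g` themselves are not invertible.

```python
import math


def f(v):
    # Stand-in for an arbitrary, non-invertible sub-network.
    return [math.tanh(1.5 * a) for a in v]


def g(v):
    # A second arbitrary sub-network.
    return [math.sin(a) + 0.1 * a * a for a in v]


def reversible_forward(x1, x2):
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def reversible_inverse(y1, y2):
    """Recover the inputs from the outputs alone, so activations need not be stored."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


x1, x2 = [0.3, -1.2, 2.0], [0.7, 0.1, -0.4]
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
# Reconstruction error is at floating-point rounding level.
print(max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2)))
```

The same property is what allows reversible designs to recompute intermediate features during backpropagation instead of caching them.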
We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be exploited equally. Using 20 different types of injected backdoor attacks from the literature as guidance, we study their correspondences in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors are widespread, with most injected backdoor attacks having natural correspondences. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in 56 normally trained models downloaded from the Internet, covering all the different categories, whereas existing scanners designed for injected backdoors can detect at most 65. We also study the root causes of natural backdoors and defenses against them.
Extremely large-scale massive MIMO (XL-MIMO) has been regarded as a promising technology for future wireless communications. Deploying XL-MIMO, especially at high-frequency bands, places users in the near-field region rather than the conventional far-field region. This letter proposes efficient model-based deep learning algorithms for estimating the near-field wireless channel of XL-MIMO communications. In particular, we first formulate the XL-MIMO near-field channel estimation task as a compressed sensing problem using a spatial gridding-based sparsifying dictionary, and then solve the resulting problem by applying the Learned Iterative Shrinkage and Thresholding Algorithm (LISTA). Due to the near-field characteristic, the spatial gridding-based sparsifying dictionary may result in low channel estimation accuracy and a heavy computational burden. To address this issue, we further propose a new sparsifying dictionary learning-LISTA (SDL-LISTA) algorithm that formulates the sparsifying dictionary as a neural network layer and embeds it into the LISTA neural network. Numerical results show that our proposed algorithms outperform non-learning benchmark schemes, and that SDL-LISTA achieves better performance than LISTA with a tenfold reduction in dictionary atoms.
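As a rough sketch of the underlying idea (not the letter's SDL-LISTA), the code below implements ISTA for the LASSO problem and its unrolled LISTA form. It assumes a real-valued toy dictionary `A`; actual XL-MIMO channels are complex-valued, and the dictionary would come from spatial gridding or, in SDL-LISTA, be learned. The LISTA weights are initialized so that each layer reproduces one ISTA step and are tied across layers for brevity; training would then adjust `W1`, `W2`, and `theta`.

```python
import math


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(p, q):
    """Matrix-matrix product for lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def soft_threshold(v, theta):
    """Shrinkage operator: sign(v) * max(|v| - theta, 0), elementwise."""
    return [math.copysign(max(abs(a) - theta, 0.0), a) for a in v]


def ista(A, y, lam, L, num_iters):
    """ISTA for min_x 0.5 * ||y - A x||^2 + lam * ||x||_1 with step size 1/L."""
    At = transpose(A)
    x = [0.0] * len(At)
    for _ in range(num_iters):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        gradient_step = [xi + gi / L for xi, gi in zip(x, matvec(At, residual))]
        x = soft_threshold(gradient_step, lam / L)
    return x


def lista_init(A, lam, L):
    """LISTA parameters initialized so that one layer equals one ISTA step:
    W1 = A^T / L, W2 = I - A^T A / L, theta = lam / L."""
    At = transpose(A)
    n = len(At)
    AtA = matmul(At, A)
    W1 = [[a / L for a in row] for row in At]
    W2 = [[(1.0 if i == j else 0.0) - AtA[i][j] / L for j in range(n)] for i in range(n)]
    return W1, W2, lam / L


def lista_forward(W1, W2, theta, y, num_layers):
    """Unrolled network: x <- soft_threshold(W1 y + W2 x, theta) at every layer."""
    b = matvec(W1, y)
    x = [0.0] * len(W2)
    for _ in range(num_layers):
        x = soft_threshold([bi + wi for bi, wi in zip(b, matvec(W2, x))], theta)
    return x


# Toy underdetermined system: 2 measurements, 3 dictionary atoms (real-valued).
A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.3]]
y = [1.0, -0.5]
L = 3.0  # must be >= the largest eigenvalue of A^T A (here about 1.68)
W1, W2, theta = lista_init(A, lam=0.1, L=L)
print(ista(A, y, lam=0.1, L=L, num_iters=50))
print(lista_forward(W1, W2, theta, y, num_layers=50))  # matches ISTA up to rounding
```

Before training, the unrolled network is just ISTA with a fixed number of iterations; the benefit of learning comes from tuning the weights so that far fewer layers reach a given accuracy.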
Word alignment aims to find translationally equivalent words between source and target sentences. Previous work has demonstrated that self-training can achieve competitive word alignment results. In this paper, we propose to use word alignments generated by a third-party word aligner to supervise neural word alignment training. Specifically, when fine-tuning a pre-trained cross-lingual language model, the source and target words of each pair aligned by the third-party aligner are trained to be close neighbors in the contextualized embedding space. Experiments on benchmarks for various language pairs show that our approach can, surprisingly, self-correct the third-party supervision by finding more accurate word alignments and deleting wrong ones, leading to better performance than various third-party word aligners, including the current best one. When we integrate supervision from multiple third-party aligners, we achieve state-of-the-art word alignment performance, with alignment error rates on average more than two points lower than those of the best third-party aligner. We release our code at https://github.com/sdongchuanqi/Third-Party-Supervised-Aligner.
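The alignment error rate (AER) has a standard definition due to Och and Ney, over predicted links A, sure gold links S, and possible gold links P with S ⊆ P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below computes it and, as an assumption, extracts alignments from a source-by-target similarity matrix of contextual embeddings by keeping mutual best matches, a common heuristic for embedding-based aligners that is not necessarily the one used in this work. The similarity values are made up for illustration.

```python
def argmax_intersection(sim):
    """Keep (i, j) when target j is source word i's most similar target AND
    source i is target word j's most similar source."""
    n_src, n_tgt = len(sim), len(sim[0])
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return {(i, best_tgt[i]) for i in range(n_src) if best_src[best_tgt[i]] == i}


def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where sure links are also possible.
    Links are (source_index, target_index) pairs; assumes predicted and sure are
    not both empty."""
    possible = possible | sure
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


# Hypothetical similarities between 3 source and 3 target contextual embeddings.
sim = [[0.9, 0.1, 0.2],
       [0.2, 0.3, 0.8],
       [0.1, 0.7, 0.6]]
pred = argmax_intersection(sim)  # {(0, 0), (1, 2), (2, 1)}
print(alignment_error_rate(pred, sure={(0, 0), (1, 2)}, possible={(2, 1)}))  # 0.0
```

Predicting a link that is only "possible" is not penalized, while missing a "sure" link or adding an unsupported link raises the AER.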
Synthetic datasets are often used to pretrain end-to-end optical flow networks due to the lack of large amounts of labeled real-scene data. But major drops in accuracy occur when moving from synthetic to real scenes. How can we better transfer the knowledge learned from synthetic to real domains? To this end, we propose CLIP-FLow, a semi-supervised iterative pseudo-labeling framework that transfers the pretraining knowledge to the target real domain. We leverage large-scale, unlabeled real data to facilitate transfer learning under the supervision of iteratively updated pseudo-ground-truth labels, bridging the domain gap between synthetic and real data. In addition, we propose a contrastive flow loss on reference features and on features warped by the pseudo-ground-truth flows, to further boost accurate matching and dampen mismatching due to motion, occlusion, or noisy pseudo labels. We adopt RAFT as the backbone and obtain an Fl-all error of 4.11%, i.e., a 19% error reduction from RAFT (5.10%), ranking 2$^{nd}$ on the KITTI 2015 benchmark at the time of submission. Our framework can also be extended to other models, e.g., CRAFT, reducing the Fl-all error from 4.79% to 4.66% on the KITTI 2015 benchmark.
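The error figures quoted above are KITTI 2015 flow-outlier percentages (Fl-all). A minimal sketch of that metric, assuming flows given as lists of (u, v) vectors and an optional validity mask: a pixel counts as an outlier when its end-point error exceeds both 3 px and 5% of the ground-truth flow magnitude. The snippet also checks the quoted relative improvement over RAFT.

```python
import math


def fl_all(pred, gt, valid=None):
    """Percentage of flow outliers over valid pixels (KITTI 2015 rule):
    end-point error > 3 px AND > 5% of the ground-truth flow magnitude."""
    outliers = total = 0
    for idx, ((pu, pv), (gu, gv)) in enumerate(zip(pred, gt)):
        if valid is not None and not valid[idx]:
            continue  # pixels without ground truth are skipped
        epe = math.hypot(pu - gu, pv - gv)
        total += 1
        if epe > 3.0 and epe > 0.05 * math.hypot(gu, gv):
            outliers += 1
    return 100.0 * outliers / total


def relative_reduction(baseline, new):
    """Relative error reduction in percent."""
    return 100.0 * (baseline - new) / baseline


gt = [(10.0, 0.0), (100.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
pred = [(10.5, 0.0), (104.0, 0.0), (5.0, 1.0), (0.0, 0.0)]
print(fl_all(pred, gt))  # 25.0: only the third pixel is an outlier
# The quoted "19% error reduction": (5.10 - 4.11) / 5.10 is about 19.4%.
print(round(relative_reduction(5.10, 4.11), 1))  # 19.4
```

The second pixel shows why the relative threshold matters: a 4 px error on a 100 px motion is below 5% of the magnitude and is therefore not counted as an outlier.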
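This paper proposes a universal adaptive controller for quadrotors that can be deployed zero-shot to quadrotors of very different mass, arm length, and motor constants, and that also adapts rapidly to unknown disturbances during runtime. The core algorithmic idea is to learn a single policy that adapts online at test time not only to disturbances applied to the drone, but also to the robot's dynamics and hardware, within the same framework. We achieve this by training a neural network to estimate a latent representation of the robot and environment parameters, which is used to condition the behavior of the controller, also represented as a neural network. We train both networks exclusively in simulation, with the goal of flying the quadrotors to target positions while avoiding crashes into the ground. We directly deploy the same controller trained in simulation, without any modifications, on two quadrotors whose mass, inertia, and maximum motor speed differ by up to 4 times. In addition, we show rapid adaptation to sudden and large disturbances (up to 35.7%) in the mass and inertia of the quadrotors. We perform an extensive evaluation in both simulation and the physical world, where we outperform a state-of-the-art learning-based adaptive controller and a traditional PID controller specifically tuned to each platform. Video results can be found at https://dz298.github.io/universal-drone-controller/.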
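Differentiable architecture search (DARTS) has greatly promoted the development of NAS techniques because of its high search efficiency, but it suffers from performance collapse. In this paper, we make efforts to alleviate the performance collapse problem of DARTS from two aspects. First, we investigate the expressive power of the supernet in DARTS and then derive a new setup of the DARTS paradigm in which only BatchNorm is trained. Second, we theoretically find that random features dilute the auxiliary connection role of skip-connections in supernet optimization and enable the search algorithm to focus on fairer operation selection, thereby solving the performance collapse problem. We instantiate DARTS and PC-DARTS with random features to build an improved version of each, named RF-DARTS and RF-PCDARTS respectively. Experimental results show that RF-DARTS obtains \textbf{94.36\%} test accuracy on CIFAR-10 (the nearest to the optimal result in NAS-Bench-201) and achieves a new state-of-the-art top-1 test error of \textbf{24.0\%} on ImageNet when transferring from CIFAR-10. Moreover, RF-DARTS performs robustly across three datasets (CIFAR-10, CIFAR-100, and SVHN) and four search spaces (S1-S4). Besides, RF-PCDARTS achieves even better results on ImageNet, namely \textbf{23.9\%} top-1 and \textbf{7.1\%} top-5 test error, surpassing representative methods such as the single-path, training-free, and partial-channel paradigms directly searched on ImageNet.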
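The continuous relaxation at the heart of DARTS, together with the random-feature setup described above, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: `RandomFeatureOp` and the three-operation candidate set are hypothetical, the op's random weights stay frozen at initialization while only the normalization's affine parameters (`gamma`, `beta`) would be trained, and the normalization here is per-sample rather than true BatchNorm.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


class RandomFeatureOp:
    """Hypothetical parametric candidate op in the random-feature setting:
    linear weights are sampled once and frozen; only gamma and beta would be
    trained. Normalizes one sample's features (BatchNorm would use the batch)."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                       for _ in range(dim)]  # frozen random features
        self.gamma, self.beta = 1.0, 0.0     # the only trainable parameters

    def __call__(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.weight]  # ReLU(Wx)
        mean = sum(h) / len(h)
        var = sum((a - mean) ** 2 for a in h) / len(h)
        return [self.gamma * (a - mean) / math.sqrt(var + 1e-5) + self.beta for a in h]


def mixed_op(x, ops, alphas):
    """DARTS continuous relaxation: sum over ops o of softmax(alpha)_o * o(x)."""
    out = [0.0] * len(x)
    for w, op in zip(softmax(alphas), ops):
        out = [o + w * y for o, y in zip(out, op(x))]
    return out


candidates = {
    "skip_connect": lambda x: list(x),
    "none": lambda x: [0.0] * len(x),
    "rf_linear": RandomFeatureOp(dim=4, seed=0),
}
alphas = [0.2, -1.0, 0.5]  # architecture parameters (learned by bilevel optimization in DARTS)
x = [1.0, -0.5, 0.25, 2.0]
print(mixed_op(x, list(candidates.values()), alphas))
# After search, each edge keeps the non-zero op with the largest alpha.
chosen = max((a, name) for a, name in zip(alphas, candidates) if name != "none")[1]
print(chosen)  # rf_linear
```

In this toy setting, the skip connection passes its input through unchanged, whereas the random-feature op can only rescale and shift fixed random features; the abstract's argument is that this setup dilutes the advantage skip-connections otherwise enjoy during supernet optimization.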