When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images), network architectures and training schemes. Through this study, we obtain two insights: 1) the input representation plays a crucial role in robustness, and under a given corruption, different representations degrade to very different degrees; 2) although state-of-the-art LiDAR semantic segmentation methods achieve promising results on clean data, they are far less robust when dealing with noisy data. Finally, based on these observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can spur future research on robust LiDAR semantic segmentation for safety-critical applications.
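As a minimal illustration of this kind of robustness evaluation, the sketch below corrupts a point cloud with Gaussian coordinate jitter (a simple stand-in for a measurement-noise corruption; the benchmark's 16 corruptions are more elaborate) and scores label predictions with mean IoU. All function names here are hypothetical, not from the paper's released code.

```python
import numpy as np

def jitter_points(points, sigma=0.02, rng=None):
    """Add Gaussian noise to point coordinates: an illustrative
    measurement-noise corruption, not the benchmark's exact recipe."""
    rng = np.random.default_rng(rng)
    return points + rng.normal(0.0, sigma, size=points.shape)

def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union, averaged over the classes
    that actually appear in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

A robustness study would then compare `mean_iou` on clean inputs against `mean_iou` on `jitter_points(...)` outputs for each model under test.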
Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, they often require large-scale datasets and considerable computational resources, which is time-consuming, computationally expensive, and environmentally unfriendly. To alleviate these limitations, we propose a novel pre-training model for molecular representation learning, the Bi-branch Masked Graph Transformer Autoencoder (BatmanNet). BatmanNet features two tailored and complementary graph autoencoders that reconstruct the missing nodes and edges from a masked molecular graph. Surprisingly, we find that a high masking proportion (60%) of the atoms and bonds achieves the best performance. We further propose an asymmetric graph-based encoder-decoder architecture for both nodes and edges, where a transformer-based encoder takes only the visible subset of nodes or edges, and a lightweight decoder reconstructs the original molecule from the latent representation and mask tokens. With this simple yet effective asymmetric design, BatmanNet can learn efficiently even from a much smaller-scale unlabeled molecular dataset to capture the underlying structural and semantic information, overcoming a major limitation of current deep neural networks for molecular representation learning. For instance, using only 250K unlabeled molecules as pre-training data, our BatmanNet with 2.575M parameters achieves a 0.5% improvement on the average AUC compared with the current state-of-the-art method with 100M parameters pre-trained on 11M molecules.
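The masking step described above can be sketched as follows: hide 60% of the nodes and edges of a molecular graph, so the encoder sees only the visible subset and the decoder must reconstruct the rest. This is an illustrative sketch of the sampling, not BatmanNet's actual tokenization or code.

```python
import numpy as np

def mask_graph(num_nodes, edges, mask_ratio=0.6, rng=None):
    """Randomly hide a fraction of nodes and edges. Returns the visible
    subsets (encoder input) and the masked parts (reconstruction targets).
    Simplified: the paper's pipeline operates on featurized graphs."""
    rng = np.random.default_rng(rng)
    node_order = rng.permutation(num_nodes)
    n_mask = int(round(mask_ratio * num_nodes))
    masked_nodes = sorted(node_order[:n_mask].tolist())
    visible_nodes = sorted(node_order[n_mask:].tolist())
    edge_order = rng.permutation(len(edges))
    e_mask = int(round(mask_ratio * len(edges)))
    masked_edges = [edges[i] for i in edge_order[:e_mask]]
    visible_edges = [edges[i] for i in edge_order[e_mask:]]
    return visible_nodes, masked_nodes, visible_edges, masked_edges
```

With `mask_ratio=0.6`, a 5-atom ring graph yields 2 visible and 3 masked atoms, and likewise for its bonds.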
Estimating accurate lane lines in 3D space remains challenging due to their sparse and slender nature. In this work, we propose M^2-3DLaneNet, a multi-modal framework for effective 3D lane detection. Designed to integrate complementary information from multiple sensors, M^2-3DLaneNet first extracts multi-modal features with modality-specific backbones and then fuses them in a unified bird's-eye-view (BEV) space. Specifically, our method consists of two core components. 1) To obtain an accurate 2D-3D mapping, we propose top-down BEV generation, in which a line-restricted deformable attention (LRDA) module is used to effectively enhance image features in a top-down manner, fully capturing the slender characteristics of lanes. The enhanced 2D frustum features are then projected into 3D space with depth-aware lifting, and BEV features are generated via pillarization. 2) We further propose bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from camera and LiDAR sensors. Extensive experiments demonstrate the effectiveness of M^2-3DLaneNet, which surpasses previous state-of-the-art methods by a large margin, i.e., a 12.1% F1-score improvement on the OpenLane dataset.
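To give a concrete feel for the pillarization step mentioned above, the sketch below scatters per-point features onto a BEV grid by averaging the features of points that fall into the same x-y cell. This is a generic minimal stand-in, assuming a square BEV extent; M^2-3DLaneNet's module adds depth-aware lifting and deformable attention that are not shown here.

```python
import numpy as np

def pillarize_to_bev(points_xyz, feats, bev_shape=(64, 64), extent=50.0):
    """Average per-point features within each BEV (x-y) cell.
    `extent` is the half-width of the square BEV range in meters
    (an illustrative assumption, not a value from the paper)."""
    H, W = bev_shape
    bev = np.zeros((H, W, feats.shape[1]))
    count = np.zeros((H, W, 1))
    xs = ((points_xyz[:, 0] + extent) / (2 * extent) * W).astype(int).clip(0, W - 1)
    ys = ((points_xyz[:, 1] + extent) / (2 * extent) * H).astype(int).clip(0, H - 1)
    for x, y, f in zip(xs, ys, feats):
        bev[y, x] += f
        count[y, x] += 1
    return bev / np.maximum(count, 1)  # avoid division by zero in empty cells
```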
As camera and LiDAR sensors capture complementary information for autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modal data fusion. However, fusion-based methods require paired data, i.e., LiDAR point clouds and camera images with strict point-to-pixel mappings, as inputs for both training and inference, which seriously hinders their application in practical scenarios. Thus, in this work, we propose 2D Priors Assisted Semantic Segmentation (2DPASS) to boost representation learning on point clouds by fully exploiting 2D images with rich appearance. In practice, by leveraging auxiliary modal fusion and Multi-Scale Fusion-to-Single Knowledge Distillation (MSFSKD), 2DPASS acquires richer semantic and structural information from the multi-modal data, which is then online-distilled to a pure 3D network. As a result, equipped with 2DPASS, our baseline shows significant improvement with only point cloud inputs. Specifically, it achieves state-of-the-art results on two large-scale benchmarks, SemanticKITTI and NuScenes, including top-1 results on both the single-scan and multi-scan competitions of SemanticKITTI.
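The distillation idea above, transferring knowledge from a fusion branch to a pure 3D network, rests on a standard distillation loss. The sketch below shows the generic temperature-scaled KL form; 2DPASS's MSFSKD adds multi-scale and fusion-specific terms not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL divergence from teacher to student:
    the generic knowledge-distillation objective. T=2.0 is an
    illustrative choice, not a value from the paper."""
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T) + 1e-12)
    log_p_t = np.log(p_t + 1e-12)
    return float((p_t * (log_p_t - log_p_s)).sum(axis=-1).mean() * T * T)
```

The loss is zero when student and teacher predictions agree, and grows as the 3D student drifts from the multi-modal teacher.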
6G wireless networks are envisioned to accelerate the convergence of the physical and cyber worlds and to enable a paradigm shift in how we deploy and exploit communication networks. Machine learning, in particular deep learning (DL), will be one of the key technological enablers of 6G by offering a new paradigm for networks with high-level intelligence. In this article, we introduce an emerging DL architecture known as the Transformer and discuss its potential impact on 6G network design. We first discuss the differences between the Transformer and classical DL architectures, and highlight the Transformer's self-attention mechanism and strong representation capability, which make it particularly attractive for tackling various challenges in wireless network design. Specifically, we propose Transformer-based solutions for massive multiple-input multiple-output (MIMO) systems and for various semantic communication problems in 6G networks. Finally, we discuss key challenges and open issues in Transformer-based solutions and identify future research directions for their deployment in intelligent 6G networks.
The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from a loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, when training the CTR prediction model DeepFM on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/bytedance/LargeBatchCTR.
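The core clipping idea can be sketched as follows: for each id's embedding vector, rescale its gradient so the gradient norm does not exceed a multiple of that embedding's weight norm. This is a simplified illustration of the column-wise clipping principle under an assumed (id, dim) table layout; the released repository is the authoritative implementation.

```python
import numpy as np

def cowclip(grad, weight, ratio=1.0, eps=1e-8):
    """Per-id clipping of embedding gradients: each id's gradient vector
    is scaled so its norm is at most ratio * ||that id's embedding||.
    `grad` and `weight` are (num_ids, dim) arrays; `ratio` is an
    illustrative hyperparameter."""
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True)
    limit = ratio * np.linalg.norm(weight, axis=1, keepdims=True) + eps
    scale = np.minimum(1.0, limit / (g_norm + eps))
    return grad * scale
```

Because rare ids have small embedding norms, their gradients get clipped harder, which is what stabilizes large-batch training across ids with very different frequencies.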
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combinations delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented, all recognizing surgical action triplets directly from surgical videos and achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison and in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
Learning to collaborate is critical in multi-agent reinforcement learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, typically characterized by mutual information (MI) in different forms. However, we reveal that strong correlation also emerges in sub-optimal collaborative behaviors, and simply maximizing MI can hinder the learning of better collaboration. To address this issue, we propose a novel MARL framework called Progressive Mutual Information Collaboration (PMIC) for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is to maximize the MI associated with superior collaborative behaviors and minimize the MI associated with inferior ones. The two MI objectives play complementary roles: they facilitate better collaboration while avoiding falling into sub-optimal behaviors. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
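The criterion above is built on the mutual information between global states and joint actions. For intuition, the sketch below computes MI exactly for a small discrete joint distribution and combines the two MI terms into a single signed objective; PMIC itself estimates MI with neural estimators over trajectories, and the weighting `beta` here is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def mutual_information(joint):
    """Exact MI (in nats) of a discrete joint distribution p(s, a)
    given as a 2D array of (unnormalized) probabilities."""
    joint = joint / joint.sum()
    ps = joint.sum(axis=1, keepdims=True)   # marginal over states
    pa = joint.sum(axis=0, keepdims=True)   # marginal over joint actions
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(joint > 0, joint * np.log(joint / (ps * pa)), 0.0)
    return float(terms.sum())

def pmic_objective(mi_superior, mi_inferior, beta=0.5):
    """Reward MI tied to superior behaviors, penalize MI tied to
    inferior ones; `beta` is a hypothetical trade-off weight."""
    return mi_superior - beta * mi_inferior
```

Note that a deterministic state-action coupling (an identity-like joint) attains maximal MI whether the coupled behavior is good or bad, which is exactly why maximizing MI alone can lock in sub-optimal collaboration.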
In air-to-ground hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) systems, designing spectrally efficient wideband multi-user hybrid beamforming with limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both time-division duplex (TDD) and frequency-division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network. For FDD systems, we model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Different from conventional approaches that handle the different modules separately, the proposed solution simultaneously optimizes all the modules with the sum rate as the optimization objective. Therefore, by perceiving the inherent properties of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network can establish the mapping function from the channel to the beamformer, so that explicit channel reconstruction can be avoided and the pilot and feedback overhead reduced. Besides, practical low-resolution phase shifters (PSs) introduce a quantization constraint, leading to intractable gradient backpropagation when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt a transfer learning strategy to further fine-tune the E2E neural network based on a pre-trained network that assumes ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.
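The quantization constraint mentioned above is easy to state concretely: a b-bit phase shifter can only realize 2^b equally spaced phases, so continuous phases must be snapped to that grid. The sketch below shows this snapping; it illustrates why the constraint breaks gradient backpropagation (rounding has zero gradient almost everywhere), which is what the transfer-learning fine-tuning works around.

```python
import numpy as np

def quantize_phase(phases, bits=2):
    """Snap continuous phases (radians) to the nearest level of a
    b-bit phase shifter, i.e., to multiples of 2*pi / 2**bits,
    wrapped into [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    return np.round(phases / step) * step % (2 * np.pi)
```

With `bits=2`, only the four phases {0, pi/2, pi, 3*pi/2} are realizable, and a network trained with ideal continuous phases must be fine-tuned to tolerate this rounding.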
人工智能(AI)为简化Covid-19诊断提供了有前景的替代。然而,涉及周围的安全和可信度的担忧阻碍了大规模代表性的医学数据,对临床实践中训练广泛的模型造成了相当大的挑战。为了解决这个问题,我们启动了统一的CT-Covid AI诊断计划(UCADI),其中AI模型可以在没有数据共享的联合学习框架(FL)下在每个主机机构下分发和独立地在没有数据共享的情况下在每个主机机构上执行。在这里,我们认为我们的FL模型通过大的产量(中国测试敏感性/特异性:0.973 / 0.951,英国:0.730 / 0.942),与专业放射科医师的面板实现可比性表现。我们进一步评估了持有的模型(从另外两家医院收集,留出FL)和异构(用造影材料获取)数据,提供了模型所做的决策的视觉解释,并分析了模型之间的权衡联邦培训过程中的性能和沟通成本。我们的研究基于来自位于中国和英国的23家医院的3,336名患者的9,573次胸部计算断层扫描扫描(CTS)。统称,我们的工作提出了利用联邦学习的潜在保留了数字健康的前景。
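The aggregation step that makes this possible without data sharing is, in its standard form, federated averaging: each hospital trains locally, and only model parameters (weighted by local dataset size) are combined centrally. The sketch below shows that generic FedAvg aggregation; UCADI's actual protocol may differ in details such as compression or encryption.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists, with weights
    proportional to each client's local dataset size. Each element of
    `client_weights` is a list of numpy arrays (one per layer)."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    num_layers = len(client_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, client_weights))
        for i in range(num_layers)
    ]
```

Only the aggregated parameters ever leave the server, and only parameters (never patient scans) leave each hospital.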