Accompanying rapid industrialization, humans are suffering from serious air pollution problems. The demand for air quality prediction is becoming more and more important to the government's policy-making and people's daily life. In this paper, We propose GreenEyes -- a deep neural network model, which consists of a WaveNet-based backbone block for learning representations of sequences and an LSTM with a Temporal Attention module for capturing the hidden interactions between features of multi-channel inputs. To evaluate the effectiveness of our proposed method, we carry out several experiments including an ablation study on our collected and preprocessed air quality data near HKUST. The experimental results show our model can effectively predict the air quality level of the next timestamp given any segment of the air quality data from the data set. We have also released our standalone dataset at https://github.com/AI-Huang/IAQI_Dataset The model and code for this paper are publicly available at https://github.com/AI-Huang/AirEvaluation
translated by 谷歌翻译
Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical. However, such constraint not only excludes low-end clients who would otherwise make unique contributions to model training but also restrains clients from training large models due to on-device resource bottlenecks. In this work, we propose FedRolex, a partial training (PT)-based approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model. At its core, FedRolex employs a rolling sub-model extraction scheme that allows different parts of the global server model to be evenly trained, which mitigates the client drift induced by the inconsistency between individual client models and server model architectures. We show that FedRolex outperforms state-of-the-art PT-based model-heterogeneous FL methods (e.g. Federated Dropout) and reduces the gap between model-heterogeneous and model-homogeneous FL, especially under the large-model large-dataset regime. In addition, we provide theoretical statistical analysis on its advantage over Federated Dropout and evaluate FedRolex on an emulated real-world device distribution to show that FedRolex can enhance the inclusiveness of FL and boost the performance of low-end devices that would otherwise not benefit from FL. Our code is available at https://github.com/MSU-MLSys-Lab/FedRolex.
translated by 谷歌翻译
Bird's-Eye-View (BEV) 3D Object Detection is a crucial multi-view technique for autonomous driving systems. Recently, plenty of works are proposed, following a similar paradigm consisting of three essential components, i.e., camera feature extraction, BEV feature construction, and task heads. Among the three components, BEV feature construction is BEV-specific compared with 2D tasks. Existing methods aggregate the multi-view camera features to the flattened grid in order to construct the BEV feature. However, flattening the BEV space along the height dimension fails to emphasize the informative features of different heights. For example, the barrier is located at a low height while the truck is located at a high height. In this paper, we propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights. Instead of flattening the BEV space, we first sample along the height dimension to build the global and local BEV slices. Then, the features of BEV slices are aggregated from the camera features and merged by the attention mechanism. Finally, we fuse the merged local and global BEV features by a transformer to generate the final feature map for task heads. The purpose of local BEV slices is to emphasize informative heights. In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices. Compared with uniform sampling, LiDAR-guided sampling can determine more informative heights. We conduct detailed experiments to demonstrate the effectiveness of BEV-SAN. Code will be released.
translated by 谷歌翻译
Recently, Bird's-Eye-View (BEV) representation has gained increasing attention in multi-view 3D object detection, which has demonstrated promising applications in autonomous driving. Although multi-view camera systems can be deployed at low cost, the lack of depth information makes current approaches adopt large models for good performance. Therefore, it is essential to improve the efficiency of BEV 3D object detection. Knowledge Distillation (KD) is one of the most practical techniques to train efficient yet accurate models. However, BEV KD is still under-explored to the best of our knowledge. Different from image classification tasks, BEV 3D object detection approaches are more complicated and consist of several components. In this paper, we propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner. However, directly applying the teacher-student paradigm to BEV features fails to achieve satisfying results due to heavy background information in RGB cameras. To solve this problem, we propose to leverage the localization advantage of LiDAR points. Specifically, we transform the LiDAR points to BEV space and generate the foreground mask and view-dependent mask for the teacher-student paradigm. It is to be noted that our method only uses LiDAR points to guide the KD between RGB models. As the quality of depth estimation is crucial for BEV perception, we further introduce depth distillation to our framework. Our unified framework is simple yet effective and achieves a significant performance boost. Code will be released.
translated by 谷歌翻译
Vision-Centric Bird-Eye-View (BEV) perception has shown promising potential and attracted increasing attention in autonomous driving. Recent works mainly focus on improving efficiency or accuracy but neglect the domain shift problem, resulting in severe degradation of transfer performance. With extensive observations, we figure out the significant domain gaps existing in the scene, weather, and day-night changing scenarios and make the first attempt to solve the domain adaption problem for multi-view 3D object detection. Since BEV perception approaches are usually complicated and contain several components, the domain shift accumulation on multi-latent spaces makes BEV domain adaptation challenging. In this paper, we propose a novel Multi-level Multi-space Alignment Teacher-Student ($M^{2}ATS$) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Multi-space Feature Aligned (MFA) student model. Specifically, DAT model adopts uncertainty guidance to sample reliable depth information in target domain. After constructing domain-invariant BEV perception, it then transfers pixel and instance-level knowledge to student model. To further alleviate the domain shift at the global level, MFA student model is introduced to align task-relevant multi-space features of two domains. To verify the effectiveness of $M^{2}ATS$, we conduct BEV 3D object detection experiments on four cross domain scenarios and achieve state-of-the-art performance (e.g., +12.6% NDS and +9.1% mAP on Day-Night). Code and dataset will be released.
translated by 谷歌翻译
This paper presents BigCilin, the first Chinese open-domain knowledge graph with fine-grained hypernym-hyponym re-lations which are extracted automatically from multiple sources for Chinese named entities. With the fine-grained hypernym-hyponym relations, BigCilin owns flexible semantic hierarchical structure. Since the hypernym-hyponym paths are automati-cally generated and one entity may have several senses, we provide a path disambi-guation solution to map a hypernym-hyponym path of one entity to its one sense on the condition that the path and the sense express the same meaning. In order to conveniently access our BigCilin Knowle-dge graph, we provide web interface in two ways. One is that it supports querying any Chinese named entity and browsing the extracted hypernym-hyponym paths surro-unding the query entity. The other is that it gives a top-down browsing view to illust-rate the overall hierarchical structure of our BigCilin knowledge graph over some sam-pled entities.
translated by 谷歌翻译
In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by the recently developed implicit mapping and positioning system and further extends the idea so that it can be freely applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to divide the scene and support dynamic expansion, enabling our system to track and map arbitrary scenes without knowing the environment like in previous works. Moreover, we proposed a high-performance multi-process framework to speed up the method, thus supporting some applications that require real-time performance. The evaluation results show that our methods can achieve better accuracy and completeness than previous methods. We also show that our Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.
translated by 谷歌翻译
道路网络的图结构对于自动驾驶系统的下游任务,例如全球计划,运动预测和控制至关重要。过去,公路网络图通常由人类专家手动注释,这是耗时且劳动力密集的。为了获得更好的有效性和效率的道路网络图,需要进行自动的路网图检测方法。先前的作品要么是后处理的语义分割图,要么提出基于图的算法以直接预测道路网络图。但是,以前的作品遭受了硬编码的启发式处理算法和劣质最终性能。为了增强先前的SOTA(最新方法)方法RNGDET,我们添加了一个实例分割头,以更好地监督模型培训,并使模型能够利用骨干网络的多尺度功能。由于新提出的方法从RNGDET改进,因此命名为RNGDET ++。所有方法均在大型公开数据集上进行评估。 RNGDET ++在几乎所有度量分数上都优于基线模型。它将拓扑正确性APL(平均路径长度相似性)提高了3 \%。演示视频和补充材料可在我们的项目页面\ url {https://tonyxuqaq.github.io/projects/rngdetplusplus/}中获得。
translated by 谷歌翻译
安全与其他交通参与者的互动是自动驾驶的核心要求之一,尤其是在交叉点和遮挡中。大多数现有的方法都是为特定场景设计的,需要大量的人工劳动参数调整,以应用于不同情况。为了解决这个问题,我们首先提出了一个基于学习的交互点模型(IPM),该模型描述了代理与保护时间和交互优先级之间的相互作用以统一的方式。我们将提出的IPM进一步整合到一个新颖的计划框架中,通过在高度动态的环境中的全面模拟来证明其有效性和鲁棒性。
translated by 谷歌翻译
标记级别的高清地图(HD地图)对自动驾驶汽车具有重要意义,尤其是在大规模,外观改变的情况下,自动驾驶汽车依靠标记来定位和车道来安全驾驶。在本文中,我们提出了一个高度可行的框架,用于使用简单的传感器设置(一个或多个单眼摄像机)自动构建标记级别的高清图。我们优化标记角的位置,以适合标记分割的结果,并同时优化相应摄像机的反视角映射(IPM)矩阵,以获得从前视图图像到鸟类视图(BEV)的准确转换。在定量评估中,构建的高清图几乎达到了百厘厘米级的准确性。优化的IPM矩阵的准确性与手动校准相似。该方法还可以概括以通过增加可识别标记的类型来从更广泛的意义上构建高清图。
translated by 谷歌翻译