Radar, the only sensor that could provide reliable perception capability in all weather conditions at an affordable cost, has been widely accepted as a key supplement to camera and LiDAR in modern advanced driver assistance systems (ADAS) and autonomous driving systems. Recent state-of-the-art works reveal that fusion of radar and LiDAR can lead to robust detection in adverse weather, such as fog. However, these methods still suffer from low accuracy of bounding box estimations. This paper proposes a bird's-eye view (BEV) fusion learning for an anchor box-free object detection system, which uses the feature derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate the possible objects. Different label assignment strategies have been designed to facilitate the consistency between the classification of foreground or background anchor points and the corresponding bounding box regressions. Furthermore, the performance of the proposed object detector can be further enhanced by employing a novel interactive transformer module. We demonstrated the superior performance of the proposed methods in this paper using the recently published Oxford Radar RobotCar (ORR) dataset. We showed that the accuracy of our system significantly outperforms the other state-of-the-art methods by a large margin.
translated by 谷歌翻译
多对象跟踪(MOT)是现代高级驾驶员辅助系统(ADA)和自动驾驶(AD)系统的关键应用之一。 MOT的大多数解决方案都是基于随机矢量贝叶斯过滤器,例如Global最近的邻居(GNN)以及基于规则的启发轨道维护。随着随机有限集(RFS)理论的发展,最近已将RFS贝叶斯过滤器应用于ADA和AD Systems的MOT任务中。但是,由于计算成本和实施复杂性,它们在实际流量中的有用性是对疑问的。在本文中,据透露,具有基于规则的启发式轨道维护的GNN不足以在ADA和AD系统中基于激光雷达的MOT任务。通过系统地比较几个不同的基于对象过滤器的跟踪框架,包括传统的随机矢量贝叶斯滤波器,以及基于规则的启发式跟踪维护和RFS贝叶斯过滤器,可以说明这种判断。此外,提出了一个简单有效的跟踪器,即使用全局最近邻居(GNN-PMB)跟踪器的Poisson Multi-Bernoulli滤波器,建议用于基于激光雷达的MOT任务。拟议的GNN-PMB跟踪器在Nuscenes测试数据集中取得了竞争性的结果,并显示出优于其他最先进的LIDAR的跟踪性能,而Haver Holly Holling Trackers,Lidar和基于摄像机的基于摄像头的跟踪器。
translated by 谷歌翻译
汽车MMWAVE雷达在高级驾驶员辅助系统(ADA)和自动驾驶中起关键作用。基于深度学习的实例细分可以从雷达检测点实时对象识别。在常规培训过程中,准确的注释是关键。然而,由于雷达检测点的高质量注释,由于其歧义和稀疏性,要实现挑战。为了解决这个问题,我们提出了一种实施基于雷达检测点的实例细分的对比学习方法。我们根据地面真相标签定义正面和负样品,将对比度损失首先训练模型,然后对以下下游任务进行微调。此外,可以将这两个步骤合并为一个,并且可以为未标记的数据生成伪标签,以进一步提高性能。因此,我们的方法有四种不同的培训设置。实验表明,当仅适用于一小部分培训数据时,我们的方法仍然可以与以100%基真实信息进行监督的方式实现可比的性能。
translated by 谷歌翻译
在合作本地化中,通信移动代理使用代理商相对测量来改善其死亡的全局本地化。测量调度使代理能够决定当其计算资源有限时应该处理它应该处理的可用互及的相对测量的子集。最佳测量调度是一种NP-COLLECLICALATIAL优化问题。所谓的顺序贪婪(SG)算法是这个问题的流行次优的多项式解决方案。然而,SG算法的优点函数评估需要访问所有地标代理的状态估计载体和错误协方差矩阵(代理可以从测量的队伍中获取的队友)。本文提出了对SG方法的CL测量调度,但是通过使用基于神经网络的代理模型作为SG算法的优点函数的代理来降低通信和计算成本。该模型的重要性是它由本地信息和来自地标代理的标量元数据驱动。此解决方案以三种方式解决运行SG算法的时间和内存复杂性问题:(a)减少代理间通信消息大小,(b)通过使用更简单的代理(代理)函数来降低功能评估的复杂性( c)减少所需的内存大小.Simulations展示了我们的结果。
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Score-based diffusion models have captured widespread attention and funded fast progress of recent vision generative tasks. In this paper, we focus on diffusion model backbone which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements the performance of vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.
translated by 谷歌翻译
Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.
translated by 谷歌翻译
The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the world's largest single-dish radio telescope. Its large reflecting surface achieves unprecedented sensitivity but is prone to damage, such as dents and holes, caused by naturally-occurring falling objects. Hence, the timely and accurate detection of surface defects is crucial for FAST's stable operation. Conventional manual inspection involves human inspectors climbing up and examining the large surface visually, a time-consuming and potentially unreliable process. To accelerate the inspection process and increase its accuracy, this work makes the first step towards automating the inspection of FAST by integrating deep-learning techniques with drone technology. First, a drone flies over the surface along a predetermined route. Since surface defects significantly vary in scale and show high inter-class similarity, directly applying existing deep detectors to detect defects on the drone imagery is highly prone to missing and misidentifying defects. As a remedy, we introduce cross-fusion, a dedicated plug-in operation for deep detectors that enables the adaptive fusion of multi-level features in a point-wise selective fashion, depending on local defect patterns. Consequently, strong semantics and fine-grained details are dynamically fused at different positions to support the accurate detection of defects of various scales and types. Our AI-powered drone-based automated inspection is time-efficient, reliable, and has good accessibility, which guarantees the long-term and stable operation of FAST.
translated by 谷歌翻译
3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Albeit the success, it still remains elusive how to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art usually exploit bidirectional projection methods to align the cross-dimensional features and realize both 2D & 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, thus limiting the network's flexibility. Meanwhile, such dual-task settings may distract the network easily and lead to over-fitting in the 3D segmentation task. As limited by the network's inflexibility, fused features can only pass through a decoder network, which affects model performance due to insufficient depth. To alleviate these drawbacks, in this paper, we argue that despite its simplicity, projecting unidirectionally multi-view 2D deep semantic features into the 3D space aligned with 3D deep semantic features could lead to better feature fusion. On the one hand, the unidirectional projection enforces our model focused more on the core task, i.e., 3D segmentation; on the other hand, unlocking the bidirectional to unidirectional projection enables a deeper cross-domain semantic alignment and enjoys the flexibility to fuse better and complicated features from very different spaces. In joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
translated by 谷歌翻译
Graph structure learning (GSL), which aims to learn the adjacency matrix for graph neural networks (GNNs), has shown great potential in boosting the performance of GNNs. Most existing GSL works apply a joint learning framework where the estimated adjacency matrix and GNN parameters are optimized for downstream tasks. However, as GSL is essentially a link prediction task, whose goal may largely differ from the goal of the downstream task. The inconsistency of these two goals limits the GSL methods to learn the potential optimal graph structure. Moreover, the joint learning framework suffers from scalability issues in terms of time and space during the process of estimation and optimization of the adjacency matrix. To mitigate these issues, we propose a graph structure refinement (GSR) framework with a pretrain-finetune pipeline. Specifically, The pre-training phase aims to comprehensively estimate the underlying graph structure by a multi-view contrastive learning framework with both intra- and inter-view link prediction tasks. Then, the graph structure is refined by adding and removing edges according to the edge probabilities estimated by the pre-trained model. Finally, the fine-tuning GNN is initialized by the pre-trained model and optimized toward downstream tasks. With the refined graph structure remaining static in the fine-tuning space, GSR avoids estimating and optimizing graph structure in the fine-tuning phase which enjoys great scalability and efficiency. Moreover, the fine-tuning GNN is boosted by both migrating knowledge and refining graphs. Extensive experiments are conducted to evaluate the effectiveness (best performance on six benchmark datasets), efficiency, and scalability (13.8x faster using 32.8% GPU memory compared to the best GSL baseline on Cora) of the proposed model.
translated by 谷歌翻译