Adverse weather conditions (e.g., haze, rain, and snow) often degrade the quality of captured images, causing detection networks trained on normal images to generalize poorly in such scenarios. In this paper, we raise an intriguing question: can the combination of image restoration and object detection improve the performance of cutting-edge detectors in adverse weather conditions? To answer it, we propose an effective yet unified detection paradigm that bridges these two subtasks together via dynamic enhancement learning to discern objects in adverse weather, termed TogetherNet. Unlike existing efforts that treat image dehazing/deraining as a pre-processing step, TogetherNet considers a multi-task joint learning problem. Under this joint learning scheme, the clean features produced by the restoration network are shared with the detection network to learn better object detection, helping TogetherNet enhance its detection capacity in adverse weather conditions. Besides the joint learning architecture, we design a new Dynamic Transformer Feature Enhancement module to improve the feature extraction and representation capabilities of TogetherNet. Extensive experiments on both synthetic and real-world datasets demonstrate that our TogetherNet outperforms state-of-the-art detection approaches both quantitatively and qualitatively. Source code is available at https://github.com/yz-wang/togethernet.
Image smoothing is a fundamental low-level vision task that aims to preserve the salient structures of an image while removing insignificant details. Deep learning has been explored for image smoothing to cope with the complex entanglement of semantic structures and trivial details. However, current methods neglect two important facts about smoothing: 1) naive pixel-level regression, supervised by only a limited number of high-quality smoothing ground truths, can lead to domain shift and cause generalization problems on real-world images; 2) texture appearance is closely related to object semantics, so image smoothing requires awareness of semantic differences in order to apply adaptive smoothing strengths. To address these issues, we propose a novel Contrastive Semantic-Guided Image Smoothing Network (CSGIS-Net) that combines contrastive and semantic priors to facilitate robust image smoothing. The supervision signal is augmented by leveraging undesired smoothing effects as negative teachers and by incorporating a segmentation task to encourage semantic distinctiveness. To realize the proposed network, we also enrich the original VOC dataset with texture enhancement and smoothing labels, namely VOC-smooth, which is the first to bridge image smoothing and semantic segmentation. Extensive experiments demonstrate that the proposed CSGIS-Net outperforms state-of-the-art algorithms by a large margin. Code and dataset are available at https://github.com/wangjie6866/csgis-net.
Standard neural networks offer powerful function approximation capabilities but are limited in their ability to learn meta-representations and reason about probabilistic uncertainty in their predictions. Gaussian processes, on the other hand, adopt a Bayesian learning scheme to estimate such uncertainty but are constrained in their efficiency and approximation capacity. The Neural Process Family (NPF) aims to offer the best of both worlds by leveraging neural networks to meta-learn predictive uncertainty. In recent years, this potential has brought substantial research activity to the family. Consequently, a comprehensive survey of NPF models is needed to organize and relate their motivations, methodologies, and experiments. This paper addresses that gap while digging deeper into the formulations, research themes, and applications of the family members. We shed light on their potential to bring recent advances from other deep learning domains under one umbrella. We then provide a rigorous taxonomy of the family and empirically demonstrate their capabilities for modeling data-generating functions operating on 1-D, 2-D, and 3-D input domains. We conclude by discussing our perspectives on promising directions that can fuel research advances in the field. Code for our experiments will be made available at https://github.com/srvcodes/neural-processes-survey.
High-confidence overlap prediction and accurate correspondences are critical for aligning pairwise point clouds in a partial-to-partial manner. However, there is inherent uncertainty between overlapping and non-overlapping regions, which has long been ignored and significantly affects registration performance. Going beyond current wisdom, we propose a novel uncertainty-aware overlap prediction network, dubbed UTOPIC, to tackle the ambiguous overlap prediction problem; to the best of our knowledge, this is the first work to explicitly introduce overlap uncertainty into point cloud registration. Moreover, we induce the feature extractor to implicitly perceive shape knowledge through a completion decoder, and provide geometric relation embeddings for the Transformer to obtain transformation-invariant, geometry-aware feature representations. With the merits of more reliable overlap scores and more precise dense correspondences, UTOPIC achieves stable and accurate registration results even for inputs with limited overlapping areas. Extensive quantitative and qualitative experiments on synthetic and real-world benchmarks demonstrate the superiority of our approach over state-of-the-art methods. Code is available at https://github.com/zhileichen99/utopic.
Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods for MLS quantification not only require intensive labeling at millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a large amount of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image differs from a non-MLS image, and the regularization plays an important role in the sparse-to-dense refinement of the deformation field. Our experiments on a real clinical brain hemorrhage dataset achieve state-of-the-art performance and can generate interpretable deformation fields.
Current mainstream object detection methods for large aerial images usually divide large images into patches and then exhaustively detect the objects of interest on all patches, regardless of whether any objects are present. This paradigm, although effective, is inefficient because the detectors have to go through all patches, severely hindering the inference speed. This paper presents an Objectness Activation Network (OAN) to help detectors focus on fewer patches, achieving more efficient inference and more accurate results, and enabling a simple and effective solution to object detection in large images. In brief, OAN is a light fully-convolutional network for judging whether each patch contains objects or not, which can be easily integrated into many object detectors and jointly trained with them end-to-end. We extensively evaluate our OAN with five advanced detectors. Using OAN, all five detectors acquire more than 30.0% speed-up on three large-scale aerial image datasets, with consistent accuracy improvements. On extremely large Gaofen-2 images (29200$\times$27620 pixels), our OAN improves the detection speed by 70.5%. Moreover, we extend our OAN to driving-scene object detection and 4K video object detection, boosting the detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy. Code is available at https://github.com/Ranchosky/OAN.
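The inference-time idea behind patch gating can be illustrated with a minimal sketch. This is not the OAN architecture itself (which is a fully-convolutional network trained jointly with the detector); the function and parameter names here are hypothetical, and the objectness scorer and detector are stand-ins supplied by the caller:

```python
from typing import Callable, List, Tuple

def detect_with_objectness_gate(
    patches: List[object],
    objectness: Callable[[object], float],   # cheap per-patch score in [0, 1]
    detector: Callable[[object], list],      # expensive detector, run sparingly
    threshold: float = 0.5,
) -> List[Tuple[int, list]]:
    """Run the expensive detector only on patches whose cheap objectness
    score clears the threshold; all other patches are skipped entirely."""
    results = []
    for idx, patch in enumerate(patches):
        if objectness(patch) >= threshold:
            results.append((idx, detector(patch)))
    return results
```

The speed-up comes from the asymmetry in cost: in sparse aerial scenes most patches fail the gate, so the detector runs on only a small fraction of them.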
We study the problem of semantic segmentation calibration. For image classification, many existing solutions have been proposed to alleviate model miscalibration of confidence. However, to date, confidence calibration research on semantic segmentation is still limited. We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration. Among them, prediction correctness, especially misprediction, matters most to miscalibration due to over-confidence. Next, we propose a simple, unifying, and effective approach, namely selective scaling, which separates correct and incorrect predictions for scaling and focuses more on smoothing misprediction logits. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments with a variety of benchmarks on both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms other methods.
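The core idea of separating correct and incorrect predictions for scaling can be sketched with classical temperature scaling. This is a simplified illustration, not the paper's implementation: it fits two temperatures on a labeled validation set by grid search over the NLL (the grid bounds and the crude search are assumptions for the sketch), whereas deployment would additionally need a way to predict correctness at test time:

```python
import numpy as np

def softmax(logits, T):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    # negative log-likelihood of the true class under temperature T
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    # crude grid search minimising validation NLL
    return min(grid, key=lambda T: nll(logits, labels, T))

def selective_scaling(logits, labels):
    """Fit separate temperatures for correctly and incorrectly
    predicted samples; mispredictions typically need a larger T
    (stronger smoothing) to counter over-confidence."""
    pred = logits.argmax(axis=-1)
    correct = pred == labels
    T_correct = fit_temperature(logits[correct], labels[correct])
    T_incorrect = fit_temperature(logits[~correct], labels[~correct])
    return T_correct, T_incorrect
```

On over-confident mispredictions the fitted temperature grows, flattening those logits, while correct predictions keep a temperature near one, which is the asymmetry selective scaling exploits.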
In this paper, we propose a large-scale language pre-training for text GENeration using dIffusion modEl, which is named GENIE. GENIE is a pre-trained sequence-to-sequence text generation model that combines Transformer and diffusion. The diffusion model accepts latent information from the encoder, which is used to guide the denoising at the current time step. After multiple such denoising iterations, the diffusion model can restore Gaussian noise to diverse output text controlled by the input text. Moreover, this architectural design also allows us to adopt large-scale pre-training for GENIE. We propose a novel pre-training method named continuous paragraph denoise, based on the characteristics of the diffusion model. Extensive experiments on the XSum, CNN/DailyMail, and Gigaword benchmarks show that GENIE achieves performance comparable to various strong baselines; in particular, after pre-training, the generation quality of GENIE is greatly improved. We also conduct extensive experiments on the generation diversity and parameter impact of GENIE. The code for GENIE will be made publicly available.
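The "multiple denoising iterations guided by the encoder output" follow the standard DDPM reverse process. A minimal sketch of that generic loop is below; it assumes an epsilon-predicting denoiser and a linear beta schedule, and is not GENIE's exact sampler (the function names and signatures here are hypothetical):

```python
import numpy as np

def reverse_diffusion(denoise_fn, cond, shape, betas, rng):
    """Generic DDPM-style reverse loop: start from pure Gaussian noise
    and iteratively denoise, conditioning each step on the encoder
    output `cond`. `denoise_fn(x, t, cond)` predicts the noise eps."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = denoise_fn(x, t, cond)          # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # sample x_{t-1}
    return x
```

In a text diffusion model such as GENIE, `x` lives in a continuous latent space and the final `x` is mapped back to tokens; the stochasticity of the loop is what yields diverse outputs for the same input text.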
Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway and urban areas. However, most existing learning-based models trained on fixed datasets may perform poorly in continuously changing scenarios. Specifically, they may not perform well in previously learned scenarios after learning a new one. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory prediction in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed by utilizing a measurement of traffic divergence between scenarios, which balances the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, mitigating catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM.
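A common building block for memory-based continual learning is a bounded rehearsal buffer whose stored examples are replayed alongside new data. The sketch below is a generic reservoir-sampling buffer, not the paper's divergence-weighted mechanism (which sizes the memory using traffic divergence between scenarios); the class and parameter names are hypothetical:

```python
import random

class RehearsalMemory:
    """Toy episodic memory for continual learning: keep a bounded,
    uniformly representative sample of past examples via reservoir
    sampling, and mix samples from it into each new training batch."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # reservoir sampling: every example seen so far has equal
        # probability capacity/seen of residing in the buffer
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        k = min(k, len(self.buffer))
        return self.rng.sample(self.buffer, k)
```

Replaying buffer samples during training on a new scenario is what counteracts catastrophic forgetting; a divergence-aware variant would additionally decide how much memory each past scenario deserves.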
Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
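Two stages of the pipeline can be made concrete with a minimal sketch: an error-bounded lossy quantizer that guarantees a pointwise bound on raw-data error, and a constraint-satisfaction post-processing step that restores a derived quantity of interest (here, simply the total) exactly. This is an illustration of the two ideas, not the XGC/ADIOS pipeline; the autoencoder stage is omitted and the function names are hypothetical. Note the QoI correction shifts every value slightly, so in practice the error bound must be budgeted to absorb it:

```python
import numpy as np

def compress(data, err_bound):
    """Uniform scalar quantization: store each value as the nearest
    integer multiple of 2*err_bound, so reconstruction error is at
    most err_bound pointwise (an error-bounded lossy compressor)."""
    return np.round(data / (2.0 * err_bound)).astype(np.int64)

def decompress(q, err_bound):
    return q * (2.0 * err_bound)

def preserve_sum(recon, target_sum):
    """Constraint-satisfaction post-processing: spread the residual of
    a conserved QoI (the total) uniformly so it is preserved exactly."""
    return recon + (target_sum - recon.sum()) / recon.size
```

The integer codes produced by `compress` are then highly compressible with a standard entropy coder, which is where the large compression factors come from.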