Smart Paper Notes

RainUNet for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts

Jinyoung Park, Minseok Son, Seungju Cho, Inyoung Lee, Changick Kim

Category: Computer Vision

2022-12-07

This paper presents a solution to the Weather4cast 2022 Challenge Stage 2. The goal of the challenge is to forecast future high-resolution rainfall events obtained from ground radar using low-resolution multiband satellite images. We propose a solution that performs data preprocessing appropriate to the challenge and then predicts rainfall movies using a novel RainUNet. RainUNet is a hierarchical U-shaped network with temporal-wise separable blocks (TS blocks) that use decoupled large-kernel 3D convolutions to improve prediction performance. Various evaluation metrics show that our solution is effective compared to the baseline method. The source code is available at https://github.com/jinyxp/Weather4cast-2022
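
Counting weights gives a rough sense of why decoupling a large 3D kernel pays off. The sketch below assumes a decomposition in the style of large-kernel attention: a depthwise convolution, then a depthwise dilated convolution, then a 1x1x1 convolution. That decomposition, the kernel sizes, and the function names are illustrative assumptions. This is not the exact TS block, which is defined in the paper and repository.

```python
def dense_conv3d_params(channels: int, kernel: int) -> int:
    """Weights of a dense kernel^3 3D convolution with C_in = C_out = channels (bias ignored)."""
    return channels * channels * kernel ** 3


def decoupled_conv3d_params(channels: int, local_kernel: int, dilated_kernel: int) -> int:
    """Weights of an assumed decomposition: depthwise conv -> depthwise dilated conv -> 1x1x1 conv."""
    depthwise = channels * local_kernel ** 3            # one small 3D filter per channel
    depthwise_dilated = channels * dilated_kernel ** 3  # one dilated 3D filter per channel
    pointwise = channels * channels                     # 1x1x1 conv mixes channels
    return depthwise + depthwise_dilated + pointwise


def effective_kernel(local_kernel: int, dilated_kernel: int, dilation: int) -> int:
    """Per-axis receptive field of the two stacked depthwise convs (stride 1)."""
    dilated_extent = dilation * (dilated_kernel - 1) + 1
    return local_kernel + dilated_extent - 1


if __name__ == "__main__":
    k_eff = effective_kernel(5, 7, dilation=3)
    print(k_eff, dense_conv3d_params(64, k_eff), decoupled_conv3d_params(64, 5, 7))
```

Take 64 channels, a 5-tap depthwise kernel, and a 7-tap kernel with dilation 3. The stacked depthwise kernels then cover 23 positions per axis. A dense 23x23x23 kernel would need 49,836,032 weights, versus 34,048 for the decomposition.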

Super-resolution Probabilistic Rain Prediction from Satellite Data Using 3D U-Nets and EarthFormers

Yang Li, Haiyu Dong, Zuliang Fang, Jonathan Weyn, Pete Luferenko

Category: Computer Vision | Artificial Intelligence

2022-12-06

Accurate and timely rain prediction is crucial for decision making, and it is also a challenging task. This paper presents a solution that won 2nd prize in the Weather4cast 2022 NeurIPS competition, using 3D U-Nets and EarthFormers for 8-hour probabilistic rain prediction based on multi-band satellite images. The effect of the input satellite images' spatial context was explored in depth, and an optimal context range was identified. Because the rain distribution is imbalanced, we trained multiple models with different loss functions. To further improve model performance, multi-model ensembling and threshold optimization were used to produce the final probabilistic rain prediction. Experimental results and leaderboard scores show that each of the following provides a modest gain: the optimal spatial context, the combined loss function, multi-model ensembling, and threshold optimization. A permutation test was used to analyze the effect of each satellite band on rain prediction. The results show that the bands signifying cloud-top phase (8.7 μm) and cloud-top height (10.8 and 13.4 μm) are the best predictors of rain. The source code is available at https://github.com/bugsuse/weather4cast-2022-stage2.
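
The ensemble and threshold-optimization steps can be sketched in two stages. First, average the per-pixel probabilities of several models. Then pick the binarization threshold that maximizes a skill score on held-out data. The abstract does not specify these choices, so the following are assumptions for illustration: the Critical Success Index (CSI) as the score, a plain mean as the ensemble rule, and a fixed candidate grid.

```python
from typing import Sequence


def ensemble_mean(prob_maps: Sequence[Sequence[float]]) -> list[float]:
    """Average per-pixel rain probabilities from several models (maps flattened to 1D)."""
    n_models = len(prob_maps)
    return [sum(values) / n_models for values in zip(*prob_maps)]


def csi(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    hits = sum(p and t for p, t in zip(pred, truth))
    misses = sum(t and not p for p, t in zip(pred, truth))
    false_alarms = sum(p and not t for p, t in zip(pred, truth))
    denom = hits + misses + false_alarms
    return hits / denom if denom else 0.0


def best_threshold(probs: Sequence[float], truth: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold that maximizes CSI on held-out data."""
    return max(candidates, key=lambda th: csi([p >= th for p in probs], truth))


if __name__ == "__main__":
    model_a = [0.9, 0.2, 0.6, 0.1]
    model_b = [0.7, 0.4, 0.2, 0.3]
    truth = [True, False, True, False]
    probs = ensemble_mean([model_a, model_b])
    print(best_threshold(probs, truth, [0.1, 0.25, 0.35, 0.5, 0.9]))  # 0.35
```

A threshold tuned this way is only as good as the validation split it is tuned on. With imbalanced rain, a split with too few rainy pixels can make the chosen threshold unstable.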

Simple Baseline for Weather Forecasting Using Spatiotemporal Context Aggregation Network

Minseok Seo, Doyi Kim, Seungheon Shin, Eunbin Kim, Sewoong Ahn, Yeji Choi

Category: Computer Vision

2022-12-06

Traditional weather forecasting relies on domain expertise and computationally intensive numerical simulation systems. Recently, with the development of data-driven approaches, deep-learning-based weather forecasting has been receiving attention. It has made stunning progress, from backbone studies using CNNs, RNNs, and Transformers to training strategies that use weather observation datasets with auxiliary inputs. All of this progress has contributed to the field of weather forecasting. However, the many components and complex structures of deep learning models prevent us from reaching physical interpretations. This paper proposes a SImple baseline with a spatiotemporal context Aggregation Network (SIANet) that achieved state-of-the-art results on 4 of the 5 W4C22 benchmarks. This simple but efficient structure uses only satellite images and CNNs in an end-to-end fashion, without a multi-model ensemble or fine-tuning. Owing to its simplicity, SIANet can serve as a solid baseline that is easily applied to deep-learning-based weather forecasting.

Solving the Weather4cast Challenge via Visual Transformers for 3D Images

Yury Belousov, Sergey Polezhaev, Brian Pulfer

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-05

Accurately forecasting the weather is an important task, as many real-world processes and decisions depend on future meteorological conditions. The NeurIPS 2022 challenge entitled Weather4cast poses the problem of predicting rainfall events for the next eight hours given the preceding hour of satellite observations as a context. Motivated by the recent success of transformer-based architectures in computer vision, we implement and propose two methodologies based on this architecture to tackle this challenge. We find that ensembling different transformers with some baseline models achieves the best performance we could measure on the unseen test data. Our approach has been ranked 3rd in the competition.

WeatherFusionNet: Predicting Precipitation from Satellite Data

Jiří Pihrt, Rudolf Raevskiy, Petr Šimánek, Matej Choma

Category: Computer Vision | Machine Learning

2022-11-30

Short-term prediction of precipitation is critical in many areas of life. Recently, a large body of work has been devoted to forecasting radar reflectivity images. However, radar images are available only in areas covered by ground weather radars. Thus, we aim to predict high-resolution precipitation from lower-resolution satellite radiance images. A neural network called WeatherFusionNet is employed to predict severe rain up to eight hours in advance. WeatherFusionNet is a U-Net architecture that fuses three different ways of processing the satellite data: predicting future satellite frames, extracting rain information from the current frames, and using the input sequence directly. Using the presented method, we achieved 1st place in the NeurIPS 2022 Weather4Cast Core challenge. The code and trained parameters are available at https://github.com/Datalab-FIT-CTU/weather4cast-2022.

Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition

Taehyeon Kim, Shinhwan Kang, Hyeonjeong Shin, Deukryeol Yoon, Seongha Eom, Kijung Shin, Se-Young Yun

Category: Computer Vision

2022-12-05

The Weather4Cast competition (hosted by NeurIPS 2022) required competitors to predict super-resolution rain movies in various regions of Europe, given low-resolution satellite contexts covering wider regions. In this paper, we show that a general baseline 3D U-Net can be significantly improved with region-conditioned layers as well as orthogonality regularization on 1x1x1 convolutional layers. Additionally, we facilitate generalization with a bag of training strategies: mixup data augmentation, self-distillation, and feature-wise linear modulation (FiLM). The presented modifications outperform the baseline algorithm (3D U-Net) by up to 19.54% with less than 1% additional parameters, and won 4th place on the core test leaderboard.
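
A common way to impose orthogonality on a 1x1x1 convolution works in two steps. First, view its weight tensor of shape (C_out, C_in, 1, 1, 1) as a C_out x C_in matrix W. Then penalize the deviation of W W^T from the identity. The sketch below uses this soft-orthogonality penalty. It is an assumed formulation, and the paper's regularizer may differ in form or weighting.

```python
Matrix = list[list[float]]


def gram(w: Matrix) -> Matrix:
    """Compute W @ W^T for a C_out x C_in weight matrix."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in w] for row_i in w]


def soft_orthogonality_penalty(w: Matrix) -> float:
    """Squared Frobenius norm of (W W^T - I); zero when the rows of W are orthonormal."""
    g = gram(w)
    n = len(g)
    return sum((g[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(soft_orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
    print(soft_orthogonality_penalty([[1.0, 1.0], [0.0, 1.0]]))  # 3.0
```

During training, such a penalty would be added to the task loss as `task_loss + lam * sum(penalties)`. The weight `lam` is a hypothetical hyperparameter.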

Domain Generalization Strategy to Train Classifiers Robust to Spatial-Temporal Shift

Minseok Seo, Doyi Kim, Seungheon Shin, Eunbin Kim, Sewoong Ahn, Yeji Choi

Category: Computer Vision

2022-12-06

Deep-learning-based weather prediction models have advanced significantly in recent years. However, data-driven models based on deep learning are difficult to apply in real-world applications because they are vulnerable to spatial-temporal shifts. A weather prediction task is especially susceptible to such shifts when the model is overfitted to locality and seasonality. In this paper, we propose a training strategy that makes the weather prediction model robust to spatial-temporal shifts. We first analyze how the hyperparameters and augmentations of the existing training strategy affect the model's robustness to spatial-temporal shift. Next, based on this analysis, we propose an optimal combination of hyperparameters and augmentations, together with test-time augmentation. We performed all experiments on the W4C22 Transfer dataset and achieved first-place performance.
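
Test-time augmentation (TTA) can be sketched in three steps. Run the model on several transformed copies of the input. Map each prediction back to the original orientation. Then average the results. The abstract does not say which augmentations were used at test time, so the horizontal and vertical flips here are assumptions, as are the names below.

```python
from typing import Callable

Grid = list[list[float]]


def hflip(x: Grid) -> Grid:
    """Mirror left-right."""
    return [row[::-1] for row in x]


def vflip(x: Grid) -> Grid:
    """Mirror top-bottom."""
    return x[::-1]


def identity(x: Grid) -> Grid:
    return x


def predict_with_tta(model: Callable[[Grid], Grid], x: Grid) -> Grid:
    """Average predictions over the identity, a horizontal flip, and a vertical flip.

    Each flip is its own inverse, so applying it again maps the prediction
    back to the original orientation before averaging."""
    transforms = [identity, hflip, vflip]
    outputs = [t(model(t(x))) for t in transforms]
    rows, cols = len(x), len(x[0])
    return [[sum(o[i][j] for o in outputs) / len(outputs) for j in range(cols)]
            for i in range(rows)]
```

For a model that is already flip-equivariant, TTA returns the same map. The averaging only changes the output where the model treats mirrored inputs differently.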

A recurrent CNN for online object detection on raw radar frames

Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin

Category: Computer Vision

2022-12-21

Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and its relative velocity, regardless of weather and lighting conditions. However, radar sensors suffer from low resolution and huge intra-class variations in object shape. Exploiting temporal information (e.g., multiple frames) has been shown to help better capture the dynamics of objects and, therefore, the variation in their shape. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture that mixes convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show the relevance of this method for detecting objects in different radar representations (range-Doppler, range-angle). The model outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive. The code will be available soon.

MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction

Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

Category: Computer Vision

2022-12-09

Most existing approaches for video prediction build their models on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input and predicts the next frame recursively. This often leads to severe performance degradation when extrapolating further into the future, which limits the practical use of the prediction model. Alternatively, a Multi-In-Multi-Out (MIMO) architecture outputs all future frames in one shot. It therefore naturally breaks the recursion and prevents error accumulation. However, only a few MIMO models for video prediction have been proposed, and to date they achieve only inferior performance. The real strength of the MIMO model in this area has not been well recognized and is largely under-explored. Motivated by this, we conduct a comprehensive investigation in this paper to explore how far a simple MIMO architecture can go. Surprisingly, our empirical studies reveal that a simple MIMO model can outperform state-of-the-art work by a much larger margin than expected, especially in dealing with long-term error accumulation. After exploring a number of designs, we propose a new MIMO architecture, MIMO-VP. It extends the pure Transformer with local spatio-temporal blocks and a new multi-output decoder, and it establishes a new standard in video prediction. We evaluate our model on four highly competitive benchmarks (Moving MNIST, Human3.6M, Weather, KITTI). Extensive experiments show that our model wins 1st place on all the benchmarks with remarkable performance gains. It also surpasses the best SISO model in all aspects, including efficiency, quantity, and quality. We believe our model can serve as a new baseline to facilitate future research on video prediction. The code will be released.
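
A toy scalar system can illustrate the difference between the two designs. In this system, each true frame is 0.95 times the previous one. The SISO rollout feeds its own output back in, so a 2% per-step bias compounds to 1.02^k - 1 after k steps. The direct (MIMO-style) predictor is assumed to carry a fixed 2% error at every horizon. That fixed-error assumption only illustrates the absence of a feedback loop. MIMO models are not guaranteed to behave this way, and MIMO-VP itself is a Transformer, not this toy.

```python
from typing import Callable

A_TRUE = 0.95  # toy dynamics (assumed): each true frame is 0.95 x the previous one


def true_future(x0: float, horizon: int) -> list[float]:
    return [x0 * A_TRUE ** k for k in range(1, horizon + 1)]


def siso_rollout(step: Callable[[float], float], x0: float, horizon: int) -> list[float]:
    """SISO: predict one step, then feed the prediction back in as the next input."""
    preds, x = [], x0
    for _ in range(horizon):
        x = step(x)
        preds.append(x)
    return preds


def mimo_predict(model: Callable[[float, int], list[float]], x0: float, horizon: int) -> list[float]:
    """MIMO: one call emits every future frame; no prediction is ever fed back."""
    return model(x0, horizon)


def relative_errors(preds: list[float], truth: list[float]) -> list[float]:
    return [abs(p - t) / abs(t) for p, t in zip(preds, truth)]


if __name__ == "__main__":
    horizon = 12
    biased_step = lambda x: x * A_TRUE * 1.02                      # 2% error per step (assumed)
    direct = lambda x0, h: [v * 1.02 for v in true_future(x0, h)]  # 2% error per horizon (assumed)
    truth = true_future(1.0, horizon)
    print(round(relative_errors(siso_rollout(biased_step, 1.0, horizon), truth)[-1], 2))  # 0.27
    print(round(relative_errors(mimo_predict(direct, 1.0, horizon), truth)[-1], 2))       # 0.02
```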

Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks

Vivien Sainte Fare Garnot, Loic Landrieu

Category: Computer Vision

2021-07-16

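Unprecedented access to multi-temporal satellite imagery has opened new perspectives for a variety of Earth observation tasks. Among them, pixel-precise panoptic segmentation of agricultural parcels has major economic and environmental implications. Researchers have explored this problem for single images, but we argue that the complex temporal patterns of crop phenology are better addressed with temporal sequences of images. In this paper, we present the first end-to-end, single-stage method for panoptic segmentation of Satellite Image Time Series (SITS). This module can be combined with our novel image sequence encoding network, which relies on temporal self-attention to extract rich and adaptive multi-scale spatio-temporal features. We also introduce PASTIS, the first open-access SITS dataset with panoptic annotations. We demonstrate the superiority of our encoder for semantic segmentation against multiple competing architectures. We also set up the first state of the art for panoptic segmentation of SITS. Our implementation and PASTIS are publicly available.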
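
Temporal self-attention can collapse a per-pixel time series into a single feature vector. The sketch below is a simplified, single-query version. Its keys and values are the raw per-date features rather than learned projections, and the query would normally be learned. It illustrates the mechanism only and is not the paper's encoder.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(sequence: list[list[float]], query: list[float]):
    """Collapse a T x D per-pixel time series into one D-dimensional vector.

    Each date's score is the scaled dot product between the query and that
    date's feature vector; the output is the attention-weighted average.
    Returns (pooled_vector, attention_weights_over_dates)."""
    d = len(query)
    scores = [sum(q * v for q, v in zip(query, feat)) / math.sqrt(d) for feat in sequence]
    weights = softmax(scores)
    pooled = [sum(w * feat[k] for w, feat in zip(weights, sequence)) for k in range(d)]
    return pooled, weights
```

With a zero query, every acquisition date gets equal weight and the result is the temporal mean. A query aligned with certain features shifts weight toward the dates where those features are strong, such as a particular growth stage.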

Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images

Hasan Nasrallah, Ali J. Ghandour

Category: Computer Vision

2021-11-12

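Building segmentation is a fundamental task in Earth observation and aerial image analysis. Most existing deep-learning-based algorithms in the literature can be applied only to imagery of a fixed or narrow range of spatial resolutions. In practical scenarios, users deal with a wide range of image resolutions. They therefore often need to resample a given aerial image to match the spatial resolution of the dataset used to train the deep learning model. However, this severely degrades the quality of the output segmentation masks. To deal with this issue, we propose a Scale-invariant neural network (Sci-Net) that can segment buildings in aerial images at different spatial resolutions. Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations. We compared the proposed model against several state-of-the-art models on the Open Cities AI dataset. Sci-Net provides a steady improvement margin across all resolutions available in the dataset.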
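
Atrous (dilated) convolution is the building block of ASPP. It spaces kernel taps `dilation` positions apart, so a k-tap kernel covers k + (k - 1)(d - 1) positions without extra weights. ASPP runs several rates in parallel and concatenates the results. The 1D sketch below shows both ideas. The dense ASPP variant used in Sci-Net also connects the branches to one another, which this sketch does not reproduce. The rates 6, 12, and 18 in the example are DeepLab-style values chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D atrous convolution (cross-correlation, as in deep-learning libraries)."""
    span = dilation * (len(kernel) - 1)
    return [sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
            for i in range(len(signal) - span)]


def effective_kernel_size(kernel_size, dilation):
    """Positions covered by a dilated kernel: k + (k - 1)(d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def aspp_1d(signal, kernel, rates):
    """Toy ASPP: parallel dilated branches with 'same' zero padding (odd-length kernel).

    A real ASPP concatenates the branch outputs along the channel axis;
    here they are simply returned as a list of equal-length sequences."""
    branches = []
    for d in rates:
        pad = d * (len(kernel) - 1) // 2
        padded = [0.0] * pad + list(signal) + [0.0] * pad
        branches.append(dilated_conv1d(padded, kernel, d))
    return branches


if __name__ == "__main__":
    print([effective_kernel_size(3, d) for d in (6, 12, 18)])  # [13, 25, 37]
```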

EAN: Event Adaptive Network for Enhanced Action Recognition

Yuan Tian, Yichao Yan, Guangtao Zhai, Guodong Guo, Zhiyong Gao

Category: Computer Vision

2021-07-22

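Efficiently modeling spatio-temporal information in videos is crucial for action recognition. To achieve this goal, state-of-the-art methods typically employ convolution operators and dense interaction modules such as non-local blocks. However, these methods cannot accurately fit the diverse events in videos. On the one hand, the adopted convolutions have fixed scales and thus struggle with events of various scales. On the other hand, the dense interaction modeling paradigm achieves only sub-optimal performance, because action-irrelevant parts bring additional noise into the final prediction. In this paper, we propose a unified action recognition framework that investigates the dynamic nature of video content through the following designs. First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit diverse events. Second, to accurately aggregate these cues into a global video representation, we mine interactions only among a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN), because both key designs are adaptive to the input video content. To exploit short-term motion within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, which further improves the framework's performance. We ran extensive experiments on several large-scale video datasets, such as Something-Something, Kinetics, and Diving48. They verify that our models achieve state-of-the-art or competitive performance at low FLOPs. Code is available at: https://github.com/tianyuan168326/ean-pytorch.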

Video Frame Interpolation Transformer

Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang

Category: Computer Vision

2021-11-27

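Existing methods for video interpolation rely heavily on deep convolutional neural networks. They therefore inherit the intrinsic limitations of convolutions, such as content-agnostic kernel weights and restricted receptive fields. To address these issues, we propose a Transformer-based video interpolation framework. It allows content-aware aggregation weights and accounts for long-range dependencies through self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation and extend it to the spatio-temporal domain. Furthermore, we propose a space-time separation strategy that saves memory and also improves performance. In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers. Extensive experiments on a variety of benchmark datasets show that the proposed model performs favorably against state-of-the-art methods, both quantitatively and qualitatively.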
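
Counting query-key pairs shows the motivation for local attention most clearly. The sketch below compares two cases: global attention over all T x H x W tokens, and attention restricted to non-overlapping T x w x w windows. The token layout and window shape are assumptions for illustration, not the paper's exact configuration.

```python
def attention_pairs_global(t: int, h: int, w: int) -> int:
    """Query-key pairs when every spatio-temporal token attends to every other token."""
    n = t * h * w
    return n * n


def attention_pairs_local(t: int, h: int, w: int, window: int) -> int:
    """Query-key pairs when tokens attend only within non-overlapping t x window x window blocks."""
    if h % window or w % window:
        raise ValueError("h and w must be divisible by the window size")
    tokens_per_window = t * window * window
    num_windows = (h // window) * (w // window)
    return num_windows * tokens_per_window ** 2
```

Take T = 4, H = W = 64, and w = 8. Global attention scores 268,435,456 pairs, while windowed attention scores 4,194,304. That is a 64x reduction, equal to the number of windows.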

Benchmark Dataset for Precipitation Forecasting by Post-Processing the Numerical Weather Prediction

Taehyeon Kim, Namgyu Ho, Donggyu Kim, Se-Young Yun

Category: Machine Learning

2022-06-30

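Precipitation forecasting is an important scientific challenge with wide-reaching impacts on society. Historically, this challenge has been tackled with numerical weather prediction (NWP) models, which are grounded in physics-based simulations. Recently, many works have proposed an alternative approach that uses end-to-end deep learning (DL) models to replace physics-based NWP. These DL methods show improved performance and computational efficiency. However, they exhibit limitations in long-term forecasting and lack the explainability of NWP models. In this work, we present a hybrid NWP-DL workflow to fill the gap between standalone NWP and DL approaches. Under this workflow, the NWP output is fed into a deep model, which post-processes the data to produce a refined precipitation forecast. The deep model is trained with supervision, using Automatic Weather Station (AWS) observations as ground-truth labels. This achieves the best of both worlds and can even benefit from future improvements in NWP technology. To facilitate research in this direction, we present a novel dataset focused on the Korean Peninsula. Called KoMet (Korea Meteorological Dataset), it consists of NWP predictions and AWS observations. For NWP, we use the Global Data Assimilation and Prediction System-Korea Integrated Model (GDAPS-KIM).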
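
AWS observations exist only at station locations, so supervision in such a workflow can be restricted to the grid cells that contain a station. The sketch below assumes two things: stations are already snapped to their nearest grid cell, and the loss is mean squared error. Both choices and the function name are illustrative assumptions, not the KoMet training setup.

```python
def station_loss(pred_grid: list[list[float]],
                 stations: list[tuple[int, int, float]]) -> float:
    """Mean squared error evaluated only at grid cells that contain an AWS station.

    pred_grid: post-processed precipitation on the NWP grid (rows x cols).
    stations:  (row, col, observed_value) tuples; each station is assumed to be
               snapped to its nearest grid cell beforehand."""
    if not stations:
        raise ValueError("need at least one station observation")
    errors = [(pred_grid[r][c] - obs) ** 2 for r, c, obs in stations]
    return sum(errors) / len(errors)
```

Grid cells without a station contribute nothing to this loss. The model's output there is shaped only indirectly, through shared weights.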

RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation

Guijie Zhu, Zhun Fan, Jiacheng Liu, Duan Yuan, Peili Ma, Meihua Wang, Weihua Sheng, Kelvin C. P. Wang

Category: Computer Vision | Machine Learning

2022-07-28

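The acquisition and evaluation of pavement surface data play an essential role in pavement condition assessment. In this paper, we propose RHA-Net, an efficient and effective end-to-end network for automatic pavement crack segmentation that improves segmentation accuracy. RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into an encoder-decoder architecture. The ResBlocks improve RHA-Net's ability to extract high-level abstract features. The hybrid attention blocks fuse low-level and high-level features. This helps the model focus on the correct channels and crack regions, improving RHA-Net's feature representation ability. We constructed an image dataset of 789 pavement crack images, collected by a self-designed mobile robot, and used it to train and evaluate the proposed model. Compared with other state-of-the-art networks, the proposed model achieves better performance. A comprehensive ablation study validates the contributions of the residual blocks and hybrid attention mechanisms. Additionally, we built a lightweight version of the model by introducing depthwise separable convolutions. It achieves better performance and much faster processing with 1/30 of the number of U-Net parameters. The developed system can segment pavement cracks in real time on an embedded Jetson TX2 device (25 FPS). The video taken in real-time experiments will be released at https://youtu.be/3xiogk0fig4.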
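
Per layer, a depthwise separable convolution replaces a k x k x C_in x C_out kernel with two parts: a k x k depthwise filter per input channel and a 1x1 pointwise convolution. The arithmetic below shows the per-layer saving for assumed channel counts. The 1/30 figure in the abstract refers to the whole lightweight model relative to U-Net, which this single-layer count does not reproduce.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise
```

With k = 3 and 256 channels in and out, the count drops from 589,824 to 67,840 weights. In general, the ratio is 1/C_out + 1/k^2 (about 0.115 here).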

Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks

Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Nal Kalchbrenner

Category: Machine Learning

2021-11-14

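Weather forecasting has been studied scientifically because of its high impact on human lives, transportation, food production, and energy management. Current operational forecasting models are based on physics and use supercomputers to simulate the atmosphere, making forecasts hours and days in advance. Better physics-based forecasts require improvements in the models themselves, which can be a substantial scientific challenge. They also require improvements in the underlying resolution, which can be computationally prohibitive. An emerging class of weather models based on neural networks represents a paradigm shift in weather forecasting. These models learn the required transformations from data instead of relying on hand-coded physics, and they are computationally efficient. For neural models, however, each additional hour of lead time poses a substantial challenge. It requires capturing ever larger spatial contexts and increases the uncertainty of the prediction. In this work, we present a neural network capable of large-scale precipitation forecasting up to twelve hours ahead. Starting from the same atmospheric state, it achieves greater skill than HRRR and HREF, the state-of-the-art physics-based models currently operating in the Continental United States. Interpretability analyses reinforce the observation that the model learns to emulate advanced physics principles. These results represent a substantial step towards establishing a new paradigm of efficient forecasting with neural networks.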

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo

Category:

2015-06-13

The goal of precipitation nowcasting is to predict future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. We extend the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions. This yields the convolutional LSTM (ConvLSTM), which we use to build an end-to-end trainable model for precipitation nowcasting. Experiments show that our ConvLSTM network captures spatiotemporal correlations better than FC-LSTM and the state-of-the-art operational ROVER algorithm. It also consistently outperforms both for precipitation nowcasting.
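
For reference, the ConvLSTM cell proposed in this paper keeps the FC-LSTM gating but replaces the input-to-state and state-to-state matrix products with convolutions ($*$). Here $\circ$ is the Hadamard product and $\sigma$ the sigmoid. The inputs $\mathcal{X}_t$, cell states $\mathcal{C}_t$, hidden states $\mathcal{H}_t$, and gates $i_t, f_t, o_t$ are all 3D tensors whose last two dimensions are spatial:

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * \mathcal{X}_t + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \circ \mathcal{C}_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * \mathcal{X}_t + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \circ \mathcal{C}_{t-1} + b_f\right) \\
\mathcal{C}_t &= f_t \circ \mathcal{C}_{t-1} + i_t \circ \tanh\left(W_{xc} * \mathcal{X}_t + W_{hc} * \mathcal{H}_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * \mathcal{X}_t + W_{ho} * \mathcal{H}_{t-1} + W_{co} \circ \mathcal{C}_t + b_o\right) \\
\mathcal{H}_t &= o_t \circ \tanh\left(\mathcal{C}_t\right)
\end{aligned}
```

Because every transition is a convolution, each hidden-state location depends only on a local neighborhood of the previous input and state. Stacking layers or enlarging the kernels widens the range of motion the model can capture.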

Deep coastal sea elements forecasting using U-Net based models

Jesús García Fernández, Ismail Alaoui Abdellaoui, Siamak Mehrkanoon

Category: Machine Learning | Computer Vision

2020-11-06

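Energy supply and demand are affected by meteorological conditions. As the demand for renewable energy sources increases, so does the relevance of accurate weather forecasts. Energy providers and policy makers need weather information to make informed choices and establish optimal plans according to their operational objectives. Thanks to recent deep learning techniques applied to satellite imagery, weather forecasting based on remote sensing data has also made major progress. This paper investigates multi-step-ahead frame prediction of coastal sea elements in the Netherlands using U-Net-based architectures. Hourly data from the Copernicus observation programme, spanning a period of 2 years, was used to train the models and make forecasts, including seasonal predictions. We propose a variation of the U-Net architecture. We then extend this novel model with residual connections, parallel convolutions, and asymmetric convolutions, which yields three additional architectures. In particular, we show that the architecture equipped with parallel and asymmetric convolutions as well as skip connections outperforms the other three models discussed.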

TINYCD: A (Not So) Deep Learning Model For Change Detection

Andrea Codegoni, Gabriele Lombardi, Alessandro Ferrari

Category: Computer Vision | Machine Learning

2022-07-26

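The aim of change detection (CD) is to detect changes by comparing two images taken at different times. The challenging part of CD is tracking the changes the user wants to highlight, such as new buildings. At the same time, it must ignore changes due to external factors such as the environment, lighting conditions, fog, or seasonal variation. Recent developments in deep learning have enabled researchers to achieve outstanding performance in this area. In particular, various space-time attention mechanisms exploit the spatial features extracted by the models. They also correlate these features temporally by using both available images. The downside is that these models have become increasingly complex and large, often making them unfeasible for edge applications. This is a limitation when models must be applied in industrial settings or in applications requiring real-time performance. In this work, we propose a novel model called TinyCD, which proves to be both lightweight and effective, achieving state-of-the-art performance with 13-150x fewer parameters. Our approach exploits the importance of low-level features for comparing images. To this end, we use only a few backbone blocks. This strategy keeps the number of network parameters low. To combine the features extracted from the two images, we introduce a novel, parameter-efficient mixing block. It can cross-correlate features in both the space and time domains. Finally, to fully exploit the information in the computed features, we define a PW-MLP block that performs pixel-wise classification. Source code, models, and results are available here: https://github.com/andreacodegoni/tiny_model_4_cd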

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video

David Junhao Zhang, Kunchang Li, Yunpeng Chen, Yali Wang, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou

Category: Computer Vision

2021-11-24

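Self-attention has become an integral component of recent network architectures, e.g., the Transformer, which dominate major image and video benchmarks. This is because self-attention can flexibly model long-range information. For the same reason, researchers have recently attempted to revive the multi-layer perceptron (MLP) and have proposed a few MLP-like architectures, which show great potential. However, current MLP-like architectures are not good at capturing local details. They also lack a progressive understanding of core details in images and/or videos. To overcome this issue, we propose a novel MorphMLP architecture. It focuses on capturing local details in the low-level layers and gradually shifts to long-range modeling in the high-level layers. Specifically, we design a fully-connected-like layer, dubbed MorphFC, with two morphable filters that gradually grow their receptive fields along the height and width dimensions. More interestingly, we propose to flexibly adapt the MorphFC layer to the video domain. To the best of our knowledge, we are the first to create an MLP-like backbone for learning video representations. Finally, we conduct extensive experiments on image classification, semantic segmentation, and video classification. Our MorphMLP, a self-attention-free backbone, can be as powerful as self-attention-based models.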
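
MorphFC's growing receptive field can be sketched in one dimension. Split a row of length W into chunks of length L and apply one shared L x L linear map to every chunk. Each output then mixes only the L positions of its own chunk. This is a simplified, single-channel illustration of the width direction only. Per the abstract, the actual layer also grows its receptive field along the height axis. The layer's channel handling and chunk-length schedule are not reproduced here.

```python
def chunked_fc_along_width(row: list[float], weight: list[list[float]]) -> list[float]:
    """Apply one shared L x L linear map to every length-L chunk of a row.

    Output position j of a chunk is sum_k weight[j][k] * chunk[k], so each
    output only sees the L positions of its own chunk; a larger L widens
    that receptive field."""
    L = len(weight)
    if len(row) % L:
        raise ValueError("row length must be divisible by the chunk length")
    out: list[float] = []
    for start in range(0, len(row), L):
        chunk = row[start:start + L]
        out.extend(sum(w * x for w, x in zip(weight_row, chunk)) for weight_row in weight)
    return out
```

With L = 2, a swap matrix only exchanges neighbors inside each pair. With L equal to the full row length, an averaging matrix mixes every position. This mirrors the shift from local detail to long-range modeling described above.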
