Optical flow, which computes the apparent motion from a pair of video frames, is a critical tool for scene motion estimation. The correlation volume is the central component of neural optical flow models. It estimates the pairwise matching costs between cross-frame features and is then used to decode optical flow. However, the traditional correlation volume is frequently noisy, outlier-prone, and sensitive to motion blur. We observe that, although the recent RAFT algorithm also adopts the traditional correlation volume, its additional context encoder provides semantically representative features to the flow decoder, implicitly compensating for the deficiency of the correlation volume. However, the benefits of this context encoder have been barely discussed or exploited. In this paper, we first investigate the functionality of RAFT's context encoder, then propose a new Context Guided Correlation Volume (CGCV) via gating and lifting schemes. CGCV can be universally integrated with RAFT-based flow computation methods for enhanced performance, and is especially effective in the presence of motion blur, de-focus blur, and atmospheric effects. By incorporating the proposed CGCV into the previous Global Motion Aggregation (GMA) method, at a minor cost of 0.5% extra parameters, the rank of GMA is lifted by 23 places on the KITTI 2015 leaderboard and 3 places on the Sintel leaderboard. Moreover, at a similar model size, our correlation volume achieves competitive or superior performance to state-of-the-art supervised peer models that employ Transformers or graph reasoning, as verified by extensive experiments.
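The abstract does not detail CGCV's gating and lifting operations. As a rough, generic illustration of context gating (not the paper's actual module; all names and the linear-gate form are assumptions), a sigmoid gate derived from context features can modulate correlation values elementwise to suppress unreliable matching costs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_correlation(corr_row, context_feat, w_gate, b_gate):
    """Illustrative context gating (hypothetical, not the paper's CGCV):
    a scalar gate in (0, 1), computed from a context feature vector,
    scales each correlation value in a row of the correlation volume."""
    # Linear projection of the context feature to a gating logit.
    logit = sum(w * c for w, c in zip(w_gate, context_feat)) + b_gate
    g = sigmoid(logit)
    return [g * c for c in corr_row]
```

With a zero logit the gate is 0.5, so every matching cost in the row is halved; a confident context would push the gate toward 0 or 1.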
Image harmonization aims to produce visually harmonious composite images by adjusting the foreground appearance to be compatible with the background. When the composite image has a photographic foreground and a painterly background, the task is called painterly image harmonization. There are only a few works on this task, which are either time-consuming or weak in generating well-harmonized results. In this work, we propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both the spatial domain and the frequency domain. The dual-domain generator performs harmonization by using AdaIN modules in the spatial domain and our proposed ResFFT modules in the frequency domain. The dual-domain discriminator attempts to distinguish inharmonious patches based on the spatial and frequency features of each patch, which can enhance the ability of the generator in an adversarial manner. Extensive experiments on the benchmark dataset show the effectiveness of our method. Our code and model are available at https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.
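The AdaIN operation the generator uses in the spatial domain is standard: normalize content features to zero mean and unit variance, then rescale them with the style features' statistics. A minimal sketch on flat feature lists (the list-based interface is for illustration only; the network applies this per channel over feature maps):

```python
import math

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (AdaIN): whiten the content
    features, then re-color them with the style features' mean and
    standard deviation."""
    mu_c = sum(content) / len(content)
    var_c = sum((x - mu_c) ** 2 for x in content) / len(content)
    mu_s = sum(style) / len(style)
    var_s = sum((x - mu_s) ** 2 for x in style) / len(style)
    sigma_c = math.sqrt(var_c + eps)
    sigma_s = math.sqrt(var_s + eps)
    return [sigma_s * (x - mu_c) / sigma_c + mu_s for x in content]
```

After the transform, the output carries the style's first- and second-order statistics while preserving the content's spatial pattern.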
The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients, each with heterogeneous images from a small subset of classes to be recognized. Given the cloud-client discrepancy in the range of image classes, the recognition model should be strongly adaptive, intuitively by concentrating its focus on each individual client's local, dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to become client-adaptive. In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images, thereby calibrating the focus and the recognition result. Further considering that ICIIA's overhead is dominated by linear projection, we propose partitioned linear projection with feature shuffling as a replacement, allowing the number of partitions to be increased to dramatically improve efficiency without sacrificing too much accuracy. We finally evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets. Extensive evaluation results demonstrate the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA can improve the testing accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively.
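The efficiency argument behind partitioned linear projection is that splitting a d-dimensional feature into P groups, each with its own (d/P)x(d/P) projection, cuts the multiply count from d^2 to d^2/P, and feature shuffling mixes information back across groups. A minimal pure-Python sketch of this idea (function names and the toy weights are ours, not ICIIA's implementation):

```python
def partitioned_projection(x, weights):
    """Partitioned linear projection (illustrative): split a d-dim
    feature into P groups and project each group with its own small
    (d/P x d/P) weight matrix, reducing cost from d^2 to d^2 / P."""
    p = len(weights)
    size = len(x) // p
    out = []
    for i, w in enumerate(weights):
        group = x[i * size:(i + 1) * size]
        out.extend(sum(wr[j] * group[j] for j in range(size)) for wr in w)
    return out

def shuffle_features(x, p):
    """Feature shuffling (as in channel shuffle): interleave entries
    across the P partitions so groups exchange information between
    successive projection layers."""
    size = len(x) // p
    return [x[g * size + i] for i in range(size) for g in range(p)]
```

With identity-like per-group weights, each half of the input is transformed independently; the shuffle then interleaves the halves so the next partitioned layer sees a mixture.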
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets super-resolution of compressed images, and Track 2 targets super-resolution of compressed videos. In Track 1, we use the popular dataset DIV2K as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results for Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.
Sparse general matrix multiplication (SpGEMM) is a fundamental building block in many scientific applications. One key task in SpGEMM is computing or predicting the structure of the output matrix (i.e., the number of nonzero elements in each output row) for efficient memory allocation and load balancing, which affects the overall performance of SpGEMM. Existing work either computes the output structure precisely or adopts upper-bound or sampling-based methods to predict it. However, these methods either take too much execution time or are not accurate enough. In this paper, we propose a novel sampling-based method with better accuracy and lower cost than the existing sampling-based methods. The method first predicts the compression ratio of SpGEMM by leveraging the number of intermediate products (denoted as FLOP) and the number of nonzero elements (denoted as NNZ) of the same sampled result matrix. The predicted output structure is then obtained by dividing the FLOP of each output row by the predicted compression ratio. We also propose a reference design of the existing sampling-based method with optimized computation overhead to demonstrate the accuracy of the proposed method. We construct 625 test cases with various matrix dimensions and sparsity structures to evaluate the prediction accuracy. Experimental results show that the absolute relative errors of the proposed method and the reference design are 1.56% and 8.12%, respectively, on average, and 25% and 156%, respectively, in the worst case.
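The prediction pipeline described above can be sketched as follows. This is our simplified reading of the abstract, not the paper's implementation: multiply a small sample of rows of A with B exactly to obtain both the intermediate-product count (FLOP) and the nonzero count (NNZ) of the sampled result, estimate the compression ratio FLOP/NNZ, then predict each output row's nonzeros as its (cheaply computed) per-row FLOP divided by that ratio. Sparse rows are dicts mapping column index to value:

```python
def predict_output_structure(a_rows, b_rows, sample_rows):
    """Sampling-based SpGEMM output-structure prediction (illustrative
    sketch of the idea in the abstract, not the paper's code)."""
    sample_flop, sample_nnz = 0, 0
    for i in sample_rows:
        cols = set()
        for k in a_rows[i]:
            sample_flop += len(b_rows.get(k, {}))  # intermediate products
            cols.update(b_rows.get(k, {}))         # merged output columns
        sample_nnz += len(cols)
    ratio = sample_flop / max(sample_nnz, 1)       # compression ratio
    # Per-row FLOP is cheap to compute exactly for every row of A.
    return [sum(len(b_rows.get(k, {})) for k in row) / ratio
            for row in a_rows]
```

The per-row FLOP pass touches only row-length metadata of B, which is why this is far cheaper than computing the exact output structure.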
Few-shot image generation and few-shot image translation are two related tasks, both of which aim to generate new images for an unseen category with only a few images. In this work, we make the first attempt to adapt a few-shot image translation method to the few-shot image generation task. Few-shot image translation disentangles an image into a style vector and a content map. An unseen style vector can be combined with different seen content maps to produce diverse images. However, this requires storing seen images to provide content maps, and the unseen style vector may be incompatible with seen content maps. To adapt it to the few-shot image generation task, we learn a compact dictionary of local content vectors by quantizing continuous content maps into discrete content maps, instead of storing seen images. Furthermore, we model the autoregressive distribution of the discrete content map conditioned on the style vector, which can alleviate the incompatibility between the content map and the style vector. Qualitative and quantitative results on three real datasets demonstrate that our model can produce images of higher diversity and fidelity for unseen categories compared with previous methods.
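The quantization step replaces each continuous local content vector with the index of its nearest entry in the learned dictionary, so only the compact codebook (not the seen images) must be stored. A minimal nearest-neighbor sketch (the codebook here is illustrative, not a learned one):

```python
def quantize_content_map(content_map, codebook):
    """Vector-quantize a content map: map each continuous local content
    vector to the index of its nearest codebook entry under squared
    Euclidean distance."""
    def nearest(v):
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        return dists.index(min(dists))
    return [nearest(v) for v in content_map]
```

The resulting sequence of discrete indices is what an autoregressive model, conditioned on the style vector, can then learn to generate.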
Learning to generate new images for a novel category based on only a few images, named few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have achieved impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformations, i.e., the "delta" between same-category pairs. The generation subnetwork generates a sample-specific delta for an input image, which is combined with this input image to generate a new image within the same category. In addition, an adversarial delta matching loss is designed to link the above two subnetworks together. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/bcmi/deltagan-few-shot-image-generation.
As a common image editing operation, image composition aims to cut the foreground from one image and paste it onto another image, resulting in a composite image. However, there are many issues that could make the composite image unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible illumination), geometry inconsistency (e.g., unreasonable size), and semantic inconsistency (e.g., mismatched semantic context). Previous works divide the image composition task into multiple sub-tasks, in which each sub-task targets one or more issues. Specifically, object placement aims to find a reasonable scale, location, and shape for the foreground. Image blending aims to address the unnatural boundary between foreground and background. Image harmonization aims to adjust the illumination statistics of the foreground. Shadow generation aims to generate a plausible shadow for the foreground. By putting all of the above efforts together, we can obtain realistic composite images. To the best of our knowledge, there is no previous survey on image composition. In this paper, we conduct a comprehensive survey over the sub-tasks of image composition. For each sub-task, we summarize traditional methods, deep-learning-based methods, datasets, and evaluation. We also point out the limitations of existing methods in each sub-task as well as the problems of the whole image composition task. Datasets and codes for image composition are summarized at https://github.com/bcmi/awesome-image-composition.
Image composition targets inserting a foreground object into a background image. Most previous image composition methods focus on adjusting the foreground to make it compatible with the background, while ignoring the shadow effect of the foreground on the background. In this work, we focus on generating a plausible shadow for the foreground object in the composite image. First, we contribute a real-world shadow generation dataset DESOBA by generating synthetic composite images based on paired real images and deshadowed images. Then, we propose a novel shadow generation network SGRNet, which consists of a shadow mask prediction stage and a shadow filling stage. In the shadow mask prediction stage, foreground and background information interact thoroughly to generate the foreground shadow mask. In the shadow filling stage, shadow parameters are predicted to fill the shadow area. Extensive experiments on our DESOBA dataset and real composite images demonstrate the effectiveness of our proposed method. Our dataset and code are available at https://github.com/bcmi/object-shadow-generation-dataset-desoba.
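The abstract does not specify the shadow parameters; a common formulation in the shadow literature, which we use here purely as an illustration of the filling stage (not SGRNet's learned module), is a per-pixel affine darkening model applied inside the predicted mask:

```python
def fill_shadow(image, mask, scale, offset):
    """Illustrative shadow filling: inside the predicted shadow mask,
    darken each pixel with an affine model scale * value + offset
    (scale < 1 darkens); pixels outside the mask are untouched.
    Flat lists stand in for image arrays."""
    return [scale * v + offset if m else v for v, m in zip(image, mask)]
```

In the network, the scale and offset would be the predicted shadow parameters rather than constants.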
We investigate the adversarial robustness of CNNs from the perspective of channel-wise activations. By comparing non-robust (normally trained) and robustified (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts. However, the channels that are negatively-relevant (NR) to predictions are still over-activated when processing adversarial data. Besides, we also observe that AT does not result in similar robustness for all classes. For the robust classes, channels with larger activation magnitudes are usually more positively-relevant (PR) to predictions, but this alignment does not hold for the non-robust classes. Given these observations, we hypothesize that suppressing NR channels and aligning PR ones with their relevances further enhances the robustness of CNNs under AT. To examine this hypothesis, we introduce a novel mechanism, i.e., Channel-wise Importance-based Feature Selection (CIFS). CIFS manipulates channels' activations of certain layers by generating non-negative multipliers to these channels based on their relevances to predictions. Extensive experiments on benchmark datasets, including CIFAR10 and SVHN, clearly verify the hypothesis and CIFS's effectiveness in robustifying CNNs. https://github.com/hanshuyan/cifs
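The core mechanism, non-negative per-channel multipliers derived from relevance to the prediction, can be sketched as below. The ReLU-of-relevance rule is our stand-in for CIFS's learned importance function, used only to make the suppress-NR / keep-PR behavior concrete:

```python
def channelwise_feature_selection(activations, relevances):
    """Illustrative channel-wise importance-based feature selection:
    build a non-negative multiplier per channel from its relevance to
    the prediction, so negatively-relevant (NR) channels are zeroed
    and positively-relevant (PR) channels pass through scaled."""
    multipliers = [max(r, 0.0) for r in relevances]
    return [m * a for m, a in zip(multipliers, activations)]
```

A channel with negative relevance is fully suppressed here; the actual mechanism reweights activations at selected layers during both training and inference.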