The growing importance of mobile photography has created a need for fast, high-performance RAW image processing pipelines capable of producing good visual results in spite of the limitations of mobile camera sensors. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, which can process RAW 12MP photos directly on mobile phones in under 1.5 seconds while producing high perceptual photo quality. To train and evaluate the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting of thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of traditional ISP pipelines, while outperforming previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs or APUs, which can be used to further reduce the latency of the model to as little as 0.5 seconds. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISPs and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and are able to process Full HD photos in 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for the automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, which is capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high-fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices; their detailed description is provided in this paper.
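Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional enhancement techniques almost impossible to apply. Recently, learning-based approaches have shown very promising results for this task, since they have substantially more expressive capacity to allow for improved quality. Motivated by these studies, in this paper we aim to leverage burst photography to boost performance and obtain much sharper and more accurate RGB images from extremely dark raw images. The backbone of our proposed framework is a novel coarse-to-fine network architecture that generates high-quality outputs progressively. The coarse network predicts a low-resolution, denoised raw image, which is then fed to the fine network to recover fine-scale details and realistic textures. To further reduce the noise level and improve the color accuracy, we extend this network to a permutation-invariant structure so that it takes a burst of low-light images as input and merges information from multiple images at the feature level. Our experiments demonstrate that our approach leads to perceptually more pleasing results than state-of-the-art methods by producing more detailed images of considerably higher quality.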
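The permutation-invariant merging of burst features mentioned above can be illustrated with a minimal sketch. The abstract does not state which merging operator is used; an element-wise maximum is assumed here only because it is a common order-independent choice, and `fuse_burst` and the feature values are hypothetical.

```python
from itertools import permutations

def fuse_burst(features):
    """Merge per-frame feature vectors with an element-wise maximum.

    Assumption: max pooling over frames stands in for the (unspecified)
    feature-level merge; any symmetric reduction would behave the same way.
    """
    return [max(values) for values in zip(*features)]

# Hypothetical per-frame features (3 frames, 4 feature values each).
burst = [
    [0.1, 0.9, 0.3, 0.0],
    [0.4, 0.2, 0.8, 0.1],
    [0.3, 0.5, 0.6, 0.7],
]
fused = fuse_burst(burst)

# Every ordering of the frames yields the same fused vector.
order_independent = all(
    fuse_burst(list(p)) == fused for p in permutations(burst)
)
print(fused, order_independent)
```

Because the reduction is symmetric in its arguments, the network output does not depend on the order in which the burst frames are supplied.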
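Many image processing networks apply a single set of static convolutional kernels across the entire input image, which is sub-optimal for natural images, as they often consist of heterogeneous visual patterns. Recent work in classification, segmentation, and image restoration has demonstrated that dynamic kernels outperform static kernels at modeling local image statistics. However, these works often adopt per-pixel convolution kernels, which introduce high memory and computation costs. To achieve spatially varying processing without significant overhead, we present Malleable Convolution (MalleConv), an efficient variant of dynamic convolution. The weights of MalleConv are dynamically produced by an efficient predictor network capable of generating content-dependent outputs at specific spatial locations. Unlike previous works, MalleConv generates a much smaller set of spatially varying kernels from the input, which enlarges the network's receptive field and significantly reduces computational and memory costs. These kernels are then applied to a full-resolution feature map through an efficient slice-and-conv operator with minimal memory overhead. We further build an efficient denoising network using MalleConv, coined MalleNet. It achieves high-quality results without a very deep architecture, e.g., it is 8.91$\times$ faster than the best-performing denoising algorithm (SwinIR) while maintaining similar performance. We also show that a single MalleConv added to a standard convolution-based backbone can contribute significantly to reducing the computational cost or to boosting image quality at a similar cost. Project page: https://yifanjiang.net/malleconv.html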
Deep convolutional neural networks have achieved great progress in image denoising tasks. However, their complicated architectures and heavy computational cost hinder their deployment on mobile devices. Some recent efforts in designing lightweight denoising networks focus on reducing either FLOPs (floating-point operations) or the number of parameters. However, these metrics are not directly correlated with on-device latency. By performing extensive analysis and experiments, we identify the network architectures that can fully utilize powerful neural processing units (NPUs) and thus enjoy both low latency and excellent denoising performance. To this end, we propose a mobile-friendly denoising network, namely MFDNet. The experiments show that MFDNet achieves state-of-the-art performance on the real-world denoising benchmarks SIDD and DND under real-time latency on mobile devices. The code and pre-trained models will be released.
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
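We propose a trainable image signal processing (ISP) framework that produces DSLR-quality images from RAW images captured by a smartphone. To address the color misalignments between training image pairs, we employ a color-conditional ISP network and optimize a novel parametric color mapping between each input RAW and reference DSLR image. During inference, we predict the target color image by designing a color prediction network with efficient Global Context Transformer modules. The latter effectively leverage global information to learn consistent color and tone mappings. We further propose a robust masked aligned loss to identify and discard regions with inaccurate motion estimation during training. Lastly, we introduce the ISP in the Wild (ISPW) dataset, consisting of weakly paired RAW and DSLR sRGB images. We extensively evaluate our method, setting a new state of the art on two datasets.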
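Under-display cameras have been proposed in recent years as a way to reduce the form factor of mobile devices while maximizing the screen area. Unfortunately, placing the camera behind the screen results in significant image distortions, including loss of contrast, blur, noise, color shift, scattering artifacts, and reduced light sensitivity. In this paper, we propose an image restoration pipeline that is ISP-agnostic, i.e., it can be combined with any legacy ISP to produce a final image that matches the appearance of a regular camera using the same ISP. This is achieved with a deep learning approach that performs RAW-to-RAW image restoration. To obtain large quantities of real under-display camera training data with sufficient contrast and scene diversity, we furthermore develop a data capture method that utilizes an HDR monitor, as well as a data augmentation method to generate suitable HDR content. The monitor data is supplemented with real-world data that has less scene diversity but allows us to achieve fine detail recovery without being limited by the monitor resolution. Together, this approach successfully restores color and contrast as well as image detail.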
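In this paper, we make the first benchmark effort to elaborate on the superiority of using RAW images in low-light enhancement and develop a novel alternative route for utilizing RAW images in a more flexible and practical way. Inspired by a full consideration of the typical image processing pipeline, we develop a new evaluation framework, the Factorized Enhancement Model (FEM), which decomposes the properties of RAW images into measurable factors and provides a tool for empirically exploring how the properties of RAW images affect enhancement performance. The empirical benchmark results show that the linearity of the data and the exposure time recorded in the metadata play the most critical roles, bringing distinct performance gains in various measures over approaches that take sRGB images as input. With the insights obtained from the benchmark results in mind, we develop a RAW-guiding Exposure Enhancement Network (REENet), which trades off the advantages of RAW images against their inaccessibility in real applications by using RAW images only in the training phase. REENet projects sRGB images into the linear RAW domain and applies constraints with the corresponding RAW images to reduce the difficulty of modeling during training. Afterwards, in the testing phase, REENet does not rely on RAW images. Experimental results demonstrate not only the superiority of REENet over state-of-the-art sRGB-based methods but also the effectiveness of the RAW guidance and of all components.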
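Deep convolutional neural network (DCNN) aided high dynamic range (HDR) imaging has recently received a lot of attention. The quality of DCNN-generated HDR images has surpassed that of their traditional counterparts. However, DCNNs tend to be computationally intensive and power-hungry. To address this challenge, we propose a lightweight CNN-based algorithm for extreme dual-exposure image fusion, which can be implemented on various embedded computing platforms with limited power and hardware resources. Two sub-networks are used: GlobalNet (G) and DetailNet (D). The goal of G is to learn global information along the spatial dimension, whereas D aims to enhance local details along the channel dimension. Both G and D are based solely on depthwise convolution (D Conv) and pointwise convolution (P Conv) to reduce the required parameters and computations. Experimental results show that the proposed technique can generate HDR images with plausible details in extremely exposed regions. Our model exceeds other state-of-the-art approaches by 0.7 to 8.5 in PSNR score and achieves a reduction of 7,675 to 463,385 parameters compared with other methods.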
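The parameter saving from building a layer out of a depthwise and a pointwise convolution can be checked with simple arithmetic. The sketch below compares weight counts for a standard convolution and its depthwise-plus-pointwise replacement; the layer sizes are hypothetical and are not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, k = 32, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = separable_conv_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))
```

For these sizes the standard layer needs 18,432 weights and the separable pair 2,336, roughly a 7.9-fold reduction; the ratio grows with the kernel size and the number of output channels.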
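Single-image super-resolution can support robotic tasks in environments where a reliable visual stream is required to monitor the mission, handle teleoperation, or study relevant visual details. In this work, we propose an efficient generative adversarial network model for real-time super-resolution. We adopt a tailored architecture of the original SRGAN and model quantization to boost execution on CPU and Edge TPU devices, achieving up to 200 fps inference. We further optimize our model by distilling its knowledge into a smaller version of the network and obtain remarkable improvements compared with the standard training approach. Our experiments show that our fast and lightweight model preserves considerably satisfying image quality compared with heavier state-of-the-art models. Finally, we conduct experiments on image transmission with bandwidth degradation to highlight the advantages of the proposed system for mobile robotic applications.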
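We present NeuriCam, a key-frame-based video super-resolution and colorization system that achieves low-power video capture from dual-mode IoT cameras. Our idea is to design a dual-mode camera system in which the first mode is low-power (1.1 mW) but only outputs grayscale, low-resolution, and noisy video, while the second mode consumes much more power (100 mW) but outputs color and higher-resolution images. To reduce total energy consumption, we heavily duty-cycle the high-power mode so that it outputs an image only once every second. The data from this camera system is then wirelessly streamed to a nearby plugged-in gateway, where we run a real-time neural network decoder to reconstruct a higher-resolution color video. To achieve this, we introduce an attention feature filter mechanism that assigns different weights to different features, based on the correlation between the feature map and the contents of the input frame at each spatial location. We design a wireless hardware prototype using off-the-shelf cameras and address practical issues including packet loss and perspective mismatch. Our evaluation shows that our dual-camera hardware reduces camera energy consumption while achieving an average grayscale PSNR gain of 3.7 dB over prior video super-resolution methods and a grayscale PSNR gain of 3.7 dB over existing color propagation methods. Open-source code: https://github.com/vb000/neuricam.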
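Deep learning-based single image super-resolution (SISR) approaches have drawn much attention and achieved remarkable success on modern advanced GPUs. However, most state-of-the-art methods require a huge number of parameters and large amounts of memory and computational resources, which usually results in poor inference times when they are applied to current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform reliable super-resolution in real time. The proposed nearest convolution has the same performance as nearest upsampling but is much faster and more suitable for Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3X dataset, and the comparison with other efficient SR methods demonstrates that NCNet can achieve high-fidelity SR results while using less inference time. Our code and pretrained models are publicly available at https://github.com/algolzw/ncnet.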
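One way a convolution can reproduce nearest upsampling is sketched below: a 1x1 convolution with fixed 0/1 weights copies every input channel r*r times, and a depth-to-space rearrangement then spreads those copies over an r x r pixel block. This is a pure-Python illustration under stated assumptions, not the NCNet implementation; the function names and the channel-major depth-to-space ordering are hypothetical choices.

```python
def nearest_upsample(x, r):
    """Reference nearest-neighbour upsampling of a [C][H][W] nested list."""
    return [[[ch[i // r][j // r] for j in range(len(ch[0]) * r)]
             for i in range(len(ch) * r)] for ch in x]

def repeat_channels_1x1(x, r):
    """1x1 convolution with fixed 0/1 weights: each channel is copied r*r times."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # weight[o][i] is 1 when output channel o is a copy of input channel i.
    weight = [[1.0 if o // (r * r) == i else 0.0 for i in range(c)]
              for o in range(c * r * r)]
    return [[[sum(weight[o][i] * x[i][row][col] for i in range(c))
              for col in range(w)] for row in range(h)]
            for o in range(c * r * r)]

def depth_to_space(y, r):
    """Rearrange [C*r*r][H][W] into [C][H*r][W*r] (channel-major ordering assumed)."""
    c, h, w = len(y) // (r * r), len(y[0]), len(y[0][0])
    return [[[y[ch * r * r + (i % r) * r + (j % r)][i // r][j // r]
              for j in range(w * r)] for i in range(h * r)]
            for ch in range(c)]

# Tiny hypothetical input: 2 channels of 2 x 2 pixels, upscaled 3X.
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
up_ref = nearest_upsample(x, 3)
up_conv = depth_to_space(repeat_channels_1x1(x, 3), 3)
print(up_ref == up_conv)
```

Both paths produce identical outputs, which is why the upsampling can be expressed entirely with a convolution and a depth-to-space rearrangement.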
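Low-light imaging on mobile devices is typically challenging because insufficient incident light passes through the relatively small aperture, resulting in a low signal-to-noise ratio. Most previous works on low-light image processing focus either on a single task, such as illumination adjustment, color enhancement, or noise removal, or on a joint illumination adjustment and denoising task that relies heavily on long-exposure image pairs collected from specific camera models. These approaches are therefore less practical and generalizable in real-world settings where camera-specific joint enhancement and restoration is required. To tackle this problem, in this paper we propose a low-light image processing framework that performs joint illumination adjustment, color enhancement, and denoising. Considering the difficulty of model-specific data collection and the ultra-high definition of the captured images, we design two branches: a coefficient estimation branch and a joint enhancement and denoising branch. The coefficient estimation branch works in a low-resolution space and predicts the coefficients for enhancement via bilateral learning, whereas the joint enhancement and denoising branch works in a full-resolution space and progressively performs joint enhancement and denoising. In contrast to existing methods, our framework does not need to recollect massive amounts of data when adapted to another camera model, which significantly reduces the effort required to fine-tune our approach for practical use. Through extensive experiments, we demonstrate its great potential in real-world low-light imaging applications compared with current state-of-the-art methods.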
The ubiquity of camera-embedded devices and the advances in deep learning have stimulated various intelligent mobile video applications. These applications often demand on-device processing of video streams to deliver real-time, high-quality services while addressing privacy and robustness concerns. However, the performance of these applications is constrained by the raw video streams, which tend to be taken in dim light with the small-aperture cameras of ubiquitous mobile platforms. Although many low-light video enhancement solutions exist, they are unfit for deployment on mobile devices due to their complex models and their disregard for system dynamics such as energy budgets. In this paper, we propose AdaEnlight, an energy-aware low-light video stream enhancement system for mobile devices. It achieves real-time video enhancement with competitive visual quality while allowing runtime behavior to adapt to the platform-imposed dynamic energy budgets. We report extensive experiments on diverse datasets, scenarios, and platforms and demonstrate the superiority of AdaEnlight compared with state-of-the-art low-light image and video enhancement solutions.
Recent efforts in Neural Radiance Fields (NeRF) have shown impressive results on novel view synthesis by using implicit neural representations to represent 3D scenes. Due to the process of volumetric rendering, the inference speed of NeRF is extremely slow, limiting its use on resource-constrained hardware such as mobile devices. Much work has been done to reduce the latency of running NeRF models. However, most of these methods still require a high-end GPU for acceleration or extra storage memory, both of which are unavailable on mobile devices. Another emerging direction uses the neural light field (NeLF) for speedup, as only one forward pass is performed on a ray to predict the pixel color. Nevertheless, to reach a rendering quality similar to NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, i.e., saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of real 3D scenes. Additionally, we achieve image quality similar to NeRF and better than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
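With the increasing demand for computational photography and imaging on mobile platforms, the development and integration of advanced image sensors with novel algorithms in camera systems has become prevalent. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views between industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge, which includes five tracks focusing on novel image sensors and imaging algorithms. This paper introduces Quad Remosaic and Denoise, one of the five tracks, which addresses the interpolation of Quad CFA to Bayer at full resolution. The participants were provided with a new dataset including 70 (training) and 15 (validation) scenes of high-quality Quad and Bayer pairs. In addition, for each scene, Quad data at different noise levels was provided at 0dB, 24dB, and 42dB. All the data were captured using a Quad sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics, including PSNR, SSIM, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/mipi2022.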
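Of the metrics listed above, PSNR has a simple closed form, 10 * log10(MAX^2 / MSE), and can be sketched in a few lines. The helper below is a minimal illustration for single-channel images stored as nested lists; the function name, the 8-bit peak value of 255, and the sample pixel values are assumptions and not part of the challenge's evaluation code.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    if len(ref) != len(tst):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2 x 2 images for illustration.
clean = [[10, 20], [30, 40]]
noisy = [[12, 18], [33, 40]]
print(round(psnr(clean, noisy), 2))
```

Higher values mean the test image is closer to the reference; identical images give an infinite PSNR.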