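Relative colour constancy is an essential requirement for many scientific imaging applications. However, most digital cameras differ in their image formation, and the native sensor output is usually inaccessible, for example in smartphone camera applications. This makes it hard to achieve consistent colour assessment across a range of devices and undermines the performance of computer vision algorithms. To resolve this issue, we propose a colour alignment model that treats camera image formation as a black box and formulates colour alignment as a three-step process: camera response calibration, response linearisation, and colour matching. The proposed model works with non-standard colour references, i.e., colour patches whose true colour values are unknown, by utilising a novel linear-distance feature. This is equivalent to determining the camera parameters through an unsupervised process. The model also works with a minimal number of corresponding colour patches across the images to be colour aligned. Two challenging image datasets, collected by multiple cameras under various illumination and exposure conditions, were used to evaluate the model. Performance benchmarks demonstrate that our model achieves superior performance compared to other popular and state-of-the-art methods.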
translated by 谷歌翻译
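Modelling the mapping from scene irradiance to image intensity is essential for many computer vision tasks. Such a mapping is known as the camera response. Most digital cameras use a nonlinear function to map the irradiance measured by the sensor to the image intensity used to record the photograph. Modelling of the response is necessary for nonlinear calibration. In this paper, a new high-performance camera response model that uses a single latent variable and a fully connected neural network is proposed. The model is produced using unsupervised learning with an autoencoder on real-world (example) camera responses. Neural architecture search is then used to find the optimal neural network architecture. A latent distribution learning approach is introduced to constrain the latent distribution. The proposed model achieves state-of-the-art camera response function (CRF) representation accuracy on a number of benchmark tests and, owing to its simple yet efficient model representation, is almost twice as fast as the best current models when performing maximum likelihood estimation during camera response calibration.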
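To make the idea of a camera response and its linearisation concrete, here is a minimal sketch. It assumes a plain gamma curve as the response function, which is a common textbook stand-in and not the neural model described in the abstract; the function names and the default `gamma=2.2` are hypothetical choices made for illustration only.

```python
def camera_response(irradiance, gamma=2.2):
    """Map normalized irradiance in [0, 1] to image intensity.

    Assumption: the response is a simple gamma curve, not a learned CRF.
    """
    clipped = min(max(irradiance, 0.0), 1.0)  # sensors saturate outside [0, 1]
    return clipped ** (1.0 / gamma)


def linearize(intensity, gamma=2.2):
    """Invert the assumed gamma response to recover normalized irradiance."""
    clipped = min(max(intensity, 0.0), 1.0)
    return clipped ** gamma


# Round trip: a mid-grey irradiance is brightened by the response,
# then mapped back to its linear value.
encoded = camera_response(0.25)
recovered = linearize(encoded)
```

Calibration in this toy setting would amount to estimating `gamma` from observations; a learned model replaces the single exponent with a richer family of curves.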
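Polarization imaging has been applied in a growing number of robotic vision applications (e.g., underwater navigation, glare removal, dehazing, object classification, and depth estimation). Cameras that capture both colour and polarization state in a single snapshot are available on the market as RGB-polarization cameras. Because of the dispersion in sensor characteristics and the use of lenses, it is crucial to calibrate these types of cameras to obtain correct polarization measurements. The calibration methods developed so far are either not adapted to this type of camera or require complex equipment and time-consuming experiments in strict setups. In this paper, we propose a new method that removes the need for complex optical systems to calibrate these cameras effectively. We show that the proposed calibration method has several advantages: any user can easily calibrate the camera using a uniform, linearly polarized light source without any a priori knowledge of its polarization state, and only a limited number of acquisitions is required. We will make the calibration code publicly available.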
The last decade has seen an astronomical shift from imaging with DSLR and point-and-shoot cameras to imaging with smartphone cameras. Due to the small aperture and sensor size, smartphone images have notably more noise than their DSLR counterparts. While denoising for smartphone images is an active research area, the research community currently lacks a denoising image dataset representative of real noisy images from smartphone cameras with high-quality ground truth. We address this issue in this paper with the following contributions. We propose a systematic procedure for estimating ground truth for noisy images that can be used to benchmark denoising performance for smartphone cameras. Using this procedure, we have captured a dataset, the Smartphone Image Denoising Dataset (SIDD), of ~30,000 noisy images from 10 scenes under different lighting conditions using five representative smartphone cameras and generated their ground truth images. We used this dataset to benchmark a number of denoising algorithms. We show that CNN-based methods perform better when trained on our high-quality dataset than when trained using alternative strategies, such as low-ISO images used as a proxy for ground truth data.
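In this paper, we make the first benchmark effort to elaborate on the superiority of using RAW images in low-light enhancement, and we develop a novel alternative route for utilizing RAW images in a more flexible and practical way. Inspired by a full consideration of the typical image processing pipeline, we develop a new evaluation framework, the Factorized Enhancement Model (FEM), which decomposes the properties of RAW images into measurable factors and provides a tool for exploring empirically how these properties affect enhancement performance. The empirical benchmark results show that the linearity of the data and the exposure time recorded in the metadata play the most critical roles, bringing distinct performance gains on various measures over approaches that take sRGB images as input. With the insights obtained from the benchmark results, we develop a RAW-guided exposure enhancement network (REENet), which trades off the advantages of RAW images against their inaccessibility in real applications by using RAW images only in the training phase. REENet projects sRGB images into the linear RAW domain and applies constraints from the corresponding RAW images to reduce the difficulty of model training. In the testing phase, REENet does not rely on RAW images. Experimental results demonstrate the superiority of REENet over state-of-the-art sRGB-based methods, as well as the effectiveness of the RAW guidance and of all components.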
Lensless cameras are a class of imaging devices that shrink the physical dimensions to the very close vicinity of the image sensor by replacing conventional compound lenses with integrated flat optics and computational algorithms. Here we report a diffractive lensless camera with spatially-coded Voronoi-Fresnel phase to achieve superior image quality. We propose a design principle of maximizing the acquired information in optics to facilitate the computational reconstruction. By introducing an easy-to-optimize Fourier domain metric, Modulation Transfer Function volume (MTFv), which is related to the Strehl ratio, we devise an optimization framework to guide the optimization of the diffractive optical element. The resulting Voronoi-Fresnel phase features an irregular array of quasi-Centroidal Voronoi cells containing a base first-order Fresnel phase function. We demonstrate and verify the imaging performance for photography applications with a prototype Voronoi-Fresnel lensless camera on a 1.6-megapixel image sensor in various illumination conditions. Results show that the proposed design outperforms existing lensless cameras, and could benefit the development of compact imaging systems that work in extreme physical conditions.
Fruit is a key crop in worldwide agriculture feeding millions of people. The standard supply chain of fruit products involves quality checks to guarantee freshness, taste, and, most of all, safety. An important factor that determines fruit quality is its stage of ripening. This is usually manually classified by experts in the field, which makes it a labor-intensive and error-prone process. Thus, there is an arising need for automation in the process of fruit ripeness classification. Many automatic methods have been proposed that employ a variety of feature descriptors for the food item to be graded. Machine learning and deep learning techniques dominate the top-performing methods. Furthermore, deep learning can operate on raw data and thus relieve the users from having to compute complex engineered features, which are often crop-specific. In this survey, we review the latest methods proposed in the literature to automatize fruit ripeness classification, highlighting the most common feature descriptors they operate on.
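While the LED panels used in virtual production systems can display vibrant imagery with a wide colour gamut, they produce problematic colour shifts when used as lighting because of the peaky spectral output of narrow-band red, green, and blue LEDs. In this work, we present an improved colour calibration process for virtual production stages that ameliorates this colour rendition problem while also passing through accurate in-camera background colours. We do this by optimizing 1) the LED panel pixels visible in the camera's field of view, 2) the pixels outside the camera's field of view that illuminate the subject, and, as a post-process, 3) the pixel values recorded by the camera. As a result, footage shot on an RGB LED panel virtual production stage can exhibit more accurate skin tones and costume colours while still reproducing the desired colours of the in-camera background.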
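Optical imaging is commonly used for scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform encoding rather than imaging. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have so far been limited to linear processing, yet nonlinearity is a prerequisite for depth, and multilayer neural networks significantly outperform shallow ones on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy on several representative computer vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in scenes. In all cases, we find that the ONN's nonlinearity and depth allow it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in the spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all in the optical domain.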
Spatially varying spectral modulation can be implemented using a liquid crystal spatial light modulator (SLM) since it provides an array of liquid crystal cells, each of which can be purposed to act as a programmable spectral filter array. However, such an optical setup suffers from strong optical aberrations due to the unintended phase modulation, precluding spectral modulation at high spatial resolutions. In this work, we propose a novel computational approach for the practical implementation of phase SLMs for implementing spatially varying spectral filters. We provide a careful and systematic analysis of the aberrations arising out of phase SLMs for the purposes of spatially varying spectral modulation. The analysis naturally leads us to a set of "good patterns" that minimize the optical aberrations. We then train a deep network that overcomes any residual aberrations, thereby achieving ideal spectral modulation at high spatial resolution. We show a number of unique operating points with our prototype including dynamic spectral filtering, material classification, and single- and multi-image hyperspectral imaging.
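This paper presents a novel calibration algorithm for plenoptic cameras, especially the multi-focus configuration, in which several types of micro-lenses are used, using raw images only. Current calibration methods rely on simplified projection models, use features from reconstructed images, or require a separate calibration for each type of micro-lens. In the multi-focus configuration, the same part of a scene exhibits different amounts of blur depending on the micro-lens focal length. Usually, only the micro-images with the smallest amount of blur are used. To exploit all available data, we propose to model the defocus blur explicitly in a new camera model with the help of our newly introduced blur-aware plenoptic (BAP) feature. First, the feature is used in a pre-calibration step that retrieves initial camera parameters. Second, it is used to express a new cost function that is minimized in our single optimization process. Third, it is exploited to calibrate the relative blur between micro-images. It links the geometric blur, i.e., the blur circle, to the physical blur, i.e., the point spread function. Finally, we use the resulting blur profile to characterize the camera's depth of field. Quantitative evaluations on real data in a controlled environment demonstrate the effectiveness of our calibration.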
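Vision-based tactile sensors have emerged as a promising approach to robotic touch thanks to affordable high-resolution cameras and successful computer vision techniques. However, their physical design and the information they provide do not yet meet the requirements of real applications. We present a robust, soft, low-cost, vision-based, thumb-sized 3D tactile sensor named Insight: it continually provides a directional force-distribution map over its entire conical sensing surface. Constructed around an internal monocular camera, the sensor has only a single layer of elastomer over-molded on a rigid frame to guarantee sensitivity, robustness, and soft contact. Furthermore, Insight is the first system to combine photometric stereo and structured light using a collimator to detect the 3D deformation of its easily replaceable flexible outer shell. The force information is inferred by a deep neural network that maps images to the spatial distribution of 3D contact force (normal and shear). Insight has an overall spatial resolution of 0.4 mm, a force magnitude accuracy of around 0.03 N, and a force direction accuracy of around 5 degrees over a range of 0.03 to 2 N for numerous distinct contacts with varying contact area. The presented hardware and software design concepts can be transferred to a wide variety of robot parts.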
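We propose high dynamic range (HDR) radiance fields, HDR-Plenoxels, which learn a plenoptic function of 3D HDR radiance fields, geometry information, and the varying camera settings inherent in 2D low dynamic range (LDR) images. Our voxel-based volume rendering pipeline reconstructs HDR radiance fields in an end-to-end manner using only multi-view LDR images taken with varying camera settings, and it converges quickly. To handle the variety of cameras found in real-world scenarios, we introduce a tone mapping module that models the digital in-camera imaging pipeline (ISP) and disentangles the radiometric settings. The tone mapping module lets us render by controlling the radiometric settings of each novel view. Finally, we build a multi-view dataset with varying camera conditions that fits our problem setting. Our experiments show that HDR-Plenoxels can express detailed, high-quality HDR novel views from LDR images taken with various cameras.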
Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel $\rho$-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in RAW-domain using RAW-domain YOLOv3 and RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed $\rho$-Vision generalizes across various camera sensors and different task-specific models. Additional advantages of the proposed $\rho$-Vision that eliminates the ISP are the potential reductions in computations and processing times.
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet .
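In this paper, we present Point Cloud Color Constancy, PCCC for short, an illumination chromaticity estimation algorithm that exploits a point cloud. We leverage the depth information captured by a time-of-flight (ToF) sensor mounted rigidly with the RGB sensor and form a 6D cloud in which each point contains the coordinates and the RGB intensities, denoted (x, y, z, r, g, b). PCCC applies an attention architecture to the color constancy problem, deriving the illumination vector point-wise and then making a global decision about the global illumination chromaticity. On two popular RGB-D datasets, which we extend with illumination information, as well as on a novel benchmark, PCCC obtains lower error than state-of-the-art algorithms. Our method is simple and fast, requiring only a 16 × 16 input and running at over 500 fps, including the cost of building the point cloud and network inference.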
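The 6D cloud described above can be sketched as a back-projection of each depth pixel through a pinhole camera model, with the pixel's colour appended. This is an illustration only: the pinhole model, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the function name `build_point_cloud` are assumptions introduced here, not details given in the abstract.

```python
def build_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Return a list of (x, y, z, r, g, b) tuples, one per valid pixel.

    depth: 2D list of depths along the optical axis (same units as x, y, z).
    rgb:   2D list of (r, g, b) tuples aligned with `depth`.
    fx, fy, cx, cy: assumed pinhole intrinsics, in pixels.
    """
    points = []
    for v, (depth_row, rgb_row) in enumerate(zip(depth, rgb)):
        for u, (z, (r, g, b)) in enumerate(zip(depth_row, rgb_row)):
            if z <= 0:  # skip pixels without a valid depth reading
                continue
            x = (u - cx) * z / fx  # back-project the column index
            y = (v - cy) * z / fy  # back-project the row index
            points.append((x, y, z, r, g, b))
    return points


# Usage: a synthetic 16 x 16 frame with constant depth and colour.
size = 16
depth = [[1.0] * size for _ in range(size)]
rgb = [[(0.5, 0.4, 0.3)] * size for _ in range(size)]
cloud = build_point_cloud(depth, rgb, fx=16.0, fy=16.0, cx=7.5, cy=7.5)
```

A 16 × 16 input yields at most 256 points, which gives some intuition for why building the cloud adds little to the overall runtime.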
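Successfully training end-to-end deep networks for real motion deblurring requires datasets of sharp/blurred image pairs that are realistic and diverse enough to generalize to real blurred images. Obtaining such datasets remains a challenging task. In this paper, we first review the limitations of existing deblurring benchmark datasets from the perspective of generalization to blurry images in the wild. Second, we propose an efficient procedural method for generating sharp/blurred image pairs, based on a simple yet effective image formation model. This allows the generation of virtually unlimited realistic and diverse training pairs. We demonstrate the effectiveness of the proposed dataset by training existing deblurring architectures on the simulated pairs and evaluating them on four standard datasets of real blurred images. We observe superior generalization performance on the ultimate task of deblurring real motion-blurred photos of dynamic scenes when training with the proposed method.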
Mapping the seafloor with underwater imaging cameras is of significant importance for various applications including marine engineering, geology, geomorphology, archaeology and biology. For shallow waters, among the underwater imaging challenges, caustics, i.e., the complex physical phenomena resulting from the projection of light rays refracted by the wavy surface, are likely the most crucial one. Caustics are the main factor during underwater imaging campaigns that massively degrades image quality and severely affects any 2D mosaicking or 3D reconstruction of the seabed. In this work, we propose a novel method for correcting the radiometric effects of caustics on shallow underwater imagery. Contrary to the state of the art, the developed method can handle seabed and riverbed of any anaglyph, correcting the images using real pixel information and thus improving image matching and 3D reconstruction processes. In particular, the developed method employs deep learning architectures to classify image pixels as "non-caustics" or "caustics". It then exploits the 3D geometry of the scene to achieve a pixel-wise correction by transferring appropriate color values between the overlapping underwater images. Moreover, to fill the current gap, we have collected, annotated and structured a real-world caustic dataset, namely R-CAUSTIC, which is openly available. Overall, based on the experimental results and validation, the developed methodology is quite promising in both detecting caustics and reconstructing their intensity.
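Simultaneous localization and mapping (SLAM) is crucial for autonomous robots (e.g., self-driving cars, autonomous drones), 3D mapping systems, and AR/VR applications. This work proposes a novel LiDAR-inertial-visual fusion framework, termed R$^3$LIVE++, that achieves robust and accurate state estimation while simultaneously reconstructing the radiance map on the fly. R$^3$LIVE++ consists of a LiDAR-inertial odometry (LIO) subsystem and a visual-inertial odometry (VIO) subsystem, both running in real time. The LIO subsystem uses LiDAR measurements to reconstruct the geometric structure (i.e., the positions of 3D points), while the VIO subsystem simultaneously recovers the radiance information of that structure from the input images. R$^3$LIVE++ is developed on the basis of R$^3$LIVE and further improves localization and mapping accuracy by accounting for camera photometric calibration (e.g., the non-linear response function and lens vignetting) and by estimating the camera exposure time online. We conduct more extensive experiments on both public and private datasets to compare our proposed system with other state-of-the-art SLAM systems. Quantitative and qualitative results show that our proposed system improves significantly on the others in both accuracy and robustness. In addition, to demonstrate the extensibility of our work, we develop several applications based on the reconstructed radiance maps, such as high dynamic range (HDR) imaging, virtual environment exploration, and 3D video gaming. Finally, to share our findings and contribute to the community, we make our code, hardware design, and dataset publicly available on GitHub: github.com/hku-mars/r3live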
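A common task in image forensics is to detect spliced images, in which multiple source images are composed into one output image. Most of the currently best-performing splicing detectors leverage high-frequency artifacts. However, after an image has undergone strong compression, most of the high-frequency artifacts are no longer available. In this work, we explore an alternative approach to splicing detection that is potentially better suited to images in the wild, which are subject to strong compression and downsampling. Our proposal is to model the color formation of an image. Color formation depends largely on variations at the scale of scene objects and is therefore much less dependent on high-frequency artifacts. We learn a deep metric space that is sensitive to the illumination color and the camera's white-point estimation on the one hand, but insensitive to variations in object color on the other. Large distances in the embedding space indicate that two image regions stem either from different scenes or from different cameras. In our evaluation, we show that the proposed embedding space outperforms the state of the art on images that have been subject to strong compression and downsampling. In two further experiments, we confirm the dual nature of the metric space, namely that it characterizes both the acquisition camera and the scene illuminant color. As such, this work lies at the intersection of physics-based and statistical forensics and benefits from both.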