The Karhunen-Loève transform (KLT) is often used for data decorrelation and dimensionality reduction. Because its computation depends on the covariance matrix of the input signal, the difficulty of developing fast algorithms for it severely limits the use of the KLT in real-time applications. In this context, this paper proposes a new low-complexity transform obtained by applying the round function to the elements of the KLT matrix. The proposed transform is assessed with figures of merit that measure its coding power and its distance to the exact KLT, and it is further evaluated in image compression experiments. A fast algorithm for the proposed approximate transform is introduced. Results show that the proposed transform performs well in image compression while requiring a low implementation cost.
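A minimal NumPy sketch of the idea, under the assumption of a first-order Markov source: the exact KLT is the eigenvector matrix of the signal covariance, and a low-complexity surrogate is formed by rounding a scaled copy of it. The scale factor and diagonal renormalization below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def exact_klt(n=8, rho=0.95):
    """Exact KLT for a first-order Markov (AR(1)) source with correlation rho."""
    idx = np.arange(n)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz covariance
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1].T                          # rows = basis vectors, by decreasing eigenvalue

def rounded_klt(n=8, rho=0.95, scale=2.0):
    """Low-complexity approximation: round the scaled KLT entries to small integers
    (illustrative; 'scale' is a hypothetical tuning knob, not from the paper)."""
    K = exact_klt(n, rho)
    T = np.round(scale * K)            # integer matrix -> multiplication-free (adds and shifts)
    # Renormalize each row so T behaves like an orthonormal transform.
    D = np.diag(1.0 / np.sqrt(np.sum(T * T, axis=1)))
    return T, D                        # apply as D @ T (D can be folded into quantization)

T, D = rounded_klt()
x = np.random.randn(8)
y = D @ T @ x                          # approximately decorrelated coefficients
```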
In this paper, two 8-point multiplication-free DCT approximations based on the Chen factorization are proposed and their fast algorithms are derived. Both transforms are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and the results are compared with competing methods. The proposed low-complexity transforms are scaled according to the Jridi-Alfalou-Meher algorithm to obtain 16- and 32-point approximations. The new sets of transforms are embedded into the HEVC reference software to provide a fully HEVC-compliant video coding scheme. We show that the approximate transforms can outperform traditional transforms and state-of-the-art methods at a very low complexity cost.
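As a rough illustration of how such transforms are assessed, the sketch below builds the exact orthonormal 8-point DCT-II matrix and a generic integer (multiplierless) approximation, then measures their proximity with a simple Frobenius-norm error. The rounded matrix here is only a stand-in, not the Chen-factorization transforms of the paper, and the measure is related to but not identical to the error-energy figure of merit.

```python
import numpy as np

def dct_ii_matrix(n=8):
    """Exact orthonormal 8-point DCT-II matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

# A generic multiplierless approximation with entries in {0, +/-1}; it only
# illustrates the evaluation, it is NOT the Chen-based transform of the paper.
C = dct_ii_matrix()
T = np.round(2 * C)                                # low-complexity integer matrix
S = np.diag(1.0 / np.sqrt(np.sum(T * T, axis=1)))  # diagonal scaling (foldable into quantization)
C_hat = S @ T                                      # orthonormal-like approximation

error = np.linalg.norm(C - C_hat, 'fro') ** 2
print(f"Frobenius proximity to the exact DCT: {error:.4f}")
```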
This paper introduces a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations is proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity to the exact DCT, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and the H.264/AVC and H.265/HEVC standards. The results are compared with the unmodified standard codecs. Efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
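A compact sketch of the index computed on global image statistics (the paper computes it over local sliding windows and averages the local values); the constants follow the commonly used defaults C1 = (0.01 L)^2 and C2 = (0.03 L)^2, which are assumptions here.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Simplified single-window SSIM: luminance, contrast and structure terms
    computed from global image statistics (the paper uses local windows)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den

# Identical images score 1.0; heavier distortion drives the index toward 0.
img = np.random.rand(64, 64) * 255
noisy = img + 20 * np.random.randn(64, 64)
print(ssim_global(img, img), ssim_global(img, noisy))
```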
In this paper, we present an approach for minimizing the computational complexity of trained convolutional neural networks (ConvNets). The idea is to approximate all elements of a given ConvNet, replacing the original convolutional filters and parameters (pooling and bias coefficients, as well as activation functions) with efficient low-complexity counterparts. The low-complexity convolutional filters are obtained through a binary (zero-one) linear programming scheme based on the Frobenius norm over sets of dyadic rationals. The resulting matrices allow multiplication-free computations, requiring only addition and bit-shift operations. Such low-complexity structures pave the way for low-power, efficient hardware designs. We apply our approach to three use cases of different complexity: (i) a "light" but efficient ConvNet for face detection (with around 1000 parameters); (ii) another one for hand-written digit classification (with more than 180000 parameters); and (iii) a significantly larger ConvNet, AlexNet, with approximately 1.2 million parameters. We evaluate the overall performance on the respective tasks for different levels of approximation. In all considered applications, very low-complexity approximations were derived while maintaining an almost equal classification performance.
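The sketch below conveys the flavour of the filter approximation: each trained weight is replaced by a nearby dyadic rational k/2^b so that multiplications collapse into additions and bit shifts. Element-wise rounding is used here for brevity; the paper formulates the choice as a binary linear program under a Frobenius-norm criterion.

```python
import numpy as np

def nearest_dyadic(w, max_num=8, shift=3):
    """Replace each weight with the closest dyadic rational k / 2**shift,
    with |k| <= max_num, so products reduce to adds and bit shifts.
    (Simple element-wise rounding; the paper uses a binary linear program.)"""
    step = 1.0 / (1 << shift)
    k = np.clip(np.round(w / step), -max_num, max_num)
    return k * step

rng = np.random.default_rng(0)
filt = rng.normal(scale=0.2, size=(3, 3))          # a trained 3x3 conv filter
filt_lc = nearest_dyadic(filt)                     # low-complexity version
frob_err = np.linalg.norm(filt - filt_lc, 'fro')
print(f"Frobenius approximation error: {frob_err:.4f}")
```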
We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2-D image fragments (e.g., blocks) into 3-D data arrays which we call "groups." Collaborative filtering is a special procedure developed to deal with these 3-D groups. We realize it using the three successive steps: 3-D transformation of a group, shrinkage of the transform spectrum, and inverse 3-D transformation. The result is a 3-D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and, at the same time, it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel, we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
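A heavily simplified sketch of the grouping-and-collaborative-filtering idea (hard-thresholding stage only; no search window, per-group weights, or the Wiener stage; the threshold value is an arbitrary placeholder).

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_filter_sketch(noisy, block=8, step=4, n_match=16, thr=60.0):
    """Toy version of grouping + collaborative filtering by hard thresholding."""
    H, W = noisy.shape
    est = np.zeros_like(noisy, dtype=np.float64)
    cnt = np.zeros_like(noisy, dtype=np.float64)
    refs = [(i, j) for i in range(0, H - block + 1, step)
                   for j in range(0, W - block + 1, step)]
    patches = {p: noisy[p[0]:p[0] + block, p[1]:p[1] + block] for p in refs}
    for rp in refs:
        # Group the n_match most similar blocks (exhaustive search for brevity).
        order = sorted(refs, key=lambda p: np.sum((patches[p] - patches[rp]) ** 2))
        group = np.stack([patches[p] for p in order[:n_match]])      # 3-D array
        spec = dctn(group, norm='ortho')                             # 3-D transform
        spec[np.abs(spec) < thr] = 0.0                               # shrinkage
        filt = idctn(spec, norm='ortho')                             # inverse 3-D transform
        for (i, j), blk in zip(order[:n_match], filt):               # aggregation by averaging
            est[i:i + block, j:j + block] += blk
            cnt[i:i + block, j:j + block] += 1.0
    return est / np.maximum(cnt, 1.0)
```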
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
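A self-contained sketch of the multi-scale combination using global (rather than windowed) statistics: contrast-structure terms are computed at every dyadic scale, luminance only at the coarsest one, and the terms are combined with the exponent weights commonly quoted for this method (treated here as an assumption).

```python
import numpy as np

def _cs_and_l(x, y, data_range=255.0):
    """Global luminance (l) and contrast-structure (cs) terms of SSIM."""
    C1, C2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    cs = max((2 * cxy + C2) / (vx + vy + C2), 0.0)   # clip to avoid fractional powers of negatives
    return l, cs

def ms_ssim_sketch(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Multi-scale SSIM sketch: cs terms at every scale, luminance only at the
    coarsest scale; 2x2 mean pooling between scales."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    cs_terms = []
    for w in weights[:-1]:
        _, cs = _cs_and_l(x, y)
        cs_terms.append(cs ** w)
        # Downsample both images by 2 with simple average pooling.
        x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2].reshape(
            x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))
        y = y[: y.shape[0] // 2 * 2, : y.shape[1] // 2 * 2].reshape(
            y.shape[0] // 2, 2, y.shape[1] // 2, 2).mean(axis=(1, 3))
    l, cs = _cs_and_l(x, y)
    return np.prod(cs_terms) * (cs ** weights[-1]) * (l ** weights[-1])

img = np.random.rand(128, 128) * 255
print(ms_ssim_sketch(img, img + 10 * np.random.randn(128, 128)))
```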
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics only during training and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimal quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation together with the optimal End-to-end Learned Transform (ELT) derived in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet, and DenseNet to very low bit-rates (1-2 bits).
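A rough sketch of the decorrelate-then-quantize idea on a single weight matrix: the rows are transformed with a plain KLT of their covariance, bit-depths are assigned by a simple log-variance rule, and coefficients are uniformly quantized. This is only meant to show the mechanics; the paper derives an optimal learned transform and bit allocation from its rate-distortion theory.

```python
import numpy as np

def transform_quantize_sketch(W, mean_bits=4.0):
    """Sketch of decorrelate-then-quantize for a weight matrix W (rows = filters).
    Plain KLT of the weight covariance plus a log-variance bit allocation."""
    C = np.cov(W, rowvar=False)                      # covariance across weight dimensions
    _, U = np.linalg.eigh(C)
    U = U[:, ::-1]                                   # decorrelating basis
    Z = W @ U                                        # transform-domain coefficients
    var = Z.var(axis=0) + 1e-12
    # Bit allocation: mean_bits plus half the log-ratio to the geometric-mean variance.
    b = 0.5 * np.log2(var / np.exp(np.mean(np.log(var)))) + mean_bits
    b = np.clip(np.round(b), 0, 8)
    step = 6 * np.sqrt(var) / np.maximum(2.0 ** b, 1)   # ~6-sigma range split into 2^b levels
    Zq = np.round(Z / step) * step
    Zq[:, b == 0] = 0.0                              # dimensions given zero bits are dropped
    return Zq @ U.T                                  # dequantized weights

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 64)) @ rng.normal(size=(64, 64)) * 0.05  # correlated toy weights
W_hat = transform_quantize_sketch(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```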
In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method-the K-SVD algorithm-generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
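A minimal NumPy sketch of the alternation the algorithm performs, with a bare-bones OMP for the sparse-coding stage and a rank-1 SVD update per atom (unused-atom replacement and other practical refinements are omitted).

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: k-sparse code of x over dictionary D."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def ksvd_sketch(X, n_atoms=32, sparsity=4, n_iter=10):
    """Minimal K-SVD sketch: alternate sparse coding (OMP) with rank-1 SVD
    updates of each atom and its nonzero coefficients."""
    rng = np.random.default_rng(0)
    D = rng.normal(size=(X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        A = np.column_stack([omp(D, x, sparsity) for x in X.T])   # sparse codes
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]          # signals that use atom j
            if users.size == 0:
                continue
            A[j, users] = 0.0
            E = X[:, users] - D @ A[:, users]    # representation error without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                    # updated atom
            A[j, users] = s[0] * Vt[0]           # updated coefficients
    return D, A

# Toy usage: learn a dictionary for 200 random 8x8 patches flattened to 64-vectors.
X = np.random.default_rng(1).normal(size=(64, 200))
D, A = ksvd_sketch(X)
```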
As a commonly used image compression format, JPEG has been widely applied to the transmission and storage of images. To further reduce the compression cost while maintaining the quality of JPEG images, lossless transcoding techniques have been proposed to recompress already compressed JPEG images in the DCT domain. However, previous work typically reduces the redundancy of the DCT coefficients and optimizes the probability prediction for entropy coding in a hand-crafted manner, lacking generalization ability and flexibility. To tackle these challenges, we propose a learned lossless JPEG transcoding framework via joint lossy and residual compression. Instead of directly optimizing the entropy estimation, we focus on the redundancy that exists in the DCT coefficients. To the best of our knowledge, we are the first to utilize learned end-to-end lossy transform coding to reduce the redundancy of DCT coefficients in a compact representation domain. We also introduce residual compression for lossless transcoding, which adaptively learns the distribution of the residual DCT coefficients before compressing them with context-based entropy coding. Our proposed transcoding architecture shows significant superiority in the compression of JPEG images, thanks to the collaboration of learned lossy transform coding and residual entropy coding. Extensive experiments on multiple datasets show that our proposed framework can save on average about 21.49% of the bits relative to JPEG compression, outperforming the typical lossless transcoding framework JPEG-XL by 3.51%.
Recently, convolutional autoencoders (CAEs) were introduced for image coding and achieved performance improvements over the state-of-the-art JPEG2000 method. However, these results were obtained with large CAEs featuring a great number of parameters, whose training requires heavy computational power. In this paper, we address the problem of lossy image compression using a CAE with a small memory footprint and low computational power usage. To overcome the computational cost issue, most of the literature uses Lagrangian proximal regularization methods, which are time-consuming. In this work, we propose instead a constrained approach and a new structured sparse learning method. We design an algorithm and test it with three constraints: the classical $\ell_1$ constraint, the $\ell_{1,\infty}$ constraint, and the new $\ell_{1,1}$ constraint. Experimental results show that the $\ell_{1,1}$ constraint provides the best structured sparsity, resulting in a large reduction of memory and computational cost, with a rate-distortion performance similar to that of dense networks.
We present a new image scaling method, for both downscaling and upscaling, that runs at any scale factor or desired size. The resized image is obtained by sampling a bivariate polynomial that interpolates the data globally. The particularity of the method lies in the sampling model and the interpolation polynomial we use. Rather than the classical uniform grid, we consider an unusual sampling system based on the zeros of Chebyshev polynomials of the first kind. The optimal distribution of such nodes makes it possible to consider near-best interpolation polynomials defined by a filter of de la Vallée Poussin type. The action ray of this filter provides an additional parameter that can be suitably tuned to improve the approximation. The method has been tested on a large number of different image datasets. The results are evaluated in qualitative and quantitative terms and compared with other available competing methods. The perceived quality of the resulting scaled images is such that important details are preserved and the appearance of artifacts is low. Competitive quality measurements, good visual quality, limited computational effort, and moderate memory demand make the method suitable for real-world applications.
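A 1-D illustration of resampling through interpolation at Chebyshev zeros of the first kind: the uniform samples are first mapped onto the nodes, the interpolating polynomial is built in the Chebyshev basis, and the target grid is sampled from it. The de la Vallée Poussin filtering and the bivariate (tensor-product) extension of the paper are omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def rescale_1d_sketch(signal, new_len, n_nodes=None):
    """1-D resampling via interpolation at Chebyshev zeros of the first kind
    (the de la Vallee Poussin filtering of the paper is not included)."""
    n = len(signal)
    n_nodes = n_nodes or n
    # Chebyshev zeros in [-1, 1]; pull the uniform samples onto them.
    nodes = np.cos((2 * np.arange(n_nodes) + 1) * np.pi / (2 * n_nodes))
    uniform_x = np.linspace(-1.0, 1.0, n)
    node_vals = np.interp(nodes, uniform_x, signal)         # sample at the nodes
    coeffs = C.chebfit(nodes, node_vals, deg=n_nodes - 1)   # interpolating polynomial
    new_x = np.linspace(-1.0, 1.0, new_len)
    return C.chebval(new_x, coeffs)                         # evaluate at the target grid

row = np.sin(np.linspace(0, 4 * np.pi, 32))
upscaled = rescale_1d_sketch(row, 96)       # any scale factor or target size
```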
Recently, many neural network-based image compression methods have shown promising results superior to the existing tool-based conventional codecs. However, most of them are often trained as separate models for different target bit rates, thus increasing the model complexity. Therefore, several studies have been conducted for learned compression that supports variable rates with single models, but they require additional network modules, layers, or inputs that often lead to complexity overhead, or do not provide sufficient coding efficiency. In this paper, we firstly propose a selective compression method that partially encodes the latent representations in a fully generalized manner for deep learning-based variable-rate image compression. The proposed method adaptively determines essential representation elements for compression of different target quality levels. For this, we first generate a 3D importance map as the nature of input content to represent the underlying importance of the representation elements. The 3D importance map is then adjusted for different target quality levels using importance adjustment curves. The adjusted 3D importance map is finally converted into a 3D binary mask to determine the essential representation elements for compression. The proposed method can be easily integrated with the existing compression models with a negligible amount of overhead increase. Our method can also enable continuously variable-rate compression via simple interpolation of the importance adjustment curves among different quality levels. The extensive experimental results show that the proposed method can achieve comparable compression efficiency as those of the separately trained reference compression models and can reduce decoding time owing to the selective compression. The sample codes are publicly available at https://github.com/JooyoungLeeETRI/SCR.
Visual signal compression is a long-standing problem, and recent advances in deep learning have driven exciting progress. Despite better compression performance, existing end-to-end compression algorithms are still designed for better signal quality in the sense of rate-distortion optimization. In this paper, we show that the design and optimization of the network architecture can further improve compression for machine vision. We propose an inverted-bottleneck structure for the encoder of end-to-end compression towards machine vision, which particularly accounts for an efficient representation of semantic information. Moreover, we pursue better optimization by incorporating the analytics accuracy into the optimization process, and the optimality is further explored with a generalized rate-accuracy optimization carried out in an iterative manner. We use object detection as a showcase for end-to-end compression for machine vision, and extensive experiments show that the proposed scheme achieves significant BD-rate savings in terms of analysis performance. Moreover, the promise of the scheme is also demonstrated by its strong generalization capability to other machine vision tasks, owing to the signal-level reconstruction.
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets.This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed-either explicitly or implicitly-to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
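A minimal implementation of the basic randomized scheme described above: range finding with a Gaussian test matrix, optional power iterations, then a small deterministic SVD on the compressed matrix.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_power_iter=2, seed=0):
    """Randomized low-rank SVD: random sampling identifies a subspace capturing
    most of A's action, A is compressed to it, and a small SVD finishes the job."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Y = A @ Omega                                      # sample the range of A
    for _ in range(n_power_iter):                      # power iterations sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis of the subspace
    B = Q.T @ A                                        # compress A to the subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # small deterministic SVD
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Toy check on a rank-40 matrix: the reconstruction error should be tiny.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
U, s, Vt = randomized_svd(A, k=40)
print("relative error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```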
Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into network architecture to enhance the performance. Experimental results demonstrate our proposed method achieves a state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge our approach is the first work to achieve comparable performance with latest compression standard Versatile Video Coding (VVC) regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/ Learned-Image-Compression-with-GMM-and-Attention.
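A sketch of a discretized Gaussian mixture likelihood of the kind such entropy models use: each integer-quantized latent is assigned the weighted probability mass its components place on [y - 1/2, y + 1/2]. The network-predicted weights, means, and scales are replaced by fixed toy values here.

```python
import numpy as np
from scipy.stats import norm

def discretized_gmm_likelihood(y, weights, means, scales):
    """Likelihood of integer-quantized latents y under a discretized Gaussian
    mixture: each component's mass on [y - 0.5, y + 0.5], weighted and summed."""
    y = y[..., None]                                   # broadcast over the K components
    upper = norm.cdf((y + 0.5 - means) / scales)
    lower = norm.cdf((y - 0.5 - means) / scales)
    p = np.sum(weights * (upper - lower), axis=-1)
    return np.maximum(p, 1e-12)                        # avoid log(0) in the rate term

# Estimated bits for a toy latent tensor under a 3-component mixture.
y = np.round(np.random.default_rng(0).normal(scale=3.0, size=(16, 16)))
w = np.array([0.6, 0.3, 0.1]); mu = np.array([0.0, 2.0, -2.0]); s = np.array([1.0, 3.0, 3.0])
bits = -np.sum(np.log2(discretized_gmm_likelihood(y, w, mu, s)))
print(f"estimated rate: {bits:.1f} bits")
```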
In this paper, we present two variations of an algorithm for signal reconstruction from one-bit or two-bit noisy observations of the discrete Fourier transform (DFT). One-bit observations of the DFT correspond to the sign of its real part, whereas two-bit observations correspond to the signs of both its real and imaginary parts. We focus on images for analysis and simulation, thus using the signs of the 2D-DFT. This choice of signal class is inspired by previous works. For our algorithm, we show that the expected mean squared error (MSE) of the signal reconstruction is asymptotically proportional to the inverse of the sampling rate. The samples are affected by additive zero-mean noise of known distribution. We solve this signal estimation problem by designing an algorithm that uses contraction mapping, based on the Banach fixed-point theorem. Numerical tests with four benchmark images are provided to show the effectiveness of the algorithm. Various metrics for image reconstruction quality assessment, such as PSNR, SSIM, ESSIM, and MS-SSIM, are employed. On all four benchmark images, our algorithm outperforms the state-of-the-art in all of these metrics by a significant margin.
Emerging and existing light field displays are highly capable of realistic presentation of 3D scenes on autostereoscopic, glasses-free platforms. The size of the light field is a major drawback when using it for 3D display and streaming purposes, and it increases substantially when the light field has a high dynamic range. In this paper, we propose a novel compression algorithm for high dynamic range light fields that delivers perceptually lossless compression. The algorithm exploits the inter- and intra-view correlations of the HDR light field by interpreting it as a four-dimensional volume. The HDR light field compression is based on a novel 4DDCT-UCS (4D-DCT Uniform Colour Space) algorithm. Additional encoding of the 4DDCT-UCS images by HEVC eliminates intra-frame, inter-frame, and intrinsic redundancies in the HDR light field data. Comparison with state-of-the-art coders such as JPEG-XL and HDR video coding algorithms exhibits the superior compression performance of the proposed scheme on real-world light fields.
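An illustrative fragment of the 4-D DCT step alone, treating the light field as a four-dimensional volume so that a single separable transform acts across both angular and spatial axes; the uniform colour space conversion and the subsequent HEVC coding are not shown.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A light field as a 4-D volume: (angular u, angular v, spatial y, spatial x).
lf = np.random.rand(5, 5, 64, 64).astype(np.float32)

# Forward 4-D DCT over all axes jointly exploits inter- and intra-view correlation.
spec = dctn(lf, norm='ortho')

# Crude energy-compaction check: keep only the strongest 5% of coefficients.
thr = np.quantile(np.abs(spec), 0.95)
spec_sparse = np.where(np.abs(spec) >= thr, spec, 0.0)
recon = idctn(spec_sparse, norm='ortho')
print("PSNR:", 10 * np.log10(1.0 / np.mean((lf - recon) ** 2)))
```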
This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data. The agents implement a distributed cooperative strategy in which each agent is allowed to perform local exchanges of information with its neighbors. To cope with communication constraints, the exchanged information must inevitably be compressed. We propose a diffusion strategy nicknamed ACTC (Adapt-Compress-Then-Combine), which relies on the following steps: i) an adaptation step where each agent performs an individual stochastic-gradient update with constant step-size; ii) a compression step that leverages a recently introduced class of randomized compression operators; and iii) a combination step where each agent combines the compressed updates received from its neighbors. The distinguishing elements of this work are as follows. First, we focus on adaptive strategies, where constant (as opposed to diminishing) step-sizes are critical to respond in real time to nonstationary variations. Second, we consider general directed graphs and left-stochastic combination policies, which allow us to enhance the interplay between topology and learning. Third, in contrast with related works that assume strong convexity for all individual agents' cost functions, we require strong convexity only at the network level, a condition satisfied even if a single agent has a strongly convex cost while the remaining agents have non-convex costs. Fourth, we focus on a diffusion (as opposed to consensus) strategy. Under the demanding setting of compressed information, it is established that the ACTC iterates fluctuate around the desired optimizer, achieving remarkable savings in terms of bits exchanged between neighboring agents.
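A toy sketch of the three ACTC steps on a distributed streaming least-squares problem: constant step-size adaptation, an unbiased randomized sparsifier standing in for the paper's compression operators, and combination over a ring topology with a left-stochastic matrix. The problem setup, step-size, and sparsification probability are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, mu, p_keep = 10, 5, 0.05, 0.3
w_true = rng.normal(size=dim)

# Left-stochastic combination matrix for a ring topology (columns sum to 1).
A = np.zeros((n_agents, n_agents))
for k in range(n_agents):
    for l in (k - 1, k, k + 1):
        A[l % n_agents, k] = 1.0 / 3.0

def rand_sparsify(x):
    """Unbiased randomized compression: keep each entry with prob. p_keep, rescale."""
    mask = rng.random(x.shape) < p_keep
    return mask * x / p_keep

w = np.zeros((n_agents, dim))       # each row: one agent's current estimate
for _ in range(3000):
    # i) Adaptation: one constant step-size stochastic-gradient step per agent.
    psi = np.empty_like(w)
    for k in range(n_agents):
        u = rng.normal(size=dim)                      # streaming regressor
        d = u @ w_true + 0.1 * rng.normal()           # noisy measurement
        psi[k] = w[k] + mu * u * (d - u @ w[k])
    # ii) Compression: agents share compressed increments (a simple stand-in
    #     for the randomized compression operators of the paper).
    q = np.array([rand_sparsify(psi[k] - w[k]) for k in range(n_agents)]) + w
    # iii) Combination: each agent mixes its neighbors' compressed estimates.
    w = A.T @ q

print("mean squared deviation from w_true:", np.mean((w - w_true) ** 2))
```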