In this paper, two multiplication-free 8-point DCT approximations based on the Chen factorization are proposed, and their fast algorithms are derived. Both transforms are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and the results are compared with competing methods. The proposed low-complexity transforms are scaled according to the Jridi-Alfalou-Meher algorithm to obtain 16- and 32-point approximations. The new sets of transforms are embedded into the HEVC reference software to provide a fully HEVC-compliant video coding scheme. We show that the approximate transforms can outperform traditional transforms and state-of-the-art methods at a very low complexity cost.
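As an illustration of what a multiplication-free 8-point transform looks like, the sketch below builds a generic rounding-based approximation with entries in {0, +1, -1} and renormalizes it to an orthogonal matrix. This is only in the same spirit as the transforms described above; it is not either of the two Chen-factorization-based approximations proposed in the paper.

```python
import numpy as np
from scipy.fft import dct

# Illustrative sketch: a generic rounding-based 8-point DCT approximation with
# entries in {0, +1, -1}.  NOT the specific pair of transforms proposed above.
N = 8
C = dct(np.eye(N), axis=0, norm="ortho")      # exact orthonormal 8-point DCT-II
T = np.round(2.0 * C)                         # integer "core": only {0, +1, -1}
S = np.diag(1.0 / np.sqrt(np.diag(T @ T.T)))  # diagonal scaling -> orthogonality
C_hat = S @ T                                 # orthogonal approximate DCT

# The integer core T needs only additions and sign changes; the diagonal S can
# be absorbed into the quantization step of a JPEG-like coder.
x = np.arange(8, dtype=float)                 # an arbitrary 8-point input block
print(np.allclose(C_hat @ C_hat.T, np.eye(N)))  # True: C_hat is orthogonal
print(T @ x)                                  # multiplication-free forward pass
print(np.abs(C @ x - C_hat @ x))              # deviation from the exact DCT
```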
This paper introduces a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations is proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and the H.264/AVC and H.265/HEVC standards. The results are compared with the unmodified standard codecs. The efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.
The Karhunen-Loève transform (KLT) is often used for data decorrelation and dimensionality reduction. Because its computation depends on the covariance matrix of the input signal, the difficulty of developing fast algorithms for it severely limits the use of the KLT in real-time applications. In this context, this paper proposes a new low-complexity transform obtained by applying the round function to the elements of the KLT matrix. The proposed transform is evaluated considering figures of merit that measure the coding power and the distance of the proposed approximation to the exact KLT, and it is also explored in image compression experiments. A fast algorithm for the proposed transform is introduced. Results show that the proposed transform performs well in image compression while requiring a low implementation cost.
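A minimal sketch of the general recipe (derive a data-dependent KLT, round its entries to integers, renormalize) on a synthetic first-order Markov source is shown below. The scale factor 2 before rounding is an illustrative choice of mine; the paper defines its own rounding-based construction and fast algorithm.

```python
import numpy as np

# Generic sketch of the rounding idea: exact KLT for a synthetic AR(1) source,
# a rounded integer core, and a row renormalization.  The scaling factor 2 is
# an arbitrary illustrative choice, not necessarily the paper's construction.
N, rho = 8, 0.95
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])     # AR(1) covariance matrix
w, V = np.linalg.eigh(R)
K = V[:, ::-1].T                                   # exact KLT: rows = eigenvectors

T = np.round(2.0 * K)                              # integer, multiplication-free core
S = np.diag(1.0 / np.linalg.norm(T, axis=1))       # row renormalization
K_hat = S @ T                                      # low-complexity KLT approximation

print(T.astype(int))
print(np.linalg.norm(K - K_hat))                   # total deviation from the exact KLT
```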
In this paper, we present an approach for minimizing the computational complexity of a trained convolutional neural network (ConvNet). The idea is to approximate all elements of a given ConvNet, replacing the original convolutional filters and parameters (pooling and bias coefficients, as well as activation functions) with efficient, low-complexity approximations. The low-complexity convolutional filters are obtained through a binary (zero-one) linear programming scheme based on the Frobenius norm over a set of dyadic rationals. The resulting matrices allow multiplication-free computation, requiring only addition and bit-shift operations. Such low-complexity structures pave the way for low-power, efficient hardware designs. We apply our approach to three use cases of different complexity: (i) a 'light' but efficient ConvNet for face detection (with around 1000 parameters); (ii) another one for hand-written digit classification (with more than 180000 parameters); and (iii) a significantly larger ConvNet, AlexNet, with approximately 1.2 million parameters. We evaluate the overall performance on the respective tasks for different approximation levels. In all considered applications, very low-complexity approximations are derived while nearly equal classification performance is maintained.
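To show the kind of arithmetic that results, the toy sketch below maps real-valued filter weights to the nearest dyadic rational k/2^b by simple elementwise rounding. The paper instead solves a Frobenius-norm binary linear program over the dyadic set; the function name, the bit depth b, and the range kmax here are illustrative choices only.

```python
import numpy as np

# Simplified stand-in: approximate real-valued filter weights by dyadic
# rationals k / 2**b via elementwise rounding (the paper uses a Frobenius-norm
# binary linear program instead).  Shows the multiplication-free arithmetic.
def dyadic_approx(W, b=3, kmax=4):
    """Map each weight to the closest value in {k / 2**b : |k| <= kmax}."""
    k = np.clip(np.round(W * 2**b), -kmax, kmax)
    return k / 2**b, k.astype(int)

W = np.array([[ 0.41, -0.12,  0.33],
              [-0.27,  0.55, -0.08],
              [ 0.19, -0.36,  0.24]])          # a toy 3x3 convolution filter
W_hat, k = dyadic_approx(W, b=3, kmax=4)

# y = (sum_i k_i * x_i) >> b : multiplies by small integers k reduce to
# additions and bit shifts in hardware.
print(k)
print(np.abs(W - W_hat).max())                 # worst-case elementwise error
```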
We present a new method for image scaling, for both downscaling and upscaling, operating at any scale factor or desired size. The resized image is obtained by sampling a bivariate polynomial defined on a global domain. The particularity of the method lies in the sampling model and the interpolation polynomial we use. Rather than a classical uniform grid, we consider an unusual sampling system based on Chebyshev zeros of the first kind. The optimal distribution of such nodes allows us to consider near-best interpolation polynomials defined by a de la Vallée Poussin type filter. The action ray of this filter provides an additional parameter that can be suitably regulated to improve the approximation. The method has been tested on a large collection of different image datasets. The results are evaluated in qualitative and quantitative terms and compared with other available competing methods. The perceived quality of the resulting scaled images is such that important details are preserved and the appearance of artifacts is low. Competitive quality measures, good visual quality, limited computational effort, and moderate memory demand make the method suitable for real-world applications.
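For reference, the snippet below merely generates the Chebyshev zeros of the first kind on [-1, 1] that form the sampling system mentioned above; the de la Vallée Poussin filtered interpolation itself is not reproduced.

```python
import numpy as np

# Chebyshev zeros of the first kind on [-1, 1]: the non-uniform sampling
# nodes mentioned above.  The filtered interpolation stage is not shown.
def chebyshev_nodes(n):
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))

print(chebyshev_nodes(8))   # nodes cluster toward the interval endpoints
```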
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2-D image fragments (e.g., blocks) into 3-D data arrays which we call "groups." Collaborative filtering is a special procedure developed to deal with these 3-D groups. We realize it using the three successive steps: 3-D transformation of a group, shrinkage of the transform spectrum, and inverse 3-D transformation. The result is a 3-D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and, at the same time, it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel, we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
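A toy sketch of the collaborative-filtering core is given below: a group of similar 2-D blocks is stacked into a 3-D array, transformed with a separable 3-D DCT, hard-thresholded, and inverted. Block matching, aggregation weights, and the collaborative Wiener step of the full algorithm are omitted, and the threshold constant is an illustrative choice.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy sketch of collaborative filtering: 3-D transform of a group of similar
# blocks, hard shrinkage of the spectrum, inverse 3-D transform.  Block
# matching, aggregation, and the Wiener stage are omitted.
def collaborative_hard_threshold(group, sigma, lam=2.7):
    """group: (K, B, B) array of K similar BxB noisy blocks."""
    spec = dctn(group, norm="ortho")              # 3-D transform of the group
    spec[np.abs(spec) < lam * sigma] = 0.0        # hard shrinkage
    return idctn(spec, norm="ortho")              # jointly filtered blocks

rng = np.random.default_rng(0)
sigma = 0.1
clean = np.tile(np.outer(np.hanning(8), np.hanning(8)), (16, 1, 1))  # 16 similar blocks
noisy = clean + sigma * rng.standard_normal(clean.shape)
den = collaborative_hard_threshold(noisy, sigma)
print(np.sqrt(np.mean((noisy - clean) ** 2)),     # RMSE before filtering
      np.sqrt(np.mean((den - clean) ** 2)))       # RMSE after filtering
```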
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
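A bare-bones randomized SVD in the spirit of this framework is sketched below: random sampling identifies a subspace capturing most of the action of A, the matrix is compressed to that subspace, and a deterministic SVD of the small reduced matrix yields the low-rank factorization. The oversampling parameter p and the number of power iterations are illustrative defaults.

```python
import numpy as np

# Bare-bones randomized SVD: random range finder + deterministic SVD of the
# reduced matrix.  Oversampling p and optional power iterations control
# accuracy; this sketch omits re-orthonormalization between power iterations.
def randomized_svd(A, k, p=10, n_iter=2, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))    # random test matrix
    Y = A @ Omega                              # sample the range of A
    for _ in range(n_iter):                    # power iterations sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                     # orthonormal basis for the sampled range
    B = Q.T @ A                                # small (k+p) x n reduced matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 60)) @ rng.standard_normal((60, 400))  # rank-60 matrix
U, s, Vt = randomized_svd(A, k=60)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # tiny: A has exact rank 60
```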
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training and do not facilitate efficient compression of already-trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimal quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation together with the optimal end-to-end learned transform (ELT) derived in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet, and DenseNet to very low bit-rates (1-2 bits).
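The intentionally simplified stand-in below conveys the basic idea of quantizing weights in a decorrelating transform domain: express a weight matrix in an SVD basis, uniformly quantize the coefficients, and reconstruct. It does not reproduce the paper's end-to-end learned transform (ELT) or its per-dimension bit-depth allocation, and the bit depth here is an arbitrary example.

```python
import numpy as np

# Simplified stand-in for decorrelate-then-quantize: SVD basis as the
# transform, uniform scalar quantization of the coefficients, reconstruction.
# The paper's ELT and optimal bit-depth allocation are not reproduced.
def transform_quantize(W, bits=2):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    coeff = np.diag(s) @ Vt                        # weights expressed in the U basis
    step = (coeff.max() - coeff.min()) / (2**bits - 1)
    q = np.round((coeff - coeff.min()) / step)     # uniform scalar quantization
    coeff_hat = q * step + coeff.min()
    return U @ coeff_hat                           # reconstructed weights

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * 0.1            # a toy weight matrix
W_hat = transform_quantize(W, bits=2)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))  # relative reconstruction error
```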
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
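For reference, the snippet below evaluates the structural similarity formula using global image statistics. In practice the index is computed over local, Gaussian-weighted sliding windows and then averaged, so this is a coarse single-window version only.

```python
import numpy as np

# Global (single-window) structural similarity: luminance, contrast, and
# structure comparison in one closed form.  Practical SSIM averages local,
# Gaussian-weighted windows instead.
def global_ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, (64, 64))
dist = np.clip(ref + rng.normal(0, 10, ref.shape), 0, 255)
print(global_ssim(ref, ref), global_ssim(ref, dist))   # 1.0 for identical images
```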
Traditional video compression (VC) methods are based on motion compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss over pairs of successive frames. It is well known in traditional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported to date for learned VC schemes in both PSNR and MS-SSIM. Compared to traditional video codecs, the R-D performance of our end-to-end optimized codec outperforms the x265 and SVT-HEVC encoders ("veryslow" preset) in both PSNR and MS-SSIM, and the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to the proposed novel tools, such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/lhbdc/
Recovering color images and videos from highly undersampled data is a fundamental and challenging task in face recognition and computer vision. Owing to the multi-dimensional nature of color images and videos, in this paper we propose a novel tensor completion approach that is able to efficiently explore the sparsity of tensor data under the discrete cosine transform (DCT). Specifically, we introduce two "sparse plus low-rank" tensor completion models, as well as two implementable algorithms for finding their solutions. The first one is a DCT-based sparse plus weighted nuclear norm induced low-rank minimization model. The second one is a DCT-based sparse plus $p$-shrinkage mapping induced low-rank optimization model. Moreover, we accordingly propose two implementable augmented Lagrangian algorithms to solve the underlying optimization models. A series of numerical experiments, including color image inpainting and video data recovery, demonstrate that our proposed approach performs better than many existing state-of-the-art tensor completion methods, especially for the case when the ratio of missing data is high.
In this paper, we analyse two well-known objective image quality metrics, the peak-signal-to-noise ratio (PSNR) as well as the structural similarity index measure (SSIM), and we derive a simple mathematical relationship between them which works for various kinds of image degradations such as Gaussian blur, additive Gaussian white noise, jpeg and jpeg2000 compression. A series of tests realized on images extracted from the Kodak database gives a better understanding of the similarity and difference between the SSIM and the PSNR.
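As a reminder of one of the two metrics involved, the snippet below computes PSNR as a simple function of the mean squared error for 8-bit images; the empirical PSNR-SSIM relationship derived in the paper is not reproduced here.

```python
import numpy as np

# PSNR as a function of mean squared error for 8-bit images.
def psnr(ref, test, peak=255.0):
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak**2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64))
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255)
print(psnr(ref, noisy))
```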
Learned video compression has recently become an important research topic for developing advanced video compression technologies, in which motion compensation is considered one of the most challenging problems. In this paper, we propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to address the problem of unstable compression performance caused by single-scale deformable kernels in the feature domain. More specifically, instead of utilizing optical flow or single-scale kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Moreover, we design a spatial-neighborhood-conditioned divisive normalization (SNCDN) to achieve more effective data Gaussianization combined with generalized divisive normalization. Furthermore, we propose a multi-frame enhanced reconstruction module that exploits context and temporal information to improve quality. Experimental results indicate that HDCVC achieves superior performance compared with recent state-of-the-art learned video compression approaches.
This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (ANF). Most learned video compression systems adopt the same hybrid-based coding architecture as traditional codecs. Recent research on conditional coding has shown the sub-optimality of hybrid-based coding and opened up opportunities for deep generative models to play a key role in creating new coding frameworks. CANF-VC represents a new attempt that leverages conditional ANF to learn a video generative model for conditional inter-frame coding. We choose ANF because it is a special type of generative model that includes the variational autoencoder as a special case and is able to achieve better expressiveness. CANF-VC also extends the idea of conditional coding to motion coding, forming a purely conditional coding framework. Extensive experimental results on commonly used datasets confirm the superiority of CANF-VC over state-of-the-art methods.
This paper investigates the problem of time series forecasting (TSF) from the perspective of compressed sensing. First, we convert TSF into a more inclusive problem called tensor completion with arbitrary sampling (TCAS), which is to restore a tensor from a subset of its entries sampled in an arbitrary manner. While it is known that, in the framework of Tucker low-rankness, it is theoretically impossible to identify the target tensor based on some arbitrarily selected entries, in this work we show that TCAS is indeed tractable in the light of a new concept termed convolutional low-rankness, which is a generalization of the well-known Fourier sparsity. Then we introduce a convex program termed convolution nuclear norm minimization (CNNM), and we prove that CNNM succeeds in solving TCAS as long as a sampling condition, which depends on the convolution rank of the target tensor, is obeyed. This theory provides a meaningful answer to the fundamental question of how many samples are needed at a minimum for making a given number of forecasts. Experiments on univariate time series, images, and videos show encouraging results.
Scene background initialization (SBI) is one of the challenging problems in computer vision. Dynamic mode decomposition (DMD) is a recently proposed method to robustly decompose a video sequence into a background model and the corresponding foreground part. However, this method needs to convert the color images into grayscale images for processing, which leads to neglecting the coupling information among the three channels of the color image. In this study, we propose a quaternion-based DMD (Q-DMD), which extends the DMD via quaternion matrix analysis so as to completely preserve the inherent color structure of color images and color videos. We exploit the standard eigenvalues of the quaternion matrix to compute its spectral decomposition and calculate the corresponding Q-DMD modes and eigenvalues. Results on publicly available benchmark datasets prove that our Q-DMD outperforms the exact DMD method, and experimental results also demonstrate that the performance of our approach is comparable to that of state-of-the-art ones.
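For orientation, the sketch below runs standard "exact DMD" on real-valued snapshots, i.e. the decomposition that Q-DMD extends with quaternion algebra; the quaternion machinery itself is not reproduced. For video, each column of the data matrix would be a vectorized (grayscale) frame, and the background model corresponds to the mode whose eigenvalue lies closest to 1.

```python
import numpy as np

# Standard exact DMD on real-valued snapshots (the baseline that Q-DMD
# extends).  The synthetic data mix a static "background" with an
# oscillating "foreground" pattern.
def exact_dmd(data, r):
    X, Y = data[:, :-1], data[:, 1:]                 # time-shifted snapshot pairs
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r].conj().T
    A_tilde = U.conj().T @ Y @ V / s                 # r x r reduced operator
    lam, W = np.linalg.eig(A_tilde)
    Phi = Y @ V / s @ W                              # DMD modes
    return lam, Phi

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 100)
bg = np.outer(np.ones(50), np.ones_like(t))          # static background component
fg = 0.3 * (np.outer(rng.standard_normal(50), np.sin(5 * t))
            + np.outer(rng.standard_normal(50), np.cos(5 * t)))  # oscillating part
lam, Phi = exact_dmd(bg + fg, r=3)
print(lam[np.argmin(np.abs(lam - 1))])               # ~1: the background mode
```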
Recently, many neural network-based image compression methods have shown promising results superior to the existing tool-based conventional codecs. However, most of them are often trained as separate models for different target bit rates, thus increasing the model complexity. Therefore, several studies have been conducted for learned compression that supports variable rates with single models, but they require additional network modules, layers, or inputs that often lead to complexity overhead, or do not provide sufficient coding efficiency. In this paper, we firstly propose a selective compression method that partially encodes the latent representations in a fully generalized manner for deep learning-based variable-rate image compression. The proposed method adaptively determines essential representation elements for compression of different target quality levels. For this, we first generate a 3D importance map as the nature of input content to represent the underlying importance of the representation elements. The 3D importance map is then adjusted for different target quality levels using importance adjustment curves. The adjusted 3D importance map is finally converted into a 3D binary mask to determine the essential representation elements for compression. The proposed method can be easily integrated with the existing compression models with a negligible amount of overhead increase. Our method can also enable continuously variable-rate compression via simple interpolation of the importance adjustment curves among different quality levels. The extensive experimental results show that the proposed method can achieve comparable compression efficiency as those of the separately trained reference compression models and can reduce decoding time owing to the selective compression. The sample codes are publicly available at https://github.com/JooyoungLeeETRI/SCR.
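The schematic below illustrates only the last two steps described above: adjusting an importance map for a target quality level and thresholding it into a binary selection mask. The gamma-style "adjustment curve" and all parameter values are invented stand-ins for illustration; they are not the learned adjustment curves or thresholds of the paper.

```python
import numpy as np

# Schematic illustration: turn a 3-D importance map into a binary mask that
# selects which latent elements to encode at a given quality level.  The
# gamma-style adjustment below is an invented stand-in, not the paper's
# learned adjustment curves.
def selection_mask(importance, quality, gamma_lo=2.0, gamma_hi=0.5, thresh=0.5):
    gamma = gamma_lo + quality * (gamma_hi - gamma_lo)   # quality in [0, 1]
    adjusted = importance ** gamma                       # per-quality adjustment
    return (adjusted >= thresh).astype(np.uint8)         # 1 = encode, 0 = skip

rng = np.random.default_rng(0)
imp = rng.uniform(0, 1, (192, 16, 16))                   # toy 3-D importance map
for q in (0.1, 0.5, 0.9):
    m = selection_mask(imp, q)
    print(q, m.mean())                                   # higher quality keeps more elements
```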
Quantum computing has the potential to revolutionize and change the way we live and understand the world. This review aims to provide an accessible introduction to quantum computing, with a focus on applications in statistics and data analysis. We start with an introduction to the basic concepts necessary to understand quantum computing and the differences between quantum and classical computing. We describe the core quantum subroutines that serve as the building blocks of quantum algorithms. We then review a range of quantum algorithms expected to deliver a computational advantage in statistics and machine learning. We highlight the challenges and opportunities in applying quantum computing to statistical problems and discuss potential future research directions.
Image representation is an important topic in computer vision and pattern recognition. It plays a fundamental role in a range of applications towards understanding visual content. Moment-based image representations have been reported to be effective in satisfying the core conditions of semantic description owing to their beneficial mathematical properties, especially geometric invariance and independence. This paper presents a comprehensive survey of orthogonal moments for image representation, covering recent advances in fast and accurate computation, robustness and invariance optimization, definition extensions, and applications. We also create a software package for a variety of widely used orthogonal moments and evaluate these methods on the same basis. The presented theoretical analysis, software implementation, and evaluation results can support the community, particularly in developing novel techniques and promoting real-world applications.
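A minimal sketch of one family of orthogonal moments (Legendre moments) computed by a naive discretization of the continuous definition is shown below; the fast and accurate computation schemes surveyed in the paper, and the software package it describes, are not reproduced here.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

# Naive discretization of Legendre image moments: map pixel centers to
# [-1, 1], evaluate Legendre polynomials, and accumulate weighted sums.
def legendre_moments(img, order):
    n, m = img.shape
    x = -1 + (2 * np.arange(n) + 1) / n        # pixel centers mapped to [-1, 1]
    y = -1 + (2 * np.arange(m) + 1) / m
    Px, Py = legvander(x, order), legvander(y, order)   # P_0..P_order at the samples
    norm = np.outer(2 * np.arange(order + 1) + 1,
                    2 * np.arange(order + 1) + 1) / 4.0
    area = (2.0 / n) * (2.0 / m)
    return norm * (Px.T @ img @ Py) * area     # lambda_{pq}, p, q = 0..order

img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0   # a simple centered square
L = legendre_moments(img, order=4)
print(np.round(L, 3))                               # odd-order moments vanish by symmetry
```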