We present an end-to-end trainable framework that processes large-scale visual data tensors by looking only at a small fraction of their entries. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross-approximation (CA) to learn the representation from a subset of the original samples. CA is an adaptive sampling algorithm that is native to tensor decompositions and avoids working with the full high-resolution data explicitly. Instead, it actively selects local representative samples that we fetch out-of-core and on demand. The required number of samples grows only logarithmically with the size of the input. Our implicit representation of the tensor in the network makes it possible to process large grids that could not otherwise be handled in their uncompressed form. The proposed approach is particularly useful for large-scale multidimensional grid data (e.g., 3D tomography) and for tasks that require context over a large receptive field (e.g., predicting the medical condition of entire organs). The code is available at https://github.com/aelphy/c-pic.
A simple nonrecursive form of the tensor decomposition in d dimensions is presented. It does not inherently suffer from the curse of dimensionality, it has asymptotically the same number of parameters as the canonical decomposition, but it is stable and its computation is based on low-rank approximation of auxiliary unfolding matrices. The new form gives a clear and convenient way to implement all basic operations efficiently. A fast rounding procedure is presented, as well as basic linear algebra operations. Examples showing the benefits of the decomposition are given, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.
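The construction described above, low-rank approximation of successive unfolding matrices, is what is now commonly called TT-SVD. The following NumPy sketch illustrates that idea under simplified assumptions (a fixed rank cap instead of an error-driven truncation); it is an illustration, not the paper's reference implementation.

```python
import numpy as np

def tt_svd(tensor, max_rank=8):
    """Sketch of a TT decomposition via repeated truncated SVDs of unfoldings."""
    dims = tensor.shape
    d = len(dims)
    cores = []
    rank_left = 1
    mat = tensor.reshape(rank_left * dims[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rank_right = min(max_rank, len(S))
        U, S, Vt = U[:, :rank_right], S[:rank_right], Vt[:rank_right, :]
        cores.append(U.reshape(rank_left, dims[k], rank_right))
        rank_left = rank_right
        mat = (np.diag(S) @ Vt).reshape(rank_left * dims[k + 1], -1)
    cores.append(mat.reshape(rank_left, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a full tensor (for checking the error)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

# Low-rank test tensor: an outer product of smooth vectors plus a little noise.
x = np.linspace(0, 1, 20)
t = np.einsum('i,j,k->ijk', np.sin(x), np.cos(x), x) + 1e-6 * np.random.randn(20, 20, 20)
cores = tt_svd(t, max_rank=4)
approx = tt_to_full(cores)
print('relative error:', np.linalg.norm(approx - t) / np.linalg.norm(t))
```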
Unlike 2D raster images, there is no single dominant representation for processing 3D visual data. Different formats such as point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties in 3D as well. In particular, they offer constant-time random access and are well suited to modern machine learning. Unfortunately, the storage size of a grid grows exponentially with its dimension, so grids often exceed memory limits even at moderate resolution. This work explores various low-rank tensor formats, including the Tucker, tensor train, and quantics tensor train decompositions, to compress time-varying 3D data. Our method iteratively computes, voxelizes, and compresses the truncated signed distance function of each frame and applies tensor rank truncation to condense all frames into a single compressed tensor that represents the entire 4D scene. We show that low-rank tensor compression is extremely compact for storing and querying time-varying signed distance functions. It drastically reduces the memory footprint of 4D scenes while surprisingly preserving their geometric quality. Unlike existing iterative, learning-based approaches such as DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.
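To make the compression step concrete, the snippet below applies the simplest of the listed formats, a truncated Tucker (HOSVD) decomposition, to a toy voxel grid standing in for one truncated-signed-distance-function frame. It is a generic sketch of that format only; the paper's pipeline additionally voxelizes real scans and truncates ranks jointly across all frames of the 4D scene.

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding of a 3D tensor into a matrix."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def tucker_compress(grid, ranks):
    """Truncated HOSVD of a 3D grid: leading singular vectors per mode, then the core."""
    factors = [np.linalg.svd(unfold(grid, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = np.einsum('ijk,ia,jb,kc->abc', grid, *factors)
    return core, factors

def tucker_reconstruct(core, factors):
    return np.einsum('abc,ia,jb,kc->ijk', core, *factors)

# Toy stand-in for a voxelized TSDF frame: truncated signed distance to a sphere.
n = 64
ax = np.linspace(-1, 1, n)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing='ij')
tsdf = np.sqrt(X**2 + Y**2 + Z**2) - 0.5      # distance to a sphere of radius 0.5
tsdf = np.clip(tsdf, -0.1, 0.1)               # truncation, as in a TSDF

core, factors = tucker_compress(tsdf, ranks=(12, 12, 12))
recon = tucker_reconstruct(core, factors)
stored = core.size + sum(f.size for f in factors)
print('compression ratio:', tsdf.size / stored)
print('relative error   :', np.linalg.norm(recon - tsdf) / np.linalg.norm(tsdf))
```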
We present tntorch, a tensor learning framework that supports multiple decompositions (including CANDECOMP/PARAFAC, Tucker, and tensor train) under a unified interface. With our library, users can learn and handle low-rank tensors with automatic differentiation, seamless GPU support, and the convenience of PyTorch's API. Besides decomposition algorithms, tntorch implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetic, and more.
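A minimal usage sketch of the workflow the library targets is given below. The call names (tn.randn, tn.Tensor, ranks_tt, tn.dist, tn.relative_error, .torch()) are recalled from the tntorch documentation and may differ between versions, so treat this as an assumption-laden illustration rather than an authoritative API reference.

```python
# Sketch of a tntorch-style workflow; exact signatures may differ between versions.
import torch
import tntorch as tn

# A random 4D tensor stored in tensor train format with TT-rank 5.
t = tn.randn(32, 32, 32, 32, ranks_tt=5, requires_grad=True)

# Differentiable tensor algebra: losses can be defined directly on TT objects.
target = tn.randn(32, 32, 32, 32, ranks_tt=5)
loss = tn.dist(t, target)          # Frobenius distance between two TT tensors (assumed name)
loss.backward()                    # gradients flow into the TT cores via autograd

# Compressing a dense torch tensor and converting back.
dense = torch.rand(16, 16, 16)
compressed = tn.Tensor(dense, ranks_tt=8)
print(tn.relative_error(dense, compressed))
print(compressed.torch().shape)    # decompress to a regular torch tensor
```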
High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterizing, designing, and controlling such large-scale systems often rely on dimensionality reduction to compute solutions in real time. Common paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often involves variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies. To address these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatio-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.
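The ShapeNet/ParameterNet split can be pictured as a hypernetwork: one MLP consumes everything except the spatial coordinate and emits the weights of a second, coordinate-only MLP. The PyTorch sketch below is our assumption about that coupling for illustration; the layer sizes, the conditioning mechanism, and the names ShapeNetMLP/ParameterNet are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
from math import prod

class ShapeNetMLP(nn.Module):
    """Coordinate network u(x); its weights are produced externally by ParameterNet."""
    def __init__(self, hidden=32):
        super().__init__()
        # Weight/bias shapes of a 2-layer MLP mapping a 1D coordinate to a scalar field.
        self.shapes = [(hidden, 1), (hidden,), (1, hidden), (1,)]
        self.n_params = sum(prod(s) for s in self.shapes)

    def forward(self, x, flat_params):
        chunks, i = [], 0
        for s in self.shapes:
            n = prod(s)
            chunks.append(flat_params[i:i + n].reshape(s))
            i += n
        w1, b1, w2, b2 = chunks
        h = torch.tanh(x @ w1.T + b1)
        return h @ w2.T + b2

class ParameterNet(nn.Module):
    """Maps (time, parameters, sensor values, ...) to the weights of ShapeNet."""
    def __init__(self, in_dim, n_params, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_params))

    def forward(self, mu):
        return self.net(mu)

shape_net = ShapeNetMLP()
param_net = ParameterNet(in_dim=2, n_params=shape_net.n_params)  # e.g. (time, parameter)

x = torch.linspace(0, 1, 128).unsqueeze(1)    # spatial query points: mesh-agnostic
mu = torch.tensor([0.3, 0.7])                 # one (time, parameter) configuration
u = shape_net(x, param_net(mu))               # field u(x; t, mu) at all query points
print(u.shape)                                # torch.Size([128, 1])
```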
Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality. Cross approximation, originally developed for representing a matrix from a set of selected rows and columns, is an efficient method for constructing a tensor train decomposition of a tensor from a small number of its entries. While tensor train cross approximation has achieved remarkable performance in practical applications, a theoretical analysis, in particular of its approximation error, has so far been lacking. To our knowledge, existing results only provide element-wise approximation accuracy guarantees, which lead to a very loose bound when extended to the entire tensor. In this paper, we bridge this gap by providing accuracy guarantees for the entire tensor, for both exact and noisy measurements. Our results illustrate how the choice of the selected subtensors affects the quality of the cross approximation, and show that the approximation error caused by model error and/or measurement error need not grow exponentially with the order of the tensor. These results are verified by numerical experiments and may have important implications for the cross approximation of high-order tensors, such as those encountered in the description of quantum many-body states.
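For the matrix case the abstract refers to, cross approximation reconstructs A from a few rows and columns as the skeleton (CUR) decomposition A ≈ C U^{-1} R. The NumPy sketch below shows this on an exactly low-rank matrix with a naive pivot choice; practical TT cross approximation selects pivots adaptively (e.g., with maximum-volume heuristics), which is precisely where the error analysis of the paper becomes relevant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exactly rank-3 matrix: only a few of its entries are needed to recover it.
n, r = 200, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Naive pivot choice for illustration: take the first r rows and columns.
rows = [0, 1, 2]
cols = [0, 1, 2]
C = A[:, cols]               # selected columns
R = A[rows, :]               # selected rows
U = A[np.ix_(rows, cols)]    # their intersection

A_cross = C @ np.linalg.solve(U, R)   # skeleton approximation C U^{-1} R
err = np.linalg.norm(A - A_cross) / np.linalg.norm(A)
print('relative error with', len(rows), 'rows/columns:', err)
```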
In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks, a family of optimization techniques for high-order tensors used in quantum physics and numerical analysis. We first present an intrinsic relation between WFA and the tensor train decomposition, a particular form of tensor network. This relation allows us to exhibit a novel low-rank structure of the Hankel matrix of a function computed by a WFA and to design an efficient spectral learning algorithm that exploits this structure to scale to very large Hankel matrices. We then unveil a fundamental connection between WFA and second-order recurrent neural networks (2-RNN): in the case of sequences of discrete symbols, WFA and 2-RNN with linear activation functions are expressively equivalent. Leveraging this equivalence result combined with the classical spectral learning algorithm for weighted automata, we introduce the first provable learning algorithm for linear 2-RNN defined over sequences of continuous input vectors. This algorithm relies on estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN can be recovered. The performance of the proposed learning algorithm is evaluated in simulation studies on both synthetic and real-world data.
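The low-rank structure of the Hankel matrix is easy to observe numerically: for a function computed by a WFA with k states, H(u, v) = f(uv) has rank at most k. The sketch below builds a small random WFA over a binary alphabet, fills a Hankel block over short prefixes and suffixes, and checks its numerical rank; the spectral learning algorithm discussed in the paper goes further and recovers the automaton from such (sub-)blocks.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
k = 4                                  # number of WFA states
alphabet = [0, 1]
alpha = rng.standard_normal(k)         # initial weights
beta = rng.standard_normal(k)          # final weights
A = {a: 0.5 * rng.standard_normal((k, k)) for a in alphabet}  # transition matrices

def f(word):
    """Function computed by the WFA: alpha^T A_{w1} ... A_{wn} beta."""
    v = alpha
    for a in word:
        v = v @ A[a]
    return v @ beta

# All words of length <= 4 serve as prefixes and suffixes.
words = [w for L in range(5) for w in itertools.product(alphabet, repeat=L)]
H = np.array([[f(u + v) for v in words] for u in words])

print('Hankel block shape:', H.shape)                              # (31, 31)
print('numerical rank    :', np.linalg.matrix_rank(H, tol=1e-8))   # at most k
```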
This paper presents a new method for representing the embedding tables of graph neural networks (GNNs) more compactly via the tensor train (TT) decomposition. We consider the setting where (a) the graph data lacks node features, so embeddings must be learned during training, and (b) we wish to exploit GPU platforms, where smaller tables reduce host-to-GPU communication even for large-memory GPUs. The use of TT enables a compact parameterization of the embeddings, making them small enough to fit entirely on modern GPUs even for massive graphs. Combined with judicious initialization and hierarchical graph partitioning, this approach can reduce the size of node embedding vectors by 1,659 to 81,362 times on large publicly available benchmark datasets, while achieving comparable or better accuracy and significant speedups on multi-GPU systems. In some cases, our model without explicit node features as input can even match the accuracy of models that use node features.
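The essence of the compression is to factor the row index of an (N, d) embedding table into a multi-index, factor the feature dimension likewise, store small TT cores, and materialize an embedding row only at lookup time. The PyTorch sketch below illustrates this indexing scheme; the shapes, ranks, and initialization are illustrative, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class TTEmbedding(nn.Module):
    """Embedding table of shape (prod(n_modes), prod(d_modes)) stored as TT cores."""
    def __init__(self, n_modes=(8, 8, 8), d_modes=(4, 4, 4), rank=8):
        super().__init__()
        ranks = [1, rank, rank, 1]
        self.n_modes, self.d_modes = n_modes, d_modes
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(ranks[k], n_modes[k], d_modes[k], ranks[k + 1]))
            for k in range(3)
        ])

    def forward(self, idx):
        # Factor flat node indices into a 3-part mixed-radix multi-index.
        i3 = idx % self.n_modes[2]
        i2 = (idx // self.n_modes[2]) % self.n_modes[1]
        i1 = idx // (self.n_modes[1] * self.n_modes[2])
        # Contract the selected slices: result has shape (batch, d1 * d2 * d3).
        g1 = self.cores[0][:, i1]            # (1, B, d1, r)
        g2 = self.cores[1][:, i2]            # (r, B, d2, r)
        g3 = self.cores[2][:, i3]            # (r, B, d3, 1)
        out = torch.einsum('abic,cbjd,dbke->bijk', g1, g2, g3)
        return out.reshape(idx.shape[0], -1)

emb = TTEmbedding()                          # 512 nodes x 64-dim embeddings
params = sum(p.numel() for p in emb.parameters())
print('parameters:', params, 'vs dense table:', 512 * 64)
vecs = emb(torch.tensor([0, 17, 511]))
print(vecs.shape)                            # torch.Size([3, 64])
```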
While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and predictive performance.
Compared to classification, segmentation, or object detection with CNNs, generative networks pursue fundamentally different goals and methods. Originally, they were intended not as tools for image analysis but to generate natural-looking images. The adversarial training paradigm was proposed to stabilize generative methods and has proven highly successful, although it was by no means the first attempt. This chapter gives a basic introduction to the motivation behind generative adversarial networks (GANs) and traces the path to their success by abstracting the underlying task and working mechanism and deriving the difficulties of early practical approaches. Methods for more stable training are presented, as are typical signs of poor convergence and their causes. Although this chapter focuses on GANs for image generation and image analysis, the adversarial training paradigm itself is not specific to images and also generalizes to tasks in image analysis. Well-known architectural examples from semantic image segmentation and anomaly detection are presented before GANs are contrasted with further generative modeling approaches that have recently entered the scene. This allows a contextualized view of the limitations of GANs, but also of their benefits.
Convolutional Neural Networks (CNNs) with U-shaped architectures have dominated medical image segmentation, which is crucial for various clinical purposes. However, the inherent locality of convolution makes CNNs fail to fully exploit global context, essential for better recognition of some structures, e.g., brain lesions. Transformers have recently proven promising performance on vision tasks, including semantic segmentation, mainly due to their capability of modeling long-range dependencies. Nevertheless, the quadratic complexity of attention makes existing Transformer-based models use self-attention layers only after somehow reducing the image resolution, which limits the ability to capture global contexts present at higher resolutions. Therefore, this work introduces a family of models, dubbed Factorizer, which leverages the power of low-rank matrix factorization for constructing an end-to-end segmentation model. Specifically, we propose a linearly scalable approach to context modeling, formulating Nonnegative Matrix Factorization (NMF) as a differentiable layer integrated into a U-shaped architecture. The shifted window technique is also utilized in combination with NMF to effectively aggregate local information. Factorizers compete favorably with CNNs and Transformers in terms of accuracy, scalability, and interpretability, achieving state-of-the-art results on the BraTS dataset for brain tumor segmentation and ISLES'22 dataset for stroke lesion segmentation. Highly meaningful NMF components give an additional interpretability advantage to Factorizers over CNNs and Transformers. Moreover, our ablation studies reveal a distinctive feature of Factorizers that enables a significant speed-up in inference for a trained Factorizer without any extra steps and without sacrificing much accuracy. The code and models are publicly available at https://github.com/pashtari/factorizer.
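The central building block described above, NMF as a differentiable layer, can be sketched by unrolling a fixed number of multiplicative updates inside forward(), so that gradients flow through the updates. The following PyTorch snippet is a generic sketch of that idea, not the released Factorizer code; the matrix shapes, iteration count, and the mapping from tokens to a nonnegative matrix are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NMFLayer(nn.Module):
    """Unrolled multiplicative-update NMF: factor V (B, n, m) >= 0 into W @ H of rank r."""
    def __init__(self, rank=8, n_iter=10, eps=1e-8):
        super().__init__()
        self.rank, self.n_iter, self.eps = rank, n_iter, eps

    def forward(self, V):
        B, n, m = V.shape
        W = torch.rand(B, n, self.rank, device=V.device)
        H = torch.rand(B, self.rank, m, device=V.device)
        for _ in range(self.n_iter):
            # Lee-Seung multiplicative updates for the Frobenius objective.
            H = H * (W.transpose(1, 2) @ V) / (W.transpose(1, 2) @ W @ H + self.eps)
            W = W * (V @ H.transpose(1, 2)) / (W @ H @ H.transpose(1, 2) + self.eps)
        return W @ H, W, H

# Nonnegative "feature map", e.g. softplus of token features inside a segmentation model.
x = torch.randn(2, 64, 32, requires_grad=True)
feats = F.softplus(x)
recon, W, H = NMFLayer(rank=8)(feats)
loss = ((recon - feats) ** 2).mean()
loss.backward()                # gradients reach the inputs through the unrolled updates
print(recon.shape, x.grad.abs().mean().item())
```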
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
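The two-stage recipe summarized above (random sampling to find a subspace that captures the action of A, followed by deterministic factorization of the reduced matrix) is the prototype randomized SVD. The NumPy sketch below follows that prototype with a small oversampling parameter; power iterations and the other refinements analyzed in the paper are omitted.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Prototype randomized SVD: range finding via random sampling, then exact SVD."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix and orthonormalize.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    # Stage B: project onto the subspace and factor the small matrix deterministically.
    B = Q.T @ A
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k, :]

# Test on a matrix with a rapidly decaying spectrum.
rng = np.random.default_rng(1)
m, n = 1000, 600
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)
A = (U0 * s) @ V0.T

U, S, Vt = randomized_svd(A, k=20)
approx = (U * S) @ Vt
print('rank-20 relative error:', np.linalg.norm(A - approx) / np.linalg.norm(A))
```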
We present a new optimization procedure based on a combination of an efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method on various tasks, from minimizing multidimensional functions to reinforcement learning. Our algorithm is compared with popular evolution-based methods and outperforms them in terms of the number of function evaluations or execution time, often by a significant margin.
Invariance has recently been shown to be a powerful inductive bias in machine learning models. One such class of predictive models are tensor networks. We introduce a new numerical algorithm to construct a basis of tensors that are invariant under the action of normal matrix representations of an arbitrary discrete group. This method can be up to several orders of magnitude faster than previous approaches. The group-invariant tensors are then combined into a group-invariant tensor train network, which can be used as a supervised machine learning model. We applied this model to a protein binding classification problem, taking into account problem-specific invariances, and obtained prediction accuracy on par with state-of-the-art deep learning approaches.
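A basis of invariant tensors can always be obtained by applying the group-averaging (Reynolds) projector to the tensor-product representation and extracting its fixed subspace; the algorithm in the paper is a much faster alternative to exactly this brute-force construction. The NumPy sketch below performs the brute-force version for the permutation representation of S3 acting on 3-way tensors, as a reference point only.

```python
import itertools
import numpy as np

# Permutation representation of S3 on R^3 (permutation matrices are normal matrices).
def perm_matrix(p):
    M = np.zeros((3, 3))
    for i, j in enumerate(p):
        M[j, i] = 1.0
    return M

group = [perm_matrix(p) for p in itertools.permutations(range(3))]

# Reynolds projector: average the action rho(g) = g (x) g (x) g on vectorized 3-way tensors.
P = sum(np.kron(np.kron(g, g), g) for g in group) / len(group)

# Invariant tensors span the eigenspace of the projector with eigenvalue 1.
eigval, eigvec = np.linalg.eigh((P + P.T) / 2)
basis = eigvec[:, eigval > 0.5]
print('invariant 3-way tensors for S3 acting on R^3:', basis.shape[1])  # 5

# Sanity check: every basis element is fixed by every group element.
for g in group:
    rho = np.kron(np.kron(g, g), g)
    assert np.allclose(rho @ basis, basis, atol=1e-10)
```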
The convergence of many numerical optimization techniques is highly sensitive to the initial guess provided to the solver. We propose an approach based on tensor methods to initialize existing optimization solvers close to global optima. The approach uses only the definition of the cost function and does not require access to a database of good solutions. We first transform the cost function, which is a function of task parameters and optimization variables, into a probability density function. Unlike existing approaches that treat the task parameters as constants, we consider them as another set of random variables and approximate the joint probability distribution of the task parameters and the optimization variables using a surrogate probability model. For a given task, we then generate samples from the conditional distribution with respect to the given task parameters and use them as initializations for the optimization solver. Since conditioning on and sampling from an arbitrary density function are challenging, we use the tensor train decomposition to obtain a surrogate probability model from which we can efficiently derive the conditional model and the samples. The method can produce multiple solutions coming from different modes for a given task. We first evaluate the approach on various challenging benchmark functions for numerical optimization that are difficult to solve using gradient-based optimization solvers with a naive initialization, showing that the proposed method can generate samples close to the global optima and coming from multiple modes. We then demonstrate the generality of the framework and its relevance to robotics by applying the proposed method to a 7-DoF manipulator.
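The cost-to-probability transformation and the conditioning step can be illustrated on a tiny discretized example: turn the cost c(task, x) into an unnormalized density exp(-beta * c), condition on the given task parameter, and sample initializations from the resulting multimodal distribution. The sketch below uses an explicit 2D grid instead of a tensor train surrogate, so it only conveys the idea; the TT model is what lets the approach scale beyond a few variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cost c(task, x) with two symmetric minima whose location depends on the task.
def cost(task, x):
    return (x ** 2 - task) ** 2            # minima at x = +/- sqrt(task)

tasks = np.linspace(0.2, 2.0, 50)          # task-parameter grid
xs = np.linspace(-2.0, 2.0, 400)           # optimization-variable grid
beta = 10.0

# Joint unnormalized density over (task, x); with a TT surrogate this table is never formed.
C = cost(tasks[:, None], xs[None, :])
P = np.exp(-beta * C)

# Condition on a given task parameter and sample candidate initializations.
task_value = 1.5
ti = np.argmin(np.abs(tasks - task_value))
p_cond = P[ti] / P[ti].sum()
samples = rng.choice(xs, size=8, p=p_cond)
print('initializations near the two optima +/-', np.sqrt(task_value))
print(np.sort(samples))
```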
Transformers, as a new generation of neural architectures, have shown excellent performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and fail to generalize across diverse medical imaging tasks. To address these challenges, we present MedFormer, a data-scalable Transformer for generalizable medical image segmentation. The key designs incorporate desirable inductive biases, hierarchical modeling with linear complexity, attention that is spatially and semantically global while remaining of linear complexity, and multi-scale feature fusion. MedFormer can learn on tiny- to large-scale data without pre-training. Extensive experiments demonstrate MedFormer's potential as a general segmentation backbone: it outperforms CNNs and vision Transformers on three public datasets spanning multiple modalities (e.g., CT and MRI) and diverse medical targets (e.g., healthy organs, diseased tissues, and tumors). We make our models and evaluation pipeline publicly available, offering solid baselines and unbiased comparisons to facilitate a wide range of downstream clinical applications.
This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks for a quick start. Beyond standard supervised learning from data, we will look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have enormous potential to fundamentally change what computer simulations can achieve.
We propose the tensorizing flow method for estimating high-dimensional probability density functions from the observed data. The method is based on tensor-train and flow-based generative modeling. Our method first efficiently constructs an approximate density in the tensor-train form via solving the tensor cores from a linear system based on the kernel density estimators of low-dimensional marginals. We then train a continuous-time flow model from this tensor-train density to the observed empirical distribution by performing a maximum likelihood estimation. The proposed method combines the optimization-less feature of the tensor-train with the flexibility of the flow-based generative models. Numerical results are included to demonstrate the performance of the proposed method.
We present a deep convolutional decoder architecture that can generate volumetric 3D outputs in a compute- and memory-efficient manner by using an octree representation. The network learns to predict both the structure of the octree, and the occupancy values of individual cells. This makes it a particularly valuable technique for generating 3D shapes. In contrast to standard decoders acting on regular voxel grids, the architecture does not have cubic complexity. This allows representing much higher resolution outputs with a limited memory budget. We demonstrate this in several application domains, including 3D convolutional autoencoders, generation of objects and whole scenes from high-level representations, and shape from a single image.
Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling or unfolding offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing both in theoretic investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential in developing efficient, high-performance and yet interpretable network architectures from reasonable size training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on current limitations of unrolling and suggest possible future research directions.
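The prototypical example mentioned above, unrolling a sparse-coding solver into a network, can be sketched by turning a fixed number of ISTA iterations into layers with learnable matrices (the LISTA idea). The PyTorch code below is a compact illustration of that construction under our own choices of layer count, sizes, and initialization; it is not a reproduction of any specific published architecture.

```python
import torch
import torch.nn as nn

def soft_threshold(x, lam):
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

class UnrolledISTA(nn.Module):
    """K ISTA iterations for min_z 0.5||y - D z||^2 + lam ||z||_1, unrolled as layers."""
    def __init__(self, D, n_layers=8, lam=0.1):
        super().__init__()
        m, n = D.shape
        L = torch.linalg.matrix_norm(D, ord=2) ** 2        # Lipschitz constant of the gradient
        # Classic ISTA matrices, made learnable so training can tune each "layer".
        self.We = nn.Parameter((D / L).t().clone())        # n x m
        self.S = nn.Parameter(torch.eye(n) - (D.t() @ D) / L)
        self.lam = nn.Parameter(torch.tensor(lam / L))
        self.n_layers = n_layers

    def forward(self, y):
        z = soft_threshold(y @ self.We.t(), self.lam)
        for _ in range(self.n_layers):
            z = soft_threshold(y @ self.We.t() + z @ self.S.t(), self.lam)
        return z

# Synthetic sparse-coding problem: y = D z_true with a sparse z_true.
torch.manual_seed(0)
m, n, batch = 20, 50, 32
D = torch.randn(m, n) / m ** 0.5
z_true = torch.randn(batch, n) * (torch.rand(batch, n) < 0.1)
y = z_true @ D.t()

model = UnrolledISTA(D)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                                    # supervised training of the unrolled net
    opt.zero_grad()
    loss = ((model(y) - z_true) ** 2).mean()
    loss.backward()
    opt.step()
print('final reconstruction MSE:', loss.item())
```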