Translated by Google Translate
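Comparing probability distributions is at the crux of many machine learning algorithms. Maximum Mean Discrepancies (MMD) and Optimal Transport (OT) distances are two classes of distances between probability measures that have attracted abundant attention in recent years. This paper establishes conditions under which the Wasserstein distance can be controlled by MMD norms. Our work is motivated by the theory of compressive statistical learning (CSL), a general framework for resource-efficient large-scale learning in which the training data is summarized in a single vector (called a sketch) that captures the information relevant to the learning task under consideration. Inspired by existing results in CSL, we introduce the Hölder Lower Restricted Isometric Property (Hölder LRIP) and show that this property comes with interesting guarantees for compressive statistical learning. Based on the relations between the MMD and the Wasserstein distance, we provide guarantees for compressive statistical learning by introducing and studying the concept of Wasserstein learnability of the learning task, that is, the situation in which some task-specific metric between probability distributions can be bounded by a Wasserstein distance.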
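As a rough illustration of the MMD mentioned above, the sketch below computes a biased empirical estimate of the squared MMD between two one-dimensional samples. The Gaussian kernel, the bandwidth, and the sample values are assumptions chosen for illustration; the abstract does not specify a kernel or an estimator.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    # Assumed kernel choice: Gaussian (RBF) kernel on the real line.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    # Average kernel value over all pairs (x, y) drawn from the two samples.
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical toy samples: one centered near 0, one centered near 2.
sample_p = [0.1, -0.4, 0.3, 0.0, 0.5]
sample_q = [2.1, 1.7, 2.4, 1.9, 2.2]

mmd_same = mmd_squared(sample_p, sample_p)
mmd_diff = mmd_squared(sample_p, sample_q)
print(mmd_same, mmd_diff)
```

The estimate vanishes when a sample is compared with itself and grows as the two samples move apart, which is the behavior one expects from a distance between distributions.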
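Sparse matrix factorization is the problem of approximating a matrix $\mathbf{Z}$ by a product of $J$ sparse factors $\mathbf{X}^{(J)} \mathbf{X}^{(J-1)} \ldots \mathbf{X}^{(1)}$. This paper focuses on the identifiability issues that arise in this problem, with a view to better understanding under which sparsity constraints the problem is well-posed. We give conditions under which the problem of factorizing a matrix into \emph{two} sparse factors admits a unique solution, up to unavoidable permutation and scaling equivalences. Our general framework considers an arbitrary family of prescribed sparsity patterns, which allows us to capture more structured notions of sparsity than a simple count of nonzero entries. These conditions are shown to be related to the essential uniqueness of exact matrix decomposition into a sum of rank-one matrices with structured sparsity constraints. In particular, in the case of fixed-support sparse matrix factorization, we give a general sufficient condition for identifiability based on rank-one matrix completability, and we derive from it a completion algorithm that can verify whether this sufficient condition is satisfied and, if so, recover the entries of the two sparse factors. A companion paper further exploits these conditions to derive identifiability properties and theoretically sound factorization methods for multi-layer sparse matrix factorization with support constraints associated with some well-known fast transforms such as the Hadamard or the Discrete Fourier Transforms.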
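Many well-known matrices $Z$ are associated with fast transforms corresponding to factorizations of the form $Z = X^J \ldots X^1$, where each factor $X^\ell$ is sparse and possibly structured. This paper investigates the essential uniqueness of such factorizations. Our first main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits a unique factorization into $J$ butterfly factors (where $N = 2^J$), and that these factors can be recovered by a hierarchical factorization method. This contrasts with existing approaches, which fit a product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the Discrete Fourier Transform matrices of size $2^J$. Computing such a factorization costs $\mathcal{O}(N^2)$, which is of the order of a dense matrix-vector multiplication, while the resulting factorization enables fast $\mathcal{O}(N \log N)$ matrix-vector multiplications. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer, fixed-support setting that was recently established. While the butterfly structure corresponds to a fixed prescribed support for each factor, our second contribution is to obtain identifiability results for more general families of allowed sparsity patterns, taking into account unavoidable permutation ambiguities. Typically, we show through the hierarchical paradigm that the butterfly factorization of the Discrete Fourier Transform matrix of size $2^J$ admits a unique sparse factorization into $J$ factors, when enforcing only $2$-sparsity by column and a block-diagonal structure on each factor.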
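To make the cost comparison above concrete, the following sketch contrasts a dense $\mathcal{O}(N^2)$ product with the Hadamard matrix against the classical fast Walsh-Hadamard transform, which applies $J$ butterfly stages in $\mathcal{O}(N \log N)$ operations. It only illustrates the well-known butterfly structure of the (unnormalized, Sylvester-ordered) Hadamard matrix; it is not the hierarchical factorization method of the paper, and all function names are hypothetical.

```python
def hadamard(j):
    """Dense, unnormalized Sylvester-Hadamard matrix of size 2**j (nested lists)."""
    h = [[1]]
    for _ in range(j):
        # Block recursion H_{2n} = [[H_n, H_n], [H_n, -H_n]].
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def dense_matvec(matrix, x):
    # Plain O(N^2) matrix-vector product.
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def fast_hadamard(x):
    """Apply the J butterfly stages one after another: O(N log N) operations."""
    y = list(x)
    n = len(y)  # assumed to be a power of two
    half = 1
    while half < n:
        # One stage corresponds to one sparse factor with two nonzeros per row.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = y[i], y[i + half]
                y[i], y[i + half] = a + b, a - b
        half *= 2
    return y


x = [3, 1, -2, 5, 0, 4, -1, 2]  # N = 8, so J = 3
dense_result = dense_matvec(hadamard(3), x)
fast_result = fast_hadamard(x)
print(dense_result == fast_result)
```

Each pass of the `while` loop touches every entry once, so the total work is $N$ operations per stage over $J = \log_2 N$ stages.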
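Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are described by a vector of parameters $\theta$ and realized as a piecewise linear continuous function $R_{\theta}: x \in \mathbb R^{d} \mapsto R_{\theta}(x) \in \mathbb R^{k}$. Natural scaling and permutation operations on the parameters $\theta$ leave the realization unchanged, leading to equivalence classes of parameters that yield the same realization. These considerations in turn lead to the notion of identifiability: the ability to recover (the equivalence class of) $\theta$ from the sole knowledge of its realization $R_{\theta}$. The overall objective of this paper is to introduce an embedding $\Phi(\theta)$ for ReLU neural networks of any depth that is invariant to scalings and that provides a locally linear parameterization of the realization of the network. Leveraging these two key properties, we derive conditions under which a deep ReLU network is indeed locally identifiable from the knowledge of its realization on a finite set of samples $x_{i} \in \mathbb R^{d}$. We study the shallow case in more depth, establishing necessary and sufficient conditions for the network to be identifiable from a bounded subset $\mathcal X \subseteq \mathbb R^{d}$.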
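The scaling invariance described above can be checked numerically on a small example. The sketch below assumes a one-hidden-layer network $R_{\theta}(x) = W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2$ with made-up weights: multiplying the incoming weights and bias of one hidden neuron by $\lambda > 0$ and dividing its outgoing weights by $\lambda$ changes $\theta$ but not the realization. The helper names and numerical values are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def affine(weights, bias, x):
    # Computes W x + b for a matrix stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def realize(theta, x):
    """Realization of a one-hidden-layer ReLU network at the input x."""
    w1, b1, w2, b2 = theta
    return affine(w2, b2, relu(affine(w1, b1, x)))


def rescale_neuron(theta, i, lam):
    """Scale hidden neuron i: incoming weights and bias by lam, outgoing by 1/lam."""
    w1, b1, w2, b2 = theta
    new_w1 = [[lam * w for w in row] if r == i else list(row) for r, row in enumerate(w1)]
    new_b1 = [lam * b if r == i else b for r, b in enumerate(b1)]
    new_w2 = [[w / lam if c == i else w for c, w in enumerate(row)] for row in w2]
    return (new_w1, new_b1, new_w2, list(b2))


# Hypothetical parameters: 2 inputs, 3 hidden neurons, 1 output.
theta = (
    [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.25]],
    [0.1, -0.3, 0.2],
    [[2.0, -1.0, 0.5]],
    [0.05],
)
theta_rescaled = rescale_neuron(theta, i=1, lam=4.0)

points = [[0.3, -1.2], [2.0, 0.7], [-0.5, -0.5]]
max_gap = max(
    abs(a - b)
    for x in points
    for a, b in zip(realize(theta, x), realize(theta_rescaled, x))
)
print(max_gap)
```

The invariance relies on $\mathrm{ReLU}(\lambda z) = \lambda\,\mathrm{ReLU}(z)$ for $\lambda > 0$, which is why parameters can be recovered at best up to such rescalings (and permutations of hidden neurons).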
We propose a novel method for constructing wavelet transforms of functions defined on the vertices of an arbitrary finite weighted graph. Our approach is based on defining scaling using the graph analogue of the Fourier domain, namely the spectral decomposition of the discrete graph Laplacian $L$. Given a wavelet generating kernel $g$ and a scale parameter $t$, we define the scaled wavelet operator $T_g^t = g(tL)$. The spectral graph wavelets are then formed by localizing this operator by applying it to an indicator function. Subject to an admissibility condition on $g$, this procedure defines an invertible transform. We explore the localization properties of the wavelets in the limit of fine scales. Additionally, we present a fast Chebyshev polynomial approximation algorithm for computing the transform that avoids the need for diagonalizing $L$. We highlight potential applications of the transform through examples of wavelets on graphs corresponding to a variety of different problem domains.
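A minimal sketch of the construction above, under simplifying assumptions: the graph is an unweighted ring on $n$ vertices, whose Laplacian has the closed-form eigenvalues $\lambda_k = 2 - 2\cos(2\pi k/n)$ with Fourier eigenvectors, and the kernel is the hypothetical choice $g(x) = x e^{-x}$ (which satisfies $g(0) = 0$). The wavelet centered at a vertex is obtained by applying $g(tL)$ to the indicator of that vertex. This uses the explicit spectral decomposition, not the Chebyshev approximation mentioned in the abstract.

```python
import math


def ring_laplacian_eigenvalues(n):
    # Eigenvalues of the Laplacian of the unweighted ring graph on n vertices.
    return [2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def kernel(x):
    # Assumed wavelet generating kernel with g(0) = 0 (band-pass behavior).
    return x * math.exp(-x)


def ring_wavelet(n, center, t):
    """Return g(tL) applied to the indicator of vertex `center`, as n floats."""
    eigenvalues = ring_laplacian_eigenvalues(n)
    psi = []
    for m in range(n):
        # Entry (m, center) of g(tL), using the Fourier eigenvectors of the ring;
        # imaginary parts cancel because lambda_k = lambda_{n-k}.
        total = sum(
            kernel(t * lam) * math.cos(2.0 * math.pi * k * (m - center) / n)
            for k, lam in enumerate(eigenvalues)
        )
        psi.append(total / n)
    return psi


psi = ring_wavelet(n=16, center=4, t=2.0)
print([round(v, 4) for v in psi])
```

Because $g(0) = 0$, the constant eigenvector contributes nothing and the wavelet sums to zero over the vertices; on the ring it is also symmetric around its center vertex.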
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their spatial extent and the structural information they provide. Traditional line detectors based on the image gradient are extremely fast and accurate, but lack robustness in noisy images and challenging conditions. Their learned counterparts are more repeatable and can handle challenging images, but at the cost of a lower accuracy and a bias towards wireframe lines. We propose to combine traditional and learned approaches to get the best of both worlds: an accurate and robust line detector that can be trained in the wild without ground truth lines. Our new line segment detector, DeepLSD, processes images with a deep network to generate a line attraction field, before converting it to a surrogate image gradient magnitude and angle, which is then fed to any existing handcrafted line detector. Additionally, we propose a new optimization tool to refine line segments based on the attraction field and vanishing points. This refinement improves the accuracy of current deep detectors by a large margin. We demonstrate the performance of our method on low-level line detection metrics, as well as on several downstream tasks using multiple challenging datasets. The source code and models are available at https://github.com/cvg/DeepLSD.
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that a faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that self-predictive learning dynamics carry out a spectral decomposition of the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.