The Forster transform is a method of regularizing a dataset by placing it in {\em radial isotropic position} while maintaining some of its essential properties. Forster transforms have played a key role in a diverse range of settings spanning computer science and functional analysis. Prior work had given {\em weakly} polynomial time algorithms for computing Forster transforms, when they exist. Our main result is the first {\em strongly polynomial time} algorithm to compute an approximate Forster transform of a given dataset or certify that no such transformation exists. By leveraging our strongly polynomial Forster algorithm, we obtain the first strongly polynomial time algorithm for {\em distribution-free} PAC learning of halfspaces. This learning result is surprising because {\em proper} PAC learning of halfspaces is {\em equivalent} to linear programming. Our learning approach extends to give a strongly polynomial halfspace learner in the presence of random classification noise and, more generally, Massart noise.
translated by 谷歌翻译
我们研究了学习单个神经元的基本问题,即$ \ mathbf {x} \ mapsto \ sigma(\ mathbf {w} \ cdot \ cdot \ mathbf {x})$单调激活$ \ sigma $ \ sigma: \ mathbb {r} \ mapsto \ mathbb {r} $,相对于$ l_2^2 $ -loss,在存在对抗标签噪声的情况下。具体来说,我们将在$(\ mathbf {x},y)\ in \ mathbb {r}^d \ times \ times \ mathbb {r} $上给我们从$(\ mathbf {x},y)\ on a发行$ d $中给我们标记的示例。 }^\ ast \ in \ mathbb {r}^d $ achieving $ f(\ mathbf {w}^\ ast)= \ epsilon $,其中$ f(\ mathbf {w})= \ m马理bf {e} (\ mathbf {x},y)\ sim d} [(\ sigma(\ mathbf {w} \ cdot \ mathbf {x}) - y)^2] $。学习者的目标是输出假设向量$ \ mathbf {w} $,以使$ f(\ m athbb {w})= c \,\ epsilon $具有高概率,其中$ c> 1 $是通用常数。作为我们的主要贡献,我们为广泛的分布(包括对数 - 循环分布)和激活功能提供有效的恒定因素近似学习者。具体地说,对于各向同性对数凸出分布的类别,我们获得以下重要的推论:对于逻辑激活,我们获得了第一个多项式时间常数因子近似(即使在高斯分布下)。我们的算法具有样品复杂性$ \ widetilde {o}(d/\ epsilon)$,这在多毛体因子中很紧。对于relu激活,我们给出了一个有效的算法,带有样品复杂性$ \ tilde {o}(d \,\ polylog(1/\ epsilon))$。在我们工作之前,最著名的常数因子近似学习者具有样本复杂性$ \ tilde {\ omega}(d/\ epsilon)$。在这两个设置中,我们的算法很简单,在(正规)$ L_2^2 $ -LOSS上执行梯度散发。我们的算法的正确性取决于我们确定的新结构结果,表明(本质上是基本上)基础非凸损失的固定点大约是最佳的。
translated by 谷歌翻译
Mazumdar和Saha \ Cite {MS17A}的开创性论文引入了有关聚类的广泛工作,并带有嘈杂的查询。然而,尽管在问题上取得了重大进展,但所提出的方法至关重要地取决于了解基础全随随随之而来的甲骨文错误的确切概率。在这项工作中,我们开发了可靠的学习方法,这些方法可以忍受一般的半随机噪声,从而在定性上获得与全随机模型中最佳方法相同的保证。更具体地说,给定一组$ n $点带有未知的基础分区,我们可以查询点$ u,v $检查它们是否在同一群集中,但是有了概率$ p $,答案可能可以受到对抗的选择。我们在理论上显示信息$ o \ left(\ frac {nk \ log n} {(1-2p)^2} \ right)$查询足以学习任何足够大尺寸的群集。我们的主要结果是一种计算高效算法,可以用$ o \ left(\ frac {nk \ log n} {(1-2p)^2} \ right) + \ text {poly} \ left(\ log(\ log) n,k,\ frac {1} {1-2p} \ right)$查询,与完全随机模型中最知名算法的保证相匹配。作为我们方法的推论,我们为全随机模型开发了第一个无参数算法,并通过\ cite {ms17a}回答一个空的问题。
translated by 谷歌翻译
我们在盒子值上相关的分布中重新审视经典的潘多拉盒(PB)问题。 ARXIV的最新工作:1911.01632获得了限制性类别的策略量持续近似算法,该策略以固定顺序访问框。在这项工作中,我们研究了近似最佳策略的复杂性,该策略可以根据迄今为止所看到的值适应下一步访问哪个框。我们的主要结果确定了PB的近似值等效性与研究良好的统一决策树(UDT)问题,从随机优化和Min-Sum Set封面的变体($ \ MATHCAL {MSSC} _F $)问题。对于支持$ M $的分布,UDT承认$ \ log M $近似值,而多项式时间的恒定因子近似是一个长期的开放问题,但在次指数时间内可以实现恒定的因子近似值(ARXIV:1906.11385)。我们的主要结果意味着PB和$ \ MATHCAL {MSSC} _F $具有相同的属性。我们还研究了一个案例,使价值分布更简洁地作为$ m $产品分布的混合物。这个问题再次与最佳决策树的嘈杂变体有关,该变体更具挑战性。我们给出一个恒定的因子近似值,该近似时间$ n^{\ tilde o(m^2/\ varepsilon^2)} $当每个盒子上的混合组件在电视距离中相同或通过$ \ varepsilon $在电视距离中相同或分开。
translated by 谷歌翻译
我们在高斯分布下使用Massart噪声与Massart噪声进行PAC学习半个空间的问题。在Massart模型中,允许对手将每个点$ \ mathbf {x} $的标签与未知概率$ \ eta(\ mathbf {x})\ leq \ eta $,用于某些参数$ \ eta \ [0,1 / 2] $。目标是找到一个假设$ \ mathrm {opt} + \ epsilon $的错误分类错误,其中$ \ mathrm {opt} $是目标半空间的错误。此前已经在两个假设下研究了这个问题:(i)目标半空间是同质的(即,分离超平面通过原点),并且(ii)参数$ \ eta $严格小于$ 1/2 $。在此工作之前,当除去这些假设中的任何一个时,不知道非增长的界限。我们研究了一般问题并建立以下内容:对于$ \ eta <1/2 $,我们为一般半个空间提供了一个学习算法,采用样本和计算复杂度$ d ^ {o_ {\ eta}(\ log(1 / \ gamma) )))}} \ mathrm {poly}(1 / \ epsilon)$,其中$ \ gamma = \ max \ {\ epsilon,\ min \ {\ mathbf {pr} [f(\ mathbf {x})= 1], \ mathbf {pr} [f(\ mathbf {x})= -1] \} \} $是目标半空间$ f $的偏差。现有的高效算法只能处理$ \ gamma = 1/2 $的特殊情况。有趣的是,我们建立了$ d ^ {\ oomega(\ log(\ log(\ log(\ log))}}的质量匹配的下限,而是任何统计查询(SQ)算法的复杂性。对于$ \ eta = 1/2 $,我们为一般半空间提供了一个学习算法,具有样本和计算复杂度$ o_ \ epsilon(1)d ^ {o(\ log(1 / epsilon))} $。即使对于均匀半空间的子类,这个结果也是新的;均匀Massart半个空间的现有算法为$ \ eta = 1/2 $提供可持续的保证。我们与D ^ {\ omega(\ log(\ log(\ log(\ log(\ epsilon))} $的近似匹配的sq下限补充了我们的上限,这甚至可以为同类半空间的特殊情况而保持。
translated by 谷歌翻译
Given a large graph with few node labels, how can we (a) identify the mixed network-effect of the graph and (b) predict the unknown labels accurately and efficiently? This work proposes Network Effect Analysis (NEA) and UltraProp, which are based on two insights: (a) the network-effect (NE) insight: a graph can exhibit not only one of homophily and heterophily, but also both or none in a label-wise manner, and (b) the neighbor-differentiation (ND) insight: neighbors have different degrees of influence on the target node based on the strength of connections. NEA provides a statistical test to check whether a graph exhibits network-effect or not, and surprisingly discovers the absence of NE in many real-world graphs known to have heterophily. UltraProp solves the node classification problem with notable advantages: (a) Accurate, thanks to the network-effect (NE) and neighbor-differentiation (ND) insights; (b) Explainable, precisely estimating the compatibility matrix; (c) Scalable, being linear with the input size and handling graphs with millions of nodes; and (d) Principled, with closed-form formula and theoretical guarantee. Applied on eight real-world graph datasets, UltraProp outperforms top competitors in terms of accuracy and run time, requiring only stock CPU servers. On a large real-world graph with 1.6M nodes and 22.3M edges, UltraProp achieves more than 9 times speedup (12 minutes vs. 2 hours) compared to most competitors.
translated by 谷歌翻译
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-Supervised learning methods, which learn from automatically generated labels has shown great success on natural images, offer an attractive alternative also to microscopy images. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks - such as distinguishing treatments and mode of action.
translated by 谷歌翻译
While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well-known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, and lead to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely-used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and the online (temporal-difference) Actor-Critic algorithm. Analytical results showcase that the use of exponential criteria generalize commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
translated by 谷歌翻译
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure in the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity, while introducing hierarchical feature extraction properties similar to certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for clustering, classification, and regression problems.
translated by 谷歌翻译
Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus it has to rely only on the garment's visual characteristics, (2) the new garment lacks historical data from which to extrapolate their future popularity and (3) fashion trends in general are highly dynamical. To this end, we develop a computer vision pipeline fine tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual features and categorical features while an autoregressive neural network is modelling the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state-of-the-art in key benchmark datasets, DeepFashion for image classification and VISUELLE for new garment sales forecasting.
translated by 谷歌翻译