Smart Paper Notes

Fast algorithm for overcomplete order-3 tensor decomposition

Jingqiu Ding , Tommaso d'Orsi , Chih-Hung Liu , Stefan Tiegel , David Steurer

Category: Machine Learning

2022-02-14

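We develop the first fast spectral algorithm to decompose a random third-order tensor over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\text{polylog}(d))$. Our algorithm involves only simple linear-algebra operations and, under the current matrix multiplication time, can recover all components in time $O(d^{6.05})$. Prior to this work, comparable guarantees could only be achieved via sum-of-squares [Ma, Shi, Steurer 2016]. In contrast, fast algorithms [Hopkins, Schramm, Shi, Steurer 2016] could only decompose tensors of rank at most $O(d^{4/3}/\text{polylog}(d))$. Our algorithmic result rests on two key ingredients: a clean lifting of the third-order tensor to a sixth-order tensor, which can be expressed in the language of tensor networks; and a careful decomposition of the tensor network into a sequence of rectangular matrix multiplications, which allows for a fast implementation of the algorithm.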

translated by Google Translate
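
The abstract's second ingredient rewrites tensor-network contractions as rectangular matrix multiplications. The sketch below is a generic illustration of that idea only, not the authors' algorithm: it builds a random order-3 tensor $T = \sum_r a_r \otimes a_r \otimes a_r$ with more components than dimensions, and checks that the contraction $T(x, y, \cdot)$ equals a $d \times d^2$ flattening of $T$ times the vector $x \otimes y$. The sizes and all helper names are hypothetical.

```python
import random

def random_unit_vector(d, rng):
    """Gaussian vector in R^d scaled to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(t * t for t in v) ** 0.5
    return [t / norm for t in v]

def build_tensor(components):
    """T[i][j][k] = sum_r a_r[i] * a_r[j] * a_r[k] (a symmetric order-3 tensor)."""
    d = len(components[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for a in components:
        for i in range(d):
            for j in range(d):
                aij = a[i] * a[j]
                for k in range(d):
                    T[i][j][k] += aij * a[k]
    return T

def flatten_12_3(T):
    """Rectangular d x d^2 flattening: M[k][i*d + j] = T[i][j][k]."""
    d = len(T)
    return [[T[i][j][k] for i in range(d) for j in range(d)] for k in range(d)]

def kron(x, y):
    """x (tensor) y as a flat vector indexed by i*d + j."""
    return [xi * yj for xi in x for yj in y]

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def contract_direct(components, x, y):
    """T(x, y, .) = sum_r <a_r, x> <a_r, y> a_r, computed from the components."""
    out = [0.0] * len(x)
    for a in components:
        c = sum(p * q for p, q in zip(a, x)) * sum(p * q for p, q in zip(a, y))
        for k, ak in enumerate(a):
            out[k] += c * ak
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 6, 10  # n > d components: a tiny "overcomplete" instance
    comps = [random_unit_vector(d, rng) for _ in range(n)]
    T = build_tensor(comps)
    x, y = random_unit_vector(d, rng), random_unit_vector(d, rng)
    via_matmul = matvec(flatten_12_3(T), kron(x, y))
    direct = contract_direct(comps, x, y)
    # The two results agree up to floating-point rounding.
    print(max(abs(u - v) for u, v in zip(via_matmul, direct)))
```

Once a contraction is phrased as a product of flattenings, it can be handed to optimized (rectangular) matrix multiplication routines; this toy version uses plain Python loops only to stay self-contained.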

Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures

Rares-Darius Buhai , David Steurer

Category: Machine Learning | Machine Learning (Statistics)

2021-12-10

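We consider mixtures of $k \geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well separated, i.e., distinct components have statistical overlap at most $k^{-c}$ for a large enough constant $c \ge 1$. Previous statistical-query lower bounds [DKS17] give formal evidence that even distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in $k$). We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small, and that for polynomially lower-bounded mixing weights, non-trivial algorithmic guarantees are possible in quasi-polynomial time. Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of $k \ge 2$ well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates a pair of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component. For the special case of colinear means, our algorithm outputs a $k$-clustering of the input sample that is approximately consistent with the components of the mixture. A significant challenge for our results is that, unlike most previous results for Gaussian mixtures, they appear to be inherently sensitive to adversarial outliers. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower-bounded mixing weights. A key technical ingredient is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders, logarithmic in the minimum mixing weight.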
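
To make the notion of a certificate concrete, the sketch below checks whether a given bipartition separates a pair of components, reading "most of the sample points" as "more than half". It assumes ground-truth component labels are available (as in a simulation) and that a separating direction is already known (here, the real line with a threshold at 0); finding such a direction is the job of the paper's sum-of-squares algorithm, which is not reproduced here. All names and parameters are hypothetical.

```python
import random
from collections import Counter

def sample_mixture_1d(n, means, rng, sd=1.0):
    """Draw n points from an equal-weight 1-D Gaussian mixture; returns (labels, xs)."""
    labels, xs = [], []
    for _ in range(n):
        c = rng.randrange(len(means))
        labels.append(c)
        xs.append(rng.gauss(means[c], sd))
    return labels, xs

def is_separating_bipartition(labels, side, majority=0.5):
    """True if each side (0 or 1) holds more than a `majority` fraction
    of the points of at least one mixture component."""
    totals = Counter(labels)
    per_side = {0: Counter(), 1: Counter()}
    for comp, s in zip(labels, side):
        per_side[s][comp] += 1

    def side_ok(s):
        return any(per_side[s][c] > majority * totals[c] for c in totals)

    return side_ok(0) and side_ok(1)

if __name__ == "__main__":
    labels, xs = sample_mixture_1d(1000, [-3.0, 3.0], random.Random(0))
    side = [0 if x < 0 else 1 for x in xs]  # threshold along the assumed direction
    print(is_separating_bipartition(labels, side))            # True
    print(is_separating_bipartition(labels, [0] * len(xs)))   # False: side 1 is empty
```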

Robust recovery for stochastic block models

Jingqiu Ding , Tommaso d'Orsi , Rajai Nasser , David Steurer

Category: Machine Learning | Machine Learning (Statistics)

2021-11-16

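We develop an efficient algorithm for weak recovery in a robust version of the stochastic block model. The algorithm matches the statistical guarantees of the best known algorithms for the vanilla version of the stochastic block model. In this sense, our results show that there is no price of robustness in the stochastic block model. Our work is heavily inspired by the recent work of Banks, Mohanty, and Raghavendra (SODA 2021), which provided an efficient algorithm for the corresponding distinguishing problem. Our algorithm and its analysis depart significantly from previous ones for robust recovery. A key challenge is the peculiar optimization landscape underlying our algorithm: the planted partition may be far from optimal, in the sense that completely unrelated solutions could achieve the same objective value. This phenomenon is related to the push-out effect at the BBP phase transition for PCA. To the best of our knowledge, our algorithm is the first to achieve robust recovery in the presence of such a push-out effect in a non-asymptotic setting. Our algorithm is an instantiation of a framework based on convex optimization (related to but distinct from sum-of-squares), which may be useful for other robust matrix estimation problems. A by-product of our analysis is a general technique that boosts the probability of success (over the randomness of the input) of an arbitrary robust weak-recovery algorithm from constant (or slowly vanishing) probability to exponentially high probability.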
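
Weak recovery is usually measured by the overlap between the planted labels $\sigma \in \{\pm 1\}^n$ and an estimate $\hat\sigma$, taken up to a global sign flip; it succeeds if the overlap stays bounded away from zero. The sketch below is a minimal illustration assuming the symmetric two-community model with edge probability $a/n$ within and $b/n$ across communities: it samples such a graph and computes the overlap. It does not implement the paper's robust algorithm, and the function names are hypothetical.

```python
import random

def sample_sbm(n, a, b, rng):
    """Symmetric two-community SBM: edge prob a/n within a community, b/n across."""
    sigma = [1] * (n // 2) + [-1] * (n - n // 2)
    rng.shuffle(sigma)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = a / n if sigma[i] == sigma[j] else b / n
            if rng.random() < p:
                edges.add((i, j))
    return sigma, edges

def overlap(sigma, sigma_hat):
    """|(1/n) * sum_i sigma_i * sigma_hat_i|: agreement up to a global sign flip."""
    return abs(sum(s * t for s, t in zip(sigma, sigma_hat))) / len(sigma)

if __name__ == "__main__":
    rng = random.Random(0)
    sigma, edges = sample_sbm(200, 10.0, 2.0, rng)
    guess = [rng.choice([-1, 1]) for _ in sigma]  # a label-agnostic random guess
    print(len(edges), overlap(sigma, sigma), overlap(sigma, guess))
```

A uniformly random guess has overlap of order $1/\sqrt{n}$, so an estimator whose overlap stays above a fixed constant as $n$ grows achieves weak recovery.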

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi , Chih-Hung Liu , Rajai Nasser , Gleb Novikov , David Steurer , Stefan Tiegel

Category: Machine Learning | Machine Learning (Statistics)

2021-11-04

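We develop machinery to design efficiently computable and consistent estimators, whose estimation error approaches zero as the number of observations grows, when facing an oblivious adversary that may corrupt the responses of all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for the optimal sample size $n \gtrsim (k \log d)/\alpha^2$ and the optimal error rate $O(\sqrt{(k \log d)/(n \cdot \alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only very recently been shown, in the dense setting (i.e., general linear regression), by d'Orsi et al. [DNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend the approach of d'Orsi et al. [DNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.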
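
The estimator design pairs the Huber loss with a non-smooth regularizer. A minimal sketch, assuming a standard proximal-gradient (ISTA) solver for $\frac{1}{n}\sum_i h_\delta(y_i - \langle x_i, \beta\rangle) + \lambda \|\beta\|_1$, where $h_\delta$ is quadratic for residuals up to $\delta$ and linear beyond. The solver, $\delta$, $\lambda$, the step size, and the toy data (about half of the responses receive huge noise drawn independently of the design) are our hypothetical choices; this is not the authors' analysis and carries none of its guarantees.

```python
import random

def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    return 0.5 * r * r if abs(r) <= delta else delta * (abs(r) - 0.5 * delta)

def huber_psi(r, delta):
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinatewise."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def huber_l1_ista(X, y, lam, delta=1.0, step=0.5, iters=500):
    """Proximal gradient for (1/n) sum_i huber(y_i - <x_i, beta>) + lam * ||beta||_1."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - sum(a * b for a, b in zip(xi, beta))
            clipped = huber_psi(r, delta)
            for j in range(d):
                grad[j] -= clipped * xi[j] / n
        beta = soft_threshold([b - step * g for b, g in zip(beta, grad)], step * lam)
    return beta

def make_oblivious_data(n, beta_star, rng, outlier_frac=0.5):
    """y = <x, beta*> + noise; the noise ignores X (oblivious) and is huge on
    roughly an outlier_frac share of the samples."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta_star] for _ in range(n)]
    y = []
    for xi in X:
        clean = sum(a * b for a, b in zip(xi, beta_star))
        scale = 100.0 if rng.random() < outlier_frac else 0.1
        y.append(clean + rng.gauss(0.0, scale))
    return X, y

if __name__ == "__main__":
    X, y = make_oblivious_data(400, [2.0, 0.0, 0.0], random.Random(0))
    print(huber_l1_ista(X, y, lam=0.01))  # expected to land close to [2, 0, 0]
```

Because the Huber gradient clips each residual to $[-\delta, \delta]$, a grossly corrupted response can shift the fit only by a bounded amount, and the soft-thresholding step is what the non-smooth $\ell_1$ term contributes.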

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is threefold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experiment results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
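
The two-stage decision described in this abstract can be sketched as follows. This is a hypothetical simplification, not INVALIDATOR's implementation: invariants are opaque strings, "correct specifications" are taken to be invariants that hold on the developer-patched program but not on the buggy one, "error behaviors" are invariants that hold on the buggy program but not on the developer-patched one, and the syntactic model is any callable returning an overfitting probability.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class PatchEvidence:
    correct_invariants: Set[str]  # likely invariants on the developer-patched program
    buggy_invariants: Set[str]    # likely invariants on the original buggy program
    patch_invariants: Set[str]    # likely invariants on the APR-patched program
    patch_diff: str               # patch text, input to the syntactic model

def classify_patch(ev: PatchEvidence,
                   syntactic_model: Callable[[str], float],
                   threshold: float = 0.5) -> str:
    """Hypothetical two-stage check: invariant rules first, syntax model as fallback."""
    correct_spec = ev.correct_invariants - ev.buggy_invariants
    error_behavior = ev.buggy_invariants - ev.correct_invariants
    # (1) Violates a correct specification: some expected invariant is missing.
    if not correct_spec <= ev.patch_invariants:
        return "overfitting"
    # (2) Maintains an error behavior of the buggy program.
    if error_behavior & ev.patch_invariants:
        return "overfitting"
    # Invariants were inconclusive: defer to the learned syntactic model.
    return "overfitting" if syntactic_model(ev.patch_diff) >= threshold else "correct"

if __name__ == "__main__":
    ev = PatchEvidence({"x > 0"}, {"x >= 0"}, {"x > 0"}, "- if (x >= 0)\n+ if (x > 0)")
    print(classify_patch(ev, lambda diff: 0.1))  # correct
```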

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III , .

Category: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Posterior Collapse and Latent Variable Non-identifiability

Yixin Wang , David M. Blei , John P. Cunningham

Category: Machine Learning (Statistics) | Machine Learning

2023-01-02

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to their prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
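
A toy check of the direction "non-identifiable implies collapse", with exact Bayesian inference and no neural network, under the simplifying assumption that non-identifiability means the likelihood $p(x \mid z)$ does not change with $z$. The discrete prior and both likelihoods are hypothetical.

```python
def posterior(prior, likelihood, x):
    """Exact Bayes for a discrete latent: p(z | x) is proportional to p(z) * p(x | z)."""
    unnorm = [p * likelihood(x, z) for z, p in enumerate(prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def flat_likelihood(x, z):
    """Non-identifiable: p(x = 1 | z) = 0.7 for every z."""
    return 0.7 if x == 1 else 0.3

def identifiable_likelihood(x, z):
    """Identifiable: p(x = 1 | z) differs across z."""
    p1 = [0.1, 0.5, 0.9][z]
    return p1 if x == 1 else 1.0 - p1

if __name__ == "__main__":
    prior = [0.2, 0.3, 0.5]
    print(posterior(prior, flat_likelihood, 1))          # equals the prior (up to rounding)
    print(posterior(prior, identifiable_likelihood, 1))  # mass shifts toward z = 2
```

With the constant likelihood, Bayes' rule multiplies every prior entry by the same factor, so normalization returns the prior exactly; the identifiable likelihood moves posterior mass toward $z = 2$.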

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Category: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model achieved an overall accuracy of 80% and the CASTC model an overall accuracy of 77.9%. We found that the cashew area in Benin doubled between 2015 and 2021, with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

Morphology-based non-rigid registration of coronary computed tomography and intravascular images through virtual catheter path optimization

Karim Kadry , Abhishek Karmakar , Andreas Schuh , Kersten Peterson , Michiel Schaap , David Marlevi , Charles Taylor , Elazer Edelman , Farhad Nezami

Category: Computer Vision

2022-12-30

Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid motion-induced distortions arising from distortions in the imaging catheter path. To address these issues, we present a semi-automatic segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem in terms of finding the optimal virtual catheter path that samples the CCTA data to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of $n=40$ patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both longitudinal (mean mismatch: 3.3 frames) and rotational directions (mean mismatch: 28.6 degrees). By providing a differentiable framework for automatic multi-modal intravascular data fusion, our co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies while also providing a solid foundation for the development of machine learning-based co-registration approaches.

Controllable Mechanical-domain Energy Accumulators

Sung Y. Kim , David J. Braun

Category: Robotics

2022-12-29

Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock a force of up to 1000 N at any deflection, unlock the spring in less than 10 ms with a control force of less than 1% of the maximal spring force, and provide 80% energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.
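
Why can a small control force hold a large load? A common textbook model of a capstan is the capstan (Euler–Eytelwein) equation $T_{\text{load}} = T_{\text{hold}}\, e^{\mu\varphi}$, where $\mu$ is the friction coefficient and $\varphi$ the total wrap angle. The sketch below uses it to estimate the wrap angle needed for a 100:1 force ratio (1000 N held with 10 N, i.e., the 1% figure quoted in the abstract). The friction coefficient is a hypothetical value, and we do not claim this is the model used in the paper's design.

```python
import math

def capstan_hold_force(load_n, mu, wrap_rad):
    """Holding tension needed to resist load_n (capstan equation: load = hold * e^(mu * phi))."""
    return load_n / math.exp(mu * wrap_rad)

def wrap_angle_for_ratio(ratio, mu):
    """Total wrap angle in radians that gives load / hold == ratio."""
    return math.log(ratio) / mu

if __name__ == "__main__":
    mu = 0.3  # hypothetical friction coefficient
    phi = wrap_angle_for_ratio(100.0, mu)  # 1000 N held with 1% of it, i.e. 10 N
    print(f"{phi:.2f} rad = {phi / (2 * math.pi):.2f} turns")  # 15.35 rad = 2.44 turns
    print(f"{capstan_hold_force(1000.0, mu, phi):.1f} N")      # 10.0 N
```

Because the force ratio grows exponentially with wrap angle, a few turns suffice for a large amplification, which is consistent with the small clutch and low control force described above.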
