Smart Paper Notes

Flexible and Hierarchical Prior for Bayesian Nonnegative Matrix Factorization

Jun Lu , Xuanyu Ye

Categories: Machine Learning | Machine Learning (Statistics)

2022-05-23

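In this paper, we introduce a probabilistic model for learning nonnegative matrix factorization (NMF), which is commonly used to predict missing values and find hidden patterns in data, and in which the matrix factors are latent variables associated with each data dimension. The nonnegativity constraint on the latent factors is handled by choosing priors with support on the nonnegative subspace. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on several real-world datasets of different sizes and dimensions, including MovieLens 100K and MovieLens 1M, and show that the proposed Bayesian NMF GRRN model leads to better predictions and avoids overfitting compared with existing Bayesian NMF approaches.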
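To make the role of a nonnegative-support prior in a Gibbs sampler concrete, the following is a sketch of one conditional update under simplifying assumptions that are not taken from the abstract: a Gaussian likelihood with known variance $\sigma^2$, a factorization $A \approx WZ$, and an exponential prior with rate $\lambda$ on each entry of $W$. The GRRN prior named above is a different choice; the symbols $\mu_{mk}$, $\tau_{mk}^2$, and $r_{mn}^{(k)}$ are introduced here purely for illustration.

```latex
% Assumed illustrative model: A approx W Z,
% a_{mn} ~ N(w_m^T z_n, sigma^2), w_{mk} ~ Exponential(lambda).
\begin{aligned}
p(w_{mk} \mid \cdot) &\propto \mathcal{N}\!\left(w_{mk} \mid \mu_{mk}, \tau_{mk}^2\right)\,\mathbb{1}(w_{mk} \ge 0), \\
\tau_{mk}^2 &= \frac{\sigma^2}{\sum_{n} z_{kn}^2}, \qquad
\mu_{mk} = \frac{\sum_{n} z_{kn}\, r_{mn}^{(k)} - \lambda\sigma^2}{\sum_{n} z_{kn}^2}, \\
r_{mn}^{(k)} &= a_{mn} - \sum_{l \ne k} w_{ml}\, z_{ln}.
\end{aligned}
```

Under these assumptions each entry is drawn from a normal distribution truncated to $[0, \infty)$, so nonnegativity is enforced by the prior's support rather than by a projection step.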


Robust Bayesian Nonnegative Matrix Factorization with Implicit Regularizers

Jun Lu , Christine P. Chai

Categories: Machine Learning

2022-08-22

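We introduce a probabilistic model with implicit norm regularization for learning nonnegative matrix factorization (NMF), which is commonly used to predict missing values and find hidden patterns in data, and in which the matrix factors are latent variables associated with each data dimension. The nonnegativity constraint on the latent factors is handled by choosing priors with nonnegative support, such as the exponential density or distributions based on the exponential function. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on several real-world datasets of different sizes and dimensions, including Genomics of Drug Sensitivity in Cancer (GDSC $IC_{50}$) and gene body methylation, and show that the proposed Bayesian NMF GL$_2^2$ and GL$_\infty$ models give robust predictions for different data values and avoid overfitting compared with competitive Bayesian NMF approaches.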


Comparative Study of Inference Methods for Interpolative Decomposition

Jun Lu

Categories: Machine Learning

2022-06-29

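In this paper, we propose a probabilistic model with automatic relevance determination (ARD) for learning interpolative decomposition (ID), which is commonly used for low-rank approximation, feature selection, and identifying hidden patterns in data, where the matrix factors are latent variables associated with each data dimension. Prior densities with support on the specified subspace are used to address the constraint on the magnitude of the factored component of the observed matrix. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on a variety of real-world datasets and show that it leads to smaller reconstruction errors even compared with vanilla Bayesian ID algorithms whose fixed latent dimension is set to the matrix rank.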
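As a point of reference for what an interpolative decomposition computes, here is a minimal deterministic sketch: it picks k columns greedily and expresses every column of the matrix as a combination of them. This is a swapped-in classical construction, not the Bayesian Gibbs-sampling method of the abstract; the function name `interpolative_decomposition` is hypothetical, and the greedy rule does not enforce the magnitude constraint on the weights that the abstract mentions.

```python
import numpy as np

def interpolative_decomposition(A, k):
    """Greedy column-pivoted ID sketch: A ~= A[:, cols] @ W using k columns of A."""
    A = np.asarray(A, dtype=float)
    residual = A.copy()
    cols = []
    for _ in range(k):
        # Pick the column with the largest remaining norm (assumed nonzero).
        norms = np.linalg.norm(residual, axis=0)
        j = int(np.argmax(norms))
        cols.append(j)
        # Remove the chosen direction from every column (Gram-Schmidt deflation).
        q = residual[:, j] / norms[j]
        residual -= np.outer(q, q @ residual)
    # Least-squares weights that rebuild all columns from the selected ones.
    W, *_ = np.linalg.lstsq(A[:, cols], A, rcond=None)
    return cols, W

# Demo on an exactly rank-3 matrix, so three columns suffice.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
cols, W = interpolative_decomposition(A, k=3)
rel_err = np.linalg.norm(A - A[:, cols] @ W) / np.linalg.norm(A)
```

The selected columns play the role of the chosen features, and `W` restricted to those columns is the identity, which is the defining structure of an ID.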


Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies

Jun Lu , Joerg Osterrieder

Categories: Machine Learning

2022-09-29

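In this paper, we propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance, so that the decomposition finds a set of features that are representative of the entire feature set and that also have higher priority than the others. This approach is commonly used for low-rank approximation, feature selection, and extracting hidden patterns in data, where the matrix factors are latent variables associated with each data dimension. Gibbs sampling for Bayesian inference is applied to carry out the optimization. We evaluate the proposed model on real-world datasets, including ten Chinese A-share stocks, and demonstrate that the proposed Bayesian ID algorithm with intervention (IID) produces reconstruction errors comparable to those of existing Bayesian ID algorithms while selecting features with higher scores or priority.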


Probabilistic matrix factorization

Categories:

Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix's own system.
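A common way to read the basic PMF model is that, with Gaussian noise and zero-mean Gaussian priors on the factors, MAP estimation reduces to minimizing squared error on the observed ratings plus L2 penalties. The sketch below illustrates that reading with plain gradient descent on synthetic data; the function name `pmf_map`, the hyperparameter values, and the data are assumptions for illustration and do not reproduce the paper's adaptive-prior or constrained variants.

```python
import numpy as np

def pmf_map(R, mask, rank=2, lam=0.1, lr=0.005, steps=3000, seed=0):
    """MAP-style PMF sketch: minimize
    0.5 * sum(mask * (R - U V^T)^2) + 0.5 * lam * (|U|^2 + |V|^2)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(steps):
        E = mask * (R - U @ V.T)        # residuals on observed entries only
        grad_U = -E @ V + lam * U       # gradient of the objective w.r.t. U
        grad_V = -E.T @ U + lam * V     # gradient of the objective w.r.t. V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Synthetic rank-2 "ratings" with roughly 70% of entries observed.
rng = np.random.default_rng(1)
R_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = (rng.random(R_true.shape) < 0.7).astype(float)
U, V = pmf_map(R_true, mask)
rmse_fit = np.sqrt(np.sum(mask * (R_true - U @ V.T) ** 2) / mask.sum())
rmse_zero = np.sqrt(np.sum(mask * R_true ** 2) / mask.sum())
```

Each pass over the data costs time proportional to the number of observed entries when implemented with sparse storage, which is the linear scaling the abstract refers to; the dense mask here is only for brevity.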


Bayesian Low-rank Matrix Completion with Dual-graph Embedding: Prior Analysis and Tuning-free Inference

Yangge Chen , Lei Cheng , Yik-Chung Wu

Categories: Machine Learning

2022-03-18

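Recently, there has been a revival of interest in unsupervised learning based on low-rank matrix completion through the lens of dual-graph regularization, which has significantly improved the performance of multidisciplinary machine learning tasks such as recommendation systems, genotype imputation, and image inpainting. While dual-graph regularization accounts for a major part of this success, it usually involves computationally costly hyperparameter tuning. To circumvent this drawback and improve completion performance, we propose a novel Bayesian learning algorithm that automatically learns the hyperparameters associated with dual-graph regularization while guaranteeing the low-rankness of the completed matrix. Notably, a novel prior is devised to promote the low-rankness of the matrix and encode the dual-graph information simultaneously, which is more challenging than in the single-graph counterpart. A nontrivial conditional conjugacy between the proposed prior and the likelihood function is then explored so that an efficient algorithm can be derived under the variational inference framework. Extensive experiments on synthetic and real-world datasets demonstrate the state-of-the-art performance of the proposed learning algorithm on various data analysis tasks.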


Accelerated structured matrix factorization

Lorenzo Schiavon , Bernardo Nipoti , Antonio Canale

Categories: Machine Learning (Statistics)

2022-12-13

Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures. These lower-dimensional objects provide useful insight, with interpretability favored by sparse structures. Sparsity, in addition, is beneficial in terms of regularization and thus helps to avoid over-fitting. By exploiting Bayesian shrinkage priors, we devise a computationally convenient approach for high-dimensional matrix factorization. The dependence between row and column entities is modeled by inducing flexible sparse patterns within factors. The availability of external information is accounted for in such a way that structures are allowed but not imposed. Inspired by boosting algorithms, we pair the proposed approach with a numerical strategy relying on a sequential inclusion and estimation of low-rank contributions, with a data-driven stopping rule. Practical advantages of the proposed approach are demonstrated by means of a simulation study and the analysis of soccer heatmaps obtained from new-generation tracking data.


Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data

Sarah Samorodnitsky , Chris H. Wendt , Eric F. Lock

Categories: Machine Learning (Statistics)

2022-11-29

Understanding of the pathophysiology of obstructive lung disease (OLD) is limited by available methods to examine the relationship between multi-omic molecular phenomena and clinical outcomes. Integrative factorization methods for multi-omic data can reveal latent patterns of variation describing important biological signal. However, most methods do not provide a framework for inference on the estimated factorization, simultaneously predict important disease phenotypes or clinical outcomes, nor accommodate multiple imputation. To address these gaps, we propose Bayesian Simultaneous Factorization (BSF). We use conjugate normal priors and show that the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. We then extend BSF to simultaneously predict a continuous or binary response, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation and full posterior inference for missing data, including "blockwise" missingness, and BSFP offers prediction of unobserved outcomes. We show via simulation that BSFP is competitive in recovering latent variation structure, as well as the importance of propagating uncertainty from the estimated factorization to prediction. We also study the imputation performance of BSF via simulation under missing-at-random and missing-not-at-random assumptions. Lastly, we use BSFP to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated OLD. Our analysis reveals a distinct cluster of patients with OLD driven by shared metabolomic and proteomic expression patterns, as well as multi-omic patterns related to lung function decline. Software is freely available at https://github.com/sarahsamorodnitsky/BSFP .


A Bayesian Framework on Asymmetric Mixture of Factor Analyser

Hamid Reza Safaeyan , Karim Zare , Mohamad R. Mahmoudi , Amir Mosavi

Categories: Machine Learning

2022-11-01

The mixture of factor analyzers (MFA) model is an efficient model for the analysis of high-dimensional data, in which the factor-analyzer technique applied to the covariance matrices reduces the number of free parameters. The model also provides an important methodology for determining latent groups in data. Several studies have extended the model to asymmetric data and/or data with outliers, with some known computational limitations that have been examined in the frequentist setting. In this paper, an MFA model with a rich and flexible class of skew normal (unrestricted) generalized hyperbolic (called SUNGH) distributions, along with a Bayesian structure with several computational benefits, is introduced. The SUNGH family provides considerable flexibility to model skewness in different directions while also allowing for heavy-tailed data. The structure of the SUNGH family has several desirable properties, including an analytically flexible density that eases the computation required for parameter estimation. For factor analysis models, the SUNGH family also allows for skewness and heavy tails in both the error component and the factor scores. In the present study, the advantages of using this family of distributions are discussed, and the efficiency of the introduced MFA model is demonstrated using real data examples and simulation.


Sparse Horseshoe Estimation via Expectation-Maximisation

Shu Yu Tew , Daniel F. Schmidt , Enes Makalic

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-07

The horseshoe prior is known to possess many desirable properties for Bayesian estimation of sparse parameter vectors, yet its density function lacks an analytic form. As such, it is challenging to find a closed-form solution for the posterior mode. Conventional horseshoe estimators use the posterior mean to estimate the parameters, but these estimates are not sparse. We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters in the case of the standard linear model. A particular strength of our approach is that the M-step depends only on the form of the prior and is independent of the form of the likelihood. We introduce several simple modifications of this EM procedure that allow for straightforward extension to generalised linear models. In experiments performed on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods in terms of statistical performance and computational cost.


Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data

Mengying Lei , Aurelie Labbe , Lijun Sun

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-21

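Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. However, real-world spatiotemporal data often exhibit complex dependencies that are nonstationary, i.e., the correlation structure varies with location/time, and nonseparable, i.e., dependencies exist between space and time. Developing effective and computationally efficient statistical models that accommodate nonstationary/nonseparable processes containing both long-range and short-scale variations is a challenging task, especially for large-scale datasets with various corruption/missing structures. In this paper, we propose a new statistical framework, Bayesian Complementary Kernelized Learning (BCKL), to achieve scalable probabilistic modeling of multidimensional spatiotemporal data. To describe complex dependencies effectively, BCKL integrates kernelized low-rank factorization with short-range spatiotemporal Gaussian processes (GP), and the two components complement each other. Specifically, we use a multilinear low-rank factorization component to capture the global/long-range correlations in the data, and we introduce an additive short-scale GP based on compactly supported kernel functions to characterize the remaining local variability. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference and evaluate the proposed BCKL framework on both synthetic and real-world spatiotemporal datasets. Our results confirm the superior performance of BCKL in providing accurate posterior means and high-quality uncertainty estimates.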


A similarity-based Bayesian mixture-of-experts model

Tianfang Zhang , Rasmus Bokrantz , Jimmy Olsson

Categories: Machine Learning (Statistics) | Machine Learning

2020-12-03

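We present a new nonparametric mixture-of-experts model for multivariate regression problems, inspired by the probabilistic k-nearest neighbors algorithm. Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point, yielding predictive distributions represented by Gaussian mixtures. Posterior inference on the parameters of the mixture components and of the distance metric is performed using a mean-field variational Bayes algorithm together with a stochastic gradient-based optimization procedure. The method is especially advantageous in settings where the inputs are of relatively high dimension compared with the data size, where input-output relationships are complex, and where predictive distributions may be skewed or multimodal. Computational studies on five datasets, two of which are synthetically generated, illustrate clear advantages of our mixture-of-experts method for high-dimensional inputs, which outperforms competitor models both in terms of validation metrics and visual inspection.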


Unitary Approximate Message Passing for Matrix Factorization

Zhengdao Yuan , Qinghua Guo , Yonina C. Eldar , Yonghui Li

Categories: Machine Learning

2022-07-31

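We consider matrix factorization (MF) with certain constraints, which finds wide application in various areas. Leveraging variational inference (VI) and unitary approximate message passing (UAMP), we develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF. With suitable priors imposed on the factor matrices, UAMPMF can be used to solve many problems that can be formulated as MF, such as nonnegative matrix factorization, dictionary learning, compressive sensing with matrix uncertainty, robust principal component analysis, and sparse matrix factorization. Extensive numerical examples are provided to show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness, and computational complexity.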


BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery

Chris Cundy , Aditya Grover , Stefano Ermon

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2021-12-06

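A structural equation model (SEM) is an effective framework for reasoning about causal relationships represented by a directed acyclic graph (DAG). Recent advances have enabled maximum-likelihood point estimation of DAGs from observational data. However, in practical scenarios where the true DAG is non-identifiable and/or the observed dataset is limited, a point estimate may not accurately capture the uncertainty in inferring the underlying graph. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging because of the discrete and combinatorial nature of graphs. We analyze key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics, such as structural Hamming distance, in the low-data regime.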


Variational Factorization Machines for Preference Elicitation in Large-Scale Recommender Systems

Jill-Jênn Vie , Tomas Rigaux , Hisashi Kashima

Categories: Machine Learning | Artificial Intelligence

2022-12-20

Factorization machines (FMs) are a powerful tool for regression and classification in the context of sparse observations, that has been successfully applied to collaborative filtering, especially when side information over users or items is available. Bayesian formulations of FMs have been proposed to provide confidence intervals over the predictions made by the model, however they usually involve Markov-chain Monte Carlo methods that require many samples to provide accurate predictions, resulting in slow training in the context of large-scale data. In this paper, we propose a variational formulation of factorization machines that allows us to derive a simple objective that can be easily optimized using standard mini-batch stochastic gradient descent, making it amenable to large-scale data. Our algorithm learns an approximate posterior distribution over the user and item parameters, which leads to confidence intervals over the predictions. We show, using several datasets, that it has comparable or better performance than existing methods in terms of prediction accuracy, and provide some applications in active learning strategies, e.g., preference elicitation techniques.
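For readers unfamiliar with factorization machines, the sketch below shows the standard second-order FM prediction rule and the well-known identity that evaluates the pairwise term in time linear in the number of features. It covers only the deterministic prediction function, not the variational training procedure proposed in the abstract; the function names and the random parameters are assumptions for illustration.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM prediction using the linear-time pairwise identity.

    x: (n,) features, w0: bias, w: (n,) linear weights, V: (n, k) factor matrix.
    """
    linear = w0 + w @ x
    xv = V.T @ x                                   # (k,) sum_i v_if * x_i
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model, summing <v_i, v_j> x_i x_j over all pairs i < j."""
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

rng = np.random.default_rng(0)
n, k = 7, 3
x = rng.standard_normal(n)
w0, w, V = 0.5, rng.standard_normal(n), rng.standard_normal((n, k))
fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
```

The two functions agree up to floating-point error; the fast form is what makes FMs practical on sparse, high-dimensional inputs.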


Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Yuan Jin , He Zhao , Ming Liu , Ye Zhu , Lan Du , Longxiang Gao , He Zhang , Yunfeng Li

Categories: Machine Learning | Machine Learning (Statistics)

2020-02-21

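Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings and are more robust to sparse ratings. However, the Bayesian methods are restricted by the update rules for their posterior parameters, which stem from the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers mappings based on the explicit data information, while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBOs) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative, with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters of the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via matrix factorization over the currently re-sampled user and item embeddings.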


Variational Inference: A Review for Statisticians

David M. Blei , Alp Kucukelbir , Jon D. McAuliffe

Categories:

2016-01-04

One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
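The statement that closeness is measured by Kullback-Leibler divergence can be made concrete with the standard decomposition of the log evidence, written here for observations $x$, latent variables $z$, and a variational density $q(z)$ (the notation is chosen for this note):

```latex
\begin{aligned}
\log p(x) &= \mathrm{ELBO}(q) + \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right), \\
\mathrm{ELBO}(q) &= \mathbb{E}_{q}\!\left[\log p(x, z)\right] - \mathbb{E}_{q}\!\left[\log q(z)\right].
\end{aligned}
```

Because $\log p(x)$ does not depend on $q$, maximizing the ELBO over the chosen family is equivalent to minimizing the KL divergence to the posterior, and it avoids evaluating the intractable posterior directly.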


Black Box Variational Inference

Rajesh Ranganath , Sean Gerrish , David M. Blei

Categories:

2013-12-31

Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
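The noisy gradient computed from Monte Carlo samples of the variational distribution can be illustrated with the score-function estimator on a toy model. Everything below is an assumption made for the example: the model z ~ N(0, 1), x | z ~ N(z, 1), the Gaussian variational family q(z) = N(mu, sigma^2), and the function names. No variance reduction is applied, so a large sample size is used.

```python
import numpy as np

def log_joint(z, x):
    # log p(x, z) for the toy model z ~ N(0, 1), x | z ~ N(z, 1)
    return -0.5 * z ** 2 - 0.5 * (x - z) ** 2 - np.log(2 * np.pi)

def log_q(z, mu, sigma):
    # log density of the variational family N(mu, sigma^2)
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def score_function_grad_mu(x, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of d ELBO / d mu:
    E_q[ d/dmu log q(z) * (log p(x, z) - log q(z)) ]."""
    z = rng.normal(mu, sigma, size=n_samples)
    score = (z - mu) / sigma ** 2                  # d/dmu log q(z)
    weight = log_joint(z, x) - log_q(z, mu, sigma)
    return np.mean(score * weight)

rng = np.random.default_rng(0)
x_obs, mu, sigma = 1.0, 0.0, 1.0
grad_est = score_function_grad_mu(x_obs, mu, sigma, n_samples=200_000, rng=rng)
grad_exact = x_obs - 2 * mu    # analytic gradient for this toy model
```

The estimator only needs samples from q and evaluations of the two log densities, which is what makes the approach model-agnostic; its variance is the reason the abstract emphasizes variance reduction.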


Fully Bayesian inference for latent variable Gaussian process models

Suraj Yerramilli , Akshay Iyer , Wei Chen , Daniel W. Apley

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-04

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs) and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling, which have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.


Variational Bayesian inference for CP tensor completion with side information

Stanislav Budzinskiy , Nikolai Zamarashkin

Categories: Machine Learning

2022-06-24

We propose a message passing algorithm, based on variational Bayesian inference, for low-rank tensor completion with automatic rank determination in the canonical polyadic format when additional side information (SI) is given. The SI comes in the form of low-dimensional subspaces that contain the fiber spans of the tensor (columns, rows, tubes, etc.). We validate the regularization properties induced by SI with extensive numerical experiments on synthetic and real-world data and present results on tensor recovery and rank determination. The results show that the number of samples required for successful completion is significantly reduced in the presence of SI. We also discuss the origin of a bump in the phase transition curves that exists when the dimensionality of SI is comparable with that of the tensor.
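To picture what side information in the form of a subspace means for a tensor in canonical polyadic (CP) format, the sketch below builds a small rank-2 CP tensor whose mode-1 factor is constrained to a known 3-dimensional subspace and checks that every mode-1 fiber lies in that subspace. The sizes, the random factors, and the variable names are assumptions for illustration; the completion algorithm itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, rank, d = 6, 5, 4, 2, 3

# Side information: an orthonormal basis S1 of a d-dimensional subspace of R^{n1}.
S1 = np.linalg.qr(rng.standard_normal((n1, d)))[0]

# CP factors; the mode-1 factor is forced to live in span(S1).
A = S1 @ rng.standard_normal((d, rank))
B = rng.standard_normal((n2, rank))
C = rng.standard_normal((n3, rank))

# CP format: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Mode-1 unfolding: each column is a mode-1 fiber X[:, j, k].
X1 = X.reshape(n1, n2 * n3)
proj_residual = np.linalg.norm(X1 - S1 @ (S1.T @ X1))
unfold_rank = np.linalg.matrix_rank(X1)
```

Since the fibers are confined to a d-dimensional subspace, the unknowns in that mode shrink from n1 to d per component, which is one intuitive reason fewer samples may be needed for completion.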
