Smart Paper Notes

Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends Among Healthcare Facilities

Anudeep Appe , Bhanu Poluparthi , Lakshmi Kasivajjula , Udai Mv , Sobha Bagadi , Punya Modi , Aditya Singh , Hemanth Gunupudi

Categories: Machine Learning

2022-12-09

The necessity of data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework that helps identify the factors impacting the Market Share of a Healthcare Provider Facility or Hospital (hereafter termed Facility) is of key importance. This pilot study aims to develop a data-driven Machine Learning regression framework that aids strategists in formulating key decisions to improve the Facility's Market Share, which in turn helps improve the quality of healthcare services. The US (United States) healthcare business is chosen for the study, and data spanning 60 key Facilities in Washington State and about 3 years of history is considered. In the current analysis, Market Share is defined as the ratio of a Facility's encounters to the total encounters among the group of potential competitor Facilities. The study proposes a novel two-pronged approach: competitor identification, followed by regression to evaluate and predict Market Share. The proposed method for identifying the pool of competitors develops Directed Acyclic Graphs (DAGs) and feature-level word vectors, and evaluates the key connected components at the Facility level. This technique is robust because it is data-driven, which minimizes the bias of empirical techniques. After the set of competitors is identified, a regression model is developed to predict Market Share. To quantify the relative importance of features at the Facility level, SHAP, a model-agnostic explainer, is incorporated. This helps identify and rank the attributes that impact Market Share at each Facility.
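
The Market Share definition above (a Facility's encounters divided by the total encounters within its competitor group) can be sketched in a few lines of Python. This is only an illustrative sketch: the facility names, encounter counts, and competitor links are made up, and grouping competitors by the connected components of an undirected link graph is a simplifying assumption, not the paper's DAG and word-vector procedure.

```python
from collections import defaultdict


def competitor_groups(facilities, links):
    """Group facilities into connected components of a (hypothetical) competitor graph."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in facilities:
        if start in seen:
            continue
        # Depth-first traversal collects every facility reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


def market_share(encounters, groups):
    """Share = facility encounters / total encounters in its competitor group."""
    shares = {}
    for group in groups:
        total = sum(encounters[f] for f in group)
        for f in group:
            shares[f] = encounters[f] / total if total else 0.0
    return shares


# Illustrative data only (not from the paper).
encounters = {"A": 300, "B": 100, "C": 50, "D": 150}
links = [("A", "B"), ("C", "D")]

groups = competitor_groups(list(encounters), links)
shares = market_share(encounters, groups)
print(shares)
```

In this toy setup, A and B form one competitor group and C and D another, so each share is computed only against the facilities in the same group rather than against the whole state.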

translated by Google Translate

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

Jinglin Chen , Aditya Modi , Akshay Krishnamurthy , Nan Jiang , Alekh Agarwal

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-06-21

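We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions. On the positive side, we propose the RFOLIVE (Reward-Free OLIVE) algorithm for sample-efficient reward-free exploration under minimal structural assumptions, which covers the previously studied settings of linear MDPs (Jin et al., 2020b), linear completeness (Zanette et al., 2020b), and low-rank MDPs with unknown representation (Modi et al., 2021). Our analysis shows that the explorability or reachability assumptions previously made for the latter two settings are not statistically necessary for reward-free exploration. On the negative side, we provide a statistical hardness result for both reward-free and reward-aware exploration under linear completeness assumptions when the underlying features are unknown, showing an exponential separation between the low-rank and linear completeness settings.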

Joint Learning-Based Stabilization of Multiple Unknown Linear Systems

Mohamad Kazem Shirani Faradonbeh , Aditya Modi

Categories: Artificial Intelligence | Machine Learning

2022-01-01

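Learning-based control of linear systems has received a great deal of attention recently. In popular settings, the true dynamical models are unknown to the decision-maker and need to be learned interactively by applying control inputs to the systems. Unlike the mature literature on efficient reinforcement learning policies for adaptive control of a single system, no results are currently available for joint learning of multiple systems. In particular, the important problem of fast and reliable joint stabilization remains unaddressed, and so it is the focus of this work. We propose a novel joint learning-based stabilization algorithm for quickly learning stabilizing policies for all systems under study from the data of unstable state trajectories. The presented procedure is shown to be notably effective, in that it stabilizes the family of dynamical systems in an extremely short time period.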

Joint Learning of Linear Time-Invariant Dynamical Systems

Aditya Modi , Mohamad Kazem Shirani Faradonbeh , Ambuj Tewari , George Michailidis

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-21

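Learning the parameters of a linear time-invariant dynamical system (LTIDS) is a problem of current interest. In many applications, one is interested in jointly learning the parameters of multiple related LTIDS, which remains unexplored to date. To that end, we develop a joint estimator for learning the transition matrices of LTIDS that share common basis matrices. Further, we establish finite-time error bounds that depend on the underlying sample size, dimension, number of tasks, and spectral properties of the transition matrices. The results are obtained under mild regularity assumptions and showcase the gains from pooling information across LTIDS, in comparison to learning each system separately. We also study the impact of misspecifying the joint structure of the transition matrices and show that the established results are robust in the presence of moderate misspecifications.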
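
As a point of reference for the "learning each system separately" baseline mentioned above, a single system's transition parameter can be estimated by ordinary least squares on its own trajectory. The sketch below assumes a scalar system x_{t+1} = a * x_t + w_t for brevity; the paper treats matrix-valued transitions and a joint estimator, which this does not implement. All function names and the simulated data are illustrative assumptions.

```python
import random


def simulate(a, steps, noise_std, rng, x0=1.0):
    """Simulate a scalar linear system x_{t+1} = a * x_t + Gaussian noise."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs


def least_squares_transition(xs):
    """Least-squares estimate of a: argmin over a of sum_t (x_{t+1} - a * x_t)^2."""
    num = sum(x_next * x for x, x_next in zip(xs, xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return num / den


# Illustrative run with a fixed seed; the true parameter 0.8 is an arbitrary choice.
rng = random.Random(0)
trajectory = simulate(a=0.8, steps=2000, noise_std=0.1, rng=rng)
a_hat = least_squares_transition(trajectory)
print(f"estimated a = {a_hat:.3f}")
```

A joint estimator in the spirit of the abstract would instead pool trajectories from several related systems and exploit their shared structure, which is where the reported gains over this per-system baseline come from.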

Model-free Representation Learning and Exploration in Low-rank MDPs

Aditya Modi , Jinglin Chen , Akshay Krishnamurthy , Nan Jiang , Alekh Agarwal

Categories: Machine Learning | Machine Learning (Statistics)

2021-02-14

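The low-rank MDP has emerged as an important model for studying representation learning and exploration in reinforcement learning. With a known representation, several model-free exploration strategies exist. In contrast, all algorithms for the unknown representation setting are model-based, thereby requiring the ability to model the full dynamics. In this work, we present the first model-free representation learning algorithms for low-rank MDPs. The key algorithmic contribution is a new minimax representation learning objective, for which we provide variants with differing tradeoffs in their statistical and computational properties. We interleave this representation learning step with an exploration strategy to cover the state space in a reward-free manner. The resulting algorithms are provably sample efficient and can accommodate general function approximation to scale to complex environments.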

Cross-Domain Consumer Review Analysis

Aditya Pandey , Kunal Joshi

Categories: Machine Learning

2022-12-23

The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.

A Twitter BERT Approach for Offensive Language Detection in Marathi

Tanmay Chavan , Shantanu Patankar , Aditya Kane , Omkar Gokhale , Raviraj Joshi

Categories: Natural Language Processing

2022-12-20

Automated offensive language detection is essential in combating the spread of hate speech, particularly on social media. This paper describes our work on Offensive Language Identification in Marathi, a low-resource Indic language. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different monolingual and multilingual BERT models on this classification task, focusing on BERT models pre-trained on social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpora, HASOC 2021 and L3Cube-MahaHate. MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate). With this, we also provide a new state-of-the-art result on the HASOC 2022 / MOLD v2 test set.

Continual Mean Estimation Under User-Level Privacy

Anand Jerry George , Lekshmi Ramesh , Aditya Vikram Singh , Himanshu Tyagi

Categories: Machine Learning

2022-12-20

We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. In practice, however, both requirements arise together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon))$. This is a universal error guarantee that is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
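
For intuition about user-level privacy, the sketch below releases a single mean estimate with the Laplace mechanism. It is a simple baseline, not the paper's continual-release algorithm, and it rests on assumptions that are not in the abstract: samples are clipped to [0, 1], the total number of samples n is treated as public, and each user contributes at most `max_per_user` samples, so replacing one user's data changes the mean by at most `max_per_user / n`. All names are illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(samples_by_user, epsilon, max_per_user, rng):
    """Single-release, user-level epsilon-DP mean under the assumptions stated above."""
    clipped = []
    for user, samples in samples_by_user.items():
        if len(samples) > max_per_user:
            raise ValueError(f"user {user!r} exceeds max_per_user")
        # Clip every sample to [0, 1] so one user can shift the sum by at most max_per_user.
        clipped.extend(min(1.0, max(0.0, x)) for x in samples)

    n = len(clipped)
    sensitivity = max_per_user / n       # worst-case change in the mean from one user
    scale = sensitivity / epsilon        # Laplace scale that yields epsilon-DP
    return sum(clipped) / n + laplace_noise(scale, rng)


# Illustrative data only.
data = {"u1": [0.2, 0.4], "u2": [0.6], "u3": [0.8, 1.0]}
rng = random.Random(1)
estimate = private_mean(data, epsilon=1.0, max_per_user=2, rng=rng)
print(f"private mean estimate: {estimate:.3f}")
```

The noise scale grows linearly with the per-user contribution bound here, whereas the guarantee quoted above depends on $\sqrt{M_t}$; closing that gap, and doing so at every time instant, is what the paper's algorithm addresses.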

A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness

Tiantian Feng , Rajat Hebbar , Nicholas Mehlman , Xuan Shi , Aditya Kommineni , Shrikanth Narayanan

Categories: Machine Learning

2022-12-18

Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to become more trustworthy before broader deployment. Specifically, concerns over privacy breaches, discriminatory performance, and vulnerability to adversarial attacks have all been raised in ML research. To address these challenges and risks, significant efforts have been made to ensure that these ML systems are trustworthy, especially private, safe, and fair. In this paper, we conduct the first comprehensive survey on speech-centric trustworthy ML topics related to privacy, safety, and fairness. In addition to serving as a summary report for the research community, we point out several promising future research directions to inspire researchers who wish to explore this area further.

BNSynth: Bounded Boolean Functional Synthesis

Ravi Raja , Stanly Samuel , Chiranjib Bhattacharyya , Deepak D'Souza , Aditya Kanade

Categories: Artificial Intelligence | Machine Learning

2022-12-15

The automated synthesis of correct-by-construction Boolean functions from logical specifications is known as the Boolean Functional Synthesis (BFS) problem. BFS has many application areas that range from software engineering to circuit design. In this paper, we introduce a tool, BNSynth, that is the first to solve the BFS problem under a given bound on the solution space. Bounding the solution space induces the synthesis of smaller functions that benefit resource-constrained areas such as circuit design. BNSynth uses a counter-example guided, neural approach to solve the bounded BFS problem. Initial results show promise in synthesizing smaller solutions; we observe at least 3.2X (and up to 24X) improvement in the reduction of solution size on average, compared to state-of-the-art tools on our benchmarks. BNSynth is available on GitHub under an open source license.
