From self-driving vehicles and autonomous robots to virtual assistants that book our next appointment at the hair salon or a dinner at that restaurant, machine learning systems are becoming increasingly ubiquitous. The main reason for this is the extraordinary predictive power of these methods. However, most of these models remain black boxes, meaning it is very challenging for humans to follow and understand their intricate inner workings. As a consequence, interpretability has suffered under this ever-increasing complexity of machine learning models. Especially in light of new regulations such as the General Data Protection Regulation (GDPR), the ability to justify and explain the decisions made by these black boxes has become indispensable. Driven by the needs of industry and practice, the research community has recognized this interpretability problem and has focused over the past few years on developing a growing number of so-called explanation methods. These methods explain individual predictions made by black-box machine learning models and help to recover some of the lost interpretability. However, with the proliferation of these explanation methods, it is often unclear which method offers higher explanation quality, or which is generally better suited for the situation at hand. In this thesis, we therefore propose an axiomatic framework that allows the quality of different explanation methods to be compared. Through experimental validation, we find that the developed framework is useful for assessing the explanation quality of different explanation methods, and that it reaches conclusions consistent with independent studies.
An accurate model of a patient's individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from a Cox proportional hazards model) do not provide survival probabilities; single-time probability models (e.g., the Gail model, which predicts a 5-year probability) apply only to a single time point; and standard Kaplan-Meier survival curves provide only population averages over broad classes of patients, meaning they are not specific to individual patients. This motivates a different class of tools: models that learn individual survival distributions, providing survival probabilities across all times, such as extensions of the Cox model, accelerated failure time models, an extension of random survival forests, and multi-task logistic regression. This paper first motivates such "individual survival distribution" (ISD) models and points out how they differ from standard models. It then discusses ways to evaluate such models, namely concordance, 1-calibration, the Brier score, and various versions of the L1 loss, and then motivates and defines a new approach, "D-calibration", which determines whether a model's probability estimates are meaningful. We also discuss how these evaluation measures differ, and use them to evaluate several ISD prediction tools over a range of survival datasets.
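The D-calibration idea described above can be illustrated with a small sketch: under a D-calibrated model, the predicted survival probabilities S_i(t_i), evaluated at each (uncensored) patient's observed event time, should be approximately uniform on [0, 1], so the counts in equal-width bins should be roughly equal. The snippet below is only an assumed, simplified illustration of that check (it ignores censoring entirely), not the authors' implementation.

```python
import numpy as np
from scipy.stats import chisquare

def d_calibration_pvalue(surv_at_event_times, n_bins=10):
    """Crude D-calibration check for uncensored data (illustrative only).

    surv_at_event_times: S_i(t_i), the model's predicted survival probability
    for patient i evaluated at that patient's observed event time. If the model
    is D-calibrated these values should be ~Uniform(0, 1).
    """
    counts, _ = np.histogram(surv_at_event_times, bins=n_bins, range=(0.0, 1.0))
    expected = np.full(n_bins, len(surv_at_event_times) / n_bins)
    _, p_value = chisquare(counts, expected)   # compare bin counts to uniform
    return p_value

# A well-calibrated "model" yields uniform values and a large p-value;
# a skewed one fails the test.
rng = np.random.default_rng(0)
print(d_calibration_pvalue(rng.uniform(size=1000)))      # should be large
print(d_calibration_pvalue(rng.beta(5, 1, size=1000)))   # should be tiny
```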
With the rapid development of information technology, health-related data bring unprecedented potential for discoveries in medicine and health, while also posing significant challenges to machine learning techniques in terms of scale and complexity. These challenges include: structured data with various storage formats and value types arising from heterogeneous data sources; pervasive uncertainty across all aspects of medical diagnosis and treatment; high dimensionality of the feature space; longitudinal medical-record data with irregular intervals between adjacent observations; and rich relationships among subjects that share similar genetic factors, locations, or socio-demographic backgrounds. This thesis aims to develop advanced statistical relational learning approaches to effectively exploit such health-related data and facilitate discoveries in medical research. It presents work on cost-sensitive statistical relational learning for mining structured imbalanced data, the first continuous-time probabilistic logic model for predicting sequential events from longitudinal structured data, and a hybrid probabilistic relational model for learning from heterogeneous structured data. It also demonstrates the outstanding performance of these proposed models, together with other state-of-the-art machine learning models, when applied to medical research problems and other real-world large-scale systems, revealing the great potential of statistical relational learning for exploring structured health-related data to facilitate medical research.
Intensive-care physicians are presented with large volumes of patient information and measurements from a variety of monitoring systems. The limited ability of humans to process such complex information hinders physicians from readily recognizing and acting on early signs of patient deterioration. We used machine learning to develop an early-warning system for circulatory failure based on a high-resolution ICU database with 240 patient-years of data. This automated system predicts 90.0% of circulatory-failure events (at a prevalence of 3.1%), with 81.8% identified more than two hours in advance, resulting in an area under the receiver operating characteristic curve of 94.0% and an area under the precision-recall curve of 63.0%. The model was externally validated in a large, independent patient cohort.
Understanding predictive models, in terms of interpreting and identifying actionable insights, is a challenging task. Often the importance of a feature in a model is only a rough estimate condensed into one number. However, our research goes beyond these naïve estimates through the design and implementation of an interactive visual analytics system, Prospector. By providing interactive partial dependence diagnostics, data scientists can understand how features affect the prediction overall. In addition, our support for localized inspection allows data scientists to understand how and why specific data points are predicted as they are, as well as support for tweaking feature values and seeing how the prediction responds. Our system is then evaluated using a case study involving a team of data scientists improving predictive models for detecting the onset of diabetes from electronic medical records.
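The partial-dependence diagnostics mentioned above can be sketched in a few lines: for one feature, sweep a grid of values, substitute each value into every row of the data, and average the model's predictions. The snippet below is a minimal, assumed illustration using scikit-learn and its diabetes dataset, not Prospector's actual implementation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, feature_idx, grid_size=20):
    """One-way partial dependence: average prediction as one feature is varied."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_size)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value          # overwrite the feature everywhere
        averages.append(model.predict(X_mod).mean())
    return grid, np.array(averages)

grid, pd_curve = partial_dependence(model, X, feature_idx=2)  # column 2 is BMI
print(list(zip(grid.round(3), pd_curve.round(1)))[:5])
```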
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications. 1. Introduction. The starting point for this paper is an interesting procedure called "boosting," which is a way of combining the performance of many "weak" classifiers to produce a powerful "committee." Boosting was proposed in the computational learning theory literature [Schapire (1990), Freund (1995), Freund and Schapire (1997)] and has since received much attention. While boosting has evolved somewhat over the years, we describe the most commonly used version of the AdaBoost procedure [Freund and Schapire (1997)].
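A minimal sketch of the discrete AdaBoost procedure described above (reweighting the training data and taking a weighted vote of weak classifiers) is given below. Decision stumps stand in for the generic base classifier; this illustrates the standard algorithm rather than the paper's more direct LogitBoost-style approximations.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(y)
    weights = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = np.sum(weights * (pred != y)) / np.sum(weights)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight for this weak learner
        weights *= np.exp(-alpha * y * pred)    # upweight the misclassified points
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```

In the paper's additive-logistic-regression view, the aggregated score F(x) = sum_m alpha_m f_m(x) estimates one half of the log-odds, so class probabilities can be recovered approximately as p(y = 1 | x) = 1 / (1 + exp(-2 F(x))).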
Understanding why a model makes a certain prediction can be as important as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, a variety of methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable to another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing that there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
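As a concrete illustration of the additive attributions described above, the sketch below computes exact Shapley values for a toy model by enumerating feature subsets, with "missing" features replaced by a baseline value (one common, assumed way of simulating feature absence). This brute-force scheme is exponential in the number of features and only demonstrates the definition; it is not the SHAP library's optimized estimators.

```python
import math
from itertools import combinations
import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values for the prediction f(x) relative to a baseline input."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
                x_with = baseline.copy()                       # coalition S plus feature i
                x_with[list(S) + [i]] = x[list(S) + [i]]
                x_without = baseline.copy()                    # coalition S only
                x_without[list(S)] = x[list(S)]
                phi[i] += weight * (f(x_with) - f(x_without))  # weighted marginal contribution
    return phi

# Toy model: the attributions sum to f(x) - f(baseline) (local accuracy).
f = lambda z: 2 * z[0] + z[1] * z[2]
x, baseline = np.array([1.0, 2.0, 3.0]), np.zeros(3)
phi = shapley_values(f, x, baseline)
print(phi, phi.sum(), f(x) - f(baseline))
```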
Recommender systems have become essential tools for users to navigate the plethora of content in the online world. Collaborative filtering-a broad term referring to the use of a variety, or combination, of machine learning algorithms operating on user ratings-lies at the heart of recommender systems' success. These algorithms have been traditionally studied from the point of view of how well they can predict users' ratings and how precisely they rank content; state of the art approaches are continuously improved in these respects. However, a rift has grown between how filtering algorithms are investigated and how they will operate when deployed in real systems. Deployed systems will continuously be queried for personalised recommendations; in practice, this implies that system administrators will iteratively retrain their algorithms in order to include the latest ratings. Collaborative filtering research does not take this into account: algorithms are improved and compared to each other from a static viewpoint, while they will be ultimately deployed in a dynamic setting. Given this scenario, two new problems emerge: current filtering algorithms are neither (a) designed nor (b) evaluated as algorithms that must account for time. This thesis addresses the divergence between research and practice by examining how collaborative filtering algorithms behave over time. Our contributions include: 1. A fine grained analysis of temporal changes in rating data and user/item similarity graphs that clearly demonstrates how recommender system data is dynamic and constantly changing. 2. A novel methodology and time-based metrics for evaluating collaborative filtering over time, both in terms of accuracy and the diversity of top-N recommendations. 3. A set of hybrid algorithms that improve collaborative filtering in a range of different scenarios. These include temporal-switching algorithms that aim to promote either accuracy or diversity; parameter update methods to improve temporal accuracy; and re-ranking a subset of users' recommendations in order to increase diversity. 4. A set of temporal monitors that secure collaborative filtering from a wide range of different temporal attacks by flagging anomalous rating patterns. We have implemented and extensively evaluated the above using large-scale sets of user ratings; we further discuss how this novel methodology provides insight into dimensions of recommender systems that were previously unexplored. We conclude that investigating collaborative filtering from a temporal perspective is not only more suitable to the context in which recommender systems are deployed, but also opens a number of future research opportunities.
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black-box decision systems can be used are varied, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineates, explicitly or implicitly, its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black-box system. Given a problem definition, a black-box type, and a desired explanation, this survey should help the researcher find the proposals most useful for their own work. The proposed classification of approaches to open black-box models should also be useful for putting the many open research questions in perspective.
We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept, so we define it relative to a target model, which may or may not be a human. We define a framework that allows interpretable procedures to be compared by linking them to important practical aspects such as accuracy and robustness. We characterize many current state-of-the-art interpretable methods within our framework, illustrating its general applicability. Finally, principled interpretable strategies are proposed and empirically evaluated on synthetic data, as well as on the largest publicly available olfaction dataset \cite{olfs}. We also experiment on MNIST with a simple target model and different oracle models of varying complexity. This leads to the insight that the improvement of the target model is a function not only of the oracle model's performance, but also of its complexity relative to the target model. Further experiments on CIFAR-10, a real manufacturing dataset, and the FICO dataset showcase the benefit of our methods over knowledge distillation when the target model is simple and the complex model is a neural network.
We introduce a very general method for high dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower dimensional space. In one special case that we study in detail, the random projections are divided into disjoint groups, and within each group we select the projection yielding the smallest estimate of the test error. Our random-projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. Our theoretical results elucidate the effect on performance of increasing the number of projections. Moreover, under a boundary condition that is implied by the sufficient dimension reduction assumption, we show that the test excess risk of the random-projection ensemble classifier can be controlled by terms that do not depend on the original data dimension and a term that becomes negligible as the number of projections increases. The classifier is also compared empirically with several other popular high dimensional classifiers via an extensive simulation study, which reveals its excellent finite sample performance.
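A rough sketch of the special case described above is given below: random Gaussian projections are drawn in disjoint groups, within each group the projection yielding the lowest validation-error estimate for the base classifier is kept, and the kept projections vote on the test points. The choices here (Gaussian projections, a held-out split for the error estimate, k-NN as the base classifier, a fixed voting threshold instead of the paper's data-driven one) are assumptions for illustration, not the authors' exact estimator.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def rp_ensemble_fit_predict(X_train, y_train, X_test, d=5, n_groups=20,
                            group_size=10, vote_threshold=0.5, seed=0):
    """Random-projection ensemble classifier (illustrative sketch, binary labels 0/1)."""
    rng = np.random.default_rng(seed)
    # Held-out split used to estimate the test error of each candidate projection.
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train,
                                                test_size=0.3, random_state=seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_groups):
        best_err, best_A = np.inf, None
        for _ in range(group_size):
            A = rng.standard_normal((X_train.shape[1], d))   # random projection to d dims
            clf = KNeighborsClassifier().fit(X_tr @ A, y_tr)
            err = np.mean(clf.predict(X_val @ A) != y_val)
            if err < best_err:
                best_err, best_A = err, A
        # Refit on all training data with the selected projection and record its vote.
        clf = KNeighborsClassifier().fit(X_train @ best_A, y_train)
        votes += clf.predict(X_test @ best_A)
    return (votes / n_groups > vote_threshold).astype(int)
```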
We are honored to welcome you to the 2nd International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD), which is held in Riva del Garda, Italy, on September 19th, 2016, co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016). The aim of this workshop is to bring together researchers and experts in machine learning, data mining, pattern analysis and statistics to share their challenging issues and advance research on temporal data analysis. Analysis and learning from temporal data cover a wide scope of tasks including learning metrics, learning representations, unsupervised feature extraction, clustering and classification. This volume contains the conference program, an abstract of the invited keynotes and the set of regular papers accepted to be presented at the conference. Each of the submitted papers was reviewed by at least two independent reviewers, leading to the selection of eleven papers accepted for presentation and inclusion into the program and these proceedings. The contributions are given in alphabetical order, by surname. The keynote given by Marco Cuturi on "Regularized DTW Divergences for Time Series" focuses on the definition of alignment kernels for time series that can later be used at the core of standard machine learning algorithms. The one given by Tony Bagnall on "The Great Time Series Classification Bake Off" presents an important attempt to experimentally compare the performance of a wide range of time series classifiers, together with ensemble classifiers that aim at combining existing classifiers to improve classification quality. The accepted papers span innovative ideas on the analysis of temporal data, including promising new approaches and covering both practical and theoretical issues. We wish to thank the ECML PKDD council members for giving us the opportunity to hold the AALTD workshop within the framework of the ECML/PKDD Conference and the members of the local organizing committee for their support. The organizers of the AALTD workshop gratefully acknowledge the financial support of the Université de Rennes 2, MODES and Universidade da Coruña. Last but not least, we wish to thank the contributing authors for their high-quality work and all members of the Reviewing Committee for their invaluable assistance in the selection process. All of them have significantly contributed to the success of AALTD 2016. We sincerely hope that the workshop participants have a great and fruitful time at the conference.
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
We have recently seen many successful applications of recurrent neural networks (RNNs) to electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and various other events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often difficult for users to understand why the model makes a particular prediction. Such black-box characteristics of RNNs can impede their wide adoption in clinical practice. Furthermore, there are no established methods for interactively leveraging users' domain expertise and prior knowledge as inputs to steer the model. Our design study therefore aims to provide a visual analytics solution that increases the interpretability and interactivity of RNNs, via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following an iterative design process among the experts, we designed, implemented, and evaluated a visual analytics tool called RetainVis, which couples a newly improved, interpretable, and interactive RNN-based model called RetainEX with visualizations that let users explore EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insight into how individual medical codes contribute to risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model RETAIN in order to make use of temporal information and increase interactivity. This work provides useful guidelines for researchers who aim to design interpretable and interactive visual analytics tools for RNNs.
The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: methods based on Additive Noise Models (ANMs) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 data sets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the method based on Additive Noise Models that has originally been proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10 % and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method. (* Part of this work was done while JMM and JZ were with the MPI Tübingen, and JP with ETH Zürich.)
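The Additive Noise Model approach reviewed above can be sketched very compactly: regress Y on X and X on Y, and prefer the direction in which the residuals look independent of the putative cause. In the sketch below, a random-forest regression and a small HSIC-style dependence score stand in for the regression and independence tests typically used in the literature, so this is only an assumed, simplified illustration of the idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rbf_gram(v, sigma=1.0):
    """RBF kernel Gram matrix for a 1-D sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(a, b):
    """Biased empirical HSIC with RBF kernels, used here as a dependence score."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / (n - 1) ** 2

def anm_direction(x, y):
    """Prefer the direction whose regression residuals look more independent of the cause."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    def residual_dependence(cause, effect):
        reg = RandomForestRegressor(n_estimators=200, random_state=0)
        reg.fit(cause.reshape(-1, 1), effect)
        residuals = effect - reg.predict(cause.reshape(-1, 1))
        return hsic(cause, residuals)

    return "X->Y" if residual_dependence(x, y) < residual_dependence(y, x) else "Y->X"

# Synthetic example where X causes Y through a nonlinear function plus additive noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = np.tanh(2 * x) + 0.1 * rng.standard_normal(300)
print(anm_direction(x, y))  # expected to prefer 'X->Y' on data generated this way
```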
Ensemble methods, particularly those based on decision trees, have recently demonstrated superior performance in a variety of machine learning settings. We introduce a generalization of many existing decision tree methods called "Random Projection Forests" (RPF), which is any decision forest that uses (possibly data-dependent and random) linear projections. Using this framework, we introduce a special case called "Lumberjack", which uses very sparse random projections, that is, linear combinations of a small subset of features. Lumberjack obtains statistically significantly improved accuracy over Random Forests, Gradient Boosted Trees, and other methods on a standard classification benchmark suite with varying dimensionality, sample size, and number of classes. To illustrate how, why, and when Lumberjack outperforms other methods, we conduct extensive simulated experiments on vectors, images, and nonlinear manifolds. Lumberjack typically yields better performance than existing decision tree ensembles, without sacrificing computational efficiency, scalability, or interpretability. Lumberjack can easily be incorporated into other ensemble methods such as boosting to obtain potentially similar gains.
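The very sparse projections described above can be illustrated in a few lines: each derived feature is a signed sum of a small random subset of the original features. The sketch below builds such a projected feature matrix once and fits an ordinary scikit-learn forest on it; this only conveys the flavor of the idea and is not the Lumberjack implementation, which samples new sparse projections at every tree node.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sparse_random_projection_matrix(n_features, n_projections, density=0.1, seed=0):
    """Very sparse projection matrix: a few random +/-1 entries per column."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_features, n_projections))
    for j in range(n_projections):
        k = max(1, rng.binomial(n_features, density))      # number of nonzero entries
        rows = rng.choice(n_features, size=k, replace=False)
        A[rows, j] = rng.choice([-1.0, 1.0], size=k)       # random signs
    return A

# Hypothetical usage: project the data, then train a standard forest on the
# sparse linear combinations of features.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(int)          # label depends on a feature sum
A = sparse_random_projection_matrix(n_features=20, n_projections=40)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X @ A, y)
print(clf.score(X @ A, y))
```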