Smart Paper Notes

Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery

Trey McNeely , Galen Vincent , Kimberly M. Wood , Rafael Izbicki , Ann B. Lee

Category: Machine Learning (Statistics)

2022-02-04

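Our goal is to quantify whether, and if so how, spatio-temporal patterns in tropical cyclone (TC) satellite imagery signal an upcoming rapid intensity change event. To address this question, we propose a new nonparametric test of association between a time series of images and a series of binary event labels. We ask whether there is a difference in distribution between the (dependent but identically distributed) 24-hour sequences of images preceding an event and those preceding a non-event. By rewriting the statistical test as a regression problem, we leverage neural networks to infer modes of structural evolution of TC convection that are representative of the lead-up to rapid intensity change events. Dependencies between nearby sequences are handled by a bootstrap procedure that estimates the marginal distribution of the label series. We prove that type I error control is guaranteed as long as the distribution of the label series is well estimated, which is made easier by the extensive historical data available for binary TC event labels. We show empirical evidence that our proposed method identifies archetypes of infrared imagery associated with elevated rapid intensification risk, typically marked by deep or deepening core convection over time. Such results provide a foundation for improving forecasts of rapid intensification.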
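
The label-resampling idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract derives its test statistic from a neural-network regression, whereas the sketch below substitutes a simple difference of means on a scalar feature, and it assumes (purely for illustration) that the binary label series can be modeled as a first-order Markov chain. All function names are hypothetical.

```python
import random

def fit_markov_chain(labels):
    """Estimate P(next = 1 | current state) from a 0/1 label series."""
    counts = {0: [0, 0], 1: [0, 0]}  # counts[current][next]
    for cur, nxt in zip(labels, labels[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur in (0, 1):
        total = sum(counts[cur])
        # Fall back to the overall event rate if a state is never left.
        probs[cur] = counts[cur][1] / total if total else sum(labels) / len(labels)
    return probs

def simulate_labels(probs, start, n, rng):
    """Draw a synthetic label series of length n from the fitted chain."""
    series = [start]
    while len(series) < n:
        series.append(1 if rng.random() < probs[series[-1]] else 0)
    return series

def mean_difference(features, labels):
    """Toy statistic: |mean feature under label 1 - mean feature under label 0|."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def label_bootstrap_p_value(features, labels, n_boot=500, seed=0):
    """Compare the observed statistic with statistics under resampled labels."""
    rng = random.Random(seed)
    probs = fit_markov_chain(labels)
    observed = mean_difference(features, labels)
    exceed = 0
    for _ in range(n_boot):
        fake = simulate_labels(probs, labels[0], len(labels), rng)
        if mean_difference(features, fake) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_boot + 1)

# Synthetic data (assumption): labels occur in runs, mimicking temporal dependence.
data_rng = random.Random(1)
labels = ([0] * 15 + [1] * 5) * 20
signal = [y + data_rng.uniform(-0.1, 0.1) for y in labels]   # feature tied to labels
noise = [data_rng.uniform(-0.1, 0.1) for _ in labels]        # feature unrelated to labels

p_signal = label_bootstrap_p_value(signal, labels)
p_noise = label_bootstrap_p_value(noise, labels)
print(p_signal, p_noise)
```

Resampling the labels rather than the features keeps the dependence structure of the feature series intact, which is the property the abstract relies on when it ties type I error control to how well the label distribution is estimated.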

Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones

Trey McNeely , Galen Vincent , Rafael Izbicki , Kimberly M. Wood , Ann B. Lee

Category: Machine Learning (Statistics) | Machine Learning

2021-09-24

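Tropical cyclone (TC) intensity forecasts are issued by human forecasters, who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited to prediction problems with complex sequence data, it is difficult to extract interpretable scientific information from them. Here, we leverage powerful AI prediction algorithms and classical statistical inference to identify patterns in the evolution of TC convective structure leading up to the rapid intensification of a storm, thereby providing forecasters and scientists with insight into TC behavior.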

AdsorbML: Accelerating Adsorption Energy Calculations with Machine Learning

Janice Lan , Aini Palizhati , Muhammed Shuaibi , Brandon M. Wood , Brook Wander , Abhishek Das , Matt Uyttendaele , C. Lawrence Zitnick , Zachary W. Ulissi

Category: Machine Learning

2022-11-29

Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the minimum binding energy - the adsorption energy - for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration, within a 0.1 eV threshold, 86.63% of the time, while achieving a 1387x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 87,045 unique configurations.

Deep Learning Generates Synthetic Cancer Histology for Explainability and Education

James M. Dolezal , Rachelle Wolk , Hanna M. Hieromnimon , Frederick M. Howard , Andrew Srisuwananukorn , Dmitry Karpeyev , Siddhi Ramesh , Sara Kochanny , Jung Woo Kwon , Meghana Agni

Category: Computer Vision

2022-11-12

Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.

Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization

Killian Wood , Alec M. Dunton , Amanda Muyskens , Benjamin W. Priest

Category: Machine Learning | Machine Learning (Statistics)

2022-09-22

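Gaussian processes (GPs) are Bayesian nonparametric models that are popular in a variety of applications because of their accuracy and native uncertainty quantification (UQ). Tuning GP hyperparameters is critical to ensuring the validity of both the predictions and their uncertainties. Uniquely estimating multiple hyperparameters, for example in the Matern kernel, can also be a significant challenge. Moreover, training GPs on large-scale datasets is a highly active area of research: traditional maximum likelihood hyperparameter training requires quadratic memory to form the covariance matrix and has cubic training complexity. To address the scalable hyperparameter tuning problem, we present a novel algorithm that estimates the smoothness and length-scale parameters of the Matern kernel in order to improve the robustness of the resulting prediction uncertainties. Using novel loss functions similar to those in conformal prediction algorithms, within the computational framework provided by the hyperparameter estimation algorithm MuyGPs, we demonstrate in numerical experiments that the approach maintains a high degree of scalability.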
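
To make the two Matern hyperparameters concrete, here is a minimal sketch of the kernel for the smoothness values that have simple closed forms, together with the dense covariance matrix whose n x n size is the source of the quadratic memory cost mentioned above. It does not use or reproduce MuyGPs; the function names and the restriction to one-dimensional inputs are assumptions made for illustration.

```python
import math

def matern_kernel(r, length_scale=1.0, nu=1.5):
    """Matern covariance at distance r >= 0 for nu in {0.5, 1.5, 2.5}."""
    s = r / length_scale
    if nu == 0.5:
        return math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return (1.0 + a + a * a / 3.0) * math.exp(-a)
    raise ValueError("this sketch only implements nu in {0.5, 1.5, 2.5}")

def covariance_matrix(xs, length_scale=1.0, nu=1.5):
    """Dense n x n covariance matrix for 1-D inputs; memory grows as n**2."""
    return [[matern_kernel(abs(a - b), length_scale, nu) for b in xs] for a in xs]

xs = [0.0, 0.5, 1.0, 2.0]
K = covariance_matrix(xs, length_scale=1.0, nu=1.5)
for row in K:
    print(["%.3f" % v for v in row])

# A longer length scale keeps distant points more strongly correlated.
short = matern_kernel(1.0, length_scale=1.0)
long_ = matern_kernel(1.0, length_scale=2.0)
print(short < long_)
```

The smoothness parameter `nu` controls how rough sampled functions are, while `length_scale` controls how quickly correlation decays with distance, which is why estimating both at once from the same data can be poorly identified.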

The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis

Richard Tran , Janice Lan , Muhammed Shuaibi , Siddharth Goyal , Brandon M. Wood , Abhishek Das , Javier Heras-Domingo , Adeesh Kolluru , Ammar Rizvi , Nima Shoghi

Category: Machine Learning

2022-06-17

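The computational catalysis and machine learning communities have made considerable progress in developing machine learning models for catalyst discovery and design. Yet a general machine learning potential that spans the chemical space of catalysis is still out of reach. A significant hurdle is obtaining access to training data across a wide range of materials. One important class of materials where data is lacking is oxides, which inhibits models from studying the oxygen evolution reaction and oxide electrocatalysis more broadly. To address this, we developed the Open Catalyst 2022 (OC22) dataset, consisting of 62,521 density functional theory (DFT) relaxations (~9,884,504 single-point calculations) across a range of oxide materials, coverages, and adsorbates (*H, *O, *N, *C, *OOH, *OH, *OH2, *O2, *CO). We define generalized tasks to predict the total system energy that are applicable across catalysis, develop baseline performance for several graph neural networks (SchNet, DimeNet++, ForceNet, SpinConv, PaiNN, GemNet-dT, GemNet-OC), and provide predefined dataset splits to establish clear benchmarks for future efforts. For all tasks, we study whether combining datasets leads to better results, even if they contain different materials or adsorbates. Specifically, we jointly train models on the Open Catalyst 2020 (OC20) dataset and OC22, or fine-tune OC20 models on OC22. In the most general task, GemNet-OC sees a ~32% improvement in energy predictions through fine-tuning and a ~9% improvement in force predictions through joint training. Surprisingly, joint training on OC20 and the smaller OC22 dataset also improves total energy predictions on OC20 by ~19%. The dataset and baseline models are open source, and a public leaderboard will follow to encourage continued community development on the total energy tasks and data.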

Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation

Pengfei Guo , Dong Yang , Ali Hatamizadeh , An Xu , Ziyue Xu , Wenqi Li , Can Zhao , Daguang Xu , Stephanie Harmon , Evrim Turkbey

Category: Computer Vision

2022-03-12

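Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing. The inherent privacy-preserving property of FL algorithms makes them especially attractive to the medical field. However, when client data distributions are heterogeneous, standard FL methods are unstable and require intensive hyperparameter tuning to achieve optimal performance. Conventional hyperparameter optimization algorithms are impractical in real-world FL applications because they involve numerous training trials, which are often not affordable with limited compute budgets. In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL, in which an online RL agent can dynamically adjust the hyperparameters of each client based on the current training progress. Extensive experiments are conducted to investigate different search strategies and RL agents. The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset as well as on two real-world medical image segmentation datasets, for COVID-19 lesion segmentation in chest CT and pancreas segmentation in abdominal CT.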
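
As a rough picture of the training loop that such hyperparameters live in, the sketch below runs a few rounds of size-weighted parameter averaging (in the style of federated averaging, which the abstract does not name) on a one-parameter toy model. Each client carries its own learning rate, held fixed here; in the abstract's setting an RL agent would adjust such values during training. The data, names, and learning rates are all illustrative assumptions.

```python
def local_update(w, data, lr, steps):
    """Run a few gradient steps on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = 2.0 * sum(x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(client_ws, client_sizes):
    """Server step: average client parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_ws, client_sizes)) / total

# Two clients with very different input ranges (heterogeneous data), both
# generated from the same underlying slope of 2.0. Raw data never leaves a client.
clients = [
    {"data": [(x, 2.0 * x) for x in (0.1, 0.2, 0.3)], "lr": 1.0},
    {"data": [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)], "lr": 0.05},
]

global_w = 0.0
for round_idx in range(20):
    # Each client starts from the current global parameter and trains locally.
    local_ws = [local_update(global_w, c["data"], c["lr"], steps=5) for c in clients]
    global_w = federated_average(local_ws, [len(c["data"]) for c in clients])

print(global_w)
```

The two learning rates differ by a factor of 20 because the clients' inputs differ in scale; a single shared rate small enough for the second client would make the first client learn very slowly, which hints at why per-client tuning matters under heterogeneity.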

Integrated multimodal artificial intelligence framework for healthcare applications

Luis R. Soenksen , Yu Ma , Cynthia Zeng , Leonard D. J. Boussioux , Kimberly Villalobos Carballo , Liangyuan Na , Holly M. Wiberg , Michael L. Li , Ignacio Fuentes , Dimitris Bertsimas

Category: Machine Learning | Artificial Intelligence

2022-02-25

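Artificial intelligence (AI) systems hold great promise for improving healthcare over the coming decades. Specifically, AI systems that leverage multiple data sources and input modalities are poised to become a viable way to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data preprocessing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on MIMIC-IV-MM, a multimodal clinical database (N = 34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time series, text, and images), 11 unique data sources, and 12 predictive tasks. We show that this framework can consistently produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data type importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.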
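
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a three-player toy game by averaging each player's marginal contribution over all orderings. The "players" are named after modalities only to echo the abstract; the scores in `toy_scores` are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical model scores for every subset of input modalities (made-up numbers).
toy_scores = {
    frozenset(): 0.50,
    frozenset({"tabular"}): 0.70,
    frozenset({"text"}): 0.60,
    frozenset({"image"}): 0.65,
    frozenset({"tabular", "text"}): 0.75,
    frozenset({"tabular", "image"}): 0.80,
    frozenset({"text", "image"}): 0.70,
    frozenset({"tabular", "text", "image"}): 0.85,
}

players = ["tabular", "text", "image"]
phi = shapley_values(players, lambda s: toy_scores[frozenset(s)])
for name in players:
    print(name, round(phi[name], 4))

# Efficiency property: contributions sum to the gain of using everything over nothing.
gain = toy_scores[frozenset(players)] - toy_scores[frozenset()]
print(abs(sum(phi.values()) - gain) < 1e-9)
```

Exact enumeration costs a factorial number of orderings, so it is only practical for a handful of players; with many data sources, sampling-based approximations are the usual alternative.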

Development of collective behavior in newborn artificial agents

Donsuk Lee , Samantha M. W. Wood , Justin N. Wood

Category: Artificial Intelligence

2021-11-06

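Collective behavior is widespread across the animal kingdom. To date, however, the developmental and mechanistic foundations of collective behavior have not been formally established. What learning mechanisms drive the development of collective behavior in newborn animals? Here, we used deep reinforcement learning and curiosity-driven learning, two learning mechanisms deeply rooted in psychological and neuroscientific research, to build newborn artificial agents that develop collective behavior. Like newborn animals, our agents learn collective behavior from raw sensory inputs in naturalistic environments. Our agents also learn collective behavior without external rewards, using only intrinsic motivation (curiosity) to drive learning. Specifically, when we raise our artificial agents in natural visual environments with groupmates, the agents spontaneously develop ego-motion, object recognition, and a preference for groupmates, rapidly learning all of the core skills required for collective behavior. This work bridges the divide between high-dimensional sensory inputs and collective action, resulting in a pixels-to-actions model of collective animal behavior. More generally, we show that two generic learning mechanisms (deep reinforcement learning and curiosity-driven learning) are sufficient to learn collective behavior from unsupervised natural experience.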

Procedural Humans for Computer Vision

Charlie Hewitt , Tadas Baltrušaitis , Erroll Wood , Lohit Petikam , Louis Florentin , Hanz Cuevas Velasquez

Category: Computer Vision

2023-01-03

Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. Using synthetic data has a number of benefits, from privacy preservation and bias elimination to quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training DNNs to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.
