Smart Paper Notes

Deep learning at the edge enables real-time streaming ptychographic imaging

Anakha V Babu , Tao Zhou , Saugat Kandel , Tekin Bicer , Zhengchun Liu , William Judge , Daniel J. Ching , Yi Jiang , Sinisa Veseli , Steven Henke

Category: Machine Learning

2022-09-20

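Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from the detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than traditional methods require.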

Translated by Google Translate

Globus Automation Services: Research process automation across the space-time continuum

Ryan Chard , Jim Pruyne , Kurt McKee , Josh Bryan , Brigitte Raumann , Rachana Ananthakrishnan , Kyle Chard , Ian Foster

Category: Artificial Intelligence

2022-08-19

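Research process automation (the reliable, efficient, and reproducible execution of tasks on scientific instruments, computers, data stores, and other resources) has become an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions (flows), and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from a scientific instrument to a remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for the reliable execution of even long-lived flows despite sporadic failures; 2) a declarative notation and an extensible asynchronous action provider API for defining and executing a wide variety of actions and flow specifications involving arbitrary resources; and 3) authorization delegation mechanisms for the secure invocation of actions. These services allow researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services.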


FAIR principles for AI models, with a practical application for accelerated high energy diffraction microscopy

Nikil Ravi , Pranshu Chaturvedi , E. A. Huerta , Zhengchun Liu , Ryan Chard , Aristana Scourtas , K. J. Schmidt , Kyle Chard , Ben Blaiszik , Ian Foster

Categories: Artificial Intelligence | Machine Learning

2022-07-01

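A concise and measurable set of FAIR (Findable, Accessible, Interoperable, and Reusable) principles for scientific data is transforming the state of practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) on the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework that combines the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.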


The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

Addison J. Hu , Alden Green , Ryan J. Tibshirani

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
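The estimator described in this abstract can be written out as a worked formula. What follows is a sketch under stated assumptions, not the paper's exact statement: the squared-error data term, the tuning parameter $\lambda \ge 0$, the cell notation $V_i$ (the Voronoi cell of $x_i$), and the edge set $E$ (pairs of neighboring cells) are notation introduced here for illustration. The weights $w_{ij}$ are taken to be the $(d-1)$-dimensional measure of the boundary shared by cells $i$ and $j$, which is the choice under which the discrete penalty matches the continuum TV of a piecewise constant function.

$$
\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^n} \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2 + \lambda \sum_{\{i,j\} \in E} w_{ij} \, |\theta_i - \theta_j|, \qquad w_{ij} = \mathcal{H}^{d-1}(\partial V_i \cap \partial V_j).
$$

With this weighting, a function $f = \sum_{i} \theta_i 1_{V_i}$ that is constant on each cell jumps by $|\theta_i - \theta_j|$ across each shared face, so its total variation is $\sum_{\{i,j\} \in E} w_{ij} |\theta_i - \theta_j|$. In $d = 1$ the cells are intervals, each shared boundary is a single point of measure 1, and the penalty reduces to the sum of $|\theta_{(k+1)} - \theta_{(k)}|$ over the sorted design points.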


A Hypergraph Neural Network Framework for Learning Hyperedge-Dependent Node Embeddings

Ryan Aponte , Ryan A. Rossi , Shunan Guo , Jane Hoffswell , Nedim Lipka , Chang Xiao , Gromit Chan , Eunyee Koh , Nesreen Ahmed

Category: Machine Learning

2022-12-28

In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedge-dependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.
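To make "hyperedge-dependent node embeddings" concrete, the following minimal sketch shows only the data structure: one embedding per hyperedge, and one embedding per (node, hyperedge) incidence. It is not the HNN architecture and involves no learning. The function names, the mean-pooling rule, and the toy features are assumptions chosen for illustration.

```python
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hyperedge_dependent_embeddings(hyperedges, features):
    """Build toy embeddings for a hypergraph (illustrative rule, not HNN).

    hyperedges: dict mapping hyperedge id -> list of member node ids
    features:   dict mapping node id -> feature vector (list of floats)
    Returns (edge_emb, node_emb), where node_emb[v][e] is the embedding of
    node v in the context of hyperedge e.
    """
    # Hyperedge embedding: mean of the features of its member nodes.
    edge_emb = {e: mean([features[v] for v in nodes])
                for e, nodes in hyperedges.items()}

    # Node embedding per incidence: mix the node's own feature with the
    # embedding of the specific hyperedge it participates in.
    node_emb = defaultdict(dict)
    for e, nodes in hyperedges.items():
        for v in nodes:
            node_emb[v][e] = mean([features[v], edge_emb[e]])
    return edge_emb, dict(node_emb)


if __name__ == "__main__":
    hyperedges = {"e1": ["a", "b"], "e2": ["b", "c"]}
    features = {"a": [0.0, 0.0], "b": [2.0, 2.0], "c": [4.0, 4.0]}
    edge_emb, node_emb = hyperedge_dependent_embeddings(hyperedges, features)
    print(node_emb["b"])  # node "b" has a different embedding per hyperedge
```

Node `b` belongs to two hyperedges and therefore receives two distinct embeddings, whereas a conventional single-embedding model would assign it only one.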


PersonaSAGE: A Multi-Persona Graph Neural Network

Gautam Choudhary , Iftikhar Ahamath Burhanuddin , Eunyee Koh , Fan Du , Ryan A. Rossi

Category: Machine Learning

2022-12-28

Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, even though a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible, and its general design makes the learned embeddings widely applicable across domains. We use publicly available benchmark datasets to evaluate our approach against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE for a variety of important tasks, including link prediction, where we achieve an average gain of 15%, while remaining competitive for node classification. Finally, we also demonstrate the utility of PersonaSAGE with a case study on personalized recommendation of different entity types in a data management platform.


Deep Learning for Space Weather Prediction: Bridging the Gap between Heliophysics Data and Theory

John C. Dorelli , Chris Bard , Thomas Y. Chen , Daniel Da Silva , Luiz Fernando Guides dos Santos , Jack Ireland , Michael Kirk , Ryan McGranaghan , Ayris Narock , Teresa Nieves-Chinchilla

Category: Machine Learning

2022-12-27

Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained from data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics community to take advantage of these advances.


Graph Learning with Localized Neighborhood Fairness

April Chen , Ryan Rossi , Nedim Lipka , Jane Hoffswell , Gromit Chan , Shunan Guo , Eunyee Koh , Sungchul Kim , Nesreen K. Ahmed

Category: Machine Learning

2022-12-22

Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level by modifying either the graph structure or the objective function, without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that the notion of neighborhood fairness is more appropriate since GNN-based models operate at the local neighborhood level of a node. Our neighborhood fairness framework has two main components that are flexible for learning fair graph representations from arbitrary data: the first aims to construct fair neighborhoods for any node in a graph, and the second enables adaptation of these fair neighborhoods to better capture certain application or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework for a variety of graph machine learning tasks including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach not only achieves better fairness but also increases accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.


Ontologically Faithful Generation of Non-Player Character Dialogues

Nathaniel Weir , Ryan Thomas , Randolph D'Amore , Kellie Hill , Benjamin Van Durme , Harsh Jhamtani

Category: Natural Language Processing

2022-12-20

We introduce a language generation task grounded in a popular video game environment. KNUDGE (KNowledge Constrained User-NPC Dialogue GEneration) involves generating dialogue trees conditioned on an ontology captured in natural language passages providing quest and entity specifications. KNUDGE is constructed from side quest dialogues drawn directly from game data of Obsidian Entertainment's The Outer Worlds, leading to real-world complexities in generation: (1) dialogues are branching trees as opposed to linear chains of utterances; (2) utterances must remain faithful to the game lore--character personas, backstories, and entity relationships; and (3) a dialogue must accurately reveal new quest-related details to the human player. We report results for supervised and in-context learning techniques, finding there is significant room for future work on creating realistic game-quality dialogues.


A Measure-Theoretic Characterization of Tight Language Models

Li Du , Lucas Torroba Hennigen , Tiago Pimentel , Clara Meister , Jason Eisner , Ryan Cotterell

Category: Natural Language Processing

2022-12-20

Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can "leak" onto the set of infinite sequences. To characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
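The idea of probability mass leaking onto infinite sequences can be illustrated with a toy calculation. The sketch below is an illustration constructed for these notes, not taken from the paper. It assumes a hypothetical model whose probability of emitting the end-of-sequence (EOS) symbol depends only on the step index `t`; the names `terminated_mass`, `constant_eos`, and `decaying_eos` are likewise assumptions. Under that assumption, the mass assigned to finite strings of length at most `T` is one minus the probability of surviving `T` steps without EOS.

```python
def terminated_mass(eos_prob, max_len):
    """Probability that EOS has been emitted within max_len steps.

    eos_prob(t) is the (hypothetical) probability of emitting EOS at step t,
    given that no EOS was emitted before step t.
    """
    survive = 1.0  # probability that no EOS has been emitted so far
    for t in range(1, max_len + 1):
        survive *= 1.0 - eos_prob(t)
    return 1.0 - survive


def constant_eos(t):
    """EOS probability that stays fixed at 0.1: a tight toy model."""
    return 0.1


def decaying_eos(t):
    """EOS probability 1/(t+1)^2, shrinking fast enough to leak mass."""
    return 1.0 / (t + 1) ** 2


if __name__ == "__main__":
    for T in (10, 100, 10_000):
        print(T, terminated_mass(constant_eos, T), terminated_mass(decaying_eos, T))
```

With a constant EOS probability, the survival probability $0.9^T$ vanishes, so all mass ends up on finite strings. With EOS probability $1/(t+1)^2$, the survival product telescopes to $(T+2)/(2(T+1))$, which tends to $1/2$, so in the limit half of the probability mass sits on infinite sequences and this toy model is not tight.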
