Smart Paper Notes

Statistical Shape Modeling of Biventricular Anatomy with Shared Boundaries

Krithika Iyer, Alan Morris, Brian Zenger, Karthik Karnath, Benjamin A Orkild, Oleksandre Korshak, Shireen Elhabian

Category: Computer Vision

2022-09-06

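Statistical shape modeling (SSM) is a valuable and powerful tool for generating a detailed representation of complex anatomy, enabling quantitative analysis and comparison of shapes and their variations. SSM applies mathematics, statistics, and computing to parse a shape into a quantitative representation (such as correspondence points or landmarks) that helps answer questions about anatomical variation across a population. Complex anatomical structures have many diverse parts with varying interactions or intricate architecture. For example, the heart is a four-chambered anatomy with several shared boundaries between chambers. Coordinated and efficient contraction of the heart's chambers is necessary to adequately perfuse end organs throughout the body. Subtle shape changes within these shared boundaries can indicate pathological changes that lead to uncoordinated contraction and poor end-organ perfusion. Early detection and robust quantification could provide insight into ideal treatment techniques and intervention timing. However, existing SSM approaches fall short of explicitly modeling the statistics of shared boundaries. This paper presents a general and flexible data-driven approach for building statistical shape models of multi-organ anatomies with shared boundaries, capturing morphological and alignment changes of the individual anatomies and of their shared boundary surfaces across the population. We demonstrate the effectiveness of the proposed method on a biventricular heart dataset by developing shape models that consistently parameterize the cardiac biventricular structure and the interventricular septum (the shared boundary surface) across the population data.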


RENs: Relevance Encoding Networks

Krithika Iyer, Riddhish Bhalodia, Shireen Elhabian

Category: Machine Learning

2022-05-25

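The manifold assumption for high-dimensional data holds that the data is generated by varying a set of parameters drawn from a low-dimensional latent space. Deep generative models (DGMs) are widely used to learn data representations in an unsupervised way. DGMs parameterize the underlying low-dimensional manifold in the data space using bottleneck architectures such as variational autoencoders (VAEs). The bottleneck dimension of a VAE is treated as a dataset-dependent hyperparameter and is fixed at design time after extensive tuning. Because the intrinsic dimensionality of most real-world datasets is unknown, there is often a mismatch between the intrinsic dimensionality and the latent dimensionality chosen as a hyperparameter. This mismatch can hurt model performance on representation learning and sample generation tasks. This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses an automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality. The relevance of each latent dimension is learned directly from the data, along with the other model parameters, using stochastic gradient descent and a reparameterization trick adapted to non-Gaussian priors. We leverage the concept of DeepSets to capture permutation-invariant statistical properties in both the data and latent spaces for relevance determination. The proposed framework is general and flexible and can be used with state-of-the-art VAE models that rely on regularizers to impose specific characteristics on the latent space (e.g., disentanglement). Through extensive experiments on synthetic and public image datasets, we show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.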


OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura

Category: Natural Language Processing

2022-12-22

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.


Decoding surface codes with deep reinforcement learning and probabilistic policy reuse

Elisha Siddiqui Matekole, Esther Ye, Ramya Iyer, Samuel Yen-Chi Chen

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-12-22

Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. Many theoretical studies have proposed various types of QEC codes; one notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit control gates and having a large error threshold, make it a leading candidate for scalable quantum computation. Recently, machine learning (ML) techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have made some progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
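
The two ingredients named in the abstract, double deep Q-learning and probabilistic policy reuse, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the reuse probability `psi`, and the toy Q-values are assumptions made for illustration, and the Q-values are plain lists rather than neural network outputs.

```python
import random
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Bootstrap target for one transition under double Q-learning.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


def reuse_action(
    past_action: int,
    q_values: Sequence[float],
    psi: float,
    epsilon: float,
    rng: random.Random,
) -> int:
    """Policy-reuse exploration (hypothetical helper).

    With probability psi, follow the action suggested by a previously
    trained policy; otherwise act epsilon-greedily on the current Q-values.
    """
    if rng.random() < psi:
        return past_action
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return argmax(q_values)
```

In a continual-learning setup, `psi` would typically be decayed during an episode so that the agent relies on the old policy early and on its newly learned Q-values later; that schedule is a design choice not specified in the abstract.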


Maximal Initial Learning Rates in Deep ReLU Networks

Gaurav Iyer, Boris Hanin, David Rolnick

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-14

Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ - the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ demonstrates different behavior to the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.
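
The claim that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$ corresponds to a straight line in log-log space. The sketch below shows how such a power law could be fitted with ordinary least squares. The `(depth, width)` pairs, the constant `3.0`, and the exponent `-0.5` are arbitrary placeholders used to generate synthetic data; they are not values reported in the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = c * x**alpha in log-log space.

    Returns (c, alpha).
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


# Synthetic example: made-up network sizes and learning rates that follow
# an exact power law, so the fit should recover the placeholder parameters.
sizes = [d * w for d, w in [(2, 64), (4, 64), (4, 128), (8, 256)]]
etas = [3.0 * s ** -0.5 for s in sizes]
c, alpha = fit_power_law(sizes, etas)
```

With measured values of $\eta^{\ast}$ in place of the synthetic `etas`, the fitted `alpha` would be the empirical exponent.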


Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer

Category: Natural Language Processing

2022-12-08

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work, we analyze the factors that contribute to this variance and establish a new empirical hypothesis: the performance of a prompt is coupled with the extent to which the model is familiar with the language it contains. Over a wide range of tasks, we show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task. As a result, we devise a method for creating prompts: (1) automatically extend a small seed set of manually written prompts by paraphrasing using GPT3 and backtranslation and (2) choose the lowest perplexity prompts to get significant gains in performance.
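
Step (2) of the method, choosing the lowest-perplexity prompts, can be sketched as follows. The sketch assumes that per-token natural-log probabilities for each candidate prompt have already been obtained from a language model; the function names and the candidate dictionary format are hypothetical.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp of the negative mean log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_lowest_perplexity(
    candidates: Dict[str, List[float]], k: int = 1
) -> List[str]:
    """Return the k prompts whose token log-probs give the lowest perplexity."""
    ranked = sorted(candidates, key=lambda prompt: perplexity(candidates[prompt]))
    return ranked[:k]
```

Usage would look like `pick_lowest_perplexity({"Prompt A": logprobs_a, "Prompt B": logprobs_b}, k=1)`, where the log-prob lists come from whichever model will later be prompted.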


Fully Bayesian inference for latent variable Gaussian process models

Suraj Yerramilli, Akshay Iyer, Wei Chen, Daniel W. Apley

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-04

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs) and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters, through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling, which have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.
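
The LVGP idea of mapping each qualitative level to latent variables and then applying a standard covariance function can be sketched with a squared-exponential kernel. Everything specific here is an assumption for illustration: the level names, the 2-D latent coordinates, and the single shared lengthscale. In an actual LVGP the latent coordinates are estimated (by maximum likelihood in the plug-in approach, or given a posterior in the fully Bayesian one) rather than fixed by hand.

```python
import math
from typing import Dict, Sequence, Tuple

LatentMap = Dict[str, Tuple[float, ...]]


def lvgp_kernel(
    x1: Sequence[float],
    level1: str,
    x2: Sequence[float],
    level2: str,
    latent: LatentMap,
    lengthscale: float = 1.0,
) -> float:
    """Squared-exponential covariance over quantitative inputs plus the
    latent coordinates of the qualitative level."""
    z1 = tuple(x1) + latent[level1]
    z2 = tuple(x2) + latent[level2]
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


# Hypothetical latent coordinates for three levels of a "material" factor.
latent_map: LatentMap = {
    "steel": (0.0, 0.0),
    "aluminum": (0.3, 0.0),
    "polymer": (2.0, 1.0),
}

k_close = lvgp_kernel([1.0], "steel", [1.0], "aluminum", latent_map)
k_far = lvgp_kernel([1.0], "steel", [1.0], "polymer", latent_map)
```

Levels that sit close together in the latent space are treated as highly correlated, which is what makes the latent coordinates useful for visualizing the effect of a qualitative input.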


Low-Stabilizer-Complexity Quantum States Are Not Pseudorandom

Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang

Category: Machine Learning

2022-09-29

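We show that quantum states with "low stabilizer complexity" can be efficiently distinguished from Haar-random states. Specifically, given an $n$-qubit pure state $|\psi\rangle$, we give an efficient algorithm that distinguishes whether $|\psi\rangle$ is (i) Haar-random or (ii) a state with stabilizer fidelity at least $\frac{1}{k}$ (i.e., it has fidelity at least $\frac{1}{k}$ with some stabilizer state), promised that one of these is the case. With black-box access to $|\psi\rangle$, our algorithm uses $O\!\left(k^{12} \log(1/\delta)\right)$ copies of $|\psi\rangle$ and $O\!\left(n k^{12} \log(1/\delta)\right)$ time to succeed with probability at least $1-\delta$, and, with access to a state preparation unitary for $|\psi\rangle$ (and its inverse), $O\!\left(k^{3} \log(1/\delta)\right)$ queries and $O\!\left(n k^{3} \log(1/\delta)\right)$ time suffice. As a corollary, we prove that $\omega(\log(n))$ $T$-gates are necessary for any Clifford+$T$ circuit to prepare computationally pseudorandom quantum states, a first-of-its-kind lower bound.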


Streaming Encoding Algorithms for Scalable Hyperdimensional Computing

Anthony Thomas, Behnam Khaleghi, Gopi Krishna Jha, Nageen Himayat, Ravi Iyer, Nilesh Jain, Tajana Rosing

Categories: Machine Learning | Neural and Evolutionary Computing

2022-09-20

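Hyperdimensional computing (HDC) is a paradigm for data representation and learning that originated in computational neuroscience. HDC represents data as high-dimensional, low-precision vectors that can be used for a variety of information processing tasks such as learning or recall. The mapping to high-dimensional space is a fundamental problem in HDC, and existing methods encounter scalability issues when the input data itself is high-dimensional. In this work, we explore a family of streaming encoding techniques based on hashing. We show formally that these methods enjoy comparable guarantees on performance for learning applications while being substantially more efficient than existing alternatives. We validate these results experimentally on a popular high-dimensional classification problem and show that our approach scales easily to very large datasets.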
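
A hashing-based streaming encoder can be illustrated with signed feature hashing, a standard technique that is used here as a stand-in and is not necessarily the scheme analyzed in the paper. The sketch assumes sparse input arriving as `(feature_name, value)` pairs; the function names, the salts, and the default dimension are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def _hash(feature: str, salt: bytes) -> int:
    """Deterministic 64-bit hash of a feature name under a given salt."""
    digest = hashlib.blake2b(
        feature.encode("utf-8"), digest_size=8, salt=salt
    ).digest()
    return int.from_bytes(digest, "big")


def encode_stream(
    features: Iterable[Tuple[str, float]], dim: int = 1024
) -> List[int]:
    """Encode a stream of (feature, value) pairs into a ternary vector.

    Each feature is hashed to one coordinate and one sign, so no codebook
    over the input vocabulary has to be stored or known in advance.
    """
    acc = [0.0] * dim
    for name, value in features:
        index = _hash(name, b"index") % dim
        sign = 1.0 if _hash(name, b"sign") & 1 else -1.0
        acc[index] += sign * value
    # Quantize to low precision: keep only the sign of each coordinate.
    return [1 if a > 0 else -1 if a < 0 else 0 for a in acc]
```

Because each pair is processed once and only the accumulator is kept in memory, the cost grows with the number of nonzero features seen rather than with the size of the input vocabulary.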


Dual-Geometric Space Embedding Model for Two-View Knowledge Graphs

Roshni G. Iyer, Yunsheng Bai, Wei Wang, Yizhou Sun

Category: Artificial Intelligence

2022-09-19

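Two-view knowledge graphs (KGs) jointly represent two components: an ontology view for abstract and commonsense concepts, and an instance view for specific entities instantiated from ontological concepts. As such, these KGs contain heterogeneous structures that are hierarchical in the ontology view and cyclical in the instance view. Despite these differing structures, most recent work on KG embedding assumes that the entire KG belongs to only one of the two views, not both simultaneously. Work that does combine both views assumes that the instance and ontology views belong to the same geometric space, for example by embedding all nodes in the same Euclidean space or non-Euclidean product space. This assumption is no longer reasonable for two-view KGs, where different portions of the graph exhibit different structures. To address this issue, we define and construct a dual-geometric space embedding model (DGS) that models two-view KGs using a complex non-Euclidean geometric space, embedding different portions of the KG in different geometric spaces. DGS uses spherical space, hyperbolic space, and their intersecting space in a unified framework for learning embeddings. Furthermore, for the spherical space, we propose novel closed spherical space operators that operate directly in the spherical space without mapping to an approximate tangent space. Experiments on public datasets show that DGS significantly outperforms previous state-of-the-art baseline models on KG completion tasks, demonstrating its ability to better model heterogeneous structures in KGs.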
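
To make the contrast between the two geometries concrete, the sketch below computes the textbook geodesic distance on the unit sphere and in the Poincaré ball model of hyperbolic space. These are standard formulas chosen for illustration; the abstract does not specify DGS's own operators or which model of hyperbolic space it uses, so none of this should be read as the paper's method.

```python
import math
from typing import Sequence


def spherical_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Great-circle distance between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))


def poincare_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    norm_x = sum(a * a for a in x)
    norm_y = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - norm_x) * (1.0 - norm_y)))
```

Distances on the sphere are bounded, which suits cyclical structure, while distances in the Poincaré ball grow without bound toward the boundary, which leaves room for tree-like hierarchies.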
