Smart Paper Notes

PATO: Producibility-Aware Topology Optimization using Deep Learning for Metal Additive Manufacturing

Naresh S. Iyer , Amir M. Mirzendehdel , Sathyanarayanan Raghavan , Yang Jiao , Erva Ulu , Morad Behandish , Saigopal Nelaturi , Dean M. Robinson

Categories: Artificial Intelligence | Machine Learning

2021-12-08

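In this paper, we propose PATO, a producibility-aware topology optimization (TO) framework that helps efficiently explore the design space of components fabricated using metal additive manufacturing (AM) while ensuring manufacturability with respect to cracking. Specifically, parts fabricated through laser powder bed fusion are prone to defects such as warpage or cracking because of the high residual stresses generated by the steep thermal gradients of the build process. Maturing the design of such parts and planning their fabrication can span years, often involving multiple handoffs between design and manufacturing engineers. PATO is based on the a priori discovery of crack-free designs, so that the optimized part can be built defect-free at the outset. To ensure that the design is crack-free during optimization, producibility is explicitly encoded within the standard formulation of TO using a crack index. Multiple crack indices are explored and, using experimental validation, the maximum shear strain index (MSSI) is shown to be an accurate crack index. Simulating the build process is a coupled, multi-physics computation, and incorporating it in the TO loop can be computationally prohibitive. We leverage current advances in deep convolutional neural networks and present a high-fidelity surrogate model based on an attention-based U-Net architecture to predict the MSSI values as a spatially varying field over the part's domain. Further, we employ automatic differentiation to directly compute the gradient of the maximum MSSI with respect to the input design variables and augment it with the performance-based sensitivity field to optimize the design while considering the trade-off between weight, manufacturability, and functionality. We demonstrate the effectiveness of the proposed method through benchmark studies in 3D as well as experimental validation.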

translated by Google Translate

Justification-Based Reliability in Machine Learning

Nurali Virani , Naresh Iyer , Zhaoyuan Yang

Categories: Machine Learning | Machine Learning (Statistics)

2019-11-18

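With the advent of deep learning, the field of machine learning (ML) has surpassed human-level performance on diverse classification tasks. At the same time, there is a need to characterize and quantify the reliability of a model's prediction on individual samples. This is especially true when such models are applied in the safety-critical domains of industrial control and healthcare. To address this need, we link the question of the reliability of a model's individual prediction to the epistemic uncertainty of that prediction. More specifically, we extend the theory of Justified True Belief (JTB) in epistemology, created to study the validity and limits of human-acquired knowledge, towards characterizing the validity and limits of knowledge in supervised classifiers. We present an analysis of neural network classifiers that links the reliability of a prediction on an input to characteristics of the support gathered from the input and latent spaces of the network. We hypothesize that the JTB analysis exposes the epistemic uncertainty (or ignorance) of a model with respect to its inference, thereby allowing the inference to be only as strong as the justification permits. We explore various forms of support generated for an input (e.g., k-nearest neighbors (k-NN) and $\ell_p$-norm based), using the training data to construct a justification for the prediction on that input. Through experiments conducted on simulated and real datasets, we demonstrate that our approach can provide reliability for individual predictions and characterize regions where such reliability cannot be ascertained.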
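
The idea of k-NN support for a single prediction can be illustrated with a minimal sketch. This is not the authors' method: the agreement-fraction score, the distance cutoff `max_dist`, and the function name `knn_support` are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def knn_support(train_x, train_y, query, predicted_label, k=3, max_dist=1.0):
    """Return (support, justified) for a prediction on `query`.

    support:   fraction of the k nearest training points whose label
               matches `predicted_label` (an assumed, simplified score).
    justified: False when even the nearest training point is farther than
               `max_dist`, i.e. the training data offers no support here.
    """
    # Pair every training point's distance to the query with its label.
    dists = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = dists[:k]
    if not nearest or nearest[0][0] > max_dist:
        return 0.0, False
    votes = Counter(label for _, label in nearest)
    return votes[predicted_label] / len(nearest), True

# Toy data (hypothetical): two classes in the plane.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5)]
train_y = ["a", "a", "b", "b"]
near = knn_support(train_x, train_y, (0.1, 0.1), "a")
far = knn_support(train_x, train_y, (10, 10), "b")
```

In this toy setup the query far from all training points is flagged as unjustified, which mirrors the abstract's notion of regions where reliability cannot be ascertained.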

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Srinivasan Iyer , Xi Victoria Lin , Ramakanth Pasunuru , Todor Mihaylov , Daniel Simig , Ping Yu , Kurt Shuster , Tianlu Wang , Qing Liu , Punit Singh Koura

Categories: Natural Language Processing

2022-12-22

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
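
The three generalization levels named in the abstract can be sketched as a simple classification of evaluation examples. The function name, the set-based bookkeeping, and the task and category names below are hypothetical; the actual OPT-IML Bench tooling may organize splits differently.

```python
def generalization_level(task, category, train_tasks, train_categories):
    """Classify an evaluation example by how far it is from the training mix."""
    if category not in train_categories:
        # Nothing from this category was seen during instruction-tuning.
        return "task from a fully held-out category"
    if task not in train_tasks:
        # The category was seen, but this particular task was not.
        return "held-out task from a seen category"
    # The task was seen; only the concrete instance is new.
    return "held-out instance from a seen task"

# Hypothetical training mix.
train_tasks = {"squad", "boolq"}
train_categories = {"question answering"}

levels = [
    generalization_level("squad", "question answering", train_tasks, train_categories),
    generalization_level("triviaqa", "question answering", train_tasks, train_categories),
    generalization_level("wmt", "translation", train_tasks, train_categories),
]
```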

Decoding surface codes with deep reinforcement learning and probabilistic policy reuse

Elisha Siddiqui Matekole , Esther Ye , Ramya Iyer , Samuel Yen-Chi Chen

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-12-22

Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, known as noisy intermediate-scale quantum (NISQ) computers, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A significant number of theoretical studies have provided various types of QEC codes; one of the notable topological codes is the surface code, and its features, such as the requirement of only nearest-neighboring two-qubit control gates and a large error threshold, make it a leading candidate for scalable quantum computation. Recently developed machine learning (ML)-based techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made some progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
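
Probabilistic policy reuse can be pictured with the following minimal sketch of action selection. It uses a tabular Q dictionary rather than the paper's double deep Q-network, and the names `ppr_action` and `psi_schedule`, the geometric decay, and the toy states and actions are all assumptions for illustration.

```python
import random

def ppr_action(state, q_new, past_policy, actions, psi, epsilon, rng=random):
    """Choose an action for `state`.

    With probability `psi`, reuse the previously trained policy; otherwise
    act epsilon-greedily with respect to the Q-values being learned.
    """
    if rng.random() < psi:
        return past_policy(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Greedy choice; unseen (state, action) pairs default to a value of 0.
    return max(actions, key=lambda a: q_new.get((state, a), 0.0))

def psi_schedule(psi0, decay, steps):
    """Reuse probability per step, shrinking geometrically (assumed schedule)."""
    return [psi0 * decay ** t for t in range(steps)]

# Toy example (hypothetical states and actions).
actions = ["left", "right"]
q_new = {("s", "left"): 0.1, ("s", "right"): 0.9}
old_policy = lambda state: "left"

reused = ppr_action("s", q_new, old_policy, actions, psi=1.0, epsilon=0.0)
greedy = ppr_action("s", q_new, old_policy, actions, psi=0.0, epsilon=0.0)
```

Decaying `psi` over an episode shifts the agent from relying on the old policy towards the policy it is currently learning.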

Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective

Shihua Huang , Zhichao Lu , Kalyanmoy Deb , Vishnu Naresh Boddeti

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-21

Efforts to improve the adversarial robustness of convolutional neural networks have primarily focused on developing more effective adversarial training methods. In contrast, little attention has been devoted to analyzing the role of architectural elements (such as topology, depth, and width) in adversarial robustness. This paper seeks to bridge this gap and present a holistic study on the impact of architectural design on adversarial robustness. We focus on residual networks and consider architecture design at the block level, i.e., topology, kernel size, activation, and normalization, as well as at the network scaling level, i.e., depth and width of each block in the network. In both cases, we first derive insights through systematic ablative experiments. Then we design a robust residual block, dubbed RobustResBlock, and a compound scaling rule, dubbed RobustScaling, to distribute depth and width at the desired FLOP count. Finally, we combine RobustResBlock and RobustScaling and present a portfolio of adversarially robust residual networks, RobustResNets, spanning a broad spectrum of model capacities. Experimental validation across multiple datasets and adversarial attacks demonstrates that RobustResNets consistently outperform both the standard WRNs and other existing robust architectures, achieving state-of-the-art AutoAttack robust accuracy of 61.1% without additional data and 63.7% with 500K external data while being $2\times$ more compact in terms of parameters. Code is available at https://github.com/zhichao-lu/robust-residual-network

Maximal Initial Learning Rates in Deep ReLU Networks

Gaurav Iyer , Boris Hanin , David Rolnick

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-14

Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$, the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ behaves differently from the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.
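
A relationship of the form $\eta^{\ast} \approx c \cdot (\text{depth} \times \text{width})^{\alpha}$ is a straight line in log-log space, so it can be estimated by ordinary least squares. The sketch below shows one way to do that; the function name `fit_power_law` and the synthetic data (generated with $c = 2$ and $\alpha = -0.5$) are made up for illustration and are not values from the paper.

```python
import math

def fit_power_law(sizes, etas):
    """Fit eta = c * size**alpha by least squares on log(eta) vs. log(size).

    Returns (c, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in etas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                          # slope in log-log space
    c = math.exp(mean_y - alpha * mean_x)      # intercept mapped back
    return c, alpha

# Synthetic (depth * width) values and learning rates, for illustration only.
sizes = [10, 100, 1000]
etas = [2.0 * s ** -0.5 for s in sizes]
c, alpha = fit_power_law(sizes, etas)
```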

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen , Srini Iyer , Terra Blevins , Noah A. Smith , Luke Zettlemoyer

Categories: Natural Language Processing

2022-12-08

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work, we analyze the factors that contribute to this variance and establish a new empirical hypothesis: the performance of a prompt is coupled with the extent to which the model is familiar with the language it contains. Over a wide range of tasks, we show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task. As a result, we devise a method for creating prompts: (1) automatically extend a small seed set of manually written prompts by paraphrasing using GPT3 and backtranslation and (2) choose the lowest perplexity prompts to get significant gains in performance.
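
Step (2) of the method, choosing the lowest-perplexity prompts, can be sketched as follows. The sketch assumes per-token natural-log probabilities for each candidate prompt are already available from some language model; the function names and the numbers are hypothetical.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_lowest_perplexity(prompt_logprobs, top_k=1):
    """Return the `top_k` prompts with the lowest perplexity, best first."""
    ranked = sorted(prompt_logprobs, key=lambda p: perplexity(prompt_logprobs[p]))
    return ranked[:top_k]

# Hypothetical log-probabilities for two paraphrases of the same prompt.
candidates = {
    "Is this review positive or negative?": [math.log(0.5), math.log(0.5)],
    "Review sentiment polarity determine:": [math.log(0.1), math.log(0.2)],
}
best = pick_lowest_perplexity(candidates)
```

In practice the candidate set would come from step (1), the paraphrasing and backtranslation of a small seed set of manually written prompts.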

Physics Informed Neural Network for Dynamic Stress Prediction

Hamed Bolandi , Gautam Sreekumar , Xuyang Li , Nizar Lajnef , Vishnu Naresh Boddeti

Categories: Machine Learning

2022-11-28

Structural failures are often caused by catastrophic events such as earthquakes and winds. As a result, it is crucial to predict dynamic stress distributions during highly disruptive events in real time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high complexity. Therefore, to reduce computational cost while maintaining accuracy, a Physics Informed Neural Network (PINN) model, PINN-Stress, is proposed to predict the entire sequence of stress distribution based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into a deep neural network's loss function to incorporate information from measurements and PDEs. The PINN-Stress model can predict the sequence of stress distribution in almost real time and can generalize better than the model without PINN.

Fully Bayesian inference for latent variable Gaussian process models

Suraj Yerramilli , Akshay Iyer , Wei Chen , Daniel W. Apley

Categories: Machine Learning (Statistics) | Machine Learning

2022-11-04

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs), and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling that have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.
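
The core LVGP idea, mapping each qualitative level to a point in a latent space and then applying a standard covariance function, can be sketched in a few lines. The sketch assumes a squared-exponential kernel, a two-dimensional latent space, and hand-picked latent coordinates; in the approaches described above the latent variables are estimated (by maximum likelihood or fully Bayesian inference), not fixed by hand, and the material names are hypothetical.

```python
import math

def lvgp_kernel(x1, level1, x2, level2, latents, lengthscale=1.0):
    """Squared-exponential covariance over numeric inputs plus latent coordinates.

    x1, x2:         tuples of numeric inputs.
    level1, level2: qualitative levels, looked up in `latents`.
    latents:        dict mapping each level to its latent coordinates.
    """
    z1, z2 = latents[level1], latents[level2]
    sq_numeric = sum((a - b) ** 2 for a, b in zip(x1, x2)) / lengthscale ** 2
    sq_latent = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-0.5 * (sq_numeric + sq_latent))

# Hand-picked latent positions for a hypothetical "material" factor.
latents = {"steel": (0.0, 0.0), "aluminum": (0.1, 0.0), "rubber": (2.0, 1.0)}

same = lvgp_kernel((1.0,), "steel", (1.0,), "steel", latents)
close = lvgp_kernel((1.0,), "steel", (1.0,), "aluminum", latents)
distant = lvgp_kernel((1.0,), "steel", (1.0,), "rubber", latents)
```

Levels placed close together in the latent space produce strongly correlated responses, which is what makes the latent positions useful for visualizing the effects of qualitative inputs.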

Low-Stabilizer-Complexity Quantum States Are Not Pseudorandom

Sabee Grewal , Vishnu Iyer , William Kretschmer , Daniel Liang

Categories: Machine Learning

2022-09-29

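We show that quantum states with "low stabilizer complexity" can be efficiently distinguished from Haar-random states. Specifically, given an $n$-qubit pure state $|\psi\rangle$, we give an efficient algorithm that distinguishes whether $|\psi\rangle$ is (i) Haar-random or (ii) a state with stabilizer fidelity at least $\frac{1}{k}$ (i.e., it has fidelity at least $\frac{1}{k}$ with some stabilizer state), promised that one of these is the case. With black-box access to $|\psi\rangle$, our algorithm uses $O\!\left(k^{12} \log(1/\delta)\right)$ copies of $|\psi\rangle$ and $O\!\left(n k^{12} \log(1/\delta)\right)$ time to succeed with probability at least $1-\delta$, and, with access to a state preparation unitary for $|\psi\rangle$ (and its inverse), $O\!\left(k^{3} \log(1/\delta)\right)$ queries and $O\!\left(n k^{3} \log(1/\delta)\right)$ time suffice. As a corollary, we prove that $\omega(\log(n))$ $T$-gates are necessary for any Clifford+$T$ circuit to prepare computationally pseudorandom quantum states, a first-of-its-kind lower bound.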
