关键应用程序中机器学习(ML)组件的集成引入了软件认证和验证的新挑战。正在开发新的安全标准和技术准则,以支持基于ML的系统的安全性,例如ISO 21448 SOTIF用于汽车域名,并保证机器学习用于自主系统(AMLAS)框架。 SOTIF和AMLA提供了高级指导,但对于每个特定情况,必须将细节凿出来。我们启动了一个研究项目,目的是证明开放汽车系统中ML组件的完整安全案例。本文报告说,Smikk的安全保证合作是由行业级别的行业合作的,这是一个基于ML的行人自动紧急制动示威者,在行业级模拟器中运行。我们演示了AMLA在伪装上的应用,以在简约的操作设计域中,即,我们为其基于ML的集成组件共享一个完整的安全案例。最后,我们报告了经验教训,并在开源许可下为研究界重新使用的开源许可提供了傻笑和安全案例。
translated by 谷歌翻译
The advances in Artificial Intelligence are creating new opportunities to improve lives of people around the world, from business to healthcare, from lifestyle to education. For example, some systems profile the users using their demographic and behavioral characteristics to make certain domain-specific predictions. Often, such predictions impact the life of the user directly or indirectly (e.g., loan disbursement, determining insurance coverage, shortlisting applications, etc.). As a result, the concerns over such AI-enabled systems are also increasing. To address these concerns, such systems are mandated to be responsible i.e., transparent, fair, and explainable to developers and end-users. In this paper, we present ComplAI, a unique framework to enable, observe, analyze and quantify explainability, robustness, performance, fairness, and model behavior in drift scenarios, and to provide a single Trust Factor that evaluates different supervised Machine Learning models not just from their ability to make correct predictions but from overall responsibility perspective. The framework helps users to (a) connect their models and enable explanations, (b) assess and visualize different aspects of the model, such as robustness, drift susceptibility, and fairness, and (c) compare different models (from different model families or obtained through different hyperparameter settings) from an overall perspective thereby facilitating actionable recourse for improvement of the models. It is model agnostic and works with different supervised machine learning scenarios (i.e., Binary Classification, Multi-class Classification, and Regression) and frameworks. It can be seamlessly integrated with any ML life-cycle framework. Thus, this already deployed framework aims to unify critical aspects of Responsible AI systems for regulating the development process of such real systems.
translated by 谷歌翻译
Text-to-text generation models have increasingly become the go-to solution for a wide variety of sequence labeling tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-$k$ predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
translated by 谷歌翻译
Graph neural networks (GNNs) have recently emerged as a promising learning paradigm in learning graph-structured data and have demonstrated wide success across various domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed in sophisticated modern hardware systems, as well as dedicated accelerators. However, despite the popularity of GNNs and the recent efforts of bringing GNNs to hardware, the fault tolerance and resilience of GNNs has generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale and empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments to various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude with respect to different models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNN to enhance its resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization.
translated by 谷歌翻译
We consider the problem of learning the structure underlying a Gaussian graphical model when the variables (or subsets thereof) are corrupted by independent noise. A recent line of work establishes that even for tree-structured graphical models, only partial structure recovery is possible and goes on to devise algorithms to identify the structure up to an (unavoidable) equivalence class of trees. We extend these results beyond trees and consider the model selection problem under noise for non tree-structured graphs, as tree graphs cannot model several real-world scenarios. Although unidentifiable, we show that, like the tree-structured graphs, the ambiguity is limited to an equivalence class. This limited ambiguity can help provide meaningful clustering information (even with noise), which is helpful in computer and social networks, protein-protein interaction networks, and power networks. Furthermore, we devise an algorithm based on a novel ancestral testing method for recovering the equivalence class. We complement these results with finite sample guarantees for the algorithm in the high-dimensional regime.
translated by 谷歌翻译
Recently, Robey et al. propose a notion of probabilistic robustness, which, at a high-level, requires a classifier to be robust to most but not all perturbations. They show that for certain hypothesis classes where proper learning under worst-case robustness is \textit{not} possible, proper learning under probabilistic robustness \textit{is} possible with sample complexity exponentially smaller than in the worst-case robustness setting. This motivates the question of whether proper learning under probabilistic robustness is always possible. In this paper, we show that this is \textit{not} the case. We exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are \textit{not} probabilistically robustly PAC learnable with \textit{any} proper learning rule. However, if we compare the output of the learner to the best hypothesis for a slightly \textit{stronger} level of probabilistic robustness, we show that not only is proper learning \textit{always} possible, but it is possible via empirical risk minimization.
translated by 谷歌翻译
Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increase, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics possibly from similar tasks is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) to identify fat droplets in liver cells (FatChecker; FC) and (2) the identification of kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.
translated by 谷歌翻译
Long-range context modeling is crucial to both dialogue understanding and generation. The most popular method for dialogue context representation is to concatenate the last-$k$ previous utterances. However, this method may not be ideal for conversations containing long-range dependencies. In this work, we propose DialoGX, a novel encoder-decoder based framework for conversational response generation with a generalized and explainable context representation that can look beyond the last-$k$ utterances. Hence the method is adaptive to conversations with long-range dependencies. The main idea of our approach is to identify and utilize the most relevant historical utterances instead of the last-$k$ utterances in chronological order. We study the effectiveness of our proposed method on both dialogue generation (open-domain) and understanding (DST) tasks. DialoGX achieves comparable performance with the state-of-the-art models on DailyDialog dataset. We also observe performance gain in existing DST models with our proposed context representation strategy on MultiWOZ dataset. We justify our context representation through the lens of psycholinguistics and show that the relevance score of previous utterances agrees well with human cognition which makes DialoGX explainable as well.
translated by 谷歌翻译
顺序标记是一项基本的NLP任务,构成了许多应用程序的骨干。对SEQ2SEQ模型的监督学习(如T5)在这些问题上取得了巨大的成功。但是,这些模型的培训目标与我们在实际应用中关心的指标和Desiderata之间存在显着脱节。例如,实用的序列标记应用程序可能需要优化某些Precision-Recall折衷(TOP-K预测),这与最大化金标记序列的可能性的标准目标完全不同。因此,为了弥合这一差距,我们提出了Groot,这是一个简单而有效的框架,用于生成文本序列的奖励优化。 Groot通过训练生成的顺序标记模型来工作,以将解码器输出分布与(Black-Box)奖励函数的输出分布相匹配。使用迭代培训制度,我们首先生成预测候选者,然后纠正其中的错误,最后对比这些候选者(基于其奖励价值)。正如通过四个公共基准测试的广泛实验所证明的那样,Groot显着改善了所有奖励指标。此外,Groot还导致了整体解码器分布的改善,这是由顶级$ K $候选者的质量提高所证明的。
translated by 谷歌翻译
检索演示的生成模型比独立语言模型提供了许多好处:除了对给定查询的文字答案外,它们还提供了从可更新知识库中检索到的出处项目。但是,它们也是更复杂的系统,需要处理长输入。在这项工作中,我们介绍了FID Light,以强烈提高最先进的检索功能模型的效率,同时保持相同的有效性。我们的FID光模型将信息流从编码器(分别编码段落)限制为解码器(使用串联编码表示)。此外,我们通过文本源指针通过重新排列的功能调整FID光,以提高排名最高的出处精度。我们对七个知识密集任务(KILT)的各种实验表明,FID光线始终改善了查询潜伏期和有效性之间的帕累托前沿。带有源指向的FID光设置为六个苏格兰短裙任务的新最新结果,用于合并文本生成和出处检索评估,同时保持合理的效率。
translated by 谷歌翻译