智能论文笔记

Leveraging Natural Language Processing to Augment Structured Social Determinants of Health Data in the Electronic Health Record

Kevin Lybarger , Nicholas J Dobbins , Ritche Long , Angad Singh , Patrick Wedgeworth , Ozlem Ozuner , Meliha Yetisgen

分类：自然语言处理

2022-12-14

Objective: Social Determinants of Health (SDOH) influence personal health outcomes and health systems interactions. Health systems capture SDOH information through structured data and unstructured clinical notes; however, clinical notes often contain a more comprehensive representation of several key SDOH. The objective of this work is to assess the SDOH information gain achievable by extracting structured semantic representations of SDOH from the clinical narrative and combining these extracted representations with available structured data. Materials and Methods: We developed a natural language processing (NLP) information extraction model for SDOH that utilizes a deep learning entity and relation extraction architecture. In an electronic health record (EHR) case study, we applied the SDOH extractor to a large existing clinical data set with over 200,000 patients and 400,000 notes and compared the extracted information with available structured data. Results: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found 19\% of current tobacco users, 10\% of drug users, and 32\% of homeless patients only include documentation of these risk factors in the clinical narrative. Conclusions: Patients who are at-risk for negative health outcomes due to SDOH may be better served if health systems are able to identify SDOH risk factors and associated social needs. Structured semantic representations of text-encoded SDOH information can augment existing structured, and this more comprehensive SDOH representation can assist health systems in identifying and addressing social needs.

translated by 谷歌翻译

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Nicholas J Dobbins , Tony Mullen , Ozlem Uzuner , Meliha Yetisgen

分类：自然语言处理

2022-07-27

根据诸如医疗条件，程序和药物使用之类的资格标准，识别患者队列对于临床试验的招募至关重要。这种标准通常是在自由文本中最自然地描述的，使用临床医生和研究人员熟悉的语言。为了大规模识别潜在参与者，必须首先将这些标准转换为临床数据库的查询，这可能是劳动密集型且容易出错的。自然语言处理（NLP）方法提供了一种可能自动转换为数据库查询的潜在手段。但是，必须首先使用Corpora对其进行培训和评估，该语料库详细列出临床试验标准。在本文中，我们介绍了叶片临床试验（LCT）语料库，该语料库是一种使用高度颗粒状结构化标签，捕获一系列生物医学现象的人类向超过1000个临床试验资格标准描述。我们提供了我们的模式，注释过程，语料库质量和统计数据的详细信息。此外，我们提出了该语料库的基线信息提取结果，作为未来工作的基准。

translated by 谷歌翻译

The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

Yao Yan , Thomas Yu , Kathleen Muenzen , Sijia Liu , Connor Boyle , George Koslowski , Jiaxin Zheng , Nicholas Dobbins , Clement Essien , Hongfang Liu

分类：自然语言处理 | 人工智能

2022-06-28

目的是对临床文本去识别的自然语言处理（NLP）模型的评估取决于临床注释的可用性，临床注释通常由于隐私问题而受到限制。 NLP沙盒是一种通过采用联合模型到数据的方法来减轻NLP模型缺乏数据和评估框架的方法。这使得无偏见的联合模型评估无需共享多个机构的敏感数据。材料和方法我们利用Synapse协作框架，容器化软件和OpenAPI Generator来构建NLP沙盒（NLPSANDBOX.IO）。我们使用来自三个机构的数据评估了两个最先进的NLP去识别注释模型Philter和Neuroner。我们使用来自外部验证站点的数据进一步验证了模型性能。结果我们通过去识别临床模型评估证明了NLP沙箱的有用性。外部开发人员能够将其模型纳入NLP沙盒模板中，并提供用户体验反馈。讨论我们证明了使用NLP沙箱对临床文本去识别模型进行多站点评估的可行性，而无需共享数据。标准化模型和数据模式可以使模型传输和实现平稳。为了概括NLP沙箱，数据所有者和模型开发人员需要进行工作，以开发合适和标准化的模式，并调整其数据或模型以适合模式。结论NLP沙箱降低了利用临床数据进行NLP模型评估的障碍，并促进了联合会的NLP模型的联合，多站点，无偏见的评估。

translated by 谷歌翻译

Machine Learning and Thermography Applied to the Detection and Classification of Cracks in Building

Angela Busheska , Nara Almeida , Nicholas Sabella , Eudes de A. Rocha

分类：计算机视觉 | 机器学习

2022-12-30

Due to the environmental impacts caused by the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, a legitimate concern of land developers is associated with the buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize these buildings' state of conservation by detecting pathologies, such as cracks and humidity. Thermal cameras detect the radiation emitted by any material and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies, however, reading thermal images might not be quite simple. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this particular phase of this research project, we've used an image classification machine learning model of Convolutional Neural Networks (DCNN) to differentiate three levels of cracks in one particular building. The model's accuracy was compared between the MSX and thermal images acquired from two distinct thermal cameras and fused images (formed through multisource information) to test the influence of the input data and network on the detection results.

translated by 谷歌翻译

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data

Rupam Bhattacharyya , Nicholas Henderson , Veerabhadran Baladandayuthapani

分类： (统计)机器学习

2022-12-29

Rapid advancements in collection and dissemination of multi-platform molecular and genomics data has resulted in enormous opportunities to aggregate such data in order to understand, prevent, and treat human diseases. While significant improvements have been made in multi-omic data integration methods to discover biological markers and mechanisms underlying both prognosis and treatment, the precise cellular functions governing these complex mechanisms still need detailed and data-driven de-novo evaluations. We propose a framework called Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data (fiBAG), that allows simultaneous identification of upstream functional evidence of proteogenomic biomarkers and the incorporation of such knowledge in Bayesian variable selection models to improve signal detection. fiBAG employs a conflation of Gaussian process models to quantify (possibly non-linear) functional evidence via Bayes factors, which are then mapped to a novel calibrated spike-and-slab prior, thus guiding selection and providing functional relevance to the associations with patient outcomes. Using simulations, we illustrate how integrative methods with functional calibration have higher power to detect disease related markers than non-integrative approaches. We demonstrate the profitability of fiBAG via a pan-cancer analysis of 14 cancer types to identify and assess the cellular mechanisms of proteogenomic markers associated with cancer stemness and patient survival.

translated by 谷歌翻译

Publishing Efficient On-device Models Increases Adversarial Vulnerability

Sanghyun Hong , Nicholas Carlini , Alexey Kurakin

分类：机器学习

2022-12-28

Recent increases in the computational demands of deep neural networks (DNNs) have sparked interest in efficient deep learning mechanisms, e.g., quantization or pruning. These mechanisms enable the construction of a small, efficient version of commercial-scale models with comparable accuracy, accelerating their deployment to resource-constrained devices. In this paper, we study the security considerations of publishing on-device variants of large-scale models. We first show that an adversary can exploit on-device models to make attacking the large models easier. In evaluations across 19 DNNs, by exploiting the published on-device models as a transfer prior, the adversarial vulnerability of the original commercial-scale models increases by up to 100x. We then show that the vulnerability increases as the similarity between a full-scale and its efficient model increase. Based on the insights, we propose a defense, $similarity$-$unpairing$, that fine-tunes on-device models with the objective of reducing the similarity. We evaluated our defense on all the 19 DNNs and found that it reduces the transferability up to 90% and the number of queries required by a factor of 10-100x. Our results suggest that further research is needed on the security (or even privacy) threats caused by publishing those efficient siblings.

translated by 谷歌翻译

Teamwork under extreme uncertainty: AI for Pokemon ranks 33rd in the world

Nicholas R. Sarantinos

分类：人工智能

2022-12-27

The highest grossing media franchise of all times, with over \$90 billion in total revenue, is Pokemon. The video games belong to the class of Japanese Role Playing Games (J-RPG). Developing a powerful AI agent for these games is very hard because they present big challenges to MinMax, Monte Carlo Tree Search and statistical Machine Learning, as they are vastly different from the well explored in AI literature games. An AI agent for one of these games means significant progress in AI agents for the entire class. Further, the key principles of such work can hopefully inspire approaches to several domains that require excellent teamwork under conditions of extreme uncertainty, including managing a team of doctors, robots or employees in an ever changing environment, like a pandemic stricken region or a war-zone. In this paper we first explain the mechanics of the game and we perform a game analysis. We continue by proposing unique AI algorithms based on our understanding that the two biggest challenges in the game are keeping a balanced team and dealing with three sources of uncertainty. Later on, we describe why evaluating the performance of such agents is challenging and we present the results of our approach. Our AI agent performed significantly better than all previous attempts and peaked at the 33rd place in the world, in one of the most popular battle formats, while running on only 4 single socket servers.

translated by 谷歌翻译

Unsupervised Instance and Subnetwork Selection for Network Data

Lin Zhang , Nicholas Moskwa , Melinda Larsen , Petko Bogdanov

分类：机器学习

2022-12-24

Unlike tabular data, features in network data are interconnected within a domain-specific graph. Examples of this setting include gene expression overlaid on a protein interaction network (PPI) and user opinions in a social network. Network data is typically high-dimensional (large number of nodes) and often contains outlier snapshot instances and noise. In addition, it is often non-trivial and time-consuming to annotate instances with global labels (e.g., disease or normal). How can we jointly select discriminative subnetworks and representative instances for network data without supervision? We address these challenges within an unsupervised framework for joint subnetwork and instance selection in network data, called UISS, via a convex self-representation objective. Given an unlabeled network dataset, UISS identifies representative instances while ignoring outliers. It outperforms state-of-the-art baselines on both discriminative subnetwork selection and representative instance selection, achieving up to 10% accuracy improvement on all real-world data sets we use for evaluation. When employed for exploratory analysis in RNA-seq network samples from multiple studies it produces interpretable and informative summaries.

translated by 谷歌翻译

TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization

Fabrizio Guillaro , Davide Cozzolino , Avneesh Sud , Nicholas Dufour , Luisa Verdoliva

分类：计算机视觉

2022-12-21

In this paper we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture that combines the RGB image and a learned noise-sensitive fingerprint. The latter learns to embed the artifacts related to the camera internal and external processing by training only on real data in a self-supervised manner. Forgeries are detected as deviations from the expected regular pattern that characterizes each pristine image. Looking for anomalies makes the approach able to robustly detect a variety of local manipulations, ensuring generalization. In addition to a pixel-level localization map and a whole-image integrity score, our approach outputs a reliability map that highlights areas where localization predictions may be error-prone. This is particularly important in forensic applications in order to reduce false alarms and allow for a large scale analysis. Extensive experiments on several datasets show that our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works. Code will be publicly available at https://grip-unina.github.io/TruFor/

translated by 谷歌翻译

Analyzing Semantic Faithfulness of Language Models via Input Intervention on Conversational Question Answering

Akshay Chaturvedi , Swarnadeep Bhar , Soumadeep Saha , Utpal Garain , Nicholas Asher

分类：自然语言处理 | 机器学习

2022-12-21

Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large version, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior on answering questions about a story after performing two novel semantic interventions -- deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions for a significant number of cases (~50% for deletion intervention, and ~20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects for deletion intervention by a significant margin (from ~50% to ~6%). We analyze the inner-workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. But we show that this training does not attenuate other aspects of semantic unfaithfulness such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models do achieve very high performance on predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.

translated by 谷歌翻译