Smart Paper Notes

CQE in OWL 2 QL: A "Longest Honeymoon" Approach (extended version)

Piero Bonatti , Gianluca Cima , Domenico Lembo , Lorenzo Marconi , Riccardo Rosati , Luigi Sauro , Domenico Fabio Savo

Category: Artificial Intelligence

2022-07-22

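Controlled Query Evaluation (CQE) has recently been studied in the context of Semantic Web ontologies. The goal of CQE is to conceal some query answers so as to prevent external users from inferring confidential information. In general, there are multiple, mutually incomparable ways of concealing answers, and previous CQE approaches choose in advance which answers are visible and which are not. In this paper, instead, we study a dynamic CQE method: we propose to alter the answer to the current query based on the evaluation of previous ones. We aim for a system that, besides being able to protect confidential data, is maximally cooperative, which intuitively means that it answers affirmatively to as many queries as possible; it achieves this goal by delaying answer modifications as long as possible. We also show that this behavior cannot be intuitively simulated through a static approach that is independent of the query history. Interestingly, for OWL 2 QL ontologies and policies expressed through denials, query evaluation under our semantics is first-order rewritable, and thus in AC0 in data complexity. This paves the way for the development of practical algorithms, which we also discuss in a preliminary way in this paper.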

Translated by Google Translate

AI-based Data Preparation and Data Analytics in Healthcare: The Case of Diabetes

Marianna Maranghi , Aris Anagnostopoulos , Irene Cannistraci , Ioannis Chatzigiannakis , Federico Croce , Giulia Di Teodoro , Michele Gentile , Giorgio Grani , Maurizio Lenzerini , Stefano Leonardi

Category: Machine Learning

2022-06-13

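The Associazione Medici Diabetologi (AMD) collects and manages one of the largest collections of diabetic patient records in the world, known as the AMD database. This paper presents the preliminary results of an ongoing project that focuses on applying artificial intelligence and machine learning techniques to conceptualize, clean, and analyze this important and valuable dataset, with the goal of providing predictive insights that better support diabetologists in their diagnostic and therapeutic choices.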

The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

Francisco Valentini , Germán Rosati , Diego Fernandez Slezak , Edgar Altszyler

Categories: Natural Language Processing | Artificial Intelligence

2023-01-02

Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
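As a rough illustration of what a Pointwise Mutual Information based bias score can look like, the sketch below computes the difference between a word's PMI with a female context and its PMI with a male context, from raw co-occurrence counts. The function names, the argument layout, and the sign convention (positive means female-leaning) are assumptions made for this example; the abstract does not spell out the exact formulation used by the authors.

```python
import math


def pmi(cooc: int, count_w: int, count_c: int, total: int) -> float:
    """PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), estimated from raw counts.

    cooc    : number of times w and c co-occur
    count_w : total occurrences of w
    count_c : total occurrences of the context c
    total   : total number of observations
    """
    return math.log((cooc * total) / (count_w * count_c))


def pmi_bias(cooc_f: int, cooc_m: int, count_w: int,
             count_f: int, count_m: int, total: int) -> float:
    """Hypothetical bias score: PMI(w, female) - PMI(w, male).

    Positive values lean female, negative values lean male
    (a convention assumed here for illustration).
    """
    return (pmi(cooc_f, count_w, count_f, total)
            - pmi(cooc_m, count_w, count_m, total))


# A word that co-occurs equally often with both contexts gets a score of 0.
balanced = pmi_bias(cooc_f=10, cooc_m=10, count_w=100,
                    count_f=50, count_m=50, total=1000)

# Doubling the female co-occurrences shifts the score by log(2).
skewed = pmi_bias(cooc_f=20, cooc_m=10, count_w=100,
                  count_f=50, count_m=50, total=1000)
```

In this formulation the word's own count and the corpus size cancel in the difference, leaving log((cooc_f / count_f) / (cooc_m / count_m)), so the score depends only on co-occurrence rates with the two contexts.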

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Tommaso Mario Buonocore , Claudio Crema , Alberto Redolfi , Riccardo Bellazzi , Enea Parimbelli

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-20

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.

Quantum Clustering with k-Means: a Hybrid Approach

Alessandro Poggiali , Alessandro Berti , Anna Bernasconi , Gianna Del Corso , Riccardo Guidotti

Category: Machine Learning

2022-12-13

Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms that exploit different degrees of parallelism. Each algorithm incrementally leverages quantum parallelism to reduce the complexity of the cluster assignment step, down to a constant cost. In particular, we exploit quantum phenomena to speed up the computation of distances. The core idea is that the distances between records and centroids can be computed simultaneously, thus saving time, especially for big datasets. We show that our hybrid quantum k-Means algorithms can be more efficient than the classical version while still obtaining comparable clustering results.
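For context, the cluster assignment step that the abstract targets can be sketched in its plain classical form as follows. This is only the classical baseline, not the authors' hybrid quantum algorithm; the function name `assign_clusters` and the tuple-based data layout are assumptions made for the example.

```python
import math
from typing import List, Sequence


def assign_clusters(records: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Classical k-Means assignment: label each record with its nearest centroid.

    Every record is compared against every centroid, so the step needs
    len(records) * len(centroids) distance computations. This is the part
    the abstract proposes to evaluate in parallel on quantum hardware.
    """
    labels = []
    for record in records:
        # Euclidean distance from this record to each centroid.
        distances = [math.dist(record, centroid) for centroid in centroids]
        # Index of the closest centroid (ties go to the lowest index).
        labels.append(distances.index(min(distances)))
    return labels


records = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_clusters(records, centroids)
```

The nested loop makes the cost of this step explicit: it grows with both the number of records and the number of centroids, which is why computing the distances simultaneously is attractive for large datasets.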

Multimodal and Explainable Internet Meme Classification

Abhinav Kumar Thakur , Filip Ilievski , Hông-Ân Sandlin , Alain Mermoud , Zhivar Sourati , Luca Luceri , Riccardo Tommasini

Categories: Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-11

Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.

CALIME: Causality-Aware Local Interpretable Model-Agnostic Explanations

Martina Cinquini , Riccardo Guidotti

Categories: Artificial Intelligence | Machine Learning

2022-12-10

A significant drawback of eXplainable Artificial Intelligence (XAI) approaches is the assumption of feature independence. This paper focuses on integrating causal knowledge in XAI methods to increase trust and help users assess the quality of explanations. We propose a novel extension to a widely used local and model-agnostic explainer that explicitly encodes causal relationships in the data generated around the input instance to explain. Extensive experiments show that our method achieves superior performance compared to the original one, both in the fidelity of mimicking the black box and in the stability of the explanations.

LSVL: Large-scale season-invariant visual localization for UAVs

Jouko Kinnari , Riccardo Renzulli , Francesco Verdoja , Ville Kyrki

Category: Robotics

2022-12-07

Localization of autonomous unmanned aerial vehicles (UAVs) relies heavily on Global Navigation Satellite Systems (GNSS), which are susceptible to interference. Especially in security applications, robust localization algorithms independent of GNSS are needed to provide dependable operations of autonomous UAVs also in interfered conditions. Typical non-GNSS visual localization approaches rely on a known starting pose, work only on a small-sized map, or require known flight paths before a mission starts. We consider the problem of localization with no information on initial pose or planned flight path. We propose a solution for global visual localization on a map at scale up to 100 km², based on matching orthoprojected UAV images to satellite imagery using learned season-invariant descriptors. We show that the method is able to determine heading, latitude and longitude of the UAV at 12.6-18.7 m lateral translation error in as few as 23.2-44.4 updates from an uninformed initialization, also in situations of significant seasonal appearance difference (winter-summer) between the UAV image and the map. We evaluate the characteristics of multiple neural network architectures for generating the descriptors, and likelihood estimation methods that are able to provide fast convergence and low localization error. We also evaluate the operation of the algorithm using real UAV data and evaluate running time on a real-time embedded platform. We believe this is the first work that is able to recover the pose of a UAV at this scale and rate of convergence, while allowing significant seasonal difference between camera observations and map.

Quantum median filter for Total Variation image denoising

Simone De Santis , Damiana Lazzaro , Riccardo Mengoni , Serena Morigi

Category: Computer Vision

2022-12-02

In this new computing paradigm, named quantum computing, researchers from all over the world are taking their first steps in designing quantum circuits for image processing, through a difficult process of knowledge transfer. This effort is named Quantum Image Processing, an emerging research field pushed by powerful parallel computing capabilities of quantum computers. This work goes in this direction and proposes the challenging development of a powerful method of image denoising, such as the Total Variation (TV) model, in a quantum environment. The proposed Quantum TV is described and its sub-components are analysed. Despite the natural limitations of the current capabilities of quantum devices, the experimental results show a competitive denoising performance compared to the classical variational TV counterpart.
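For readers unfamiliar with the building block named in the title, a classical 3x3 median filter can be sketched as below. This is the ordinary classical operation only, not the quantum circuit proposed in the paper; the function name and the border handling (replicating edge pixels) are assumptions made for the example.

```python
from statistics import median
from typing import List


def median_filter_3x3(img: List[List[int]]) -> List[List[int]]:
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Borders are handled by clamping indices, i.e. edge pixels are
    replicated outward (an assumption; other border policies exist).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
            ]
            # Nine values, so the median is the fifth smallest.
            out[i][j] = median(window)
    return out


# A flat image with a single outlier in the centre.
noisy = [
    [5, 5, 5],
    [5, 100, 5],
    [5, 5, 5],
]
denoised = median_filter_3x3(noisy)
```

Because the outlier never makes up the majority of any 3x3 window, it is removed everywhere and the flat background is restored.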

Measuring Reliability of Large Language Models through Semantic Consistency

Harsh Raj , Domenic Rosati , Subhabrata Majumdar

Categories: Natural Language Processing | Artificial Intelligence

2022-11-10

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to the prompts that are fed into them. Even when prompts are semantically identical, language models may give very different answers. When considering safe and trustworthy deployments of PLMs, we would like their outputs to be consistent under prompts that mean the same thing or convey the same intent. While some work has looked into how state-of-the-art PLMs address this need, it has been limited to evaluating lexical equality of single- or multi-word answers and does not address the consistency of generative text sequences. In order to understand the consistency of PLMs under text generation settings, we develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions in the TruthfulQA dataset. We find that our proposed metrics are considerably more consistent than traditional metrics embodying lexical consistency, and that they also correlate with human evaluation of output consistency to a higher degree.
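One simple way to picture a consistency measure over open-ended outputs is as the fraction of output pairs that an agreement function judges equivalent. The sketch below follows that idea under stated assumptions: the pairwise-agreement formulation, the function names, and the toy token-overlap criterion are all illustrative choices for this example and are not taken from the paper, which would use a stronger semantic equivalence judgment.

```python
from itertools import combinations
from typing import Callable, Sequence


def consistency(outputs: Sequence[str],
                agree: Callable[[str, str], bool]) -> float:
    """Fraction of unordered output pairs that `agree` marks as equivalent."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # zero or one output is trivially consistent
    return sum(agree(a, b) for a, b in pairs) / len(pairs)


def lexical_agree(a: str, b: str) -> bool:
    """Baseline: exact match after trimming and lowercasing."""
    return a.strip().lower() == b.strip().lower()


def token_overlap_agree(a: str, b: str, threshold: float = 0.5) -> bool:
    """Toy stand-in for a semantic check: Jaccard overlap of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return True
    return len(ta & tb) / len(ta | tb) >= threshold


# Answers produced for three paraphrases of the same question.
outputs = ["Paris", "paris", "It is Paris"]

lexical_score = consistency(outputs, lexical_agree)
lenient_score = consistency(
    outputs, lambda a, b: token_overlap_agree(a, b, threshold=0.3)
)
```

Here the lexical baseline only accepts the pair "Paris" / "paris", while the more lenient criterion accepts all three pairs, which illustrates why exact-match metrics can understate consistency for free-form text.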
