Smart Paper Notes

What can Speech and Language Tell us About the Working Alliance in Psychotherapy

Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

Category: Natural Language Processing

2022-06-17

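We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach to psychotherapy that allows the therapist to help the patient identify and modify harmful thoughts, behaviors, or actions. This cooperative effort can be evaluated using the shortened, observer-rated Working Alliance Inventory (WAI), a 12-item inventory covering task, goal, and relationship, which has a relevant influence on therapy outcomes. In this work, we investigate the relation between this alliance inventory and the spoken conversations (sessions) between the patient and the psychotherapist. We delivered eight weeks of e-therapy, collected the audio and video call sessions, and manually transcribed them. Professional therapists annotated and evaluated the spoken conversations. We investigated speech and language features and their association with the WAI items. The feature types include turn dynamics, lexical entrainment, and conversational descriptors extracted from the speech and language signals. Our findings provide strong evidence that a subset of these features are strong indicators of the working alliance. To the best of our knowledge, this is the first study to exploit speech and language for characterizing the working alliance.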

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi

Category: Natural Language Processing | Artificial Intelligence

2021-12-13

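Personal narratives (PN), spoken or written, are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. In this work, however, we focus on Emotion Carriers (EC), defined as the segments that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such ECs can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. Previous work has shown that ECs can be identified using lexical features. However, spoken narratives should provide a richer description of the context and of the user's emotional state. In this paper, we leverage word-based acoustic and textual embeddings, as well as early and late fusion techniques, for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion data and fine-tuned to detect ECs. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.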
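
The difference between the two fusion strategies can be illustrated with a minimal sketch. Early fusion would concatenate the acoustic and textual embeddings before a single classifier, whereas late fusion combines the outputs of two separately trained models. The abstract does not state how the outputs are combined, so the weighted average, the 0.5 threshold, the function names, and the example scores below are all assumptions made for illustration.

```python
def late_fusion(acoustic_probs, lexical_probs, acoustic_weight=0.5):
    """Weighted average of per-word EC probabilities from two models.

    Hypothetical combination rule: the paper may use a different one.
    """
    if len(acoustic_probs) != len(lexical_probs):
        raise ValueError("both models must score the same words")
    w = acoustic_weight
    return [w * a + (1.0 - w) * t for a, t in zip(acoustic_probs, lexical_probs)]


def tag_emotion_carriers(words, fused_probs, threshold=0.5):
    """Keep the words whose fused probability reaches the threshold."""
    return [word for word, p in zip(words, fused_probs) if p >= threshold]


# Made-up per-word scores from a hypothetical acoustic and lexical model.
words = ["I", "lost", "my", "father", "last", "year"]
acoustic = [0.1, 0.7, 0.2, 0.8, 0.1, 0.2]
lexical = [0.0, 0.6, 0.3, 0.9, 0.2, 0.1]

fused = late_fusion(acoustic, lexical)
print(tag_emotion_carriers(words, fused))  # ['lost', 'father']
```

Because each model is trained on its own modality, a late-fusion setup of this kind lets either model be replaced or reweighted without retraining the other.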

Evaluation of Interpretability for Deep Learning algorithms in EEG Emotion Recognition: A case study in Autism

Juan Manuel Mayor-Torres, Sara Medina-DeVilliers, Tessa Clarkson, Matthew D. Lerner, Giuseppe Riccardi

Category: Machine Learning

2021-11-25

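Current models in Explainable Artificial Intelligence (XAI) have shown an evident and quantified lack of reliability in measuring feature relevance when statistically entangled features are used to train deep classifiers. The application of deep learning in clinical trials to predict early diagnosis of neurodevelopmental disorders, such as Autism Spectrum Disorder (ASD), has increased. However, the inclusion of more reliable saliency maps to obtain more trustworthy and interpretable metrics from neural activity features is still insufficient for practical applications in diagnostics or clinical trials. Moreover, in ASD research, the use of deep classifiers that rely on neural measures to predict viewed facial emotions is relatively unexplored. Therefore, in this study we propose the evaluation of a Convolutional Neural Network (CNN) for electroencephalography (EEG)-based facial emotion recognition, complemented with a novel remove-and-retrain (ROAR) methodology to recover the highly relevant features used in the classifier. Specifically, we compare well-known relevance maps such as Layer-Wise Relevance Propagation (LRP), PatternNet, Pattern Attribution, and SmoothGrad-Squared. This study is the first to achieve a more transparent feature-relevance calculation for EEG-based facial emotion recognition using a within-subject-trained CNN in typically developed and ASD individuals.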
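
The general remove-and-retrain idea can be sketched as follows: mask the features that a relevance map ranks highest, retrain the classifier on the masked data, and check how much accuracy drops. This is a toy illustration only. A nearest-centroid classifier stands in for the CNN, the relevance scores are hand-written rather than produced by LRP or SmoothGrad, and filling masked features with the training mean is an assumption.

```python
def mask_top_features(rows, relevance, fraction, fill):
    """Replace the top `fraction` of features (by relevance) with fill values."""
    k = int(round(fraction * len(relevance)))
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    top = set(ranked[:k])
    return [[fill[i] if i in top else v for i, v in enumerate(row)] for row in rows]


def train_centroids(rows, labels):
    """Toy stand-in for the classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, row):
    """Return the class whose centroid is closest in squared distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], row)))


def roar_accuracy(train_x, train_y, test_x, test_y, relevance, fraction):
    """Mask the most relevant features, retrain, and report test accuracy."""
    n = len(train_x[0])
    fill = [sum(r[i] for r in train_x) / len(train_x) for i in range(n)]
    model = train_centroids(mask_top_features(train_x, relevance, fraction, fill), train_y)
    masked_test = mask_top_features(test_x, relevance, fraction, fill)
    hits = sum(predict(model, row) == y for row, y in zip(masked_test, test_y))
    return hits / len(test_y)


# Feature 0 separates the classes; feature 1 is noise.
train_x = [[0.0, 0.5], [0.1, 0.4], [1.0, 0.5], [0.9, 0.4]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 0.45], [0.95, 0.45]]
test_y = [0, 1]
relevance = [0.9, 0.1]  # hypothetical relevance map output

print(roar_accuracy(train_x, train_y, test_x, test_y, relevance, 0.0))  # 1.0
print(roar_accuracy(train_x, train_y, test_x, test_y, relevance, 0.5))  # 0.5
```

A sharp accuracy drop after masking suggests that the relevance map pointed at features the classifier genuinely depends on; a map that ranks noise features highest would leave accuracy largely unchanged.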

A Segmentation Method for fluorescence images without a machine learning approach

Giuseppe Giacopelli, Michele Migliore, Domenico Tegolo

Category: Computer Vision | Artificial Intelligence

2022-12-28

Background: Image analysis applications in digital pathology include various methods for segmenting regions of interest. Identifying these regions is one of the most complex steps, which makes robust methods that do not necessarily rely on a machine learning (ML) approach of great interest. Method: A fully automatic and optimized segmentation process for different datasets is a prerequisite for classifying and diagnosing Indirect ImmunoFluorescence (IIF) raw data. This study describes a deterministic computational neuroscience approach for identifying cells and nuclei. It differs markedly from the conventional neural network approach, yet it matches the quantitative and qualitative performance of such networks and is also robust to adversarial noise. The method is based on formally correct functions and does not suffer from tuning on specific datasets. Results: This work demonstrates the robustness of the method against the variability of parameters such as image size, mode, and signal-to-noise ratio. We validated the method on two datasets (Neuroblastoma and NucleusSegData) using images annotated by independent medical doctors. Conclusions: The definition of deterministic and formally correct methods, from a functional to a structural point of view, guarantees the achievement of optimized and functionally correct results. The excellent performance of our deterministic method (NeuronalAlg) in segmenting cells and nuclei from fluorescence images was measured with quantitative indicators and compared with that achieved by three published ML approaches.


TypeFormer: Transformers for Mobile Keystroke Biometrics

Giuseppe Stragapede, Paula Delgado-Santos, Ruben Tolosana, Ruben Vera-Rodriguez, Richard Guest, Aythami Morales

Category: Computer Vision

2022-12-26

The broad usage of mobile devices nowadays, the sensitivity of the information they contain, and the shortcomings of current mobile user authentication methods call for novel, secure, and unobtrusive solutions to verify the users' identity. In this article, we propose TypeFormer, a novel Transformer architecture to model free-text keystroke dynamics performed on mobile devices for the purpose of user authentication. The proposed model consists of Temporal and Channel Modules enclosing two Long Short-Term Memory (LSTM) recurrent layers, Gaussian Range Encoding (GRE), a multi-head Self-Attention mechanism, and a Block-Recurrent structure. In experiments on one of the largest public databases to date, the Aalto mobile keystroke database, TypeFormer outperforms current state-of-the-art systems, achieving an Equal Error Rate (EER) of 3.25% using only 5 enrolment sessions of 50 keystrokes each. In this way, we contribute to reducing the traditional performance gap of the challenging mobile free-text scenario with respect to its desktop and fixed-text counterparts. Additionally, we analyse the behaviour of the model under different experimental configurations, such as the length of the keystroke sequences and the number of enrolment sessions, showing margin for improvement with more enrolment data. Finally, a cross-database evaluation is carried out, demonstrating the robustness of the features extracted by TypeFormer in comparison with existing approaches.
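
For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below estimates it by sweeping thresholds over the observed scores. It assumes that higher scores mean "more likely genuine"; the function name and example scores are illustrative and are not taken from the paper's evaluation code.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all scores.

    A score >= threshold is accepted as genuine. Returns the average of
    FAR and FRR at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up similarity scores for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))  # 0.25
```

With few scores the two error curves rarely cross exactly, which is why the sketch averages FAR and FRR at the closest threshold rather than requiring equality.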


The URW-KG: a Resource for Tackling the Underrepresentation of non-Western Writers

Marco Antonio Stranisci, Giuseppe Spillo, Cataldo Musto, Viviana Patti, Rossana Damiano

Category: Natural Language Processing

2022-12-21

Digital media have enabled access to unprecedented literary knowledge. Authors, readers, and scholars are now able to discover and share an increasing amount of information about books and their authors. Nevertheless, digital archives are still unbalanced: writers from non-Western countries are less represented, and this condition perpetuates old forms of discrimination. In this paper, we present the Under-Represented Writers Knowledge Graph (URW-KG), a resource designed to explore and possibly amend this lack of representation by gathering and mapping information about works and authors from Wikidata and three other sources: Open Library, Goodreads, and Google Books. Experiments based on KG embeddings showed that the integrated information encoded in the graph exposes scholars and users to non-Western literary works and authors more easily than Wikidata alone. This opens the way to the development of fairer and more effective tools for author discovery and exploration.


Attend to the Right Context: A Plug-and-Play Module for Content-Controllable Summarization

Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini

Category: Natural Language Processing

2022-12-21

Content-Controllable Summarization generates summaries focused on the given controlling signals. Due to the lack of large-scale training corpora for the task, we propose a plug-and-play module, RelAttn, to adapt any general summarizer to the content-controllable summarization task. RelAttn first identifies the relevant content in the source documents, and then makes the model attend to the right context by directly steering the attention weights. We further apply an unsupervised online adaptive parameter searching algorithm to determine the degree of control in the zero-shot setting, while such parameters are learned in the few-shot setting. Experiments applying the module to three backbone summarization models show that our method effectively improves all the summarizers and outperforms the prefix-based method and a widely used plug-and-play model in both zero- and few-shot settings. Tellingly, more benefit is observed in scenarios where more control is needed.


Inductive Attention for Video Action Anticipation

Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz

Category: Computer Vision

2022-12-17

Anticipating future actions based on video observations is an important task in video understanding, and it would be useful for precautionary systems that need time to react before an event occurs. Since the input in action anticipation consists only of pre-action frames, models do not have enough information about the target action; moreover, similar pre-action frames may lead to different futures. Consequently, any solution using existing action recognition models can only be suboptimal. Recently, researchers have proposed using a longer video context to remedy the insufficient information in pre-action intervals, as well as self-attention to query relevant past moments to address the anticipation problem. However, the indirect use of video input features as the query might be inefficient, as it only serves as a proxy for the anticipation goal. To this end, we propose an inductive attention model, which transparently uses the prior prediction as the query to derive the anticipation result by induction from past experience. Our method naturally considers the uncertainty of multiple futures via many-to-many association. On large-scale egocentric video datasets, our model not only consistently outperforms the state of the art using the same backbone and is competitive with methods that employ a stronger backbone, but also achieves superior efficiency with fewer model parameters.


Understanding Online Migration Decisions Following the Banning of Radical Communities

Giuseppe Russo, Manoel Horta Ribeiro, Giona Casiraghi, Luca Verginer

Category: Natural Language Processing

2022-12-09

The proliferation of radical online communities and their violent offshoots has sparked great societal concern. However, the current practice of banning such communities from mainstream platforms has unintended consequences: (i) the further radicalization of their members on the fringe platforms to which they migrate; and (ii) the spillover of harmful content from fringe platforms back onto mainstream platforms. Here, in a large observational study on two banned subreddits, r/The_Donald and r/fatpeoplehate, we examine how factors associated with the RECRO radicalization framework relate to users' migration decisions. Specifically, we quantify how these factors affect users' decisions to post on fringe platforms and, for those who do, whether they continue posting on the mainstream platform. Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform, whereas social-level factors, which capture users' connection with the radical community, only affect the propensity to be coactive on both platforms. Overall, our findings pave the way for evidence-based moderation policies, as the decisions to migrate and remain coactive amplify the unintended consequences of community bans.


Elixir: A system to enhance data quality for multiple analytics on a video stream

Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver Po, Y. Charlie Hu, Srimat T. Chakradhar

Category: Computer Vision

2022-12-08

IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals, including retail, healthcare, safety and security, transportation, and manufacturing. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), on the video feed coming out of every camera. In this paper, we first show that in a multi-AU setting, changing the camera setting has a disproportionate impact on the performance of different AUs. In particular, the optimal setting for one AU may severely degrade the performance of another AU, and the impact on different AUs varies as the environmental conditions change. We then present Elixir, a system to enhance the video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives of the different AUs and adjusts the camera setting to simultaneously enhance the performance of all AUs. To define the multiple objectives in MORL, we develop new AU-specific quality estimator values for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot), running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4,975) and 158.6% (3,507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).
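
The tension between AUs can be made concrete with a small sketch of linear scalarization, one common way to fold several objectives into a single reward. The abstract does not say how Elixir combines its AU-specific quality estimates, and the real system uses an RL agent rather than the one-step greedy choice below, so the weights, setting names, and quality values here are all hypothetical.

```python
def scalarized_reward(au_quality, weights):
    """Weighted sum of per-AU quality estimates (linear scalarization)."""
    return sum(weights[name] * quality for name, quality in au_quality.items())


def best_setting(candidates, weights):
    """Pick the camera setting with the highest combined reward."""
    return max(candidates, key=lambda s: scalarized_reward(candidates[s], weights))


# Hypothetical quality estimates (0..1) per AU under three camera settings.
candidates = {
    "default": {"car": 0.8, "face": 0.2, "person": 0.3},
    "bright": {"car": 0.7, "face": 0.6, "person": 0.6},
    "face_tuned": {"car": 0.3, "face": 0.9, "person": 0.4},
}
equal_weights = {"car": 1 / 3, "face": 1 / 3, "person": 1 / 3}

print(best_setting(candidates, equal_weights))  # bright
```

In this made-up example the setting that is best for faces ("face_tuned") badly hurts car detection, and the setting chosen under equal weights is not the best for any single AU, which is the kind of trade-off a multi-objective agent has to balance.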
