Smart Paper Notes

Q4EDA: A Novel Strategy for Textual Information Retrieval Based on User Interactions with Visual Representations of Time Series

Leonardo Christino , Martha D. Ferreira , Fernando V. Paulovich

Category: Machine Learning

2021-01-19

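Knowing how to construct text-based search queries (SQs) for use in search engines (SEs) such as Google or Wikipedia has become a fundamental skill. Although a great deal of data is available through such SEs, most structured datasets live outside their scope. Visualization tools help with this limitation, but no such tool comes close to the sheer amount of information available through general-purpose SEs. To fill this gap, this paper presents Q4EDA, a novel framework that converts the visual selection queries users perform on visual representations of time series into valid and stable SQs that can be used in general-purpose SEs, together with suggestions of related information. The usefulness of Q4EDA is presented and validated by users through an application that links a replica of Gapminder's line chart with an SE populated with Wikipedia documents, showing how Q4EDA supports and enhances exploratory analysis of United Nations world indicators. Despite some limitations, Q4EDA is unique in its proposal and represents a real advance towards solutions for querying textual information based on user interactions with visual representations.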

translated by Google Translate

EDAssistant: Supporting Exploratory Data Analysis in Computational Notebooks with In-Situ Code Search and Recommendation

Xingjun Li , Yizhi Zhang , Justin Leung , Chengnian Sun , Jian Zhao

Category: Machine Learning

2021-12-15

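Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who lack specific knowledge about the dataset or problem, obtaining and understanding external information effectively is critical to carrying out EDA. This paper presents EDAssistant, a JupyterLab extension that supports EDA with in-situ search of example notebooks and recommendation of useful APIs, powered by a novel interactive visualization of search results. Code search and recommendation are enabled by state-of-the-art machine learning models trained on a large corpus of EDA notebooks collected online. A user study is conducted to investigate both EDAssistant and data scientists' current practice (i.e., using external search engines). The results demonstrate the effectiveness and usefulness of EDAssistant, and participants appreciated its smooth, in-context support of EDA. We also report several design implications for code recommendation tools.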

Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

Lennart Küll

Category: Natural Language Processing

2022-12-01

The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions to analyze those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, only focus on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends of the quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this we introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis include that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) the preference of researchers to publish in conferences over journals dwindles, v) on average, journal articles receive twice as many citations compared to conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also get more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than journals.

Interactive Question Answering Systems: Literature Review

Giovanni Maria Biancofiore , Yashar Deldjoo , Tommaso Di Noia , Eugenio Di Sciascio , Fedelucio Narducci

Category: Natural Language Processing | Artificial Intelligence

2022-09-04

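Question answering systems are recognized as a popular and frequently effective means of seeking information on the web. In such systems, information seekers can receive a concise response to their query by posing their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that sits at the intersection of question answering and dialogue systems. On the one hand, the user can ask questions in ordinary language and find the actual response to her inquiry; on the other hand, the system can extend the question-answering session into a dialogue if the initial request has multiple probable replies, very few, or is ambiguous. By allowing the user to ask more questions, interactive question answering enables users to interact dynamically with the system and receive more precise results. This survey offers a detailed overview of the interactive question-answering methods prevalent in the current literature. It begins by explaining the foundational principles of question-answering systems, defining new notations and taxonomies that bring all identified works into a unified framework. The reviewed published work on interactive question-answering systems is then presented and examined in terms of its proposed methodology, evaluation approaches, and dataset/application domain. We also describe trends surrounding specific tasks and issues raised by the community, shedding light on the future interests of scholars. Our work is further supported by a GitHub page that synthesizes all the major topics covered in this literature study: https://sisinflab.github.io/interactive-question-answering-systems-survey/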

Survey of Generative Methods for Social Media Analysis

Stan Matwin , Aristides Milios , Paweł Prałat , Amilcar Soares , François Théberge

Category: Machine Learning

2021-12-13

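This survey draws a broad, panoramic picture of the state of the art (SoTA) of research on generative methods for the analysis of social media data. It fills a void, as the existing survey articles are either much narrower in scope or dated. We include two aspects that are currently gaining importance in mining and modeling social media: dynamics and networks. Social dynamics are important for understanding the spread of influence or diseases, the formation of friendships, and so on. Networks, on the other hand, can capture various complex relationships, providing additional insight and identifying important patterns that would otherwise go unnoticed.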

Interactive Data Analysis with Next-step Natural Language Query Recommendation

Xingbo Wang , Furui Cheng , Yong Wang , Ke Xu , Jiang Long , Hong Lu , Huamin Qu

Category: Natural Language Processing

2022-01-13

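Natural language interfaces (NLIs) give users a convenient way to analyze data interactively through natural language queries. Nevertheless, interactive data analysis is a demanding process, especially for novice data analysts. When exploring large and complex datasets from different domains, data analysts do not necessarily have sufficient knowledge about the data and the application domain. This leaves them unable to formulate a series of queries efficiently and to derive the desired data insights extensively. In this paper, we develop an NLI with a step-wise query recommendation module to help users choose appropriate next-step exploration actions. The system adopts a data-driven approach to generate step-wise, semantically relevant, and context-aware query suggestions for the application domains of users' interest, based on their query logs. In addition, the system helps users organize query histories and results into a dashboard to communicate the discovered data insights. Through a comparative user study, we show that our system facilitates a more effective and systematic data analysis process than a baseline without the recommendation module.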

Supporting peace negotiations in the Yemen war through machine learning

M. Arana-Catania , F. A. Van Lier , Rob Procter

Category: Natural Language Processing | Machine Learning

2022-07-23

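Today's conflicts are becoming increasingly complex, fluid, and fragmented, often involving a host of national and international actors with multiple and often divergent interests. This development poses significant challenges for conflict mediation, as mediators struggle to make sense of conflict dynamics such as the range of conflict parties and the evolution of their political positions, the distinction between relevant and less relevant actors in peace-making, or the identification of key conflict issues and their interdependence. International peace efforts appear ill-equipped to address these challenges successfully. While technology is already being experimented with and used in a range of conflict-related fields, such as conflict prediction or information gathering, less attention has been given to how technology can contribute to conflict mediation. This case study contributes to emerging research on the use of state-of-the-art machine learning technologies and techniques in conflict mediation processes. Using dialogue transcripts from peace negotiations in Yemen, this study shows how machine learning can effectively support mediation teams by providing them with tools for knowledge management, extraction, and conflict analysis. Apart from illustrating the potential of machine learning tools in conflict mediation, the article also emphasizes the importance of an interdisciplinary and participatory co-creation methodology for developing context-sensitive and targeted tools and for ensuring meaningful and responsible implementation.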

A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages

Mehdi Abdelhamid , Faical Azouaou , Sofiane Batata

Category: Natural Language Processing

2022-01-10

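In academia, plagiarism is certainly not an emerging concern, but it has grown in magnitude with the popularization of the Internet and the ease of access to a worldwide source of content, rendering human-only intervention insufficient. Despite that, plagiarism is far from being an unaddressed problem: computer-assisted plagiarism detection is currently an active area of research that falls within the fields of Information Retrieval (IR) and Natural Language Processing (NLP). Many software solutions have emerged to help with this task, and this paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings. The comparison covers eight systems and is performed with respect to their features, usability, technical aspects, and performance in detecting three levels of obfuscation from different sources: verbatim, paraphrase, and cross-language plagiarism. An in-depth examination of technical forms of plagiarism was also carried out in the context of this study. In addition, a survey of plagiarism typologies and classifications proposed by different authors is provided.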

The Infinite Index: Information Retrieval on Generative Text-To-Image Models

Niklas Deckers , Maik Fröbe , Johannes Kiesel , Gianluca Pandolfo , Christopher Schröder , Benno Stein , Martin Potthast

Category: Natural Language Processing | Computer Vision

2022-12-14

The text-to-image model Stable Diffusion has recently become very popular. Only weeks after its open source release, millions are experimenting with image generation. This is due to its ease of use, since all it takes is a brief description of the desired image to "prompt" the generative model. Rarely do the images generated for a new prompt immediately meet the user's expectations. Usually, an iterative refinement of the prompt ("prompt engineering") is necessary for satisfying images. As a new perspective, we recast image prompt engineering as interactive image retrieval - on an "infinite index". Thereby, a prompt corresponds to a query and prompt engineering to query refinement. Selected image-prompt pairs allow direct relevance feedback, as the model can modify an image for the refined prompt. This is a form of one-sided interactive retrieval, where the initiative is on the user side, whereas the server side remains stateless. In light of an extensive literature review, we develop these parallels in detail and apply the findings to a case study of a creative search task on such a model. We note that the uncertainty in searching an infinite index is virtually never-ending. We also discuss future research opportunities related to retrieval models specialized for generative models and interactive generative image retrieval. The application of IR technology, such as query reformulation and relevance feedback, will contribute to improved workflows when using generative models, while the notion of an infinite index raises new challenges in IR research.

A Survey on Concept Drift in Process Mining

Denise Maria Vecino Sato , Sheila Cristiana de Freitas , Jean Paul Barddal , Edson Emilio Scalabrin

Category: Machine Learning

2021-12-03

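Concept drift in process mining (PM) is a challenge because classical methods assume that processes are in a steady state, i.e., that events share the same process version. We conducted a systematic literature review at the intersection of these areas; we thus review concept drift in process mining and put forward a taxonomy of existing techniques for drift detection and online process mining in evolving environments. Existing works show that (i) PM still focuses primarily on offline analysis, and (ii) the assessment of concept drift techniques in processes is cumbersome due to the lack of a common evaluation protocol, datasets, and metrics.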

Recent Advances in Automated Question Answering In Biomedical Domain

Krishanu Das Baksi

Category: Artificial Intelligence | Natural Language Processing

2021-11-10

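The objective of automated Question Answering (QA) systems is to provide answers to user queries in a time-efficient manner. The answers are usually found either in databases (or knowledge bases) or in a collection of documents commonly referred to as the corpus. Over the past few decades, knowledge acquisition has proliferated, and new scientific articles in the field of biomedicine have consequently grown exponentially. It has therefore become difficult to keep track of all the information in the domain, even for domain experts. With the improvements in commercial search engines, users can type in their queries and get a small set of the most relevant documents and, in some cases, relevant snippets from those documents. However, manually looking for the required information or answers may still be tedious and time-consuming. This has necessitated the development of efficient QA systems that aim to find exact and precise answers to natural language questions posed by users in the biomedical domain. In this paper, we introduce the basic methodologies used for developing general-domain QA systems, followed by a thorough investigation of different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches, both those using structured databases and those using collections of texts. We also examine the limitations of current systems and explore potential avenues for further advancement.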

Proceedings of the 2nd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 2nd International Workshop on Reading Music Systems, held in Delft on the 2nd of November 2019.

Analyzing social media with crowdsourcing in Crowd4SDG

Carlo Bono , Mehmet Oğuz Mülâyim , Cinzia Cappiello , Mark Carman , Jesus Cerquides , Jose Luis Fernandez-Marquez , Rosy Mondardini , Edoardo Ramalli , Barbara Pernici

Category: Artificial Intelligence

2022-08-04

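Social media have the potential to provide timely information about emergency situations and sudden events. However, finding relevant information among the millions of posts published every day can be difficult, and developing a data analysis project usually requires time and technical skills. This study presents an approach that provides flexible support for analyzing social media, particularly during emergencies. Different use cases in which social media analysis can be adopted are introduced, and the challenges of retrieving information from large sets of posts are discussed. The focus is on analyzing the images and text contained in social media posts, and on a set of automatic data processing tools for filtering and classifying content, with a human-in-the-loop approach to support the data analyst. Such support includes both feedback and suggestions for configuring the automated tools, and crowdsourcing to gather input from citizens. The results are validated by discussing three case studies developed within the Crowd4SDG H2020 European project.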

Intent Recognition in Conversational Recommender Systems

Sahar Moradizeyveh

Category: Natural Language Processing | Machine Learning

2022-12-06

Any organization needs to improve their products, services, and processes. In this context, engaging with customers and understanding their journey is essential. Organizations have leveraged various techniques and technologies to support customer engagement, from call centres to chatbots and virtual agents. Recently, these systems have used Machine Learning (ML) and Natural Language Processing (NLP) to analyze large volumes of customer feedback and engagement data. The goal is to understand customers in context and provide meaningful answers across various channels. Despite multiple advances in Conversational Artificial Intelligence (AI) and Recommender Systems (RS), it is still challenging to understand the intent behind customer questions during the customer journey. To address this challenge, in this paper, we study and analyze the recent work in Conversational Recommender Systems (CRS) in general and, more specifically, in chatbot-based CRS. We introduce a pipeline to contextualize the input utterances in conversations. We then take the next step towards leveraging reverse feature engineering to link the contextualized input and learning model to support intent recognition. Since performance evaluation is achieved based on different ML models, we use transformer base models to evaluate the proposed approach using a labelled dialogue dataset (MSDialogue) of question-answering interactions between information seekers and answer providers.

Are Query-Based Ontology Debuggers Really Helping Knowledge Engineers?

Patrick Rodler , Dietmar Jannach , Konstantin Schekotihin , Philipp Fleiss

Category: Artificial Intelligence

2019-04-02

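Real-world semantic or knowledge-based systems, e.g., in the biomedical domain, can become large and complex. Tool support for localizing and repairing faults within the knowledge bases of such systems can therefore be essential for their practical success. Correspondingly, a number of knowledge base debugging approaches, in particular for ontology-based systems, have been proposed in recent years. Query-based debugging is a comparatively recent interactive approach that localizes the true cause of an observed problem by asking knowledge engineers a series of questions. Concrete implementations of this approach exist, such as the OntoDebug plug-in for the ontology editor Protégé. To validate that a newly proposed method is favorable over an existing one, researchers often rely on simulation-based comparisons. Such an evaluation approach, however, has certain limitations and often cannot fully inform us about a method's true usefulness. We therefore conducted different user studies to assess the practical value of query-based ontology debugging. One main insight from the studies is that the considered interactive approach is indeed more efficient than an alternative algorithmic debugging approach based on test cases. We also observed that users frequently made errors in the process, which highlights the importance of carefully designing the queries that users need to answer.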

Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

A Review on Method Entities in the Academic Literature: Extraction, Evaluation, and Application

Yuzhuo Wang , Chengzhi Zhang , Kai Li

Category: Natural Language Processing

2022-09-08

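In scientific research, the method is an indispensable means of solving scientific problems and a critical research object. As science advances, many scientific methods are being proposed, modified, and used. Authors describe the details of a method in the abstract and body text, and the key entities in academic literature that reflect method names are called method entities. Exploring diverse method entities across a vast amount of academic literature helps scholars understand existing methods, select appropriate methods for research tasks, and propose new methods. Furthermore, the evolution of method entities can reveal the development of a discipline and facilitate knowledge discovery. Therefore, this article offers a systematic review of methodological and empirical works that focus on extracting method entities from full-text academic literature, and of efforts to build knowledge services using these extracted method entities. Definitions of the key concepts involved in this review are proposed first. Based on these definitions, we systematically review the approaches and indicators used to extract and evaluate method entities, with a focus on the pros and cons of each approach. We also survey how extracted method entities are used to build new applications. Finally, limitations of existing works and potential next steps are discussed.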

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh D. Dhole , Varun Gangal , Sebastian Gehrmann , Aadesh Gupta , Zhenhao Li , Saad Mahamood , Abinaya Mahendiran , Simon Mille , Ashish Srivastava , Samson Tan

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-06

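Data augmentation is an important component in evaluating the robustness of natural language processing (NLP) models, as well as in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/gem-benchmark/nl-augmenter).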

Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction

Enrica Troiano , Laura Oberländer , Roman Klinger

Category: Natural Language Processing

2022-06-10

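The most prominent tasks in emotion analysis are to assign emotions to texts and to understand how emotions manifest in language. An important observation for natural language processing is that emotions can be communicated implicitly by referring to events alone, even without explicitly mentioning an emotion name. In psychology, the class of emotion theories known as appraisal theories aims to explain the link between events and emotions. Appraisals can be formalized as variables that measure the cognitive evaluation made by people living through an event they consider relevant. They include assessments of whether an event is novel, whether the person considers themselves responsible, whether it is in line with their own goals, and many others. Such appraisals explain which emotions develop from an event; for example, a novel situation can induce surprise, and one with uncertain consequences can evoke fear. We analyze the suitability of appraisal theories for emotion analysis in text, with the goal of understanding whether appraisal concepts can be reliably reconstructed by annotators, whether they can be predicted by text classifiers, and whether appraisal concepts help to identify emotion categories. To achieve this, we compile a corpus by asking people to describe in text events that triggered particular emotions and to disclose their appraisals. We then ask readers to reconstruct the emotions and appraisals from the text. This setup allows us to measure whether emotions and appraisals can be recovered purely from text, and it provides a human baseline against which to judge models' performance measures. Our comparison of text classification methods with human annotators shows that both can reliably detect emotions and appraisals with similar performance. We further show that appraisal concepts improve the categorization of emotions in text.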

VisRuler: Visual Analytics for Extracting Decision Rules from Bagged and Boosted Decision Trees

Angelos Chatzimparmpas , Rafael M. Martins , Andreas Kerren

Category: Machine Learning | Machine Learning (Statistics)

2021-12-01

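Bagging and boosting are two popular ensemble methods in machine learning (ML) that produce many individual decision trees. Owing to the inherent ensemble character of these methods, they typically outperform single decision trees or other ML models in predictive performance. However, numerous decision paths are generated for each decision tree, which increases the overall complexity of the model and hinders its use in domains that require trustworthy and explainable decisions, such as finance, social care, and health care. Thus, the interpretability of bagging and boosting algorithms, such as random forest and adaptive boosting, declines as the number of decisions rises. In this paper, we propose a visual analytics tool that aims to help users extract decisions from such ML models through a thorough visual inspection workflow that includes selecting a set of robust and diverse models (originating from different ensemble learning algorithms), choosing important features according to their global contribution, and deciding which decisions are essential for a global explanation (or locally, for specific cases). The outcome is a final decision based on the agreement of multiple models and the explored manual decisions exported by users. Finally, we evaluate the applicability and effectiveness of VisRuler through a use case, a usage scenario, and a user study.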
