Smart Paper Notes

Ensemble learning techniques for intrusion detection system in the context of cybersecurity

Andricson Abeline Moreira , Carlos A. C. Tojeiro , Carlos J. Reis , Gustavo Henrique Massaro , Igor Andrade Brito e Kelton A. P. da Costa

Category: Machine Learning

2022-12-21

Recently, there has been growing interest in improving the resources available in Intrusion Detection System (IDS) techniques. Several cybersecurity studies show that intrusions into computing environments and information hijacking are increasingly frequent and complex. Because business operations depend critically on computing resources, vulnerable information cannot be tolerated. Cybersecurity has become an indispensable part of corporate technology, and Security teams deal daily with preventing the risk of intrusion into the environment. The main objective of the study was therefore to investigate the Ensemble Learning technique using the Stacking method, supported by the Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) algorithms, aiming to optimize the results for DDoS attack detection. For this, the Intrusion Detection System concept was applied with the Orange Data Mining and Machine Learning tool to obtain better results.

Translated by Google Translate

Predição de Incidência de Lesão por Pressão em Pacientes de UTI usando Aprendizado de Máquina

Henrique P. Silva , Arthur D. Reys , Daniel S. Severo , Dominique H. Ruther , Flávio A. O. B. Silva , Maria C. S. S. Guimarães , Roberto Z. A. Pinto , Saulo D. S. Pedro , Túlio P. Navarro , Danilo Silva

Category: Machine Learning

2021-12-23

Pressure ulcers have a high prevalence in ICU patients but are preventable if identified at an early stage. In practice, the Braden scale is used to classify high-risk patients. This paper investigates the use of machine learning on electronic health record data for this purpose, using the data available in MIMIC-III v1.4. Two main contributions are made: a new approach for evaluating the models and the Braden scale that takes into account all predictions made during a stay, and a new training method for the machine learning models. The results show superior performance compared with the state of the art; moreover, all models surpass the Braden scale at every operating point of the precision-recall curve.
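As an illustration of what "operating points of the precision-recall curve" means, the following is a minimal sketch that computes precision and recall at each score threshold. The labels and scores are made-up toy values, and the function name `precision_recall_points` is hypothetical; this is not the paper's evaluation code.

```python
def precision_recall_points(labels, scores):
    """Return (threshold, precision, recall) for every distinct score used as a threshold.

    labels: list of 0/1 ground-truth values (1 = patient developed a pressure ulcer).
    scores: list of model risk scores, same length as labels.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("recall is undefined without at least one positive label")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
        # tp + fp >= 1 here because the threshold is itself one of the scores.
        points.append((threshold, tp / (tp + fp), tp / positives))
    return points


# Toy data (assumed for illustration only).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
points = precision_recall_points(labels, scores)
for threshold, precision, recall in points:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Comparing two predictors "at every operating point" then amounts to checking that one has higher precision than the other at each recall level.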

ALT: um software para análise de legibilidade de textos em Língua Portuguesa

Gleice Carvalho de Lima Moreno , Marco P. M. de Souza , Nelson Hein , Adriana Kroenke Hein

Category: Natural Language Processing

2022-03-23

In the earliest stages of human life, communication, seen as a process of social interaction, has always been the best path to consensus between parties. Understanding and credibility in this process are essential for the mutual agreement to be validated. But how can this communication be carried out so that it reaches the general public? This is the main challenge when what is sought is the dissemination of information and its approval. In this context, this study presents the ALT software, developed from original readability metrics adapted to the Portuguese language and available on the web, to reduce difficulties in communication. The development of the software was motivated by Habermas's theory of communicative action, which uses a multidisciplinary style to measure the credibility of discourse in the communication channels used to build and maintain a safe and healthy relationship with the public.

Predição da Idade Cerebral a partir de Imagens de Ressonância Magnética utilizando Redes Neurais Convolucionais

Victor H. R. Oliveira , Augusto Antunes , Alexandre S. Soares , Arthur D. Reys , Robson Z. Júnior , Saulo D. S. Pedro , Danilo Silva

Category: Computer Vision

2021-12-23

This work investigates deep learning techniques for predicting brain age from magnetic resonance images, aiming to help identify biomarkers of the natural aging process. Identifying such biomarkers is useful for detecting a neurodegenerative process at an early stage, as well as for predicting age-related or non-age-related cognitive decline. Two techniques are implemented and compared in this work: a 3D convolutional neural network applied to the volumetric image, and a 2D convolutional neural network applied to slices from the axial plane, with subsequent fusion of the individual predictions. The best result was obtained by the 2D model, which achieved a mean absolute error of 3.83 years.
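The slice-level fusion and the mean absolute error metric can be sketched as follows. The abstract does not say which fusion rule was used, so averaging (with the median as an alternative) is an assumption here; the subject identifiers, ages, and function names are made up for illustration.

```python
from statistics import mean, median


def fuse_slice_predictions(slice_ages, method="mean"):
    """Combine per-slice age predictions for one subject into a single estimate.

    The fusion rule is an assumption; the abstract only states that
    individual slice predictions are fused.
    """
    if not slice_ages:
        raise ValueError("need at least one slice prediction")
    if method == "mean":
        return mean(slice_ages)
    if method == "median":
        return median(slice_ages)
    raise ValueError(f"unknown fusion method: {method}")


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    return mean([abs(t - p) for t, p in zip(true_ages, predicted_ages)])


# Hypothetical per-slice predictions (years) from a 2D model.
slice_predictions = {
    "subject_a": [61.0, 63.0, 62.0],
    "subject_b": [30.0, 34.0, 32.0],
    "subject_c": [45.0, 47.0, 49.0],
}
true_ages = {"subject_a": 60.0, "subject_b": 35.0, "subject_c": 47.0}

fused = {sid: fuse_slice_predictions(p) for sid, p in slice_predictions.items()}
subjects = sorted(fused)
mae = mean_absolute_error([true_ages[s] for s in subjects],
                          [fused[s] for s in subjects])
print(fused, round(mae, 2))
```

Averaging many slice-level estimates tends to reduce the variance of the subject-level prediction, which is one plausible reason a 2D model with fusion can compete with a 3D model.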

The Brazilian Data at Risk in the Age of AI?

Raoni F. da S. Teixeira , Rafael B. Januzi , Fabio A. Faria

Category: Computer Vision | Artificial Intelligence

2022-05-03

Advances in image processing and analysis, as well as in machine learning techniques, have contributed to the use of biometric recognition systems in people's daily tasks. These tasks range from simple access to mobile devices to tagging friends in photos shared on social networks and complex financial operations on self-service devices for banking transactions. In China, the use of these systems goes beyond personal use and has become government policy, with the objective of monitoring the behavior of the population. On July 5th, 2021, the Brazilian government announced the acquisition of a biometric recognition system to be used nationwide. In the opposite direction to China, Europe and some American cities have already started discussing the legality of using biometric systems in public places, in some cases banning the practice in their territory. To open a deeper discussion about the risks and legality of using these systems, this work exposes the vulnerabilities of biometric recognition systems, focusing on the face modality. Furthermore, it shows how a biometric system can be fooled through morphing, a presentation attack approach well known in the literature. Finally, a list of ten concerns was created to start the discussion about the security of citizen data and data privacy law in the Age of Artificial Intelligence (AI).

Deducing of Optimal Machine Learning Algorithms for Heterogeneity

Omar Alfarisi , Zeyar Aung , Mohamed Sassi

Category: Machine Learning

2021-11-10

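Deciding which machine learning algorithm is optimal is not easy. To help future researchers, this paper describes the optimal among the best algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, as the best algorithm.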

Explainable predictions of different machine learning algorithms used to predict Early Stage diabetes

V. Vakil , S. Pachchigar , C. Chavda , S. Soni

Category: Machine Learning | Artificial Intelligence

2021-11-18

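Machine learning and artificial intelligence can be widely used to diagnose chronic diseases so that the necessary preventive treatment can be given in critical time. Diabetes is one of the major diseases that can be readily diagnosed by several machine learning algorithms. Early diagnosis is crucial to prevent dangerous consequences. In this paper, we present a comparative analysis of several machine learning algorithms, namely Random Forest, Decision Tree, Artificial Neural Network, K Nearest Neighbor, Support Vector Machine, and XGBoost, together with feature attribution using SHAP to identify the most important features for predicting diabetes on a dataset collected from Sylhet Hospital. According to the experimental results, the Random Forest algorithm outperformed all the other algorithms, with an accuracy of 99% on this particular dataset.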

Common human diseases prediction using machine learning based on survey data

Jabir Al Nahian , Abu Kaisar Mohammad Masum , Sheikh Abujar , Md. Jueal Mia

Category: Machine Learning

2022-09-22

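In this era, the moment has arrived when medical care is a major focus. Impressively, multiple techniques have been developed to detect diseases. At present, some diseases, such as COVID-19, the common flu, migraine, lung disease, heart disease, kidney disease, diabetes, stomach disease, gastric disease, bone disease, and autism, are very common. In this analysis, we predict diseases based on their symptoms. We studied a range of symptoms and surveyed people to complete the task. Several classification algorithms were employed to train the models. In addition, performance evaluation metrics were used to measure the models' performance. Finally, we found that the PART classifier outperformed the others.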

Examining stability of machine learning methods for predicting dementia at early phases of the disease

Sinan Faouri , Mahmood AlBashayreh , Mohammad Azzeh

Category: Machine Learning | Artificial Intelligence

2022-09-10

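Dementia is a neuropsychiatric brain disorder that usually occurs when one or more brain cells stop working partially or entirely. Diagnosing this disorder in the early phases of the disease is a vital task for saving patients from adverse consequences and providing them with better healthcare. Machine learning methods have been shown to be accurate in predicting dementia in the early phases of the disease. The prediction of dementia depends heavily on the type of collected data, which is usually gathered from the normalized whole-brain volume (NWBV) and the atlas scaling factor (ASF), both normally measured and corrected from magnetic resonance imaging (MRI) scans. Other biological features such as age and gender can also help in the diagnosis of dementia. Although many studies use machine learning to predict dementia, we could not reach a conclusion on the stability of these methods, that is, on which of them are more accurate under different experimental conditions. This paper therefore investigates the conclusion stability of the performance of machine learning algorithms for dementia prediction. To this end, a large number of experiments were run using 7 machine learning algorithms and two feature reduction algorithms, namely Information Gain (IG) and Principal Component Analysis (PCA). To examine the stability of these algorithms, the feature selection threshold for IG was varied from 20% to 100% and the PCA dimension from 2 to 8. This resulted in 7x9 + 7x7 = 112 experiments. In each experiment, various classification evaluation measures were recorded. The results show that, among the seven algorithms, Support Vector Machine and Naive Bayes are the most stable while the selection threshold is changed. It was also found that using IG appears to be more effective than using PCA for predicting dementia.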
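The experiment count can be checked by enumerating the grid. This is a minimal sketch: the algorithm names are placeholders, and the 10-percentage-point step for the IG threshold is an assumption inferred from the nine settings implied by "7x9".

```python
from itertools import product

# Placeholder names; the abstract says only that 7 algorithms were used.
algorithms = [f"algorithm_{i}" for i in range(1, 8)]

# Assumed 10-point steps: 20, 30, ..., 100 gives the 9 IG settings in "7x9".
ig_thresholds = list(range(20, 101, 10))

# PCA dimensions 2 through 8 inclusive give the 7 settings in "7x7".
pca_dimensions = list(range(2, 9))

ig_experiments = list(product(algorithms, ig_thresholds))
pca_experiments = list(product(algorithms, pca_dimensions))
total_experiments = len(ig_experiments) + len(pca_experiments)

print(len(ig_experiments), len(pca_experiments), total_experiments)
```

Each tuple in the two lists identifies one experiment, so a stability analysis can group recorded metrics by algorithm and look at their spread across settings.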

Digital Twin-based Intrusion Detection for Industrial Control Systems

Seba Anna Varghese , Alireza Dehlaghi Ghadim , Ali Balador , Zahra Alimadadi , Panos Papadimitratos

Category: Machine Learning

2022-07-20

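Digital twins have recently attracted significant interest for the simulation, optimization, and predictive maintenance of Industrial Control Systems (ICS). Recent studies discuss the possibility of using digital twins for intrusion detection in industrial systems. Accordingly, this study contributes a digital twin-based security framework for industrial control systems, extending its capabilities to simulate attacks and defense mechanisms. Four types of process-aware attack scenarios are implemented on a standalone open-source digital twin: command injection, network Denial of Service (DoS), calculated measurement modification, and naive measurement modification. Based on an offline evaluation of eight supervised machine learning algorithms, a stacked ensemble classifier is proposed as the real-time intrusion detector. By combining the predictions of various algorithms, the designed stacked model outperforms previous methods in terms of F1-score and accuracy, while detecting and classifying intrusions in near real time (0.1 seconds). The study also discusses the practicality and benefits of the proposed digital twin-based security framework.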

Predicting Ulnar Collateral Ligament Injury in Rookie Major League Baseball Pitchers

Sean A. Rendar , Fenglong Ma

Category: Machine Learning

2022-06-30

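In the growing world of machine learning and data analytics, scholars are finding new ways to solve real-world problems. One solution comes from the intersection of healthcare, sports statistics, and data science. Within Major League Baseball (MLB), pitchers are regarded as the most important roster position. They are often among the highest-paid players and are crucial to a franchise's success, but they are also more prone to injuries that can sideline them for an entire season. The ulnar collateral ligament (UCL) is a small ligament in the elbow that controls the strength and stability of a pitcher's throwing arm. Because of repetitive strain, it is not uncommon for pitchers to tear it partially or completely during their careers. Repairing this injury requires UCL reconstruction surgery, informally known as Tommy John surgery. In this podium abstract, we investigate whether machine learning techniques can be used to predict UCL injury by analyzing online pitcher data.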

A Combined PCA-MLP Network for Early Breast Cancer Detection

Md. Wahiduzzaman Khan Arnob , Arunima Dey Pooja , Md. Saif Hassan Onim

Category: Computer Vision

2022-06-18

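Breast cancer ranks second among all cancer types and has been the cause of numerous deaths over the years, especially among women. Any improvement to existing diagnostic systems for detecting cancer can help minimize the death rate. Moreover, cancer detection at an early stage has recently been a prime research area in the scientific community as a way to raise the survival rate. A proper choice of machine learning tools can ensure early-stage prognosis with high accuracy. In this paper, we study different machine learning algorithms to detect whether a patient is likely to face breast cancer. Because of the implicit behavior of early-stage features, we implemented a multilayer perceptron model integrated with PCA and suggest that it is more viable than other detection algorithms. Our 4-layer MLP-PCA network achieved a best accuracy of 100% and a mean accuracy of 90.48% on the BCCD dataset.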

CPS Attack Detection under Limited Local Information in Cyber Security: A Multi-node Multi-class Classification Ensemble Approach

Junyi Liu , Yifu Tang , Haimeng Zhao , Xieheng Wang , Fangyu Li , Jingyi Zhang

Category: Machine Learning | Machine Learning (Statistics)

2022-09-01

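Cybersecurity breaches are common anomalies in distributed cyber-physical systems (CPS). However, cybersecurity breach classification remains a difficult problem, even with cutting-edge artificial intelligence (AI) approaches. In this paper, we study the multi-class classification problem in cybersecurity for attack detection. A challenging multi-node data-censoring case is considered, in which the data within each data center/node cannot be shared and the local data is incomplete. In particular, each local node contains only a part of the multiple classes. To train a global multi-class classifier without sharing raw data across all nodes, the main result of our study is the design of a multi-node multi-class classification ensemble approach. By gathering the estimated parameters of the binary classifiers and data densities from each local node, the missing information of each local node is completed in order to build the global multi-class classifier. Numerical experiments validate the effectiveness of the proposed approach in the multi-node data-censoring case. Under these conditions, we even show that the proposed approach outperforms the full-data approach.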

Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text

Mai A. Shaaban , Yasser F. Hassan , Shawkat K. Guirguis

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-10-10

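The increase in people's use of mobile messaging services has led to the spread of social engineering attacks such as phishing, with spam text being one of the main vectors for phishing attacks that steal sensitive data such as credit card details and passwords. In addition, rumors and incorrect medical information about the COVID-19 pandemic are widely shared on social media, leading to fear and confusion. Filtering spam content is therefore vital to reduce risks and threats. Previous studies relied on machine learning and deep learning approaches for spam classification, but these approaches have two limitations: machine learning models require manual feature engineering, whereas deep neural networks incur high computational costs. This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. The proposed model uses convolutional and pooling layers for feature extraction, along with base classifiers such as random forests and extremely randomized trees for classifying texts as spam or legitimate. Moreover, the model employs ensemble learning procedures such as boosting and bagging. As a result, the model achieved high precision, recall, and F1-score, and an accuracy of 98.38%.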

Machine Learning and Ensemble Approach Onto Predicting Heart Disease

Aaditya Surya

Category: Machine Learning | Artificial Intelligence

2021-11-16

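The four essential chambers of the heart, which lies in the thoracic cavity, are crucial to a person's survival, yet ironically prove to be the most vulnerable. Cardiovascular disease (CVD), also commonly referred to as heart disease, has risen steadily among the causes of human death over the past few decades. Given this statistic, it is evident that patients suffering from CVDs need a quick and correct diagnosis so that early treatment can reduce the chance of death. This paper attempts to use the data provided to train classification models such as Logistic Regression, K Nearest Neighbors, Support Vector Machine, Decision Tree, Gaussian Naive Bayes, Random Forest, and Multi-Layer Perceptron (Artificial Neural Network), and finally applies a soft voting ensemble technique in order to obtain as many correct diagnoses as possible.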
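Soft voting averages the class probabilities produced by the individual classifiers and picks the class with the highest average, rather than counting one vote per model. The following is a minimal sketch of that rule; the probability values, the function name `soft_vote`, and the optional weights are illustrative assumptions, not the paper's code.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    probabilities: list of per-model lists, each holding one probability per class.
    weights: optional per-model weights; equal weights are assumed if omitted.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not probabilities:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(probabilities)
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total_weight
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical outputs of three models for one patient.
# Class 0 = no heart disease, class 1 = heart disease.
model_outputs = [
    [0.90, 0.10],  # one model is very confident in class 0
    [0.40, 0.60],  # two models lean slightly toward class 1
    [0.45, 0.55],
]
label, averaged = soft_vote(model_outputs)
print(label, [round(p, 3) for p in averaged])
```

In this toy case a majority (hard) vote would pick class 1, since two of the three models favor it, while soft voting picks class 0 because the first model's confidence outweighs the two weak preferences. That sensitivity to confidence is the main design difference between the two voting schemes.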

Time Majority Voting, a PC-based EEG Classifier for Non-expert Users

Guangyao Dou , Zheng Zhou , Xiaodong Qu

Category: Machine Learning

2022-07-26

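Using machine learning and deep learning to predict cognitive tasks from electroencephalography (EEG) signals is a rapidly advancing field in Brain-Computer Interfaces (BCI). In contrast to computer vision and natural language processing, the amount of data from these trials is still rather small. Developing a PC-based machine learning technique that increases the participation of non-expert end users could help solve this data collection problem. We created a novel machine learning algorithm called Time Majority Voting (TMV). In our experiments, TMV performed better than cutting-edge algorithms. It can run efficiently on a personal computer for classification tasks involving BCI. These interpretable data can also help end users and researchers better understand EEG tests.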

A Survey on Ensemble Learning under the Era of Deep Learning

Yongquan Yang , Haijun Lv , Ning Chen

Category: Machine Learning | Artificial Intelligence

2021-01-21

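Owing to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, ensemble learning based on deep neural networks (ensemble deep learning) has recently shown significant performance in improving the generalization of learning systems. However, since modern deep neural networks usually have millions to billions of parameters, the time and space overheads of training multiple base deep learners and of testing with the ensemble deep learner are far greater than those of traditional ensemble learning. Although several algorithms for fast ensemble deep learning have been proposed to promote its deployment in some applications, further advances are still needed for many applications in specific fields, where development time and computing resources are usually restricted or the data to be processed is of large dimensionality. An urgent problem is how to keep the significant advantages of ensemble deep learning while reducing the required expenses, so that many more applications in specific fields can benefit from it. To alleviate this problem, it is essential to know how ensemble learning has developed in the era of deep learning. In this paper, we therefore present fundamental discussions focusing on data analyses of published works, methodologies, recent advances, and the unattainability of traditional ensemble learning and ensemble deep learning. We hope this paper will help in recognizing the intrinsic problems and technical challenges facing the future development of ensemble learning in the era of deep learning.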

Graph-based Ensemble Machine Learning for Student Performance Prediction

Yinkai Wang , Aowei Ding , Kaiyi Guan , Shixi Wu , Yuanqi Du

Category: Machine Learning | Artificial Intelligence

2021-12-15

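Student performance prediction is a critical research problem for understanding students' needs, presenting appropriate learning opportunities and resources, and improving teaching quality. However, traditional machine learning methods fail to produce stable and accurate prediction results. In this paper, we propose a graph-based ensemble machine learning method that aims to improve the stability of single machine learning methods through the consensus of multiple methods. Specifically, we leverage both supervised prediction methods and unsupervised clustering methods to build an iterative approach that propagates in a bipartite graph and converges to more stable and accurate prediction results. Extensive experiments demonstrate the effectiveness of the proposed method in predicting student performance more accurately. In particular, our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.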

Data transformation based optimized customer churn prediction model for the telecommunication industry

Joydeb Kumar Sana , Mohammad Zoynul Abedin , M. Sohel Rahman , M. Saifur Rahman

Category: Machine Learning

2022-01-11

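Data transformation (DT) is the process of converting raw data into a form that supports a particular classification algorithm and helps analyze the data for a special purpose. To improve prediction performance, we investigated various data transformation methods. This study was conducted in the context of customer churn prediction (CCP) in the telecommunication industry (TCI), where customer attrition is a common phenomenon. We propose a novel approach that combines data transformation methods with machine learning models for the CCP problem. We conducted experiments on publicly available TCI datasets and assessed performance using widely used evaluation measures (e.g., AUC, precision, recall, and F-measure). We present comprehensive comparisons to confirm the effect of the transformation methods. The comparison results and statistical tests show that most of the proposed data transformation-based optimized models significantly improve CCP performance. Overall, this manuscript presents an efficient and optimized CCP model for the telecommunication industry.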

Towards Understanding Fairness and its Composition in Ensemble Machine Learning

Usman Gohar , Sumon Biswas , Hridesh Rajan

Category: Machine Learning

2022-12-08

Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where the fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in different categories of ensembles. Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not understand these fully for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles yet presented in the literature.
