Identifying anomalies has become one of the primary strategies towards security and protection procedures in computer networks. In this context, machine learning-based methods emerge as an elegant solution to identify such scenarios and learn irrelevant information so that a reduction in the identification time and possible gain in accuracy can be obtained. This paper proposes a novel feature selection approach called Finite Element Machines for Feature Selection (FEMa-FS), which uses the framework of finite elements to identify the most relevant information from a given dataset. Although FEMa-FS can be applied to any application domain, it has been evaluated in the context of anomaly detection in computer networks. The outcomes over two datasets showed promising results.
translated by 谷歌翻译
这项工作提供了可靠的nids(R-nids),一种新的机器学习方法(ML)的网络入侵检测系统(NIDS),允许ML模型在集成数据集上工作,从不同数据集中具有不同信息的学习过程。因此,R-NIDS针对更强大的模型的设计,比传统方法更好地概括。我们还提出了一个名为UNK21的新数据集。它是由三个最着名的网络数据集(UGR'16,USNW-NB15和NLS-KDD)构建,每个网络环境收集,使用不同的特征和类,通过使用数据聚合方法R-nids。在r-nids之后,在这项工作中,我们建议基于文献中的三个最常见的数据集的信息来构建两个着名的ML模型(一个线性和非线性的一个),用于NIDS评估中的三个,集成在UNK21中的那些。所提出的方法优惠展示了作为NIDS解决方案训练的两种ML模型的结果可以从这种方法中受益,在新提议的UNK21数据集上培训时能够更好地概括。此外,这些结果用统计工具仔细分析了对我们的结论提供了高度信心的统计工具。
translated by 谷歌翻译
机器学习算法已被广泛用于入侵检测系统,包括多层感知器(MLP)。在这项研究中,我们提出了一个两阶段模型,该模型结合了桦木聚类算法和MLP分类器,以提高网络异常多分类的性能。在我们提出的方法中,我们首先将桦木或kmeans作为无监督的聚类算法应用于CICIDS-2017数据集,以预先分组数据。然后,将生成的伪标签作为基于MLP分类器的训练的附加功能添加。实验结果表明,使用桦木和K-均值聚类进行数据预组化可以改善入侵检测系统的性能。我们的方法可以使用桦木聚类实现多分类的99.73%的精度,这比使用独立的MLP模型的类似研究要好。
translated by 谷歌翻译
Cyber intrusion attacks that compromise the users' critical and sensitive data are escalating in volume and intensity, especially with the growing connections between our daily life and the Internet. The large volume and high complexity of such intrusion attacks have impeded the effectiveness of most traditional defence techniques. While at the same time, the remarkable performance of the machine learning methods, especially deep learning, in computer vision, had garnered research interests from the cyber security community to further enhance and automate intrusion detections. However, the expensive data labeling and limitation of anomalous data make it challenging to train an intrusion detector in a fully supervised manner. Therefore, intrusion detection based on unsupervised anomaly detection is an important feature too. In this paper, we propose a three-stage deep learning anomaly detection based network intrusion attack detection framework. The framework comprises an integration of unsupervised (K-means clustering), semi-supervised (GANomaly) and supervised learning (CNN) algorithms. We then evaluated and showed the performance of our implemented framework on three benchmark datasets: NSL-KDD, CIC-IDS2018, and TON_IoT.
translated by 谷歌翻译
随着网络攻击和网络间谍活动的增长,如今需要更好,更强大的入侵检测系统(IDS)的需求更加有必要。 ID的基本任务是在检测Internet的攻击方面充当第一道防线。随着入侵者的入侵策略变得越来越复杂且难以检测,研究人员已经开始应用新颖的机器学习(ML)技术来有效地检测入侵者,从而保留互联网用户对整个互联网网络安全的信息和整体信任。在过去的十年中,基于ML和深度学习(DL)架构的侵入检测技术的爆炸激增,这些架构在各种基于网络安全的数据集上,例如DARPA,KDDCUP'99,NSL-KDD,CAIDA,CAIDA,CTU--- 13,UNSW-NB15。在这项研究中,我们回顾了当代文献,并提供了对不同类型的入侵检测技术的全面调查,该技术将支持向量机(SVMS)算法作为分类器。我们仅专注于在网络安全中对两个最广泛使用的数据集进行评估的研究,即KDDCUP'99和NSL-KDD数据集。我们提供了每种方法的摘要,确定了SVMS分类器的作用以及研究中涉及的所有其他算法。此外,我们以表格形式对每种方法进行了批判性综述,突出了所调查的每种方法的性能指标,优势和局限性。
translated by 谷歌翻译
Machine Learning (ML) approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs). Recent work has achieved near-perfect performance by following binary- and multi-class network anomaly detection tasks. Such systems depend on the availability of both (benign and malicious) network data classes during the training phase. However, attack data samples are often challenging to collect in most organisations due to security controls preventing the penetration of known malicious traffic to their networks. Therefore, this paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples. The novel one-class classification architecture consists of a histogram-based deep feed-forward classifier to extract useful network data features and use efficient outlier detection. The DOC classifier has been extensively evaluated using two benchmark NIDS datasets. The results demonstrate its superiority over current state-of-the-art one-class classifiers in terms of detection and false positive rates.
translated by 谷歌翻译
需要在机器学习模型中对最小参数设置的需求,以避免耗时的优化过程。$ k $ - 最终的邻居是在许多问题中使用的最有效,最直接的模型之一。尽管具有众所周知的性能,但它仍需要特定数据分布的$ K $值,从而需要昂贵的计算工作。本文提出了一个$ k $ - 最终的邻居分类器,该分类器绕过定义$ k $的值的需求。考虑到训练集的数据分布,该模型计算$ k $值。我们将提出的模型与标准$ K $ - 最近的邻居分类器和文献中的两个无参数版本进行了比较。11个公共数据集的实验证实了所提出方法的鲁棒性,因为所获得的结果相似甚至更好。
translated by 谷歌翻译
网络流量数据是不同网络协议下不同数据字节数据包的组合。这些流量数据包具有复杂的时变非线性关系。现有的最先进的方法通过基于相关性和使用提取空间和时间特征的混合分类技术将特征融合到多个子集中,通过将特征融合到多个子集中来提高这一挑战。这通常需要高计算成本和手动支持,这限制了它们的网络流量的实时处理。为了解决这个问题,我们提出了一种基于协方差矩阵的新型新颖特征提取方法,提取网络流量数据的空间时间特征来检测恶意网络流量行为。我们所提出的方法中的协方差矩阵不仅自然地对不同网络流量值之间的相互关系进行了编码,而且还具有落在riemannian歧管中的明确的几何形状。利莫曼歧管嵌入距离度量,便于提取用于检测恶意网络流量的判别特征。我们在NSL-KDD和UNSW-NB15数据集上进行了评估模型,并显示了我们提出的方法显着优于与数据集上的传统方法和其他现有研究。
translated by 谷歌翻译
人工智能(AI)和机器学习(ML)在网络安全挑战中的应用已在行业和学术界的吸引力,部分原因是对关键系统(例如云基础架构和政府机构)的广泛恶意软件攻击。入侵检测系统(IDS)使用某些形式的AI,由于能够以高预测准确性处理大量数据,因此获得了广泛的采用。这些系统托管在组织网络安全操作中心(CSOC)中,作为一种防御工具,可监视和检测恶意网络流,否则会影响机密性,完整性和可用性(CIA)。 CSOC分析师依靠这些系统来决定检测到的威胁。但是,使用深度学习(DL)技术设计的IDS通常被视为黑匣子模型,并且没有为其预测提供理由。这为CSOC分析师造成了障碍,因为他们无法根据模型的预测改善决策。解决此问题的一种解决方案是设计可解释的ID(X-IDS)。这项调查回顾了可解释的AI(XAI)的最先进的ID,目前的挑战,并讨论了这些挑战如何涉及X-ID的设计。特别是,我们全面讨论了黑匣子和白盒方法。我们还在这些方法之间的性能和产生解释的能力方面提出了权衡。此外,我们提出了一种通用体系结构,该建筑认为人类在循环中,该架构可以用作设计X-ID时的指南。研究建议是从三个关键观点提出的:需要定义ID的解释性,需要为各种利益相关者量身定制的解释以及设计指标来评估解释的需求。
translated by 谷歌翻译
Network intrusion detection systems (NIDSs) play an important role in computer network security. There are several detection mechanisms where anomaly-based automated detection outperforms others significantly. Amid the sophistication and growing number of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDS. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our developed method to various machine learning and deep learning algorithms to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion based on a set of benchmarked performance analysis criteria. Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022, with an accuracy of 99.99% and 100% for KDDCUP'99 and CIC-MalMem-2022, respectively, and no overfitting or Type-1 and Type-2 issues.
translated by 谷歌翻译
As the number of heterogenous IP-connected devices and traffic volume increase, so does the potential for security breaches. The undetected exploitation of these breaches can bring severe cybersecurity and privacy risks. Anomaly-based \acp{IDS} play an essential role in network security. In this paper, we present a practical unsupervised anomaly-based deep learning detection system called ARCADE (Adversarially Regularized Convolutional Autoencoder for unsupervised network anomaly DEtection). With a convolutional \ac{AE}, ARCADE automatically builds a profile of the normal traffic using a subset of raw bytes of a few initial packets of network flows so that potential network anomalies and intrusions can be efficiently detected before they cause more damage to the network. ARCADE is trained exclusively on normal traffic. An adversarial training strategy is proposed to regularize and decrease the \ac{AE}'s capabilities to reconstruct network flows that are out-of-the-normal distribution, thereby improving its anomaly detection capabilities. The proposed approach is more effective than state-of-the-art deep learning approaches for network anomaly detection. Even when examining only two initial packets of a network flow, ARCADE can effectively detect malware infection and network attacks. ARCADE presents 20 times fewer parameters than baselines, achieving significantly faster detection speed and reaction time.
translated by 谷歌翻译
该行业许多领域的自动化越来越多地要求为检测异常事件设计有效的机器学习解决方案。随着传感器的普遍存在传感器监测几乎连续地区的复杂基础设施的健康,异常检测现在可以依赖于以非常高的频率进行采样的测量,从而提供了在监视下的现象的非常丰富的代表性。为了充分利用如此收集的信息,观察不能再被视为多变量数据,并且需要一个功能分析方法。本文的目的是探讨近期对实际数据集的功能设置中异常检测技术的性能。在概述最先进的和视觉描述性研究之后,比较各种异常检测方法。虽然功能设置中的异常分类(例如,形状,位置)在文献中记录,但为所识别的异常分配特定类型似乎是一个具有挑战性的任务。因此,鉴于模拟研究中的这些突出显示类型,现有方法的强度和弱点是基准测试。接下来在两个数据集上评估异常检测方法,与飞行中的直升机监测和建筑材料的光谱相同有关。基准分析由从业者的建议指导结束。
translated by 谷歌翻译
在智能交通系统中,交通拥堵异常检测至关重要。运输机构的目标有两个方面:监视感兴趣领域的一般交通状况,并在异常拥堵状态下定位道路细分市场。建模拥塞模式可以实现这些目标,以实现全市道路的目标,相当于学习多元时间序列(MTS)的分布。但是,现有作品要么不可伸缩,要么无法同时捕获MTS中的空间信息。为此,我们提出了一个由数据驱动的生成方法组成的原则性和全面的框架,该方法可以执行可拖动的密度估计来检测流量异常。我们的方法在特征空间中的第一群段段,然后使用条件归一化流以在无监督的设置下在群集级别识别异常的时间快照。然后,我们通过在异常群集上使用内核密度估计器来识别段级别的异常。关于合成数据集的广泛实验表明,我们的方法在召回和F1得分方面显着优于几种最新的拥塞异常检测和诊断方法。我们还使用生成模型来采样标记的数据,该数据可以在有监督的环境中训练分类器,从而减轻缺乏在稀疏设置中进行异常检测的标记数据。
translated by 谷歌翻译
互联的战地信息共享设备的扩散,称为战场互联网(Iobt),介绍了几个安全挑战。 Iobt运营环境所固有的是对抗机器学习的实践,试图规避机器学习模型。这项工作探讨了在网络入侵检测系统设置中对异常检测的成本效益无监督学习和基于图形的方法的可行性,并利用了集合方法来监督异常检测问题的学习。我们在培训监督模型时纳入了一个现实的对抗性培训机制,以实现对抗性环境的强大分类性能。结果表明,无监督和基于图形的方法在通过两个级别的监督堆叠集合方法检测异常(恶意活动)时表现优于检测异常(恶意活动)。该模型由第一级别的三个不同的分类器组成,然后是第二级的天真贝叶斯或决策树分类器。对于所有测试水平的两个分类器,该模型将在0.97高于0.97以上的F1分数。值得注意的是,天真贝叶斯是最快的两个分类器平均1.12秒,而决策树保持最高的AUC评分为0.98。
translated by 谷歌翻译
The Internet of Things (IoT) is a system that connects physical computing devices, sensors, software, and other technologies. Data can be collected, transferred, and exchanged with other devices over the network without requiring human interactions. One challenge the development of IoT faces is the existence of anomaly data in the network. Therefore, research on anomaly detection in the IoT environment has become popular and necessary in recent years. This survey provides an overview to understand the current progress of the different anomaly detection algorithms and how they can be applied in the context of the Internet of Things. In this survey, we categorize the widely used anomaly detection machine learning and deep learning techniques in IoT into three types: clustering-based, classification-based, and deep learning based. For each category, we introduce some state-of-the-art anomaly detection methods and evaluate the advantages and limitations of each technique.
translated by 谷歌翻译
The detection of anomalies in time series data is crucial in a wide range of applications, such as system monitoring, health care or cyber security. While the vast number of available methods makes selecting the right method for a certain application hard enough, different methods have different strengths, e.g. regarding the type of anomalies they are able to find. In this work, we compare six unsupervised anomaly detection methods with different complexities to answer the questions: Are the more complex methods usually performing better? And are there specific anomaly types that those method are tailored to? The comparison is done on the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We compare the six methods by analyzing the experimental results on a dataset- and anomaly type level after tuning the necessary hyperparameter for each method. Additionally we examine the ability of individual methods to incorporate prior knowledge about the anomalies and analyse the differences of point-wise and sequence wise features. We show with broad experiments, that the classical machine learning methods show a superior performance compared to the deep learning methods across a wide range of anomaly types.
translated by 谷歌翻译
自动日志文件分析可以尽早发现相关事件,例如系统故障。特别是,自我学习的异常检测技术在日志数据中捕获模式,随后向系统操作员报告意外的日志事件事件,而无需提前提供或手动对异常情况进行建模。最近,已经提出了越来越多的方法来利用深度学习神经网络为此目的。与传统的机器学习技术相比,这些方法证明了出色的检测性能,并同时解决了不稳定数据格式的问题。但是,有许多不同的深度学习体系结构,并且编码由神经网络分析的原始和非结构化日志数据是不平凡的。因此,我们进行了系统的文献综述,概述了部署的模型,数据预处理机制,异常检测技术和评估。该调查没有定量比较现有方法,而是旨在帮助读者了解不同模型体系结构的相关方面,并强调未来工作的开放问题。
translated by 谷歌翻译
为了允许机器学习算法从原始数据中提取知识,必须首先清除,转换,并将这些数据置于适当的形式。这些通常很耗时的阶段被称为预处理。预处理阶段的一个重要步骤是特征选择,其目的通过减少数据集的特征量来更好地执行预测模型。在这些数据集中,不同事件的实例通常是不平衡的,这意味着某些正常事件被超出,而其他罕见事件非常有限。通常,这些罕见的事件具有特殊的兴趣,因为它们具有比正常事件更具辨别力。这项工作的目的是过滤提供给这些罕见实例的特征选择方法的实例,从而积极影响特征选择过程。在这项工作过程中,我们能够表明这种过滤对分类模型的性能以及异常值检测方法适用于该过滤。对于某些数据集,所产生的性能增加仅为百分点,但对于其他数据集,我们能够实现高达16%的性能的增加。这项工作应导致预测模型的改进以及在预处理阶段的过程中的特征选择更好的可解释性。本着公开科学的精神,提高了我们的研究领域的透明度,我们已经在公开的存储库中提供了我们的所有源代码和我们的实验结果。
translated by 谷歌翻译
In the era of Internet of Things (IoT), network-wide anomaly detection is a crucial part of monitoring IoT networks due to the inherent security vulnerabilities of most IoT devices. Principal Components Analysis (PCA) has been proposed to separate network traffics into two disjoint subspaces corresponding to normal and malicious behaviors for anomaly detection. However, the privacy concerns and limitations of devices' computing resources compromise the practical effectiveness of PCA. We propose a federated PCA-based Grassmannian optimization framework that coordinates IoT devices to aggregate a joint profile of normal network behaviors for anomaly detection. First, we introduce a privacy-preserving federated PCA framework to simultaneously capture the profile of various IoT devices' traffic. Then, we investigate the alternating direction method of multipliers gradient-based learning on the Grassmann manifold to guarantee fast training and the absence of detecting latency using limited computational resources. Empirical results on the NSL-KDD dataset demonstrate that our method outperforms baseline approaches. Finally, we show that the Grassmann manifold algorithm is highly adapted for IoT anomaly detection, which permits drastically reducing the analysis time of the system. To the best of our knowledge, this is the first federated PCA algorithm for anomaly detection meeting the requirements of IoT networks.
translated by 谷歌翻译
Multi-class ensemble classification remains a popular focus of investigation within the research community. The popularization of cloud services has sped up their adoption due to the ease of deploying large-scale machine-learning models. It has also drawn the attention of the industrial sector because of its ability to identify common problems in production. However, there are challenges to conform an ensemble classifier, namely a proper selection and effective training of the pool of classifiers, the definition of a proper architecture for multi-class classification, and uncertainty quantification of the ensemble classifier. The robustness and effectiveness of the ensemble classifier lie in the selection of the pool of classifiers, as well as in the learning process. Hence, the selection and the training procedure of the pool of classifiers play a crucial role. An (ensemble) classifier learns to detect the classes that were used during the supervised training. However, when injecting data with unknown conditions, the trained classifier will intend to predict the classes learned during the training. To this end, the uncertainty of the individual and ensemble classifier could be used to assess the learning capability. We present a novel approach for novel detection using ensemble classification and evidence theory. A pool selection strategy is presented to build a solid ensemble classifier. We present an architecture for multi-class ensemble classification and an approach to quantify the uncertainty of the individual classifiers and the ensemble classifier. We use uncertainty for the anomaly detection approach. Finally, we use the benchmark Tennessee Eastman to perform experiments to test the ensemble classifier's prediction and anomaly detection capabilities.
translated by 谷歌翻译