Intelligent Paper Notes

Multi-view Multi-label Anomaly Network Traffic Classification based on MLP-Mixer Neural Network

Yu Zheng , Zhangxuan Dang , Chunlei Peng , Chao Yang , Xinbo Gao

Category: Machine Learning | Artificial Intelligence | Computer Vision

2022-10-30

Network traffic classification is the basis of many network security applications and has attracted considerable attention in the field of cyberspace security. Existing network traffic classification methods based on convolutional neural networks (CNNs) often emphasize local patterns of traffic data while ignoring global information associations. In this paper, we propose an MLP-Mixer-based multi-view multi-label neural network for network traffic classification. Compared with existing CNN-based methods, our method adopts the MLP-Mixer structure, which fits the structure of a packet better than the conventional convolution operation does. In our method, the packet is divided into the packet header and the packet body. Together with the flow features of the packet, these serve as inputs from different views. We use a multi-label setting to learn different scenarios simultaneously, improving classification performance by exploiting the correlations between scenarios. Building on these characteristics, we propose an end-to-end network traffic classification method. We conduct experiments on three public datasets, and the experimental results show that our method achieves superior performance.
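The token-mixing and channel-mixing idea behind MLP-Mixer can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' architecture: LayerNorm and biases are omitted, and ReLU stands in for GELU. Splitting a header view into 5 tokens of 8 bytes is a hypothetical choice.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
MLPParams = Tuple[Matrix, Matrix]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def init_mlp(dim: int, hidden: int, rng: random.Random) -> MLPParams:
    """Random weights for a dim -> hidden -> dim MLP (illustrative init only)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2

def mlp(x: List[float], params: MLPParams) -> List[float]:
    """Two linear layers with a ReLU in between; biases omitted for brevity."""
    w1, w2 = params
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * v for w, v in zip(row, h)) for row in w2]

def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mixer_block(x: Matrix, token_mlp: MLPParams, channel_mlp: MLPParams) -> Matrix:
    """One Mixer block on x with shape (tokens, channels); LayerNorm omitted.

    Token mixing runs one shared MLP along the token axis for every channel,
    so each output token can depend on every input token (a global view).
    Channel mixing then runs a second MLP along the channel axis per token.
    """
    token_mixed = transpose([mlp(col, token_mlp) for col in transpose(x)])
    x = add(x, token_mixed)                                 # residual 1
    return add(x, [mlp(row, channel_mlp) for row in x])     # residual 2

# Usage: a hypothetical 40-byte header view cut into 5 tokens of 8 bytes.
rng = random.Random(0)
header = [rng.randrange(256) / 255 for _ in range(40)]
tokens = [header[i:i + 8] for i in range(0, 40, 8)]
out = mixer_block(tokens, init_mlp(5, 16, rng), init_mlp(8, 16, rng))
print(len(out), len(out[0]))  # 5 8
```

Unlike a small convolution kernel, the token-mixing step lets every byte group influence every other one. This is the "global information" argument the abstract makes against CNNs.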

Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency

Zihan Li , Wentao Chen , Zhiqing Wei , Xingqi Luo , Bing Su

Category: Machine Learning

2022-05-19

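Supervised learning has been widely used for attack categorization, but it requires high-quality data and labels. However, the data are often imbalanced, and it is difficult to obtain sufficient annotations. Moreover, supervised models must cope with real-world deployment issues, such as defending against unseen artificial attacks. To tackle these challenges, we propose a semi-supervised fine-grained attack categorization framework consisting of an encoder and a two-branch structure, and the framework can be generalized to different supervised models. A multilayer perceptron with residual connections is used as the encoder to extract features and reduce complexity. A Recurrent Prototype Module (RPM) is proposed to train the encoder effectively in a semi-supervised manner. To alleviate the data imbalance problem, we introduce Weight-Task Consistency (WTC) into the iterative process of RPM, assigning larger weights in the loss function to classes with fewer samples. In addition, to cope with new attacks in real-world deployment, we propose an Active Adaption Resampling (AAR) method, which can better discover the distribution of unseen sample data and adjust the parameters of the encoder. Experimental results show that our model outperforms state-of-the-art semi-supervised attack detection methods, improving classification accuracy by 3% and reducing training time by 90%.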
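The abstract only states that WTC gives larger loss weights to classes with fewer samples; it does not give the exact weighting. The sketch below assumes plain inverse-frequency weighting, N / (K * n_c), as a stand-in and applies it in a weighted cross-entropy. The class names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Assumed weighting: w_c = N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs: List[Dict[str, float]], labels: Sequence[str],
                           weights: Dict[str, float]) -> float:
    """Mean over the batch of -w_y * log p(y)."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# Usage: 8 samples of a common class, 2 of a rare one (hypothetical names).
labels = ["dos"] * 8 + ["probe"] * 2
w = inverse_frequency_weights(labels)
print(w)  # {'dos': 0.625, 'probe': 2.5}
```

With this normalization, the weights average to 1 over the training labels, so the overall loss scale matches unweighted cross-entropy. Only the balance between classes shifts.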

Open-Source Framework for Encrypted Internet and Malicious Traffic Classification

Ofek Bader , Adi Lichy , Amit Dvir , Ran Dubin , Chen Hajaj

Category: Machine Learning

2022-06-21

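Internet traffic classification plays a key role in network visibility, Quality of Service (QoS), intrusion detection, Quality of Experience (QoE), and traffic trend analysis. To improve privacy, integrity, confidentiality, and protocol obfuscation, current traffic relies on encryption protocols such as SSL/TLS. Machine learning (ML) and deep learning (DL) models are increasingly used in the literature. However, comparing different models and methods has become cumbersome and difficult due to the lack of a standardized framework. In this paper, we propose an open-source framework named OSF-EIMTC, which provides the full pipeline of the learning process. It covers well-known datasets and the extraction of both new and well-known features. It also provides implementations of well-known ML and DL models (from the traffic classification literature), together with their evaluation. Such a framework can facilitate research in the traffic classification domain, making it more repeatable, reproducible, and easier to execute. It also allows a more accurate comparison of well-known and novel features and models. As part of the framework evaluation, we demonstrate a variety of scenarios in which the framework can be used with multiple datasets, models, and feature sets. We present an analysis of publicly available datasets and invite the community to participate in our open challenges using OSF-EIMTC.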

ARCADE: Adversarially Regularized Convolutional Autoencoder for Network Anomaly Detection

Willian T. Lunardi , Martin Andreoni Lopez , Jean-Pierre Giacalone

Category: Machine Learning

2022-05-03

As the number of heterogeneous IP-connected devices and the volume of traffic increase, so does the potential for security breaches. The undetected exploitation of these breaches can bring severe cybersecurity and privacy risks. Anomaly-based intrusion detection systems (IDSs) play an essential role in network security. In this paper, we present a practical unsupervised anomaly-based deep learning detection system called ARCADE (Adversarially Regularized Convolutional Autoencoder for unsupervised network anomaly DEtection). With a convolutional autoencoder (AE), ARCADE automatically builds a profile of normal traffic using a subset of raw bytes of a few initial packets of network flows. As a result, potential network anomalies and intrusions can be efficiently detected before they cause more damage to the network. ARCADE is trained exclusively on normal traffic. An adversarial training strategy is proposed to regularize and decrease the AE's capability to reconstruct network flows that are out of the normal distribution, thereby improving its anomaly detection capability. The proposed approach is more effective than state-of-the-art deep learning approaches for network anomaly detection. Even when examining only two initial packets of a network flow, ARCADE can effectively detect malware infection and network attacks. ARCADE has 20 times fewer parameters than the baselines, achieving significantly faster detection speed and reaction time.
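ARCADE's input is a subset of raw bytes from the first few packets of a flow, and anomalies are flagged when the AE reconstructs a flow poorly. The sketch below covers only the input construction and a simple threshold on reconstruction error. The convolutional AE and the adversarial training are not shown. The 100-byte cut, the zero padding and the quantile threshold are assumptions, not values from the paper.

```python
from typing import List, Sequence

def flow_to_input(packets: Sequence[bytes], n_packets: int = 2,
                  n_bytes: int = 100) -> List[List[float]]:
    """Keep the first n_bytes of the first n_packets (hypothetical sizes),
    zero-pad short or missing packets, and scale each byte to [0, 1]."""
    rows = []
    for i in range(n_packets):
        raw = packets[i][:n_bytes] if i < len(packets) else b""
        padded = raw + bytes(n_bytes - len(raw))  # bytes(k) = k zero bytes
        rows.append([b / 255 for b in padded])
    return rows

def threshold_from_normal(scores: Sequence[float], quantile: float = 0.99) -> float:
    """Assumed rule: threshold = a high quantile of reconstruction errors
    measured on held-out normal traffic."""
    ordered = sorted(scores)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]

def is_anomalous(reconstruction_error: float, threshold: float) -> bool:
    return reconstruction_error > threshold
```

Because the model is trained only on normal flows, the threshold can be set from normal-traffic errors alone without any attack samples. This matches the unsupervised setting described above.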

Visualization Of Class Activation Maps To Explain AI Classification Of Network Packet Captures

Igor Cherepanov , Alex Ulmer , Jonathan Geraldi Joewono , Jörn Kohlhammer

Category: Machine Learning

2022-09-05

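Due to the rapid growth of today's networks and applications, the classification of Internet traffic has become increasingly important. The number of connections in our networks and the addition of new applications produce large amounts of log data and make it complicated for experts to search for common patterns. Finding such patterns within specific classes of applications is necessary to meet various requirements in network analysis. Deep learning methods provide both feature extraction and classification from data in a single system. However, these networks are very complex and are used as black-box models, which weakens experts' trust in the classification. Moreover, when they are used as black boxes, no new knowledge can be gained from the model predictions despite their excellent performance. Therefore, the explainability of the classification is crucial. Besides increasing trust, the explanation can be used for model evaluation, for gaining new insights from the data, and for improving the model. In this paper, we propose a visual interactive tool that combines the classification of network data with explanation techniques to form an interface between experts, algorithms, and data.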

A Multi-View Framework for BGP Anomaly Detection via Graph Attention Network

Songtao Peng , Jiaqi Nie , Xincheng Shu , Zhongyuan Ruan , Lei Wang , Yunxuan Sheng , Qi Xuan

Category: Machine Learning | Artificial Intelligence

2021-12-23

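The Border Gateway Protocol (BGP) is the default protocol for exchanging routing reachability information on the Internet, and its abnormal traffic behavior is closely related to Internet anomaly events. BGP anomaly detection models help ensure stable routing services on the Internet through their real-time monitoring and alerting capabilities. Previous studies focused either on the feature selection problem or on the memory characteristics in the data. They ignored the relationships between features and the precise time correlations in features, whether long-term or short-term dependencies. In this paper, we propose a multi-view model for capturing anomalous behaviors from BGP update traffic. The Seasonal and Trend decomposition using Loess (STL) method is used to reduce the noise in the original time-series data. A Graph Attention Network (GAT) is used to discover feature relationships and time correlations in the features, respectively. Our results outperform state-of-the-art methods on the anomaly detection task, with average F1 scores of up to 96.3% and 93.2% on balanced and imbalanced datasets, respectively. Meanwhile, our model can be extended to classify multiple anomalies and to detect unknown events.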
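STL splits a series into trend, seasonal and residual parts using LOESS smoothing. To stay dependency-free, the sketch below substitutes the simpler classical additive decomposition: a centred moving-average trend plus per-phase seasonal means. It only illustrates the three-way split and is not STL itself. A library implementation such as `statsmodels.tsa.seasonal.STL` is the closer match. Treating trend plus seasonal as the "denoised" signal is an assumption about how such a decomposition would be used.

```python
from typing import List, Optional, Sequence, Tuple

def classical_decompose(y: Sequence[float], period: int
                        ) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Additive split y = trend + seasonal + residual (classical method, not STL).

    Trend: centred moving average of width `period` (None at both edges).
    Seasonal: mean detrended value per phase, shifted to sum to zero.
    """
    if period % 2 == 0 or len(y) < 2 * period:
        raise ValueError("sketch needs an odd period and at least two full periods")
    half, n = period // 2, len(y)
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period
    phase_sums, phase_counts = [0.0] * period, [0] * period
    for t, tr in enumerate(trend):
        if tr is not None:
            phase_sums[t % period] += y[t] - tr
            phase_counts[t % period] += 1
    means = [s / c for s, c in zip(phase_sums, phase_counts)]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]
    resid = [None if tr is None else y[t] - tr - seasonal[t]
             for t, tr in enumerate(trend)]
    return trend, seasonal, resid
```

In the sketch, the residual is the part treated as noise. On a clean "linear trend plus repeating pattern" series it comes out as (numerically) zero wherever the trend is defined.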

Improving Multilayer-Perceptron(MLP)-based Network Anomaly Detection with Birch Clustering on CICIDS-2017 Dataset

Yuhua Yin , Julian Jang-Jaccard , Fariza Sabrina , Jin Kwak

Category: Machine Learning

2022-08-20

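Machine learning algorithms, including the multilayer perceptron (MLP), have been widely used in intrusion detection systems. In this study, we propose a two-stage model that combines the Birch clustering algorithm with an MLP classifier to improve the performance of network anomaly multi-classification. In our proposed method, we first apply Birch or K-means as an unsupervised clustering algorithm to the CICIDS-2017 dataset to pre-group the data. The generated pseudo-labels are then added as an additional feature for training the MLP-based classifier. Experimental results show that pre-grouping the data with Birch and K-means clustering improves the performance of intrusion detection systems. Our method achieves 99.73% accuracy in multi-classification using Birch clustering, which is better than similar studies using a standalone MLP model.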
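The two-stage idea can be sketched as "cluster first, then append the cluster id as a feature." Birch is not reimplemented here; scikit-learn's `sklearn.cluster.Birch` provides it. Instead, the sketch uses plain k-means, which the abstract also mentions, with a deterministic first-k initialization for reproducibility. Encoding the pseudo-label as one-hot is an assumption; the abstract only says it is added as an extra feature.

```python
from typing import List, Sequence

def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; initial centres are the first k points."""
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def add_cluster_feature(points: Sequence[Sequence[float]], assign: Sequence[int],
                        k: int) -> List[List[float]]:
    """Append the cluster id as a one-hot pseudo-label (assumed encoding)."""
    return [list(p) + [1.0 if a == c else 0.0 for c in range(k)]
            for p, a in zip(points, assign)]

# Usage: two obvious groups; the augmented rows would then train the MLP.
X = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
ids = kmeans(X, k=2)
X_aug = add_cluster_feature(X, ids, k=2)
print(ids)  # [0, 0, 1, 1]
```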

Classification of Traffic Using Neural Networks by Rejecting: a Novel Approach in Classifying VPN Traffic

Ali Parchekani , Salar Nouri , Vahid Shah-Mansouri , Seyed Pooya Shariatpanahi

Category: Machine Learning

2020-01-10

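In this paper, we introduce a novel end-to-end traffic classification method to distinguish between traffic classes, including VPN traffic, at Layer 3 of the Open Systems Interconnection (OSI) model. Because VPN traffic is encrypted, classifying it with traditional classification methods is not trivial. We use two well-known neural networks, the multilayer perceptron and the recurrent neural network, to create a cascade neural network. It focuses on two metrics: class scores and distance from the class center. This approach combines feature extraction, selection, and classification into a single end-to-end system to systematically learn the nonlinear relationship between the input and the predicted performance. Therefore, we can distinguish VPN traffic from non-VPN traffic by rejecting the features irrelevant to the VPN class. Moreover, we simultaneously obtain the application type of non-VPN traffic. The approach is evaluated using the general traffic dataset ISCX VPN-nonVPN and an acquired dataset. The results demonstrate the efficacy of the framework for encrypted traffic classification. It achieves an accuracy of 95%, higher than that of state-of-the-art models, along with strong generalization capability.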
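One way to read the "distance from the class center" metric is as a nearest-centroid classifier with a reject option. A sample is labeled with its nearest class centre unless even that centre is farther than a threshold, in which case it is rejected. The sketch below shows only that idea on generic embedding vectors. The embeddings, the `max_dist` threshold and the class names are hypothetical, and the paper's MLP/RNN cascade is not reproduced.

```python
import math
from typing import Dict, List, Optional, Sequence

def class_centres(embeddings: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Mean embedding per class."""
    groups: Dict[str, List[Sequence[float]]] = {}
    for e, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(e)
    return {y: [sum(col) / len(es) for col in zip(*es)] for y, es in groups.items()}

def classify_with_reject(e: Sequence[float], centres: Dict[str, List[float]],
                         max_dist: float) -> Optional[str]:
    """Nearest-centre label, or None ("reject") if the nearest centre is too far."""
    label, dist = min(((y, math.dist(e, c)) for y, c in centres.items()),
                      key=lambda t: t[1])
    return label if dist <= max_dist else None

# Usage: hypothetical 2-D embeddings for two known application classes.
centres = class_centres([[0, 0], [0, 2], [10, 0], [10, 2]],
                        ["chat", "chat", "stream", "stream"])
print(classify_with_reject([0.5, 1.0], centres, max_dist=3.0))  # chat
print(classify_with_reject([5.0, 20.0], centres, max_dist=3.0))  # None
```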

Computer Vision on X-ray Data in Industrial Production and Security Applications: A survey

Mehdi Rafiei , Jenni Raitoharju , Alexandros Iosifidis

Category: Computer Vision

2022-11-10

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

When a RF Beats a CNN and GRU, Together -- A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification

Adi Lichy , Ofek Bader , Ran Dubin , Amit Dvir , Chen Hajaj

Category: Machine Learning

2022-06-16

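Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Service (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analysis. There is no theoretical guarantee that deep learning (DL)-based solutions perform better than classical machine learning (ML)-based ones, yet DL-based models have become the common default. This paper compares well-known DL-based and ML-based models. It shows that, for malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform classical ML-based ones. We exemplify this finding using two well-known datasets for a variety of tasks, such as malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset. Note that it is not feasible to evaluate all possible models to make a concrete statement. Therefore, the above finding is not a recommendation to avoid DL-based models. Rather, it is empirical evidence that in some cases simpler solutions may perform better.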

STC-IDS: Spatial-Temporal Correlation Feature Analyzing based Intrusion Detection System for Intelligent Connected Vehicles

Pengzhou Cheng , Mu Han , Aoxue Li , Fengwei Zhang

Category: Artificial Intelligence

2022-04-23

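Intrusion detection is an important defensive measure for the security of automotive communications. Accurate frame detection models help vehicles avoid malicious attacks. The uncertainty and diversity of attack methods make this task challenging. However, existing works are limited to considering only local features or weak feature mappings across multiple features. To address these limitations, we propose a novel model for automotive intrusion detection based on spatial-temporal correlation features of in-vehicle communication traffic (STC-IDS). Specifically, the proposed model uses an encoding-detection architecture. In the encoder, spatial and temporal relations are encoded simultaneously. To strengthen the relationships between features, an attention-based convolutional network captures spatial and channel features to enlarge the receptive field. Meanwhile, an attention LSTM builds meaningful relationships between previous time series or key bytes. The encoded information is then passed to the detector to generate strong spatial-temporal attention features and perform anomaly classification. In particular, single-frame and multi-frame models are constructed, each with different advantages. With automatic hyperparameter selection based on Bayesian optimization, the model is trained to achieve the best performance. Extensive empirical studies on real-world vehicle attack datasets show that STC-IDS outperforms baseline methods and produces fewer false alarms while maintaining efficiency.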

A Dependable Hybrid Machine Learning Model for Network Intrusion Detection

Md. Alamin Talukder , Khondokar Fida Hasan , Md. Manowarul Islam , Md Ashraf Uddin , Arnisha Akhter , Mohammand Abu Yousuf , Fares Alharbi , Mohammad Ali Moni

Category: Machine Learning

2022-12-08

Network intrusion detection systems (NIDSs) play an important role in computer network security. Among the several detection mechanisms, anomaly-based automated detection significantly outperforms the others. Amid the growing sophistication and number of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDSs. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our method with various machine learning and deep learning algorithms to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion detection based on a set of benchmarked performance analysis criteria. Our method produces excellent results on two datasets, KDDCUP'99 and CIC-MalMem-2022, with accuracies of 99.99% and 100%, respectively, and without overfitting or Type-1 and Type-2 issues.
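The data-balancing step relies on SMOTE, which creates synthetic minority samples. Each one is a random point on the segment between a minority sample and one of its k nearest minority neighbours. The sketch below is a minimal pure-Python version of that idea; for real pipelines, `imblearn.over_sampling.SMOTE` from imbalanced-learn is the usual choice. The values of `k`, `n_new` and the seed are arbitrary, and the XGBoost feature-selection step is not shown.

```python
import math
import random
from typing import List, Sequence

def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 3,
          seed: int = 0) -> List[List[float]]:
    """Synthesise n_new minority samples by interpolating from a random
    minority sample towards one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        out.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return out

# Usage: oversample a tiny hypothetical minority class.
synthetic = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=5, k=2)
print(len(synthetic))  # 5
```

Because new points lie between existing minority samples, SMOTE fills in the minority region rather than duplicating rows. Plain random oversampling would only duplicate them.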

Darknet Traffic Classification and Adversarial Attacks

Nhien Rust-Nguyen , Mark Stamp

Category: Machine Learning

2022-06-12

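The anonymous nature of darknets is commonly exploited for illegal activities. Previous research has employed machine learning and deep learning techniques to automate the detection of darknet traffic in an attempt to block these criminal activities. This research aims to improve darknet traffic detection by evaluating four models for classifying such traffic and its underlying application types. The models are Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), and Auxiliary-Classifier Generative Adversarial Networks (AC-GAN). We find that our RF model outperforms the state-of-the-art machine learning techniques used in prior work with the CIC-Darknet2020 dataset. To evaluate the robustness of our RF classifier, we obfuscate selected application type classes to simulate realistic adversarial attack scenarios. We demonstrate that our best-performing classifier can be defeated by such attacks, and we consider ways to deal with such adversarial attacks.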

Galaxy Image Classification using Hierarchical Data Learning with Weighted Sampling and Label Smoothing

Xiaohua Ma , Xiangru Li , Ali Luo , Jinqu Zhang , Hui Li

Category: Machine Learning

2022-12-20

With the development of a series of galaxy sky surveys in recent years, the volume of observations has increased rapidly, making machine learning methods for galaxy image recognition a hot research topic. Existing automatic galaxy image recognition studies are hampered by three problems:

- large differences in similarity between categories;
- the imbalance of data between classes;
- the discrepancy between the discrete representation of galaxy classes and the essentially gradual changes from one morphological class to the adjacent class (DDRGC).

These limitations have motivated several astronomers and machine learning experts to design projects with improved galaxy image recognition capabilities. Therefore, this paper proposes a novel learning method, "Hierarchical Imbalanced data learning with Weighted sampling and Label smoothing" (HIWL). HIWL consists of three key techniques, each dealing with one of the above problems: (1) a hierarchical galaxy classification model based on an efficient backbone network; (2) a weighted sampling scheme to deal with the imbalance problem; (3) a label smoothing technique to alleviate the DDRGC problem. We applied this method to galaxy photometric images from the Galaxy Zoo-The Galaxy Challenge, exploring the recognition of five classes: completely round smooth, in-between smooth, cigar-shaped, edge-on, and spiral. The overall classification accuracy is 96.32%. Comparisons with related works based on recall, precision, and F1-score show some advantages of HIWL. In addition, we explored visualizations of the galaxy image features and of the model's attention to understand the foundations of the proposed scheme.
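Two of the HIWL ingredients, weighted sampling and label smoothing, can each be sketched in a few lines. The code below uses the standard uniform label-smoothing target, (1 - eps) * one_hot + eps / K, and sampling probabilities proportional to 1 / class count. Both are common textbook forms assumed here; the paper's exact variants and the value eps = 0.1 are not taken from the abstract.

```python
import random
from collections import Counter
from typing import List, Sequence

def smooth_labels(label: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """Uniform label smoothing: the true class keeps 1 - eps + eps/K of the
    probability mass and every other class receives eps/K."""
    return [(1 - eps) * (1.0 if c == label else 0.0) + eps / num_classes
            for c in range(num_classes)]

def sample_balanced(labels: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Draw n training-sample indices with probability proportional to
    1 / (size of the sample's class), so rare classes are seen as often."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return random.Random(seed).choices(range(len(labels)), weights=weights, k=n)

# Usage: 5 morphology classes, as in the Galaxy Zoo task described above.
target = smooth_labels(label=2, num_classes=5, eps=0.1)
print([round(v, 2) for v in target])  # [0.02, 0.02, 0.92, 0.02, 0.02]
```

Softening the one-hot target keeps the model from being pushed to full certainty on classes whose boundaries are in reality gradual. This is the DDRGC issue the abstract targets.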

Zero-day DDoS Attack Detection

Cameron Boeder , Troy Januchowski

Category: Machine Learning

2022-08-31

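The ability to detect zero-day (novel) attacks has become essential in the network security industry. Due to evolving attack signatures, existing network intrusion detection systems often fail to detect these threats. This project aims to detect zero-day DDoS (distributed denial-of-service) attacks by utilizing network traffic captured before it enters a private network. Modern feature extraction techniques are used in conjunction with neural networks to determine whether a network packet is benign or malicious.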

Deep Learning for Time Series Anomaly Detection: A Survey

Zahra Zamanzadeh Darban , Geoffrey I. Webb , Shirui Pan , Charu C. Aggarwal , Mahsa Salehi

Category: Machine Learning | Artificial Intelligence

2022-11-09

Time series anomaly detection has applications in a wide range of research fields, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey provides a structured and comprehensive overview of state-of-the-art deep learning models for time series anomaly detection. It provides a taxonomy based on the factors that divide anomaly detection models into different categories. Besides describing the basic anomaly detection technique for each category, it discusses their advantages and limitations. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. Finally, it summarises open research issues and the challenges faced when adopting deep anomaly detection models.

Mapping the Internet: Modelling Entity Interactions in Complex Heterogeneous Networks

Simon Mandlik , Tomas Pevny

Category: Machine Learning

2021-04-19

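Even though machine learning algorithms already play a significant role in data science, many current methods make unrealistic assumptions about input data. Such methods are hard to apply because of incompatible data formats, or because of heterogeneous, hierarchical, or entirely missing data fragments in the dataset. As a solution, we propose a versatile, unified framework called "HMill" for sample representation, model definition, and training. We review in depth the multi-instance paradigm for machine learning on which the framework builds and which it extends. To theoretically justify the design of the key components of HMill, we show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework. The text also contains a detailed discussion of technicalities and performance improvements in our implementation, which is published for download under the MIT license. The main asset of the framework is its flexibility, which makes it possible to model diverse real-world data sources with the same tool. Besides the standard setting, in which a set of attributes is observed for each object individually, we explain how to implement message-passing inference in graphs that represent whole systems of objects. To support our claims, we use the framework to solve three different problems from the cybersecurity domain. The first use case concerns IoT device identification from raw network observations. In the second problem, we study how malicious binary files can be classified using a snapshot of the operating system represented as a directed graph. The last example is the task of domain blacklist extension through modelling interactions between entities in the network. In all three problems, the solution based on the proposed framework achieves performance comparable to specialized approaches.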

Fine-grained TLS services classification with reject option

Jan Luxemburk , Tomáš Čejka

Category: Machine Learning

2022-02-24

The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata. The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic. The number of service labels, which is important to make the problem hard and realistic, is four times higher than in the public dataset with the most class labels. The published dataset is intended as a benchmark for identifying services in encrypted traffic. Service identification can be further extended with the task of "rejecting" unknown services, i.e., the traffic not seen during the training phase. Neural networks offer superior performance for tackling this more challenging problem. To showcase the dataset's usefulness, we implemented a neural network with a multi-modal architecture, which is the state-of-the-art approach, and achieved 97.04% classification accuracy and detected 91.94% of unknown services with 5% false positive rate.
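Reporting unknown-service detection "at 5% false positive rate" implies choosing an operating threshold. One common way to do this, assumed here rather than taken from the paper, is max-softmax confidence thresholding. The threshold is calibrated on known-service validation flows so that about 5% of them fall below it and would be wrongly rejected. The logits below are synthetic placeholders.

```python
import math
from typing import List, Optional, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def calibrate_threshold(known_val_logits: Sequence[Sequence[float]],
                        target_fpr: float = 0.05) -> float:
    """Confidence threshold under which ~target_fpr of known-service
    validation flows fall (those would be falsely rejected)."""
    conf = sorted(max(softmax(l)) for l in known_val_logits)
    return conf[int(target_fpr * len(conf))]

def predict_or_reject(logits: Sequence[float], threshold: float) -> Optional[int]:
    """Predicted service index, or None meaning "unknown service"."""
    p = softmax(logits)
    best = max(range(len(p)), key=p.__getitem__)
    return best if p[best] >= threshold else None

# Usage: 100 synthetic known-class validation logits with rising confidence.
val = [[0.0, 0.1 * i] for i in range(100)]
thr = calibrate_threshold(val, target_fpr=0.05)
print(sum(predict_or_reject(l, thr) is None for l in val))  # 5
```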

Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling

Yang Long , Gui-Song Xia , Liangpei Zhang , Gong Cheng , Deren Li

Category: Computer Vision

2022-01-06

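Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content, e.g., by assigning a semantic label to every pixel of the image. With the popularization of data-driven methods, the past decades have witnessed promising progress on ASP with high-resolution aerial images. This progress came from approaching the problem through either tile-level scene classification or segmentation-based image analysis. However, the former scheme often produces results with tile-wise boundaries. The latter must handle the complex modeling process from pixels to semantics, which often requires large-scale, well-annotated image samples with pixel-wise semantic labels. In this paper, we address these issues in ASP from the perspective of moving from tile-level scene classification to pixel-wise semantic labeling. Specifically, we first revisit aerial image interpretation through a literature review. We then present a large-scale scene classification dataset, termed Million-AID, containing one million aerial images. With the proposed dataset, we also report benchmarking experiments using classical convolutional neural networks (CNNs). Finally, we perform ASP by unifying tile-level scene classification and object-based image analysis to achieve pixel-wise semantic labeling. Intensive experiments show that Million-AID is a challenging yet useful dataset that can serve as a benchmark for evaluating newly developed algorithms. When transferring knowledge from Million-AID, CNN models fine-tuned on Million-AID consistently outperform those pretrained on ImageNet for aerial scene classification. Moreover, our hierarchical multi-task learning method achieves state-of-the-art pixel-wise classification on the challenging GID. This bridges tile-level scene classification and pixel-wise semantic labeling for aerial image interpretation.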

Self-Supervised Vision Transformers for Malware Detection

Sachith Seneviratne , Ridwan Shariffdeen , Sanka Rasnayaka , Nuran Kasthuriarachchi

Category: Computer Vision

2022-08-15

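With the growth of malware and the advancement of cyberattacks, malware detection plays a crucial role in cybersecurity. These attacks often use previously unseen malware that security vendors have not yet identified, so solutions that can self-learn from unlabeled sample data are becoming indispensable. This paper presents Sherlock, a self-supervision-based deep learning model that detects malware based on the Vision Transformer (ViT) architecture. Sherlock is a novel malware detection method. It learns unique features to differentiate malware from benign programs using image-based binary representations. Experiments use 1.2 million Android applications across a hierarchy of 47 types and 696 families. The results show that self-supervised learning can achieve 97% accuracy for the binary classification of malware, higher than existing state-of-the-art techniques. Our proposed model also outperforms state-of-the-art techniques for multi-class malware types and families, with scores of .497 and .491, respectively.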
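"Image-based binary representation" generally means laying a file's raw bytes out as grayscale pixels. A ViT then cuts that image into fixed-size patches that become its input tokens. The abstract does not specify Sherlock's exact image construction. The fixed row width, zero padding and 2 x 2 patch size below are hypothetical, and the self-supervised pretraining and the ViT itself are not shown.

```python
import math
from typing import List

def bytes_to_image(data: bytes, width: int) -> List[List[int]]:
    """Lay raw bytes out row by row as a grayscale image (values 0-255),
    zero-padding the last row (hypothetical layout)."""
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]

def to_patches(img: List[List[int]], p: int) -> List[List[int]]:
    """Split the image into non-overlapping p x p patches, each flattened,
    in row-major order (the token sequence a ViT would embed). Edge pixels
    that do not fill a whole patch are dropped in this sketch."""
    h, w = len(img), len(img[0])
    patches = []
    for r in range(0, h - p + 1, p):
        for c in range(0, w - p + 1, p):
            patches.append([img[r + i][c + j] for i in range(p) for j in range(p)])
    return patches

# Usage: 16 bytes -> 4 x 4 image -> four 2 x 2 patches.
img = bytes_to_image(bytes(range(16)), width=4)
print(to_patches(img, 2)[0])  # [0, 1, 4, 5]
```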
