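A general intrusion detection system (IDS) is fundamentally based on an anomaly detection system (ADS), or on a combination of anomaly detection and signature-based methods. It gathers and analyzes observations and reports possibly suspicious cases to a system administrator or other users for further investigation. One of the notorious challenges that even state-of-the-art ADS and IDS have not yet overcome is the possibility of a very high false alarm rate. Especially in very large and complex system settings, the number of low-level alarms can easily overwhelm administrators and increase their tendency to ignore alerts. Existing false alarm mitigation strategies can be grouped into two main families. The first covers methods directly customized and applied toward higher-quality anomaly scoring in ADS. The second comprises methods used in related contexts as filtering methods to reduce the likelihood of false alarms. Given the lack of a comprehensive study of possible ways to control the false alarm rate, in this paper we review existing false alarm mitigation techniques for ADS and present the pros and cons of each. We also study some promising techniques applied in signature-based IDS and other related contexts, such as commercial security information and event management (SIEM) tools, that are applicable to and generalizable to the ADS context. Finally, we summarize some directions for future research.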
translated by Google Translate
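A high false alarm rate in intensive care units (ICUs) is one of the challenges of using medical technology in hospitals. These false alarms are caused by patient movement, detachment of monitoring sensors, or various sources of noise and interference that affect the signals collected by different monitoring devices. In this paper, we propose a set of novel high-level features based on an unsupervised feature learning technique to effectively capture the characteristics of electrocardiogram (ECG) signals for different arrhythmias and to distinguish them from irregular signals caused by different sources of signal interference. This unsupervised feature learning technique first extracts a set of low-level features from each of the patient's cardiac cycles, and then clusters these segments for each patient to provide a set of salient high-level features. The goal of the clustering stage is to enable the classification method to distinguish the high-level features extracted from normal and abnormal cycles (i.e., cycles affected by arrhythmia or by different sources of distortion in the signal), so that it can focus more on the features extracted from the abnormal portions of the signal that contribute to the alarm. The performance of the method is evaluated on the 2015 PhysioNet/Computing in Cardiology Challenge dataset for reducing false arrhythmia alarms in the ICU. As confirmed by the experimental results, the proposed method achieves considerable performance in terms of the accuracy, sensitivity, and specificity of alarm detection while using only a few high-level features extracted from a single-lead ECG signal.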
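The clustering stage can be pictured as a bag-of-words encoding. The following is a minimal sketch under stated assumptions, not the authors' implementation: low-level feature vectors per cardiac cycle and per-patient cluster centroids (e.g., from k-means) are assumed to be given, and `nearest_centroid` and `high_level_features` are hypothetical names. Each cycle votes for its nearest centroid, and the normalized vote counts serve as a fixed-length high-level feature vector for the classifier.

```python
import math
from typing import List, Sequence

def nearest_centroid(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))

def high_level_features(cycles: Sequence[Sequence[float]],
                        centroids: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of a recording's cardiac cycles assigned to each cluster (hypothetical encoding).

    Each cycle's low-level feature vector votes for its nearest centroid; the
    normalized vote counts form a fixed-length high-level feature vector.
    """
    if not cycles:
        return [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for cycle in cycles:
        counts[nearest_centroid(cycle, centroids)] += 1
    return [c / len(cycles) for c in counts]

# Hypothetical centroids: clean-looking cycles near the origin, distorted ones far away.
centroids = [[0.0, 0.0], [10.0, 10.0]]
cycles = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [0.0, 0.0]]
print(high_level_features(cycles, centroids))  # [0.75, 0.25]
```

Because the histogram has one entry per cluster, recordings of any length map to vectors of the same size, which is what lets a standard classifier consume them.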
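Nowadays, multivariate time series data are increasingly collected in various real-world systems, e.g., power plants and wearable devices. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status at certain time steps and pinpointing the root causes. Building such a system, however, is challenging, since it not only requires capturing the temporal dependency within each time series but also needs to encode the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based on the severity of different incidents. Although many unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system status at different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations, and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based on the feature maps that encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices, and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.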
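The following sketch makes the signature-matrix idea concrete. It assumes that each entry is the inner product of two sensors' readings over a window of w steps, rescaled by w, and that "multi-scale" means computing one such matrix per window length. The window lengths and function names are hypothetical, and the encoder, ConvLSTM, and decoder are not reproduced.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]

def signature_matrix(series: Sequence[Sequence[float]], t: int, w: int) -> Matrix:
    """Pairwise inner products of all series over the w steps ending at t, divided by w.

    series[i][k] is the reading of sensor i at time step k.
    """
    if w < 1 or t < w - 1:
        raise ValueError("need w >= 1 and a full window ending at t")
    n = len(series)
    window = range(t - w + 1, t + 1)
    return [[sum(series[i][k] * series[j][k] for k in window) / w for j in range(n)]
            for i in range(n)]

def multi_scale_signatures(series: Sequence[Sequence[float]], t: int,
                           windows: Sequence[int]) -> Dict[int, Matrix]:
    """One signature matrix per window length; each would become one input channel."""
    return {w: signature_matrix(series, t, w) for w in windows}

sensors = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
print(multi_scale_signatures(sensors, t=3, windows=(2, 4)))
# {2: [[12.5, 1.5], [1.5, 0.5]], 4: [[7.5, 1.0], [1.0, 0.5]]}
```

Each matrix is symmetric and n-by-n regardless of the window length, so the different scales can be stacked as channels of a single image-like input.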
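Cardiac complications are a major factor in health care costs and the leading cause of death worldwide. However, preventive measures such as early detection of cardiac abnormalities can prevent severe cardiovascular arrest and various complications, and can have a substantial impact on health care costs. In such cases, the electrocardiogram (ECG or EKG) is usually the first diagnostic choice of medical practitioners and clinical staff for measuring the electrical and muscular health of an individual's heart. This paper presents a system that can read a recorded ECG and predict cardiac abnormalities without the intervention of a human expert. The goal of this work is an algorithm that reads and analyzes an ECG dataset. The proposed architecture first uses the discrete wavelet transform (DWT) to preprocess the ECG data, followed by the undecimated wavelet transform (UWT) to extract nine relevant features that are of high interest to cardiologists. A probabilistic model called a Bayesian network classifier is trained on the nine parameters extracted from the UCI arrhythmia dataset. The proposed system classifies recorded heartbeats into four classes using the Bayesian network classifier and Tukey's box analysis. The four predicted heartbeat classes are (a) normal beat, (b) premature ventricular contraction (PVC), (c) premature atrial contraction (PAC), and (d) myocardial infarction (MI). Experimental results on real electrocardiogram datasets, including PhysioNet and the European ST-T Database (EDB), show that the proposed system achieves an average accuracy of 96.6% for PAC, 92.8% for MI, and 87% for PVC, with average error rates of 3.3% for PAC, 6% for MI, and 12.5% for PVC.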
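The abstract does not say exactly how Tukey's box analysis is applied. One common form is Tukey's fences: values below Q1 - k*IQR or above Q3 + k*IQR (usually k = 1.5) are treated as outliers. Below is a minimal sketch of that rule. It assumes the rule is applied to a single extracted feature, and it does not show how the paper combines the rule with the Bayesian network classifier.

```python
import statistics
from typing import List, Sequence, Tuple

def tukey_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outside_fences(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of values lying outside the Tukey fences."""
    lo, hi = tukey_fences(values, k)
    return [i for i, v in enumerate(values) if v < lo or v > hi]

feature = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical feature values, one extreme beat
print(outside_fences(feature))  # [8]
```

The fences depend only on quartiles, so a single extreme value does not drag the thresholds toward itself the way a mean-and-standard-deviation rule would.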
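A recurrent auto-encoder model summarizes sequential data into a fixed-length vector through an encoder structure and then reconstructs the original sequence through a decoder structure. The summarized vector can be used to represent time series features. In this paper, we propose relaxing the dimensionality of the decoder output so that it performs partial reconstruction. The fixed-length vector therefore represents features only in the selected dimensions. In addition, we propose using a rolling fixed-window approach to generate training samples from unbounded time series data. The change of time series features over time can then be summarized as a smooth trajectory path. The fixed-length vectors are further analyzed using additional visualization and unsupervised clustering techniques. The proposed method can be applied to large-scale industrial processes for sensor signal analysis, where clusters of the vector representations can reflect the operating states of the industrial system.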
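The two data-preparation ideas in the abstract, a rolling fixed window over an unbounded stream and a decoder target restricted to selected dimensions, can be sketched as a generator. The window width, step, and `target_dims` below are hypothetical parameters, and the recurrent encoder and decoder themselves are not shown.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Window = List[List[float]]

def rolling_windows(stream: Iterable[Sequence[float]], width: int, step: int = 1,
                    target_dims: Sequence[int] = (0,)) -> Iterator[Tuple[Window, Window]]:
    """Yield (encoder_input, decoder_target) pairs from a possibly unbounded stream.

    encoder_input holds the last `width` observations with all dimensions;
    decoder_target is the same window restricted to `target_dims`, i.e. the
    partial reconstruction the decoder is trained to produce.
    """
    buffer: Deque[List[float]] = deque(maxlen=width)
    for count, obs in enumerate(stream, start=1):
        buffer.append(list(obs))
        if count >= width and (count - width) % step == 0:
            window = [list(row) for row in buffer]
            target = [[row[d] for d in target_dims] for row in window]
            yield window, target

stream = ([i, 10.0 * i] for i in range(5))  # stand-in for an endless sensor feed
for x, y in rolling_windows(stream, width=3, step=2, target_dims=(1,)):
    print(x, y)
```

Because the generator keeps only `width` observations in memory, it can run over a stream that never ends; consecutive windows overlap, which is what makes the sequence of learned vectors trace a smooth trajectory.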
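Timely prediction of clinically critical events in the intensive care unit (ICU) is important for improving care and survival rates. Most existing approaches are based on applying various classification methods to statistical features explicitly extracted from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate physiological signal time series by learning their representations with a sequence-to-sequence auto-encoder. We then propose to analyze the learned representations to enable signal similarity assessment for the prediction of critical events. We apply this methodological framework to acute hypotensive episodes (AHE) on a large and diverse dataset of vital signal recordings. Experiments demonstrate the ability of the proposed framework to accurately predict upcoming AHEs.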
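The abstract leaves the similarity assessment unspecified. One simple reading is nearest-neighbour voting over learned representations. The sketch below assumes that each historical recording has already been encoded into a vector and labeled with whether an event followed. It ranks those vectors by cosine similarity to the query and reports the fraction of the top k that preceded an event. `event_risk` and `k` are hypothetical.

```python
import math
from typing import Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def event_risk(query: Sequence[float],
               references: Sequence[Tuple[Sequence[float], bool]], k: int = 3) -> float:
    """Fraction of the k most similar reference representations that preceded an event."""
    if not references:
        raise ValueError("need at least one reference representation")
    ranked = sorted(references, key=lambda ref: cosine(query, ref[0]), reverse=True)
    top = ranked[:k]
    return sum(had_event for _, had_event in top) / len(top)

# Hypothetical learned representations, each flagged with whether an AHE followed.
history = [([1.0, 0.0], True), ([0.9, 0.1], True), ([0.0, 1.0], False), ([-1.0, 0.0], False)]
print(event_risk([1.0, 0.05], history, k=2))  # 1.0
```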
Mechanical devices such as engines, vehicles, aircraft, etc., are typically instrumented with numerous sensors to capture the behavior and health of the machine. However, there are often external factors or variables that are not captured by sensors, leading to time series that are inherently unpredictable. For instance, manual controls and/or unmonitored environmental conditions or load may lead to inherently unpredictable time series. Detecting anomalies in such scenarios becomes challenging using standard approaches based on mathematical models that rely on stationarity, or prediction models that utilize prediction errors to detect anomalies. We propose a Long Short-Term Memory (LSTM) network-based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that learns to reconstruct 'normal' time-series behavior, and thereafter uses reconstruction error to detect anomalies. We experiment with three publicly available quasi-predictable time-series datasets (power demand, space shuttle, and ECG) and two real-world engine datasets with both predictable and unpredictable behavior. We show that EncDec-AD is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time series. Further, we show that EncDec-AD is able to detect anomalies from short time series (length as small as 30) as well as long time series (length as large as 500).
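A common way to turn reconstruction errors into an anomaly score is to fit a Gaussian to the errors observed on held-out normal data, then score new errors by their squared standardized distance from that fit. The sketch below follows that approach under stated assumptions: one scalar error per window, and a hypothetical threshold of 4.0. The LSTM encoder-decoder that produces the errors is not shown.

```python
import statistics
from typing import List, Sequence, Tuple

def fit_error_model(normal_errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of reconstruction errors on held-out normal windows."""
    mu = statistics.fmean(normal_errors)
    var = statistics.pvariance(normal_errors, mu=mu)
    if var == 0:
        raise ValueError("normal errors have zero variance")
    return mu, var

def anomaly_scores(errors: Sequence[float], mu: float, var: float) -> List[float]:
    """Squared standardized error, i.e. a one-dimensional Mahalanobis distance."""
    return [(e - mu) ** 2 / var for e in errors]

mu, var = fit_error_model([1.0, 3.0])         # errors seen on normal data
scores = anomaly_scores([2.0, 5.0], mu, var)  # errors on new windows
print(scores, [s > 4.0 for s in scores])      # [0.0, 9.0] [False, True]
```

With per-time-step error vectors instead of scalars, the same idea would use a mean vector and a covariance matrix in place of `mu` and `var`.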
We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN) which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining both the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state of the art approaches.
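AMDN combines the scores of several one-class SVMs, one per learned representation (appearance, motion, and the joint representation). Below is a sketch of one simple late-fusion rule: min-max normalization of each channel, followed by a weighted sum. The channel names and weights are hypothetical, and the paper's exact fusion rule may differ.

```python
from typing import Dict, List, Sequence

def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant channel maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def late_fusion(channel_scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted sum of min-max-normalized anomaly scores from independent channels."""
    normalized = {name: min_max(s) for name, s in channel_scores.items()}
    lengths = {len(s) for s in normalized.values()}
    if len(lengths) != 1:
        raise ValueError("all channels must score the same inputs")
    n = lengths.pop()
    return [sum(weights[name] * normalized[name][i] for name in normalized) for i in range(n)]

channels = {"appearance": [0.0, 5.0, 10.0], "motion": [2.0, 2.0, 4.0]}
print(late_fusion(channels, {"appearance": 0.5, "motion": 0.5}))  # [0.0, 0.25, 1.0]
```

Normalizing first matters because SVM decision values from different feature spaces are not on a common scale. Without it, one channel could dominate the sum.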
Videos represent the primary source of information for surveillance applications. Video material is often available in large quantities but in most cases it contains little or no annotation for supervised learning. This article reviews the state-of-the-art deep learning based methods for video anomaly detection and categorizes them based on the type of model and criteria of detection. We also perform simple studies to understand the different approaches and provide the criteria of evaluation for spatio-temporal anomaly detection.
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e., the training labels (anomalous or normal) are at video level instead of clip level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomalies during training. We also introduce a new large-scale, first-of-its-kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc., as well as normal activities. This dataset can be used for two tasks: first, general anomaly detection, considering all anomalies in one group and all normal activities in another group; second, recognizing each of the 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work. The dataset is available at: http://crcv.ucf.edu/projects/real-world/
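The abstract names the ingredients of the training objective: a ranking loss between an anomalous bag and a normal bag, plus sparsity and temporal-smoothness constraints. Here is a minimal sketch of such a loss for one bag pair. It assumes segment scores in [0, 1] and a hinge on the highest-scoring segment of each bag. The weights `lambda_smooth` and `lambda_sparse` are caller-supplied hypothetical values, not the paper's settings.

```python
from typing import Sequence

def mil_ranking_loss(anomalous: Sequence[float], normal: Sequence[float],
                     lambda_smooth: float, lambda_sparse: float) -> float:
    """Hinge ranking loss between the top segments of an anomalous and a normal bag,
    plus temporal smoothness and sparsity penalties on the anomalous bag's scores."""
    hinge = max(0.0, 1.0 - max(anomalous) + max(normal))
    smoothness = sum((a - b) ** 2 for a, b in zip(anomalous, anomalous[1:]))
    sparsity = sum(anomalous)
    return hinge + lambda_smooth * smoothness + lambda_sparse * sparsity

# Segment scores for one weakly labeled anomalous video and one normal video.
print(round(mil_ranking_loss([0.1, 0.9, 0.2], [0.3, 0.1], 0.0, 0.0), 6))  # 0.4
```

Only video-level labels are needed: the max over each bag lets the model decide which segment is responsible for the anomalous label. The two penalties discourage scattered, flickering high scores.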
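We formulate the abnormal event detection problem as an outlier detection task and propose a two-stage algorithm based on k-means clustering and one-class Support Vector Machines (SVM) to eliminate outliers. After extracting motion features from training videos containing only normal events, we apply k-means clustering to find clusters representing different types of motion. In the first stage, we consider that clusters with fewer samples (relative to a given threshold) contain only outliers, and we eliminate these clusters altogether. In the second stage, we shrink the borders of the remaining clusters by training a one-class SVM model on each cluster. To detect abnormal events in test videos, we analyze each test sample and consider its maximum normality score provided by the trained one-class SVM models, based on the intuition that a test sample can belong to only one cluster of normal motion. If a test sample does not fit well into any of the narrowed clusters, it is labeled as abnormal. We also combine our motion-based method with a recent approach based on deep appearance features extracted with pre-trained convolutional neural networks (CNN). We combine our two-stage algorithm with the deep framework using a late fusion strategy that keeps the pipelines of the two methods independent. We compare our method with several state-of-the-art supervised and unsupervised methods on four benchmark datasets. The empirical results indicate that our abnormal event detection framework achieves better results in most cases, while processing test videos in real time at 32 frames per second on a CPU.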
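The decision logic of the two stages can be sketched as follows. Stage one drops small clusters. Stage two keeps one boundary per surviving cluster, and a test sample is abnormal when even its best-matching cluster rejects it. To stay self-contained, the sketch replaces each one-class SVM with a hypothetical ball of fixed radius around the cluster centroid. This is a stand-in, not an SVM, and the cluster labels and radii are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def keep_large_clusters(labels: Sequence[int], min_size: int) -> List[int]:
    """Stage 1: ids of clusters with at least min_size members (smaller ones are dropped)."""
    return sorted(c for c, n in Counter(labels).items() if n >= min_size)

def normality(x: Point, centroid: Point, radius: float) -> float:
    """Stand-in for a one-class SVM decision value: positive inside the ball, negative outside."""
    return radius - math.dist(x, centroid)

def is_abnormal(x: Point, models: Dict[int, Tuple[Point, float]]) -> bool:
    """Abnormal if even the best-matching normal cluster rejects x (max score < 0)."""
    return max(normality(x, c, r) for c, r in models.values()) < 0

labels = [0, 0, 0, 1, 2, 2, 2]                          # k-means assignments of training samples
kept = keep_large_clusters(labels, min_size=2)          # cluster 1 has one sample, so it is dropped
models = {0: ([0.0, 0.0], 1.0), 2: ([5.0, 5.0], 1.0)}  # hypothetical boundaries for kept clusters
print(kept, is_abnormal([0.5, 0.0], models), is_abnormal([2.5, 2.5], models))
# [0, 2] False True
```

Taking the maximum over clusters encodes the intuition from the abstract: a normal sample needs to fit only one motion cluster, not all of them.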
Anomaly detection is a critical step towards building a secure and trustworthy system. The primary purpose of a system log is to record system states and significant events at various critical points to help debug system failures and perform root cause analysis. Such log data is universally available in nearly all computer systems. Log data is an important and valuable resource for understanding system status and performance issues; therefore, the various system logs are naturally an excellent source of information for online monitoring and anomaly detection. We propose DeepLog, a deep neural network model utilizing Long Short-Term Memory (LSTM), to model a system log as a natural language sequence. This allows DeepLog to automatically learn log patterns from normal execution, and detect anomalies when log patterns deviate from the model trained from log data under normal execution. In addition, we demonstrate how to incrementally update the DeepLog model in an online fashion so that it can adapt to new log patterns over time. Furthermore, DeepLog constructs workflows from the underlying system log so that once an anomaly is detected, users can diagnose the detected anomaly and perform root cause analysis effectively. Extensive experimental evaluations over large log data have shown that DeepLog has outperformed other existing log-based anomaly detection methods based on traditional data mining methodologies.
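Treating a log as a language sequence means predicting the next log key from the recent history. An entry is flagged when the observed key is not among the model's top g predictions. The sketch below shows that detection rule with a count-based next-key model standing in for the LSTM. `TopGNextKey`, `h`, and `g` are hypothetical names and values, and the online updating and workflow construction are not shown.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence, Tuple

class TopGNextKey:
    """Count-based stand-in for an LSTM: predicts the next log key from the last h keys."""

    def __init__(self, h: int = 2, g: int = 1) -> None:
        self.h, self.g = h, g
        self.counts: DefaultDict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def fit(self, normal_keys: Sequence[str]) -> "TopGNextKey":
        """Count which key follows each length-h history in normal execution logs."""
        for i in range(self.h, len(normal_keys)):
            self.counts[tuple(normal_keys[i - self.h:i])][normal_keys[i]] += 1
        return self

    def anomalies(self, keys: Sequence[str]) -> List[int]:
        """Positions whose key is not among the g most likely successors of its history."""
        flagged = []
        for i in range(self.h, len(keys)):
            history = tuple(keys[i - self.h:i])
            candidates = [k for k, _ in self.counts.get(history, Counter()).most_common(self.g)]
            if keys[i] not in candidates:
                flagged.append(i)
        return flagged

model = TopGNextKey(h=2, g=1).fit(list("ABCABCABC"))  # normal execution: A, B, C repeating
print(model.anomalies(list("ABCABD")))               # [5]
```

Using the top g predictions rather than only the single most likely key tolerates benign branching in normal workflows. A larger g trades fewer false alarms for lower sensitivity.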
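Owing to the large volume of data, the need for autonomous and general-purpose anomaly detection systems has increased. However, developing an accurate, fast, and standalone general-purpose anomaly detection system remains a challenge. In this paper, we propose combining two traditional time series analysis methods, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and Seasonal-Trend decomposition using LOESS (STL), to detect complex and varied anomalies. SARIMA and STL are usually applied only to stationary and periodic time series, but we show that, in combination, they can detect anomalies with high accuracy even in noisy and non-periodic data. We compare the algorithm with Long Short-Term Memory (LSTM), a deep learning-based algorithm used in anomaly detection systems. In total, we use seven real-world datasets and four artificial datasets with different time series properties to verify the performance of the proposed algorithm.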
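The decomposition half of the idea (remove the seasonal pattern, then look for residuals that are unusually large) can be illustrated without SARIMA or STL. This minimal sketch swaps in a much cruder decomposition: it subtracts a per-phase median profile, then flags residuals more than k median absolute deviations from the residual median. It is not STL, it does not model trend, and `period` and `k` are hypothetical parameters.

```python
import statistics
from typing import List, Sequence

def seasonal_residuals(x: Sequence[float], period: int) -> List[float]:
    """Subtract a per-phase median profile (a crude stand-in for a seasonal component)."""
    profile = [statistics.median(x[p::period]) for p in range(period)]
    return [v - profile[i % period] for i, v in enumerate(x)]

def robust_outliers(x: Sequence[float], period: int, k: float = 3.5) -> List[int]:
    """Indices whose residual is more than k median absolute deviations from the median."""
    r = seasonal_residuals(x, period)
    med = statistics.median(r)
    mad = statistics.median(abs(v - med) for v in r)
    if mad == 0:  # degenerate case: any deviation from the typical residual is flagged
        return [i for i, v in enumerate(r) if v != med]
    return [i for i, v in enumerate(r) if abs(v - med) > k * mad]

series = [1, 5, 2, 6, 1, 4, 2, 50, 1, 5]  # period-2 pattern with one spike
print(robust_outliers(series, period=2))  # [7]
```

Medians are used throughout so that the spike itself barely shifts the seasonal profile or the threshold. A mean-based version would partly absorb the anomaly it is trying to detect.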
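In this work, we investigate unsupervised representation learning on medical time series, which holds the promise of leveraging the large amounts of existing unlabeled data to eventually assist clinical decision making. By evaluating the prediction of clinically relevant outcomes, we show that, in a practical setting, unsupervised representation learning can offer clear performance benefits over end-to-end supervised architectures. We experiment with sequence-to-sequence (Seq2Seq) models in two different ways, as an autoencoder and as a forecaster, and show that the best performance is achieved by a forecasting Seq2Seq model with an integrated attention mechanism, proposed here for the first time in the setting of unsupervised learning for medical time series.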
As advances in networking technology help to connect the distant corners of the globe and as the Internet continues to expand its influence as a medium for communications and commerce, the threat from spammers, attackers and criminal enterprises has also grown accordingly. It is the prevalence of such threats that has made intrusion detection systems (the cyberspace equivalent of the burglar alarm) join ranks with firewalls as one of the fundamental technologies for network security. However, today's commercially available intrusion detection systems are predominantly signature-based intrusion detection systems that are designed to detect known attacks by utilizing the signatures of those attacks. Such systems require frequent rule-base updates and signature updates, and are not capable of detecting unknown attacks. In contrast, anomaly detection systems, a subset of intrusion detection systems, model the normal system/network behavior, which enables them to be extremely effective in finding and foiling both known and unknown, or "zero-day", attacks. While anomaly detection systems are attractive conceptually, a host of technological problems need to be overcome before they can be widely adopted. These problems include a high false alarm rate, failure to scale to gigabit speeds, etc. In this paper, we provide a comprehensive survey of anomaly detection systems and hybrid intrusion detection systems of the recent past and present. We also discuss recent technological trends in anomaly detection and identify open problems and challenges in this area.
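The prevalence of networked sensors and actuators in many real-world systems, such as smart buildings, factories, power plants, and data centers, generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexity of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlations and other dependencies among the multiple variables (sensors/actuators) in the system when detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire set of variables concurrently to capture the latent interactions among them. We also fully exploit both the generator and the discriminator produced by the GAN, using a novel anomaly score called the DR-score to detect anomalies through discrimination and reconstruction. We tested our proposed MAD-GAN using two recent datasets collected from real-world cyber-physical systems (CPS): the Secure Water Treatment (SWaT) and Water Distribution (WADI) datasets. Our experimental results show that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.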
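The DR-score combines two signals, how poorly the generator can reconstruct a sample and how suspicious the discriminator finds it. Here is a minimal sketch of one such combination as a weighted sum. The residuals and discriminator outputs are assumed to be precomputed. The weight `lam`, and the choice to use 1 - D(x) so that larger always means more anomalous, are assumptions for illustration, not the paper's exact definition.

```python
from typing import List, Sequence

def dr_scores(residuals: Sequence[float], d_real: Sequence[float], lam: float = 0.5) -> List[float]:
    """Per-sample combination of reconstruction residual and discriminator output.

    d_real[i] is the discriminator's probability that sample i is real (normal),
    so 1 - d_real[i] grows as the discriminator becomes suspicious. lam weights
    the reconstruction part against the discrimination part.
    """
    if len(residuals) != len(d_real):
        raise ValueError("residuals and discriminator outputs must align")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * r + (1.0 - lam) * (1.0 - d) for r, d in zip(residuals, d_real)]

scores = dr_scores([0.2, 0.8], [0.9, 0.1])  # second sample: poorly reconstructed and judged fake
print([s > 0.5 for s in scores])            # [False, True]
```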
Identifying anomalies rapidly and accurately is critical to the efficient operation of large computer networks. Accurately characterizing important classes of anomalies greatly facilitates their identification; however, the subtleties and complexities of anomalous traffic can easily confound this process. In this paper we report results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures. Data for this study consists of IP flow and SNMP measurements collected over a six-month period at the border router of a large university. Our results show that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic. Specifically, we show that a pseudo-spline filter tuned at specific aggregation levels will expose distinct characteristics of each class of anomaly. We show that an effective way of exposing anomalies is via the detection of a sharp increase in the local variance of the filtered data. We evaluate traffic anomaly signals at different points within a network based on topological distance from the anomaly source or destination. We show that anomalies can be exposed effectively even when aggregated with a large amount of additional traffic. We also compare the difference between the same traffic anomaly signals as seen in SNMP and IP flow data, and show that the more coarse-grained SNMP data can also be used to expose anomalies effectively.
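The detection step (filter the traffic signal, then look for a sharp increase in local variance) can be sketched in a few lines. The sketch substitutes a single-level Haar detail filter for the pseudo-spline filter described in the abstract. The window length `w` and the jump `ratio` are hypothetical tuning parameters.

```python
import statistics
from typing import List, Sequence

def haar_detail(x: Sequence[float]) -> List[float]:
    """Single-level Haar detail (high-pass) coefficients; a trailing odd sample is dropped."""
    return [(x[i] - x[i + 1]) / 2 ** 0.5 for i in range(0, len(x) - 1, 2)]

def local_variance(d: Sequence[float], w: int) -> List[float]:
    """Population variance of each length-w sliding window over d."""
    return [statistics.pvariance(d[i:i + w]) for i in range(len(d) - w + 1)]

def variance_jumps(x: Sequence[float], w: int = 4, ratio: float = 10.0) -> List[int]:
    """Window indices where local variance exceeds ratio times that of the previous window.

    Window i covers detail coefficients i..i+w-1, i.e. raw samples 2*i..2*(i+w)-1.
    """
    v = local_variance(haar_detail(x), w)
    return [i for i in range(1, len(v)) if v[i] > ratio * max(v[i - 1], 1e-12)]

traffic = [0.0, 1.0] * 8 + [0.0, 20.0, 0.0, 20.0]  # steady alternation, then a burst
print(variance_jumps(traffic))  # [5]
```

The high-pass filter removes the slowly varying level of the traffic, so the variance of what remains reacts to abrupt changes rather than to gradual growth in volume.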
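Modeling time series is becoming increasingly important in a wide variety of applications. In general, data evolve by following different patterns, which are usually caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of the time series. In particular, we propose a unified framework that recognizes the evolution genes of different segments by learning a classifier, and adopts an adversarial generator to implement the evolution genes by estimating the distribution of the segments. Experimental results based on a synthetic dataset and five real-world datasets show that our approach not only achieves good prediction results (e.g., +10.56% in terms of F1 on average) but also provides explanations of the results.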
Reliable uncertainty estimation for time series prediction is critical in many fields, including physics, biology, and manufacturing. At Uber, probabilistic time series forecasting is used for robust prediction of the number of trips during special events, driver incentive allocation, as well as real-time anomaly detection across millions of metrics. Classical time series models are often used in conjunction with a probabilistic formulation for uncertainty estimation. However, such models are hard to tune, scale, and add exogenous variables to. Motivated by the recent resurgence of Long Short-Term Memory networks, we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation. We provide detailed experiments of the proposed solution on completed trips data, and successfully apply it to large-scale time series anomaly detection at Uber.
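Uncertainty estimates turn into an anomaly detector by flagging observations that fall outside the prediction interval. The sketch below assumes the model's uncertainty is approximated by the spread of repeated stochastic forward passes (Monte Carlo dropout samples). That spread is combined with a separately estimated inherent-noise standard deviation, and the interval is mean plus or minus z total standard deviations. `noise_std` and `z` are hypothetical inputs, and the Bayesian LSTM itself is not shown.

```python
import math
import statistics
from typing import Sequence, Tuple

def prediction_interval(mc_samples: Sequence[float], noise_std: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Interval from stochastic forward-pass samples plus an inherent-noise term."""
    mean = statistics.fmean(mc_samples)
    model_var = statistics.pvariance(mc_samples, mu=mean)  # model uncertainty
    total_std = math.sqrt(model_var + noise_std ** 2)      # add inherent noise variance
    return mean - z * total_std, mean + z * total_std

def is_anomalous(observed: float, mc_samples: Sequence[float], noise_std: float,
                 z: float = 1.96) -> bool:
    """Flag an observation that falls outside the prediction interval."""
    lo, hi = prediction_interval(mc_samples, noise_std, z)
    return not lo <= observed <= hi

samples = [10.0, 10.0, 10.0]  # hypothetical forecasts for the same time step
print(prediction_interval(samples, noise_std=1.0, z=2.0))  # (8.0, 12.0)
print(is_anomalous(13.0, samples, noise_std=1.0, z=2.0))   # True
```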
We are seeing an enormous increase in the availability of streaming, time-series data. Largely driven by the rise of connected real-time data sources, this data presents technical challenges and opportunities. One fundamental capability for streaming analytics is to model each stream in an unsupervised fashion and detect unusual, anomalous behaviors in real-time. Early anomaly detection is valuable, yet it can be difficult to execute reliably in practice. Application constraints require systems to process data in real-time, not batches. Streaming data inherently exhibits concept drift, favoring algorithms that learn continuously. Furthermore, the massive number of independent streams in practice requires that anomaly detectors be fully automated. In this paper we propose a novel anomaly detection algorithm that meets these constraints. The technique is based on an online sequence memory algorithm called Hierarchical Temporal Memory (HTM). We also present results using the Numenta Anomaly Benchmark (NAB), a benchmark containing real-world data streams with labeled anomalies. The benchmark, the first of its kind, provides a controlled open-source environment for testing anomaly detection algorithms on streaming data. We present results and analysis for a wide range of algorithms on this benchmark, and discuss future challenges for the emerging field of streaming analytics.
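In an HTM-based detector, each step produces a set of currently active columns and, from the previous step, a set of predicted columns. A raw anomaly score can then measure how much of the current input was not anticipated. The sketch below computes that fraction. It assumes the two column sets are supplied by an HTM implementation that is not reproduced here, and the column ids are made up.

```python
from typing import AbstractSet

def raw_anomaly_score(predicted: AbstractSet[int], active: AbstractSet[int]) -> float:
    """Fraction of currently active columns that the previous step did not predict.

    0.0 means the input was fully anticipated; 1.0 means it was completely unexpected.
    """
    if not active:
        return 0.0
    return 1.0 - len(predicted & active) / len(active)

print(raw_anomaly_score({1, 2, 3}, {2, 3, 4, 5}))  # 0.5
print(raw_anomaly_score({7, 8}, {1, 2}))           # 1.0
```

Because the score compares predictions with what actually arrived at every step, it needs no labels and no batch retraining, which fits the streaming constraints described above.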