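We propose a lightweight and accurate method for detecting anomalies in videos. Existing methods use multiple instance learning (MIL) to determine the normal/abnormal status of each video segment. Recent successful studies argue that, to reach high accuracy, it is important to learn the temporal relationships among segments rather than focusing on single segments. We therefore analyzed the existing methods that have been successful in recent years and found that, while learning all segments together is indeed important, the temporal order among them is irrelevant to achieving high accuracy. Based on this finding, we do not use the MIL framework; instead, we propose a lightweight model with a self-attention mechanism that automatically extracts the features important for determining normal/abnormal from all input segments. As a result, our neural network model has 1.3% of the number of parameters of the existing method. We evaluated the frame-level detection accuracy of our method on three benchmark datasets (UCF-Crime, ShanghaiTech, and XD-Violence) and demonstrate that it achieves comparable or better accuracy than state-of-the-art methods.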
translated by 谷歌翻译
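For weakly supervised anomaly detection, most existing work is limited by inadequate video representation because it cannot model long-term contextual information. To address this, we propose a novel weakly supervised adaptive graph convolutional network (WAGCN) to model the complex contextual relationships among video segments. In this way, we fully consider the influence of other video segments on the current segment when generating the anomaly probability score for each segment. First, we combine the temporal consistency and the feature similarity of video segments to construct a global graph, which makes full use of the association information among the spatio-temporal features of anomalous events in videos. Second, we propose a graph learning layer to break the limitation of a manually set topology; it adaptively extracts the graph adjacency matrix from the data. Extensive experiments on two public datasets (UCF-Crime and ShanghaiTech) demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.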
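The global-graph step described above can be sketched as follows. This is only an illustration under stated assumptions: the multiplicative combination of cosine similarity with an exponential temporal decay, the `tau` parameter, and the function names are hypothetical and are not taken from the WAGCN paper. The learned, adaptive adjacency produced by the graph learning layer is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_adjacency(features, tau=2.0):
    """Build a row-normalized adjacency matrix over video segments.

    Assumption: an edge weight is the (non-negative) feature similarity
    scaled by a temporal-proximity term exp(-|i - j| / tau).
    """
    n = len(features)
    adjacency = []
    for i in range(n):
        row = [
            max(0.0, cosine(features[i], features[j])) * math.exp(-abs(i - j) / tau)
            for j in range(n)
        ]
        total = sum(row)
        # Normalize each row so that the weights of a segment sum to 1.
        adjacency.append([w / total for w in row] if total > 0 else row)
    return adjacency


segments = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = build_adjacency(segments)
```

In this toy input the first two segments share a feature direction, so they are connected, while the third is orthogonal to both and only keeps its self-loop.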
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training. We also introduce a new large-scale first-of-its-kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work. The dataset is
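A ranking loss with sparsity and temporal smoothness terms, as described in the abstract above, might look roughly like the sketch below. The hinge form over the top-scoring segment of each bag, the margin, the default weights, and the function name are assumptions made for illustration; the abstract itself only states that the ranking loss carries sparsity and smoothness constraints.

```python
def mil_ranking_loss(anomalous_scores, normal_scores,
                     smooth_weight=8e-5, sparse_weight=8e-5, margin=1.0):
    """Illustrative MIL ranking loss for one anomalous bag and one normal bag.

    Each argument is a list of per-segment anomaly scores in [0, 1].
    The weights and margin are hypothetical defaults, not reported values.
    """
    # Ranking term: the highest score in the anomalous bag should exceed
    # the highest score in the normal bag by at least `margin`.
    hinge = max(0.0, margin - max(anomalous_scores) + max(normal_scores))

    # Temporal smoothness: adjacent segments should have similar scores.
    smoothness = sum(
        (a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:])
    )

    # Sparsity: only a few segments of an anomalous video should score high.
    sparsity = sum(anomalous_scores)

    return hinge + smooth_weight * smoothness + sparse_weight * sparsity


loss = mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.2, 0.1])
```

Only video-level labels are needed here: the loss compares bags as a whole and never refers to which individual segment is anomalous.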
Weakly supervised video anomaly detection (WSVAD) is a challenging task since only video-level labels are available for training. In previous studies, the discriminative power of the learned features is not strong enough, and the data imbalance resulting from the mini-batch training strategy is ignored. To address these two issues, we propose a novel WSVAD method based on cross-batch clustering guidance. To enhance the discriminative power of features, we propose a batch-clustering-based loss to encourage a clustering branch to generate distinct normal and abnormal clusters based on a batch of data. Meanwhile, we design a cross-batch learning strategy by introducing clustering results from previous mini-batches to reduce the impact of data imbalance. In addition, we propose to generate more accurate segment-level anomaly scores based on batch clustering guidance, further improving the performance of WSVAD. Extensive experiments on two public datasets demonstrate the effectiveness of our approach.
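Video anomaly detection has recently been formulated as a multiple instance learning task under weak supervision, in which each video is treated as a bag of snippets to be judged as containing anomalies or not. Previous efforts mainly focus on the discrimination of the snippet itself without modeling the temporal dynamics, which refers to the variation between adjacent snippets. Therefore, we propose a Discriminative Dynamics Learning (DDL) method with two objective functions, i.e., a dynamics ranking loss and a dynamics alignment loss. The former aims to enlarge the score dynamics gap between positive and negative bags, while the latter performs temporal alignment between the feature dynamics and the score dynamics within a bag. Moreover, a Locality-aware Attention Network (LA-Net) is constructed to capture global correlations and re-calibrate the location preference across snippets, followed by a multilayer perceptron with causal convolution to obtain anomaly scores. Experimental results show that our method achieves significant improvements on two challenging benchmarks, i.e., UCF-Crime and XD-Violence.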
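This work addresses weakly supervised anomaly detection, in which a predictor is allowed to learn not only from normal examples but also from a few labeled anomalies made available during training. In particular, we deal with the localization of anomalous activities within a video stream: this is a very challenging scenario, as training examples come only with video-level annotations (and not frame-level). Several recent works have proposed various regularization terms to address it, i.e., by enforcing sparsity and smoothness constraints over the weakly learned frame-level anomaly scores. In this work, we are inspired by recent advances in the field of self-supervised learning and ask the model to yield the same scores for different augmentations of the same video sequence. We show that enforcing such an alignment improves the performance of the model on XD-Violence.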
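The alignment idea in the abstract above, asking for the same anomaly scores under different augmentations of one video, can be illustrated with a simple consistency penalty. The mean squared difference used here and the function name are assumptions for illustration; the abstract does not specify the exact form of the alignment term.

```python
def alignment_loss(scores_view_a, scores_view_b):
    """Mean squared difference between frame-level anomaly scores that a model
    produced for two augmented views of the same video sequence.

    Assumption: both views are temporally aligned, so index i refers to the
    same frame in each list.
    """
    if not scores_view_a or len(scores_view_a) != len(scores_view_b):
        raise ValueError("both views need the same non-zero number of scores")
    squared = [(a - b) ** 2 for a, b in zip(scores_view_a, scores_view_b)]
    return sum(squared) / len(squared)


# Identical predictions across the two views incur no penalty.
penalty = alignment_loss([0.2, 0.8, 0.7], [0.2, 0.8, 0.7])
```

In training, such a term would be added to the weakly supervised objective so that the scores do not depend on the particular augmentation applied.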
Weakly supervised detection of anomalies in surveillance videos is a challenging task. Going beyond existing works that have deficient capabilities to localize anomalies in long videos, we propose a novel glance and focus network to effectively integrate spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations, and hence result in sub-optimal performance due to the inconsistency of feature magnitudes across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks UCF-Crime and XD-Violence manifest that our method outperforms state-of-the-art approaches.
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels. Recently, two-stage self-training methods have achieved significant improvements by self-generating pseudo labels and self-refining anomaly scores with these labels. As the pseudo labels play a crucial role, we propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training. Specifically, we first design a multi-head classification module (each head serves as a classifier) with a diversity loss to maximize the distribution differences of predicted pseudo labels across heads. This encourages the generated pseudo labels to cover as many abnormal events as possible. We then devise an iterative uncertainty pseudo label refinement strategy, which improves not only the initial pseudo labels but also the updated ones obtained by the desired classifier in the second stage. Extensive experimental results demonstrate the proposed method performs favorably against state-of-the-art approaches on the UCF-Crime, TAD, and XD-Violence benchmark datasets.
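Open-set video anomaly detection (OpenVAD) aims to identify abnormal events in video data where both known anomalies and novel ones exist at test time. Unsupervised models learned solely from normal videos are applicable to any testing anomalies but suffer from a high false positive rate. In contrast, weakly supervised methods are effective in detecting known anomalies but could fail in an open world. We develop a novel weakly supervised method for the OpenVAD problem by integrating evidential deep learning (EDL) and normalizing flows (NFs) into a multiple instance learning (MIL) framework. Specifically, we propose to use graph neural networks and a triplet loss to learn discriminative features for training the EDL classifier, where the EDL is capable of identifying unknown anomalies by quantifying the uncertainty. Moreover, we develop an uncertainty-aware selection strategy to obtain clean anomaly instances and an NFs module to generate pseudo anomalies. Our method outperforms existing approaches by inheriting the advantages of both the unsupervised NFs and the weakly supervised MIL framework. Experimental results on multiple real-world video datasets show the effectiveness of our method.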
Video anomaly detection (VAD) -- commonly formulated as a multiple-instance learning problem in a weakly-supervised manner due to its labor-intensive nature -- is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video. In this paper, we first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique. We then model long- and short-range temporal dependencies and nominate the snippets of interest by leveraging our proposed Temporal Self-Attention (TSA). The ablation study conducted on each component confirms its effectiveness in the problem, and the extensive experiments show that our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on two commonly-used benchmark datasets in the VAD problem (UCF-Crime and ShanghaiTech Campus). The source code will be made publicly available upon acceptance.
Detecting abnormal crowd motion emerging from complex interactions of individuals is paramount to ensure the safety of crowds. Crowd-level abnormal behaviors (CABs), e.g., counter flow and crowd turbulence, are proven to be the crucial causes of many crowd disasters. In the recent decade, video anomaly detection (VAD) techniques have achieved remarkable success in detecting individual-level abnormal behaviors (e.g., sudden running, fighting and stealing), but research on VAD for CABs is rather limited. Unlike individual-level anomaly, CABs usually do not exhibit salient difference from the normal behaviors when observed locally, and the scale of CABs could vary from one scenario to another. In this paper, we present a systematic study to tackle the important problem of VAD for CABs with a novel crowd motion learning framework, multi-scale motion consistency network (MSMC-Net). MSMC-Net first captures the spatial and temporal crowd motion consistency information in a graph representation. Then, it simultaneously trains multiple feature graphs constructed at different scales to capture rich crowd patterns. An attention network is used to adaptively fuse the multi-scale features for better CAB detection. For the empirical study, we consider three large-scale crowd event datasets, UMN, Hajj and Love Parade. Experimental results show that MSMC-Net could substantially improve the state-of-the-art performance on all the datasets.
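Video anomaly detection is currently one of the hot research topics in computer vision, since anomalous events carry a great deal of information. Anomalies are one of the main detection targets in surveillance systems and usually call for real-time action. Given the limited availability of labeled data for training (i.e., there is not enough labeled data for anomalies), semi-supervised anomaly detection approaches have gained interest recently. This paper introduces researchers in the field to a new perspective and reviews recent deep-learning-based semi-supervised video anomaly detection approaches, grouped by the common strategy they use for anomaly detection. Our goal is to help researchers develop more effective video anomaly detection methods. Since choosing the right deep neural network plays an important role in several parts of this task, a quick comparative review of DNNs is prepared first. Unlike previous surveys, DNNs are reviewed from a spatio-temporal feature extraction viewpoint, tailored to video anomaly detection. This part of the review can help researchers in the field select suitable networks for the different parts of their methods. Moreover, some state-of-the-art anomaly detection methods are critically surveyed based on their detection strategy. The review provides novel, in-depth insight into existing methods and leads to a statement of their shortcomings, which may serve as hints for future work.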
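Video anomaly detection is a core problem in vision. Correctly detecting and identifying anomalous behaviors of pedestrians in video data will enable safety-critical applications such as surveillance, activity monitoring, and human-robot interaction. In this paper, we propose to leverage trajectory localization and prediction for unsupervised pedestrian anomaly event detection. Unlike previous reconstruction-based approaches, our proposed framework relies on the prediction errors of normal and abnormal pedestrian trajectories to detect anomalies spatially and temporally. We present experimental results on real-world benchmark datasets at varying timescales and show that our proposed trajectory-predictor-based anomaly detection pipeline is effective and efficient at identifying anomalous activities of pedestrians in videos. Code will be made available at https://github.com/akanuasiegbu/leveraging-trajectory-prediction-for-pedestrian-video-anomaly-detection.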
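Using trajectory prediction error as an anomaly signal, as the abstract above describes, can be sketched in a few lines. The per-step Euclidean error, the fixed threshold, and the function names are assumptions for illustration; the trajectory predictor itself is not shown, and the linked repository may compute its scores differently.

```python
import math


def displacement_errors(predicted, observed):
    """Per-time-step Euclidean distance between a predicted and an observed
    pedestrian trajectory, each given as a list of (x, y) positions."""
    if len(predicted) != len(observed):
        raise ValueError("trajectories must have the same number of steps")
    return [math.dist(p, o) for p, o in zip(predicted, observed)]


def flag_anomalous_steps(predicted, observed, threshold):
    """Mark the time steps whose prediction error exceeds a hypothetical,
    user-chosen threshold."""
    return [err > threshold for err in displacement_errors(predicted, observed)]


predicted = [(0.0, 0.0), (1.0, 1.0)]
observed = [(0.0, 0.0), (4.0, 5.0)]
flags = flag_anomalous_steps(predicted, observed, threshold=2.0)
```

A predictor trained on normal motion tends to extrapolate normal motion, so a large gap between prediction and observation points to behavior it has not seen, and the step index and position locate that behavior in time and space.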
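Detecting abnormal events in video is commonly framed as a one-class classification task, where training videos contain only normal events, while test videos contain both normal and abnormal events. In this scenario, anomaly detection is an open-set problem. However, some studies assimilate anomaly detection to action recognition. This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing datasets, we introduce abnormal events annotated at the pixel level at training time, for the first time enabling the use of fully supervised learning methods for abnormal event detection. To preserve the typical open-set formulation, we make sure to include disjoint sets of anomaly types in our training and test collections of videos. To our knowledge, UBnormal is the first video anomaly detection benchmark to allow a fair head-to-head comparison between one-class open-set models and supervised closed-set models, as shown in our experiments. Moreover, we provide empirical evidence showing that UBnormal can enhance the performance of a state-of-the-art anomaly detection framework on two prominent datasets, Avenue and ShanghaiTech.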
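Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. Also, considering that collecting frame-level labels for videos is too laborious, we design a weakly supervised two-stage training scheme, in which we use a multiple instance learning loss calculated on video-level labels to train the score generator and adopt a self-training technique to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, and its performance on the dataset exceeds that of several previous state-of-the-art approaches. Furthermore, we collect a new dataset, VFD-2000, that specializes in video fight detection, with a larger scale and more scenarios than existing datasets. The implementation of our method and the proposed dataset will be publicly available at https://github.com/hepta-col/videofightdetection.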
Object movement identification is one of the most researched problems in the field of computer vision. In this task, we try to classify a pixel as foreground or background. Even though numerous traditional machine learning and deep learning methods already exist for this problem, the two major issues with most of them are the need for large amounts of ground truth data and their inferior performance on unseen videos. Since every pixel of every frame has to be labeled, acquiring large amounts of data for these techniques gets rather expensive. Recently, Zhao et al. [1] proposed one of a kind Arithmetic Distribution Neural Network (ADNN) for universal background subtraction which utilizes probability information from the histogram of temporal pixels and achieves promising results. Building onto this work, we developed an intelligent video surveillance system that uses ADNN architecture for motion detection, trims the video with parts only containing motion, and performs anomaly detection on the trimmed video.
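In contemporary society, surveillance anomaly detection, i.e., spotting anomalous events such as crimes or accidents in surveillance videos, is a critical task. Since anomalies occur rarely, most training data consists of unlabeled videos without anomalous events, which makes the task challenging. Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect anomalies based on the failure to reconstruct the appearance of abnormal scenes. However, because anomalies are distinguished by appearance as well as motion, many previous approaches have explicitly separated appearance and motion information, for example, by using a pre-trained optical flow model. This explicit separation restricts the reciprocal representation capabilities between the two types of information. In contrast, we propose an implicit two-path AE (ITAE), in which two encoders implicitly model appearance and motion features, together with a structure that combines them to learn normal video patterns. For the complex distribution of normal scenes, we suggest normal density estimation of ITAE features through normalizing flow (NF)-based generative models to learn tractable likelihoods and identify anomalies using out-of-distribution detection. NF models strengthen ITAE performance by learning normality through the implicitly learned features. Finally, we demonstrate the effectiveness of ITAE and its feature distribution modeling on six benchmarks, including databases that contain various anomalies in real-world scenarios.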
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
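A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its accurate results, the method has attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework and propose several updates to the original method. First, we study various detection methods, e.g., based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g., objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more datasets, demonstrating that our performance gains are consistent across all datasets. In most cases, our results on Avenue, ShanghaiTech, and UBnormal raise the state-of-the-art performance to a new level.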
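Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications, ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g., patches, future frames, etc.) and using the magnitude of the reconstruction error as an indicator of the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, as well as a transformer for channel-wise attention. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We demonstrate the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks: MVTec AD, BRATS, Avenue, ShanghaiTech, and Thermal Rare Event. We release our code and data as open source at https://github.com/ristea/ssmctb.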