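Video anomaly analysis is a core task actively pursued in computer vision, with applications extending to real-world crime detection in surveillance footage. In this work, we address the task of human-related crime classification. In our proposed approach, the human body in video frames, represented as skeletal joint trajectories, is used as the main source of exploration. First, we motivate extending the ground-truth labels of the HR-Crime dataset, and hence propose a supervised and an unsupervised methodology to generate trajectory-level ground-truth labels. Next, given the availability of trajectory-level ground truth, we introduce a trajectory-based crime classification framework. Ablation studies are conducted with various architectures and feature fusion strategies for representing human trajectories. The experiments demonstrate the feasibility of the task and pave the way for further research in the field.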
translated by 谷歌翻译
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
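Deep learning models have been widely used for anomaly detection in surveillance videos. Typical models are equipped with the capability to reconstruct normal videos and evaluate the reconstruction errors on anomalous videos to indicate the extent of abnormality. However, existing approaches suffer from two disadvantages. First, they can only encode the movements of each identity independently, without considering the interactions among identities, which may also indicate anomalies. Second, they rely on inflexible models whose structures are fixed across different scenes, a configuration that prevents any understanding of the scene. In this paper, we propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems. HSTGCNN is composed of multiple branches that correspond to different levels of graph representations. High-level graph representations encode the trajectories of people and the interactions among multiple identities, while low-level graph representations encode the local body postures of each person. Furthermore, we propose a weighted combination of the branches, each of which performs better in different scenes. This yields an improvement over single-level graph representations, and the resulting understanding of the scene serves anomaly detection. High-level graph representations are assigned higher weights to encode the moving speed and direction of people in low-resolution videos, while low-level graph representations are assigned higher weights to encode human skeletons in high-resolution videos. Experimental results show that the proposed HSTGCNN significantly outperforms current state-of-the-art models on four benchmark datasets (UCSD Pedestrian, ShanghaiTech, CUHK Avenue, and IITB-Corridor).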
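The weighted combination of branches described in this abstract can be illustrated with a minimal sketch. It assumes each branch already produces a per-frame anomaly score and that the branches are fused by a convex combination. The function name `fuse_branch_scores`, the two-branch setup, and the fixed weights 0.8 and 0.2 are hypothetical choices for illustration, not the authors' implementation, which may learn or select the weights differently.

```python
import numpy as np


def fuse_branch_scores(high_level_scores, low_level_scores, high_level_weight):
    """Convex combination of per-frame scores from two branches (illustrative only)."""
    if not 0.0 <= high_level_weight <= 1.0:
        raise ValueError("high_level_weight must lie in [0, 1]")
    high = np.asarray(high_level_scores, dtype=float)
    low = np.asarray(low_level_scores, dtype=float)
    return high_level_weight * high + (1.0 - high_level_weight) * low


# Hypothetical per-frame scores from the trajectory (high-level) and pose (low-level) branches.
trajectory_scores = [0.9, 0.2]
pose_scores = [0.1, 0.4]

# Low-resolution scene: trust the trajectory branch more (assumed weight 0.8).
low_res_fused = fuse_branch_scores(trajectory_scores, pose_scores, 0.8)
# High-resolution scene: trust the pose branch more (assumed weight 0.2).
high_res_fused = fuse_branch_scores(trajectory_scores, pose_scores, 0.2)
```

With these assumed weights, the same pair of branch scores yields different fused scores depending on which branch the scene favors.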
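Video anomaly detection is a core problem in vision. Correctly detecting and identifying anomalous behaviors of pedestrians in video data will enable safety-critical applications such as surveillance, activity monitoring, and human-robot interaction. In this paper, we propose to leverage trajectory localization and prediction for unsupervised pedestrian anomaly event detection. Unlike previous reconstruction-based approaches, our proposed framework relies on the prediction errors of normal and abnormal pedestrian trajectories to detect anomalies spatially and temporally. We present experimental results on real-world benchmark datasets at different timescales and show that our proposed trajectory-predictor-based anomaly detection pipeline is effective and efficient at identifying anomalous activities of pedestrians in videos. Code will be made available at https://github.com/akanuasiegbu/leveraging-trajectory-prediction-for-pedestrian-video-anomaly-detection.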
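The idea of scoring anomalies by trajectory prediction error can be sketched as follows. This is a minimal illustration under stated assumptions: a constant-velocity extrapolation stands in for the learned trajectory predictor, trajectories are 2D points, and the score is the mean Euclidean error over the predicted horizon. The function names are hypothetical and do not come from the linked repository.

```python
import numpy as np


def constant_velocity_predict(history, horizon):
    """Extrapolate the last observed velocity for `horizon` steps (stand-in predictor)."""
    history = np.asarray(history, dtype=float)
    velocity = history[-1] - history[-2]
    steps = np.arange(1, horizon + 1)[:, None]  # shape (horizon, 1) for broadcasting
    return history[-1] + steps * velocity


def trajectory_anomaly_score(history, future):
    """Mean Euclidean distance between predicted and observed future positions."""
    future = np.asarray(future, dtype=float)
    predicted = constant_velocity_predict(history, len(future))
    return float(np.linalg.norm(predicted - future, axis=1).mean())


history = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
normal_future = [[3.0, 0.0], [4.0, 0.0]]    # keeps walking in a straight line
abnormal_future = [[2.0, 3.0], [2.0, 6.0]]  # abruptly changes direction and speed

normal_score = trajectory_anomaly_score(history, normal_future)
abnormal_score = trajectory_anomaly_score(history, abnormal_future)
```

A track that continues as predicted receives a score near zero, while a track that deviates from the prediction receives a larger score, which can then be thresholded per frame or per person.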
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomalies through a deep multiple instance ranking framework that leverages weakly labeled training videos, i.e. the training labels (anomalous or normal) are at the video level instead of the clip level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomalies during training. We also introduce a new large-scale, first-of-its-kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks: first, general anomaly detection, considering all anomalies in one group and all normal activities in another group; second, recognizing each of the 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves a significant improvement in anomaly detection performance compared to state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work. The dataset is
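A ranking loss with sparsity and temporal smoothness terms, as described in this abstract, might look like the sketch below. It assumes a hinge ranking term between the highest-scoring segment of an anomalous bag and the highest-scoring segment of a normal bag, plus two penalties on the anomalous bag's scores. The function name, the margin of 1, and the weights `lambda_smooth` and `lambda_sparse` are assumptions made for illustration. NumPy is used only to show the arithmetic; actual training would use an autodiff framework.

```python
import numpy as np


def mil_ranking_loss(anomalous_scores, normal_scores,
                     lambda_smooth=0.01, lambda_sparse=0.01):
    """Hinge ranking loss over bag maxima plus smoothness and sparsity penalties.

    anomalous_scores / normal_scores: per-segment scores in [0, 1] for one
    anomalous bag (video) and one normal bag. The lambda weights are assumed.
    """
    a = np.asarray(anomalous_scores, dtype=float)
    n = np.asarray(normal_scores, dtype=float)
    # The top segment of the anomalous video should outrank the top normal segment.
    hinge = max(0.0, 1.0 - a.max() + n.max())
    # Temporal smoothness: adjacent segments should have similar scores.
    smoothness = np.sum(np.diff(a) ** 2)
    # Sparsity: only a few segments of an anomalous video should score high.
    sparsity = np.sum(a)
    return float(hinge + lambda_smooth * smoothness + lambda_sparse * sparsity)


good = mil_ranking_loss([0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0])
bad = mil_ranking_loss([0.1, 0.1, 0.1, 0.1], [0.9, 0.9, 0.9, 0.9])
```

In this toy case the well-separated pair of bags incurs only the small regularization penalties, while the pair in which normal segments outscore anomalous ones is dominated by the hinge term.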
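Indoor scene recognition is a growing field with great potential for behaviour understanding, robot localization, and elderly monitoring, among others. In this study, we approach the task of scene recognition from a novel standpoint, using multi-modal learning and video data gathered from social media. The accessibility and variety of social media videos can provide realistic data for modern scene recognition techniques and applications. We propose a model based on the fusion of transcribed speech (as text) and visual features, which is used for classification on a novel dataset of social media videos of indoor scenes named InstaIndoor. Our model achieves up to 70% accuracy and an F1-score of 0.7. Furthermore, we highlight the potential of our approach by benchmarking on a YouTube-8M subset of indoor scenes, where it achieves 74% accuracy and an F1-score of 0.74. We hope the contributions of this work pave the way for novel research in the challenging field of indoor scene recognition.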
Time series anomaly detection has applications in a wide range of research fields, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey provides a structured and comprehensive overview of state-of-the-art deep learning models for time series anomaly detection. It provides a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. Finally, it summarises open research issues and the challenges faced when adopting deep anomaly detection models.
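Unsupervised anomaly detection aims to build models that effectively detect unseen anomalies by training only on normal data. Although previous reconstruction-based methods have made fruitful progress, their generalization ability is limited by two critical challenges. First, the training dataset contains only normal patterns, which limits the model's generalization ability. Second, the feature representations learned by existing models often lack representativeness, which hampers the ability to preserve the diversity of normal patterns. In this paper, we propose a novel approach called Adaptive Memory Network with Self-supervised Learning (AMSL) to address these challenges and enhance the generalization ability in unsupervised anomaly detection. Based on a convolutional autoencoder structure, AMSL incorporates a self-supervised learning module to learn general normal patterns and an adaptive memory fusion module to learn rich feature representations. Experiments on four public multivariate time series datasets demonstrate that AMSL significantly improves performance compared to other state-of-the-art methods. Specifically, on the largest dataset, the CAP sleep stage detection dataset with 900 million samples, AMSL outperforms the second-best baseline by 4%+ in both accuracy and F1 score. Apart from the enhanced generalization ability, AMSL is also more robust against input noise.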
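Video anomaly detection is currently one of the hot research topics in computer vision, as abnormal events contain a large amount of information. Anomalies are one of the main detection targets in surveillance systems and usually require real-time action. Because labeled training data is scarce (i.e., there is not enough labeled data for abnormalities), semi-supervised anomaly detection approaches have recently gained interest. This paper introduces researchers in the field to a new perspective and reviews recent deep-learning-based semi-supervised video anomaly detection approaches, organized by the common strategies they use for anomaly detection. Our goal is to help researchers develop more effective video anomaly detection methods. Since the choice of the right deep neural network (DNN) plays an important role in several parts of this task, a quick comparative review of DNNs is prepared first. Unlike previous surveys, DNNs are reviewed from a spatiotemporal feature extraction viewpoint, tailored to video anomaly detection. This part of the review can help researchers in this field select suitable networks for different parts of their methods. Moreover, some state-of-the-art anomaly detection methods are critically surveyed based on their detection strategy. The review provides a novel and deep look at existing methods and states their shortcomings, which can serve as hints for future work.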
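We present BiPOCO, a bi-directional trajectory predictor with pose constraints for detecting anomalous activities of pedestrians in videos. In contrast to prior work based on feature reconstruction, our work identifies pedestrian anomalous events by forecasting their future trajectories and comparing the predictions with their expectations. We introduce a set of novel pose-based losses with our predictor and leverage the prediction error of each body joint for pedestrian anomaly detection. Experimental results show that our BiPOCO approach can detect pedestrian anomalous activities with a high detection rate (up to 87.0%) and that incorporating pose constraints helps distinguish normal and anomalous poses in prediction. This work extends the current literature on prediction-based methods for anomaly detection and can benefit safety-critical applications such as autonomous driving and surveillance. Code is available at https://github.com/akanuasiegbu/bipoco.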
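Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide variety of activities performed in a spontaneous manner. The dataset contains dense annotations, including elementary activities, composite activities, and activities involving interactions with objects. We provide an analysis of the real-world challenges featured by the dataset, highlighting the open issues for detection algorithms. We show that current state-of-the-art methods fail to achieve satisfactory performance on the TSU dataset. Therefore, we propose a new baseline method to tackle the novel challenges posed by the dataset. This method leverages one modality (i.e. optical flow) to generate attention weights that guide another modality (i.e. RGB) to better detect activity boundaries. This is particularly beneficial for detecting activities characterized by high temporal variance. We show that the proposed method outperforms state-of-the-art methods on TSU and on another popular challenging dataset, Charades.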
People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk and alert the staff to prevent potential injuries or, in some cases, death. However, these behaviours of risk are heterogeneous and infrequent in comparison to normal events. Moreover, analyzing raw videos can also raise privacy concerns. In this paper, we present two novel privacy-protecting video-based anomaly detection approaches to detect behaviours of risk in people with dementia. We either extract body pose information as skeletons or use semantic segmentation masks to replace multiple humans in the scene with their semantic boundaries. Our work differs from most existing approaches for video anomaly detection, which focus on appearance-based features that can put the privacy of a person at risk and are also susceptible to pixel-based noise, including illumination and viewing direction. We used anonymized videos of normal activities to train customized spatio-temporal convolutional autoencoders and identify behaviours of risk as anomalies. We show our results on a real-world study conducted in a dementia care unit with patients with dementia, containing approximately 21 hours of normal activities data for training and 9 hours of data containing both normal events and behaviours of risk for testing. We compared our approaches with the original RGB videos and obtained an equivalent area under the receiver operating characteristic curve performance of 0.807 for the skeleton-based approach and 0.823 for the segmentation mask-based approach. This is one of the first studies to incorporate privacy for the detection of behaviours of risk in people with dementia.
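Anomaly detection in surveillance videos is challenging and important for ensuring public security. Unlike pixel-based anomaly detection methods, pose-based methods use highly structured skeleton data, which reduces the computational burden and avoids the negative impact of background noise. However, whereas pixel-based methods can directly exploit explicit motion features such as optical flow, pose-based methods lack an alternative dynamic representation. In this paper, a novel Motion Embedder (ME) is proposed to provide a pose motion representation from a probabilistic perspective. Furthermore, a novel task-specific Spatial-Temporal Transformer (STT) is deployed for self-supervised pose sequence reconstruction. These two modules are then integrated into a unified framework for pose regularity learning, referred to as the Motion Prior Regularity Learner (MoPRL). MoPRL achieves state-of-the-art performance, with an average improvement of 4.7% AUC on several challenging datasets. Extensive experiments validate the versatility of each proposed module.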
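Automatic traffic accident detection has attracted the machine vision community because of its implications for the development of autonomous intelligent transportation systems (ITS) and its importance to traffic safety. However, most previous studies on the efficient analysis and prediction of traffic accidents have used small-scale datasets with limited coverage, which limits their effect and applicability. Existing traffic accident datasets are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes. Because accidents on freeways tend to cause serious damage and happen too fast for responders to reach the scene in time, an open-source dataset of freeway traffic accidents collected from surveillance cameras is greatly needed and of practical importance. To help the vision community address these shortcomings, we collect video data of real traffic accidents covering a rich variety of scenes. After integration and annotation along various dimensions, a large-scale traffic accident dataset named TAD is proposed in this work. Various experiments on image classification, object detection, and video classification tasks, using public mainstream vision algorithms or frameworks, are conducted to demonstrate the performance of different methods. The proposed dataset, together with the experimental results, is presented as a new benchmark to improve computer vision research, especially in ITS.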
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
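Machine learning models often encounter samples that diverge from the training distribution. Failing to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample to an in-class label, significantly compromises the reliability of a model. The problem has gained significant attention because of its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging because modeling all possible unknown distributions is intractable. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to focus only on a specific domain without examining the relationships between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in the respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys on anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, with a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.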
In recent years, we have seen a significant interest in data-driven deep learning approaches for video anomaly detection, where an algorithm must determine if specific frames of a video contain abnormal behaviors. However, video anomaly detection is particularly context-specific, and the availability of representative datasets heavily limits real-world accuracy. Additionally, the metrics currently reported by most state-of-the-art methods often do not reflect how well the model will perform in real-world scenarios. In this article, we present the Charlotte Anomaly Dataset (CHAD). CHAD is a high-resolution, multi-camera anomaly dataset in a commercial parking lot setting. In addition to frame-level anomaly labels, CHAD is the first anomaly dataset to include bounding box, identity, and pose annotations for each actor. This is especially beneficial for skeleton-based anomaly detection, which is useful for its lower computational demand in real-world settings. CHAD is also the first anomaly dataset to contain multiple views of the same scene. With four camera views and over 1.15 million frames, CHAD is the largest fully annotated anomaly detection dataset including person annotations, collected from continuous video streams from stationary cameras for smart video surveillance applications. To demonstrate the efficacy of CHAD for training and evaluation, we benchmark two state-of-the-art skeleton-based anomaly detection algorithms on CHAD and provide comprehensive analysis, including both quantitative results and qualitative examination.
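This work addresses weakly supervised anomaly detection, in which a predictor is allowed to learn not only from normal examples but also from a few labeled anomalies made available during training. In particular, we deal with the localization of anomalous activities within the video stream: this is a very challenging scenario, as training examples come only with video-level annotations (and not frame-level ones). Several recent works have proposed various regularization terms to address it, i.e. by enforcing sparsity and smoothness constraints over the weakly learned frame-level anomaly scores. In this work, we take inspiration from recent advances in self-supervised learning and ask the model to yield the same scores for different augmentations of the same video sequence. We show that enforcing such an alignment improves the performance of the model on XD-Violence.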
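The alignment between augmentations described in this abstract can be sketched as a simple consistency penalty. The sketch assumes the model outputs one anomaly score per frame for each augmented view and uses the mean squared difference between the two score sequences as the penalty. The squared-error form and the function name `alignment_loss` are illustrative assumptions. The abstract states only that the scores should match, not how the mismatch is measured.

```python
import numpy as np


def alignment_loss(scores_view_a, scores_view_b):
    """Mean squared difference between frame-level scores of two augmented views."""
    a = np.asarray(scores_view_a, dtype=float)
    b = np.asarray(scores_view_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both views must yield the same number of frame scores")
    return float(np.mean((a - b) ** 2))


# Hypothetical frame-level scores for two augmentations of the same clip.
aligned = alignment_loss([0.1, 0.8, 0.9], [0.1, 0.8, 0.9])
misaligned = alignment_loss([0.1, 0.8, 0.9], [0.9, 0.2, 0.1])
```

A term like this would be added to the weakly supervised objective so that the model is penalized whenever an augmentation changes its frame-level predictions.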
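Anomaly detection has recently gained increasing attention in computer vision, likely because of its broad range of applications, from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and using the magnitude of the reconstruction error as an indicator of the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that incorporates the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network, and is compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, as well as a transformer for channel-wise attention. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We demonstrate the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, yielding empirical results that confirm performance improvements on five benchmarks: MVTec AD, BRATS, Avenue, ShanghaiTech, and Thermal Rare Event. We release our code and data as open source at https://github.com/ristea/ssmctb.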
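Future activity anticipation is a challenging problem in egocentric vision. As a standard future activity anticipation paradigm, recursive sequence prediction suffers from the accumulation of errors. To address this problem, we propose a simple and effective self-regulated learning (SRL) framework, which aims to regulate the intermediate representation consecutively so as to produce a representation that (a) emphasizes the novel information in the frame at the current timestamp in contrast to previously observed content, and (b) reflects its correlation with previously observed frames. The former is achieved by minimizing a contrastive loss, and the latter by a dynamic reweighting mechanism that attends to informative frames in the observed content, based on a similarity comparison between the features of the current frame and those of the observed frames. The learned final video representation can be further enhanced by multi-task learning, which performs joint feature learning on the target activity labels and the automatically detected action and object class tokens. SRL sharply outperforms the existing state of the art in most cases on egocentric video datasets and two third-person video datasets. Its effectiveness is also verified by the experimental finding that the action and object concepts that support the activity semantics can be accurately identified.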