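Video anomaly detection is a challenging task in the computer vision community. Most task-based methods do not consider the independence of the distinct spatial and temporal patterns, while two-stream structures lack exploration of the correlations between them. In this paper, we propose a spatial-temporal memory-augmented two-stream auto-encoder framework, which learns appearance normality and motion normality independently and explores their correlations via adversarial learning. Specifically, we first design two proxy tasks to train the two-stream structure so that it extracts appearance and motion features in isolation. Then, the prototypical features are recorded in the corresponding spatial and temporal memory pools. Finally, the encoder-decoder network performs adversarial learning with a discriminator to explore the correlations between spatial and temporal patterns. Experimental results show that our framework outperforms state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on the UCSD Ped2 and CUHK Avenue datasets, respectively.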
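The AUC figures quoted above summarize how well per-frame anomaly scores rank abnormal frames above normal ones. The following is a minimal sketch of that metric, assuming binary frame labels (1 = anomalous) and one score per frame; the function name `roc_auc` and the toy scores are illustrative and do not come from the paper's evaluation code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (anomalous, normal) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one frame of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical anomaly scores for four frames (higher = more anomalous).
frame_scores = [0.1, 0.4, 0.35, 0.8]
frame_labels = [0, 0, 1, 1]
auc = roc_auc(frame_scores, frame_labels)
print(auc)
```

The pairwise form is quadratic in the number of frames, which is fine for illustration; a sort-based computation would be preferable for full test sets.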
translated by Google Translate
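Reliable and stable 6D pose estimation of uncooperative space objects plays an essential role in on-orbit servicing and debris removal missions. Considering that the pose estimator is sensitive to background interference, this paper proposes a counterfactual analysis framework named CA-SpaceNet to achieve robust 6D pose estimation of spaceborne targets against complicated backgrounds. Specifically, conventional methods are adopted to extract the features of the whole image in the factual case. In the counterfactual case, a non-existent image containing only the background, without the target, is imagined. Counterfactual analysis reduces the side effects caused by background interference, which leads to unbiased predictions in the final results. In addition, we carry out low-bit-width quantization of CA-SpaceNet and deploy part of the framework to a Processing-In-Memory (PIM) accelerator on an FPGA. Qualitative and quantitative results demonstrate the effectiveness and efficiency of the proposed method. To the best of our knowledge, this paper is the first to apply causal inference and network quantization to 6D pose estimation of spaceborne targets. The code is available at https://github.com/shunli-wang/ca-pacenet.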
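In recent years, assessing action quality from videos has attracted growing attention in the computer vision and human-computer interaction communities. Most existing approaches tackle this problem by directly migrating models from action recognition tasks, which ignores intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention (TSA) module, which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions. The TSA module is embedded in existing video networks to form TSA-Net. Overall, TSA-Net has the following merits: 1) high computational efficiency, 2) high flexibility, and 3) state-of-the-art performance. Extensive experiments are conducted on popular action quality assessment datasets, including AQA-7 and MTL-AQA. In addition, a dataset named Fall Recognition in Figure Skating (FR-FS) is proposed to explore basic action assessment in figure skating scenes.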
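Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict markup sequences from formula images with an attention mechanism. However, such methods may fail to accurately read formulas with complicated structure or to generate long markup sequences, because the attention results are often inaccurate due to the large variance in writing styles and spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. Specifically, we design a weakly supervised counting module that predicts the number of each symbol class without symbol-level position annotations, and then plug it into a typical attention-based encoder-decoder model for HMER. Experiments on benchmark datasets validate that both the joint optimization and the counting results help correct the prediction errors of encoder-decoder models, and that CAN consistently outperforms state-of-the-art methods. In particular, compared with an encoder-decoder model for HMER, the extra time cost introduced by the proposed counting module is marginal. The source code is available at https://github.com/lbh1024/can.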
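Because the counting module is described as needing no symbol-level position annotations, its targets can in principle be obtained from the markup ground truth alone. The sketch below shows one way such per-class count vectors could be built, assuming whitespace-tokenized markup and a fixed vocabulary; the vocabulary, the function name `symbol_counts`, and the choice to count structural tokens such as braces are assumptions made for illustration, not details taken from the abstract.

```python
from collections import Counter


def symbol_counts(markup, vocabulary):
    """Return one count per vocabulary entry for a whitespace-tokenized
    markup string. Tokens outside the vocabulary are ignored."""
    counts = Counter(markup.split())
    return [counts.get(symbol, 0) for symbol in vocabulary]


# Hypothetical vocabulary and markup sequence for the formula x^2 + x + 2.
vocab = ["x", "+", "2", "^", "{", "}"]
count_target = symbol_counts("x ^ { 2 } + x + 2", vocab)
print(count_target)
```

A vector like this only says how many times each symbol occurs, not where, which is what makes the supervision weak.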
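Crowd localization, that is, predicting head positions, is a more practical and higher-level task than simply counting. Existing methods employ pseudo bounding boxes or pre-designed localization maps and rely on complex post-processing to obtain the head positions. In this paper, we propose an elegant, end-to-end Crowd Localization TRansformer named CLTR that solves the task in a regression-based paradigm. The proposed method views crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as the input of the transformer decoder. To reduce ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which adopts the nearby context as an auxiliary matching cost. Extensive experiments conducted on five datasets in various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A datasets.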
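Treating localization as set prediction requires a one-to-one assignment between predicted points and annotated heads before a loss can be computed. The sketch below illustrates that assignment problem by brute force over permutations, assuming a plain Euclidean cost and at least as many predictions as targets. It is not the Hungarian algorithm (which solves the same problem in polynomial time) and it omits the nearby-context cost that the abstract attributes to the KMO-based matcher; all names and coordinates are hypothetical.

```python
from itertools import permutations
from math import dist


def match_points(predictions, targets):
    """Return (pairs, cost) for the cheapest one-to-one assignment, where
    pairs is a list of (prediction_index, target_index) ordered by target.
    Exhaustive search: only suitable for a handful of points."""
    best_cost, best_pairs = float("inf"), []
    for chosen in permutations(range(len(predictions)), len(targets)):
        cost = sum(dist(predictions[p], targets[t]) for t, p in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(chosen)]
    return best_pairs, best_cost


# Hypothetical predicted points and annotated head positions (x, y).
preds = [(10.0, 10.0), (50.0, 52.0), (90.0, 12.0)]
heads = [(49.0, 50.0), (11.0, 9.0)]
pairs, total_cost = match_points(preds, heads)
print(pairs, total_cost)
```

Here the third prediction stays unmatched, which in a set-prediction setup would typically be treated as a background or "no head" output.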
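Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, point-level annotations are not used to evaluate counting accuracy, which means they are redundant. Hence, it is desirable to develop weakly supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly supervised counting methods adopt a CNN to regress the total count of the crowd through an image-to-count paradigm. However, the limited receptive field available for context modeling is an intrinsic limitation of these CNN-based weakly supervised methods. As a result, they cannot achieve satisfactory performance and have limited real-world applicability. The Transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP) that has a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly supervised crowd counting problem from a Transformer-based sequence-to-count perspective. We observe that the proposed TransCrowd can effectively extract semantic crowd information by using the self-attention mechanism of the Transformer. To the best of our knowledge, this is the first work to adopt a pure Transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all weakly supervised CNN-based counting methods and highly competitive counting performance compared with some popular fully supervised counting methods.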
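In this paper, we focus on the crowd localization task, a crucial topic in crowd analysis. Most regression-based methods use convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with density maps, FIDT maps accurately describe people's locations without overlap in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point of each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss to make the model tend to learn local structural information and thus better recognize local maxima. Extensive experiments demonstrate that the proposed method achieves state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on negative and extremely dense scenes, which further verifies the effectiveness of the FIDT maps. The code and models will be available at https://github.com/dk-liang/fidtm.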
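The idea of reading head positions off a predicted map by finding its local maxima can be sketched as follows. This is a generic peak detector, assuming a 3x3 neighbourhood, strict comparison, and a fixed score threshold; the abstract does not specify the LMDS window size or thresholding rule, so those choices, the function name, and the toy map are all assumptions.

```python
def local_maxima(score_map, threshold):
    """Return (row, col) positions whose value is at least `threshold`
    and strictly greater than every pixel in the surrounding 3x3 window."""
    rows, cols = len(score_map), len(score_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = score_map[r][c]
            if value < threshold:
                continue
            neighbours = [
                score_map[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Hypothetical predicted map with two well-separated peaks.
toy_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.1, 0.3, 0.8],
    [0.0, 0.1, 0.0, 0.2, 0.3],
]
head_points = local_maxima(toy_map, threshold=0.5)
print(head_points)
```

Peak detection of this kind only works when neighbouring instances produce distinct maxima, which is the property the abstract claims for FIDT maps and denies for overlapping Gaussian blobs.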
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to optimize the regression problem in a way that follows the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
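Selecting the property features "with the greatest information gain" can be illustrated with the standard entropy-based definition. The sketch below assumes discrete feature values and class labels; the feature names and the four-user toy data are hypothetical and are not MGTAB features, and the abstract does not say how continuous properties were handled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Return the names of the k features with the largest information gain."""
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical binary account properties for four labelled users.
user_labels = ["bot", "bot", "human", "human"]
user_features = {
    "verified": [0, 0, 1, 1],
    "has_url": [0, 1, 0, 1],
    "default_avatar": [1, 1, 1, 0],
}
selected = top_k_features(user_features, user_labels, k=2)
print(selected)
```

In this toy data `verified` separates the classes perfectly (gain of 1 bit) while `has_url` carries no information, so the ranking drops it first.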