The security of artificial intelligence (AI) is an important research area for building safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, the China Industrial Control Systems Cyber Emergency Response Team, the Institute for Artificial Intelligence at Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition. The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
translated by 谷歌翻译
Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, i.e., the score function, is not properly defined for discrete spaces. This makes it non-trivial to adapt score-based modeling to categorical data. In this paper, we extend diffusion models to discrete variables by introducing a stochastic jump process where the reverse process denoises via a continuous-time Markov chain. This formulation admits an analytical simulation during backward sampling. To learn the reverse process, we extend score matching to general categorical data and show that an unbiased estimator can be obtained via simple matching of the conditional marginal distributions. We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their achievable hardware efficiency. Meanwhile, existing accelerators dedicated to NLP Transformers are not optimal for ViTs. This is because there is a large difference between ViTs and NLP Transformers: ViTs have a relatively fixed number of input tokens, whose attention maps can be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers need to handle input sequences of varying numbers of tokens and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated algorithm and accelerator co-design framework dubbed ViTCoD for accelerating ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns for regularizing two levels of workloads without hurting the accuracy, largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements; on top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations. On the hardware level, we develop a dedicated accelerator to simultaneously coordinate the enforced denser/sparser workloads and encoder/decoder engines for boosted hardware utilization. Extensive experiments and ablation studies validate that ViTCoD largely reduces the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x, 86.0x, 10.1x, and 6.8x over general computing platforms CPUs, EdgeGPUs, GPUs, and prior-art Transformer accelerators SpAtten and Sanger under an attention sparsity of 90%, respectively.
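The idea of pruning attention maps with a fixed sparse pattern can be illustrated with a minimal sketch. It is not ViTCoD's actual pruning and polarization procedure, which the abstract does not spell out. The function names, the thresholding rule, and the choice to always keep the diagonal are assumptions made for illustration only.

```python
import math

def fixed_mask_from_average(avg_attn, sparsity=0.9):
    """Build a fixed boolean mask from an averaged (square) attention map.

    Assumption: entries are kept by magnitude, so roughly a
    (1 - sparsity) fraction of the map survives.
    """
    flat = sorted((v for row in avg_attn for v in row), reverse=True)
    keep = max(1, round(len(flat) * (1.0 - sparsity)))
    threshold = flat[keep - 1]
    mask = [[v >= threshold for v in row] for row in avg_attn]
    # Assumption: keep the diagonal so every query attends to at least one key.
    for i in range(len(mask)):
        mask[i][i] = True
    return mask

def masked_softmax_row(scores, mask_row):
    """Softmax over one row of attention scores, skipping pruned positions."""
    kept = [s for s, m in zip(scores, mask_row) if m]
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the mask is computed once and reused for every input, the set of positions to compute is known ahead of time. This is the property the abstract contrasts with the on-the-fly sparsity prediction needed for NLP Transformers.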
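Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) was proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the efficiency of M-H in discrete spaces can also be characterized by an asymptotic acceptance rate that is independent of the target distribution. Moreover, we verify, both theoretically and empirically, that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are $0.574$ and $0.234$, respectively. These results also help establish that LBP is asymptotically $O(N^{\frac{2}{3}})$ more efficient than RWM with respect to the model dimension $N$. Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces. We demonstrate empirically that such adaptive M-H sampling can robustly improve sampling for a variety of target distributions in discrete spaces, including training deep energy-based models.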
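Tuning the neighborhood size toward a target acceptance rate can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a binary state space, a toy product-of-Bernoulli target, an RWM proposal that flips `radius` uniformly chosen coordinates, and a Robbins-Monro style update on the log-radius. All function and parameter names are hypothetical.

```python
import math
import random

def log_target(x, theta):
    # Toy target (assumption): log p(x) = sum_i theta_i * x_i + const.
    return sum(t * xi for t, xi in zip(theta, x))

def rwm_step(x, theta, radius, rng):
    # Random walk Metropolis: flip `radius` distinct coordinates chosen uniformly.
    y = list(x)
    for i in rng.sample(range(len(x)), radius):
        y[i] = 1 - y[i]
    # For a fixed radius the proposal is symmetric, so the M-H ratio is p(y) / p(x).
    log_ratio = log_target(y, theta) - log_target(x, theta)
    accept_prob = math.exp(min(0.0, log_ratio))
    if rng.random() < accept_prob:
        return y, accept_prob
    return x, accept_prob

def adaptive_rwm(theta, n_steps=2000, target_rate=0.234, seed=0):
    rng = random.Random(seed)
    n = len(theta)
    x = [rng.randint(0, 1) for _ in range(n)]
    log_radius = 0.0  # continuous log-radius, rounded to an integer flip count
    radius = 1
    total_acc = 0.0
    for t in range(1, n_steps + 1):
        radius = min(n, max(1, round(math.exp(log_radius))))
        x, acc = rwm_step(x, theta, radius, rng)
        # Grow the neighborhood when accepting too often, shrink it otherwise;
        # the 1/sqrt(t) step makes the adaptation diminish over time.
        log_radius += (acc - target_rate) / math.sqrt(t)
        log_radius = min(max(log_radius, 0.0), math.log(n))
        total_acc += acc
    return x, radius, total_acc / n_steps
```

For example, `adaptive_rwm([0.5] * 50)` returns the final state, the last radius used, and the average acceptance probability over the run. Passing `target_rate=0.574` would correspond to the rate reported for LBP, although the proposal in this sketch is plain RWM.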
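Capturing both local and global features of irregular point clouds is essential to 3D object detection (3OD). However, mainstream 3D detectors, e.g., VoteNet and its variants, either discard a large number of local features during pooling operations or ignore many global features of the whole scene. This paper explores new modules that simultaneously learn local-global features of scene point clouds in a way that benefits 3OD. To this end, we propose an effective 3OD network based on simultaneous local-global feature learning (dubbed 3DLG-Detector). 3DLG-Detector makes two key contributions. First, it develops a Dynamic Points Interaction (DPI) module that preserves effective local features during pooling. In addition, DPI is detachable and can be incorporated into existing 3OD networks to boost their performance. Second, it develops a Global Context Aggregation module that aggregates multi-scale features from different layers of the encoder to achieve scene context awareness. Our method shows improvements over thirteen competitors in terms of detection accuracy and robustness on both the SUN RGB-D and ScanNet datasets. Source code will be made available upon publication.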
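Previous methods based on 3DCNN, ConvLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, they still suffer from high computational costs or poor quality of the generated saliency maps. To address these problems, we design a network based on space-time memory (STM), which extracts useful temporal information for the current frame from adjacent frames and serves as the temporal branch of VSOD. Furthermore, previous methods considered only single-frame prediction without temporal association, so the model may not pay sufficient attention to temporal information. We therefore introduce inter-frame object motion prediction into VSOD for the first time. Our model follows the standard encoder-decoder architecture. In the encoding stage, we generate high-level temporal features from the high-level features of the current frame and its adjacent frames. This approach is more efficient than optical-flow-based methods. In the decoding stage, we propose an effective fusion strategy for the spatial and temporal branches. The semantic information of the high-level features is used to fuse the object details in the low-level features, and the spatiotemporal features are then obtained step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision commonly used in image salient object detection (ISOD), we design a motion-aware loss for predicting object boundary motion and perform multitask learning for VSOD and object motion prediction simultaneously, which further helps the model extract spatiotemporal features accurately and maintain object integrity. Extensive experiments on several datasets demonstrate the effectiveness of our method, which achieves state-of-the-art metrics on some datasets. The proposed model requires neither optical flow nor other preprocessing, and reaches a speed of nearly 100 FPS during inference.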
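Graph contrastive learning has emerged as a powerful tool for unsupervised graph representation learning. The key to its success is acquiring high-quality positive and negative samples as contrasting pairs, so as to learn the underlying structural semantics of the input graph. Recent works usually draw negative samples from the same training batch as the positive samples, or from an external, irrelevant graph. However, such strategies have a significant limitation: they unavoidably sample false negatives. In this paper, we propose a novel method, CGC, that uses a Counterfactual mechanism to generate artificial hard negative samples for Graph Contrastive learning, taking a different perspective from sampling-based strategies. The counterfactual mechanism ensures that the generated hard negative samples are similar to the positive sample but carry a different label. The proposed method achieves satisfactory results on several datasets compared with some traditional unsupervised graph learning methods and some state-of-the-art graph contrastive learning methods. We also conduct supplementary experiments to illustrate the proposed method more fully, including the performance of CGC with different hard negative samples and an evaluation of hard negative samples generated with different similarity measurements.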