Time series anomaly detection strives to uncover potential abnormal behaviors and patterns from temporal data, and has fundamental significance in diverse application scenarios. Constructing an effective detection model usually requires adequate training data stored in a centralized manner; however, this requirement cannot always be satisfied in realistic scenarios. As a prevailing approach to this problem, federated learning has demonstrated its power to make use of distributed data while protecting the privacy of data providers. However, it is still unclear how existing time series anomaly detection algorithms perform with decentralized data storage and privacy protection through federated learning. To study this, we construct a federated time series anomaly detection benchmark, named FedTADBench, which involves five representative time series anomaly detection algorithms and four popular federated learning methods. We would like to answer the following questions: (1) How do time series anomaly detection algorithms perform when combined with federated learning? (2) Which federated learning method is the most appropriate one for time series anomaly detection? (3) How do federated time series anomaly detection approaches perform on different partitions of data in clients? Extensive experiments with various settings yield a large number of results, which are provided together with the corresponding analysis. The source code of our benchmark is publicly available at https://github.com/fanxingliu2020/FedTADBench.
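The abstract above does not name the four federated learning methods it evaluates. As a minimal sketch, assuming a FedAvg-style scheme (an assumption, not something the text confirms), the server-side step that combines locally trained models without collecting raw client data could look like the following. The names `federated_average`, `client_params`, and `client_sizes` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: each model is a dict of named flat weight lists.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted mean of the i-th entry of this parameter across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Two clients holding 1 and 3 samples; only parameters leave the clients.
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(federated_average(clients, [1, 3]))
```

Weighting by sample count means clients with more data pull the global model further, which is one reason the way data is partitioned across clients (question 3 in the abstract) can matter.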
Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention
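The mechanism described above combines a residual connection between the attention maps of consecutive layers with a convolution over the resulting map. A minimal single-head sketch follows, assuming the two steps are blended with mixing weights `alpha` and `beta` and that the convolution uses a small 2D kernel; these names, the blending form, and the kernel are illustrative assumptions rather than the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation with zero padding, so the output keeps x's shape."""
    rows, cols = len(x), len(x[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(k_rows):
                for b in range(k_cols):
                    r, c = i + a - pad_r, j + b - pad_c
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def evolve_attention(prev_logits: Matrix, curr_logits: Matrix, kernel: Matrix,
                     alpha: float = 0.5, beta: float = 0.5) -> Matrix:
    """Blend the previous layer's attention logits into the current ones,
    then blend in a convolved version of that mixture."""
    # Residual step: carry inter-token relationships over from the previous layer.
    mixed = [[alpha * p + (1.0 - alpha) * c for p, c in zip(p_row, c_row)]
             for p_row, c_row in zip(prev_logits, curr_logits)]
    # Convolution step: model how local attention patterns change across layers.
    convolved = conv2d_same(mixed, kernel)
    return [[beta * v + (1.0 - beta) * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(convolved, mixed)]
```

In a full model the evolved logits would be passed through a softmax and the kernel would be learned; both are left out here to keep the sketch short.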
Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm. Existing deep ensemble methods usually naively train many different models and then aggregate their predictions. In our view, this is suboptimal in two respects: (i) naively training multiple models adds a substantial computational burden, especially in the deep learning era; (ii) purely optimizing each base model without considering their interactions limits the diversity of the ensemble and the resulting performance gains. We tackle these issues by proposing deep negative correlation classification (DNCC), in which the trade-off between accuracy and diversity is systematically controlled by decomposing the loss function seamlessly into individual accuracy and the correlation between individual models and the ensemble. DNCC yields a deep classification ensemble in which each individual estimator is both accurate and negatively correlated. Thanks to the optimized diversity, DNCC works well even when using a shared network backbone, which significantly improves its efficiency compared with most existing ensemble systems. Extensive experiments on multiple benchmark datasets and network structures demonstrate the superiority of the proposed method.
Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although diverse, high-quality object instances yield larger performance gains with Copy-Paste, previous works take object instances either from human-annotated instance segmentation datasets or from renderings of 3D object models, and both approaches are too expensive to scale up to good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To make this possible, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves gains of +2.6 box AP and +2.1 mask AP on all classes, and even larger gains of +6.8 box AP and +6.5 mask AP on long-tail classes.
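The core Copy-Paste operation described above can be sketched as pasting the masked pixels of an object instance onto a background image and keeping the pasted mask as a new annotation. The example below is a minimal illustration on nested lists of pixel values; `paste_instance` and its parameters are hypothetical names, and the random placement, scaling, and blending used in practice are omitted.

```python
from typing import List, Tuple

Image = List[List[int]]


def paste_instance(background: Image, instance: Image, mask: Image,
                   top: int, left: int) -> Tuple[Image, Image]:
    """Paste the masked pixels of `instance` onto a copy of `background`.

    Returns the augmented image and a full-size binary mask marking the
    pasted pixels, which can serve as the segmentation label.
    """
    height, width = len(background), len(background[0])
    image = [row[:] for row in background]          # leave the input untouched
    pasted_mask = [[0] * width for _ in range(height)]
    for i, (inst_row, mask_row) in enumerate(zip(instance, mask)):
        for j, (pixel, keep) in enumerate(zip(inst_row, mask_row)):
            r, c = top + i, left + j
            # Copy only object pixels that land inside the background.
            if keep and 0 <= r < height and 0 <= c < width:
                image[r][c] = pixel
                pasted_mask[r][c] = 1
    return image, pasted_mask


if __name__ == "__main__":
    bg = [[0] * 4 for _ in range(4)]
    obj = [[7, 7], [7, 7]]
    obj_mask = [[1, 0], [1, 1]]
    new_image, new_mask = paste_instance(bg, obj, obj_mask, top=1, left=2)
    for row in new_image:
        print(row)
```

Because the label comes for free from the instance mask, the cost of this augmentation lies almost entirely in obtaining diverse instances, which is the bottleneck the abstract targets.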
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions to this problem have been proposed in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
A lack of driver vigilance is the main cause of most vehicle crashes. Electroencephalography (EEG) has been a reliable and efficient tool for estimating driver drowsiness. Even though previous studies have developed accurate and robust driver vigilance detection algorithms, these methods still face challenges in the following areas: (a) training with small sample sizes, (b) anomaly signal detection, and (c) subject-independent classification. In this paper, we propose a generalized few-shot model, namely EEG-Fest, to address the aforementioned drawbacks. The EEG-Fest model can (a) classify a query sample's drowsiness with only a few samples, (b) identify whether a query sample is an anomaly signal or not, and (c) achieve subject-independent classification. The proposed algorithm achieves state-of-the-art results on the SEED-VIG dataset and the SADT dataset. The accuracy on the drowsy class reaches 92% and 94% for 1-shot and 5-shot support samples on the SEED-VIG dataset, and 62% and 78% for 1-shot and 5-shot support samples on the SADT dataset.
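Reasoning is a fundamental problem for computers and is deeply studied in artificial intelligence. In this paper, we focus specifically on answering multi-hop logical queries on knowledge graphs (KGs). This is a complicated task because, in real-world scenarios, the graphs tend to be large and incomplete. Most previous works have been unable to create models that accept full first-order logic (FOL) queries, which include negative queries, and have only been able to process a limited set of query structures. Additionally, most methods present logic operators that can only perform the logical operation they were made for. We introduce a set of models that use neural networks to create one-point vector embeddings to answer the queries. The versatility of neural networks allows the framework to handle FOL queries with conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$) operators. We demonstrate the performance of our models experimentally through extensive experiments on well-known benchmark datasets. Besides having more versatile operators, the models achieve a 10% relative increase over the best-performing state-of-the-art method and a 30% relative increase over the original method based on single-point vector embeddings.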
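Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and one key to this is the existence of well-trained classifiers on benchmark datasets. As a mainstream loss function, cross entropy can easily lead us to models that exhibit severe overfitting. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. In this paper, we propose a mutual information learning framework in which we train deep neural network classifiers by learning the mutual information between the label and the input. Theoretically, we give a lower bound on the population classification error in terms of the mutual information. In addition, we derive lower and upper bounds on the mutual information for a concrete binary classification data model in $\mathbb{R}^n$, as well as a lower bound on the error probability in this scenario. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve better generalization performance than the conditional entropy learned classifiers (CELCs), with an improvement that can exceed 10% in test accuracy.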
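The distinction drawn above between the label conditional entropy and the mutual information rests on the standard identity I(X;Y) = H(Y) - H(Y|X). As a small illustration (not the paper's training procedure), both quantities can be computed from a discrete joint distribution; the function names and the toy distributions are assumptions made for this sketch.

```python
import math
from typing import List, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def label_information(joint: List[List[float]]) -> Tuple[float, float, float]:
    """Given joint[x][y] = P(X=x, Y=y), return (H(Y), H(Y|X), I(X;Y)) in bits."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_y = entropy(p_y)
    # H(Y|X) is the average entropy of the label given each input value.
    h_y_given_x = sum(
        px * entropy([p / px for p in row])
        for row, px in zip(joint, p_x) if px > 0
    )
    return h_y, h_y_given_x, h_y - h_y_given_x


if __name__ == "__main__":
    informative = [[0.5, 0.0], [0.0, 0.5]]      # the input determines the label
    uninformative = [[0.25, 0.25], [0.25, 0.25]]  # input and label independent
    print(label_information(informative))
    print(label_information(uninformative))
```

The conditional entropy alone ignores how much uncertainty the label had to begin with, whereas the mutual information measures the reduction in that uncertainty, which is the quantity shared by the label and the input.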
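Despite the great recent advances achieved by deep neural networks (DNNs), they are often vulnerable to adversarial attacks. Intensive research efforts have been made to improve the robustness of DNNs; however, most empirical defenses can be broken again by adaptive attacks, and theoretically certified robustness is limited, especially on large-scale datasets. One potential root cause of this vulnerability is that, although DNNs have demonstrated powerful expressiveness, they lack the reasoning ability to make robust and reliable predictions. In this paper, we aim to integrate domain knowledge to enable robust learning with a reasoning paradigm. In particular, we propose a certifiably robust learning with reasoning pipeline (CARE), which consists of a learning component and a reasoning component. Concretely, we use a set of standard DNNs as the learning component to make semantic predictions, and we leverage probabilistic graphical models, such as Markov logic networks (MLNs), as the reasoning component to enable knowledge/logic reasoning. However, exact inference in MLNs (reasoning) is known to be #P-complete, which limits the scalability of the pipeline. To this end, we propose to approximate MLN inference via variational inference based on an efficient expectation maximization algorithm. In particular, we leverage graph convolutional networks (GCNs) to encode the posterior distribution during variational inference, and we iteratively update the parameters of the GCNs (E-step) and the weights of the knowledge rules in the MLN (M-step). We conduct extensive experiments on different datasets and show that CARE achieves significantly higher certified robustness than state-of-the-art baselines. We also conduct different ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of integrating different kinds of knowledge.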
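Knowledge distillation (KD) has been developed extensively and has boosted various tasks. The classical KD method adds the KD loss to the original cross-entropy (CE) loss. We try to decompose the KD loss to explore its relation to the CE loss. Surprisingly, we find that it can be regarded as a combination of the CE loss and an extra loss that has the same form as the CE loss. However, we notice that the extra loss forces the student's relative probability to learn the teacher's absolute probability. Moreover, the sums of the two probabilities differ, which makes optimization hard. To address this issue, we revise the formulation and propose a distributed loss. In addition, we use the teacher's target output as the soft target, proposing the soft loss. Combining the soft loss and the distributed loss, we propose a new KD loss (NKD). Furthermore, we smooth the student's target output and treat it as the soft target for training without teachers, proposing a teacher-free new KD loss (TF-NKD). Our method achieves state-of-the-art performance on CIFAR-100 and ImageNet. For example, with ResNet-34 as the teacher, we boost the ImageNet Top-1 accuracy of ResNet-18 from 69.90% to 71.96%. In training without teachers, MobileNet, ResNet-18, and SwinTransformer-Tiny achieve 70.04%, 70.76%, and 81.48%, which are 0.83%, 0.86%, and 0.30% higher than the baseline, respectively. The code is available at https://github.com/yzd-v/cls_kd.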
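For context, the classical setup that the abstract starts from, a cross-entropy loss plus a KD loss, can be sketched as follows. This is the common temperature-scaled KL-divergence formulation and not the NKD or TF-NKD losses proposed in the paper; the function names, the `weight` factor, and the default temperature are assumptions made for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits: List[float], label: int) -> float:
    """Standard CE loss of the student against the hard label."""
    return -math.log(softmax(student_logits)[label])


def kd_term(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 1.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl


def classical_kd_loss(teacher_logits: List[float], student_logits: List[float],
                      label: int, temperature: float = 4.0,
                      weight: float = 1.0) -> float:
    """CE loss plus a weighted KD term, as in the classical formulation."""
    return (cross_entropy(student_logits, label)
            + weight * kd_term(teacher_logits, student_logits, temperature))


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(classical_kd_loss(teacher, student, label=0))
```

The KD term vanishes when the student's softened distribution matches the teacher's, so in that case only the hard-label CE loss remains.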