Sequential patterns in data are at the core of various time series forecasting tasks. Deep learning models substantially outperform many traditional models, but these black-box models generally lack explainability in prediction and decision making. To reveal underlying trends with understandable mathematical expressions, scientists and economists tend to use partial differential equations (PDEs) to explain the highly nonlinear dynamics of sequential patterns. However, this usually requires domain expert knowledge and a series of simplifying assumptions, which are not always practical and can deviate from an ever-changing world. Is it possible to learn differential relations from data dynamically to explain time-evolving dynamics? In this work, we propose a learning framework that automatically obtains interpretable PDE models from sequential data. In particular, the framework consists of learnable differential blocks, named $P$-blocks, which are proven to be able to approximate, in theory, complex continuous functions that evolve over time. Moreover, to capture changes in the dynamics, the framework introduces a meta-learning controller to dynamically optimize the hyperparameters of a hybrid PDE model. Extensive experiments on time series forecasting with financial, engineering, and health data show that our model can provide valuable interpretability and achieve performance comparable to state-of-the-art models. From empirical studies, we find that learning a few differential operators can capture the main trends of sequential dynamics without heavy computational complexity.
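As a minimal illustration of the idea of composing a small set of differential operators into a dynamics model (the paper's actual $P$-block parameterization differs; the names and coefficients here are assumptions for exposition):

```python
import numpy as np

def diff_ops(u, dx=1.0):
    # Finite-difference approximations of the first and second
    # spatial derivatives u_x and u_xx of a sampled function u.
    u_x = np.gradient(u, dx)
    u_xx = np.gradient(u_x, dx)
    return u_x, u_xx

def pde_step(u, coeffs, dt=0.01, dx=1.0):
    # One explicit Euler step of u_t = a * u_x + b * u_xx, where the
    # coefficients (a, b) play the role of learnable weights.
    a, b = coeffs
    u_x, u_xx = diff_ops(u, dx)
    return u + dt * (a * u_x + b * u_xx)
```

With `coeffs = (0, 1)` this reduces to a diffusion step that smooths a spike, showing how even two operators already express recognizable dynamics.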
Multivariate time series forecasting has a wide range of applications in various domains, including finance, traffic, energy, and healthcare. To capture sophisticated temporal patterns, a large body of research has designed complicated neural network architectures based on many variants of RNNs, GNNs, and Transformers. However, complicated models are often computationally expensive and thus face severe challenges in training and inference efficiency when applied to large real-world datasets. In this paper, we introduce LightTS, a light deep learning architecture based merely on simple MLP-based structures. The key idea of LightTS is to apply an MLP-based structure on top of two delicate down-sampling strategies, interval sampling and continuous sampling, inspired by the vital fact that down-sampled time series often preserve most of their information. We conduct extensive experiments on eight widely used benchmark datasets. Compared with existing state-of-the-art methods, LightTS demonstrates better performance on five of them and comparable performance on the rest. Moreover, LightTS is highly efficient: it uses less than 5% of the FLOPs compared with the previous SOTA method on the largest benchmark dataset. In addition, the forecasting accuracy of LightTS exhibits a much smaller variance than previous SOTA methods in long-sequence forecasting tasks.
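The two down-sampling strategies can be sketched in a few lines; the function names here are illustrative, not the paper's API:

```python
import numpy as np

def interval_sample(x, c):
    # Interval sampling: subsequence k takes elements k, k+c, k+2c, ...
    # so each subsequence spans the whole series at a coarser resolution.
    t = len(x) - len(x) % c
    return x[:t].reshape(-1, c).T          # shape (c, t // c)

def continuous_sample(x, c):
    # Continuous sampling: split the series into c contiguous chunks,
    # so each subsequence keeps fine local detail over a shorter span.
    t = len(x) - len(x) % c
    return x[:t].reshape(c, -1)            # shape (c, t // c)
```

An MLP applied across the rows of either result then mixes information within and across the down-sampled subsequences.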
Figure skating scoring is a challenging task because it requires judging a player's technical moves as well as their coordination with the background music. Prior learning-based work cannot solve it well for two reasons: 1) each move changes rapidly, so simply applying traditional frame sampling loses a lot of valuable information, especially in 3-to-5-minute videos, making extremely long-range representation learning necessary; 2) prior methods rarely consider the key audio-visual relationship in their models. We therefore introduce a multimodal MLP architecture named Skating-Mixer. It extends the MLP-Mixer-based framework to a multimodal fashion and effectively learns long-term representations through our designed memory recurrent unit (MRU). Beyond the model, we also collect a high-quality audio-visual FS1000 dataset, which contains over 1000 videos of 8 types of programs with 7 different rating metrics, surpassing other datasets in both quantity and diversity. Experiments show that the proposed method outperforms all major metrics on the public Fis-V and our FS1000 datasets. In addition, we include an analysis applying our method to recent competitions at the Beijing 2022 Winter Olympics, demonstrating that our method has strong robustness.
Osteoporosis is a common chronic metabolic bone disease that is often under-diagnosed and under-treated due to limited access to bone mineral density (BMD) examinations, e.g., via dual-energy X-ray absorptiometry (DXA). In this paper, we propose a method to predict BMD from chest X-rays (CXRs), one of the most common and low-cost medical imaging examinations. Our method first automatically detects regions of interest (ROIs) of local and global bone structures from the CXR. Then, a multi-ROI deep model with a Transformer encoder is developed to exploit both local and global information in the chest X-ray image for accurate BMD estimation. Our method is evaluated on 13,719 CXR patient cases with ground-truth BMD scores measured by gold-standard DXA. The model-predicted BMDs have a strong correlation with the ground truth (Pearson correlation coefficient 0.889 on lumbar 1). When applied to osteoporosis screening, it achieves high classification performance (AUC 0.963 on lumbar 1). As the first effort in the field to use CXR scans to predict BMD, the proposed algorithm holds strong potential for early osteoporosis screening and public health promotion.
Knee osteoarthritis (OA) is a common degenerative joint disorder that affects a large elderly population worldwide. Accurate radiographic assessment of knee OA severity plays a critical role in chronic patient management. The knee OA grading systems currently adopted in clinical practice are observer-subjective and suffer from inter-rater disagreement. In this work, we propose a computer-aided diagnosis approach that simultaneously provides more accurate and consistent assessments of both composite and fine-grained OA grades. A novel semi-supervised learning method is proposed to exploit the underlying coherence of composite and fine-grained OA grades by learning from unlabeled data. Representing grade coherence using the log-probability of a pre-trained Gaussian mixture model, we formulate an incoherence loss to incorporate unlabeled data into training. The approach also features a keypoint-based pooling network, in which deep image features are pooled from disease-targeted keypoints (extracted along the knee joint) to provide a more accurate and pathologically informed feature representation for OA grade assessment. The proposed method is comprehensively evaluated on data from the public Osteoarthritis Initiative (OAI), a multi-center ten-year observational study of 4,796 subjects. Experimental results demonstrate that our method significantly improves over previous strong whole-image deep classification network baselines (such as ResNet-50).
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
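A hedged sketch of what distilling token relations (as opposed to per-token features) could look like; the relation definition and MSE loss below are simplified assumptions, not TinyMIM's exact objective:

```python
import numpy as np

def token_relation(x):
    # Pairwise token relation map: row-softmax of scaled dot products
    # between token features, one common notion of "token relations".
    s = x @ x.T / np.sqrt(x.shape[1])
    s = s - s.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(s)
    return p / p.sum(axis=1, keepdims=True)

def relation_distill_loss(student_tokens, teacher_tokens):
    # Match the student's relation map to the teacher's, so the student
    # learns how tokens relate rather than copying raw features.
    rs = token_relation(student_tokens)
    rt = token_relation(teacher_tokens)
    return float(np.mean((rs - rt) ** 2))
```

One appeal of relation-based targets is that they are dimension-free: student and teacher can have different feature widths as long as they share the token count.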
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
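A minimal sketch of the NAIVEATTACK idea — stamping a trigger onto a fraction of the raw images before distillation begins; the trigger shape, location, and poisoning ratio here are illustrative assumptions:

```python
import numpy as np

def add_trigger(img, trigger_size=3, value=1.0):
    # Stamp a small bright square into the bottom-right corner
    # (a hypothetical trigger pattern; real attacks vary this).
    out = img.copy()
    out[-trigger_size:, -trigger_size:] = value
    return out

def poison(images, ratio=0.1, seed=0):
    # Inject the trigger into a random fraction of the raw images
    # *before* distillation, so it is baked into the synthetic set.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(ratio * len(images)), replace=False)
    out = images.copy()
    for i in idx:
        out[i] = add_trigger(images[i])
    return out, idx
```

DOORPING differs in that the trigger is re-optimized at every distillation iteration rather than fixed up front, which is why it reaches near-1.0 ASR.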
Benefiting from its capability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms. 1) The quality of positive samples heavily depends on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters, by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
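The objective can be sketched roughly as follows, treating the centers of *other* high-confidence clusters as negatives; this is a simplified numpy stand-in with assumed names, not CCGC's exact loss:

```python
import numpy as np

def cos(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_guided_loss(z1, z2, centers, labels):
    # Positive term: cross-view similarity of the same node's embeddings.
    pos = np.mean([cos(z1[i], z2[i]) for i in range(len(z1))])
    # Negative term: similarity to centers of other high-confidence clusters.
    neg, n = 0.0, 0
    for i in range(len(z1)):
        for k in range(len(centers)):
            if k != labels[i]:
                neg += cos(z1[i], centers[k])
                n += 1
    return neg / max(n, 1) - pos   # minimize: pull positives, push negatives
```

Using cluster centers rather than arbitrary other nodes as negatives avoids accidentally pushing apart nodes that in fact belong to the same cluster.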
As one of the prevalent methods for achieving automation systems, Imitation Learning (IL) presents promising performance in a wide range of domains. However, despite considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from randomly masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
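The frame-importance estimation can be sketched as a RISE-style masking loop; `train_and_eval` stands in for the expensive retrain-and-rollout step and is a caller-supplied assumption, not part of the described method's API:

```python
import numpy as np

def importance_map(n_frames, n_rounds, train_and_eval, p_keep=0.5, seed=0):
    # Sample random binary masks over demonstration frames, retrain the
    # black-box IL model on each masked demonstration, and weight every
    # mask by the environment return of the resulting policy.
    rng = np.random.default_rng(seed)
    scores = np.zeros(n_frames)
    counts = np.zeros(n_frames)
    for _ in range(n_rounds):
        mask = rng.random(n_frames) < p_keep     # which frames are kept
        r = train_and_eval(mask)                 # return of the retrained policy
        scores += r * mask                       # credit kept frames with r
        counts += mask
    return scores / np.maximum(counts, 1)        # average return when kept
```

Frames whose presence consistently coincides with high returns accumulate high importance; frames that do not matter average out to the baseline return.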
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence on video quality of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating). For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSformer to detect them. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe SSTAM will be beneficial for optimizing video coding techniques.