Smart Paper Notes

Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Haigen Hu , Leizhao Shen , Qiu Guan , Xiaoxin Li , Qianwei Zhou , Su Ruan

Category: Computer Vision

2021-12-20

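Because infected lesions have irregular shapes and varying sizes, and the boundaries between normal and infected tissue are hard to distinguish, accurately segmenting COVID-19 infections on CT images remains a challenging task. This paper proposes a novel segmentation scheme for COVID-19 infections that enhances the supervision information and fuses multi-scale feature maps from different levels of an encoder-decoder architecture. To this end, a deep collaborative supervision (Co-supervision) scheme is proposed to guide the network in learning edge and semantic features. More specifically, an Edge Supervised Module (ESM) is first designed to highlight low-level boundary features by incorporating edge supervision information into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervision information into the later stage. Then, an Attention Fusion Module (AFM) is developed to fuse multi-scale feature maps from different levels, using an attention mechanism to reduce the semantic gap between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets. The results show that all three proposed modules are promising. Relative to the baseline (ResUnet), using ESM, ASSM, or AFM alone increases the Dice metric on our dataset by 1.12%, 1.95%, and 1.63%, respectively, while combining the three modules increases it by 3.97%. Compared with existing approaches on the various datasets, the proposed method obtains better segmentation performance on some of the main metrics and achieves the best generalization and overall performance.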
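The gains above are reported in terms of the Dice metric. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient on binary masks. The function name, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    # Overlap: pixels marked as infected in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(dice_score(pred, target))  # about 0.667: 2 shared pixels, 3 + 3 marked in total
```

A Dice value of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.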

translated by Google Translate

Adaptively Customizing Activation Functions for Various Layers

Haigen Hu , Aizhu Liu , Qiu Guan , Xiaoxin Li , Shengyong Chen , Qianwei Zhou

Category: Computer Vision | Machine Learning

2021-12-17

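Activation functions play a crucial role in modeling complex relationships and patterns in data: they add nonlinearity to neural networks and improve the mapping between inputs and response variables. This work proposes a novel method that adaptively customizes activation functions simply by adding a very small number of parameters to traditional activation functions such as Sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed method, theoretical and experimental analyses of accelerated convergence and improved performance are presented, and a series of experiments is conducted on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet, and DenseNet) and various datasets (such as CIFAR10, CIFAR100, miniImageNet, PASCAL VOC, and COCO). To further verify its validity and suitability across optimization strategies and usage scenarios, comparison experiments are also carried out with different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta, and Adam) and different recognition tasks, such as classification and detection. The results show that the proposed method is very simple yet delivers significant gains in convergence speed, precision, and generalization, and that it surpasses other popular methods, such as ReLU and adaptive activation functions, in overall performance in almost all experiments. The code is publicly available at https://github.com/huhaigen/aptove-custivation-操作系统. The package includes the three proposed adaptive activation functions for reproducibility purposes.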
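To make the idea of "adding a few parameters to a traditional activation function" concrete, here is a minimal sketch. The abstract does not give the paper's parameterization, so the form `a * sigmoid(b * x)`, the class name `AdaptiveSigmoid`, and the plain SGD update are all assumptions chosen for illustration.

```python
import math


class AdaptiveSigmoid:
    """Sigmoid with two extra trainable scalars: a * sigmoid(b * x).

    Hypothetical form for illustration; not the parameterization from the paper.
    """

    def __init__(self, a=1.0, b=1.0):
        self.a = a  # output scale
        self.b = b  # input slope

    def forward(self, x):
        return self.a / (1.0 + math.exp(-self.b * x))

    def grads(self, x):
        """Partial derivatives of the output with respect to a and b."""
        s = 1.0 / (1.0 + math.exp(-self.b * x))
        return s, self.a * x * s * (1.0 - s)

    def sgd_step(self, x, upstream_grad, lr=0.1):
        """Update a and b with one SGD step, given the gradient of the loss w.r.t. the output."""
        da, db = self.grads(x)
        self.a -= lr * upstream_grad * da
        self.b -= lr * upstream_grad * db


act = AdaptiveSigmoid()
print(act.forward(0.0))  # 0.5, identical to the plain sigmoid when a = b = 1
act.sgd_step(x=1.0, upstream_grad=1.0)
print(act.a, act.b)      # both are now slightly below 1.0
```

With `a = b = 1` the function reduces to the ordinary sigmoid, so the extra parameters only change behavior once training moves them.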


A Generalization of ViT/MLP-Mixer to Graphs

Xiaoxin He , Bryan Hooi , Thomas Laurent , Adam Perold , Yann LeCun , Xavier Bresson

Category: Computer Vision

2022-12-27

Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, which global attention can address, but only at a computational cost that grows quadratically. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, that holds three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing, as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency, with a complexity linear in the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism, as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
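The long-range limitation of local message passing can be seen on a toy example. The sketch below uses plain mean aggregation on a 6-node path graph, which is a generic GNN-style update chosen for illustration and not the Graph MLP-Mixer architecture. The graph, the aggregation rule, and all names are assumptions.

```python
def message_passing_layer(features, neighbors):
    """One round of mean aggregation over each node and its direct neighbors."""
    updated = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        updated.append(sum(features[j] for j in group) / len(group))
    return updated


# Path graph 0 - 1 - 2 - 3 - 4 - 5; only node 0 carries a signal at the start.
n = 6
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
features = [1.0] + [0.0] * (n - 1)

layers_needed = 0
while features[-1] == 0.0:
    features = message_passing_layer(features, neighbors)
    layers_needed += 1

print(layers_needed)  # 5: one layer per hop between node 0 and node 5
print(features[-1])   # a small positive number: the signal is heavily diluted
```

Information needs one layer per hop to travel, and by the time it arrives it has been averaged many times, which is a simple picture of why distant dependencies are hard for stacked local layers.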


A Fast Blockchain-based Federated Learning Framework with Compressed Communications

Laizhong Cui , Xiaoxin Su , Yipeng Zhou

Category: Machine Learning

2022-08-12

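Recently, blockchain-based federated learning (BFL) has attracted intensive research attention because its training process is auditable and its serverless architecture avoids the single point of failure of the parameter server in vanilla federated learning (VFL). However, BFL greatly increases the volume of communication traffic, because all local model updates (i.e., changes to the model parameters) obtained by BFL clients are transmitted to all miners for verification and to all clients for aggregation. In contrast, the parameter server and the clients in VFL only retain aggregated model updates. The huge communication traffic in BFL therefore inevitably impairs training efficiency and hinders the deployment of BFL in practice. To improve the practicality of BFL, we are among the first to propose a fast blockchain-based federated learning framework that compresses the communications in BFL, called BCFL. We also derive the convergence rate of BCFL with non-convex loss. To maximize the final model accuracy, we further formulate the problem of minimizing the training loss given by the convergence rate, subject to a limited training time, with respect to the compression rate and the block generation rate. This is a bi-convex optimization problem and can be solved efficiently. Finally, to demonstrate the efficiency of BCFL, we carry out extensive experiments on the standard CIFAR-10 and FEMNIST datasets. Our experimental results not only verify the correctness of our analysis but also show that BCFL can reduce communication traffic by 95-98% or shorten training time by 90-95% compared with BFL.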
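The abstract does not say which compressor BCFL uses. As one common way to compress a model update before broadcasting it, the sketch below applies top-k sparsification, keeping only the largest-magnitude entries. The choice of top-k, the function names, and the sample values are assumptions for illustration only.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of an update as (index, value) pairs."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = sorted(ranked[:k])
    return [(i, update[i]) for i in kept]


def densify(sparse, size):
    """Rebuild a dense update on the receiving side; dropped entries become 0."""
    dense = [0.0] * size
    for i, v in sparse:
        dense[i] = v
    return dense


update = [0.02, -1.5, 0.3, 0.0, 2.1, -0.04, 0.7, -0.01]
sparse = top_k_sparsify(update, k=2)
print(sparse)                        # [(1, -1.5), (4, 2.1)]
print(densify(sparse, len(update)))  # [0.0, -1.5, 0.0, 0.0, 2.1, 0.0, 0.0, 0.0]
```

Here only 2 of 8 entries are transmitted, at the cost of sending their indices and losing the smaller entries. The traffic savings quoted in the abstract come from the paper's own experiments, not from this toy example.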


Optimal Rate Adaption in Federated Learning with Compressed Communications

Laizhong Cui , Xiaoxin Su , Yipeng Zhou , Jiangchuan Liu

Category: Machine Learning

2021-12-13

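Federated learning (FL) incurs high communication overhead, which can be greatly alleviated by compressing model updates. However, the tradeoff between compression and model accuracy in a networked environment remains unclear, and for simplicity most implementations adopt only a fixed compression rate. In this paper, we systematically examine this tradeoff for the first time, identifying the influence of the compression error on the final model accuracy with respect to the learning rate. Specifically, we factor the compression error of each global iteration into the convergence rate analysis under both strongly convex and non-convex loss functions. We then present an adaptation framework that maximizes the final model accuracy by strategically adjusting the compression rate in each iteration. We discuss the key implementation issues of our framework in practical networks with representative compression algorithms. Experiments on the popular MNIST and CIFAR-10 datasets confirm that our solution effectively reduces network traffic while maintaining high model accuracy in FL.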
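The tradeoff between compression rate and compression error can be illustrated with a simple compressor. The sketch below uses uniform quantization as a stand-in, since the abstract does not name the algorithms studied, and measures the squared error at several bit widths. The quantizer, the sample update, and the function names are assumptions; this does not implement the paper's rate adaptation rule.

```python
def quantize(update, bits):
    """Uniformly quantize values onto 2**bits levels between the min and max entry."""
    lo, hi = min(update), max(update)
    if hi == lo:
        return list(update)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in update]


def squared_error(original, compressed):
    """Compression error as the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(original, compressed))


update = [0.02, -1.5, 0.3, 0.0, 2.1, -0.04, 0.7, -0.01]
errors = {bits: squared_error(update, quantize(update, bits)) for bits in (1, 2, 4, 8)}
for bits, err in errors.items():
    print(bits, err)  # on this sample, the error shrinks as more bits are sent
```

Fewer bits per entry means less traffic but a larger error injected into each iteration, which is the quantity the convergence analysis has to account for.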


Large-Scale Deep Learning Optimizations: A Comprehensive Survey

Xiaoxin He , Fuzhao Xue , Xiaozhe Ren , Yang You

Category: Machine Learning

2021-11-01

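Deep learning has achieved promising results across a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, they generally require longer training time, with more computation and communication. In this survey, we aim to provide a clear sketch of optimizations for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms most commonly used for optimization, elaborate on the debated topic of the generalization gap that arises in large-batch training, and review state-of-the-art (SOTA) strategies for addressing communication overhead and reducing memory footprint.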


Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.


Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
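As a rough picture of what "adding triggers to the raw data" means, the sketch below stamps a small fixed patch onto some images and relabels them to a target class. This is a generic patch-trigger illustration, not the paper's NAIVEATTACK or DOORPING code. The patch position, size, pixel value, poisoning pattern, and all function names are assumptions.

```python
def add_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2D grayscale image with a square patch in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


def poison(dataset, target_label, every=2):
    """Stamp the trigger on every `every`-th sample and relabel it to target_label."""
    poisoned = []
    for idx, (image, label) in enumerate(dataset):
        if idx % every == 0:
            poisoned.append((add_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned


# Four blank 4x4 images with labels 3, 5, 7, 9 (toy data).
clean = [([[0.0] * 4 for _ in range(4)], label) for label in (3, 5, 7, 9)]
poisoned = poison(clean, target_label=0)
print([label for _, label in poisoned])  # [0, 5, 0, 9]
```

In the setting described above, data modified in this spirit would be fed into the distillation procedure rather than used to train the final model directly. DOORPING goes further by updating the trigger itself during distillation, which this sketch does not attempt.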


Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Li Zhang , Chris Callison-Burch

Category: Natural Language Processing

2023-01-03

Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is more so, as there is little precedent in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.


Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
