Smart Paper Notes

SRPCN: Structure Retrieval based Point Completion Network

Kaiyi Zhang , Ximing Yang , Yuan Wu , Cheng Jin

Category: Computer Vision

2022-02-06

Given partial objects and some complete ones as references, point cloud completion aims to recover authentic shapes. However, existing methods pay little attention to general shapes, which leads to the poor authenticity of completion results. Besides, the missing patterns are diverse in reality, but existing methods can only handle fixed ones, which means a poor generalization ability. Considering that a partial point cloud is a subset of the corresponding complete one, we regard them as different samples of the same distribution and propose Structure Retrieval based Point Completion Network (SRPCN). It first uses k-means clustering to extract structure points and disperses them into distributions, and then KL Divergence is used as a metric to find the complete structure point cloud that best matches the input in a database. Finally, a PCN-like decoder network is adopted to generate the final results based on the retrieved structure point clouds. As structure plays an important role in describing the general shape of an object and the proposed structure retrieval method is robust to missing patterns, experiments show that our method can generate more authentic results and has a stronger generalization ability.
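The retrieval step described above (k-means structure points, dispersal into distributions, KL-divergence matching) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how structure points are dispersed into distributions, so the Gaussian-kernel voxel density used here is an assumption, and every name (`kmeans`, `to_distribution`, `retrieve`, the toy rod shapes) is hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centers play the role of structure points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def to_distribution(structure_points, grid=8, lo=-1.0, hi=1.0, sigma=0.25):
    """ASSUMED dispersal scheme: Gaussian kernels evaluated on a voxel grid."""
    axis = np.linspace(lo, hi, grid)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    cells = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    d2 = ((cells[:, None, :] - structure_points[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1) + 1e-8
    return density / density.sum()

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))

def retrieve(partial, database, k=16):
    """Index of the database shape whose structure distribution best matches."""
    p = to_distribution(kmeans(partial, k))
    scores = [kl_divergence(p, to_distribution(kmeans(c, k))) for c in database]
    return int(np.argmin(scores))

# Toy data: a rod along x, the same rod along z, and a partial view of the first.
rng = np.random.default_rng(1)
rod_x = rng.uniform([-0.8, -0.1, -0.1], [0.8, 0.1, 0.1], size=(400, 3))
rod_z = rod_x[:, ::-1].copy()
partial = rod_x[rod_x[:, 0] > -0.2]
best = retrieve(partial, [rod_x, rod_z])
print(best)
```

In this sketch the partial input is placed in the first argument of the KL divergence, so the score only penalizes regions where the partial shape has mass and the candidate does not. That is one plausible reading of why such a metric tolerates missing regions; the direction the paper actually uses is not stated in the abstract.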

Translated by Google Translate

Attention-based Transformation from Latent Features to Point Clouds

Kaiyi Zhang , Ximing Yang , Yuan Wu , Cheng Jin

Category: Computer Vision

2021-12-10

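In point cloud generation and completion, previous methods for transforming latent features into point clouds are generally based on fully connected layers (FC-based) or folding operations (Folding-based). However, point clouds generated by FC-based methods usually suffer from outliers and rough surfaces. Folding-based methods have a large data flow and slow convergence, and they also struggle to generate non-smooth surfaces. In this work, we propose AXform, an attention-based method for transforming latent features into point clouds. AXform first generates points in an interim space using a fully connected layer. These interim points are then aggregated to generate the target point cloud. AXform takes both parameter sharing and data flow into account, which gives it fewer outliers, fewer network parameters, and faster convergence. The points generated by AXform are not subject to a strong 2-manifold constraint, which improves the generation of non-smooth surfaces. When AXform is extended to multiple branches for local generation, a centripetal constraint gives it the properties of self-clustering and spatial consistency, which further enables unsupervised semantic segmentation. We also adopt this scheme and design AXformNet for point cloud completion. Extensive experiments on different datasets show that our methods achieve state-of-the-art results.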


A Primal-Dual Approach to Bilevel Optimization with Multiple Inner Minima

Daouda Sow , Kaiyi Ji , Ziwei Guan , Yingbin Liang

Category: Machine Learning | Machine Learning (Statistics)

2022-03-01

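Bilevel optimization has found extensive applications in modern machine learning problems such as hyperparameter optimization, neural architecture search, and meta-learning. While bilevel problems with a unique inner minimal point (e.g., where the inner function is strongly convex) are well understood, bilevel problems with multiple inner minimal points remain challenging and open. Existing algorithms designed for such problems apply only to restricted situations and do not come with full convergence guarantees. In this paper, we adopt a reformulation of bilevel optimization as constrained optimization and solve the problem via a primal-dual bilevel optimization (PDBO) algorithm. PDBO not only addresses the multiple-inner-minima challenge but also features fully first-order efficiency without involving second-order Hessian and Jacobian computations, in contrast to most existing gradient-based bilevel algorithms. We further characterize the convergence rate of PDBO, which serves as the first known non-asymptotic convergence guarantee for bilevel optimization with multiple inner minima. Our experiments demonstrate the desired performance of the proposed approach.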
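For orientation, the problem class discussed above can be written out as follows. The notation ($f$ for the outer objective, $g$ for the inner objective, $g^*$ for the inner optimal value) is an assumption made for illustration, and the constrained form shown is the common value-function reformulation; the abstract does not state the exact constraint PDBO uses.

```latex
% Bilevel problem with a possibly non-unique inner solution set S(x)
\min_{x,\, y} \; f(x, y)
\quad \text{s.t.} \quad
y \in S(x) := \operatorname*{arg\,min}_{y'} \; g(x, y')

% Equivalent single-level constrained form via the inner optimal value
\min_{x,\, y} \; f(x, y)
\quad \text{s.t.} \quad
g(x, y) \le g^*(x), \qquad g^*(x) := \min_{y'} \; g(x, y')
```

When $S(x)$ contains more than one point, the map $x \mapsto y^*(x)$ is not well defined, which is why methods that differentiate through a unique inner solution do not directly apply.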


Graph-based Ensemble Machine Learning for Student Performance Prediction

Yinkai Wang , Aowei Ding , Kaiyi Guan , Shixi Wu , Yuanqi Du

Category: Machine Learning | Artificial Intelligence

2021-12-15

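Student performance prediction is an important research problem for understanding students' needs, presenting appropriate learning opportunities and resources, and improving teaching quality. However, traditional machine learning methods fail to produce stable and accurate prediction results. In this paper, we propose a graph-based ensemble machine learning method that aims to improve the stability of single machine learning methods through the consensus of multiple methods. Specifically, we leverage both supervised prediction methods and unsupervised clustering methods to build an iterative approach that propagates in a bipartite graph and converges to more stable and accurate prediction results. Extensive experiments demonstrate the effectiveness of the proposed method in predicting student performance more accurately. In particular, our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.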


Cervical Optical Coherence Tomography Image Classification Based on Contrastive Self-Supervised Texture Learning

Kaiyi Chen , Qingbin Wang , Yutao Ma

Category: Computer Vision | Machine Learning

2021-08-11

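Background: Cervical cancer seriously affects the health of the female reproductive system. Optical coherence tomography (OCT) has emerged as a non-invasive, high-resolution imaging technology for cervical disease detection. However, OCT image annotation is knowledge-intensive and time-consuming, which impedes the training of deep-learning-based classification models. Purpose: This study aims to develop a computer-aided diagnosis (CADx) approach to classifying in-vivo cervical OCT images based on self-supervised learning. Methods: In addition to high-level semantic features extracted by a convolutional neural network (CNN), the proposed CADx approach leverages texture features of unlabeled cervical OCT images learned through contrastive texture learning. We conducted ten-fold cross-validation on an OCT image dataset from a multi-center clinical study of 733 patients in China. Results: In a binary classification task for detecting high-risk diseases, including high-grade squamous intraepithelial lesion and cervical cancer, our method achieved an area-under-the-curve value of 0.9798 ± 0.0157, with a sensitivity of 91.17 ± 4.99% and a specificity of 93.96 ± 4.72% for OCT image patches; it also outperformed two of four medical experts on the test set. Furthermore, our method achieved 91.53% sensitivity and 97.37% specificity on 118 Chinese patients using a cross-shaped threshold voting strategy. Conclusions: The proposed contrastive-learning-based CADx method outperformed end-to-end CNN models and provided better interpretability based on texture features, which gives it great potential for use in the "see-and-treat" clinical protocol.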


Provably Faster Algorithms for Bilevel Optimization

Junjie Yang , Kaiyi Ji , Yingbin Liang

Category: Machine Learning | Machine Learning (Statistics)

2021-06-08

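Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\widetilde{\mathcal{O}}(\epsilon^{-2})$ of SGD-based algorithms. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations and the second adopts recursive gradient estimations in nested loops to reduce the variance. We show that both algorithms achieve a complexity of $\widetilde{\mathcal{O}}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications.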


TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Category: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
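To make "distilling token relations" concrete, here is a minimal NumPy sketch under stated assumptions: the relation is taken to be a softmax-normalized query-key similarity matrix, and the loss is a KL divergence from teacher to student relations. The abstract does not specify which relations or loss TinyMIM uses, so all function names, shapes, and the choice of loss here are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_relations(q, k):
    """Row-stochastic N x N token-to-token matrix softmax(Q K^T / sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1)

def relation_distill_loss(q_t, k_t, q_s, k_s, eps=1e-12):
    """Mean over tokens of KL(teacher relations || student relations)."""
    r_t = token_relations(q_t, k_t)
    r_s = token_relations(q_s, k_s)
    kl = np.sum(r_t * (np.log(r_t + eps) - np.log(r_s + eps)), axis=-1)
    return float(kl.mean())

# Hypothetical shapes: 8 tokens, a 64-dim teacher head and a 16-dim student head.
rng = np.random.default_rng(0)
q_teacher, k_teacher = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
q_student, k_student = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
loss = relation_distill_loss(q_teacher, k_teacher, q_student, k_student)
print(loss)
```

Because the relation matrices are N x N regardless of feature width, this kind of target needs no projection layer to reconcile teacher and student dimensions, which is one practical difference from feature-based distillation.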


Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.


Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.


PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

Category: Computer Vision

2023-01-03

Blind image quality assessment (BIQA) remains challenging because diverse distortions and varying image content complicate distortion patterns across scales and make the underlying regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to optimize the regression problem in a way that mirrors how humans learn, from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
