Smart Paper Notes

A Lightweight Reconstruction Network for Surface Defect Inspection

Chao Hu , Jian Yao , Weijie Wu , Weibin Qiu , Liqiang Zhu

Categories: Computer Vision | Machine Learning

2022-12-25

Most current deep learning methods cannot cope with the scarcity of industrial product defect samples and the large variation in defect characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, which is realized using only a large number of easily obtained defect-free samples. The network has two parts: image reconstruction and surface defect area detection. The reconstruction network is a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training, so that the reconstruction network generates a defect-free reconstructed image. A loss function combining structural loss and $L_1$ loss is proposed for the reconstruction network to address the poor detection of defects on irregularly textured surfaces. The residual between the reconstructed image and the image under test is then taken as the candidate defect region, and conventional image operations localize the defect. The proposed algorithm is evaluated on multiple defect image sample sets. Compared with other similar algorithms, the results show that it is robust and accurate.
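
The two ideas in this abstract, a loss mixing a structural term with an $L_1$ term and defect localization from the reconstruction residual, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the structural loss, so a single-window SSIM is assumed here, and the names `l1_loss`, `global_ssim`, `combined_loss`, `defect_mask`, the weight `alpha`, and the threshold are all hypothetical. Images are flat lists of intensities in [0, 1].

```python
from statistics import fmean


def l1_loss(x, y):
    """Mean absolute difference between two equally sized flat images."""
    return fmean(abs(a - b) for a, b in zip(x, y))


def global_ssim(x, y, data_range=1.0):
    """SSIM over the whole image as one window (assumed simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def combined_loss(x, y, alpha=0.5):
    """Weighted sum of a structural term (1 - SSIM) and an L1 term.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    return alpha * (1.0 - global_ssim(x, y)) + (1.0 - alpha) * l1_loss(x, y)


def defect_mask(test, recon, threshold=0.3):
    """Mark pixels whose reconstruction residual exceeds a threshold."""
    return [1 if abs(t - r) > threshold else 0 for t, r in zip(test, recon)]


# Toy 2x2 image, flattened: the reconstruction is defect-free,
# the test image has one bright defect pixel.
recon = [0.2, 0.2, 0.8, 0.8]
test = [0.2, 0.9, 0.8, 0.8]
mask = defect_mask(test, recon)
```

In practice SSIM is computed over local sliding windows and the residual map is usually cleaned up with morphological operations before thresholding; both are omitted to keep the sketch short.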

translated by Google Translate

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


Few-shot Image Generation with Diffusion Models

Jingyuan Zhu , Huimin Ma , Jiansheng Chen , Jian Yuan

Categories: Computer Vision

2022-11-07

Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data become scarce. We then propose to adapt DDPMs pre-trained on large source domains to target domains using limited data. Our results show that utilizing knowledge from pre-trained DDPMs can significantly accelerate convergence and improve the quality and diversity of the generated images. Moreover, we propose a DDPM-based pairwise similarity loss to preserve the relative distances between generated samples during domain adaptation. In this way, we further improve the generation diversity of the proposed DDPM-based approaches. We demonstrate the effectiveness of our approaches qualitatively and quantitatively on a series of few-shot image generation tasks and achieve results better than current state-of-the-art GAN-based approaches in quality and diversity.


Inductive Logical Query Answering in Knowledge Graphs

Mikhail Galkin , Zhaocheng Zhu , Hongyu Ren , Jian Tang

Categories: Artificial Intelligence | Machine Learning

2022-10-13

Formulating and answering logical queries is a standard communication interface for knowledge graphs (KGs). Alleviating the notorious incompleteness of real-world KGs, neural methods achieved impressive results in link prediction and complex query answering tasks by learning representations of entities, relations, and queries. Still, most existing query answering methods rely on transductive entity embeddings and cannot generalize to KGs containing new entities without retraining the entity embeddings. In this work, we study the inductive query answering task where inference is performed on a graph containing new entities with queries over both seen and unseen entities. To this end, we devise two mechanisms leveraging inductive node and relational structure representations powered by graph neural networks (GNNs). Experimentally, we show that inductive models are able to perform logical reasoning at inference time over unseen nodes, generalizing to graphs up to 500% larger than training ones. Exploring the efficiency-effectiveness trade-off, we find the inductive relational structure representation method generally achieves higher performance, while the inductive node representation method is able to answer complex queries in the inference-only regime without any training on queries and scales to graphs of millions of nodes. Code is available at https://github.com/DeepGraphLearning/InductiveQE.


EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Ahmad Darkhalil , Dandan Shan , Bin Zhu , Jian Ma , Amlan Kar , Richard Higgins , Sanja Fidler , David Fouhey , Dima Damen

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-26

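We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which brings new challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short-term and long-term consistency of pixel-level annotations as objects undergo transformative interactions. For example, when an onion is peeled, diced, and cooked, we aim to obtain accurate pixel-level annotations of the peel, the onion pieces, the chopping board, the knife, the pan, and the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding, and long-term reasoning. For data, code, and leaderboards: http://epic-kitchens.github.io/visor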


Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization using Broyden's Hypergradients

Zhongkai Hao , Chengyang Ying , Hang Su , Jun Zhu , Jian Song , Ze Cheng

Categories: Machine Learning

2022-09-15

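Deep learning based approaches such as physics-informed neural networks (PINNs) and DeepONets have shown promise in solving PDE-constrained optimization (PDECO) problems. However, existing methods are insufficient for PDE constraints that have a complicated or nonlinear dependency on the optimization targets. In this paper, we present a novel bi-level optimization framework that resolves this challenge by decoupling the optimization of the targets and the constraints. For the inner-loop optimization, we adopt PINNs to solve the PDE constraints only. For the outer loop, we design a novel method that uses Broyden's method, based on the Implicit Function Theorem (IFT), which is efficient and accurate for approximating hypergradients. We further present theoretical explanations and an error analysis of the hypergradient computation. Extensive experiments on multiple large-scale and nonlinear PDE-constrained optimization problems demonstrate that our method achieves state-of-the-art results compared with strong baselines.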
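
For readers unfamiliar with Broyden's method, the sketch below shows the generic quasi-Newton iteration on a toy 2x2 linear system. It is an illustration under stated assumptions, not the paper's hypergradient computation: the abstract only says that Broyden's method is used together with the IFT, so the function `broyden_solve`, its helpers, the identity initial guess, and the toy system are all hypothetical. The iteration maintains an approximate inverse Jacobian `h` and refines it with rank-one ("good Broyden") updates, so no Jacobian is ever formed or inverted explicitly.

```python
def matvec(m, v):
    """Matrix-vector product m @ v for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vecmat(v, m):
    """Row-vector times matrix, v^T @ m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def broyden_solve(f, x0, tol=1e-10, max_iter=100):
    """Solve f(x) = 0 with Broyden's 'good' method (inverse update form)."""
    n = len(x0)
    x = list(x0)
    # Assumed initial guess for the inverse Jacobian: the identity matrix.
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    fx = f(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        step = matvec(h, fx)
        x_new = [xi - si for xi, si in zip(x, step)]
        f_new = f(x_new)
        dx = [a - b for a, b in zip(x_new, x)]
        df = [a - b for a, b in zip(f_new, fx)]
        h_df = matvec(h, df)
        dx_h = vecmat(dx, h)
        denom = sum(a * b for a, b in zip(dx, h_df))
        if abs(denom) > 1e-14:
            # Sherman-Morrison rank-one update of the inverse Jacobian.
            u = [a - b for a, b in zip(dx, h_df)]
            for i in range(n):
                for j in range(n):
                    h[i][j] += u[i] * dx_h[j] / denom
        x, fx = x_new, f_new
    return x


def toy_system(x):
    """Residual of 2*x0 + x1 = 3 and x0 + 3*x1 = 5 (solution: 0.8, 1.4)."""
    return [2 * x[0] + x[1] - 3, x[0] + 3 * x[1] - 5]


root = broyden_solve(toy_system, [0.0, 0.0])
```

The appeal of this family of methods for hypergradients is that an IFT-based gradient needs an inverse-Jacobian-vector product, and a low-rank approximation of the inverse avoids solving a large linear system exactly.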


StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing

Xuekai Zhu , Jian Guan , Minlie Huang , Juan Liu

Categories: Natural Language Processing

2022-08-29

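Non-parallel text style transfer is an important task in natural language generation. However, previous studies concentrate on the token or sentence level, such as sentence sentiment and formality transfer, and neglect long-text style transfer at the discourse level. Long texts usually involve more complicated author linguistic preferences, such as discourse structures, than sentences do. In this paper, we formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style while maintaining the source semantics. To tackle this problem, we propose a generation model named StoryTrans, which leverages discourse representations to capture source content information and transfers it to target styles with learnable style embeddings. We use an additional training objective to disentangle stylistic features from the learned discourse representation, which prevents the model from degenerating into an auto-encoder. Moreover, to enhance content preservation, we design a mask-and-fill framework that explicitly fuses style-specific keywords of the source text into the generation. Furthermore, we construct new datasets for this task in Chinese and English, respectively. Extensive experiments show that our model outperforms strong baselines in overall performance on style transfer and content preservation.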


AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Ren Yang , Radu Timofte , Xin Li , Qi Zhang , Lin Zhang , Fanglong Liu , Dongliang He , Fu Li , He Zheng , Weihang Yuan

Categories: Computer Vision

2022-08-23

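This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, comprising the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed images and video. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.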


Label-Guided Auxiliary Training Improves 3D Object Detector

Yaomin Huang , Xinmei Liu , Yichen Zhu , Zhiyuan Xu , Chaomin Shen , Zhengping Che , Guixu Zhang , Yaxin Peng , Feifei Feng , Jian Tang

Categories: Computer Vision | Artificial Intelligence

2022-07-24

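Detecting 3D objects from point clouds is a practical yet challenging task that has attracted increasing attention recently. In this paper, we propose a Label-Guided auxiliary training method for 3D object detection (LG3D), which serves as an auxiliary network to enhance the feature learning of existing 3D object detectors. Specifically, we propose two novel modules: a Label-Annotation-Inducer, which maps the annotations and point clouds inside bounding boxes to task-specific representations, and a Label-Knowledge-Mapper, which assists the original features in obtaining detection-critical representations. The proposed auxiliary network is discarded at inference and thus adds no computational cost at test time. We conduct extensive experiments on both indoor and outdoor datasets to verify the effectiveness of our approach. For example, our proposed LG3D improves VoteNet by 2.5% and 3.1% mAP on the SUN RGB-D and ScanNetV2 datasets, respectively.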


Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation

Ye Zhu , Yu Wu , Kyle Olszewski , Jian Ren , Sergey Tulyakov , Yan Yan

Categories: Computer Vision

2022-06-15

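Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, owing to their promising results and support for cross-modal synthesis. A key desideratum in conditional synthesis is high correspondence between the conditioning input and the generated output. Most existing methods learn this relationship implicitly, by incorporating the prior into the variational lower bound. In this work, we take a different route: we enhance the input-output connection by maximizing their mutual information using contrastive learning. To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process. We formulate CDCD by connecting it with the conventional variational objectives. We demonstrate the efficacy of our approach in evaluations on three diverse, multimodal conditional synthesis tasks: dance-to-music generation, text-to-image synthesis, and class-conditioned image synthesis. On each, we achieve state-of-the-art or higher synthesis quality and improve the input-output correspondence. Furthermore, the proposed approach improves the convergence of diffusion models, reducing the number of required diffusion steps by more than 35% on two benchmarks and significantly increasing the inference speed.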
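
Maximizing mutual information with contrastive learning is commonly done through an InfoNCE-style objective, which the sketch below illustrates. This is a generic example under stated assumptions, not the CDCD loss itself: the abstract does not give the loss formula, and the function name `info_nce`, the similarity scores, and the temperature value are hypothetical. The loss is the negative log-probability of picking the matching (positive) input-output pair out of a set that also contains mismatched (negative) pairs.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE loss for one positive pair against a list of negatives.

    Scores are similarities (higher means a better match). The temperature
    is an assumed hyperparameter, not a value taken from the paper.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # -log softmax probability assigned to the positive pair.
    return log_sum - logits[0]


# When all candidates score the same, the loss equals log(number of candidates).
uniform_loss = info_nce(0.5, [0.5, 0.5, 0.5])
# When the positive pair clearly stands out, the loss approaches zero.
confident_loss = info_nce(1.0, [-1.0, -1.0])
```

Minimizing this quantity pushes the score of the matching pair above the mismatched ones, which is what tightens the correspondence between the conditioning input and the generated output.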
