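A surge of interest in graph convolutional networks (GCNs) has produced thousands of GCN variants, with hundreds introduced every year. In contrast, many GCN models reuse only a handful of benchmark datasets, because many graphs of interest, such as social or commercial networks, are proprietary. We propose a new graph generation problem: generating a diverse set of benchmark graphs for GCNs that follow the distribution of a source graph (possibly proprietary), subject to three requirements: 1) benchmark effectiveness as a substitute for the source graph in GCN research, 2) scalability to large real-world graphs, and 3) a privacy guarantee for end users. With a novel graph encoding scheme, we reframe the large-scale graph generation problem as a medium-length sequence generation problem and apply the strong generative power of the Transformer architecture to the graph domain. Extensive experiments across a large body of graph generative models show that our model can successfully generate benchmark graphs with the realistic graph structure, node attributes, and node labels required to benchmark GCNs on node classification tasks.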
translated by 谷歌翻译
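Data continuously emitted from industrial ecosystems such as social or commercial platforms are commonly represented as heterogeneous graphs (HGs) composed of multiple node and edge types. State-of-the-art graph learning methods for HGs, known as heterogeneous graph neural networks (HGNNs), are used to learn deep, context-informed representations. However, many HG datasets from industrial applications suffer from label imbalance between node types. Because there is no direct way to learn from labels rooted at different node types, HGNNs have been applied only to the few node types that have abundant labels. We propose a zero-shot transfer learning module for HGNNs called the Knowledge Transfer Network (KTN), which transfers knowledge from label-abundant source node types to zero-labeled node types through the rich relational information given in the HG. KTN is derived from a theoretical relationship, introduced in this work, between the distinct feature extractors for each node type in an HGNN model. KTN improves the performance of 6 different types of HGNN models by up to 960% for inference on zero-labeled node types, and outperforms state-of-the-art transfer learning baselines across 18 different transfer learning tasks on HGs.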
In this paper, we learn a diffusion model to generate 3D data on a scene-scale. Specifically, our model crafts a 3D scene consisting of multiple objects, while recent diffusion research has focused on a single object. To realize our goal, we represent a scene with discrete class labels, i.e., categorical distribution, to assign multiple objects into semantic categories. Thus, we extend discrete diffusion models to learn scene-scale categorical distributions. In addition, we validate that a latent diffusion model can reduce computation costs for training and deploying. To the best of our knowledge, our work is the first to apply discrete and latent diffusion for 3D categorical data on a scene-scale. We further propose to perform semantic scene completion (SSC) by learning a conditional distribution using our diffusion model, where the condition is a partial observation in a sparse point cloud. In experiments, we empirically show that our diffusion models not only generate reasonable scenes, but also perform the scene completion task better than a discriminative model. Our code and models are available at https://github.com/zoomin-lee/scene-scale-diffusion
The deep neural network (DNN) models for object detection using camera images are widely adopted in autonomous vehicles. However, DNN models are shown to be susceptible to adversarial image perturbations. In the existing methods of generating the adversarial image perturbations, optimizations take each incoming image frame as the decision variable to generate an image perturbation. Therefore, given a new image, the typically computationally-expensive optimization needs to start over as there is no learning between the independent optimizations. Very few approaches have been developed for attacking online image streams while considering the underlying physical dynamics of autonomous vehicles, their mission, and the environment. We propose a multi-level stochastic optimization framework that monitors an attacker's capability of generating the adversarial perturbations. Based on this capability level, a binary decision attack/not attack is introduced to enhance the effectiveness of the attacker. We evaluate our proposed multi-level image attack framework using simulations for vision-guided autonomous vehicles and actual tests with a small indoor drone in an office environment. The results show our method's capability to generate the image attack in real-time while monitoring when the attacker is proficient given state estimates.
Iris segmentation is the initial step in identifying animal biometrics to establish a livestock traceability system. In this study, we propose a novel deep learning framework for pixel-wise segmentation with minimal use of annotation labels, using the public BovineAAEyes80 dataset. In the experiment, U-Net with a VGG16 backbone was selected as the best combination of encoder and decoder model, demonstrating a 99.50% accuracy and a 98.35% Dice coefficient score. Remarkably, the selected model accurately segmented corrupted images even without proper annotation data. This study contributes to the advancement of iris segmentation and the development of a reliable DNN training framework.
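The two metrics quoted in this abstract, pixel accuracy and the Dice coefficient, can be illustrated with a minimal sketch. The function names and the toy masks below are hypothetical and are not taken from the paper; the code assumes binary masks given as nested lists and uses the standard definition Dice = 2|P ∩ T| / (|P| + |T|).

```python
from itertools import chain


def _flatten(mask):
    """Flatten a 2D binary mask (nested lists) into a list of booleans."""
    return [bool(v) for v in chain.from_iterable(mask)]


def dice_coefficient(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) over foreground pixels."""
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    return sum(a == b for a, b in zip(p, t)) / len(p)


# Toy example (hypothetical data): one false-positive pixel out of four.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(pixel_accuracy(pred, target))    # 3 of 4 pixels agree
print(dice_coefficient(pred, target))  # 2 * 1 / (2 + 1)
```

Because accuracy also counts correctly predicted background pixels while Dice considers only foreground overlap, the two scores can differ noticeably when the segmented region is small, which is one reason segmentation work often reports both.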
We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing. Using the probabilistic linear context-free rewriting system (LCFRS) formalism, our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood. To reduce the computational complexity of both parsing and parameter estimation, we restrict the grammar formalism to LCFRS-2 (i.e., binary LCFRS with fan-out two) and further discard rules that require O(n^6) time to parse, reducing inference to O(n^5). We find that using a large number of nonterminals is beneficial, and thus use tensor decomposition-based rank-space dynamic programming with an embedding-based parameterization of rule probabilities to scale up the number of nonterminals. Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.
This work proposes a framework for generalizing Critical Heat Flux (CHF) detection classification models using an Unsupervised Image-to-Image (UI2I) translation model. The framework enables a typical classification model that was trained and tested on boiling images from domain A to make predictions on boiling images from domain B, which the classification model has never seen. This is done by using the UI2I model to transform the domain B images so that they look like the domain A images the classification model is familiar with. Although a CNN was used as the classification model and Fixed-Point GAN (FP-GAN) was used as the UI2I model, the framework is model agnostic: it can generalize any type of image classification model, making it applicable to a variety of similar applications rather than limited to the boiling crisis detection problem. It also means that as UI2I models advance, the performance of the framework improves.
Human organs constantly undergo anatomical changes due to a complex mix of short-term (e.g., heartbeat) and long-term (e.g., aging) factors. Prior knowledge of these factors is therefore beneficial when modeling an organ's future state, i.e., via image generation. However, most medical image generation tasks rely only on the input from a single image, thus ignoring the sequential dependency even when longitudinal data is available. Sequence-aware deep generative models, where the model input is a sequence of ordered and timestamped images, are still underexplored in the medical imaging domain, which poses several unique challenges: 1) sequences of varying lengths; 2) missing data or frames; and 3) high dimensionality. To this end, we propose a sequence-aware diffusion model (SADM) for the generation of longitudinal medical images. Recently, diffusion models have shown promising results on high-fidelity image generation. Our method extends this new technique by introducing a sequence-aware transformer as the conditional module in a diffusion model. The novel design enables learning longitudinal dependency even with missing data during training and allows autoregressive generation of a sequence of images during inference. Our extensive experiments on 3D longitudinal medical images demonstrate the effectiveness of SADM compared with baselines and alternative methods.
Word Sense Disambiguation (WSD) is an NLP task aimed at determining the correct sense of a word in a sentence from discrete sense choices. Although current systems have attained unprecedented performances for such tasks, the nonuniform distribution of word senses during training generally results in systems performing poorly on rare senses. To this end, we consider data augmentation to increase the frequency of these least frequent senses (LFS) to reduce the distributional bias of senses during training. We propose Sense-Maintained Sentence Mixup (SMSMix), a novel word-level mixup method that maintains the sense of a target word. SMSMix smoothly blends two sentences using mask prediction while preserving the relevant span determined by saliency scores to maintain a specific word's sense. To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word. With extensive experiments, we validate that our augmentation method can effectively give more information about rare senses during training with maintained target sense label.
Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This stems from learning spurious correlations: answer sentences in the dataset usually include words from the input texts, so the VGD system relies excessively on copying words from the input texts in the hope that they overlap with the ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.