Recent years have witnessed the tremendous progress of 3D GANs for generating view-consistent radiance fields with photo-realism. Yet, high-quality generation of human radiance fields remains challenging, partially due to the limited human-related priors adopted in existing methods. We present HumanGen, a novel 3D human generation scheme with detailed geometry and $\text{360}^{\circ}$ realistic free-view rendering. It explicitly marries the 3D human generation with various priors from the 2D generator and 3D reconstructor of humans through the design of "anchor image". We introduce a hybrid feature representation using the anchor image to bridge the latent space of HumanGen with the existing 2D generator. We then adopt a pronged design to disentangle the generation of geometry and appearance. With the aid of the anchor image, we adapt a 3D reconstructor for fine-grained details synthesis and propose a two-stage blending scheme to boost appearance generation. Extensive experiments demonstrate our effectiveness for state-of-the-art 3D human generation regarding geometry details, texture quality, and free-view performance. Notably, HumanGen can also incorporate various off-the-shelf 2D latent editing methods, seamlessly lifting them into 3D.
translated by 谷歌翻译
在“知识图”(kgs)的表示领域中,超级关系的事实由主要三重和几个辅助属性描述组成,这被认为比基于三重的事实更全面,更具体。但是,由于代表实体之间的隶属关系的层次结构削弱,因此,单个视图中现有的超相关KG嵌入方法受到限制。为了打破这一限制,我们提出了一个双视性超相关kg(DH-kg)结构,该结构包含实体的超相关实例视图,以及对从实体到共同模型超相关的概念的超相关本体论视图和分层信息。在本文中,我们首先定义了DH-KG上的链接预测和实体键入任务,并根据医疗数据构建了两个DH-KG数据集,即从Wikidata和HTDM中提取的JW44K-6K。此外,我们根据Gran编码器,HGNN和联合学习提出了DH-KG嵌入模型DHGE。实验结果表明,DHGE在DH-KG上的表现优于基线模型。我们还提供了该技术在高血压药物领域中应用的示例。我们的模型和数据集公开可用。
translated by 谷歌翻译
当前的最佳性能模型用于知识图推理(KGR)将几何学对象或概率分布引入嵌入实体,并将一阶逻辑(fol)查询引入低维矢量空间。它们可以总结为中心尺寸框架(点/框/锥,β/高斯分布等)。但是,它们具有有限的逻辑推理能力。而且很难概括到各种功能,因为中心和大小是一对一的约束,无法具有多个中心或尺寸。为了应对这些挑战,我们相反提出了一个名为“特征逻辑嵌入框架Flex”的新颖的KGR框架,这是第一个KGR框架,它不仅可以真正处理所有运营,包括连词,析取,否定,否定等等,而且还支持各种操作特征空间。具体而言,特征逻辑框架的逻辑部分是基于向量逻辑的,它自然地对所有FOL操作进行了建模。实验表明,FLEX在基准数据集上明显优于现有的最新方法。
translated by 谷歌翻译
已经证明,深度神经网络(DNN)在解决许多现实问题方面是有效的,但其高计算成本禁止将这些模型部署到边缘设备。修剪,作为将零的方法引入模型重量的方法,已显示是在模型精度和计算效率之间提供良好权衡的有效方法,并且是一种生成压缩模型的广泛使用的方法。然而,修剪的粒度使得重要的权衡。在相同的稀疏性水平上,粗粒结构的稀疏图案在传统硬件上更有效,但导致更差的精度,而细粒度的非结构化稀疏模式可以实现更好的精度,但在现有硬件上效率低下。另一方面,一些现代处理器配备了快速的片上刻痕存储器和聚集/散射引擎,用于在这种存储器上执行间接负载和存储操作。在这项工作中,我们提出了一系列新颖的稀疏模式,命名为聚光散射(GS)模式,以利用Scratchpad存储器和收集/散射引擎来加速神经网络推论。相应地,我们呈现了一种紧凑的稀疏格式。提出的稀疏模式,以及一种新颖的修剪方法,解决了负载不平衡问题,并导致质量接近非结构化稀疏模型的型号,以及靠近结构化稀疏型号的计算效率。我们的实验表明,与传统结构稀疏模式相比,GS模式在精度和计算效率之间始终如一地进行折衷。 GS模式可以以相同的精度级别将DNN组件的运行时间减少两到三次。这是在三个不同的深度学习任务和流行模型中确认,即机器翻译的GNMT,用于图像识别的Reset50,以及用于声学语音识别的Japser。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem aiming to minimize each server's transmission latency while reaching the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) is proposed, which enables servers to coordinate for training and execute RB allocation in a distributed manner to approach to a globally optimal performance with less training iterations. Compared to traditional multi-agent RL, the proposed RL improves the valuable action exploration of servers and the probability of finding a globally optimal RB allocation policy based on local observation. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Domain adaptive detection aims to improve the generalization of detectors on target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify where, either source or target domains, different granularities of samples come from. Note that, MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
translated by 谷歌翻译
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, they are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and Brendel\&Bethge Attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under the pixel-level constraints, namely ``mask-constraints''. We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and user study.
translated by 谷歌翻译