Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations, which rely on cumbersome neural networks. It prevents the diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion model (DM) via finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Due to the difficulty of retraining DMs, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with time-step, making previous PTQ methods fail in DMs since they are designed for single-time step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DM in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize and use several observations derived from all-inclusive investigations to formulate our method, which especially targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module on other fast-sampling methods, e.g., DDIM.
translated by 谷歌翻译
在许多现实世界中的机器学习应用中,亚种群的转移存在着极大地存在,指的是包含相同亚种群组的培训和测试分布,但在亚种群频率中有所不同。重要性重新加权是通过对训练数据集中每个样本施加恒定或自适应抽样权重来处理亚种群转移问题的正常方法。但是,最近的一些研究已经认识到,这些方法中的大多数无法改善性能,而不是经验风险最小化,尤其是当应用于过度参数化的神经网络时。在这项工作中,我们提出了一个简单而实用的框架,称为“不确定性感知混合”(UMIX),以根据样品不确定性重新加权“混合”样品来减轻过度参数化模型中的过度拟合问题。基于训练 - 注射器的不确定性估计为每个样品的拟议UMIX配备,以灵活地表征亚群分布。我们还提供有见地的理论分析,以验证UMIX是否在先前的工作中实现了更好的概括界限。此外,我们在广泛的任务上进行了广泛的经验研究,以验证我们方法的有效性,既有定性和定量。
translated by 谷歌翻译
在过去的十年中,AI AID毒品发现(AIDD)的计算方法和数据集策划的繁荣发展。但是,现实世界中的药物数据集经常表现出高度不平衡的分布,这在很大程度上被当前的文献忽略了,但可能会严重损害机器学习应用程序的公平性和概括。在这一观察结果的激励下,我们介绍了Imdrug,这是一个全面的基准标准,其开源python库由4个不平衡设置,11个AI-Ready数据集,54个学习任务和16种为不平衡学习量身定制的基线算法。它为涵盖广泛的药物发现管道(例如分子建模,药物靶标相互作用和逆合合成)的问题和解决方案提供了可访问且可定制的测试床。我们通过新的评估指标进行广泛的实证研究,以证明现有算法在数据不平衡情况下无法解决药物和药物挑战。我们认为,Imdrug为未来的研究和发展开辟了途径,在AIDD和深度不平衡学习的交集中对现实世界中的挑战开辟了道路。
translated by 谷歌翻译
最近,使用随机梯度Langevin Dynamics(SGLD)的非凸实验性风险最小化范例的泛化界限已经过度研究。已经提出了几种理论框架来研究来自不同观点的这个问题,例如信息理论和稳定性。在本文中,我们从隐私泄漏分析中提出了一个统一的视图,以调查SGLD的泛化范围,以及以简洁的方式重新获得以前结果的理论框架。除了理论上的发现之外,我们进行各种数值研究,以统一地评估SGLD的信息泄漏问题。此外,我们的理论和经验结果提供了研究SGLD成员隐私的事先作品的解释。
translated by 谷歌翻译
随着AI民主化的进展,机器学习(ML)已成功应用于边缘应用,如智能手机和自动驾驶。如今,更多的应用需要在具有极其有限的资源的微小设备上ML,如植入式心脏除颤器(ICD),其称为Tinym1。与边缘上的ML不同,有限的能量供应的Tinyml对低功率执行的需求较高。随机计算(SC)对数据表示的比特流是有价值的,因为它可以使用简单的逻辑门来执行基本的ML操作,而不是复杂的二进制加法器和乘法器。然而,由于算术单元的低数据精度和不准确性,SC通常遭受ML任务的低精度。增加现有作品中的比特流的长度可以减轻精度问题,但延迟较高。在这项工作中,我们提出了一种新的SC架构,即基于块的随机计算(BSC)。 BSC将输入划分为块,使得通过利用高数据并行性可以减少延迟。此外,提出了优化的算术单元和输出修订(我们)方案以提高精度。在它之上,设计了全局优化方法来确定块的数量,可以提高延迟功率折衷。实验结果表明,BSC可以优于现有的设计,以实现ML任务的高度超过10%,并且减少超过6倍。
translated by 谷歌翻译
Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed the weaknesses from different perspectives. From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The behind basic idea is the aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules, node embedding, structural neighborhood, and bi-level aggregation. We also present an implementation of the scheme in graph convolutional networks, termed Geom-GCN, to perform transductive learning on graphs. Experimental results show the proposed Geom-GCN achieved state-of-the-art performance on a wide range of open datasets of graphs.
translated by 谷歌翻译
Autonomous cars are indispensable when humans go further down the hands-free route. Although existing literature highlights that the acceptance of the autonomous car will increase if it drives in a human-like manner, sparse research offers the naturalistic experience from a passenger's seat perspective to examine the human likeness of current autonomous cars. The present study tested whether the AI driver could create a human-like ride experience for passengers based on 69 participants' feedback in a real-road scenario. We designed a ride experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as a passenger and judged whether the driver was human or AI. The AI driver failed to pass our test because passengers detected the AI driver above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness rating behaviour. We employed affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength of our model. Results showed that the passengers' ascription of humanness would increase with the greater affective transition. Our study suggested an important role of affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving.
translated by 谷歌翻译
Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works often use hand-crafted features and predefined rule-based functions to detect headings and resolve the hierarchical relationship between headings. Both the benchmark and research based on deep learning are still limited. Accordingly, in this paper, we first introduce a standard dataset, HierDoc, including image samples from 650 documents of scientific papers with their content labels. Then we propose a novel end-to-end model by using the multimodal tree decoder (MTD) for ToC as a benchmark for HierDoc. The MTD model is mainly composed of three parts, namely encoder, classifier, and decoder. The encoder fuses the multimodality features of vision, text, and layout information for each entity of the document. Then the classifier recognizes and selects the heading entities. Next, to parse the hierarchical relationship between the heading entities, a tree-structured decoder is designed. To evaluate the performance, both the metric of tree-edit-distance similarity (TEDS) and F1-Measure are adopted. Finally, our MTD approach achieves an average TEDS of 87.2% and an average F1-Measure of 88.1% on the test set of HierDoc. The code and dataset will be released at: https://github.com/Pengfei-Hu/MTD.
translated by 谷歌翻译
Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct the body geometry and infer the full-body clothing from a single image, we leverage two priors in ELICIT: 3D geometry prior and visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with the CLIP-based pre-trained models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. In order to further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT has outperformed current state-of-the-art avatar creation methods when only a single image is available. Code will be public for reseach purpose at https://elicit3d.github.io .
translated by 谷歌翻译
In the current person Re-identification (ReID) methods, most domain generalization works focus on dealing with style differences between domains while largely ignoring unpredictable camera view change, which we identify as another major factor leading to a poor generalization of ReID methods. To tackle the viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map the pedestrian images to canonical view images. Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images, and thus directly using them for ReID will inevitably result in poor performance. To handle this issue, we propose to fuse the original image and canonical view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer could be an ideal solution to align the discriminative texture clues from the original image with the canonical view image, which could compensate for the low-quality texture information of the canonical view image. Through extensive experiments, we show that our method can lead to superior performance over the existing approaches in various evaluation settings.
translated by 谷歌翻译