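Variational inference has recently emerged as a popular alternative to classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea of variational inference is to trade statistical accuracy for computational efficiency: it approximates the posterior to reduce computational cost, but may compromise statistical accuracy in doing so. In this work, we study this statistical and computational trade-off through a case study in inferential model selection. Focusing on Gaussian inferential models (also known as variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-off in two respects: Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computational budget, a lower-rank inferential model produces variational posteriors with higher statistical approximation error but lower computational error, because it reduces the variance of stochastic optimization and in turn accelerates convergence. From the frequentist uncertainty quantification perspective, we treat the precision matrix of the variational posterior as an uncertainty estimate. We find that, relative to the true asymptotic precision, the variational approximation suffers an additional statistical error that originates from the sampling uncertainty of the data. Moreover, this statistical error becomes the dominant factor as the computational budget increases. As a result, for small datasets, the inferential model need not be full-rank to achieve the optimal estimation error. Finally, we demonstrate these statistical and computational trade-offs in empirical studies, which corroborate the theoretical findings.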
translated by 谷歌翻译
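As a small illustration of the statistical approximation error mentioned in the abstract above, the following sketch compares exact marginal variances with those of a diagonal-precision Gaussian approximation. It assumes a toy zero-mean two-dimensional Gaussian target and uses the standard result that, for a Gaussian target, the KL(q||p)-optimal factorized Gaussian keeps the diagonal of the true precision matrix. The function names and numbers are hypothetical, the low-rank part of the family is omitted, and nothing here is taken from the paper's experiments.

```python
# Toy target: a zero-mean 2-D Gaussian with precision matrix [[a, b], [b, c]].
# All names and numbers are illustrative assumptions, not the paper's setup.

def exact_marginal_variances(a, b, c):
    """Diagonal of the covariance, i.e. of the inverse 2x2 precision matrix."""
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("precision matrix must be positive definite")
    return c / det, a / det


def mean_field_variances(a, b, c):
    """Variances of the KL(q||p)-optimal factorized Gaussian q.

    For a Gaussian target, the optimal diagonal-precision q keeps the
    diagonal of the true precision, so its variances are 1/a and 1/c.
    """
    return 1.0 / a, 1.0 / c


a, b, c = 2.0, 1.2, 1.0
exact = exact_marginal_variances(a, b, c)
approx = mean_field_variances(a, b, c)
print("exact marginal variances:", exact)
print("diagonal-precision variances:", approx)
```

In this toy case the diagonal family understates both marginal variances whenever the off-diagonal term `b` is nonzero, which is one concrete form the approximation error can take.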
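Computer simulation is used extensively across science and engineering. These simulations are often run at multiple levels of sophistication to balance accuracy and efficiency. Multi-fidelity surrogate modeling reduces computational cost by fusing different simulation outputs: cheap data generated by low-fidelity simulators can be combined with the limited high-quality data generated by an expensive high-fidelity simulator. Existing methods based on Gaussian processes rely on strong assumptions about the kernel function and can hardly scale to high-dimensional settings. We propose the Multi-Fidelity Hierarchical Neural Process (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling. MF-HNP inherits the flexibility and scalability of neural processes. The latent variables transfer the correlations among different fidelity levels from the observation space to the latent space. Given the latent states, predictions across fidelities are conditionally independent, which helps alleviate the error propagation issue in existing methods. MF-HNP is flexible enough to handle non-nested, high-dimensional data at different fidelity levels with varying input and output dimensions. We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation. In contrast to deep Gaussian processes, which have only been applied to low-dimensional (<10) tasks, our method shows great promise for speeding up high-dimensional complex simulations (over 7000 dimensions for epidemiology modeling and 45000 for climate modeling).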
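We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean prediction with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data, and study how the injected noise, the stochastic-gradient noise, the heterogeneity of the data, and varying learning rates affect convergence. This analysis sheds light on the optimal choice of local updates to minimize communication cost. Importantly for our approach, the communication efficiency does not deteriorate with the injected noise in the Langevin algorithm. In addition, we examine both independent and correlated noise used across different clients in our FA-LD algorithm, and observe that there is also a trade-off between federation and communication cost in that setting. Because local devices may become inactive in a federated network, we also show convergence results for different averaging schemes in which only partial device updates are available.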
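The pattern described in the abstract above, local Langevin updates followed by server-side averaging, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the FA-LD algorithm as specified in the paper: each client holds a one-dimensional quadratic potential, the target is proportional to the exponential of minus the average potential, clients use independent noise, and the per-client noise scale `sqrt(2 * lr * n)` is chosen here so that the averaged iterate receives noise of variance `2 * lr` per step. The function name `fa_ld_toy` and all parameter values are hypothetical.

```python
import math
import random


def fa_ld_toy(client_means, rounds=4000, local_steps=5, lr=0.05, seed=0):
    """Toy federated averaging Langevin sampler in one dimension.

    Client c holds the potential f_c(x) = (x - m_c)^2 / 2, and the target
    density is proportional to exp(-mean_c f_c(x)), i.e. N(mean(m_c), 1).
    Returns the averaged iterate recorded after every communication round.
    """
    rng = random.Random(seed)
    n = len(client_means)
    # Assumption of this sketch: independent per-client noise is scaled by
    # sqrt(n) so the averaged iterate gets noise of variance 2 * lr per step.
    noise_scale = math.sqrt(2.0 * lr * n)
    x_server = 0.0
    samples = []
    for _ in range(rounds):
        # Every client starts the round from the current server iterate.
        local = [x_server] * n
        for _ in range(local_steps):
            for c, m in enumerate(client_means):
                grad = local[c] - m  # gradient of f_c at the local iterate
                local[c] += -lr * grad + noise_scale * rng.gauss(0.0, 1.0)
        # Communication step: the server averages the local iterates.
        x_server = sum(local) / n
        samples.append(x_server)
    return samples


samples = fa_ld_toy([-1.0, 0.5, 2.0, 2.5])
kept = samples[len(samples) // 10:]  # discard the first 10% as burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((s - sample_mean) ** 2 for s in kept) / len(kept)
print("sample mean:", sample_mean, "sample variance:", sample_var)
```

Raising `local_steps` reduces the number of communication rounds needed per gradient step. In this toy case all clients share the same curvature, so the averaged chain is unaffected by the number of local steps; with heterogeneous curvature that would no longer hold.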
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solver, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab in diverse applications of partial differential equations.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
The Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement produced by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens, so that it simultaneously achieves difficulty measurement and maintains the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning based method can outperform, or is at least comparable to, SOTA methods that need additional learning steps.
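The general idea in the abstract above, calibrating support features with base-class prototypes before nearest-neighbor classification, can be sketched as follows. This is a simplified stand-in rather than the discrete calibration operation of P3DC-Shot, whose details the abstract does not give: it assumes a soft, similarity-weighted shift toward the base prototypes, and the helper names, the mixing weight `alpha`, the `temperature`, and the two-dimensional features are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def calibrate(x, base_prototypes, alpha=0.3, temperature=0.1):
    """Pull x toward a similarity-weighted mix of base prototypes.

    Assumed form for this sketch: softmax weights over cosine similarity,
    then a convex combination of x and the weighted prototype mean.
    """
    sims = [cosine(x, p) / temperature for p in base_prototypes]
    top = max(sims)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    prior = [
        sum(w * p[d] for w, p in zip(weights, base_prototypes)) / total
        for d in range(len(x))
    ]
    return [(1 - alpha) * xi + alpha * pi for xi, pi in zip(x, prior)]


def nearest_neighbor(query, support):
    """support is a list of (feature, label); return the closest label."""
    return min(support, key=lambda item: math.dist(query, item[0]))[1]


base_prototypes = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical base-class priors
raw_support = [([0.9, 0.2], "a"), ([0.1, 0.9], "b")]
calibrated_support = [
    (calibrate(feat, base_prototypes), label) for feat, label in raw_support
]
prediction = nearest_neighbor([0.8, 0.3], calibrated_support)
print("predicted label:", prediction)
```

No training step is involved: calibration and classification are both closed-form operations on precomputed features, which matches the non-learning setting the abstract describes.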
In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.