In the current person Re-identification (ReID) methods, most domain generalization works focus on dealing with style differences between domains while largely ignoring unpredictable camera view change, which we identify as another major factor leading to a poor generalization of ReID methods. To tackle the viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map the pedestrian images to canonical view images. Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images, and thus directly using them for ReID will inevitably result in poor performance. To handle this issue, we propose to fuse the original image and canonical view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer could be an ideal solution to align the discriminative texture clues from the original image with the canonical view image, which could compensate for the low-quality texture information of the canonical view image. Through extensive experiments, we show that our method can lead to superior performance over the existing approaches in various evaluation settings.
translated by 谷歌翻译
Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.
translated by 谷歌翻译
The activity of the grid cell population in the medial entorhinal cortex (MEC) of the mammalian brain forms a vector representation of the self-position of the animal. Recurrent neural networks have been proposed to explain the properties of the grid cells by updating the neural activity vector based on the velocity input of the animal. In doing so, the grid cell system effectively performs path integration. In this paper, we investigate the algebraic, geometric, and topological properties of grid cells using recurrent network models. Algebraically, we study the Lie group and Lie algebra of the recurrent transformation as a representation of self-motion. Geometrically, we study the conformal isometry of the Lie group representation where the local displacement of the activity vector in the neural space is proportional to the local displacement of the agent in the 2D physical space. Topologically, the compact abelian Lie group representation automatically leads to the torus topology commonly assumed and observed in neuroscience. We then focus on a simple non-linear recurrent model that underlies the continuous attractor neural networks of grid cells. Our numerical experiments show that conformal isometry leads to hexagon periodic patterns in the grid cell responses and our model is capable of accurate path integration. Code is available at \url{https://github.com/DehongXu/grid-cell-rnn}.
translated by 谷歌翻译
潜在空间基于能量的模型(EBM),也称为基于能量的先验,引起了对生成建模的日益兴趣。由于其在潜在空间的配方和强大的建模能力方面的灵活性所推动,最近构建的作品已经进行了有趣的尝试,目的是针对文本建模的解释性。但是,潜在空间EBM还继承了数据空间中EBM的一些缺陷。实践中退化的MCMC抽样质量会导致培训中的发电质量和不稳定差,尤其是在具有复杂潜在结构的数据上。受到最近的努力的启发,该努力利用扩散恢复的可能性学习是解决抽样问题的一种方法,我们在变异学习框架中引入了扩散模型和潜在空间EBM之间的新型共生,这是潜在扩散能量基于能量的模型。我们与信息瓶颈共同开发基于几何聚类的正则化,以进一步提高学到的潜在空间的质量。对几个具有挑战性的任务进行的实验证明了我们模型在可解释的文本建模上的优越性能而不是强大的同行。
translated by 谷歌翻译
我们提出了一种变体的乘法器(ADMM)的交替方向方法,以解决受约束的轨迹优化问题。我们的ADMM框架将联合优化分为小子问题,导致低迭代成本和分散的参数更新。我们的方法继承了原始内部点法(P-IPM)的理论特性(P-IPM),即保证碰撞避免和同型保存,同时是速度更快的秩序。我们已经分析了收敛性,并评估了我们的时间最佳多UAV轨迹优化的方法,以及多个机器人臂的同时达到多个机器人臂的方法,在那里我们考虑运动学 - ,动态限制和同级保留碰撞约束。我们的方法突出显示加速10-100倍,同时产生可比品质的轨迹作为最先进的P-IPM求解器。
translated by 谷歌翻译
为了将训练有素的模型直接概括为看不见的目标域,域概括(DG)是一种新提出的学习范式,引起了很大的关注。以前的DG模型通常需要在训练过程中观察到的源域中的足够数量的带注释的样品。在本文中,我们放宽了有关完全注释的要求,并研究了半监督域的概括(SSDG),在训练过程中,只有一个源域与其他完全未标记的域一起完全注释。由于要解决观察到的源域之间的域间隙和预测看不见的目标域之间的挑战,我们提出了一个通过关节域吸引的标签和双分类器的新型深框架,以产生高质量的伪标记。具体来说,为了预测域移位下的准确伪标记,开发了一个域吸引的伪标记模块。此外,考虑到概括和伪标记之间的目标不一致:前者防止在所有源域上过度拟合,而后者可能过分适合未标记的源域,以高精度,我们采用双分类器来独立执行伪标记和域名,并在训练过程中执行伪造域通用化。 。当为未标记的源域生成准确的伪标记时,将域混合操作应用于标记和未标记域之间的新域,这对于提高模型的通用能力是有益的。公开可用的DG基准数据集的广泛结果显示了我们提出的SSDG方法的功效。
translated by 谷歌翻译
了解网格单元如何执行路径集成计算仍然是一个根本的问题。在本文中,我们对网格单元进行了对路径集成的一般表示模型的理论分析,其中2D自身位被编码为更高的尺寸向量,并且通过向量的一般转换表示2D自动。我们确定转型的两个条件。一个是路径集成所必需的组表示条件。另一个是一种各向同性的缩放条件,可确保局部共形地嵌入,使得向量表示中的误差符合在2D自身位置中的误差。然后,我们调查最简单的转换,即线性变换,将其显式代数和几何结构揭示为矩阵旋转,并探索各向同性缩放条件与特殊类六角网格图案之间的连接。最后,通过基于优化的方法,我们可以学习六边形网格模式,该网格图案在啮齿动物大脑中共享网格细胞的相似性质。学习模型能够准确地长距离路径集成。代码可在https://github.com/ruiqigao/grid-cell-path中获得。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译