Strong lensing in galaxy clusters probes the mass properties of the dense cores of dark matter halos, enables study of the distant universe at flux levels and spatial resolutions otherwise unavailable, and independently constrains cosmological models. Next-generation large-scale sky imaging surveys are expected to discover thousands of cluster-scale strong lenses, opening unprecedented opportunities for applying cluster-scale strong lensing to astrophysical and cosmological problems. However, the sheer volume of data challenges astronomers to identify and extract strong lensing signals, particularly strongly lensed arcs, because of their complexity and variety. We therefore propose a framework for detecting cluster-scale strongly lensed arcs that consists of a transformer-based detection algorithm and an image simulation algorithm. We embed prior information about cluster-scale strongly lensed arcs into the training data through simulation and train the detection algorithm on the simulated images. We then use the trained transformer to detect strongly lensed arcs in simulated and real data. Results show that our approach achieves a 99.63% accuracy rate, 90.32% recall rate, 85.37% precision rate, and 0.23% false positive rate in detecting strongly lensed arcs in simulated images, and detects almost all strongly lensed arcs in real observation images. In addition, using an interpretation method, we show that our method identifies important information embedded in the simulated data. As a next step, to test the reliability and usability of our approach, we will apply it to available observations (e.g., the DESI Legacy Imaging Surveys) and to simulated data from upcoming large-scale sky surveys, such as Euclid and the CSST.
In current person re-identification (ReID) methods, most domain generalization works focus on handling style differences between domains while largely ignoring unpredictable camera view changes, which we identify as another major factor leading to poor generalization of ReID methods. To tackle viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images. Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail cues of the original images, so using them directly for ReID would inevitably result in poor performance. To handle this issue, we propose to fuse the original image and the canonical view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer is an ideal way to align the discriminative texture cues of the original image with the canonical view image, compensating for the latter's low-quality texture information. Through extensive experiments, we show that our method leads to superior performance over existing approaches in various evaluation settings.
Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time, since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model into a diffusion model that requires far fewer sampling steps. For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to those of the original model while being up to 256 times faster to sample from. For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.
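The combined conditional/unconditional output that the distilled student learns to match follows the standard classifier-free guidance rule. A minimal numpy sketch of that combination and a plausible matching objective (function names and the squared-error loss are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def guided_prediction(eps_cond, eps_uncond, w):
    """Classifier-free guidance: combine the conditional and unconditional
    noise predictions with guidance weight w (w = 0 recovers the
    conditional model alone)."""
    return (1 + w) * eps_cond - w * eps_uncond

def distillation_loss(student_pred, eps_cond, eps_uncond, w):
    """Stage-one distillation target: train a single student to match the
    guided combination, so only one network evaluation is needed per step."""
    target = guided_prediction(eps_cond, eps_uncond, w)
    return float(np.mean((student_pred - target) ** 2))
```

Once the student matches this target, each sampling step costs one forward pass instead of two, before any step-count reduction from progressive distillation.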
The activity of the grid cell population in the medial entorhinal cortex (MEC) of the mammalian brain forms a vector representation of the self-position of the animal. Recurrent neural networks have been proposed to explain the properties of the grid cells by updating the neural activity vector based on the velocity input of the animal. In doing so, the grid cell system effectively performs path integration. In this paper, we investigate the algebraic, geometric, and topological properties of grid cells using recurrent network models. Algebraically, we study the Lie group and Lie algebra of the recurrent transformation as a representation of self-motion. Geometrically, we study the conformal isometry of the Lie group representation, where the local displacement of the activity vector in the neural space is proportional to the local displacement of the agent in the 2D physical space. Topologically, the compact abelian Lie group representation automatically leads to the torus topology commonly assumed and observed in neuroscience. We then focus on a simple non-linear recurrent model that underlies the continuous attractor neural networks of grid cells. Our numerical experiments show that conformal isometry leads to hexagonal periodic patterns in the grid cell responses, and our model is capable of accurate path integration. Code is available at \url{https://github.com/DehongXu/grid-cell-rnn}.
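The conformal isometry property described above can be summarized in one formula; the notation here ($v$ for the neural activity vector, $x$ for the agent's 2D position, $s$ for a scale factor) is our sketch of the condition, not necessarily the paper's exact statement:

```latex
\left\| v(x + \Delta x) - v(x) \right\| \approx s \, \| \Delta x \|
\quad \text{as } \| \Delta x \| \to 0 ,
```

i.e., small movements of the agent in physical space produce proportionally small displacements of the activity vector in neural space, with the same constant $s$ in every direction.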
Stereo superpixel segmentation aims to group discrete pixels into perceptual regions using the left and right views in a more collaborative and efficient way. Existing superpixel segmentation algorithms mostly use color and spatial features as input, which may impose strong constraints on spatial information while exploiting the disparity information of stereo image pairs. To alleviate this issue, we propose a stereo superpixel segmentation method with a decoupling mechanism for spatial information in this work. To decouple stereo disparity information and spatial information, the spatial information is temporarily removed before fusing the features of the stereo image pair, and a decoupled stereo fusion module (DSFM) is proposed to handle the stereo feature alignment and occlusion problems. Moreover, since spatial information is vital to superpixel segmentation, we further design a dynamic spatial embedding module (DSEM) to re-add the spatial information, and the weights of the spatial information are adaptively adjusted through dynamic fusion (DF) in the DSEM to achieve finer segmentation. Comprehensive experimental results demonstrate that our method achieves state-of-the-art performance on the KITTI2015 and CityScapes datasets, and its effectiveness is also verified on salient object detection on the NJU2K dataset. The source code will be made publicly available upon acceptance of the paper.
In this paper, we propose a ranking-based underwater image quality assessment (UIQA) method, abbreviated as URanker. URanker is built on an efficient attention-based image Transformer. Specifically for underwater images, we devise (1) a histogram prior that embeds the color distribution of an underwater image as histogram tokens to attend to global degradation, and (2) dynamic cross-scale correspondences to model local degradation. The final prediction depends on the class tokens from different scales, which comprehensively considers multi-scale dependencies. With a margin ranking loss, our URanker can accurately rank underwater images of the same scene enhanced by different underwater image enhancement (UIE) algorithms according to their visual quality. To this end, we also contribute a dataset, URankerSet, containing sufficient results enhanced by different UIE algorithms and corresponding perceptual rankings, to train our URanker. Beyond the good performance of URanker, we find that a simple U-shape UIE network can obtain promising performance when combined with our pre-trained URanker. In addition, we also propose a normalization tail that can significantly improve the performance of UIE networks. Extensive experiments demonstrate the state-of-the-art performance of our method, and its key designs are discussed. We will release our dataset and code.
This paper considers a setting in which embedded devices are used to acquire and classify images. Because of limited computing power, an embedded device relies on a parsimonious classification model with uneven accuracy. When a local classification is deemed inaccurate, the device can decide to offload the image to an edge server that hosts a more accurate but resource-intensive model. Resource constraints (e.g., network bandwidth), however, require such transmissions to be regulated to avoid congestion and high latency. The paper investigates this offloading problem when transmission regulation is realized through a token bucket, a mechanism commonly used for this purpose. The goal is to devise a lightweight, online offloading policy that optimizes an application-specific metric (e.g., classification accuracy) under the constraints of the token bucket. The paper develops a policy based on a Deep Q-Network (DQN) and demonstrates both its efficacy and the feasibility of deploying it on embedded devices. Notably, the policy can handle complex input patterns, including correlations in image arrivals and classification accuracy. The evaluation is carried out by performing image classification on a local testbed using synthetic traces generated from the ImageNet image classification benchmark. The implementation of this work is available at https://github.com/qiujiaming315/edgeml-dqn.
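The token-bucket mechanism that regulates offloading can be sketched in a few lines; this is a generic illustration of the mechanism (class and parameter names are our own), not the paper's code:

```python
class TokenBucket:
    """Generic token bucket: tokens accumulate at a fixed rate up to a
    capacity, and each offloaded image spends tokens. An offload is
    permitted only when enough tokens remain, which bounds the long-run
    transmission rate while allowing short bursts."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per time step
        self.tokens = capacity          # start full

    def tick(self):
        # Replenish once per time step, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_rate)

    def try_offload(self, cost=1):
        # Offload the image only if the budget allows; otherwise the
        # device must keep its local (less accurate) classification.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The DQN policy's job is then to decide, per image, whether spending a token now is worth more expected accuracy than saving it for a later, harder image.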
We present Neural Deformable Fields (NDF), a new representation for dynamic human digitization from multi-view videos. Recent works propose to represent a dynamic human body with a shared canonical neural radiance field, which is linked to the observation space by estimated deformation fields. However, the learned canonical representation is static, and the current design of the deformation fields cannot represent large motions or detailed geometry changes. In this paper, we propose to learn a neural deformable field wrapped around a fitted parametric body model to represent the dynamic human body. The NDF is spatially aligned by the underlying reference surface, and a neural network is then learned to map pose to the dynamics of the NDF. The proposed NDF representation can synthesize the digitized performer under novel views and novel poses with detailed and plausible dynamic appearance. Experiments show that our method significantly outperforms recent human synthesis methods.
Causal background knowledge about the existence or absence of causal edges and paths is frequently encountered in observational studies. The directed edges and links shared by the subclass of Markov equivalent DAGs consistent with such background knowledge can be represented by a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and give a minimal representation of a causal MPDAG. We then introduce a novel representation called the direct causal clause (DCC) to represent all types of causal background knowledge in a unified form. Using DCCs, we study the consistency and equivalency of causal background knowledge and show that any set of causal background knowledge can be equivalently decomposed into a causal MPDAG plus a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking consistency and equivalency and for finding the decomposed MPDAG and residual DCCs. Finally, with causal background knowledge, we prove a sufficient and necessary condition for identifying causal effects and, surprisingly, find that the identifiability of a causal effect depends only on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that causal background knowledge can significantly improve the identifiability of causal effects.
This paper presents a novel nearest neighbor search algorithm that achieves peak performance on TPUs (Google Tensor Processing Units), outperforming state-of-the-art GPU algorithms at similar levels of recall. The design of the proposed algorithm is motivated by an accurate accelerator performance model that accounts for both memory and instruction bottlenecks. Our algorithm comes with an analytical guarantee of expected recall and does not require maintaining sophisticated index data structures or tuning, so it is suitable for applications with frequent updates. Our work is available as open-source packages for JAX and TensorFlow on TPU.
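Index-free neighbor search of this kind is typically realized on accelerators as a dense matrix multiply followed by top-k selection, rather than traversal of a tree or graph index. A hedged numpy analogue of that brute-force maximum-inner-product pattern (function name and shapes are our own illustration, not the released package's API):

```python
import numpy as np

def topk_neighbors(queries, database, k):
    """Exhaustive maximum-inner-product search: score every query against
    every database vector with one matrix multiply, then select the top-k.
    queries: (num_queries, dim), database: (num_db, dim)."""
    scores = queries @ database.T                      # (num_queries, num_db)
    # Partial selection of the k best candidates per query...
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    # ...then sort those k candidates by descending score.
    order = np.argsort(-np.take_along_axis(scores, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)      # (num_queries, k)
```

Because the cost is one dense matmul plus a cheap selection, inserting or deleting database vectors requires no index rebuild, which is what makes the approach suitable for frequently updated workloads.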