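Visible-infrared person re-identification (VI-ReID) is the task of matching the same individuals across the visible and infrared modalities. Its main challenge lies in the modality gap caused by cameras operating on different spectra. Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability. To address this issue, we present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans. Specifically, CycleTrans uses a lightweight Knowledge Capturing Module (KCM) to capture rich semantics from the modality-relevant feature maps according to pseudo queries. Afterwards, a Discrepancy Modeling Module (DMM) is deployed to transform these features into neutral ones according to modality-irrelevant prototypes. To ensure feature discriminability, another two KCMs are further deployed for feature cycle construction. With cycle construction, our method can learn effective neutral features while preserving their salient semantics. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the merits of CycleTrans against state-of-the-art methods, with gains of +4.57% in rank-1 on SYSU-MM01 and +2.2% in rank-1 on RegDB.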
Translated by Google Translate
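Pixel synthesis is a promising research paradigm for image generation, which can exploit pixel-wise prior knowledge well. However, existing methods still suffer from excessive memory footprint and computation overhead. In this paper, we propose a progressive pixel synthesis network for efficient image generation, termed PixelFolder. Specifically, PixelFolder formulates image generation as a progressive pixel regression problem and synthesizes images through a multi-stage structure, which can greatly reduce the overhead caused by large tensor transformations. In addition, we introduce novel pixel folding operations to further improve model efficiency while maintaining pixel-wise prior knowledge for end-to-end regression. With these innovative designs, we greatly reduce the expenditure of pixel synthesis, e.g., 89% less computation and 53% fewer parameters than the latest pixel synthesis method, CIPS. To validate our approach, we conduct extensive experiments on two benchmark datasets, namely FFHQ and LSUN Church. The experimental results show that, with much less expenditure, PixelFolder obtains new state-of-the-art (SOTA) performance on both datasets, i.e., 3.77 FID on FFHQ and 2.45 FID on LSUN Church. PixelFolder is also more efficient than SOTA methods such as StyleGAN2, reducing computation by about 72% and parameters by about 31%. These results greatly validate the effectiveness of the proposed PixelFolder.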
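In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC), and referring expression segmentation (RES). The canonical paradigms for visual grounding often require substantial expertise in designing network architectures and loss functions, making them hard to generalize across tasks. To simplify and unify the modeling, we cast visual grounding as a point prediction problem conditioned on image and text inputs, where either the bounding box or the binary mask is represented as a sequence of discrete coordinate tokens. Under this paradigm, visual grounding tasks are unified in our SeqTR network without task-specific branches or heads, e.g., the convolutional mask decoder for RES, which greatly reduces the complexity of multi-task modeling. In addition, SeqTR shares the same optimization objective for all tasks with a simple cross-entropy loss, further reducing the complexity of deploying hand-crafted loss functions. Experiments on five benchmark datasets demonstrate that the proposed SeqTR outperforms (or is on par with) the existing state of the art, proving that a simple yet universal approach to visual grounding is indeed feasible. Source code is available at https://github.com/sean-zhuh/seqtr.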
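To make the idea of representing a bounding box as a sequence of discrete coordinate tokens concrete, here is a minimal Python sketch. It is not taken from the SeqTR code base: the function names, the choice of 1000 bins, and the (x1, y1, x2, y2) pixel layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def box_to_tokens(box: Sequence[float],
                  image_size: Tuple[int, int],
                  num_bins: int = 1000) -> List[int]:
    """Quantize a pixel-space box (x1, y1, x2, y2) into integer tokens.

    Hypothetical helper: each coordinate is normalized by the image
    width or height and mapped to one of `num_bins` discrete bins.
    """
    width, height = image_size
    x1, y1, x2, y2 = box
    tokens = []
    for value, extent in ((x1, width), (y1, height), (x2, width), (y2, height)):
        normalized = min(max(value / extent, 0.0), 1.0)  # clamp to [0, 1]
        # The top edge (normalized == 1.0) falls into the last bin.
        tokens.append(min(int(normalized * num_bins), num_bins - 1))
    return tokens


def tokens_to_box(tokens: Sequence[int],
                  image_size: Tuple[int, int],
                  num_bins: int = 1000) -> Tuple[float, ...]:
    """Map tokens back to pixel coordinates using each bin's center."""
    width, height = image_size
    extents = (width, height, width, height)
    return tuple((t + 0.5) / num_bins * e for t, e in zip(tokens, extents))


if __name__ == "__main__":
    tokens = box_to_tokens((100, 50, 320, 240), (640, 480))
    print(tokens)
    print(tokens_to_box(tokens, (640, 480)))
```

Under this kind of encoding, predicting a box reduces to predicting four class indices, which is why a plain cross-entropy loss over the token vocabulary can serve as the training objective. A mask could be handled the same way by tokenizing sampled contour points, although the sampling scheme is not specified in the abstract.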
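Large-scale training data with high-quality annotations is critical for training semantic and instance segmentation models. Unfortunately, pixel-wise annotation is labor-intensive and costly, raising the demand for more efficient labeling strategies. In this work, we present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels from easy-to-obtain coarse 3D bounding primitives. Our method uses NeRF as a differentiable tool to unify coarse 3D annotations and 2D semantic cues transferred from existing datasets. We demonstrate that this combination allows for improved geometry guided by semantic information, enabling the rendering of accurate semantic maps across multiple views. Furthermore, this fusion process resolves the label ambiguity of the coarse 3D annotations and filters noise in the 2D predictions. By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design. Experimental results show that Panoptic NeRF outperforms existing label transfer methods on challenging urban scenes of the KITTI-360 dataset.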
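In this paper, we identify two factors that inhibit POMs from achieving high perceptual quality: 1) the center-oriented optimization (COO) problem and 2) the model's low-frequency tendency. First, POMs tend to generate an SR image whose position in the feature space is closest to the distribution center of all potential high-resolution (HR) images, causing such POMs to lose high-frequency details. Second, 90% of an image's area consists of low-frequency signals; in contrast, human perception relies on an image's high-frequency details. However, POMs apply the same computation to regions of different frequency, so they tend to restore the low-frequency regions. Based on these two factors, we propose a Detail Enhanced Contrastive Loss (DECLoss), which combines a high-frequency enhancement module and a spatial contrastive learning module to reduce the influence of the COO problem and the low-frequency tendency. Experimental results show the efficiency and effectiveness of applying DECLoss to several regular SR models. For example, in EDSR, our proposed method achieves a 3.60× efficiency gain over a GAN-based method with only a subtle degradation in visual quality. In addition, our final results show that an SR network equipped with our DECLoss generates more realistic and visually pleasing textures than state-of-the-art methods.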
Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering. To speed up the volume rendering process, many acceleration methods have been proposed at the cost of large memory consumption. To push the frontier of the efficiency-memory trade-off, we explore a new perspective on accelerating NeRF rendering, leveraging the fact that the viewpoint change is usually smooth and continuous in interactive viewpoint control. This allows us to use the information from preceding viewpoints to reduce both the number of rendered pixels and the number of points sampled along the rays of the remaining pixels. In our pipeline, a low-resolution feature map is first rendered by volume rendering; a lightweight 2D neural renderer is then applied to generate the output image at the target resolution from the features of the preceding and current frames. We show that the proposed method achieves competitive rendering quality while reducing rendering time with little memory overhead, enabling 30 FPS at 1080p image resolution with a low memory footprint.
Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years and 497 unique users' data collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.
Generative models, an important family of statistical models, aim to learn the observed data distribution by generating new instances. Along with the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers have shifted their attention from the 2D space to the 3D space, considering that 3D data aligns better with our physical world and hence has great potential in practice. However, unlike a 2D image, which has an efficient representation (i.e., a pixel grid) by nature, representing 3D data poses far more challenges. Concretely, we would expect an ideal 3D representation to be capable enough to model shapes and appearances in detail, and to be efficient enough to model high-resolution data with fast speed and low memory cost. However, existing 3D representations, such as point clouds, meshes, and recent neural fields, usually fail to meet these requirements simultaneously. In this survey, we thoroughly review the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and, more importantly, representations. We hope that our discussion helps the community track the evolution of this field and sparks innovative ideas to advance this challenging task.
Despite the high global prevalence of hepatic steatosis, no automated diagnostics have demonstrated generalizability in detecting steatosis on multiple international datasets. Traditionally, hepatic steatosis detection relies on clinicians selecting the region of interest (ROI) on computed tomography (CT) to measure liver attenuation. ROI selection demands time and expertise, and is therefore not routinely performed in populations. To automate the process, we validated an existing artificial intelligence (AI) system for 3D liver segmentation and used it to propose a novel method, AI-ROI, which automatically selects the ROI for attenuation measurements. The AI segmentation and the AI-ROI method were evaluated on 1,014 non-contrast-enhanced chest CT images from eight international datasets: LIDC-IDRI, NSCLC-Lung1, RIDER, VESSEL12, RICORD-1A, RICORD-1B, COVID-19-Italy, and COVID-19-China. AI segmentation achieved a mean Dice coefficient of 0.957. Attenuations measured by AI-ROI showed no significant difference from expert measurements (p = 0.545) while taking 71% less time. The area under the curve (AUC) for steatosis classification with AI-ROI is 0.921 (95% CI: 0.883 - 0.959). If performed as a routine screening method, our AI protocol could potentially allow early non-invasive, non-pharmacological preventative interventions for hepatic steatosis. The 1,014 expert-annotated liver segmentations, together with hepatic steatosis annotations, can be downloaded here: https://drive.google.com/drive/folders/1-g_zJeAaZXYXGqL1OeF6pUjr6KB0igJX.
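For readers unfamiliar with the Dice coefficient reported above, the following minimal Python sketch computes it for two binary masks using the standard definition, 2|A ∩ B| / (|A| + |B|). It is an illustration only, not the authors' evaluation code: the function name is hypothetical, the masks are assumed to be flattened into sequences of 0/1 values, and returning 1.0 for two empty masks is a convention chosen here.

```python
from typing import Iterable


def dice_coefficient(pred: Iterable[int], target: Iterable[int]) -> float:
    """Dice overlap between two flattened binary masks."""
    pred_bits = [bool(v) for v in pred]
    target_bits = [bool(v) for v in target]
    if len(pred_bits) != len(target_bits):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labeled as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_bits, target_bits))
    total = sum(pred_bits) + sum(target_bits)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

A value of 1.0 means the predicted and reference segmentations coincide exactly, and 0.0 means they share no foreground voxels.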
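State-of-the-art 3D-aware generative models rely on coordinate-based MLPs to parameterize 3D radiance fields. While demonstrating impressive results, querying an MLP for every sample along each ray leads to slow rendering. Therefore, existing approaches often render low-resolution feature maps and process them with an upsampling network to obtain the final image. Albeit efficient, neural rendering often entangles viewpoint and content, such that changing the camera pose results in unwanted changes of geometry or appearance. Motivated by recent results in voxel-based novel view synthesis, we investigate in this paper the utility of sparse voxel grid representations for fast and 3D-consistent generative modeling. Our results demonstrate that monolithic MLPs can indeed be replaced by 3D convolutions when sparse voxel grids are combined with progressive growing, free space pruning, and appropriate regularization. To obtain a compact representation of the scene and allow scaling to higher voxel resolutions, our model disentangles the foreground object (modeled in 3D) from the background (modeled in 2D). In contrast to existing approaches, our method requires only a single forward pass to generate a full 3D scene. It hence allows efficient rendering from arbitrary viewpoints while yielding 3D-consistent results with high visual fidelity.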