Body Mass Index (BMI), age, height, and weight are important indicators of human health and provide useful information for many practical purposes, such as health care, monitoring, and re-identification. Most existing methods for health indicator prediction rely on front-view body or face images. These inputs are hard to obtain in daily life, and their strict requirements on view and pose often make the resulting models less robust. In this paper, we propose to predict health indicators from gait videos, which are more prevalent in surveillance and home monitoring scenarios. However, the study of health indicator prediction from gait videos using deep learning has been hindered by the small amount of open-source data. To address this issue, we analyse the similarity and relationship between the pose estimation and health indicator prediction tasks, and then propose a paradigm that enables deep learning on small health indicator datasets by pre-training on the pose estimation task. Furthermore, to better suit the health indicator prediction task, we propose the Global-Local Aware aNd Centrosymmetric Encoder (GLANCE) module. It first extracts local and global features by progressive convolutions and then fuses multi-level features in two different ways through a centrosymmetric double-path hourglass structure. Experiments demonstrate that the proposed paradigm achieves state-of-the-art results for predicting health indicators on MoVi, and that the GLANCE module is also beneficial for pose estimation on 3DPW.
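As a reminder of how the indicators above relate to each other, BMI is conventionally defined as weight in kilograms divided by the square of height in metres. The sketch below only illustrates that standard definition; the function name `bmi` is hypothetical and is not taken from the paper.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI definition: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


if __name__ == "__main__":
    # A 70 kg person who is 1.75 m tall.
    print(round(bmi(70.0, 1.75), 2))
```

Because BMI is a deterministic function of height and weight, a model that predicts those two quantities implicitly predicts BMI as well.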
Adapting object detectors learned with sufficient supervision to novel classes under low-data regimes is appealing yet challenging. In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance, i.e., holistic pre-training on base classes followed by partial fine-tuning in a balanced setting with all classes. Since unlabeled instances are suppressed as background in the base training phase, the learned RPN is prone to produce biased proposals for novel instances, resulting in dramatic performance degradation. Moreover, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes. In this paper, we introduce a simple yet effective proposal distribution calibration (PDC) approach that enhances the localization and classification abilities of the RoI head by recycling the localization ability it acquired in base training and enriching high-quality positive samples for semantic fine-tuning. Specifically, we sample proposals based on the base proposal statistics to calibrate the distribution bias, and impose additional localization and classification losses on the sampled proposals to quickly extend the base detector to novel classes. Experiments on the commonly used Pascal VOC and MS COCO datasets show state-of-the-art performance, demonstrating the efficacy of PDC for FSOD. Code is available at github.com/Bohao-Lee/PDC.
Weakly supervised semantic segmentation is typically inspired by class activation maps, which serve as pseudo masks with class-discriminative regions highlighted. Although tremendous efforts have been made to recall precise and complete locations for each class, existing methods still commonly suffer from unsolicited Out-of-Candidate (OC) error predictions, i.e., predictions that do not belong to the label candidates. Such errors should be avoidable, since their contradiction with the image-level class tags is easy to detect. In this paper, we develop a group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a plug-and-play fashion. First, we adaptively split the semantic categories into In-Candidate (IC) and OC groups for each OC pixel according to their prior annotation correlation and posterior prediction correlation. Then, we derive a differentiable rectification loss to force OC pixels to shift to the IC group. Incorporating OCR into seminal baselines (e.g., AffinityNet, SEAM, MCTformer) yields remarkable performance gains on both the Pascal VOC (+3.2%, +3.3%, +0.8% mIoU) and MS COCO (+1.0%, +1.3%, +0.5% mIoU) datasets with negligible extra training overhead, which demonstrates the effectiveness and generality of OCR.
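To make the notion of an OC error concrete, the sketch below flags pixels whose predicted class is absent from the image-level labels and then scores one pixel with a simple pairwise ranking loss that is small only when every IC logit exceeds every OC logit. This is an illustration under assumptions, not the paper's method: the IC/OC split here uses the image-level labels alone (the paper's split is adaptive), and the function names, the `margin` parameter, and the exact loss form are hypothetical.

```python
import math


def find_oc_pixels(pred_mask, candidate_labels):
    """Return (row, col) of pixels whose predicted class is not a label candidate."""
    allowed = set(candidate_labels)
    return [
        (r, c)
        for r, row in enumerate(pred_mask)
        for c, cls in enumerate(row)
        if cls not in allowed
    ]


def ranking_rectification_loss(logits, candidate_labels, margin=0.0):
    """Assumed pairwise ranking loss for one pixel.

    Computes log(1 + sum over (OC, IC) pairs of exp(s_oc - s_ic + margin)),
    which shrinks as in-candidate logits rise above out-of-candidate logits.
    """
    allowed = set(candidate_labels)
    ic = [s for k, s in enumerate(logits) if k in allowed]
    oc = [s for k, s in enumerate(logits) if k not in allowed]
    if not ic or not oc:
        return 0.0
    total = sum(math.exp(s_oc - s_ic + margin) for s_oc in oc for s_ic in ic)
    return math.log1p(total)


if __name__ == "__main__":
    mask = [[0, 1], [2, 1]]          # class 2 is not among the image labels
    print(find_oc_pixels(mask, [0, 1]))
    print(ranking_rectification_loss([1.0, 2.0, 5.0], [0, 1]))
```

Minimizing such a loss on the flagged pixels pushes their highest-scoring class back into the candidate set, which is the behaviour the abstract describes as shifting OC pixels to the IC group.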
Headline generation is the task of generating an appropriate headline for a given article, which can be used for machine-aided writing or for improving the click-through rate. Existing work uses only the article itself during generation and does not take the writing style of headlines into account. In this paper, we propose a novel Seq2Seq model called CLH3G (Contrastive Learning enhanced Historical Headlines based Headline Generation), which uses the headlines of articles that the author wrote in the past to improve headline generation for the current article. By taking historical headlines into account, we can integrate the stylistic features of the author into our model and generate a headline that is not only appropriate for the article but also consistent with the author's style. To learn the author's stylistic features efficiently, we further introduce a contrastive learning-based auxiliary task for the encoder of our model. In addition, we propose two methods that use the learned stylistic features to guide both the pointer and the decoder during generation. Experimental results show that historical headlines of the same user improve headline generation significantly, and that both the contrastive learning module and the two style feature fusion methods further boost performance.
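The abstract does not state which contrastive objective the auxiliary task uses. As a rough illustration only, the sketch below implements the widely used InfoNCE loss on plain Python vectors, treating a headline by the same author as the positive and headlines by other authors as negatives. The choice of InfoNCE, the cosine similarity, the `temperature` value, and all names are assumptions rather than details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-probability of picking the positive.

    The positive is assumed to be a style embedding from the same author,
    the negatives embeddings from other authors.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
```

The loss is near zero when the anchor is closer to the same-author embedding than to any other-author embedding, and grows when a negative is the closest match.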
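Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although convolutional neural networks (CNNs) have excelled in many vision tasks, they are still limited in capturing long-range structured relationships because they typically consist of layers of local kernels. A fully connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling, but its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network that significantly reduces the computational complexity compared with modelling a fully connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. This formulation allows us to design a self-attention module and, more importantly, a new Transformer-based backbone network, which we use both for image classification pre-training and for addressing various downstream tasks (object detection, instance segmentation, and semantic segmentation). Using this model, we show significant improvements over strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully connected graphs while using fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/dgmn2.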
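Human cognition is compositional. We understand a scene by decomposing it into different concepts (e.g., the shape and position of an object) and learning the respective laws of these concepts, which may be either natural (e.g., the laws of motion) or man-made (e.g., the rules of a game). The automatic parsing of these laws indicates a model's ability to understand the scene, which makes law parsing play a central role in many visual tasks. In this paper, we propose a deep latent variable model for Compositional LAw Parsing (CLAP). CLAP achieves human-like compositional ability through an encoding-decoding architecture that represents the concepts in a scene as latent variables, and further employs concept-specific random functions, instantiated in the latent space, to capture the law of each concept. Our experimental results show that CLAP outperforms the compared baseline methods on multiple visual tasks, including intuitive physics, abstract visual reasoning, and scene representation. In addition, CLAP can learn concept-specific laws in a scene without supervision, and the laws can be edited by modifying the corresponding latent random functions, which validates its interpretability and manipulability.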
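Universal style transfer (UST) infuses styles from arbitrary reference images into content images. Existing methods, while enjoying many practical successes, cannot explain experimental observations, including the different performance of UST algorithms in preserving the spatial structure of content images. In addition, these methods are limited to cumbersome global control over stylization, so they require additional spatial masks to achieve a desired stylization. In this work, we provide a systematic Fourier analysis of a general framework for UST. We present an equivalent form of the framework in the frequency domain. This form implies that existing algorithms treat all frequency components and pixels of the feature maps equally, except for the zero-frequency component. We connect Fourier amplitude and phase with the Gram matrices and the content reconstruction loss in style transfer, respectively. Based on this equivalence and these connections, we can interpret the different structure preservation behaviors of the algorithms in terms of Fourier phase. Given these interpretations, we propose two manipulations in practice for structure preservation and desired stylization. Both qualitative and quantitative experiments demonstrate the competitive performance of our method against state-of-the-art methods. We also conduct experiments to demonstrate (1) the above equivalence, (2) the interpretability based on Fourier amplitude and phase, and (3) the controllability associated with frequency components.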
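The amplitude/phase decomposition mentioned above can be illustrated on a toy 1-D signal. The sketch below computes a naive discrete Fourier transform, then builds a signal that keeps the phase of a "content" signal while taking the amplitude of a "style" signal. It is only meant to show what separating amplitude from phase means; it is not one of the manipulations proposed in the paper, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft (returns complex values)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def mix_amplitude_phase(content, style):
    """Combine the Fourier phase of `content` with the amplitude of `style`."""
    c_spec, s_spec = dft(content), dft(style)
    mixed = [cmath.rect(abs(s), cmath.phase(c)) for c, s in zip(c_spec, s_spec)]
    # For real inputs the mixed spectrum stays conjugate-symmetric,
    # so the imaginary parts are only rounding noise and can be dropped.
    return [v.real for v in idft(mixed)]


if __name__ == "__main__":
    content = [1.0, 2.0, 0.5, -1.0]
    style = [0.5, 0.5, 3.0, 1.0]
    print(mix_amplitude_phase(content, style))
```

Mixing a signal with itself returns the original signal, and the mixed signal's amplitude spectrum matches that of the style signal, which are quick sanity checks for the decomposition.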
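Lane detection is an important component of many real-world autonomous systems. Although a wide variety of lane detection approaches have been proposed, with steady benchmark improvements reported over time, lane detection remains a largely unsolved problem. This is because most existing methods treat lane detection as either a dense prediction task or a detection task, and few of them consider the unique topologies of lane markers (Y-shaped, fork-shaped, and nearly horizontal lanes), which leads to sub-optimal solutions. In this paper, we present a new method for lane detection based on relay chain prediction. Specifically, our model predicts a segmentation map to classify foreground and background regions. For each pixel in the foreground region, we go through a forward branch and a backward branch to recover the whole lane. Each branch decodes a transfer map and a distance map, which give the direction to move to the next point and the number of steps needed to progressively predict the relay stations (next points). As a result, our model is able to capture the keypoints along each lane. Despite its simplicity, our strategy allows us to establish a new state of the art on four major benchmarks: TuSimple, CULane, CurveLanes, and LLAMAS.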
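The forward/backward decoding described above can be sketched on a toy grid. In this simplified version, each cell of a transfer map stores an integer (row, column) offset to the next point, and each cell of a distance map stores how many relay steps remain in that direction. These integer encodings, the function names, and the grid are assumptions made for illustration; the paper's maps are network outputs and need not take this form.

```python
def follow_branch(start, transfer_map, distance_map):
    """Walk from `start`, one relay station at a time, along one branch.

    transfer_map[r][c] is an assumed integer (d_row, d_col) offset to the
    next point; distance_map[r][c] is the assumed number of steps remaining.
    """
    height, width = len(transfer_map), len(transfer_map[0])
    r, c = start
    points = []
    for _ in range(distance_map[r][c]):
        d_row, d_col = transfer_map[r][c]
        r, c = r + d_row, c + d_col
        if not (0 <= r < height and 0 <= c < width):
            break  # stop if the chain leaves the image
        points.append((r, c))
    return points


def recover_lane(start, fwd_transfer, fwd_distance, bwd_transfer, bwd_distance):
    """Join the backward chain, the start pixel, and the forward chain."""
    backward = follow_branch(start, bwd_transfer, bwd_distance)
    forward = follow_branch(start, fwd_transfer, fwd_distance)
    return backward[::-1] + [start] + forward


if __name__ == "__main__":
    rows, cols = 4, 3
    # Toy vertical lane: forward moves up the image, backward moves down.
    fwd_t = [[(-1, 0)] * cols for _ in range(rows)]
    bwd_t = [[(1, 0)] * cols for _ in range(rows)]
    fwd_d = [[r] * cols for r in range(rows)]
    bwd_d = [[rows - 1 - r] * cols for r in range(rows)]
    print(recover_lane((2, 1), fwd_t, fwd_d, bwd_t, bwd_d))
```

Because every foreground pixel can seed its own chain, a full pipeline would also need to merge the overlapping chains into distinct lanes, which this sketch leaves out.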
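Although category-level 9DoF object pose estimation has emerged recently, previous correspondence-based and direct regression methods are both limited in accuracy because of large intra-category variations in object shape, color, and so on. This work presents CATRE, a category-level object pose and size refiner that iteratively enhances pose estimates from point clouds to produce accurate results. Given an initial pose estimate, CATRE predicts the relative transformation between the initial pose and the ground truth by aligning the partially observed point cloud with an abstract shape prior. Specifically, we propose a novel disentangled architecture that is aware of the inherent distinctions between rotation estimation and translation/size estimation. Extensive experiments show that our approach outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks while running at up to ~85.32 Hz, and achieves competitive results on category-level tracking. We further demonstrate that CATRE can perform pose refinement on unseen categories. Code and trained models are available.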
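Detection models trained by one party (the server) may suffer severe performance degradation when distributed to other users (clients). For example, in autonomous driving scenarios, different driving environments may introduce obvious domain shifts, which lead to biased model predictions. Federated learning, which has emerged in recent years, enables multi-party collaborative training without leaking client data. In this paper, we focus on a special cross-domain scenario in which the server holds large-scale data and multiple clients hold only a small amount of data. Meanwhile, the data distributions differ among the clients. In this case, traditional federated learning techniques cannot account for learning both the global knowledge of all participants and the personalized knowledge of a specific client. To make up for this limitation, we propose a cross-domain federated object detection framework named FedOD. To learn both global knowledge and personalized knowledge across different domains, the proposed framework first performs federated training to obtain a public global aggregated model through multi-teacher distillation, and then sends the aggregated model back to each client for fine-tuning its personalized local model. After a few rounds of communication, each client can perform weighted ensemble inference with the public global model and the personalized local model. With the ensemble, the generalization performance of the client-side model can exceed that of a single model with the same parameter scale. We build a federated object detection dataset with significant background and instance differences from multiple public autonomous driving datasets, and then conduct extensive experiments on it. The experimental results validate the effectiveness of the proposed method.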
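The weighted ensemble inference step can be illustrated with a minimal sketch that blends per-class confidence scores from a global model and a personalized local model. It assumes the two models' detections have already been matched to each other; the function name, the `local_weight` parameter, and the score-level blending are hypothetical simplifications, since the abstract does not specify how boxes and scores are fused.

```python
def weighted_ensemble(global_scores, local_scores, local_weight=0.5):
    """Blend matched confidence scores from a global and a local model.

    `local_weight` in [0, 1] is an assumed mixing coefficient:
    0 uses only the global model, 1 uses only the local model.
    """
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be in [0, 1]")
    if len(global_scores) != len(local_scores):
        raise ValueError("score lists must have the same length")
    return [
        (1.0 - local_weight) * g + local_weight * l
        for g, l in zip(global_scores, local_scores)
    ]


if __name__ == "__main__":
    # Scores for two classes on one matched detection.
    print(weighted_ensemble([0.2, 0.8], [0.6, 0.4], local_weight=0.25))
```

In practice the weight would likely be tuned per client, for example on a small held-out split of that client's data.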