预测道路代理的未来行为是自动驾驶的关键任务。尽管现有模型在预测边际代理的未来行为方面取得了巨大的成功,但有效预测多种代理的一致的关节行为仍然是一个挑战。最近,提出了占用场的占用场表示,以通过占用网格和流量的结合来代表公路代理的联合未来状态,从而支持有效且一致的关节预测。在这项工作中,我们提出了一个新颖的占用流场预测因子,以产生准确的占用和流动预测,通过结合图像编码器的功能,该图像编码器从栅格化的流量图像中学习特征和矢量编码器,以捕获连续代理轨迹和地图状态的信息。在生成最终预测之前,这两个编码的功能由多个注意模块融合。我们的简单但有效的模型排在Waymo Open数据集占用和流预测挑战中,并在封闭的占用和流动预测任务中取得了最佳性能。
translated by 谷歌翻译
侯马联盟书是中国山西博物馆小镇博物馆的国家宝藏之一。它在研究古老的历史方面具有重要的历史意义。迄今为止,关于霍玛联盟书籍的研究一直留在纸质文件的识别中,这是无法识别和难以显示,学习和宣传的纸质文件。因此,霍玛联盟公认的古代角色的数字化可以有效提高识别古代角色并提供更可靠的技术支持和文本数据的效率。本文提出了一个新的Houma Alliance书籍的新数据库。在数据库中,从原始书籍收藏和人类的模仿写作中收集了297个班级和3,547个Houma Alliance古代手写字符样本。此外,决策级分类器融合策略用于融合三个众所周知的深神网络体系结构,以供古代手写角色识别。实验是在我们的新数据库上执行的。实验结果首先为研究界提供了新数据库的基线结果,然后证明了我们提出的方法的效率。
translated by 谷歌翻译
量子计算机是下一代设备,有望执行超出古典计算机范围的计算。实现这一目标的主要方法是通过量子机学习,尤其是量子生成学习。由于量子力学的固有概率性质,因此可以合理地假设量子生成学习模型(QGLM)可能会超过其经典对应物。因此,QGLM正在从量子物理和计算机科学社区中受到越来越多的关注,在这些QGLM中,可以在近期量子机上有效实施各种QGLM,并提出了潜在的计算优势。在本文中,我们从机器学习的角度回顾了QGLM的当前进度。特别是,我们解释了这些QGLM,涵盖了量子电路出生的机器,量子生成的对抗网络,量子玻尔兹曼机器和量子自动编码器,作为经典生成学习模型的量子扩展。在这种情况下,我们探讨了它们的内在关系及其根本差异。我们进一步总结了QGLM在常规机器学习任务和量子物理学中的潜在应用。最后,我们讨论了QGLM的挑战和进一步研究指示。
translated by 谷歌翻译
以前的纵向图像生成方法大致分为两类:2D GAN和3D感知的GAN。 2D GAN可以产生高保真肖像,但具有低视图一致性。 3D感知GaN方法可以维护查看一致性,但它们所生成的图像不是本地可编辑的。为了克服这些限制,我们提出了FENERF,一个可以生成查看一致和本地可编辑的纵向图像的3D感知生成器。我们的方法使用两个解耦潜码,以在具有共享几何体的空间对齐的3D卷中生成相应的面部语义和纹理。从这种底层3D表示中受益,FENERF可以联合渲染边界对齐的图像和语义掩码,并使用语义掩模通过GaN反转编辑3D音量。我们进一步示出了可以从广泛可用的单手套图像和语义面膜对中学习这种3D表示。此外,我们揭示了联合学习语义和纹理有助于产生更精细的几何形状。我们的实验表明FENERF在各种面部编辑任务中优于最先进的方法。
translated by 谷歌翻译
人类视力能够从整个场景中捕获部分整个分层信息。本文介绍了Visual解析器(VIP),它明确地构造了与变压器的等层次结构。 VIP将视觉表示分为两个级别,零件级别和整个级别。每个部分的信息代表整个内部的几个独立向量的组合。为了模拟两个级别的表示,我们首先通过注意机制将整体信息从整体编码为部分向量,然后将零件向量内的全局信息解码回到整个表示中。通过使用所提出的编码器 - 解码器交互迭代地解析两个级别,模型可以逐渐改进两个级别上的特征。实验结果表明,VIP可以在三个主要任务中实现非常竞争的性能。分类,检测和实例分割。特别是,它可以通过对象检测的大边缘超越先前的最先进的CNN主干。 VIP系列的小型型号为7.2美元,参数为$ 7.2 \ times $ 10.9 \ times $更少的拖鞋可以与最大的resnext-101-64 $ \ times $ 4d的resne(x)t家族相对表现。可视化结果还表明,学习部分对预测类具有高度信息,使VIP比以前的基本架构更可说明。代码可在https://github.com/kevin-ssy/vip上获得。
translated by 谷歌翻译
本文回顾了关于压缩视频质量增强质量的第一个NTIRE挑战,重点是拟议的方法和结果。在此挑战中,采用了新的大型不同视频(LDV)数据集。挑战有三个曲目。Track 1和2的目标是增强HEVC在固定QP上压缩的视频,而Track 3旨在增强X265压缩的视频,以固定的位速率压缩。此外,轨道1和3的质量提高了提高保真度(PSNR)的目标,以及提高感知质量的2个目标。这三个曲目完全吸引了482个注册。在测试阶段,分别提交了12个团队,8支球队和11支球队,分别提交了轨道1、2和3的最终结果。拟议的方法和解决方案衡量视频质量增强的最先进。挑战的首页:https://github.com/renyang-home/ntire21_venh
translated by 谷歌翻译
One of the key challenges in deploying RL to real-world applications is to adapt to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuated bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes in an abrupt and unpredictable manner within an episode, resulting in a segment structure, which existing works fail to address. To leverage the segment structure of piecewise stable context in real-world applications, in this paper, we propose a \textit{\textbf{Se}gmented \textbf{C}ontext \textbf{B}elief \textbf{A}ugmented \textbf{D}eep~(SeCBAD)} RL method. Our method can jointly infer the belief distribution over latent context with the posterior over segment length and perform more accurate belief context inference with observed data within the current context segment. The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD can infer context segment length accurately and outperform existing methods on a toy grid world environment and Mujuco tasks with piecewise-stable context.
translated by 谷歌翻译
Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only few of existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus naturally accompanied with CRSs to enhance the incomplete KGs; and perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs refactor. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRSs datasets. Experimental results confirm the effectiveness of our proposed method.
translated by 谷歌翻译
Over the past few years, developing a broad, universal, and general-purpose computer vision system has become a hot topic. A powerful universal system would be capable of solving diverse vision tasks simultaneously without being restricted to a specific problem or a specific data domain, which is of great importance in practical real-world computer vision applications. This study pushes the direction forward by concentrating on the million-scale multi-domain universal object detection problem. The problem is not trivial due to its complicated nature in terms of cross-dataset category label duplication, label conflicts, and the hierarchical taxonomy handling. Moreover, what is the resource-efficient way to utilize emerging large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design and resource-efficient model training with a pre-trained large model. Our method is ranked second in the object detection track of Robust Vision Challenge 2022 (RVC 2022). We hope our detailed study would serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.
translated by 谷歌翻译
Recognition of facial expression is a challenge when it comes to computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as fuzzy facial expressions and inconsistent labels. However, current research has focused either on the problem of class imbalance or on the problem of uncertainty, ignoring the intersection of how to address these two problems. Therefore, in this paper, we propose a framework based on Resnet and Attention to solve the above problems. We design weight for each class. Through the penalty mechanism, our model will pay more attention to the learning of small samples during training, and the resulting decrease in model accuracy can be improved by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network will also learn an uncertain feature for each sample. By mixing uncertain features between samples, the model can better learn those features that can be used for classification, thus suppressing uncertainty. Experiments show that our method surpasses most basic methods in terms of accuracy on facial expression data sets (e.g., AffectNet, RAF-DB), and it also solves the problem of class imbalance well.
translated by 谷歌翻译