Visual localization plays an important role in intelligent robots and autonomous driving, especially when GNSS is unreliable. Recently, camera localization in LiDAR maps has attracted growing attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow field of view and therefore captures limited information compared with omnidirectional LiDAR data. To overcome this limitation, we focus on correlating the information in 360-degree equirectangular images with point clouds, and propose an end-to-end learnable network that performs cross-modal visual localization by establishing similarity in a high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture the salient features for comparing images and point clouds. We construct several sequences containing 360-degree equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.
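Traditional multicast routing methods have several problems when constructing a multicast tree, such as limited access to network state information, poor adaptability to dynamic and complex changes in the network, and inflexible data forwarding. To address these shortcomings, the optimal multicast routing problem in software-defined networking (SDN) is formulated as a multi-objective optimization problem, and DRL-M4MR, an intelligent multicast routing algorithm based on the deep Q-network (DQN) method of deep reinforcement learning (DRL), is designed to construct multicast trees in SDN. First, by combining the global view and control of SDN, the multicast tree state matrix, link bandwidth matrix, link delay matrix, and link packet loss matrix are designed as the state space of the DRL agent. Second, the action space of the agent is the set of all links in the network, and the action selection strategy is designed to add links to the current multicast tree under four cases. Third, single-step and final reward functions are designed to guide the agent in making decisions that construct the optimal multicast tree. Experimental results show that, compared with existing algorithms, the multicast trees constructed by DRL-M4MR achieve better bandwidth, delay, and packet loss rate after training, and that the algorithm makes more intelligent multicast routing decisions in a dynamic network environment.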
Jacobian and Hessian regularization aim to reduce the magnitude of the first and second-order partial derivatives with respect to neural network inputs, and they are predominantly used to ensure the adversarial robustness of image classifiers. In this work, we generalize previous efforts by extending the target matrix from zero to any matrix that admits efficient matrix-vector products. The proposed paradigm allows us to construct novel regularization terms that enforce symmetry or diagonality on square Jacobian and Hessian matrices. On the other hand, the major challenge for Jacobian and Hessian regularization has been high computational complexity. We introduce Lanczos-based spectral norm minimization to tackle this difficulty. This technique uses a parallelized implementation of the Lanczos algorithm and is capable of effective and stable regularization of large Jacobian and Hessian matrices. Theoretical justifications and empirical evidence are provided for the proposed paradigm and technique. We carry out exploratory experiments to validate the effectiveness of our novel regularization terms. We also conduct comparative experiments to evaluate Lanczos-based spectral norm minimization against prior methods. Results show that the proposed methodologies are advantageous for a wide range of tasks.
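As a rough illustration of the idea of estimating a spectral norm from matrix-vector products alone, the following minimal sketch runs a few Lanczos steps on the symmetric operator J^T J and takes the square root of the largest Ritz value. It is not the authors' parallelized implementation: the function name `lanczos_spectral_norm`, its parameters, and the use of full reorthogonalization are assumptions made for this example, and the matrix-vector products are supplied by an explicit NumPy matrix rather than by automatic differentiation through a network.

```python
import numpy as np

def lanczos_spectral_norm(matvec, rmatvec, n, k=20, seed=0):
    """Estimate ||J||_2 with up to k Lanczos steps on the symmetric operator J^T J.

    matvec(v) returns J @ v and rmatvec(u) returns J.T @ u, so J is never formed.
    (Hypothetical helper for illustration only.)
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = [q]                      # Lanczos basis vectors
    alphas, betas = [], []       # diagonal and off-diagonal of the tridiagonal matrix
    for _ in range(min(k, n)):
        w = rmatvec(matvec(Q[-1]))           # w = (J^T J) q
        alphas.append(float(Q[-1] @ w))
        # Full reorthogonalization against every basis vector so far
        for qi in Q:
            w = w - (qi @ w) * qi
        beta = float(np.linalg.norm(w))
        if beta < 1e-10:                     # Krylov subspace exhausted
            break
        betas.append(beta)
        Q.append(w / beta)
    m = len(alphas)
    off = betas[:m - 1]
    T = np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)
    # The largest Ritz value approximates the largest eigenvalue of J^T J
    return float(np.sqrt(max(np.linalg.eigvalsh(T)[-1], 0.0)))

# Usage: compare against the exact spectral norm of a small random matrix
J = np.random.default_rng(1).standard_normal((30, 10))
estimate = lanczos_spectral_norm(lambda v: J @ v, lambda u: J.T @ u, n=10)
exact = np.linalg.norm(J, 2)
```

Under the same assumptions, passing products with the difference between J and a target matrix (for example J minus its transpose, for a symmetry target) would estimate the distance to that target instead of the norm of J itself.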
Information Extraction (IE) aims to extract structured information from heterogeneous sources. IE from natural language texts includes sub-tasks such as Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE). Most IE systems require a comprehensive understanding of sentence structure, implied semantics, and domain knowledge to perform well; thus, IE tasks always need adequate external resources and annotations. However, obtaining more human annotations takes time and effort. Low-Resource Information Extraction (LRIE) strives to use unsupervised data, reducing the required resources and human annotation. In practice, existing systems either use self-training schemes to generate pseudo labels, which cause the gradual drift problem, or leverage consistency regularization methods, which inevitably suffer from confirmation bias. To alleviate the confirmation bias caused by the lack of feedback loops in existing LRIE learning paradigms, we develop a Gradient Imitation Reinforcement Learning (GIRL) method that encourages pseudo-labeled data to imitate the gradient descent direction on labeled data, pushing pseudo-labeled data toward optimization capabilities similar to those of labeled data. Based on how well the pseudo-labeled data imitates the instructive gradient descent direction obtained from labeled data, we design a reward to quantify the imitation process and bootstrap the optimization capability of pseudo-labeled data through trial and error. Beyond the learning paradigm, GIRL is not limited to specific sub-tasks, and we leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings (semi-supervised IE and few-shot IE).
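Entity linking (EL) is the process of linking entity mentions that appear in text with their corresponding entities in a knowledge base. The EL features of entities (e.g., prior probability, relatedness score, and entity embedding) are usually estimated from Wikipedia. However, newly emerging entities (EEs) that have just been discovered in the news may not yet be included in Wikipedia. As a result, the EL features required for these EEs cannot be obtained from Wikipedia, and EL models will always fail to link ambiguous mentions to these EEs correctly because their EL features are missing. To address this problem, in this paper we focus on the new task of learning EL features for emerging entities in a general way. We propose a novel approach named Stamo that automatically learns high-quality EL features for EEs. It needs only a small number of labeled documents for each EE, collected from the Web, because it can further exploit the knowledge hidden in unlabeled data. Stamo is mainly based on self-training, which allows it to be integrated flexibly with any EL feature or EL model, but also makes it prone to the error reinforcement problem caused by mislabeled data. Unlike common self-training strategies that try to discard mislabeled data explicitly, we treat self-training as a multiple optimization process with respect to the EL features of EEs, and propose both intra-slot and inter-slot optimizations to alleviate the error reinforcement problem implicitly. We construct two EL datasets involving selected EEs to evaluate the quality of the EL features obtained for EEs, and the experimental results show that our approach significantly outperforms other baseline methods for learning EL features.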
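Power is an unavoidable yet under-recognized element of collaboration. Power dynamics influence every aspect of scientific collaboration. Team power dynamics can be measured by team power level and team power hierarchy. Team power level is conceptualized as the average level of possession of resources, expertise, or decision-making authority in a team. Team power hierarchy represents the vertical differences in the possession of resources within a team. In the science of science, few studies have examined scientific collaboration from the perspective of team power dynamics. This research explores how team power dynamics affect team impact in order to fill this gap. In this research, all co-authors of one publication are treated as one team. The team power level and team power hierarchy of a team are measured by the mean and the Gini index of the career ages of the co-authors in that team. Team impact is quantified by the citations of the paper authored by the team. By analyzing 7.7 million teams from science (e.g., computer science, physics), social sciences (e.g., sociology, library and information science), and arts and humanities (e.g., art), we find that a flat team structure is associated with higher team impact. When team power level increases, teams with a low team power hierarchy gain more impact than teams with a high team power hierarchy. These findings are replicated in all five disciplines except art, and are consistent across various types of teams from computer science, including teams from industry or academia, teams with different gender compositions, teams with geographical contrast, and teams of distinct sizes.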
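The two team-level measures described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `team_power` is hypothetical, and the Gini index uses the standard mean-absolute-difference definition, which the abstract does not confirm as the exact variant used in the study.

```python
def team_power(career_ages):
    """Return (power level, power hierarchy) for one team.

    Power level is the mean career age of the co-authors; power hierarchy is
    the Gini index of their career ages (0 means a perfectly flat team).
    Hypothetical helper; the Gini variant is an assumption.
    """
    n = len(career_ages)
    total = sum(career_ages)
    level = total / n
    if total == 0:
        return level, 0.0
    # Sum of absolute differences over all ordered pairs of co-authors
    diff_sum = sum(abs(a - b) for a in career_ages for b in career_ages)
    # Gini = diff_sum / (2 * n^2 * mean) = diff_sum / (2 * n * total)
    gini = diff_sum / (2 * n * total)
    return level, gini

# Usage: a flat team and a steep team with the same power level
flat_level, flat_gini = team_power([10, 10, 10])
steep_level, steep_gini = team_power([0, 0, 30])
```

Both example teams have the same mean career age, but the second concentrates all seniority in one co-author and therefore has a higher Gini index, which is the contrast the hierarchy measure is meant to capture.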
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic micro-expression recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data remains small. To address this shortage of ME data, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) the interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs and with resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.