Task-oriented dialogue systems have long suffered from the difficulty of collecting large-scale, high-quality annotated dialogues. Moreover, most publicly available datasets only include written conversations, which is insufficient to reflect actual human behavior in practical spoken dialogue systems. In this paper, we propose Task-oriented Dialogue Data Augmentation (TOD-DA), a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling. TOD-DA consists of two modules: 1) Dialogue Enrichment, which expands the training data of task-oriented dialogues to ease data sparsity, and 2) a Spoken Conversation Simulator, which imitates spoken-style expressions and speech recognition errors at various granularities to bridge the gap between written and spoken conversations. With such a design, our approach ranked first in both tasks of DSTC10 Track 2, a benchmark for task-oriented dialogue modeling over spoken conversations, demonstrating the superiority and effectiveness of the proposed TOD-DA.
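The abstract does not spell out the simulator's internals, but the idea of injecting ASR-style noise into written utterances can be sketched as follows. This is a toy illustration, not the authors' implementation: the function name, confusion table, and corruption rates are all assumptions.

```python
import random

def simulate_asr_errors(text, sub_rate=0.05, del_rate=0.02, seed=0):
    """Corrupt a written utterance with word-level deletions and
    homophone substitutions, loosely mimicking speech recognition noise."""
    rng = random.Random(seed)
    # Toy confusion table; a real simulator would derive confusions
    # from phonetic similarity or ASR decoding lattices.
    confusions = {"two": "to", "there": "their", "for": "four"}
    out = []
    for word in text.split():
        r = rng.random()
        if r < del_rate:
            continue  # simulate a dropped word
        if r < del_rate + sub_rate and word in confusions:
            out.append(confusions[word])  # simulate a misrecognition
        else:
            out.append(word)
    return " ".join(out)

# Exaggerated substitution rate so the corruption is visible.
noisy = simulate_asr_errors("book a table for two people there",
                            sub_rate=0.9, del_rate=0.0)
```

Training on a mixture of clean and corrupted utterances is one common way such augmentation is applied.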
Benefiting from considerable pixel-level annotations collected in a specific situation (the source), a well-trained semantic segmentation model performs very well there but fails in new situations (the target) due to the large domain shift. To alleviate the domain gap, previous cross-domain semantic segmentation methods always assume the coexistence of source data and target data during domain alignment. However, accessing source data in real scenarios may raise privacy concerns and violate intellectual property rights. To address this, we focus on an interesting and challenging cross-domain semantic segmentation task in which only the model trained on the source is provided to the target domain. Specifically, we propose a unified framework called ATP, which consists of three schemes: feature Alignment, bidirectional Teaching, and information Propagation. First, we design a curriculum entropy minimization objective to implicitly align the target features with unseen source features via the provided source model. Second, besides the positive pseudo labels of vanilla self-training, we are the first to introduce negative pseudo labels to this field and develop a bidirectional self-training strategy to enhance representation learning in the target domain. Finally, an information propagation scheme is adopted to further reduce the intra-domain discrepancy within the target domain via pseudo semi-supervised learning. Extensive results on synthetic-to-real and cross-city driving datasets validate that ATP yields state-of-the-art performance, even when compared with methods that require access to source data.
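A minimal sketch of entropy minimization over target predictions, with an entropy threshold as a toy stand-in for the curriculum (easy, confident samples contribute first). Function names and the threshold value are illustrative assumptions, not the paper's formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy_minimization_loss(batch_logits, threshold=1.0):
    """Mean prediction entropy over samples whose entropy is below
    `threshold` -- confident samples are optimized first, a toy
    stand-in for a curriculum schedule."""
    kept = []
    for logits in batch_logits:
        p = softmax(logits)
        h = -sum(pi * math.log(pi + 1e-12) for pi in p)
        if h < threshold:
            kept.append(h)
    return sum(kept) / max(len(kept), 1)

confident = [5.0, 0.1, 0.1]   # near one-hot prediction -> low entropy
uncertain = [1.0, 1.0, 1.0]   # uniform prediction -> high entropy (~log 3)
loss = entropy_minimization_loss([confident, uncertain], threshold=1.0)
```

Minimizing this loss sharpens the target predictions without ever touching source data, which is the point of the source-free setting.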
Fitting network models to neural activity is an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this is widely used, we show that the resulting model does not produce realistic neural activity. To correct for this, we suggest augmenting the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience, and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics. We find that it improves upon other fitting algorithms for spiking network models like GLMs (Generalized Linear Models), which do not usually rely on back-propagation. This new fitting algorithm also enables the consideration of hidden neurons, which is otherwise notoriously hard, and we show that this can be crucial when trying to infer the network connectivity from spike recordings.
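The augmented objective can be sketched in miniature: a negative log-likelihood plus a penalty on the mismatch of a summary statistic. Here the statistic is mean firing rate and the numbers are made up; the real method differentiates through stochastic spike-train simulation, which this toy omits.

```python
def firing_rate(spikes):
    """Summary statistic: mean spike count per time bin."""
    return sum(spikes) / len(spikes)

def augmented_loss(log_likelihood, simulated, recorded, weight=1.0):
    """Negative log-likelihood plus a dissimilarity term comparing a
    summary statistic of simulated vs. recorded spike trains."""
    dissim = (firing_rate(simulated) - firing_rate(recorded)) ** 2
    return -log_likelihood + weight * dissim

recorded  = [0, 1, 0, 0, 1, 0, 1, 0]   # observed spike train (rate 0.375)
simulated = [1, 1, 0, 1, 1, 0, 1, 1]   # model sample: fires too often (rate 0.75)
loss = augmented_loss(log_likelihood=-2.0,
                      simulated=simulated, recorded=recorded)
```

A pure maximum-likelihood fit would ignore the rate mismatch entirely; the extra term is what pushes the simulated statistics toward the recorded ones.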
Estimating accurate lane lines in 3D space remains challenging due to their sparse and slender nature. In this work, we propose M^2-3DLaneNet, a multi-modal framework for effective 3D lane detection. Aiming to integrate complementary information from multiple sensors, M^2-3DLaneNet first extracts multi-modal features with modality-specific backbones and then fuses them in a unified Bird's-Eye View (BEV) space. Specifically, our method consists of two core components. 1) To obtain an accurate 2D-3D mapping, we propose top-down BEV generation, in which a Line-Restricted Deformable Attention (LRDA) module is utilized to effectively enhance image features in a top-down manner, fully capturing the slender features of lanes. It then projects the 2D pyramidal features into 3D space using depth-aware lifting and generates BEV features through pillarization. 2) We further propose bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from camera and LiDAR sensors. Extensive experiments demonstrate the effectiveness of M^2-3DLaneNet, which surpasses previous state-of-the-art methods by a large margin, i.e., a 12.1% F1-score improvement on the OpenLane dataset.
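Depth-aware lifting can be illustrated in one dimension: a 2D pixel feature is spread along candidate depths, weighted by a predicted depth distribution, and accumulated into a BEV grid. This is a deliberately reduced toy (scalar feature, 1D BEV row), not the paper's actual lifting operator.

```python
def lift_to_bev(pixel_feature, depth_probs, depth_bins, bev):
    """Depth-aware lifting: distribute a pixel feature across depth
    bins according to the predicted depth distribution, accumulating
    into a 1D BEV row (toy stand-in for the full 3D grid)."""
    for prob, d in zip(depth_probs, depth_bins):
        bev[d] += prob * pixel_feature
    return bev

bev = [0.0] * 4                      # 4 cells along the ego axis
depth_probs = [0.1, 0.6, 0.2, 0.1]   # predicted distribution over bins
bev = lift_to_bev(pixel_feature=2.0, depth_probs=depth_probs,
                  depth_bins=[0, 1, 2, 3], bev=bev)
```

Because the distribution is soft, uncertain depth predictions smear the feature across several BEV cells instead of committing to a single (possibly wrong) depth.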
Image-goal navigation is a challenging task, as it requires the agent to navigate to a goal indicated by an image in a previously unseen scene. Current methods introduce various memory mechanisms that save the navigation history to solve this task. However, these methods use all observations in the memory to generate navigation actions without considering which fraction of this memory is informative. To address this limitation, we present MemoNav, a novel memory mechanism for image-goal navigation that retains the agent's short-term memory and long-term memory to improve navigation performance on multi-goal tasks. The node features on the agent's topological map are stored in the short-term memory, as these features are dynamically updated. To aid the short-term memory, we also generate long-term memory by continuously aggregating the short-term memory via a graph attention module. MemoNav retains the informative fraction of the short-term memory via a forgetting module based on a Transformer decoder, and then incorporates this retained short-term memory and the long-term memory into working memory. Finally, the agent uses the working memory for action generation. We evaluate our model on new multi-goal navigation datasets. Experimental results show that MemoNav outperforms SoTA methods while using a smaller fraction of the navigation history. The results also empirically show that our model is less likely to be trapped in a deadlock, which further validates that MemoNav improves the agent's navigation efficiency by reducing redundant steps.
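The retain-then-merge step can be sketched with a toy forgetting rule: keep only the top-scoring short-term entries, then concatenate with the long-term memory to form the working memory. In the paper the forgetting module is a Transformer decoder; the top-k rule and all names below are illustrative simplifications.

```python
def forget_and_merge(short_term, scores, long_term, keep=2):
    """Retain the top-`keep` short-term memory entries by attention
    score (toy forgetting module), then concatenate with the
    long-term memory to form the working memory."""
    ranked = sorted(zip(scores, short_term), reverse=True)
    retained = [feat for _, feat in ranked[:keep]]
    return retained + long_term

short_term = ["node_a", "node_b", "node_c", "node_d"]  # topological-map nodes
scores     = [0.9, 0.1, 0.7, 0.2]                      # informativeness scores
working = forget_and_merge(short_term, scores, long_term=["goal_summary"])
```

Action generation then attends only over this compact working memory rather than the full navigation history, which is where the efficiency gain comes from.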
Embedding-based neural topic models can explicitly represent words and topics by embedding them into a homogeneous feature space, showing higher interpretability. However, there are no explicit constraints on the training of embeddings, leading to a larger optimization space. Moreover, a clear description of the changes in embeddings and their effects on model performance is still lacking. In this paper, we propose an embedding-regularized neural topic model that applies specially designed training constraints on word embeddings and topic embeddings to reduce the optimization space of parameters. To reveal the changes and roles of embeddings, we introduce uniformity into embedding-based neural topic models as an evaluation metric of the embedding space. On this basis, we describe how embeddings change during training via the changes in embedding uniformity. Furthermore, we demonstrate the effects of changes in embeddings in embedding-based neural topic models through ablation studies. Results of experiments on two mainstream datasets show that our model significantly outperforms baseline models in the harmony between topic quality and document modeling. To the best of our knowledge, this work is the first to exploit uniformity to explore the changes in embeddings of embedding-based neural topic models and their effects on model performance.
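The abstract does not give the formula, but "uniformity" in representation learning usually refers to the metric of Wang & Isola (2020): the log of the mean Gaussian potential over embedding pairs, where more negative means more uniformly spread. A small sketch under that assumption:

```python
import math

def uniformity(embeddings, t=2.0):
    """Uniformity metric: log of the mean Gaussian potential over all
    distinct embedding pairs; lower (more negative) values indicate
    embeddings spread more uniformly over the space."""
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(embeddings[i], embeddings[j]))
            total += math.exp(-t * sq_dist)
            pairs += 1
    return math.log(total / pairs)

collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # all identical
spread    = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # spread on the unit circle
u_collapsed = uniformity(collapsed)
u_spread = uniformity(spread)
```

Tracking this value over training epochs is one way to "describe how embeddings change", as the abstract proposes.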
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT retains strong robustness even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
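One generic way to encode 3D point coordinates into token features, so that tokens from different modalities share an implicit spatial frame, is a sinusoidal position encoding. This is a standard technique sketched for illustration, not CMT's specific coordinate encoder.

```python
import math

def point_position_encoding(point, num_freqs=2):
    """Sinusoidal encoding of a 3D point: each coordinate is mapped to
    sin/cos pairs at increasing frequencies, yielding a feature vector
    that can be added to modality tokens for implicit spatial alignment."""
    enc = []
    for coord in point:
        for k in range(num_freqs):
            freq = 2.0 ** k
            enc.append(math.sin(freq * coord))
            enc.append(math.cos(freq * coord))
    return enc

enc = point_position_encoding((0.0, math.pi / 2, 1.0))
```

Because the same encoding is applied to coordinates from both the image frustum and the LiDAR point cloud, tokens that refer to nearby 3D locations end up with similar positional features.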
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are twofold: First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
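The first insight, mask-guided dynamic class centers, reduces to masked average pooling followed by re-weighting. The sketch below uses scalar per-location features for readability; the function names and the multiplicative re-weighting rule are illustrative assumptions, not RefT's exact module.

```python
def masked_class_center(feature_map, mask):
    """Masked average pooling: average support features inside the
    object mask to obtain a dynamic class center."""
    inside = [f for f, m in zip(feature_map, mask) if m == 1]
    return sum(inside) / len(inside)

def reweight_query(query_features, center):
    """Re-weight query features by the class center (a toy stand-in
    for similarity-based re-weighting)."""
    return [q * center for q in query_features]

support_feats = [2.0, 4.0, 0.0, 0.0]  # per-location support features
support_mask  = [1, 1, 0, 0]          # 1 = inside the annotated object
center = masked_class_center(support_feats, support_mask)
reweighted = reweight_query([1.0, 0.5], center)
```

Pooling only inside the mask keeps background clutter out of the class center, which is why the support mask matters beyond localization.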
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
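The general shape of fairness-aware distillation is a distillation loss plus a bias regularizer. Below is a toy sketch using the demographic parity gap as that regularizer; the gap-based penalty, the weight `lam`, and all names are illustrative assumptions, not RELIANT's actual objective.

```python
def demographic_parity_gap(preds, groups):
    """Absolute difference in mean positive score between the two
    sensitive groups (labeled 0 and 1)."""
    def mean_score(g):
        scores = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(scores) / len(scores)
    return abs(mean_score(0) - mean_score(1))

def fair_kd_loss(kd_loss, student_preds, groups, lam=1.0):
    """Distillation objective augmented with a fairness regularizer
    (a toy stand-in for a debiasing term)."""
    return kd_loss + lam * demographic_parity_gap(student_preds, groups)

preds  = [0.9, 0.8, 0.2, 0.1]   # student's positive-class scores
groups = [0, 0, 1, 1]           # sensitive attribute per node
loss = fair_kd_loss(kd_loss=0.5, student_preds=preds, groups=groups)
```

Because the regularizer depends only on the student's outputs and the sensitive attribute, such a term is naturally decoupled from the teacher and student architectures, matching the plug-and-play property the abstract highlights.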