Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing documentation-code pairs by embedding them into a latent space, without incorporating external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we utilize a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it is helpful to augment the documentation query with its generation counterpart: code snippets generated by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
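As a rough illustration of this idea, the sketch below expands a documentation query with a generated snippet before embedding and retrieving. `generate_code`, `encode`, and the pre-normalized corpus embeddings are hypothetical stand-ins, not the paper's actual models.

```python
# Minimal sketch of generation-augmented query expansion for code retrieval.
import numpy as np

def expand_and_retrieve(query: str, corpus_embs: np.ndarray,
                        generate_code, encode, k: int = 5):
    """Augment the documentation query with generated code, then retrieve."""
    generated = generate_code(query)        # sketch an answer before searching
    expanded = query + " " + generated      # query expansion by concatenation
    q = encode(expanded)                    # embed the expanded query
    q = q / np.linalg.norm(q)
    sims = corpus_embs @ q                  # cosine similarity (corpus rows pre-normalized)
    return np.argsort(-sims)[:k]            # indices of top-k code snippets
```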
Visual odometry is crucial for many robotic tasks such as autonomous exploration and path planning. Despite much progress, existing methods are still not robust to environments with dynamic illumination. In this paper, we present AirVO, an illumination-robust and accurate stereo visual odometry system based on point and line features. To be robust to illumination variation, we introduce a learning-based feature extraction and matching method and design a novel VO pipeline, including feature tracking, triangulation, key-frame selection, and graph optimization. We also employ long line features in the environment to improve the accuracy of the system. Different from the traditional line-processing pipelines in visual odometry systems, we propose an illumination-robust line tracking method, in which point feature tracking and the distribution of point and line features are utilized to match lines. In the experiments, the proposed system is extensively evaluated in environments with dynamic illumination, and the results show that it achieves superior performance to state-of-the-art algorithms.
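The abstract does not give the exact matching rule, but a minimal sketch of tracking lines through their associated point features might look as follows, assuming each line is represented by the set of keypoint IDs lying on it (an assumed data structure, not the paper's implementation).

```python
# Sketch: match lines across frames via the tracked points associated with them.
def match_lines(lines_prev, lines_cur, point_matches):
    """lines_*: {line_id: set(point_ids)}; point_matches: {prev_point_id: cur_point_id}."""
    matches = {}
    for lid, pts in lines_prev.items():
        tracked = {point_matches[p] for p in pts if p in point_matches}
        if not tracked:
            continue
        # Assign the previous line to the current line sharing the most tracked points.
        best = max(lines_cur, key=lambda c: len(tracked & lines_cur[c]))
        if tracked & lines_cur[best]:
            matches[lid] = best
    return matches
```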
Compared with human vision, computer vision based on convolutional neural networks (CNNs) is more vulnerable to adversarial noise. This difference may be attributed to how the eye samples visual input and how the brain processes retinal samples through its dorsal and ventral visual pathways, which remain underexplored in computer vision. Inspired by the brain, we design recurrent neural networks that comprise an input sampler mimicking the human retina, a dorsal network that guides where to look next, and a ventral network that represents the retinal samples. Combining these modules, the models learn to take multiple glances at an image, attend to a salient part at each glance, and accumulate representations over time to recognize the image. We test the robustness of such models against varying levels of adversarial noise, with special attention to the effects of different input-sampling strategies. Our findings suggest that retinal foveation and sampling make the models more robust, and that a model may correct itself from an attack when given a longer time to take more glances at the image. In conclusion, robust visual recognition can benefit from the combined use of three brain-inspired mechanisms: retinal transformation, attention-guided eye movements, and recurrent processing, as opposed to feedforward-only CNNs.
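A minimal PyTorch sketch of such a glimpse loop follows, under assumed module sizes and with a crude crop-based stand-in for retinal sampling; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlimpseModel(nn.Module):
    def __init__(self, feat_dim=128, n_classes=10):
        super().__init__()
        self.ventral = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, feat_dim), nn.ReLU())
        self.rnn = nn.GRUCell(feat_dim, feat_dim)   # accumulates evidence over glimpses
        self.dorsal = nn.Linear(feat_dim, 2)        # predicts next fixation (x, y) in [-1, 1]
        self.head = nn.Linear(feat_dim, n_classes)

    def retina(self, img, loc, size=16):
        # Crop a foveal patch around `loc` (a crude stand-in for retinal sampling).
        b, _, h, w = img.shape                      # img assumed (B, 1, H, W)
        cx = ((loc[:, 0] + 1) / 2 * (w - size)).long()
        cy = ((loc[:, 1] + 1) / 2 * (h - size)).long()
        return torch.stack([img[i, 0, cy[i]:cy[i] + size, cx[i]:cx[i] + size]
                            for i in range(b)])

    def forward(self, img, n_glimpses=4):
        b = img.size(0)
        h = torch.zeros(b, self.rnn.hidden_size)
        loc = torch.zeros(b, 2)                     # start at the image center
        for _ in range(n_glimpses):
            feat = self.ventral(self.retina(img, loc))
            h = self.rnn(feat, h)                   # recurrent accumulation
            loc = torch.tanh(self.dorsal(h))        # dorsal net guides the next fixation
        return self.head(h)
```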
Clustering-based methods, which alternate between pseudo-label generation and feature-extraction-network optimization, play a dominant role in unsupervised learning (USL) and unsupervised domain adaptation (UDA) for person re-identification (re-ID). To mitigate the adverse effects of noisy pseudo labels, existing methods either discard unreliable labels or refine the pseudo labels via mutual learning or label propagation. However, many erroneous labels still accumulate, because these methods mostly adopt conventional unsupervised clustering algorithms, which rely on assumptions about the data distribution and fail to capture the distribution of complex real-world data. In this paper, we propose a plug-and-play graph-based pseudo-label correction network (GLC) to refine pseudo labels in a supervised-clustering fashion. GLC is trained to perceive the varying data distribution at each epoch of self-training, supervised by the initial pseudo labels generated by any clustering method. It learns to rectify the initial noisy labels through relational constraints between samples on a k-nearest-neighbor (KNN) graph together with an early-stopping training strategy. Specifically, GLC learns to aggregate node features from neighbors and predict whether nodes should be linked on the graph. Moreover, GLC is optimized with early stopping, before it severely memorizes the noisy labels, to prevent overfitting to the noisy pseudo labels. Consequently, although the supervision signal contains some noise, GLC improves the quality of the pseudo labels, enabling better re-ID performance. Extensive experiments on Market-1501 and MSMT17 for USL and UDA person re-ID show that our method is widely compatible with various clustering-based methods and consistently promotes state-of-the-art performance.
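A minimal PyTorch sketch of the graph step described above, with assumed layer sizes: aggregate each node's features from its KNN neighbors and score whether a candidate pair of nodes should be linked (i.e., share a pseudo label).

```python
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.agg = nn.Linear(2 * dim, dim)          # fuse self and neighbor-mean features
        self.cls = nn.Linear(2 * dim, 1)            # score a candidate edge

    def forward(self, feats, knn_idx, edges):
        # feats: (N, dim); knn_idx: (N, k) neighbor indices; edges: (E, 2) candidate pairs.
        neigh = feats[knn_idx].mean(dim=1)          # neighborhood aggregation
        h = torch.relu(self.agg(torch.cat([feats, neigh], dim=1)))
        pair = torch.cat([h[edges[:, 0]], h[edges[:, 1]]], dim=1)
        return torch.sigmoid(self.cls(pair)).squeeze(1)  # link probability per edge
```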
In person re-identification (ReID), recent studies have verified that pre-training models on unlabeled person images performs much better than pre-training on ImageNet. However, these studies directly apply existing self-supervised learning (SSL) methods designed for image classification to ReID without any adaptation of the framework. These SSL methods match the output of a local view (e.g., a red T-shirt, blue shorts) to that of the simultaneous global view, losing much detail. In this paper, we propose a ReID-specific pre-training method, Part-Aware Self-Supervised pre-training (PASS), which generates part-level features that provide fine-grained information and is better suited to ReID. PASS divides an image into several local areas, and the local view randomly cropped from each area is assigned a specific learnable [PART] token. On the other hand, the [PART] tokens of all local areas are also appended to the global view. PASS learns to match the output of the local and global views on the same [PART]; that is, the [PART] of a local view, learned from a local area, is matched only to the corresponding [PART] learned from the global view. As a result, each [PART] can focus on a specific local area of the image and extract fine-grained information from that area. Experiments show that PASS sets new state-of-the-art performance on Market-1501 and MSMT17 across various ReID tasks; e.g., a vanilla ViT-S/16 with PASS achieves 92.2%/90.2%/88.5% mAP on Market-1501 for supervised/UDA/USL ReID. Our code is available at https://github.com/casia-iva-lab/pass-reid.
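A minimal PyTorch sketch, with assumed dimensions, of attaching learnable [PART] tokens in this spirit: each local crop carries its own token, and the global view carries all of them, so matching can be done per-[PART].

```python
import torch
import torch.nn as nn

class PartTokens(nn.Module):
    def __init__(self, n_parts=3, dim=384):
        super().__init__()
        # One learnable [PART] token per local area.
        self.parts = nn.Parameter(torch.randn(n_parts, dim) * 0.02)

    def local_input(self, patch_tokens, part_id):
        # patch_tokens: (B, L, dim) from a local crop of area `part_id`.
        tok = self.parts[part_id].expand(patch_tokens.size(0), 1, -1)
        return torch.cat([tok, patch_tokens], dim=1)

    def global_input(self, patch_tokens):
        # The global view gets all [PART] tokens, so each can attend to its own area.
        toks = self.parts.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        return torch.cat([toks, patch_tokens], dim=1)
```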
Object encoding and identification are crucial for robotic tasks such as autonomous exploration, semantic scene understanding, and re-localization. Previous approaches have attempted to either track objects or generate descriptors for object identification. However, such systems are limited to a "fixed" partial object representation from a single viewpoint. In a robot-exploration setting, a temporally "evolving" global object representation is needed, since the robot observes an object from multiple viewpoints. Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint-graph-based embeddings of objects. Specifically, the global 3D object embeddings are generated using a temporal convolutional network across the structural information obtained from multiple frames by a graph-based encoding method. We demonstrate that AirObject achieves state-of-the-art performance for video object identification and is robust to severe occlusion, perceptual aliasing, viewpoint shift, deformation, and scale transformation, outperforming state-of-the-art single-frame and sequential descriptors. To the best of our knowledge, AirObject is one of the first temporal object encoding methods.
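A minimal PyTorch sketch, with assumed dimensions, of the temporal step: a 1-D temporal convolution over per-frame graph embeddings, pooled into one global object descriptor. The graph-based per-frame encoder is assumed to exist upstream.

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, frame_embs):
        # frame_embs: (B, T, dim) graph embeddings of the object over T frames.
        x = self.tcn(frame_embs.transpose(1, 2))    # convolve along the time axis
        return x.mean(dim=2)                        # (B, dim) global object embedding
```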
This paper presents the Microsoft end-to-end neural text-to-speech (TTS) system, DelightfulTTS, for the Blizzard Challenge 2021. The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: the first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates; the second is to model the variation information in speech through systematic design, which improves prosody and naturalness. Specifically, for 48 kHz modeling, we predict 16 kHz mel-spectrograms in the acoustic model and propose a vocoder called HiFiNet to directly generate 48 kHz waveforms from the predicted 16 kHz mel-spectrograms, which better balances training efficiency, modeling stability, and voice quality. We systematically model variation information from explicit (speaker ID, language ID, pitch, and duration) and implicit (utterance-level and phoneme-level prosody) perspectives: 1) for speaker and language ID, we use lookup embeddings in training and inference; 2) for pitch and duration, we extract the values from paired text-speech data in training and use two predictors to predict the values in inference; 3) for utterance-level and phoneme-level prosody, we use two reference encoders to extract the values in training and two separate predictors to predict the values in inference. In addition, we introduce an improved Conformer block to better model the local and global dependencies in the acoustic model. For task SH1, DelightfulTTS achieves a mean score of 4.17 in the MOS test and 4.35 in the SMOS test, which demonstrates the effectiveness of our proposed system.
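A minimal PyTorch sketch, under assumed shapes, of the explicit-variance pattern described in point 2) for pitch (duration would follow the same pattern): ground-truth values condition the model in training, while a predictor supplies them in inference.

```python
import torch
import torch.nn as nn

class VarianceAdaptor(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.pitch_pred = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.pitch_emb = nn.Linear(1, dim)           # embed a scalar pitch value

    def forward(self, h, pitch_target=None):
        # h: (B, L, dim) phoneme hidden states.
        pitch = self.pitch_pred(h)                   # (B, L, 1) predicted pitch
        used = pitch_target if pitch_target is not None else pitch
        # Condition on pitch; return the prediction so a loss can supervise it in training.
        return h + self.pitch_emb(used), pitch
```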
Object encoding and identification are vital for many robotic tasks, such as autonomous exploration and semantic reconstruction. Existing works rely on tracking detected objects but have difficulty accurately recalling revisited objects. In this paper, we propose a novel object encoding method based on a graph of keypoints, named AirCode. To be robust to the number of detected keypoints, we propose a feature sparse encoding and object dense encoding method to ensure that each keypoint can only affect a small part of the object descriptor, making the descriptor robust to viewpoint changes, scaling, occlusion, and even object deformation. In the experiments, we show that it achieves superior performance for object identification compared with state-of-the-art algorithms and is able to provide reliable semantic relocalization. It is a plug-and-play module, and we expect it to play an important role in various applications.
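The following sketch illustrates the spirit of the sparse-encoding idea, not the paper's exact formulation: each keypoint feature is sparsified so that it can only write to a few descriptor dimensions, which bounds any single keypoint's influence on the object descriptor.

```python
import torch

def sparse_object_descriptor(kpt_feats: torch.Tensor, k: int = 8) -> torch.Tensor:
    """kpt_feats: (N, D) keypoint features -> (D,) object descriptor (assumes k <= D)."""
    # Keep only each keypoint's top-k strongest dimensions; zero the rest.
    _, idx = kpt_feats.abs().topk(k, dim=1)
    mask = torch.zeros_like(kpt_feats).scatter_(1, idx, 1.0)
    sparse = kpt_feats * mask                        # each keypoint touches <= k dims
    desc = sparse.sum(dim=0)                         # dense aggregation over keypoints
    return desc / (desc.norm() + 1e-8)
```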
Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering. Nevertheless, the reasons for its effectiveness for recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses on GCN, which was originally designed for graph classification tasks and equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component of GCN, neighborhood aggregation, for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF), a state-of-the-art GCN-based recommender model, under exactly the same experimental setting. Further analyses are provided on the rationality of the simple LightGCN from both analytical and empirical perspectives. Our implementations are available in both TensorFlow and PyTorch.
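A minimal PyTorch sketch of LightGCN's core computation, assuming `adj_norm` is the symmetrically normalized user-item adjacency (a sparse tensor built elsewhere) and using uniform layer weights as the paper's default:

```python
import torch

def lightgcn_embeddings(emb0: torch.Tensor, adj_norm: torch.Tensor, n_layers: int = 3):
    """emb0: (n_users + n_items, d) initial embeddings; adj_norm: sparse (N, N)."""
    embs = [emb0]
    e = emb0
    for _ in range(n_layers):
        e = torch.sparse.mm(adj_norm, e)        # neighborhood aggregation only
        embs.append(e)
    return torch.stack(embs).mean(dim=0)        # layer combination (uniform weights)
```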
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve computation efficiency, which raises a great challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore the vast design space, trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. There is plenty of specialized hardware for neural networks, but little research has been done on specialized neural network optimization for a particular hardware architecture. Conventional quantization algorithms ignore the different hardware architectures and quantize all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduces the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design. (* indicates equal contributions.)

Figure 1 (latency vs. top-1 accuracy for MobileNets at 1-3 MB model sizes): We need mixed precision for different layers. We quantize MobileNets [12] to different numbers of bits (both weights and activations), and it lies on a better Pareto curve (yellow) than fixed-bit quantization (blue). The reason is that different layers have different redundancy and different arithmetic intensity (OPs/byte) on the hardware, which advocates using mixed precision for different layers.
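A minimal sketch of such a hardware-in-the-loop search, with hypothetical `agent` and `simulator` objects and an assumed reward shaping: the RL agent picks per-layer bitwidths and receives direct latency/energy feedback from a hardware simulator rather than proxy signals.

```python
def search_quantization_policy(agent, simulator, eval_accuracy, n_layers, episodes=100):
    best_policy, best_reward = None, float("-inf")
    for _ in range(episodes):
        policy = [agent.act(layer) for layer in range(n_layers)]  # bitwidth per layer (1-8)
        latency, energy = simulator.run(policy)                   # direct hardware feedback
        acc = eval_accuracy(policy)                               # quantized-model accuracy
        reward = acc - 0.1 * latency - 0.1 * energy               # assumed reward shaping
        agent.update(policy, reward)
        if reward > best_reward:
            best_policy, best_reward = policy, reward
    return best_policy
```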