This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language-independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
Translated by Google Translate
Subword units are an effective way to alleviate the open vocabulary problems in neural machine translation (NMT). While sentences are usually converted into unique subword sequences, subword segmentation is potentially ambiguous and multiple segmentations are possible even with the same vocabulary. The question addressed in this paper is whether it is possible to harness the segmentation ambiguity as a noise to improve the robustness of NMT. We present a simple regularization method, subword regularization, which trains the model with multiple subword segmentations probabilistically sampled during training. In addition, for better subword sampling, we propose a new subword segmentation algorithm based on a unigram language model. We experiment with multiple corpora and report consistent improvements especially on low resource and out-of-domain settings.
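The sampling idea described above can be illustrated with a minimal sketch. It assumes a toy unigram vocabulary with made-up probabilities (`VOCAB`), enumerates all segmentations exhaustively (feasible only for short strings), and samples one in proportion to its unigram probability raised to a smoothing exponent `alpha`. The function names, the vocabulary, and the `alpha` default are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical toy unigram vocabulary: piece -> probability (illustrative values only).
VOCAB = {
    "h": 0.05, "e": 0.05, "l": 0.05, "o": 0.05,
    "he": 0.10, "ll": 0.10, "lo": 0.10,
    "hell": 0.15, "hello": 0.35,
}


def segmentations(text, vocab):
    """Enumerate every way to split `text` into pieces found in `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


def unigram_prob(pieces, vocab):
    """Probability of a segmentation under a unigram model: product of piece probabilities."""
    prob = 1.0
    for piece in pieces:
        prob *= vocab[piece]
    return prob


def best_segmentation(text, vocab):
    """Deterministic choice: the single most probable segmentation."""
    return max(segmentations(text, vocab), key=lambda c: unigram_prob(c, vocab))


def sample_segmentation(text, vocab, alpha=0.5, rng=random):
    """Stochastic choice: sample a segmentation with weight P(segmentation) ** alpha.

    Smaller alpha flattens the distribution (more varied segmentations);
    larger alpha concentrates it on the most probable segmentation.
    """
    candidates = segmentations(text, vocab)
    weights = [unigram_prob(c, vocab) ** alpha for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    print("best:", best_segmentation("hello", VOCAB))
    for _ in range(3):
        print("sampled:", sample_segmentation("hello", VOCAB))
```

During training, a different sampled segmentation of the same sentence would be fed to the model at each step, whereas the deterministic segmentation would typically be used at inference time.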
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference, sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores was not reflected in the human evaluation. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
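The beam-search scoring mentioned above (length normalization plus a coverage penalty) can be sketched as follows. The abstract does not give the formulas; the forms used here, lp(Y) = (5 + |Y|)^α / (5 + 1)^α and cp(X; Y) = β · Σ_i log(min(Σ_j p_ij, 1)), are the ones commonly attributed to the GNMT paper and should be treated as an assumption. The default values of `alpha` and `beta`, the function names, and the small epsilon guard against log(0) are illustrative choices.

```python
import math


def length_penalty(length, alpha=0.2):
    """Length normalization term: ((5 + |Y|) / (5 + 1)) ** alpha."""
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)


def coverage_penalty(attention, beta=0.2, eps=1e-9):
    """Coverage term: beta * sum_i log(min(sum_j attention[j][i], 1.0)).

    attention[j][i] is the attention probability of the j-th target token
    on the i-th source token. `eps` is an added guard (an assumption, not
    part of the formula) that avoids log(0) for fully uncovered source tokens.
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(row[i] for row in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def beam_score(log_prob, attention, alpha=0.2, beta=0.2):
    """Hypothesis score: log P(Y|X) / lp(Y) + cp(X; Y)."""
    target_length = len(attention)
    return log_prob / length_penalty(target_length, alpha) + coverage_penalty(attention, beta)


if __name__ == "__main__":
    full = [[1.0, 0.0], [0.0, 1.0]]      # every source token fully attended
    partial = [[0.9, 0.1], [0.9, 0.1]]   # second source token under-attended
    print(beam_score(-2.0, full), beam_score(-2.0, partial))
```

With equal log-probabilities, the hypothesis whose attention leaves a source token under-covered receives the lower score, which is the behavior the coverage penalty is meant to encourage.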
This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Image-based methods have been proposed that use machine learning on time series of depth images to quantitatively and deterministically predict future received signal strength, in order to mitigate line-of-sight (LOS) path blockage by human bodies in mmWave communications. However, image-based methods have been limited in applicable environments because camera images may contain private information. Thus, this study demonstrates the feasibility of using point clouds obtained from light detection and ranging (LiDAR) for mmWave link quality prediction. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs the mmWave link quality prediction method and conducts two experimental evaluations using different types of point clouds obtained from LiDAR and depth cameras, as well as different numerical indicators of link quality: received signal strength and throughput. Based on these experiments, our proposed method can predict future large attenuation of mmWave link quality due to LOS blockage by human bodies; therefore, our point cloud-based method can be an alternative to image-based methods.
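Understanding dynamic hand motions and actions is a fundamental yet challenging task due to self-occlusion and ambiguity. To address occlusion and ambiguity, we develop a transformer-based framework that exploits temporal information for robust estimation. Noting the different temporal granularity of, and the semantic correlation between, hand pose estimation and action recognition, we build a network hierarchy with two cascaded transformer encoders: the first exploits short-term temporal cues for hand pose estimation, and the second aggregates per-frame pose and object information over a longer time span to recognize the action. Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O. Extensive ablation studies verify our design choices. We will open-source the code and data to facilitate future research.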
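Recent work has shown that tackling offline reinforcement learning (RL) with a conditional policy, which converts the RL task into a supervised learning task, can produce promising results. Decision Transformer (DT) combines the conditional policy approach with the Transformer architecture and shows competitive performance on several benchmarks. However, DT lacks stitching ability, one of the critical abilities for offline RL, which learns the optimal policy from sub-optimal trajectories. The issue becomes significant when the offline dataset contains only sub-optimal trajectories. On the other hand, conventional RL approaches based on dynamic programming (such as Q-learning) do not suffer from the same issue; however, they suffer from unstable learning behavior, especially when function approximation is employed in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT), which addresses the shortcomings of DT by leveraging the benefits of dynamic programming (Q-learning). QDT uses the dynamic programming (Q-learning) results to relabel the return-to-go in the training data. We then train DT with the relabeled data. Our approach efficiently exploits the benefits of both approaches and compensates for each other's shortcomings to achieve better performance. We demonstrate the issue with DT and the advantage of QDT in a simple environment. We also evaluate QDT on the more complex D4RL benchmark, showing good performance gains.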
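The relabeling step described above can be sketched in simplified form. This is one plausible reading, not the authors' algorithm: it assumes an undiscounted return, assumes that a value estimate `state_values[t]` for each visited state is already available from a separately trained Q-function, and replaces the return-to-go with that estimate whenever the estimate is larger, propagating the change backward through the trajectory. The function and argument names are hypothetical.

```python
def relabel_returns_to_go(rewards, state_values):
    """Relabel return-to-go along one trajectory, walking backward in time.

    rewards[t]      -- reward received at step t
    state_values[t] -- hypothetical learned value estimate V(s_t), e.g. max_a Q(s_t, a)

    At each step the return-to-go is the reward plus the (already relabeled)
    return-to-go of the next step; if the learned value estimate is higher,
    the estimate is used instead.
    """
    if len(rewards) != len(state_values):
        raise ValueError("rewards and state_values must have the same length")
    relabeled = [0.0] * len(rewards)
    next_rtg = 0.0  # return-to-go after the final step
    for t in reversed(range(len(rewards))):
        rtg = rewards[t] + next_rtg
        if state_values[t] > rtg:
            rtg = state_values[t]
        relabeled[t] = rtg
        next_rtg = rtg
    return relabeled


if __name__ == "__main__":
    # A sub-optimal trajectory whose middle state is estimated to be worth more
    # than what the trajectory actually collected from there onward.
    print(relabel_returns_to_go([0.0, 0.0, 1.0], [0.0, 5.0, 0.0]))
```

A sequence model trained on the relabeled data would then be conditioned on returns that reflect what is achievable from each state, rather than only what the sub-optimal trajectory happened to achieve.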
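This paper presents a Progressively-connected Light Field network (ProLiF) for novel view synthesis of complex forward-facing scenes. ProLiF encodes a 4D light field, which allows a large batch of rays to be rendered in one training step for image-level or patch-level losses. Directly learning a neural light field from images makes it difficult to render multi-view consistent images, because the light field is unaware of the underlying 3D geometry. To address this problem, we propose a progressive training scheme and regularization losses to infer the underlying geometry during training, both of which enforce multi-view consistency and thus greatly improve rendering quality. Experiments show that our method achieves significantly better rendering quality than vanilla neural light fields, and results comparable to NeRF-like rendering methods on the challenging LLFF dataset and the Shiny Object dataset. Moreover, we demonstrate better compatibility with the LPIPS loss, which provides robustness to varying lighting conditions, and with the CLIP loss, which allows control over the rendering style of the scene. Project page: https://totoro97.github.io/projects/prolif.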
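Reconstructing 3D indoor scenes from 2D images is an important task in many computer vision and graphics applications. A main challenge in this task is that texture-less regions in typical indoor scenes make it difficult for existing methods to produce satisfactory reconstruction results. We propose a new method, named NeuRIS, for high-quality reconstruction of indoor scenes. The key idea of NeuRIS is to integrate estimated normals of indoor scenes as a prior in a neural rendering framework in order to reconstruct large texture-less shapes and, importantly, to do so in an adaptive manner so that irregular shapes with fine details can also be reconstructed. Specifically, we evaluate the faithfulness of the normal priors by checking the multi-view consistency of the reconstruction during the optimization process. Only the normal priors accepted as faithful are used for 3D reconstruction, which typically happens in regions of smooth shapes, possibly with weak texture. However, for regions with small objects or thin structures, where the normal priors are usually unreliable, we rely only on the visual features of the input images, since such regions typically contain relatively rich visual features (e.g., shading changes and boundary contours). Extensive experiments show that NeuRIS significantly outperforms state-of-the-art methods in terms of reconstruction quality.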
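Understanding human intentions during interactions has been a long-standing theme, with applications in human-robot interaction, virtual reality, and surveillance. In this study, we focus on full-body interactions with large daily objects and aim to predict the future states of objects and humans given a sequential observation of human-object interaction. As there is no dataset dedicated to full-body interactions with large daily objects, we collected a large-scale dataset containing thousands of interactions for training and evaluation purposes. We also observe that an object's intrinsic physical properties are useful for object motion prediction, and thus design a set of object dynamic descriptors to encode such intrinsic properties. We treat the object dynamic descriptors as a new modality and propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task. We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects. We also demonstrate that the predicted results are useful for human-robot collaboration.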
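We introduce SparseNeuS, a novel neural rendering-based method for the task of surface reconstruction from multi-view images. This task becomes more difficult when only sparse images are provided as input, a scenario in which existing neural reconstruction approaches usually produce incomplete or distorted results. Moreover, their inability to generalize to unseen new scenes impedes their application in practice. In contrast, SparseNeuS can generalize to new scenes and works well with sparse images (as few as 2 or 3). SparseNeuS adopts a signed distance function (SDF) as the surface representation and learns generalizable priors from image features by introducing geometry encoding volumes for generic surface prediction. In addition, several strategies are introduced to effectively leverage sparse views for high-quality reconstruction, including 1) a multi-level geometry reasoning framework to recover surfaces in a coarse-to-fine manner; 2) a multi-scale color blending scheme for more reliable color prediction; and 3) a consistency-aware fine-tuning scheme to control the inconsistent regions caused by occlusion and noise. Extensive experiments show that our approach not only outperforms state-of-the-art methods but also exhibits good efficiency, generalizability, and flexibility.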