In this paper, we carry out numerical analysis to prove convergence of a novel sample-wise back-propagation method for training a class of stochastic neural networks (SNNs). The structure of the SNN is formulated as discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to the back-propagation for training the SNN. The convergence analysis is derived with and without convexity assumption for optimization of the SNN parameters. Especially, our analysis indicates that the number of SNN training steps should be proportional to the square of the number of layers in the convex optimization case. Numerical experiments are carried out to validate the analysis results, and the performance of the sample-wise back-propagation method for training SNNs is examined by benchmark machine learning examples.
translated by 谷歌翻译
实时音乐伴奏的生成在音乐行业(例如音乐教育和现场表演)中具有广泛的应用。但是,自动实时音乐伴奏的产生仍在研究中,并且经常在逻辑延迟和暴露偏见之间取决于权衡。在本文中,我们提出了Song Driver,这是一种无逻辑延迟或暴露偏见的实时音乐伴奏系统。具体而言,Songdriver将一个伴奏的生成任务分为两个阶段:1)安排阶段,其中变压器模型首先安排了和弦,以实时进行输入旋律,并在下一阶段加速了和弦,而不是播放它们。 2)预测阶段,其中CRF模型基于先前缓存的和弦生成了即将到来的旋律的可播放的多轨伴奏。通过这种两相策略,歌手直接生成即将到来的旋律的伴奏,从而达到了零逻辑延迟。此外,在预测时间步的和弦时,歌手是指第一阶段的缓存和弦,而不是其先前的预测,这避免了暴露偏见问题。由于输入长度通常在实时条件下受到限制,因此另一个潜在的问题是长期顺序信息的丢失。为了弥补这一缺点,我们在当前时间步骤作为全球信息之前从长期音乐作品中提取了四个音乐功能。在实验中,我们在一些开源数据集上训练歌手,以及由中国风格的现代流行音乐得分构建的原始\```````'''aisong数据集。结果表明,歌手在客观和主观指标上均优于现有的SOTA(最先进)模型,同时大大降低了物理潜伏期。
translated by 谷歌翻译
真实世界的文本应用程序通常涉及组成广泛的文本控制操作,例如编辑文本W.R.T.属性,操纵关键字和结构,并生成所需属性的新文本。事先的工作通常会学习/芬太尼语言模型(LM)以执行操作的个人或特定子集。最近的研究以插件方式研究了合并操作,通常在复杂序列空间中以昂贵的搜索或优化进行了研究。本文提出了一种新的有效方法,用于在紧凑的文本潜在空间中进行可复合的文本操作。文本潜在矢量的低维度和不同性使我们能够基于给定的任意插入运算符(例如属性分类器)基于普通微分方程(ODE)开发有效的采样器。通过通过有效的适应性将预告片的LMS(例如GPT2)连接到潜在空间,然后我们将采样向量解码为所需的文本序列。灵活的方法允许使用来自不同域中的任何相关数据获取的各种控制操作员(情感,时态,形式,关键字等)。实验表明,在我们的方法中构成这些操作员可以生成或编辑高质量文本,从而在发电质量和效率方面显着改善了以前的方法。
translated by 谷歌翻译
高级驾驶员辅助系统(ADA)旨在提高车辆安全性。但是,如果不了解当前ADA及其可能的解决方案的原因和局限性,就很难获得此类收益。这项研究1)通过文献综述研究了ADA的局限性和解决方案,2)通过使用自然语言处理模型来确定ADA通过消费者投诉的原因和影响,3)比较了两者之间的主要差异。这两条研究线确定了类似的ADA原因类别,包括人为因素,环境因素和车辆因素。但是,学术研究更多地集中在ADA问题的人为因素上,并提出了高级算法来减轻此类问题,而驾驶员抱怨ADAS失败的更多车辆因素,这导致了最大的后果。这两个来源的发现倾向于相互补充,并为未来的改善ADA提供了重要意义。
translated by 谷歌翻译
凝视估计方法从面部特征学习眼睛凝视。然而,在面部图像中的丰富信息中,真正的凝视相关特征只对应于眼睛区域的微妙变化,而其他凝视无关的功能,如照明,个人外观和甚至面部表情可能会以意想不到的方式影响学习。这是现有方法在跨域/数据集评估中显示出显着性能下降的主要原因。在本文中,我们在凝视估计中解决了跨域问题。与常见的域适应方法不同,我们提出了一种域泛化方法,以改善跨域性能而不触摸目标样本。通过凝视特征纯化实现域泛化。我们消除了凝视无关的因素,如照明和身份,以改善跨域性能。我们设计了用于凝视功能净化的即插即用自我对抗框架。该框架不仅增强了我们的基线,而且直接且显着地增强了现有的凝视估计方法。据我们所知,我们是第一个提出凝视估计中的域泛化方法的。我们的方法不仅可以在典型的凝视估计方法之间实现最先进的性能,而且在域适应方法中也是竞争结果。代码在https://github.com/yihuacheng/puregaze中发布。
translated by 谷歌翻译
Increasing research interests focus on sequential recommender systems, aiming to model dynamic sequence representation precisely. However, the most commonly used loss function in state-of-the-art sequential recommendation models has essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers the vanishing gradient problem from numerous negative sampling and predictionbiases; Binary Cross-Entropy (BCE) loss subjects to negative sampling numbers, thereby it is likely to ignore valuable negative examples and reduce the training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representation. To avoid these limitations, in this paper, we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, which enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models GRU4Rec, SASRec, and S3-Rec can reach 125.63%, 69.90%, and 33.24% average improvement of full ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data increases rapidly with the wall clock time, and is superior to that of other loss functions in almost the whole process of model training.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
Supervised Deep-Learning (DL)-based reconstruction algorithms have shown state-of-the-art results for highly-undersampled dynamic Magnetic Resonance Imaging (MRI) reconstruction. However, the requirement of excessive high-quality ground-truth data hinders their applications due to the generalization problem. Recently, Implicit Neural Representation (INR) has appeared as a powerful DL-based tool for solving the inverse problem by characterizing the attributes of a signal as a continuous function of corresponding coordinates in an unsupervised manner. In this work, we proposed an INR-based method to improve dynamic MRI reconstruction from highly undersampled k-space data, which only takes spatiotemporal coordinates as inputs. Specifically, the proposed INR represents the dynamic MRI images as an implicit function and encodes them into neural networks. The weights of the network are learned from sparsely-acquired (k, t)-space data itself only, without external training datasets or prior images. Benefiting from the strong implicit continuity regularization of INR together with explicit regularization for low-rankness and sparsity, our proposed method outperforms the compared scan-specific methods at various acceleration factors. E.g., experiments on retrospective cardiac cine datasets show an improvement of 5.5 ~ 7.1 dB in PSNR for extremely high accelerations (up to 41.6-fold). The high-quality and inner continuity of the images provided by INR has great potential to further improve the spatiotemporal resolution of dynamic MRI, without the need of any training data.
translated by 谷歌翻译
Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LM. The newly introduced lookup dictionary incorporates rich contextual information in training set, which is vital to correctly predict long-tail tokens. With intensive experiments on Chinese and English data sets, our proposed method is proved to outperform the baseline Transformer LM by a great margin on both word/character error rate and tail tokens error rate. This is achieved without impact on the decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting the ASR decoding performance, especially for long-tail tokens.
translated by 谷歌翻译
The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming the corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function mapping from the part embedding vector to a corresponded 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder generating the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.
translated by 谷歌翻译