In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language. It is therefore challenging to model the pronunciation and intonation of the second language accurately in different contexts without mutual interference. This paper builds a Mandarin-English TTS system that produces more standard spoken English from a monolingual Chinese speaker. We introduce a phonology embedding to capture how English differs across phonologies. An embedding mask is applied to the language embedding to separate information between languages, and to the phonology embedding to focus on English expression. We also design an embedding strength modulator to capture the dynamic strength of language and phonology. Experiments show that our approach produces significantly more natural and standard spoken English from the monolingual Chinese speaker. Our analysis finds that suitable phonology control contributes to better performance in different scenarios.
The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world. We propose a dataset with paired visual and tactile data called Touch and Go, in which human data collectors probe objects in natural environments using tactile sensors, while simultaneously recording egocentric video. In contrast to previous efforts, which have largely been confined to lab settings or simulated environments, our dataset spans a large number of "in the wild" objects and scenes. To demonstrate our dataset's effectiveness, we successfully apply it to a variety of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven image stylization, i.e., making the visual appearance of an object more consistent with a given tactile signal, and 3) predicting future frames of a tactile signal from visuo-tactile inputs.
Datasets for training recommender systems are often subject to distribution shift induced by users' and recommenders' selection biases. In this paper, we study the impact of selection bias on datasets with different quantization. We then leverage two differently quantized datasets from different source distributions to mitigate distribution shift by applying the inverse probability scoring method from causal inference. Empirically, our approach gains significant performance improvement over single-dataset methods and alternative ways of combining two datasets.
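The inverse probability scoring idea mentioned above can be illustrated with a minimal sketch. The abstract does not say how the probabilities are estimated or how the two datasets are combined, so the function name `ips_weighted_mse`, the `clip` threshold, and the self-normalised form are all assumptions made for illustration. Each observed rating is weighted by the inverse of its (assumed known) probability of being observed.

```python
import numpy as np

def ips_weighted_mse(ratings, predictions, propensities, clip=0.05):
    """Inverse-probability-weighted squared error over observed ratings.

    Hypothetical helper: `propensities` holds the assumed probability that
    each rating was observed; how to estimate it is outside this sketch.
    """
    ratings = np.asarray(ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Clip from below so rarely observed items do not dominate the loss.
    p = np.clip(np.asarray(propensities, dtype=float), clip, 1.0)
    weights = 1.0 / p
    # Self-normalised estimator: divide by the sum of the weights.
    squared_error = (ratings - predictions) ** 2
    return float(np.sum(weights * squared_error) / np.sum(weights))
```

With equal propensities the weights cancel and the value reduces to the ordinary mean squared error; with unequal propensities, under-observed ratings count for more.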
Considering the spectral properties of images, we propose a new self-attention mechanism with greatly reduced computational complexity, as low as linear. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study the case where the process operates only on low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them into the head of a CNN, which we refer to as FsaNet. The frequency self-attention 1) takes low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) stage and the token mixing stage simultaneously. We show that the frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug.
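As fully actuated systems, omnidirectional multirotor aerial vehicles (OMAVs) are more maneuverable than traditional underactuated multirotor aircraft, and they have greater advantages in obstacle-avoidance flight in complex environments. Exploiting this potential requires full-degree-of-freedom trajectories, and because of the high dimensionality of the configuration space, making the trajectory generation algorithm efficient and scalable is a challenge. This paper aims to achieve obstacle-avoidance planning for OMAVs in complex environments. For the first time, a 6-DoF trajectory generation framework for OMAVs is designed on top of the geometrically constrained minimum control effort (MINCO) trajectory generation framework. Given safe regions represented by a series of convex polyhedra, and combining constraints on the overall shape of the aircraft with dynamic constraints, the framework generates a collision-free, optimal 6-DoF trajectory. The vehicle's attitude is parameterized as a 3D vector via stereographic projection. Simulation experiments based on Gazebo and the PX4 Autopilot are conducted to verify the performance of the proposed framework.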
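The stereographic-projection attitude parameterization mentioned above can be sketched as follows. The abstract does not give the exact formulation, so this sketch assumes the attitude is a unit quaternion `(w, x, y, z)` projected from the pole `w = -1` onto a 3D vector; the function names `quat_to_vec` and `vec_to_quat` are hypothetical.

```python
import numpy as np

def quat_to_vec(q):
    """Map a unit quaternion (w, x, y, z) to a 3D vector by stereographic
    projection from the pole w = -1 (the map is undefined at that pole)."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)  # guard against small normalisation drift
    w, xyz = q[0], q[1:]
    return xyz / (1.0 + w)

def vec_to_quat(v):
    """Inverse map: any 3D vector gives a unit quaternion, so an optimiser
    can work on unconstrained vectors instead of normalised quaternions."""
    v = np.asarray(v, dtype=float)
    s = float(v @ v)
    return np.concatenate(([1.0 - s], 2.0 * v)) / (1.0 + s)
```

A practical appeal of such a parameterization is that every 3D vector corresponds to a valid attitude, which removes the unit-norm constraint from the optimization variables.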
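A knowledge graph (KG) stores a large amount of structured knowledge, but it is not easy for humans to understand directly. Knowledge graph-to-text (KG-to-text) generation aims to produce easy-to-understand sentences from a KG while maintaining semantic consistency between the generated sentences and the KG. Existing KG-to-text methods frame this as a sequence-to-sequence generation task with the linearized KG as input, and address the consistency between the generated text and the KG through a simple selection between the decoded sentence word and a KG node word at each time step. However, the linearized KG order is usually obtained by heuristic search, without data-driven optimization. In this paper, we optimize knowledge description order prediction under order supervision extracted from the caption, and further enhance the consistency between the generated sentences and the KG through syntactic and semantic regularization. We incorporate part-of-speech (POS) syntactic tags to constrain the positions at which words are copied from the KG, and we employ a semantic context scoring function to evaluate the semantic fitness of each word in its local context when decoding each word of the generated sentence. Extensive experiments on two datasets, WebNLG and DART, achieve state-of-the-art performance.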
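In today's stock market, computer science and technology are used more and more widely to analyze stocks. Unlike most related work on machine-learning stock price prediction, this work studies the stock price trend on the day after a company discloses its annual report. We use a variety of models, including decision trees, logistic regression, random forests, neural networks, and prototypical networks. We run experiments with two sets of financial indicators (key and extended), obtained from company disclosures on the Eastmoney website, and find that these models do not predict the trend well. In addition, we filter for stocks with ROE greater than 0.15 and net cash ratio greater than 0.9. We conclude that, based on financial indicators from a company's just-released annual report, the predictability of the stock price movement on the day after disclosure is weak: the highest accuracy is about 59.6% and the highest precision is about 0.56 on our test set, both achieved by the random forest classifier, and stock filtering does not improve performance. Among all these models, random forests perform best overall, which is consistent with the findings of some previous work.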
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
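As a rough illustration of the NAIVEATTACK step described above (adding triggers to the raw data before distillation begins), the following sketch stamps a fixed patch onto a fraction of the images and relabels them. The abstract does not specify the trigger shape, its position, the poisoning ratio, or the relabeling rule, so the function name `add_trigger` and all of its parameters are assumptions for illustration only; DOORPING's iterative trigger updates are not shown.

```python
import numpy as np

def add_trigger(images, labels, target_label, patch_size=2,
                poison_ratio=0.1, seed=0):
    """Stamp a white square in the bottom-right corner of a random subset
    of images and relabel that subset to `target_label`.

    Hypothetical helper: expects images shaped (N, H, W) or (N, H, W, C)
    with pixel values in [0, 1]. The inputs are copied, not modified.
    """
    images = np.array(images, dtype=float, copy=True)
    labels = np.array(labels, copy=True)
    rng = np.random.default_rng(seed)
    n_poison = int(round(poison_ratio * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # The trigger: set the corner patch to the maximum pixel value.
    images[idx, -patch_size:, -patch_size:] = 1.0
    labels[idx] = target_label
    return images, labels, idx
```

The poisoned arrays would then be handed to a dataset distillation procedure in place of the clean raw data.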
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). Together they help the model learn complex distortion patterns and optimize the regression problem in an easy-to-hard order that mirrors the human learning process. To verify the effectiveness of PMT-IQA, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we use as user features the 20 user property features with the greatest information gain, together with user tweet features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.