The goal of multimodal abstractive summarization (MAS) is to produce a concise summary given multimodal data (text and vision). Existing studies on MAS mainly focus on how to effectively use the extracted visual features, and have achieved impressive success on high-resource English datasets. However, less attention has been paid to how relevant the visual features are to the summary, which may limit model performance, especially in low- and zero-resource scenarios. In this paper, we propose to improve summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks: a \emph{vision-to-summary task} and a \emph{masked image modeling task}. Together with the main summarization task, we optimize the MAS model via the training objectives of all these tasks. In this way, the MAS model is enhanced by capturing summary-oriented visual features, thereby yielding more accurate summaries. Experiments on 44 languages, covering mid-high-, low-, and zero-resource scenarios, verify the effectiveness and superiority of the proposed approach, which achieves state-of-the-art performance under all scenarios.
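The joint optimization of the main summarization task with the two auxiliary tasks can be sketched as a weighted sum of the three training objectives. This is a minimal sketch; the lambda weights are illustrative assumptions, not the paper's actual configuration.

```python
def joint_loss(l_sum, l_v2s, l_mim, lambda_v2s=0.5, lambda_mim=0.5):
    """Combine the main summarization loss with the two auxiliary losses.

    l_sum: loss of the main summarization task
    l_v2s: loss of the vision-to-summary auxiliary task
    l_mim: loss of the masked image modeling auxiliary task
    The lambda weights are hypothetical; the actual balance may differ.
    """
    return l_sum + lambda_v2s * l_v2s + lambda_mim * l_mim
```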
This paper introduces the joint submission of Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we utilize the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and a boosted self-COMET-based model ensemble. Our systems achieve COMET scores of 0.810 on English-German and 0.946 on German-English, the highest among all submissions.
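One plausible reading of the "self-COMET-based model ensemble" is consensus-style candidate selection: each system output is scored by COMET against the other candidates as pseudo-references, and the highest-scoring candidate is chosen. The sketch below assumes this interpretation; `comet_score` is a hypothetical scoring callable, not the Unbabel COMET API.

```python
def self_comet_select(candidates, comet_score):
    """Pick the candidate with the highest average score when the remaining
    candidates serve as pseudo-references (an assumed reading of self-COMET
    ensembling; comet_score is a hypothetical hypothesis/reference scorer)."""
    best, best_score = None, float("-inf")
    for i, cand in enumerate(candidates):
        refs = [c for j, c in enumerate(candidates) if j != i]
        avg = sum(comet_score(cand, r) for r in refs) / len(refs)
        if avg > best_score:
            best, best_score = cand, avg
    return best
```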
Implicit neural representations have shown promising potential for 3D scene reconstruction. Recent work applies them to autonomous 3D reconstruction by learning an information gain function for view path planning. Effective as they are, the computation of information gain is expensive, and collision checking for 3D points with an implicit representation is much slower than with volumetric representations. In this paper, we propose to 1) leverage a neural network as an implicit function approximator of the information gain field and 2) combine the fine-grained implicit representation with a coarse volumetric one to improve efficiency. With the improved efficiency, we propose a novel informative path planning method based on a graph-based planner. Our method demonstrates significant improvements in both reconstruction quality and planning efficiency compared with autonomous reconstruction using either implicit or explicit representations alone. We deploy the method on a real UAV, and the results show that our method can plan informative views and reconstruct scenes with high quality.
Implicit neural representations have shown compelling results in offline 3D reconstruction and have recently also demonstrated potential for online SLAM systems. However, applying them to autonomous 3D reconstruction, where a robot explores a scene and plans a view path for reconstruction, has not yet been studied. In this paper, we explore for the first time the possibility of using an implicit neural representation for autonomous 3D scene reconstruction by addressing two key challenges: 1) seeking a criterion to measure the quality of candidate viewpoints given the new representation, and 2) learning this criterion from data so that it generalizes to different scenes, instead of hand-crafting it. For the first challenge, a proxy of the peak signal-to-noise ratio (PSNR) is proposed to quantify viewpoint quality. The proxy is obtained by treating the color of each spatial point in the scene as a random variable under a Gaussian distribution rather than a deterministic value; the variance of the distribution quantifies the uncertainty of the reconstruction and composes the proxy. For the second challenge, the proxy is optimized jointly with the parameters of the scene's implicit neural network. With the proposed view quality criterion, we can apply the new representation to autonomous 3D reconstruction. Our method demonstrates significant improvements on various metrics compared with variants using TSDF representations or reconstruction without view planning.
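The PSNR proxy described above can be illustrated schematically: if each spatial point's color is modeled as a Gaussian, the mean predicted variance can stand in for the mean squared error in the standard PSNR formula. This is a minimal sketch of that idea under stated assumptions, not the paper's exact formulation.

```python
import math

def psnr_proxy(color_variances, peak=1.0):
    """Proxy viewpoint quality: plug the mean predicted color variance into
    the PSNR formula in place of the MSE (an illustrative simplification).
    Higher values mean lower reconstruction uncertainty from this view."""
    mean_var = sum(color_variances) / len(color_variances)
    return 10.0 * math.log10(peak * peak / mean_var)
```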
Karyotyping is an important procedure for assessing the possible existence of chromosomal abnormalities. However, because of their non-rigid nature, chromosomes usually appear curved in microscopic images, and such deformed shapes hinder chromosome analysis by cytogeneticists. In this paper, we propose a self-attention guided framework to erase the curvature of chromosomes. The proposed framework extracts spatial information and local textures to preserve banding patterns in a regression module. With complementary information from the curved chromosome, a refinement module is designed to further improve fine details. In addition, we propose two dedicated geometric constraints to maintain chromosome length and restore the distortion of chromosomes. To train our framework, we create a synthetic dataset in which curved chromosomes are generated from real-world straight chromosomes by grid deformation. Quantitative and qualitative experiments are conducted on both synthetic and real-world data. Experimental results show that our proposed method can effectively straighten curved chromosomes while keeping banding details and length.
Recent years have witnessed the rapid growth of Small Private Online Courses (SPOCs), which can be highly customized and personalized to adapt to variable educational requests. Machine learning techniques have been explored to summarize and predict learner performance, mostly focusing on the final grade. However, the final grades of learners on SPOCs are generally seriously imbalanced, which handicaps the training of prediction models. To solve this problem, a sampling batch normalization embedded deep neural network (SBNEDNN) method is developed in this paper. First, a combined indicator is defined to measure the distribution of the data, and then a rule is established to guide the sampling process. Second, batch normalization (BN) modified layers are embedded into a fully connected neural network to solve the data imbalance problem. Experimental results against three other deep learning methods demonstrate the superiority of the proposed method.
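As a minimal illustration of the batch normalization step embedded in the network (the full SBNEDNN, its sampling rule, and its combined indicator are not reproduced here), a plain-Python BN forward pass over one feature looks like:

```python
import math

def batch_norm_1d(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```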
Aspect-based sentiment analysis (ABSA) mainly involves three subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification, which are typically handled in a separate or joint manner. However, previous approaches do not well exploit the interactive relations among the three subtasks and do not fully utilize the easily available document-level labeled domain/sentiment knowledge, which restricts their performance. To address these issues, we propose a novel Iterative Multi-Knowledge Transfer Network (IMKTN) for end-to-end ABSA. For one thing, through the interactive correlations among the ABSA subtasks, our IMKTN transfers task-specific knowledge from any two of the three subtasks to the remaining one by leveraging a well-designed routing algorithm; that is, any two of the three subtasks help the third. For another, our IMKTN pertinently transfers document-level knowledge, i.e., domain-specific and sentiment-related knowledge, to the aspect-level subtasks to further improve the corresponding performance. Experimental results on three benchmark datasets demonstrate the effectiveness and superiority of our approach.
Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway or urban areas. However, most existing learning-based models trained on fixed datasets may perform poorly in continuously changing scenarios. Specifically, they may not perform well in previously learned scenarios after learning a new one. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory prediction in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed by utilizing a measurement of traffic divergence between scenarios, which balances the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, which mitigates catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM
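The "traffic divergence between scenarios" suggests a distributional distance over driving statistics. The paper's exact measure is not given here, so the sketch below uses a symmetric KL divergence over discrete scenario feature distributions as an assumed stand-in.

```python
import math

def scenario_divergence(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions of
    scenario statistics (an assumed stand-in for the paper's measure).
    p and q are probability vectors of the same length."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return kl_pq + kl_qp
```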
Given a document in a source language, cross-lingual summarization (CLS) aims at generating a concise summary in a different target language. Unlike monolingual summarization (MS), naturally occurring source-language documents paired with target-language summaries are rare. To collect large-scale CLS samples, existing datasets typically involve translation in their creation. However, the translated text is distinguished from the text originally written in that language, i.e., translationese. Though many efforts have been devoted to CLS, none of them notice the phenomenon of translationese. In this paper, we first confirm that the different approaches to constructing CLS datasets will lead to different degrees of translationese. Then we design systematic experiments to investigate how translationese affects CLS model evaluation and performance when it appears in source documents or target summaries. In detail, we find that (1) the translationese in documents or summaries of test sets might lead to the discrepancy between human judgment and automatic evaluation; (2) the translationese in training sets would harm model performance in the real scene; (3) though machine-translated documents involve translationese, they are very useful for building CLS systems on low-resource languages under specific training strategies. Furthermore, we give suggestions for future CLS research including dataset and model developments. We hope that our work could let researchers notice the phenomenon of translationese in CLS and take it into account in the future.
Advertisement video editing aims to automatically edit advertising videos into shorter ones while retaining the coherent content and crucial information conveyed by advertisers. It mainly contains two stages: video segmentation and segment assemblage. Existing methods perform well at video segmentation but suffer from dependence on extra cumbersome models and poor performance in the segment assemblage stage. To address these problems, we propose M-SAN (Multi-modal Segment Assemblage Network), which can perform the segment assemblage task efficiently and coherently. It utilizes multi-modal representations extracted from the segments and follows an Encoder-Decoder Ptr-Net framework with an attention mechanism. An importance-coherence reward is designed for training M-SAN. We experiment on the Ads-1k dataset with 1000+ videos under rich advertising scenarios collected from advertisers. To evaluate the methods, we propose a unified metric, Imp-Coh@Time, which comprehensively assesses the importance, coherence, and duration of the outputs at the same time. Experimental results show that our method achieves better performance than random selection and the previous method on the metric. Ablation experiments further verify that the multi-modal representations and the importance-coherence reward significantly improve performance. The Ads-1k dataset is available at: https://github.com/yunlong10/ads-1k