Transformer-based models have been widely shown to succeed in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns, which leads to the loss of local details (e.g., boundaries and small objects) that are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs: the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA aggregates and represents both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module aggregates boundary characteristics from low-level features and semantic information from high-level features to better preserve boundary details and locate the re-calibration objects. Extensive experiments on six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images and of polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations such as small object segmentation and ambiguous object boundaries.
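Cyrillic and Traditional Mongolian are the two main members of the Mongolian writing system. The Cyrillic-Traditional Mongolian Bidirectional Conversion (CTMBC) task comprises two conversion processes: Cyrillic Mongolian to Traditional Mongolian (C2T) and Traditional Mongolian to Cyrillic Mongolian (T2C). Previous researchers adopted the traditional joint sequence model, since the CTMBC task is a natural sequence-to-sequence (Seq2Seq) modelling problem. Recent studies have shown that encoder-decoder models based on recurrent neural networks (RNN) and self-attention (Transformer) bring significant improvements in machine translation between major languages such as Mandarin, English, and French. However, it remains an open question whether RNN and Transformer models can improve CTMBC quality. To answer this question, this paper investigates the utility of these two powerful techniques for the CTMBC task, taking into account the agglutinative characteristics of the Mongolian language. We build encoder-decoder CTMBC models based on RNN and Transformer, respectively, and compare different network configurations in depth. Experimental results show that both the RNN and Transformer models outperform the traditional joint sequence model, with the Transformer achieving the best performance. Compared with the joint sequence baseline, the Transformer reduces the word error rate (WER) for C2T and T2C by 5.72% and 5.06%, respectively.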
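The word error rate quoted above can be illustrated with a minimal sketch. It assumes the common edit-distance definition of WER (substitutions, deletions, and insertions divided by the number of reference words); the abstract does not state which variant the authors used, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit-distance WER over whitespace-separated words.

    Assumption: standard Levenshtein-based WER; the paper's exact
    scoring procedure is not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one wrongly converted word in a four-word reference gives a WER of 0.25, so a reduction of about five percentage points corresponds to roughly one fewer error per twenty reference words.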
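This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by more than 10 million people worldwide. The dataset, named MNTTS, consists of about 8 hours of audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedure and the challenges we faced. To demonstrate the reliability of the dataset, we built a strong non-autoregressive baseline system based on the FastSpeech2 model and the HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real-time factor (RTF) metrics. The evaluation results show that the baseline system trained on our dataset achieves a MOS above 4 and an RTF of about $3.30 \times 10^{-1}$, which makes it suitable for practical use. The dataset, training recipe, and pretrained TTS models are freely available.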
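The two metrics named above can be sketched as follows. This assumes the usual definitions: RTF is synthesis time divided by the duration of the generated audio, and MOS is the arithmetic mean of listener ratings on a 1 to 5 scale. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the synthesized audio.

    Values below 1 mean the system runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def mean_opinion_score(ratings: list[int]) -> float:
    """MOS = arithmetic mean of listener ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings) / len(ratings)


# Illustrative numbers only: 3.3 s of compute for 10 s of speech
# corresponds to an RTF of about 0.33.
example_rtf = real_time_factor(3.3, 10.0)
example_mos = mean_opinion_score([4, 5, 4, 4, 5])
```

Read this way, an RTF of about 0.33 means the system needs roughly a third of a second of compute per second of generated speech.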
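Online streaming feature selection (OSFS), which performs feature selection in an online manner, plays an important role in handling high-dimensional data. In many real applications, such as intelligent healthcare platforms, streaming features always contain some missing data, which raises a crucial challenge for OSFS: how to establish the uncertain relationship between sparse streaming features and labels. Unfortunately, existing OSFS algorithms have never considered this uncertain relationship. To fill this gap, we propose an online sparse streaming feature selection with uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent factor analysis is used to pre-estimate the missing data in sparse streaming features before feature selection is performed, and 2) fuzzy logic and neighborhood rough sets are used to alleviate the uncertainty between the estimated streaming features and the labels during feature selection. In the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms on six real datasets. The results show that OS2FSU outperforms its competitors when missing data are encountered in OSFS.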
Colonoscopy, currently the most efficient and widely recognized technology for detecting colon polyps, is necessary for early screening and prevention of colorectal cancer. However, because colonic polyps vary in size and have complex morphological features, and because the boundary between polyps and mucosa is indistinct, accurate polyp segmentation remains challenging. Deep learning has become popular for polyp segmentation and has produced excellent results. However, owing to the structure of polyp images and the varying shapes of polyps, existing deep learning models easily overfit the current dataset. As a result, a model may fail on unseen colonoscopy data. To address this, we propose a new state-of-the-art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. The SSFormer achieves state-of-the-art performance in both learning and generalization assessment.