Graph neural networks (GNNs) have achieved remarkable success in link prediction (GNNLP) tasks. Existing efforts first predefine one subgraph scheme for the whole dataset and then apply GNNs to encode edge representations by leveraging the neighborhood structure induced by that fixed subgraph. The performance of GNNLP methods therefore relies heavily on this ad hoc subgraph. Since node connectivity in real-world graphs is complex, a single shared subgraph cannot suit all edges, so the choice of subgraph should be personalized to each edge. However, personalized subgraph selection is nontrivial because the selection space grows exponentially with the number of edges. Moreover, the edges seen at inference are not available during training in link prediction scenarios, so the selection process needs to be inductive. To bridge this gap, we introduce the Personalized Subgraph Selector (PS2), a plug-and-play framework that automatically, personally, and inductively identifies optimal subgraphs for different edges when performing GNNLP. PS2 is instantiated as a bi-level optimization problem that can be efficiently solved differentiably. Coupling GNNLP models with PS2, we suggest a new angle on GNNLP training: first identify the optimal subgraphs for edges, then train the inference model using the sampled subgraphs. Comprehensive experiments confirm the effectiveness of our proposed method across various GNNLP backbones (GCN, GraphSage, NGCF, LightGCN, and SEAL) and diverse benchmarks (Planetoid, OGB, and Recommendation datasets). Our code is publicly available at \url{https://github.com/qiaoyu-tan/PS2}.
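To make the idea of an edge-specific subgraph concrete, the following is a minimal sketch of extracting the k-hop enclosing subgraph around one candidate edge. It is not the PS2 selector: the abstract does not state what the selection space looks like, so treating the hop count `k` as the per-edge choice is an assumption made purely for illustration, and the function names and toy graph are hypothetical.

```python
from collections import deque


def k_hop_nodes(adj, source, k):
    """Return all nodes within k hops of `source` (breadth-first search)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def enclosing_subgraph(adj, u, v, k):
    """Nodes and undirected edges of the k-hop enclosing subgraph around (u, v)."""
    nodes = k_hop_nodes(adj, u, k) | k_hop_nodes(adj, v, k)
    edges = {(a, b) for a in nodes for b in adj.get(a, ()) if b in nodes and a < b}
    edges.discard((min(u, v), max(u, v)))  # hide the target link itself
    return nodes, edges


# Toy undirected graph (hypothetical): path 0-1-2-3-4 plus a chord 1-3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

# A fixed scheme would use the same k for every edge; a personalized
# scheme (assumed here) could pick a different k per candidate edge.
nodes_1, edges_1 = enclosing_subgraph(adj, 1, 3, k=1)
nodes_0, edges_0 = enclosing_subgraph(adj, 1, 3, k=0)
```

With `k=0` the subgraph contains only the two endpoints, while `k=1` already covers the whole toy graph, which shows how strongly the encoded neighborhood depends on this one choice.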
Vision Transformers (ViTs) have achieved overwhelming success, yet they scale poorly across resolutions: performance drops drastically when they are presented with input resolutions unseen during training. We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of mostly unseen testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to exchange information across scales. More importantly, to alternate among varying resolutions, we propose a global-local positional embedding strategy that changes smoothly conditioned on input size. This allows ResFormer to cope with novel resolutions effectively. We conduct extensive experiments for image classification on ImageNet. The results provide strong quantitative evidence that ResFormer scales well across a wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1 accuracy of 75.86% and 81.72% when evaluated on relatively low and high resolutions respectively (i.e., 96 and 640), which are 48% and 7.49% better than DeiT-B. We also demonstrate, among other things, that ResFormer is flexible and can be easily extended to semantic segmentation and video action recognition.
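This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MNTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced. To demonstrate the reliability of the dataset, we built a powerful non-autoregressive baseline system based on the FastSpeech2 model and the HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real-time factor (RTF) metrics. The evaluation results show that the baseline system trained on our dataset achieves a MOS above 4 and an RTF of about $3.30 \times 10^{-1}$, which makes it applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available\footnote{\label{github}\url{https://github.com/walker.com/walker-hyf/mntts}}.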
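The two metrics reported above can be computed as in the following sketch. It uses the standard definitions (RTF as synthesis time divided by the duration of the generated audio, MOS as the mean of listener ratings on a 1 to 5 scale); the timing and rating values are made up for illustration and are not taken from the paper.

```python
from statistics import mean


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF < 1 means the system synthesizes faster than real time."""
    return synthesis_seconds / audio_seconds


def mean_opinion_score(ratings):
    """Average of listener ratings, each on a 1 (bad) to 5 (excellent) scale."""
    return mean(ratings)


# Illustrative numbers only: 3.3 s of compute to produce 10 s of speech.
rtf = real_time_factor(synthesis_seconds=3.3, audio_seconds=10.0)

# Illustrative ratings from six hypothetical listeners.
mos = mean_opinion_score([4, 5, 4, 4, 3, 5])
```

An RTF of about $3.30 \times 10^{-1}$ therefore means roughly 0.33 seconds of computation per second of generated speech.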
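Real-time object detection on unmanned aerial vehicles (UAVs) is a challenging problem because edge GPU devices, acting as Internet of Things (IoT) nodes, have limited computing resources. To address this problem, we propose a novel lightweight deep learning architecture based on the YOLOX model for real-time object detection on edge GPUs. First, we design an effective and lightweight PixSF head to replace the original YOLOX head and better detect small objects; it can be further embedded with depthwise separable convolution (DS Conv) to obtain an even lighter head. Then, a slimmer structure is developed in the neck layer to reduce the network parameters, which is a trade-off between accuracy and speed. Furthermore, we embed attention modules in the head layer to improve the feature extraction of the prediction head. Meanwhile, we improve the label assignment strategy and the loss function to alleviate the class imbalance and box optimization problems of UAV datasets. Finally, auxiliary heads are introduced for online distillation to improve position embedding and feature extraction in the PixSF head. The performance of our lightweight models is validated experimentally on the NVIDIA Jetson NX and Jetson Nano GPU embedded platforms. Extensive experiments show that, compared with current models, the FasterX model achieves a better trade-off between accuracy and latency on the VisDrone2021 dataset.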
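We present the results of the Workshop on Multilingual Information Access (MIA) 2022 Shared Task, which evaluates cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages. In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and used newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog and Tamil. Four teams submitted their systems. The best system, which leverages iteratively mined diverse negative examples and larger pretrained models, achieves 32.2 F1, outperforming our baseline by 4.5 points. The second-best system uses entity-aware contextualized representations for document retrieval and achieves a significant improvement in Tamil (20.8 F1), whereas most of the other systems score nearly zero.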
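Transformer language models pretrained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns of inorganic materials. Here we train seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) using the expanded formulas of materials deposited in the ICSD, OQMD, and Materials Project databases. Six different datasets, with and without non-charge-neutral or electronegativity-unbalanced samples, are used to benchmark performance and to uncover the generation biases of modern transformer models for the generative design of materials compositions. Our extensive experiments show that materials transformers based on causal language models can generate chemically valid materials compositions, of which up to 97.54\% are charge neutral and 91.40\% are electronegativity balanced, an enrichment more than 6 times higher than that of a baseline pseudo-random sampling algorithm. These models also demonstrate high novelty, and their potential in new materials discovery is demonstrated by their ability to recover held-out materials. We also find that the properties of the generated samples can be tailored by training the models on selected training sets, such as high-bandgap materials. Our experiments further show that each model has its own preference in terms of the properties of the generated samples, and that their running time complexity varies considerably. We have applied our materials transformer models to discover a set of new materials, validated using DFT calculations.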
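The charge-neutrality criterion used above to judge generated compositions can be illustrated with a small check. This is a sketch under stated assumptions, not the paper's validation code: the oxidation-state table is a tiny hand-picked subset, and the check assigns a single oxidation state to each element, so mixed-valence compounds are not handled.

```python
from itertools import product

# Hypothetical, hand-picked oxidation states for a few elements (illustration only).
OXIDATION_STATES = {
    "Na": (1,),
    "Cl": (-1,),
    "Fe": (2, 3),
    "O": (-2,),
    "Ti": (2, 3, 4),
}


def is_charge_neutral(composition):
    """composition maps element symbol -> atom count.

    Returns True if some choice of one oxidation state per element
    makes the total charge sum to zero.
    """
    elements = list(composition)
    candidates = (OXIDATION_STATES[e] for e in elements)
    for states in product(*candidates):
        total = sum(s * composition[e] for s, e in zip(states, elements))
        if total == 0:
            return True
    return False


nacl = is_charge_neutral({"Na": 1, "Cl": 1})
fe2o3 = is_charge_neutral({"Fe": 2, "O": 3})
tio2 = is_charge_neutral({"Ti": 1, "O": 2})
nacl2 = is_charge_neutral({"Na": 1, "Cl": 2})
```

A composition such as the hypothetical `NaCl2` fails the check because no allowed oxidation state of Na balances two chloride ions.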
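The Navier-Stokes equations are important partial differential equations that describe the motion of fluids such as liquids and air. Because of their importance, developing efficient numerical schemes for them matters to both science and engineering. Recently, with the development of AI techniques, several approaches have been devised that integrate deep neural networks into simulating and inferring the fluid dynamics governed by the incompressible Navier-Stokes equations, accelerating simulation or inference in a mesh-free and differentiable way. In this paper, we point out that existing deep Navier-Stokes informed methods are limited when handling non-smooth or fractional equations, two situations that are critical in practice. To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines a neural network with a random vortex dynamics system equivalent to the Navier-Stokes equations. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids computing derivatives through automatic differentiation. Therefore, DRVM not only solves Navier-Stokes equations involving rough paths, non-differentiable initial conditions, and fractional operators efficiently, but also inherits the mesh-free and differentiable advantages of deep-learning-based solvers. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem for both 2-D and 3-D incompressible Navier-Stokes equations. The proposed method achieves accurate results for the simulation and inference of the Navier-Stokes equations. Especially for cases that include singular initial conditions, DRVM significantly outperforms existing PINN methods.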
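For readers unfamiliar with random vortex dynamics, the sketch below shows the classical 2-D particle form of the idea: vortex particles carry circulation, the velocity is recovered as a sum over particles through the Biot-Savart kernel, and viscosity enters as Brownian noise on the particle paths. This is the textbook random vortex method, not DRVM itself; the neural network and the Monte Carlo loss of the paper are not shown, and the mollification parameter `delta` and all function names are assumptions for illustration.

```python
import math
import random


def biot_savart(dx, dy, delta=0.1):
    """Mollified 2-D Biot-Savart kernel K(x) = x_perp / (2*pi*(|x|^2 + delta^2))."""
    r2 = dx * dx + dy * dy + delta * delta
    return -dy / (2 * math.pi * r2), dx / (2 * math.pi * r2)


def velocity(x, y, particles, strengths, delta=0.1):
    """Particle estimate u(x) = sum_j Gamma_j * K(x - X_j)."""
    ux = uy = 0.0
    for (px, py), gamma in zip(particles, strengths):
        kx, ky = biot_savart(x - px, y - py, delta)
        ux += gamma * kx
        uy += gamma * ky
    return ux, uy


def step(particles, strengths, nu, dt, rng, delta=0.1):
    """One Euler-Maruyama step of dX = u(X) dt + sqrt(2*nu) dW."""
    sigma = math.sqrt(2 * nu * dt)
    moved = []
    for x, y in particles:
        ux, uy = velocity(x, y, particles, strengths, delta)
        moved.append((x + ux * dt + sigma * rng.gauss(0, 1),
                      y + uy * dt + sigma * rng.gauss(0, 1)))
    return moved


# A single vortex of circulation 2*pi at the origin induces unit
# counterclockwise velocity at distance 1 (no mollification here).
u_point = velocity(1.0, 0.0, [(0.0, 0.0)], [2 * math.pi], delta=0.0)

# Two equal vortices rotate around each other; nu=0 makes the step deterministic.
pair = step([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0], nu=0.0, dt=0.1,
            rng=random.Random(0))
```

Because the velocity is evaluated as a sum over sampled particles rather than by differentiating a network output, a loss built on this representation needs no derivatives of the solution, which is the property the abstract highlights for rough or singular initial data.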
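The Generic Event Boundary Detection (GEBD) task aims to detect generic, taxonomy-free event boundaries that segment a whole video into chunks. In this paper, we apply masked autoencoders to improve algorithm performance on the GEBD task. Our approach mainly adopts an ensemble of masked autoencoders fine-tuned on the GEBD task, used as self-supervised learners alongside other base models. Moreover, we use a semi-supervised pseudo-label method to take full advantage of the abundant unlabeled Kinetics-400 data during training. In addition, we propose a soft-label method to partially balance the positive and negative samples and to alleviate the problem of ambiguous labeling in this task. Lastly, a segmentation alignment strategy is implemented to refine the boundaries predicted by our models to more accurate locations. With our approach, we achieve an F1 score of 85.94% on the Kinetics-GEBD test set, an improvement of 2.31% over the winner of the 2021 Kinetics-GEBD Challenge. Our code is available at https://github.com/contentandmaterialportortait/mae-gebd.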
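As Vision Transformers (ViTs) have made great advances in a variety of computer vision tasks, recent literature has proposed many variants of the vanilla ViT to achieve better efficiency and efficacy. However, it remains unclear how their distinct architectures affect robustness to common corruptions. In this paper, we make the first attempt to probe the robustness gap among ViT variants and to explore the underlying designs that are essential for robustness. Through extensive and rigorous benchmarking, we demonstrate that simple architectural designs, such as overlapping patch embedding and a convolutional feed-forward network (FFN), can promote the robustness of ViTs. Moreover, since training ViTs relies heavily on data augmentation, it is worth investigating whether previous CNN-based augmentation strategies aimed at robustness are still useful. We explore different data augmentations on ViTs and verify that adversarial noise training is powerful, whereas Fourier-domain augmentation is inferior. Based on these findings, we introduce a novel conditional method that generates dynamic augmentation parameters conditioned on the input image, offering state-of-the-art robustness to common corruptions.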
To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide a comprehensive, semantic representation of heterogeneous data and have been successfully leveraged in many biomedical applications, including drug repurposing. Our objective is to construct a knowledge graph from the literature to study relations between Alzheimer's disease (AD) and chemicals, drugs, and dietary supplements, in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train knowledge graph completion models (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among the three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We used the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most of the highly ranked candidates predicted by our model, which indicates that our approach can yield reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The constructed knowledge graph can facilitate data-driven knowledge discovery and the generation of novel hypotheses.
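The reported metrics can be read more easily with a small worked example. The sketch below scores triples with the standard TransE function (the negative distance between head plus relation and tail) and then computes mean rank (MR) and Hits@k from the resulting ranks. The 2-D embeddings, the entity names, and the `treats` relation vector are invented for illustration, and the ranking is unfiltered; none of this reproduces the paper's trained model or evaluation protocol.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_of_true_tail(h, r, true_tail, entities):
    """1-based rank of the true tail among all candidate entities."""
    scores = {name: transe_score(h, r, vec) for name, vec in entities.items()}
    true_score = scores[true_tail]
    return 1 + sum(1 for s in scores.values() if s > true_score)


def mean_rank(ranks):
    """Average rank of the true entity; lower is better."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical 2-D embeddings and relation vector (illustration only).
entities = {
    "AD": (0.0, 0.0),
    "drug_a": (1.0, 0.0),
    "drug_b": (0.9, 0.3),
    "supp_c": (3.0, 3.0),
}
treats = (-1.0, 0.0)

# Test triples of the form (head, treats, AD).
ranks = [rank_of_true_tail(entities[head], treats, "AD", entities)
         for head in ("drug_a", "drug_b", "supp_c")]
mr = mean_rank(ranks)
hits1 = hits_at_k(ranks, 1)
```

In this toy setup two of the three triples rank the true tail first and one ranks it last, so a low MR and a high Hits@1 both indicate that true triples score well against the alternatives.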