Robust prediction of citywide traffic flows at different time periods plays a crucial role in intelligent transportation systems. While previous work has made great efforts to model spatio-temporal correlations, existing methods still suffer from two key limitations: i) Most models collectively predict all regions' flows without accounting for spatial heterogeneity, i.e., different regions may have skewed traffic flow distributions. ii) These models fail to capture the temporal heterogeneity induced by time-varying traffic patterns, as they typically model temporal correlations with a shared parameterized space for all time periods. To tackle these challenges, we propose a novel Spatio-Temporal Self-Supervised Learning (ST-SSL) traffic prediction framework which enhances the traffic pattern representations to be reflective of both spatial and temporal heterogeneity, with auxiliary self-supervised learning paradigms. Specifically, our ST-SSL is built over an integrated module with temporal and spatial convolutions for encoding the information across space and time. To achieve the adaptive spatio-temporal self-supervised learning, our ST-SSL first performs the adaptive augmentation over the traffic flow graph data at both attribute- and structure-levels. On top of the augmented traffic graph, two SSL auxiliary tasks are constructed to supplement the main traffic prediction task with spatial and temporal heterogeneity-aware augmentation. Experiments on four benchmark datasets demonstrate that ST-SSL consistently outperforms various state-of-the-art baselines. Since spatio-temporal heterogeneity widely exists in practical datasets, the proposed framework may also cast light on other spatial-temporal applications. Model implementation is available at https://github.com/Echo-Ji/ST-SSL.
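Both augmentation levels described above can be sketched in a few lines. The following is a minimal, non-adaptive illustration with hypothetical names: uniform random feature masking (attribute level) and edge dropping (structure level), whereas ST-SSL's adaptive variant modulates these probabilities according to the traffic data itself.

```python
import random

def augment_traffic_graph(features, edges, feat_mask_p=0.1, edge_drop_p=0.1, seed=0):
    """Attribute-level: randomly mask traffic-flow features to zero.
    Structure-level: randomly drop edges from the region graph."""
    rng = random.Random(seed)
    aug_features = [
        [0.0 if rng.random() < feat_mask_p else v for v in node_feats]
        for node_feats in features
    ]
    aug_edges = [e for e in edges if rng.random() >= edge_drop_p]
    return aug_features, aug_edges
```

The two augmented views then feed the SSL auxiliary tasks alongside the main prediction task.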
Designing high-performance traffic flow prediction models, a core technology of intelligent transportation systems, is a long-standing yet still challenging task for both industry and academia. The lack of integration between physical principles and data-driven models is an important factor limiting the development of this field. In the literature, physics-based methods can usually offer a clear interpretation of the dynamic process of traffic flow systems but are of limited accuracy, while data-driven methods, especially deep learning with black-box structures, can achieve better performance yet cannot be fully trusted due to the lack of a sound physical basis. To bridge the gap between purely data-driven and physics-driven approaches, we propose a physics-guided deep learning model named Spatio-Temporal Differential Equation Network (STDEN), which casts the physical mechanism of traffic flow dynamics into a deep neural network framework. Specifically, we assume that traffic flow on road networks is driven by a latent potential energy field (just as water flow is driven by the gravity field), and model the spatio-temporal dynamics of this potential energy field as a differential equation network. STDEN absorbs both the performance advantage of data-driven models and the interpretability of physics-based models, hence its name of a physics-guided prediction model. Experiments on three real-world traffic datasets from Beijing show that our model outperforms state-of-the-art baselines. A case study further verifies that STDEN captures the mechanism of urban traffic and produces accurate predictions with physical meaning. The proposed framework of differential equation network modeling may also shed light on other similar applications.
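The potential-field intuition can be conveyed with a toy discretization. This sketch is not STDEN itself (which learns the field dynamics with neural networks); it only shows, under a hand-set constant conductance, how flows driven by potential differences redistribute mass along edges while conserving the total.

```python
def step_flow(potential, edges, dt=0.1, conductance=1.0):
    """One Euler step: the flow on each directed edge is proportional to the
    potential drop across it (analogous to water driven by gravity); node
    potentials then change by the net inflow/outflow."""
    flows = {(u, v): conductance * (potential[u] - potential[v]) for u, v in edges}
    new_potential = list(potential)
    for (u, v), f in flows.items():
        new_potential[u] -= dt * f  # mass leaves the higher-potential node
        new_potential[v] += dt * f  # and arrives at the lower-potential node
    return new_potential, flows
```

Iterating this step drives the potentials toward equilibrium, which is the kind of dynamic process the differential equation network models in learned form.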
Visual Emotion Analysis (VEA), which aims to predict people's emotions towards different visual stimuli, has recently become an attractive research topic. Rather than a single-label classification task, it is more reasonable to regard VEA as a Label Distribution Learning (LDL) problem, with labels obtained by voting from different individuals. Existing methods often predict visual emotion distributions in a unified network, neglecting the inherent subjectivity in the crowd voting process. In psychology, the Object-Appraisal-Emotion model has demonstrated that each individual's emotion is affected by their subjective appraisal, which is further shaped by affective memory. Inspired by this, we propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distributions. To depict the diversity of the crowd voting process, we first propose subjectivity appraising with multiple branches, where each branch simulates the emotion evocation process of a specific individual. Specifically, we construct affective memory with an attention-based mechanism to preserve each individual's unique emotional experience. A subjectivity loss is further proposed to guarantee divergence between different individuals. Moreover, we propose subjectivity matching, which aims to assign unordered emotion labels to individual predictions in a one-to-one correspondence using the Hungarian algorithm. Extensive experiments and comparisons are conducted on public visual emotion distribution datasets, and the results demonstrate that the proposed SAMNet consistently outperforms state-of-the-art methods. Ablation studies verify the effectiveness of our method, and visualizations demonstrate its interpretability.
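The one-to-one assignment step above can be sketched as follows. For a handful of branches, a brute-force search over permutations stands in for the Hungarian algorithm (which solves the same assignment problem in polynomial time); the function names and the L1 cost are illustrative choices, not the paper's exact loss.

```python
from itertools import permutations

def match_labels(pred_distributions, label_distributions, cost):
    """Assign each unordered ground-truth label distribution to exactly one
    branch prediction, minimizing total cost. Brute force over permutations:
    fine for a small number of branches."""
    n = len(pred_distributions)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost(pred_distributions[i], label_distributions[j])
                    for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

def l1(p, q):
    """L1 distance between two emotion distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

With predictions [[1, 0], [0, 1]] and labels [[0, 1], [1, 0]], the optimal permutation pairs each prediction with its identical label at zero cost.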
People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.
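The core evaluation reduces to comparing model scores within each minimal pair. A minimal sketch, assuming some sentence-scoring function such as a summed log-probability; the numeric scores below are made up purely for illustration, not real model outputs.

```python
def minimal_pair_accuracy(pairs, score):
    """Fraction of (plausible, implausible) sentence pairs for which the model
    assigns the plausible member a higher score (e.g. log-probability)."""
    hits = sum(1 for plaus, implaus in pairs if score(plaus) > score(implaus))
    return hits / len(pairs)

# Toy illustration with made-up log-probabilities:
toy_scores = {
    "The teacher bought the laptop": -12.3,
    "The laptop bought the teacher": -15.1,
}
pairs = [("The teacher bought the laptop", "The laptop bought the teacher")]
print(minimal_pair_accuracy(pairs, toy_scores.get))  # 1.0
```

Running the same comparison over the curated 1215 pairs, with human plausibility judgments as the reference, yields the accuracies reported in the study.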
Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data become scarce. Then we propose to adapt DDPMs pre-trained on large source domains to target domains using limited data. Our results show that utilizing knowledge from pre-trained DDPMs can significantly accelerate convergence and improve the quality and diversity of the generated images. Moreover, we propose a DDPM-based pairwise similarity loss to preserve the relative distances between generated samples during domain adaptation. In this way, we further improve the generation diversity of the proposed DDPM-based approaches. We demonstrate the effectiveness of our approaches qualitatively and quantitatively on a series of few-shot image generation tasks and achieve results better than current state-of-the-art GAN-based approaches in quality and diversity.
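A pairwise similarity loss of this flavor can be sketched as follows, under our own illustrative assumption that it compares softmax-normalized cosine-similarity distributions between samples from the source (pre-trained) and adapted models; the paper's exact formulation should be consulted for the real loss.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pairwise_similarity_loss(src_feats, tgt_feats):
    """For each sample, turn its cosine similarities to the other samples into a
    probability distribution; penalize (via cross-entropy) the adapted model for
    deviating from the source model's distribution, preserving relative distances."""
    n, loss = len(src_feats), 0.0
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        p = softmax([cosine(src_feats[i], src_feats[j]) for j in idx])  # source target
        q = softmax([cosine(tgt_feats[i], tgt_feats[j]) for j in idx])  # adapted model
        loss += -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return loss / n
```

Because cross-entropy is minimized when the two distributions coincide, the adapted model is encouraged to keep the same relative sample-to-sample geometry as the source model, which counteracts diversity collapse.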
Multimodal learning, especially large-scale multimodal pre-training, has developed rapidly over the past few years and has driven some of the greatest advances in artificial intelligence (AI). Despite its effectiveness, understanding the underlying mechanisms of multimodal pre-training models remains a grand challenge. Revealing the explainability of such models could enable breakthroughs in novel learning paradigms in the AI field. To this end, given the multimodal nature of the human brain, we propose to explore the explainability of multimodal learning models with the aid of non-invasive brain imaging technologies such as functional magnetic resonance imaging (fMRI). Concretely, we first present a newly designed multimodal foundation model pre-trained on 15 million image-text pairs, which shows strong multimodal understanding and generalization abilities on a variety of cognitive downstream tasks. Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and language encoders trained multimodally are more brain-like than their unimodal counterparts. In particular, we identify a number of brain regions where multimodally trained encoders demonstrate better neural encoding performance. This is consistent with the findings of existing studies on multi-sensory integration in the brain. We therefore believe that multimodal foundation models are more suitable tools for neuroscientists to study the multimodal signal processing mechanisms of the human brain. Our findings also demonstrate the potential of multimodal foundation models as ideal computational simulators to promote both AI-for-brain and brain-for-AI research.
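Neural encoding performance is typically measured by fitting a linear map from model features to voxel responses on training stimuli and correlating predictions with measured responses on held-out stimuli. A bare-bones single-feature sketch (real pipelines use ridge regression over the full embedding and cross-validation; names here are illustrative):

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fit_linear(x, y):
    """Ordinary least squares for voxel = w * feature + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    w = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return w, my - w * mx

def encoding_score(feat_train, voxel_train, feat_test, voxel_test):
    """Fit on training stimuli, then report the Pearson correlation between
    predicted and measured voxel responses on held-out stimuli."""
    w, b = fit_linear(feat_train, voxel_train)
    preds = [w * x + b for x in feat_test]
    return pearson(preds, voxel_test)
```

Comparing this score per voxel between multimodally and unimodally trained encoders is what identifies the brain regions where multimodal training helps.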
Error correction techniques remain an effective way to refine the output of automatic speech recognition (ASR) models. Existing end-to-end error correction methods based on the encoder-decoder architecture process all tokens in the decoding phase, incurring undesirable latency. In this paper, we propose an ASR error correction method that exploits the prediction of correction operations. More specifically, we build a predictor between the encoder and the decoder that learns whether a token should be kept ("K"), deleted ("D"), or changed ("C"), so that decoding is restricted to only the part of the input sequence embeddings (the "C" tokens) needed for fast inference. Experiments on three public datasets demonstrate the effectiveness of the proposed approach in reducing the latency of the decoding process for ASR correction. Compared with a solid encoder-decoder baseline, our two proposed models improve inference speed by at least 3 times (3.4x and 5.7x) while maintaining comparable accuracy (only 0.53% and 1.69% degradation, respectively). Meanwhile, we produce and release a benchmark dataset as a contribution to the ASR error correction community, to foster research along this line.
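The latency saving comes from decoding only the "C" positions. A simplified sketch of applying the predicted operations, assuming for illustration that each "C" token maps to exactly one corrected token (the actual method may handle more general substitutions):

```python
def apply_operations(tokens, ops, corrections):
    """Apply per-token operations to an ASR hypothesis: 'K' keeps the token,
    'D' deletes it, and 'C' substitutes the next decoder output. Only the 'C'
    positions require decoding, which is where the speed-up comes from."""
    out, it = [], iter(corrections)
    for tok, op in zip(tokens, ops):
        if op == "K":
            out.append(tok)
        elif op == "C":
            out.append(next(it))
        # op == "D": drop the token entirely
    return out
```

For the hypothesis "i like aple very much much" with operations K K C K K D and the single decoded correction "apple", this yields "i like apple very much".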
The Contrastive Language-Image Pre-training (CLIP) model is a recently proposed large-scale pre-trained model which has attracted increasing attention from the computer vision community. Benefiting from its gigantic image-text training set, the CLIP model has learned outstanding capabilities in zero-shot learning and image-text matching. To boost CLIP's recognition performance on some target visual concepts, it is often desirable to further update the model by fine-tuning on extra training data of interest. However, this operation raises an important concern: will the update hurt CLIP's zero-shot learning or image-text matching capability, i.e., the catastrophic forgetting issue? If so, can existing continual learning algorithms be adapted to alleviate the risk of catastrophic forgetting? To answer these questions, this work conducts a systematic study of the continual learning issue of the CLIP model. We construct evaluation protocols for measuring the impact of fine-tuning updates and explore different ways of upgrading existing continual learning methods to mitigate the forgetting issue of the CLIP model. Our study reveals the particular challenges of CLIP continual learning and lays a foundation for further research. Moreover, we propose a new algorithm, dubbed Learning without Forgetting via Replayed Vocabulary (VR-LwF), which shows clear effectiveness in alleviating the forgetting issue of the CLIP model.
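While the paper should be consulted for the exact formulation of VR-LwF, a replayed-vocabulary distillation term can be sketched as a KL divergence between the frozen pre-update model's and the fine-tuned model's image-to-text matching distributions over a replayed vocabulary of text prompts; names below are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def replay_vocab_distillation(old_logits, new_logits):
    """KL(old || new) over an image's similarity logits against a replayed
    vocabulary of texts: the fine-tuned model is pulled towards the frozen
    pre-update model's image-text matching behaviour, limiting forgetting."""
    p, q = softmax(old_logits), softmax(new_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Adding this term to the fine-tuning objective trades off plasticity on the new data against stability of the original zero-shot behaviour.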
We present a series of deep-learning-based computational frameworks for fast and accurate CT (DL-FACT) testing of COVID-19. Our CT-based DL framework was developed to improve the testing speed and accuracy of COVID-19 (plus its variants) via DL-based CT image enhancement and classification. The image enhancement network is adapted from DDnet, short for DenseNet and Deconvolution based network. To demonstrate its speed and accuracy, we evaluated DL-FACT across several sources of COVID-19 CT images. Our results show that DL-FACT can significantly shorten the turnaround time from days to minutes and improve COVID-19 testing accuracy up to 91%. DL-FACT can be used as a software tool by medical professionals for diagnosing and monitoring COVID-19.
A fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods possess only single-cognitive abilities. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained on massive multimodal data, which can be quickly adapted to various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning on semantically correlated data crawled from the Internet, and show that promising results can be obtained on a wide range of downstream tasks. In particular, using the model dissection tools we developed, we demonstrate that our foundation model now possesses strong imagination abilities. We believe our work makes a transformative stride towards AGI, from our common practice of "weak or narrow AI" to "strong or broad AI".