This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks: Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the 335 videos of the LDV 2.0 dataset and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.
Accurate short-term traffic prediction plays a pivotal role in various smart mobility operation and management systems. Currently, most state-of-the-art prediction models are based on graph neural networks (GNNs), and the number of required training samples is proportional to the size of the traffic network. In many cities, the available amount of traffic data is substantially below the minimum requirement due to the expense of data collection. Developing traffic prediction models that work with little training data on large-scale networks remains an open question. We notice that the near-future traffic state of a node depends only on the traffic states of its localized neighborhood, which can be represented using graph relational inductive biases. In view of this, this paper develops a graph network (GN)-based deep learning model, LocaleGN, that depicts the traffic dynamics using localized data aggregating and updating functions, as well as node-wise recurrent neural networks. LocaleGN is a lightweight model designed for training on few samples without over-fitting, and hence it can solve the problem of few-sample traffic prediction. The proposed model is examined on predicting both traffic speed and flow with six datasets, and the experimental results demonstrate that LocaleGN outperforms existing state-of-the-art baseline models. It is also demonstrated that the knowledge learned by LocaleGN can be transferred across cities. The research outcomes can help to develop lightweight traffic prediction systems, especially for cities lacking historically archived traffic data.
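The core idea described in the abstract, that a node's near-future state depends only on its localized neighborhood, can be illustrated with a minimal aggregate-then-update graph step. This is a toy sketch, not the authors' code: `locale_step`, the linear weights, and the toy graph are all hypothetical stand-ins for the learned functions in LocaleGN.

```python
import numpy as np

def locale_step(states, edges, agg_w, upd_w):
    """One localized graph step: aggregate neighbor states, then update.

    states: (n_nodes, d) current node features (e.g., recent speeds)
    edges:  list of (src, dst) directed links in the traffic network
    agg_w, upd_w: toy linear weights standing in for learned functions
    """
    n, d = states.shape
    agg = np.zeros((n, d))
    for src, dst in edges:          # messages come only from local neighbors
        agg[dst] += states[src] @ agg_w
    # the update combines each node's own state with its aggregated neighborhood
    return np.tanh(states @ upd_w + agg)

rng = np.random.default_rng(0)
states = rng.normal(size=(4, 3))           # 4 nodes, 3 features each
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a small ring network
agg_w = rng.normal(scale=0.1, size=(3, 3))
upd_w = rng.normal(scale=0.1, size=(3, 3))
next_states = locale_step(states, edges, agg_w, upd_w)
```

Because every prediction uses only local connectivity, the same weights can in principle be applied to any network size, which is the property the abstract exploits for few-sample training and cross-city transfer.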
Deep learning has achieved notable success in 3D object detection with the advent of large-scale point cloud datasets. However, severe performance degradation on previously trained classes, i.e., catastrophic forgetting, remains a critical issue for real-world deployment when the number of classes is unknown or may vary. Moreover, existing 3D class-incremental detection methods are developed for the single-domain scenario, and fail when encountering domain shift caused by different datasets, varying environments, etc. In this paper, we identify this unexplored yet valuable scenario, i.e., class-incremental learning under domain shift, and propose a novel 3D domain adaptive class-incremental object detection framework, DA-CIL, in which we design a novel dual-domain copy-paste augmentation method to construct multiple augmented domains for diversifying training distributions, thereby facilitating gradual domain adaptation. Then, multi-level consistency is explored to facilitate dual-teacher knowledge distillation from different domains for domain adaptive class-incremental learning. Extensive experiments on various datasets demonstrate the effectiveness of the proposed method over baselines in the domain adaptive class-incremental learning scenario.
Semantic search is an important task whose goal is to find relevant indexes in a database for a given query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations, and many regularization methods suited to them have also been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regulators. These contrastive regulators can overcome the over-fitting issue and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the pre-trained model SRoBERTa; the results show that our method learns superior sentence representations more effectively. We then evaluate on 2 challenging FAQ datasets with long queries and indexes, Cough and Faqir, and our experimental results show that our method outperforms the baseline methods.
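The contrastive regulator described above can be pictured with a standard InfoNCE-style objective over two augmented views of a batch of sentence embeddings. This is a generic sketch of such an objective, not the paper's exact loss; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def normalize(z):
    """L2-normalize each row so dot products become cosine similarities."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_regularizer(z_a, z_b, tau=0.05):
    """InfoNCE-style loss over two augmented views of the same batch.

    z_a, z_b: (batch, dim) L2-normalized sentence embeddings; row i of
    z_a and row i of z_b are views of the same sentence (the positives),
    all other rows serve as in-batch negatives.
    """
    sims = z_a @ z_b.T / tau                 # pairwise similarity matrix
    sims -= sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # pull matched pairs together

rng = np.random.default_rng(0)
views = normalize(rng.normal(size=(8, 16)))
loss_aligned = contrastive_regularizer(views, views)
loss_random = contrastive_regularizer(views, normalize(rng.normal(size=(8, 16))))
```

When the two views agree (as the regulator encourages), the loss is much lower than for unrelated embeddings, which is the pressure that counteracts over-fitting and anisotropy in the learned space.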
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the style of the behavior and the objective of the task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while simultaneously accomplishing the primary objective of the task. In this paper, we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum-entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
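Constrained MDP formulations like the one described are commonly solved with a Lagrangian relaxation, in which a multiplier on the imitation constraint is adapted by dual ascent. The following is a generic sketch of that standard mechanism under assumed names and values, not the paper's specific training loop.

```python
def lagrangian_update(lmbda, imitation_cost, budget, lr=0.1):
    """Dual ascent on the multiplier of a CMDP constraint E[c] <= budget.

    The multiplier grows while the imitation constraint is violated
    (cost above budget), increasing its weight against the main
    maximum-entropy objective, and is projected back to zero otherwise.
    """
    return max(0.0, lmbda + lr * (imitation_cost - budget))

# Constraint violated: the multiplier ramps up over successive updates.
lam = 0.0
for _ in range(5):
    lam = lagrangian_update(lam, imitation_cost=0.8, budget=0.5)

# Constraint satisfied: the multiplier stays pinned at zero.
lam_ok = lagrangian_update(0.0, imitation_cost=0.2, budget=0.5)
```

Tuning the budget is what lets an agent imitate the demonstration "to varying degrees" while the unconstrained part of the objective keeps pursuing the main task.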
Recently, equivariant neural network models have been shown to improve the sample efficiency of computer vision and reinforcement learning tasks. This paper explores this idea in the context of on-robot policy learning, where a policy must be learned entirely on a physical robot system without reference to a model, a simulator, or an offline dataset. We focus on applications of equivariant SAC to robotic manipulation and explore a number of variations of the algorithm. Ultimately, we demonstrate the ability to learn several non-trivial manipulation tasks fully through on-robot experience in less than an hour or two of wall-clock time.
Many intelligent transportation systems are multi-agent systems: both the traffic participants and the subsystems within the transportation infrastructure can be modeled as interacting agents. Using AI-based methods to achieve coordination among these different agents can provide greater safety than transportation systems containing only human-operated vehicles, and can improve system efficiency in terms of traffic throughput, sensing range, and enabling collaborative tasks. However, increased autonomy makes the transportation infrastructure vulnerable to compromised vehicular agents or infrastructure. This paper proposes a new framework that embeds a trust authority into the transportation infrastructure to systematically quantify the trustworthiness of agents using an epistemic logic known as subjective logic. In this paper, we make the following novel contributions: (i) we propose a framework that uses the quantified trustworthiness of agents to enable trust-aware coordination and control; (ii) we demonstrate how to synthesize trust-aware controllers using a reinforcement-learning-based approach; and (iii) we comprehensively analyze an autonomous intersection management (AIM) case study and develop a trust-aware version called AIM-Trust, which reduces the accident rate in scenarios consisting of a mixture of trusted and untrusted agents.
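In subjective logic, the formalism named above, an agent's trustworthiness is a binomial opinion (belief, disbelief, uncertainty), and independent evidence about the same agent can be pooled by cumulative fusion. The sketch below shows these two standard subjective-logic operations; it is illustrative of the formalism only, not of the paper's AIM-Trust controller, and the function names are assumptions.

```python
def expected_trust(b, d, u, base_rate=0.5):
    """Projected probability P = b + a*u of a binomial subjective-logic
    opinion (b, d, u), where b + d + u = 1 and a is the prior base rate."""
    assert abs(b + d + u - 1.0) < 1e-9
    return b + base_rate * u

def cumulative_fusion(o1, o2):
    """Cumulative fusion of two non-dogmatic opinions (b, d, u), pooling
    independent bodies of evidence about the same agent."""
    (b1, d1, u1), (b2, d2, u2) = o1, o2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k)

opinion = (0.6, 0.2, 0.2)          # mostly trusted, some uncertainty
trust = expected_trust(*opinion)
fused = cumulative_fusion(opinion, opinion)   # two concurring reports
```

Fusing concurring evidence shrinks the uncertainty component, so a trust authority accumulating observations converges toward a confident trust score that a controller can act on.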
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
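The simpler of the two attacks described, adding triggers to the raw data before distillation, can be pictured with the classic patch-style backdoor trigger. This is a toy analogue for intuition only; the actual NAIVEATTACK and DOORPING procedures operate inside the distillation pipeline, and the patch shape and values here are assumptions.

```python
import numpy as np

def add_trigger(images, patch_value=1.0, size=3):
    """Stamp a small square trigger into the bottom-right corner of each
    image, the classic patch-style backdoor. A poisoned subset of the
    input data would then be relabeled to the attacker's target class
    before distillation. images: (n, h, w) floats in [0, 1].
    """
    poisoned = images.copy()
    poisoned[:, -size:, -size:] = patch_value
    return poisoned

clean = np.zeros((2, 8, 8))       # two blank 8x8 "images"
poisoned = add_trigger(clean)
```

Because the distilled synthetic set is optimized to reproduce training behavior on the (poisoned) source data, the trigger-to-target association can survive distillation and reappear in any model trained on the small synthetic set.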
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, with little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.