Over the past few years, Unsupervised Domain Adaptation (UDA) techniques have gained remarkable importance and popularity in computer vision. However, compared with the extensive literature available for images, the video domain remains relatively unexplored. Meanwhile, the performance of action recognition models is severely affected by domain shift. In this paper, we propose a simple and novel UDA approach for video action recognition. Our approach leverages recent advances in spatio-temporal transformers to build a robust source model that better generalizes to the target domain. Furthermore, our architecture learns domain-invariant features thanks to the introduction of a novel alignment loss term derived from the Information Bottleneck principle. We report results on two video action recognition benchmarks for UDA, showing state-of-the-art performance on HMDB$\leftrightarrow$UCF, as well as on Kinetics$\rightarrow$NEC-Drone, which is more challenging. This demonstrates the effectiveness of our method in handling different levels of domain shift. The source code is available at https://github.com/vturrisi/udavt.
translated by Google Translate
We are concerned with learning models that generalize well to different unseen domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the current model. We show that our iterative scheme is an adaptive data augmentation method where we append adversarial examples at each iteration. For softmax losses, we show that our method is a data-dependent regularization scheme that behaves differently from classical regularizers that regularize towards zero (e.g., ridge or lasso). On digit recognition and semantic segmentation tasks, our method learns models that improve performance across a range of a priori unknown target domains.
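The iterative scheme described above can be sketched in a few lines: alternate between fitting the model on the current dataset and appending "hard" fictitious examples found by gradient ascent on the loss minus a distance penalty that keeps them near the source points. The binary logistic model, step sizes, and penalty weight below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def logistic_grad_x(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) w.r.t. the input x (y in {-1, +1})."""
    z = y * (x @ w)
    return -y * w / (1.0 + np.exp(z))

def fictitious_example(w, x, y, x0, gamma=1.0, lr=0.1, steps=10):
    """Gradient *ascent* on loss(x) - gamma * ||x - x0||^2: a 'hard' point kept near the source."""
    x = x.copy()
    for _ in range(steps):
        x += lr * (logistic_grad_x(w, x, y) - 2.0 * gamma * (x - x0))
    return x

def train(X, Y, rounds=3, epochs=100, lr=0.05, gamma=1.0):
    """Alternate model fitting and adversarial augmentation of the dataset."""
    X_aug, Y_aug = X.copy(), Y.copy()
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        # 1) fit the model on the current (augmented) dataset by gradient descent
        for _ in range(epochs):
            g = np.zeros_like(w)
            for x, y in zip(X_aug, Y_aug):
                g += -y * x / (1.0 + np.exp(y * (x @ w)))
            w -= lr * g / len(X_aug)
        # 2) append one fictitious 'hard' example per original source point
        new_X = np.array([fictitious_example(w, x, y, x, gamma) for x, y in zip(X, Y)])
        X_aug = np.vstack([X_aug, new_X])
        Y_aug = np.concatenate([Y_aug, Y])
    return w, X_aug, Y_aug
```

Each round grows the training set with adversarially perturbed copies, which is the "adaptive data augmentation" reading of the worst-case formulation.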
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the lack of research interest in this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows researchers to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint tracking of multiple objects simultaneously. Furthermore, we propose a Transformer-based GOT tracker, TaMOs, capable of joint processing of multiple objects through shared computation. TaMOs achieves a 4x faster run-time when tracking 10 concurrent objects compared to tracking each object independently, and outperforms existing single object trackers on our new benchmark. Finally, TaMOs achieves highly competitive results on single-object GOT datasets, setting a new state-of-the-art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, and trained models will be made publicly available.
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
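Schema-distance-weighted column sampling can be illustrated as follows: columns from tables only a few foreign-key hops away from an anchor table are sampled more often, so synthesized queries avoid arbitrary joins between unrelated tables. The exponential-decay weighting and the toy schema in the test are hypothetical choices for illustration, not the paper's exact scheme.

```python
import random
from collections import deque

def table_distances(fk_edges, anchor):
    """BFS shortest-path distance, in foreign-key hops, from the anchor table."""
    adj = {}
    for a, b in fk_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        t = queue.popleft()
        for n in adj.get(t, ()):
            if n not in dist:
                dist[n] = dist[t] + 1
                queue.append(n)
    return dist

def sample_columns(schema, fk_edges, anchor, k, decay=0.5, rng=random):
    """Sample k (table, column) pairs, weighting each column by decay**distance(table, anchor),
    so columns requiring long join chains are rare and unreachable tables are never joined."""
    dist = table_distances(fk_edges, anchor)
    cols, weights = [], []
    for table, columns in schema.items():
        if table not in dist:
            continue  # unreachable from the anchor: joining it would be arbitrary
        for c in columns:
            cols.append((table, c))
            weights.append(decay ** dist[table])
    return rng.choices(cols, weights=weights, k=k)
```

The distance computation reuses the schema's foreign-key graph, which is also what rules out the "arbitrary table joins" shortcoming the paper observes.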
Recent advances in Federated Learning (FL) have paved the way towards the design of novel strategies for solving multiple learning tasks simultaneously, by leveraging cooperation among networked devices. Multi-Task Learning (MTL) exploits relevant commonalities across tasks to improve efficiency compared with traditional transfer learning approaches. By learning multiple tasks jointly, significant reduction in terms of energy footprints can be obtained. This article provides a first look into the energy costs of MTL processes driven by the Model-Agnostic Meta-Learning (MAML) paradigm and implemented in distributed wireless networks. The paper targets a clustered multi-task network setup where autonomous agents learn different but related tasks. The MTL process is carried out in two stages: the optimization of a meta-model that can be quickly adapted to learn new tasks, and a task-specific model adaptation stage where the learned meta-model is transferred to agents and tailored for a specific task. This work analyzes the main factors that influence the MTL energy balance by considering a multi-task Reinforcement Learning (RL) setup in a robotized environment. Results show that the MAML method can reduce the energy bill by at least 2 times compared with traditional approaches without inductive transfer. Moreover, it is shown that the optimal energy balance in wireless networks depends on uplink/downlink and sidelink communication efficiencies.
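The two-stage MTL process described above can be sketched with a first-order MAML variant (a common simplification of the full second-order update): stage one optimizes a meta-model across tasks so it adapts in few gradient steps, stage two transfers it to an agent and tailors it to one task. The linear-regression tasks and step sizes below are illustrative assumptions, not the article's RL setup.

```python
import numpy as np

def loss_grad(w, X, y):
    """Squared error for a linear model y ~ X @ w, with its gradient w.r.t. w."""
    r = X @ w - y
    return 0.5 * float(np.mean(r ** 2)), X.T @ r / len(y)

def maml_first_order(tasks, w, alpha=0.05, beta=0.01, meta_steps=200):
    """Stage 1: meta-model optimization. First-order MAML: the meta-gradient
    is the task gradient evaluated at the adapted (inner-step) parameters."""
    for _ in range(meta_steps):
        meta_g = np.zeros_like(w)
        for X, y in tasks:
            _, g = loss_grad(w, X, y)
            w_task = w - alpha * g            # inner, task-specific adaptation
            _, g_adapted = loss_grad(w_task, X, y)
            meta_g += g_adapted
        w = w - beta * meta_g / len(tasks)
    return w

def adapt(w_meta, X, y, alpha=0.05, steps=5):
    """Stage 2: the learned meta-model is tailored to a specific task."""
    w = w_meta.copy()
    for _ in range(steps):
        _, g = loss_grad(w, X, y)
        w -= alpha * g
    return w
```

Because stage two needs only a handful of local gradient steps, fewer model exchanges (and less transmit energy) are required than when training each task from scratch, which is the energy argument the article analyzes.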
Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach where the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also results in a dataset highly skewed towards negative instances for extractive summarization; we thus adopt a margin ranking loss for extraction to encourage separation between positive and negative examples. Our extraction component operates at the constituent level; our approach to this problem enriches the text with spinal tree information which provides syntactic context (in the form of constituents) to the extraction model. We show an improvement of 3.71 Rouge-1 points over the best results reported in prior work on an existing novel chapter dataset.
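The margin ranking loss used for extraction can be written compactly: for every positive/negative pair of candidate scores, it penalizes the pair unless the positive scores at least `margin` higher, so pairs already well separated contribute zero and the abundant negatives cannot dominate training. A minimal sketch (the function name and default margin are our assumptions):

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge-style ranking loss averaged over all positive/negative score pairs:
    mean(max(0, margin - (s_pos - s_neg)))."""
    pos = np.asarray(pos_scores, float)[:, None]   # shape (P, 1)
    neg = np.asarray(neg_scores, float)[None, :]   # shape (1, N)
    return float(np.mean(np.maximum(0.0, margin - (pos - neg))))
```

A pair such as scores (3.0, 0.5) is already separated by more than the margin and costs nothing, while (1.0, 0.5) still incurs a penalty, pushing the model to widen the gap.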
We focus on an unloading problem, typical of the logistics sector, which is modeled as a sequential picking task. In this type of task, modern machine learning techniques have been shown to work better than classical systems, since they are better suited to stochasticity and better able to cope with large uncertainty. More specifically, supervised and imitation learning have achieved excellent results in this regard, at the cost of requiring some form of supervision, which is not always obtainable in all settings. On the other hand, Reinforcement Learning (RL) requires a much milder form of supervision, but remains impractical due to its inefficiency. In this paper, we propose and theoretically motivate a novel unsupervised reward shaping algorithm based on expert observations, which relaxes the level of supervision required by the agent and works towards improving RL performance on our task.
We study the possibility of using animal videos to improve the efficiency and performance of Reinforcement Learning (RL). From a theoretical standpoint, we motivate the use of off-policy RL with weighted policy optimization, describe the main challenges faced when learning from videos, and propose solutions. We test our ideas on offline and online RL, showing encouraging results on a series of 2D navigation tasks.
Accurate localization in outdoor and indoor environments is a challenging problem that currently constitutes a significant limitation for several practical applications. Ultra-Wideband (UWB) localization technology represents a valuable low-cost solution to this problem. However, Non-Line-Of-Sight (NLOS) conditions and the complexity of the specific radio environment can easily introduce a positive bias in the ranging measurements, resulting in highly inaccurate and unsatisfactory position estimates. In this context, we leverage recent advances in deep neural network optimization techniques and their implementation on ultra-low-power microcontrollers to introduce an effective range error mitigation solution that provides corrections in either NLOS or LOS conditions with a few mW of power. Our extensive experimentation endorses the advantages and improvements of our low-cost and power-efficient methodology.
Domain Generalization (DG) studies the capability of deep learning models to generalize to data outside the training distribution. Over the last decade, the literature has been massively filled with a collection of training methodologies that claim to obtain more abstract and robust data representations to tackle domain shifts. Recent research has provided reproducible benchmarks for DG, pointing out the effectiveness of naive Empirical Risk Minimization (ERM) over existing algorithms. Nevertheless, researchers persist in using the same outdated feature extractors, and no attention has been paid so far to the effects of different backbones. In this paper, we start back from the backbones, proposing a comprehensive analysis of their intrinsic generalization capabilities, which so far have been ignored by the research community. We evaluate a wide variety of feature extractors, from standard residual solutions to transformer-based architectures, finding a linear correlation between large-scale single-domain classification accuracy and DG capability. Our extensive experimentation shows that by adopting a competitive backbone in conjunction with an effective data augmentation, plain ERM outperforms recent DG solutions and achieves state-of-the-art accuracy. Moreover, our additional qualitative studies reveal that novel backbones give more similar representations to same-class samples, separating different domains in the feature space. This boost in generalization capability leaves only marginal room for DG algorithms, suggesting a new paradigm for investigating the problem that places the backbone in the spotlight and encourages the development of consistent algorithms on top of it.
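A linear correlation of the kind reported here, between per-backbone single-domain accuracy and DG accuracy, can be checked with a plain Pearson coefficient over the accuracy pairs; a minimal sketch (the helper name is ours, and any accuracy values fed to it would be experiment-specific):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists,
    e.g. single-domain accuracies vs. DG accuracies across backbones."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A value near +1 over the evaluated backbones would support the claim that stronger single-domain classifiers also generalize better across domains.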