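We address the problem of data augmentation for video action recognition. Standard augmentation strategies in video are hand-designed and sample the space of possible augmented data points either at random, without knowing which augmented points will be better, or through heuristics. We propose to learn what makes a good video for action recognition and to select only high-quality samples for augmentation. In particular, we choose video compositing of a foreground and a background video as the data augmentation process, which yields diverse new samples. We learn which pairs of videos to augment without having to actually composite them. This reduces the space of possible augmentations, which has two advantages: it saves computational cost, and it increases the accuracy of the final trained classifier, because the augmented pairs are of higher quality than average. We present experimental results across the full spectrum of training settings: few-shot, semi-supervised, and fully supervised. We observe consistent improvements in all of them on the Kinetics, UCF101, and HMDB51 benchmarks, and achieve a new state of the art in settings with limited data. In the semi-supervised setting, we see improvements of up to 8.6%.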
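The two ingredients named above, compositing a foreground clip onto a background clip and ranking candidate pairs without rendering them, can be pictured with the minimal sketch below. It rests on loud assumptions: clips are tiny grayscale frame lists, a per-pixel foreground mask is already available, and `score` is a hypothetical placeholder for the learned pair-quality model (in practice it would look at features of the two clips, not just their ids). The abstract does not specify how compositing or scoring is done, so `composite` and `select_pairs` are illustrative names, not the authors' code.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]  # one grayscale frame; pixel values assumed in [0, 1]
Video = List[Frame]        # a clip is a list of equally sized frames


def composite(fg: Video, fg_mask: Video, bg: Video) -> Video:
    """Alpha-composite the masked foreground of one clip onto another clip.

    Per pixel: out = m * fg + (1 - m) * bg, where m = 1 marks the actor.
    """
    if not (len(fg) == len(fg_mask) == len(bg)):
        raise ValueError("clips must have the same number of frames")
    return [
        [
            [m * f + (1.0 - m) * b for f, m, b in zip(f_row, m_row, b_row)]
            for f_row, m_row, b_row in zip(f_frame, m_frame, b_frame)
        ]
        for f_frame, m_frame, b_frame in zip(fg, fg_mask, bg)
    ]


def select_pairs(
    fg_ids: Sequence[int],
    bg_ids: Sequence[int],
    score: Callable[[int, int], float],
    k: int,
) -> List[Tuple[int, int]]:
    """Keep the k highest-scoring (foreground, background) pairs.

    `score` is a hypothetical stand-in for a learned quality predictor; only
    the selected pairs would then be passed to `composite`.
    """
    candidates = [(f, b) for f in fg_ids for b in bg_ids if f != b]
    return heapq.nlargest(k, candidates, key=lambda pair: score(*pair))
```

Ranking first and compositing only the top-k pairs is what saves computation: the expensive rendering step runs on k pairs instead of on every candidate pair.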
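Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex reasoning about visual and textual correlations because of various biases in the training data. Learning on trivial relations that indicate a generic spatial configuration, such as "on", instead of informative relations, such as "parked on", does not enforce this complex reasoning and harms generalization. To address this problem, we propose a novel SGG training framework that exploits relation labels according to their informativeness. Our model-agnostic training procedure imputes missing informative relations for less informative samples in the training data and trains an SGG model on the imputed labels together with the existing annotations. We show that this approach can be successfully combined with state-of-the-art SGG methods and significantly improves their performance on the standard Visual Genome benchmark. Furthermore, we obtain considerable improvements for unseen triplets in the more challenging zero-shot setting.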
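Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie. These diverse functions make automatic trailer generation a challenging endeavor. We decompose it into two subtasks: narrative structure identification and sentiment prediction. We model movies as graphs, where nodes are shots and edges denote semantic relations between them. We learn these relations using joint contrastive training, which leverages privileged textual information (e.g., characters, actions, situations) drawn from screenplays. An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to those produced by competitive supervised approaches.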
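Zero-shot action recognition is the task of recognizing action classes without visual examples, using only a semantic embedding that relates unseen classes to seen ones. The problem can be viewed as learning a function that generalizes well to instances of unseen classes without losing discrimination between classes. Neural networks can model the complex boundaries between visual classes, which explains their success as supervised models. However, these highly specialized class boundaries may not transfer well from seen to unseen classes. In this paper, we propose a centroid-based representation that clusters visual and semantic representations while considering all training samples at once, and in this way generalizes well to instances of unseen classes. We optimize the clustering using reinforcement learning, which we show is critical for our approach to work. We call the proposed method CLASTER and observe that it consistently outperforms the state of the art on all standard datasets, including UCF101, HMDB51, and Olympic Sports, in both the standard zero-shot evaluation and generalized zero-shot learning. Furthermore, we show that our model also performs competitively in the image domain, outperforming the state of the art in many settings.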
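Research on human reading has long documented that reading behavior shows task-specific effects, but building a general model that predicts what reading behavior humans will show in a given task has remained challenging. We introduce NEAT, a computational model of the allocation of attention in human reading, based on the hypothesis that human reading optimizes a trade-off between economy of attention and success at a task. Our model is implemented using contemporary neural network modeling techniques and makes explicit, testable predictions about how the allocation of attention varies across tasks. We test this in an eye-tracking study comparing two versions of a reading comprehension task and find that our model successfully accounts for reading behavior across the tasks. Our work thus provides evidence that task effects can be modeled as optimal adaptation to task demands.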
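One generic way to write the kind of trade-off NEAT is hypothesized to optimize is shown below. It assumes a binary fixation decision $f_i \in \{0, 1\}$ for each of the $n$ words of a text, a task loss $\mathcal{L}_{\text{task}}$ that depends on the fixated words, and a weight $\alpha > 0$; none of these symbols come from the abstract, which does not give the model's exact objective.

```latex
\min_{\theta}\; \mathbb{E}_{f \sim \pi_\theta}\!\left[\, \mathcal{L}_{\text{task}}(f, \theta) \;+\; \alpha \sum_{i=1}^{n} f_i \,\right]
```

Under this form, a larger $\alpha$ makes each fixation costlier, so the optimal reader skips more words, while the task loss is the term through which a change of task can shift which words are worth fixating.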
We study the problem of combining neural networks with symbolic reasoning. Recently introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as DeepProbLog, perform exponential-time exact inference, limiting the scalability of PNL solutions. We introduce Approximate Neurosymbolic Inference (A-NeSI): a new framework for PNL that uses neural networks for scalable approximate inference. A-NeSI 1) performs approximate inference in polynomial time without changing the semantics of probabilistic logics; 2) is trained using data generated by the background knowledge; 3) can generate symbolic explanations of predictions; and 4) can guarantee the satisfaction of logical constraints at test time, which is vital in safety-critical applications. Our experiments show that A-NeSI is the first end-to-end method to scale the Multi-digit MNISTAdd benchmark to sums of 15 MNIST digits, up from 4 in competing systems. Finally, our experiments show that A-NeSI achieves explainability and safety without a penalty in performance.
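To see why exact inference limits scalability on a benchmark like Multi-digit MNISTAdd, the sketch below computes the probability of a target sum by naive enumeration. It assumes each of the 2N digits (N per number) comes with an independent probability distribution over 0 to 9, for instance the softmax output of a digit classifier. The loop visits all 10^(2N) digit assignments, so its cost grows exponentially with N. This only illustrates the scaling problem; it is neither A-NeSI nor the inference procedure DeepProbLog actually uses, and the function names are hypothetical.

```python
import itertools
from typing import List, Sequence

Dist = Sequence[float]  # probabilities for digits 0..9, assumed to sum to 1


def number_value(digits: Sequence[int]) -> int:
    """Interpret a most-significant-first digit sequence as an integer."""
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


def exact_sum_probability(a_dists: Sequence[Dist], b_dists: Sequence[Dist], target: int) -> float:
    """P(a + b == target) by enumerating every joint digit assignment.

    Assumes both numbers have N digits and all 2N digit distributions are
    independent. The loop visits 10 ** (2 * N) assignments.
    """
    n = len(a_dists)
    if len(b_dists) != n:
        raise ValueError("both numbers must have the same number of digits")
    all_dists = list(a_dists) + list(b_dists)
    total = 0.0
    for assignment in itertools.product(range(10), repeat=2 * n):
        if number_value(assignment[:n]) + number_value(assignment[n:]) != target:
            continue
        prob = 1.0
        for dist, digit in zip(all_dists, assignment):
            prob *= dist[digit]
        total += prob
    return total


def one_hot(d: int) -> List[float]:
    """A fully confident distribution on digit d."""
    return [1.0 if i == d else 0.0 for i in range(10)]
```

For example, `exact_sum_probability([one_hot(4), one_hot(2)], [one_hot(1), one_hot(9)], 61)` returns 1.0 (42 + 19 = 61) after visiting 10,000 assignments; at 15 digits per number the same loop would need 10^30.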
Speech-to-text models tend to be trained and evaluated against a single target accent. This is especially true for English, for which native speakers from the United States have become the main benchmark. In this work, we show how two simple methods, pre-trained embeddings and auxiliary classification losses, can improve the performance of ASR systems. We aim for improvements that are as universal as possible, and we therefore explore their impact on several model architectures and several languages.
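A common way to attach an auxiliary classification loss to an ASR model is a weighted sum of the two objectives. The form below assumes a transcription loss $\mathcal{L}_{\mathrm{ASR}}$ (for example CTC), a classification head that predicts one of $C$ labels, such as the speaker's accent, from the input $\mathbf{x}$ with one-hot target $y$, and a weight $\lambda \ge 0$; the abstract does not say which labels or weighting the authors use.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{ASR}}(\theta) + \lambda\,\mathcal{L}_{\mathrm{aux}}(\theta),
\qquad
\mathcal{L}_{\mathrm{aux}}(\theta) = -\sum_{c=1}^{C} y_c \log p_\theta(c \mid \mathbf{x})
```

Setting $\lambda = 0$ recovers plain ASR training; a positive $\lambda$ pushes the shared encoder to represent accent information explicitly, which is one plausible route to more accent-robust transcription.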
The Makespan Scheduling problem is an extensively studied NP-hard problem. Its simplest version seeks an allocation of a set of jobs with deterministic processing times to two identical machines such that the makespan is minimized. In real-life scenarios, however, the actual processing time of each job may be stochastic, varying around its expected value with some variance under the influence of external factors, and the actual processing times of different jobs may be correlated through nonzero covariances. In this paper, we therefore propose a chance-constrained version of the Makespan Scheduling problem and investigate the theoretical performance of the classical Randomized Local Search (RLS) and (1+1) EA on it. More specifically, we first study two variants of the Chance-constrained Makespan Scheduling problem and their computational complexity, and then separately analyze the expected runtime of the two algorithms to obtain an optimal or almost optimal solution for instances of the two variants. In addition, we investigate the experimental performance of the two algorithms on the two variants.
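For intuition about the two algorithms, here is a minimal sketch of single-bit-flip RLS and the standard (1+1) EA (flip each bit independently with probability 1/n) on the deterministic two-machine version, where bit i says which machine job i runs on. The chance-constrained objective studied in the paper is not modeled here; the function names and the iteration budgets are assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mutation = Callable[[List[int], random.Random], List[int]]


def makespan(times: Sequence[float], assignment: Sequence[int]) -> float:
    """Load of the busier machine; assignment[i] in {0, 1} is job i's machine."""
    loads = [0.0, 0.0]
    for t, machine in zip(times, assignment):
        loads[machine] += t
    return max(loads)


def _search(times: Sequence[float], iterations: int, rng: random.Random,
            mutate: Mutation) -> Tuple[List[int], float]:
    """Elitist loop shared by both algorithms: accept the offspring if no worse."""
    x = [rng.randint(0, 1) for _ in times]
    fx = makespan(times, x)
    for _ in range(iterations):
        y = mutate(x, rng)
        fy = makespan(times, y)
        if fy <= fx:  # minimization; ties are accepted
            x, fx = y, fy
    return x, fx


def rls(times: Sequence[float], iterations: int, rng: random.Random) -> Tuple[List[int], float]:
    """Randomized Local Search: move exactly one uniformly chosen job."""
    def flip_one(x: List[int], rng: random.Random) -> List[int]:
        y = list(x)
        i = rng.randrange(len(y))
        y[i] = 1 - y[i]
        return y
    return _search(times, iterations, rng, flip_one)


def one_plus_one_ea(times: Sequence[float], iterations: int, rng: random.Random) -> Tuple[List[int], float]:
    """(1+1) EA: move each job independently with probability 1/n."""
    def standard_bit_mutation(x: List[int], rng: random.Random) -> List[int]:
        p = 1.0 / len(x)
        return [1 - b if rng.random() < p else b for b in x]
    return _search(times, iterations, rng, standard_bit_mutation)
```

The difference between the two matters even on small instances. With processing times `[3, 3, 2, 2, 2]` the optimum is 6 ({3, 3} versus {2, 2, 2}), but from the partition {3, 2, 2} versus {3, 2} every single-job move leaves the makespan at 7 or worse, so single-bit RLS can stall there; the (1+1) EA can escape by swapping a 3 and a 2 in one mutation.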
To mitigate climate change, the share of renewable energy needs to be increased. Renewable energies introduce new challenges to power grids due to decentralization, reduced inertia, and volatility in production. The operation of sustainable power grids with a high penetration of renewable energies requires new methods to analyze dynamic stability. We provide new datasets of the dynamic stability of synthetic power grids and find that graph neural networks (GNNs) are surprisingly effective at predicting this highly non-linear target from topological information alone. To illustrate the potential to scale to real-sized power grids, we demonstrate successful prediction on a model of the Texan power grid.
The evolution of wireless communications into 6G and beyond is expected to rely on new machine learning (ML)-based capabilities. These can enable proactive decisions and actions from wireless-network components to sustain quality-of-service (QoS) and user experience. Moreover, new use cases in the area of vehicular and industrial communications will emerge. Specifically in the area of vehicle communication, vehicle-to-everything (V2X) schemes will benefit strongly from such advances. With this in mind, we have conducted a detailed measurement campaign with the purpose of enabling a plethora of diverse ML-based studies. The resulting datasets offer GPS-located wireless measurements across diverse urban environments for both cellular (with two different operators) and sidelink radio access technologies, thus enabling a variety of different studies towards V2X. The datasets are labeled and sampled with a high time resolution. Furthermore, we make the data publicly available with all the necessary information to support the on-boarding of new researchers. We provide an initial analysis of the data showing some of the challenges that ML needs to overcome and the features that ML can leverage, as well as some hints at potential research studies.