Multi-relational networks play an important role in today's world and are used to capture the complex relationships between data. Their applications span many domains such as biomedicine, finance and social networks, and with their increasing availability it has become crucial to find efficient ways of handling the extra complexity of multiple layers. In this work we propose a novel approach that represents these complex networks with a single aggregated adjacency matrix, using primes as surrogates for the relations. Thanks to the fundamental theorem of arithmetic, this allows a lossless, compact representation of the whole multi-relational graph with a single adjacency matrix. Moreover, this representation enables fast computation of multi-hop adjacency matrices, which is useful for a variety of downstream tasks. We present both simple and complex tasks in which this representation can be useful and showcase its efficiency and performance. Finally, we also provide insights into its advantages and the open challenges that still need to be addressed, motivating future work.
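To make the prime-based encoding concrete, here is a minimal Python sketch (with hypothetical relation names and a toy convention of 1 for "no edge"; the paper's actual implementation may differ): each relation type is mapped to a distinct prime, an edge carrying several relations stores the product of their primes, and divisibility recovers each individual layer losslessly, as guaranteed by unique factorization.

```python
# Illustrative sketch (not the paper's code): each relation gets a distinct prime,
# and a single aggregated adjacency matrix stores, for every node pair, the product
# of the primes of the relations connecting them. Unique factorization makes the
# encoding lossless; divisibility tests recover any individual relation layer.
rel_prime = {"interacts": 2, "inhibits": 3, "co-cited": 5}   # hypothetical relation types

n = 4                                            # toy multi-relational graph on 4 nodes
A = [[1] * n for _ in range(n)]                  # 1 = multiplicative identity, "no edge yet"

def add_edge(u, v, rel):
    A[u][v] *= rel_prime[rel]                    # overlay this relation onto the shared matrix

def has_relation(u, v, rel):
    return A[u][v] % rel_prime[rel] == 0         # divisibility recovers the layer

add_edge(0, 1, "interacts")
add_edge(0, 1, "inhibits")                       # two relations share one matrix entry: 2 * 3 = 6
add_edge(2, 3, "co-cited")

assert has_relation(0, 1, "interacts") and has_relation(0, 1, "inhibits")
assert not has_relation(2, 3, "inhibits")
```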
Complex Event Recognition and Forecasting (CER/F) techniques attempt to detect, or even forecast ahead of time, pattern occurrences in streaming input using predefined event patterns. Such patterns are not always known in advance, or they frequently change over time, which makes machine learning techniques capable of extracting such patterns from data highly desirable in CER/F. Since many CER/F systems use symbolic automata to represent such patterns, we propose a family of automata in which the transition-enabling conditions are defined by Answer Set Programming (ASP) rules and which, thanks to the strong connections of ASP with symbolic learning, can be learned directly from data. We present such a learning approach in ASP, along with an incremental version that trades optimality for efficiency and is able to scale to large datasets. We evaluate our approach on two CER datasets and compare it with state-of-the-art automata learning techniques, demonstrating empirically superior performance in terms of both predictive accuracy and scalability.
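As a rough illustration of the automaton family being learned, the following Python sketch uses plain boolean predicates over event attributes as stand-ins for the ASP-defined transition guards; it shows only the run-time structure, not the ASP encoding or the learning procedure, and all names and thresholds are made up.

```python
# Minimal illustration (assumption: not the paper's ASP encoding or learning algorithm).
# A symbolic automaton whose transitions are enabled by boolean conditions over the
# attributes of each incoming event, standing in for the ASP-defined guards.
ACCEPTING = {"match"}

def guard_speed_low(event):             # hypothetical guards over event attributes
    return event["speed"] < 5

def guard_speed_high(event):
    return event["speed"] >= 5

transitions = {
    ("start", guard_speed_low): "slow",
    ("slow",  guard_speed_low): "slow",
    ("slow",  guard_speed_high): "match",   # toy pattern: low-speed events followed by a burst
}

def run(stream, state="start"):
    for event in stream:
        for (src, guard), dst in transitions.items():
            if src == state and guard(event):
                state = dst
                break
    return state in ACCEPTING

stream = [{"speed": 3}, {"speed": 2}, {"speed": 9}]
print(run(stream))   # True: the stream matches the toy pattern
```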
A focused crawler aims to discover as many web pages relevant to a target topic as possible while avoiding irrelevant ones. Reinforcement Learning (RL) has been utilized to optimize focused crawling. In this paper we propose TRES, an RL-empowered framework for focused crawling. We model the crawling environment as a Markov Decision Process, which the RL agent aims to solve by determining a good crawling strategy. Starting from a few human-provided keywords and a small text corpus that are expected to be relevant to the target topic, TRES follows a keyword-set expansion procedure that guides the crawl and trains the classifier that constitutes the reward function. To avoid the computationally infeasible brute-force approach to selecting the best action, we propose Tree-Frontier, a decision-tree-based algorithm that adapts to large state and action spaces and finds only a small number of representative actions. Tree-Frontier allows the agent to select near-optimal actions by greedily choosing the best representative action. Experimentally, we show that TRES significantly outperforms state-of-the-art methods in terms of harvest rate (the ratio of relevant pages), while Tree-Frontier reduces by orders of magnitude the number of actions that need to be evaluated at each time step.
Complex Event Recognition (CER) systems have become popular in the past two decades, as they are able to "instantly" detect patterns on real-time streams of events. However, there is a lack of methods for forecasting when a pattern might occur, before such an occurrence is actually detected by a CER engine. We present a formal framework that attempts to address the problem of Complex Event Forecasting (CEF). Our framework combines two formalisms: a) symbolic automata, which are used to encode complex event patterns; and b) prediction suffix trees, which can provide a succinct probabilistic description of an automaton's behavior. We compare our proposed approach against state-of-the-art methods and demonstrate its advantages in terms of accuracy and efficiency. In particular, prediction suffix trees, being variable-order Markov models, can capture long-term dependencies in a stream by remembering only those past sequences that carry sufficient information. Our experimental results demonstrate the accuracy benefits of being able to capture such long-term dependencies. This is achieved simply by increasing the order of our model, in contrast to full-order Markov models, which would require an exhaustive enumeration of all possible past sequences of a given order. We also discuss extensively how CEF solutions should best be evaluated with respect to the quality of their forecasts.
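The following toy sketch is only an assumption-laden stand-in for the paper's prediction suffix trees, but it shows the variable-order idea: next-symbol counts are kept for contexts up to a maximum order, and prediction backs off from the longest observed context, so long histories are consulted only when they have actually been seen.

```python
# Toy variable-order Markov predictor (an illustrative stand-in, not the paper's
# prediction suffix trees): store next-symbol counts for every context up to length
# max_order and predict with the longest context that has been observed.
from collections import defaultdict, Counter

def train(seq, max_order=3):
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for d in range(max_order + 1):
            if i - d < 0:
                break
            context = tuple(seq[i - d:i])        # the d symbols preceding position i
            counts[context][seq[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    for d in range(min(max_order, len(history)), -1, -1):   # back off to shorter contexts
        context = tuple(history[len(history) - d:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None

seq = list("abcabcabd")
model = train(seq)
print(predict(model, list("abcab")))   # 'c' or 'd', depending on the tied counts for context "cab"
```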
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/fungraph/neural_catacaustics/
Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where diversity of training data is limited. In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. We find that including source domain self-supervision stabilizes training and avoids mode collapse of the latent representations. For evaluation, we collect HParl, a $120$ hour speech corpus for Greek, consisting of plenary sessions in the Greek Parliament. We merge HParl with two popular Greek corpora to create GREC-MD, a test-bed for multi-domain evaluation of Greek ASR systems. In our experiments we find that, while other Unsupervised Domain Adaptation baselines fail in this resource-constrained environment, M2DS2 yields significant improvements for cross-domain adaptation, even when only a few hours of in-domain audio are available. When we relax the problem in a weakly supervised setting, we find that independent adaptation for audio using M2DS2 and language using simple LM augmentation techniques is particularly effective, yielding word error rates comparable to the fully supervised baselines.
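A hedged sketch of what a mixed-domain self-supervised finetuning step could look like is given below; the toy model, the loss functions and the mixing weights are illustrative assumptions, not taken from the M2DS2 paper. The point it illustrates is that the supervised objective on labelled source data is combined with self-supervised terms on audio from both the source and the target domain.

```python
# Hedged sketch: mixed source/target self-supervision during finetuning.
# ToySpeechModel and its losses are hypothetical stand-ins for a large pretrained
# speech model with a supervised (e.g. CTC-style) head and a self-supervised objective.
import torch
import torch.nn as nn

class ToySpeechModel(nn.Module):
    def __init__(self, dim=16, vocab=8):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, vocab)

    def supervised_loss(self, audio, labels):
        return nn.functional.cross_entropy(self.head(self.encoder(audio)), labels)

    def self_supervised_loss(self, audio):
        masked = audio.clone()
        masked[:, ::2] = 0.0                      # toy masked-reconstruction objective
        return nn.functional.mse_loss(self.encoder(masked), audio)

def finetune_step(model, optimizer, src_batch, tgt_audio, alpha=0.1, beta=0.1):
    src_audio, src_labels = src_batch             # labelled source domain
    loss = (model.supervised_loss(src_audio, src_labels)
            + alpha * model.self_supervised_loss(src_audio)    # SSL on source audio
            + beta * model.self_supervised_loss(tgt_audio))    # SSL on unlabelled target audio
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToySpeechModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
src = (torch.randn(4, 16), torch.randint(0, 8, (4,)))
tgt = torch.randn(4, 16)
print(finetune_step(model, opt, src, tgt))
```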
In this work, we propose a novel framework for estimating the dimension of the data manifold using a trained diffusion model. A trained diffusion model approximates the gradient of the log density of a noise-corrupted version of the target distribution for varying levels of corruption. If the data concentrates around a manifold embedded in the high-dimensional ambient space, then as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximum likelihood increase. Therefore, for small levels of corruption, the diffusion model provides us with access to an approximation of the normal bundle of the data manifold. This allows us to estimate the dimension of the tangent space, and thus the intrinsic dimension of the data manifold. Our method outperforms linear methods for dimensionality detection such as PPCA in controlled experiments.
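A toy numerical illustration of the idea follows, under strong simplifying assumptions (a linear "manifold" and an analytic score instead of a trained diffusion model): for a small corruption level, the score is much larger along normal directions than along tangent ones, so the singular-value spectrum of a batch of score vectors exposes the codimension, and hence the intrinsic dimension.

```python
# Hedged toy example: estimate intrinsic dimension from score vectors at a small
# noise level. Data lies on a d-dimensional linear subspace of R^D; the score of the
# Gaussian-corrupted distribution is computed analytically instead of via a diffusion model.
import numpy as np

rng = np.random.default_rng(0)
D, d, sigma = 10, 3, 0.05

# Sample points on the "manifold" (first d coordinates free, the rest 0), then corrupt them.
x_clean = np.zeros((2000, D))
x_clean[:, :d] = rng.normal(size=(2000, d))
x_noisy = x_clean + sigma * rng.normal(size=x_clean.shape)

# Analytic score of the corrupted distribution N(0, diag(1+sigma^2, ..., sigma^2, ...)).
var = np.concatenate([np.full(d, 1.0 + sigma**2), np.full(D - d, sigma**2)])
scores = -x_noisy / var                       # grad log p_sigma(x) for this Gaussian

# Scores are ~1/sigma along the D-d normal directions and ~1 along tangent ones,
# so the largest spectral gap separates the normal subspace from the tangent space.
s = np.linalg.svd(scores, compute_uv=False)   # singular values, descending
gap = np.argmax(s[:-1] / s[1:])               # index of the largest consecutive ratio
est_dim = D - (gap + 1)
print(est_dim)                                # expected to recover d = 3
```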
This project leverages advances in multi-agent reinforcement learning (MARL) to improve the efficiency and flexibility of order-picking systems for commercial warehouses. We envision a warehouse of the future in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput) under given resource constraints. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, the MARL framework can be flexibly applied to any warehouse configuration (e.g. size, layout, number/types of workers, item replenishment frequency) and the agents learn via a process of trial-and-error how to optimally cooperate with one another. This paper details the current status of the R&D effort initiated by Dematic and the University of Edinburgh towards a general-purpose and scalable MARL solution for the order-picking problem in realistic warehouses.
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches are proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, which is even true for the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as MobileNet and maintain a similar size? We revisit the design choices of ViTs and propose an improved supernet with low latency and high parameter efficiency. We further introduce a fine-grained joint search strategy that can find efficient architectures by optimizing latency and number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve about $4\%$ higher top-1 accuracy than MobileNetV2 and MobileNetV2$\times1.4$ on ImageNet-1K with similar latency and parameters. We demonstrate that properly designed and optimized vision transformers can achieve high performance with MobileNet-level size and speed.
Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process the pre-trained language model parameters remain frozen, allowing for fast, parameter-efficient training. In our ablations we see that this approach leads to efficient models, that can outperform their fine-tuned counterparts and are robust to input noise. Our experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms the current state-of-the-art across metrics, with 3.4% relative reduction in the resulting error and 2.1% relative improvement in 7-class classification accuracy.
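The sketch below illustrates the general adapter mechanism referred to here: a residual bottleneck module trained on top of frozen pretrained weights. The dimensions, the stand-in "BERT layer" and the omission of the fusion layers are simplifications for illustration, not the actual AMB architecture.

```python
# Minimal adapter sketch (illustrative, not the AMB implementation): a small residual
# bottleneck is inserted after a frozen pretrained layer, so only the adapter (and any
# task-specific fusion/head parameters) receive gradients during finetuning.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual bottleneck

dim = 768
pretrained_layer = nn.Linear(dim, dim)                  # stand-in for a pretrained BERT layer
for p in pretrained_layer.parameters():
    p.requires_grad = False                             # pretrained weights stay frozen

adapter = Adapter(dim)
x = torch.randn(2, 10, dim)                             # (batch, tokens, hidden)
out = adapter(pretrained_layer(x))

trainable = sum(p.numel() for p in adapter.parameters())
print(out.shape, trainable)                             # only the adapter's ~99k parameters train
```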