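We propose CLIP-Lite, an information-efficient method for visual representation learning through feature alignment with textual annotations. Compared with the previously proposed CLIP model, CLIP-Lite requires only one negative image-text sample pair while optimizing its contrastive learning objective. We achieve this by exploiting an information-efficient lower bound to maximize the mutual information between the two input modalities. This allows CLIP-Lite to be trained with significantly less data and smaller batch sizes while outperforming CLIP. We evaluate CLIP-Lite by pretraining on the COCO-Captions dataset and testing transfer to other datasets. CLIP-Lite obtains a +15.4% absolute mAP gain on Pascal VOC classification and a +22.1% top-1 accuracy gain on ImageNet, while being comparable or superior to other, more complex text-supervised models. CLIP-Lite also outperforms CLIP on image and text retrieval, zero-shot classification, and visual grounding. Finally, by performing explicit image-text alignment during representation learning, we show that CLIP-Lite can leverage language semantics to encourage bias-free visual representations that can be used in downstream tasks.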
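As a hedged illustration of how a mutual-information objective can work with a single negative pair per positive pair, one candidate is the Jensen-Shannon-based estimator popularized by Deep InfoMax. The abstract does not name the bound that CLIP-Lite uses, so the choice of estimator, the critic $T$, and the notation below are assumptions: $(x, y)$ is a matched image-text pair and $(x, y')$ is a mismatched one.

```latex
\hat{I}^{\mathrm{JSD}}(X;Y)
  = \mathbb{E}_{p(x,y)}\big[-\mathrm{sp}\big(-T(x,y)\big)\big]
  - \mathbb{E}_{p(x)\,p(y')}\big[\mathrm{sp}\big(T(x,y')\big)\big],
\qquad \mathrm{sp}(z) = \log\big(1 + e^{z}\big)
```

Each expectation can be estimated from one sample, which is why an objective of this form does not depend on large batches of negatives.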
Translated by Google Translate
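In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between local and global feature representations of a teacher and a student network. We demonstrate through extensive experiments that this improves the performance of low-capacity models by transferring knowledge from more performant but computationally expensive models. This can be used to produce better models that run on devices with low computational resources. Our method is flexible: we can distill knowledge from teachers with arbitrary network architectures into arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different architectures, including student networks of extremely low capacity. By distilling knowledge from ResNet-50, we raise a ShuffleNetV2 from its baseline accuracy to 74.55%. On ImageNet, we improve a ResNet-18 network from 68.88% to 70.32% accuracy (+1.44%) using a ResNet-34 teacher network.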
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32× memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is the same as the full-precision AlexNet. We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy. Our code is available at: http://allenai.org/plato/xnornet.
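The 32× memory figure follows from storing one bit per weight instead of a 32-bit float. The following is a minimal sketch, assuming the binary filter is the sign of each weight and that a single scaling factor (the mean absolute weight) preserves the filter's magnitude. The abstract does not spell out the scaling, and the function names are hypothetical.

```python
def binarize_weights(weights):
    """Approximate real-valued weights W by alpha * B, with B in {-1, +1}.

    Assumption for illustration: alpha is the mean absolute weight and
    B is the elementwise sign (zero is mapped to +1).
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def memory_saving(n_weights, full_bits=32, binary_bits=1):
    """Ratio of storage for full-precision vs. binary weights.

    Ignores the cost of storing the single scaling factor alpha.
    """
    return (n_weights * full_bits) / (n_weights * binary_bits)


alpha, signs = binarize_weights([0.5, -1.5, 1.0, -1.0])
approximation = [alpha * s for s in signs]
print(alpha, signs, approximation, memory_saving(1000))
```

The approximation keeps only the sign pattern and one scalar per filter, which is what makes bitwise operations possible in place of most multiplications.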
Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects from a NeRF representation created from an RGB-D sequence. Our NeRF inpainting method leverages recent work in 2D image inpainting and is guided by a user-provided mask. Our algorithm is underpinned by a confidence based view selection procedure. It chooses which of the individual 2D inpainted images to use in the creation of the NeRF, so that the resulting inpainted NeRF is 3D consistent. We show that our method for NeRF editing is effective for synthesizing plausible inpaintings in a multi-view coherent manner. We validate our approach using a new and still-challenging dataset for the task of NeRF inpainting.
System identification, also known as learning forward models, transfer functions, or system dynamics, has a long tradition in both science and engineering across different fields. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping from the current state and action to the next state. This problem is commonly framed directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to be learned, for example delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation over long time horizons when using bootstrapped predictions (predictions based on past predictions). Here we explore the use of Reinforcement Learning for this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present experimental results that show RL is a promising technique for solving this kind of problem.
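The error accumulation of bootstrapped predictions can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's setup: the scalar linear dynamics, the slightly mis-estimated gain of the forward model, and all function names are hypothetical.

```python
def true_step(state, action):
    """Hypothetical ground-truth transition function."""
    return 0.9 * state + action


def model_step(state, action, gain=0.95):
    """Hypothetical learned forward model with a small error in the gain."""
    return gain * state + action


def rollout_errors(initial_state, actions):
    """Compare one-step and bootstrapped prediction errors along a rollout."""
    true_state = initial_state
    predicted_state = initial_state
    one_step_errors, bootstrapped_errors = [], []
    for action in actions:
        # One-step prediction: the model sees the true current state.
        one_step = model_step(true_state, action)
        # Bootstrapped prediction: the model sees its own previous output.
        predicted_state = model_step(predicted_state, action)
        true_state = true_step(true_state, action)
        one_step_errors.append(abs(one_step - true_state))
        bootstrapped_errors.append(abs(predicted_state - true_state))
    return one_step_errors, bootstrapped_errors


one_step, bootstrapped = rollout_errors(1.0, [1.0] * 20)
print(one_step[-1], bootstrapped[-1])
```

In this toy setting the one-step error stays small because it is reset by the true state at every step, while the bootstrapped error keeps growing as the model feeds on its own outputs.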
Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from extracted texts to form well-posed contexts; (3) QA to extract knowledge from contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
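Self-localization is a fundamental capability that mobile robot navigation systems integrate in order to move from one point to another using a map. Any enhancement in localization accuracy is therefore crucial for performing delicate dexterity tasks. This paper describes a new localization approach that maintains several particle populations using the Monte Carlo Localization (MCL) algorithm, always selecting the best one as the system's output. As novelties, our work includes a multi-scale matching algorithm to create new MCL populations and a metric to determine the most reliable one. It also improves on state-of-the-art implementations by enhancing recovery times from erroneous estimates or unknown initial positions. The proposed method is evaluated in a module fully integrated with Nav2 and compared with the current state-of-the-art AMCL solution, obtaining good accuracy and recovery times.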
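Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows complementary anatomical, morphological and functional information to be combined, increases diagnostic accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges, including inter-modality misalignment and finding optimal methods to integrate information from different modalities. This paper aims to provide a comprehensive review of multi-modality imaging in cardiology, the computing methods, the validation strategies, the related clinical workflows and future perspectives. For the computing methodologies, we focus on three tasks, namely registration, fusion and segmentation, which generally involve multi-modality imaging data, either combining information from different modalities or transferring information across modalities. The review highlights that multi-modality cardiac imaging data has the potential for wide applicability in the clinic, such as trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modalities, the combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to do in defining how well-developed techniques fit into clinical workflows and how much additional, relevant information they introduce. These problems are likely to remain an active field of research, with questions to be answered in the future.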
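Patient-specific cardiac computational models are essential for the efficient realization of precision medicine and in-silico clinical trials using digital twins. Cardiac digital twins can provide non-invasive characterizations of cardiac function for individual patients, and are therefore promising for patient-specific diagnosis and therapy stratification. However, current workflows for both the anatomical and functional twinning phases, which refer to the inference of model anatomy and parameters from clinical data, are not sufficiently efficient, robust, and accurate. In this work, we propose a deep-learning-based patient-specific computational model that can fuse anatomical and electrophysiological information to infer ventricular activation properties, namely conduction velocities and root nodes. The activation properties can offer a quantitative assessment of cardiac electrophysiological function to guide interventional procedures. We employ the Eikonal model to generate simulated electrocardiograms (ECGs) with ground-truth properties for training the inference model, where patient-specific information is also taken into account. For evaluation, we test the model on simulated data and obtain generally promising results with fast computation times.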
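In this paper, we present e-GenIA3, an extension that supports the development of empathic agents. The new extension modifies the agent's reasoning processes to select plans according to the analyzed event and the agent's affective state and personality. In addition, our proposal allows a software agent to simulate the distinction between self and other agents through two different event appraisal processes: the empathic appraisal process, which elicits emotions as a response to other agents' emotions, and the regular affective appraisal process for other, non-empathic affective events. The empathic regulation process adapts the elicited empathic emotion based on intrapersonal factors (e.g., the agent's personality and affective memory) and interpersonal characteristics of the agent (e.g., the affective link between the agents). Using a memory of past events and their corresponding elicited emotions makes it possible to maintain an affective link that supports long-term empathic interaction between agents.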