We treat the problem of human-machine collaborative problem solving as a planning task coupled with natural language communication. Our framework consists of three components: a natural language engine that parses language utterances into a formal representation and vice versa; a concept learner that induces generalized concepts of plans based on a limited number of interactions with the user; and an HTN planner that solves the task based on the interaction with the human. We illustrate the framework's ability to address key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based blocks world domain. An accompanying demo video is available at https://youtu.be/q1pwe4aahf0.
Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating misinformation in news. However, most existing work focuses on analysis of the claim sentence while overlooking crucial background attributes, such as the claimer, the claim object, and other knowledge connected to the claim. In this work, we present NewsClaims, a new benchmark for knowledge-aware claim detection in the news domain. We redefine the claim detection problem to include extraction of additional background attributes related to a claim, and release 529 claims annotated over 103 news articles. In addition, NewsClaims aims to benchmark claim detection systems in emerging scenarios, including unseen topics with little or no training data. Finally, we provide a comprehensive evaluation of various zero-shot and prompt-based baselines on this new benchmark.
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
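The abstract does not spell out the TNN architecture, but one common way a tensorized layer achieves the parameter savings described above is by replacing a dense weight matrix with low-rank factors. The sketch below (hypothetical layer sizes, not the paper's configuration) illustrates the parameter count comparison against a standard dense layer:

```python
import numpy as np

def dense_params(d_in, d_out):
    # Parameters of a standard dense layer: full weight matrix plus bias.
    return d_in * d_out + d_out

def factored_params(d_in, d_out, rank):
    # One way to "tensorize" a dense layer: replace W (d_in x d_out) with
    # two low-rank factors A (d_in x rank) and B (rank x d_out), plus bias.
    return d_in * rank + rank * d_out + d_out

def factored_forward(x, A, B, b):
    # Forward pass of the factored layer: (x @ A) @ B + b.
    return x @ A @ B + b

d_in, d_out, rank = 512, 512, 16
print(dense_params(d_in, d_out))           # 262656
print(factored_params(d_in, d_out, rank))  # 16896
```

With these (assumed) sizes, the factored layer uses roughly 6% of the dense layer's parameters while keeping the same input and output dimensions.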
Large-scale models combining text and images have made incredible progress in recent years. However, they can still fail at tasks requiring compositional knowledge, such as correctly picking out a red cube from a picture of multiple shapes. We examine the ability of CLIP (Radford et al., 2021), to caption images requiring compositional knowledge. We implement five compositional language models to probe the kinds of structure that CLIP may be using, and develop a novel training algorithm, Compositional Skipgram for Images (CoSI), to train these models. We look at performance in attribute-based tasks, requiring the identification of a particular combination of attribute and object (such as "red cube"), and in relational settings, where the spatial relation between two shapes (such as "cube behind sphere") must be identified. We find that in some conditions, CLIP is able to learn attribute-object labellings, and to generalize to unseen attribute-object combinations. However, we also see evidence that CLIP is not able to bind features together reliably. Moreover, CLIP is not able to reliably learn relations between objects, whereas some compositional models are able to learn these perfectly. Of the five models we developed, none were able to generalize to unseen relations.
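The captioning probe described above reduces to scoring candidate captions against an image in a shared embedding space. The toy sketch below (hypothetical 2-d embeddings, not CLIP's actual vectors) shows the CLIP-style scoring rule: cosine similarity, with the highest-scoring caption taken as the prediction:

```python
import numpy as np

def caption_scores(image_emb, text_embs):
    # CLIP-style scoring: cosine similarity between one image embedding
    # and each candidate caption embedding.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img

# Toy example: hypothetical embeddings for an image of a red cube and
# three candidate captions.
image = np.array([1.0, 0.0])
captions = np.array([[1.0, 0.0],    # "red cube"
                     [0.0, 1.0],    # "blue sphere"
                     [0.5, 0.5]])   # "red sphere"
print(np.argmax(caption_scores(image, captions)))  # 0
```

The binding failures reported above correspond to cases where the wrong attribute-object caption ends up with the highest similarity score.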
Modal verbs, such as "can", "may", and "must", are commonly used in daily communication to convey the speaker's perspective related to the likelihood and/or mode of the proposition. They can differ greatly in meaning depending on how they're used and the context of a sentence (e.g. "They 'must' help each other out." vs. "They 'must' have helped each other out."). Despite their practical importance in natural language understanding, linguists have yet to agree on a single, prominent framework for the categorization of modal verb senses. This lack of agreement stems from the high degree of flexibility and polysemy of modal verbs, making it more difficult for researchers to incorporate insights from this family of words into their work. This work presents the MoVerb dataset, which consists of 27,240 annotations of modal verb senses over 4,540 utterances containing one or more sentences from social conversations. Each utterance is annotated by three annotators using two different theoretical frameworks (i.e., Quirk and Palmer) of modal verb senses. We observe that both frameworks have similar inter-annotator agreements, despite having different numbers of sense types (8 for Quirk and 3 for Palmer). With RoBERTa-based classifiers fine-tuned on MoVerb, we achieve F1 scores of 82.2 and 78.3 on Quirk and Palmer, respectively, showing that modal verb sense disambiguation is not a trivial task. Our dataset will be publicly available with our final version.
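The reported F1 scores (82.2 and 78.3) are the standard harmonic mean of precision and recall. As a quick reference, this is how the per-class score is computed (toy confusion counts, not the paper's data):

```python
def f1_score(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy counts for a single modal-verb sense class:
# 80 true positives, 20 false positives, 15 false negatives.
print(round(f1_score(tp=80, fp=20, fn=15), 4))  # 0.8205
```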
To achieve autonomy in a priori unknown real-world scenarios, agents should be able to: i) act from high-dimensional sensory observations (e.g., images), ii) learn from past experience to adapt and improve, and iii) be capable of long horizon planning. Classical planning algorithms (e.g. PRM, RRT) are proficient at handling long-horizon planning. Deep learning based methods in turn can provide the necessary representations to address the others, by modeling statistical contingencies between observations. In this direction, we introduce a general-purpose planning algorithm called PALMER that combines classical sampling-based planning algorithms with learning-based perceptual representations. For training these perceptual representations, we combine Q-learning with contrastive representation learning to create a latent space where the distance between the embeddings of two states captures how easily an optimal policy can traverse between them. For planning with these perceptual representations, we re-purpose classical sampling-based planning algorithms to retrieve previously observed trajectory segments from a replay buffer and restitch them into approximately optimal paths that connect any given pair of start and goal states. This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning. The end result is an experiential framework for long-horizon planning that is significantly more robust and sample efficient compared to existing methods.
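The restitching step above can be illustrated with a toy version: treat each stored transition in the replay buffer as an edge weighted by a learned traversal-difficulty estimate, then run a shortest-path search to connect start and goal through previously observed experience. The state IDs and the `dist` function below are stand-ins for PALMER's learned embedding distance, not the paper's implementation:

```python
import heapq

def restitch(transitions, dist, start, goal):
    """Retrieve and restitch trajectory segments: build a graph whose
    edges are replayed transitions (s, s') weighted by dist(s, s'),
    then run Dijkstra from start to goal."""
    graph = {}
    for s, s2 in transitions:
        graph.setdefault(s, []).append((s2, dist(s, s2)))
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == goal:
            path = [s]
            while s in prev:          # walk predecessors back to start
                s = prev[s]
                path.append(s)
            return path[::-1]
        if d > best.get(s, float("inf")):
            continue
        for s2, w in graph.get(s, []):
            nd = d + w
            if nd < best.get(s2, float("inf")):
                best[s2] = nd
                prev[s2] = s
                heapq.heappush(heap, (nd, s2))
    return None  # goal not reachable through stored experience

# Replay buffer of observed transitions; the lambda stands in for the
# learned embedding distance (here: unit cost per hop).
buffer = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "g")]
print(restitch(buffer, lambda s, s2: 1.0, "a", "g"))  # ['a', 'b', 'c', 'g']
```

In the actual framework the edge weights come from the contrastively trained latent space, so the retrieved path approximates one an optimal policy could traverse.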
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Relocating haptic feedback from the fingertips to the wrist has been considered as a way to enable haptic interaction with mixed reality virtual environments while leaving the fingers free for other tasks. We present a pair of wrist-worn tactile haptic devices and a virtual environment to study how various mappings between fingers and tactors affect task performance. The haptic feedback rendered at the wrist reflects the interactions occurring between virtual objects and the virtual avatars controlled by the index finger and thumb. We conducted a user study comparing four different finger-to-tactor haptic feedback mappings and a no-feedback condition as a control. We evaluated users' ability to perform a simple pick-and-place task via metrics of task completion time, path lengths of the fingers and the virtual cube, and the magnitudes of normal and shear forces at the fingertips. We found that multiple mappings were effective, and that they had a greater impact when visual cues were limited. We discuss the limitations of our approach and describe next steps toward multi-degree-of-freedom haptic rendering for wrist-worn devices to improve task performance in virtual environments.
In this work we present affinity-VAE: a framework for automatic clustering and classification of objects in multidimensional image data based on their similarity. The method extends the concept of $\beta$-VAEs with an informed similarity-based loss component driven by an affinity matrix. Compared to a standard $\beta$-VAE, the affinity-VAE is able to create rotationally-invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation. We explore the extent of latent disentanglement and continuity of the latent space on both 2D and 3D image data, including simulated biological electron cryo-tomography (cryo-ET) volumes as an example of a scientific application.
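The abstract does not give the exact form of the affinity-driven loss, but a plausible sketch of such an objective is the standard $\beta$-VAE terms plus a similarity term that pulls the latent codes of high-affinity pairs together. Everything below (term weighting, squared-distance penalty) is an illustrative assumption, not the paper's formulation:

```python
import numpy as np

def kl_gauss(mu, logvar):
    # KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dims.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def affinity_vae_loss(x, x_hat, mu, logvar, affinity, beta=4.0, gamma=1.0):
    """Hypothetical affinity-VAE objective: reconstruction + beta * KL,
    plus a term weighting pairwise latent distances by the precomputed
    affinity matrix (affinity[i, j] in [0, 1] is the similarity of items
    i and j), so similar items are pushed together in latent space."""
    recon = np.sum((x - x_hat) ** 2, axis=1)   # per-sample reconstruction error
    kl = kl_gauss(mu, logvar)                  # beta-VAE regularizer
    diff = mu[:, None, :] - mu[None, :, :]     # pairwise latent differences
    pairwise = np.sum(diff ** 2, axis=-1)      # squared latent distances
    aff_term = np.mean(affinity * pairwise)
    return np.mean(recon + beta * kl) + gamma * aff_term

# Small deterministic example: two samples whose latent means differ.
mu = np.array([[0.0, 0.0], [1.0, 0.0]])
logvar = np.zeros((2, 2))
x = np.ones((2, 3)); x_hat = np.ones((2, 3))
aff = np.ones((2, 2))
print(affinity_vae_loss(x, x_hat, mu, logvar, aff, beta=1.0, gamma=1.0))  # 0.75
```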
Knowledge of the likelihood of drought in a particular region is essential when making agriculture-related decisions. Predicting this probability is crucial for management and, at the same time, challenging. A predictive model should take into account multiple factors with complex relationships between the region of interest and adjacent regions. We address this problem by proposing an end-to-end solution based on a spatio-temporal neural network. The model predicts the Palmer Drought Severity Index (PDSI) for a sub-region of interest. Predictions from climate models provide an additional source of knowledge, leading to more accurate drought forecasts. Our model is more accurate than a baseline gradient boosting solution, achieving an $R^2$ score of $0.90$ compared to $0.85$ for gradient boosting. Particular attention is paid to the model's range of applicability: we examine various regions around the globe to validate it under different conditions. We complement the results by analyzing how future climate change under different scenarios affects PDSI, and how our model can help make better decisions and support more sustainable economics.
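The $R^2$ comparison above (0.90 vs. 0.85) uses the standard coefficient of determination, $R^2 = 1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$. As a quick reference, here is how it is computed (toy PDSI values, not the paper's data):

```python
def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)        # variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual error
    return 1.0 - ss_res / ss_tot

# Toy example: four target values and close predictions.
print(round(r2_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]), 2))  # 0.98
```

An $R^2$ of 1.0 means perfect prediction; 0.0 means the model does no better than predicting the mean.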