Applying artificial intelligence to scientific problems (namely AI for science) is currently under debate. However, scientific problems differ considerably from conventional ones with images, texts, and so on, where new challenges emerge from the unbalanced nature of scientific data and the complicated effects of physical setups. In this work, we demonstrate the validity of deep convolutional neural networks (CNNs) in reconstructing lattice topology (i.e., spin connectivities) in the presence of strong thermal fluctuations and unbalanced data. Taking Glauber dynamics as an example of the kinetic model, the CNN maps the time-dependent local magnetic momenta (a single-node feature) evolved from a specific initial configuration (dubbed an evolution instance) to the probabilities of the presence of the possible couplings. Our scheme is distinct from previous ones, which may require knowledge of the node dynamics, responses to perturbations, or the evaluation of statistical quantities (such as correlations or transfer entropy) over many evolution instances. Fine-tuning avoids the "barren plateau" caused by strong thermal fluctuations at high temperatures. Accurate reconstructions can be made in the regime where thermal fluctuations dominate over correlations, in which statistical methods generally fail. Meanwhile, we reveal the generalization ability of the CNN in dealing with instances evolved from unlearnt initial spin configurations and instances with unlearnt lattices. We raise an open question on learning with unbalanced data in the nearly "double-exponentially" large sample space.
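To make the mapping concrete, the following is a minimal, hypothetical PyTorch sketch in the spirit described: it maps an evolution instance, the (time steps × spins) record of local magnetic momenta, to one probability per candidate coupling. All layer sizes, the bond enumeration, and the data shapes are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch (PyTorch), assuming N spins whose local magnetic momenta
# m_i(t) are recorded over T steps of Glauber dynamics. Illustrative only.
import torch
import torch.nn as nn

class CouplingCNN(nn.Module):
    def __init__(self, n_spins: int):
        super().__init__()
        n_bonds = n_spins * (n_spins - 1) // 2        # all candidate couplings
        self.conv = nn.Sequential(                    # treat (T, N) as a 1-channel image
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 16, n_bonds)

    def forward(self, m):                             # m: (batch, T, N) local momenta
        h = self.conv(m.unsqueeze(1)).flatten(1)
        return torch.sigmoid(self.head(h))            # probability of each coupling

model = CouplingCNN(n_spins=16)
instances = torch.randn(8, 100, 16)                   # 8 evolution instances (dummy data)
probs = model(instances)                              # (8, 120) bond probabilities
```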
Given an image of a white shoe drawn on a blackboard, how are the white pixels deemed (say, by human minds) informative for recognizing the shoe, without any labeling information on the pixels? Here we investigate such a "white shoe" recognition problem from the perspective of tensor network (TN) machine learning and quantum entanglement. Utilizing a generative TN that captures the probability distribution of the features as quantum amplitudes, we propose an unsupervised recognition scheme for informative features based on the variations of entanglement entropy (EE) caused by designed measurements. In this way, a given sample, whose feature values are statistically meaningless, is mapped to variations of EE that are statistically meaningful. We show that the EE variations identify the features that are critical for recognizing this specific sample, and that the EE itself reveals the distribution of information in the TN model. The signs of the variations further reveal the entanglement structure among the features. We test the validity of our scheme on a toy dataset of strip images, the MNIST dataset of hand-drawn digits, and the fashion-MNIST dataset of pictures of fashion articles. Our scheme opens an avenue toward quantum-inspired and interpretable unsupervised learning, which can be applied to, e.g., image segmentation and object detection.
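To make the EE-variation idea concrete, here is a minimal NumPy sketch on a small random matrix product state: it computes the bipartite entanglement entropy before and after "measuring" one feature (projecting one site onto a basis state). The random MPS stands in for the trained generative TN, and comparing bipartitions that differ by one site is a simplification of the actual scheme.

```python
# Minimal sketch (NumPy) of the entanglement-entropy (EE) variation idea on a
# random matrix product state; the generative TN training itself is omitted.
import numpy as np

def random_mps(n, d=2, chi=4):
    """Random MPS tensors A[i] with shape (chi_left, d, chi_right)."""
    dims = [1] + [chi] * (n - 1) + [1]
    return [np.random.randn(dims[i], d, dims[i + 1]) for i in range(n)]

def contract(mps):
    """Contract an MPS into the full state vector (fine for small n)."""
    psi = mps[0]
    for A in mps[1:]:
        psi = np.tensordot(psi, A, axes=([-1], [0]))
    psi = psi.reshape(-1)
    return psi / np.linalg.norm(psi)

def entanglement_entropy(psi, cut_dim):
    """Bipartite EE via SVD of the reshaped state vector."""
    s = np.linalg.svd(psi.reshape(cut_dim, -1), compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 1e-12]
    return -np.sum(p * np.log(p))

n = 8
psi = contract(random_mps(n)).reshape([2] * n)
ee0 = entanglement_entropy(psi.reshape(-1), 2 ** (n // 2))

# "Measure" feature 0 by projecting it onto a basis state (here |0>), then
# look at how the half-chain EE of the remaining features changes.
proj = psi[0].reshape(-1)
proj = proj / np.linalg.norm(proj)
ee1 = entanglement_entropy(proj, 2 ** ((n - 1) // 2))
print("EE variation for feature 0:", ee1 - ee0)   # large |change| -> informative
```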
State preparation is of fundamental importance in quantum physics. It can be realized either by constructing a quantum circuit as a unitary that transforms the initial state into the target state, or by implementing a quantum control protocol that evolves to the target state under a designed Hamiltonian. In this work, we study the latter on quantum many-body systems, using time evolution with fixed couplings and variational magnetic fields. Specifically, we consider preparing the ground states of Hamiltonians that contain certain interactions absent from the Hamiltonians used for the time evolution. An optimization method is proposed to optimize the magnetic fields by "fine-graining" the discretization of time, in order to gain high precision and stability. The back-propagation technique is utilized to obtain the gradients of the fields with respect to the negative logarithmic fidelity. Our method is tested on preparing the ground state of the Heisenberg chain by time evolution with XY and Ising interactions, and its performance surpasses two baseline methods that use local and global optimization strategies, respectively. Our work can be applied and generalized to other quantum models, such as those defined on higher-dimensional lattices. It sheds light on reducing the complexity of the interactions required for quantum control and other tasks in quantum information and computation, by means of optimizing the magnetic fields.
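A minimal sketch of the optimization loop follows, assuming a small spin chain, piecewise-constant fields, and PyTorch autograd as the back-propagation machinery; the fixed XX coupling, the dummy target state, and all hyperparameters are illustrative stand-ins for the paper's setting.

```python
# Minimal sketch (PyTorch): optimize piecewise-constant magnetic fields so that
# evolution under a fixed coupling reaches a target state, with autograd
# supplying the gradients of the negative log-fidelity. Illustrative only.
import torch

n, steps, dt = 4, 20, 0.1
sx = torch.tensor([[0., 1.], [1., 0.]], dtype=torch.complex128)
sz = torch.tensor([[1., 0.], [0., -1.]], dtype=torch.complex128)
eye = torch.eye(2, dtype=torch.complex128)

def kron_site(op, site):
    """Embed a single-site operator at `site` in the n-spin Hilbert space."""
    out = op if site == 0 else eye
    for i in range(1, n):
        out = torch.kron(out, op if i == site else eye)
    return out

# Fixed part of the Hamiltonian: nearest-neighbour XX coupling (a stand-in
# for the XY/Ising interactions used in the paper).
H_fix = sum(kron_site(sx, i) @ kron_site(sx, i + 1) for i in range(n - 1))

fields = (0.1 * torch.randn(steps, n, dtype=torch.float64)).requires_grad_()
psi0 = torch.zeros(2 ** n, dtype=torch.complex128)
psi0[0] = 1.0
target = torch.ones(2 ** n, dtype=torch.complex128) / 2 ** (n / 2)  # dummy target

opt = torch.optim.Adam([fields], lr=0.05)
for epoch in range(200):
    psi = psi0
    for t in range(steps):        # piecewise-constant fields: one slice per step
        H = H_fix + sum(fields[t, i] * kron_site(sz, i) for i in range(n))
        psi = torch.matrix_exp(-1j * dt * H) @ psi
    fidelity = torch.abs(torch.vdot(target, psi)) ** 2
    loss = -torch.log(fidelity)   # negative log-fidelity
    opt.zero_grad()
    loss.backward()
    opt.step()
```

"Fine-graining" would correspond to restarting this loop with a larger `steps` (and smaller `dt`), initializing the finer field slices from the coarser solution.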
Tensor networks, which originate from quantum physics, are efficient tools for both classical and quantum machine learning. Nevertheless, a considerable accuracy gap remains between tensor networks and the sophisticated neural network models of classical machine learning. In this work, we combine the ideas of the matrix product state (MPS), the simplest tensor network structure, and residual neural networks, and propose the residual matrix product state (ResMPS). ResMPS can be treated as a network whose layers map "hidden" features to outputs (e.g., classifications), and whose variational parameters are functions of the features of the samples (e.g., the pixels of an image). This differs from neural networks, where the layers map the features themselves feed-forward to the outputs. ResMPS can be equipped with non-linear activations and dropout layers, and it outperforms state-of-the-art tensor network models in terms of efficiency, stability, and expressive power. Furthermore, ResMPS can be interpreted from the perspective of polynomial expansion, where factorization and exponential machines naturally emerge. Our work contributes to connecting and hybridizing neural and tensor networks, which is crucial for further understanding their working mechanisms and improving the performance of both models.
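The layer structure can be sketched as follows, assuming a "simple ResMPS" reading of the abstract: the hidden vector is updated residually, with an update modulated by the t-th input feature, plus the non-linear activation and dropout mentioned above. The dimensions and classifier head are illustrative, not the paper's exact construction.

```python
# Minimal sketch (PyTorch) of a ResMPS-style layer: a residual update of the
# hidden vector whose effective weights depend on the t-th input feature.
import torch
import torch.nn as nn

class ResMPS(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32, n_classes: int = 10):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(n_features, hidden, hidden))
        self.h0 = nn.Parameter(torch.randn(hidden))
        self.out = nn.Linear(hidden, n_classes)
        self.drop = nn.Dropout(0.1)

    def forward(self, x):                      # x: (batch, n_features), e.g. pixels
        h = self.h0.expand(x.shape[0], -1)
        for t in range(x.shape[1]):
            update = torch.einsum('ij,bj->bi', self.W[t], h)
            h = h + x[:, t:t + 1] * self.drop(torch.tanh(update))  # residual step
        return self.out(h)

model = ResMPS(n_features=28 * 28)
logits = model(torch.rand(4, 28 * 28))         # 4 flattened MNIST-like images
```

Dropping the residual connection and the non-linearity recovers a plain MPS-like multiplicative chain, which is where the polynomial-expansion interpretation comes from.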
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
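As a toy illustration of the modeling idea (not the paper's formulation), the sketch below rolls out a human's internal model theta under a parameterized nonlinear update f(theta, obs; W) applied after each robot-provided observation; the feature map, the update rule, and the parameter names are all assumptions made for illustration.

```python
# Minimal sketch (NumPy): the human's internal model theta evolves under a
# learned nonlinear dynamical system after each observation the robot provides.
import numpy as np

def human_update(theta, obs, W):
    """One step of the human's learning dynamics: theta' = f(theta, obs; W)."""
    feats = np.concatenate([theta, obs, theta * obs])   # simple nonlinear features
    return theta + np.tanh(W @ feats)                    # bounded belief update

rng = np.random.default_rng(0)
dim = 3                                   # e.g. believed inertia parameters
W = 0.1 * rng.standard_normal((dim, 3 * dim))

theta = np.zeros(dim)                     # initially wrong internal model
true_theta = np.array([1.0, 0.5, -0.3])   # reality the model should align with
for _ in range(50):
    obs = true_theta + 0.05 * rng.standard_normal(dim)  # robot-chosen experience
    theta = human_update(theta, obs, W)
```

In the paper's framing, W would be inferred from demonstrations that exhibit human learning, and the robot's planner would choose actions (here, the observations) to steer this rollout.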
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
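To illustrate the kind of paired readouts the benchmark evaluates, here is a hypothetical PyTorch module with one shared encoder and two heads: slice-level brain-region classification and pixel-level semantic segmentation. The backbone, head shapes, and class counts are illustrative, not the benchmark's baseline models.

```python
# Minimal sketch (PyTorch): one shared encoder, two readouts on the same image,
# mirroring the benchmark's region-prediction and segmentation tasks.
import torch
import torch.nn as nn

class MultiTaskReadout(nn.Module):
    def __init__(self, n_regions=4, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(     # stand-in for a (self-)supervised backbone
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.region_head = nn.Linear(64, n_regions)      # global attribute
        self.seg_head = nn.Conv2d(64, n_classes, 1)      # microstructure masks

    def forward(self, x):                  # x: (batch, 1, H, W) X-ray slice
        h = self.encoder(x)
        region = self.region_head(h.mean(dim=(2, 3)))    # pooled -> region logits
        seg = self.seg_head(h)                           # per-pixel logits
        return region, seg

model = MultiTaskReadout()
region_logits, seg_logits = model(torch.randn(2, 1, 64, 64))
```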
In this paper, we present the Circular Accessible Depth (CAD), a robust traversability representation for an unmanned ground vehicle (UGV) to learn traversability in various scenarios containing irregular obstacles. To predict CAD, we propose a neural network, namely CADNet, with an attention-based multi-frame point cloud fusion module, the Stability-Attention Module (SAM), to encode the spatial features from point clouds captured by LiDAR. CAD is designed based on the polar coordinate system and focuses on predicting the border of the traversable area. Because CAD encodes the spatial information of the surrounding environment, it enables semi-supervised learning for CADNet and thus desirably avoids annotating a large amount of data. Extensive experiments demonstrate that CAD outperforms baselines in terms of robustness and precision. We also implement our method on a real UGV and show that it performs well in real-world scenarios.
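The geometric part of the representation can be sketched directly (CADNet itself, with its Stability-Attention Module, is not reproduced here): in each polar sector around the vehicle, the accessible depth is the range to the nearest non-traversable point. The height-threshold obstacle test and the bin counts are illustrative assumptions.

```python
# Minimal sketch (NumPy) of the Circular Accessible Depth idea: per polar
# sector, the border of the traversable area is the nearest obstacle range.
import numpy as np

def circular_accessible_depth(points, n_sectors=360, max_range=30.0,
                              obstacle_height=0.3):
    """points: (N, 3) LiDAR points (x, y, z) in the vehicle frame."""
    r = np.hypot(points[:, 0], points[:, 1])
    theta = np.arctan2(points[:, 1], points[:, 0])        # [-pi, pi)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors

    cad = np.full(n_sectors, max_range)
    obstacle = points[:, 2] > obstacle_height             # crude obstacle test
    for s, ri, ob in zip(sector, r, obstacle):
        if ob and ri < cad[s]:
            cad[s] = ri                                   # traversable-area border
    return cad

cloud = np.random.randn(5000, 3) * [10, 10, 0.5]          # dummy scan
border = circular_accessible_depth(cloud)                 # (360,) accessible depths
```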
Advanced visual localization techniques, such as hierarchical localization, encompass image retrieval and 6 Degree-of-Freedom (DoF) camera pose estimation. Thus, they must extract both global and local features from input images. Previous methods have achieved this through resource-intensive or accuracy-reducing means, such as combinatorial pipelines or multi-task distillation. In this study, we present a novel method called SuperGF, which effectively unifies local and global features for visual localization, achieving a better trade-off between localization accuracy and computational efficiency. Specifically, SuperGF is a transformer-based aggregation model that operates directly on image-matching-specific local features and generates global features for retrieval. We conduct experimental evaluations of our method in terms of both accuracy and efficiency, demonstrating its advantages over other methods. We also provide implementations of SuperGF using various types of local features, including dense and sparse learning-based or hand-crafted descriptors.
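A minimal sketch of the aggregation idea follows, assuming a learned token that attends over local descriptors through a standard transformer encoder and is read out as an L2-normalized global descriptor; the sizes and token-readout design are assumptions, not SuperGF's exact architecture.

```python
# Minimal sketch (PyTorch): aggregate image-matching local descriptors into a
# single global retrieval descriptor with a transformer encoder.
import torch
import torch.nn as nn

class GlobalAggregator(nn.Module):
    def __init__(self, local_dim=256, global_dim=512, n_layers=4, n_heads=8):
        super().__init__()
        self.token = nn.Parameter(torch.zeros(1, 1, local_dim))
        layer = nn.TransformerEncoderLayer(local_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.proj = nn.Linear(local_dim, global_dim)

    def forward(self, descriptors):        # (batch, n_keypoints, local_dim)
        tok = self.token.expand(descriptors.shape[0], -1, -1)
        h = self.encoder(torch.cat([tok, descriptors], dim=1))
        g = self.proj(h[:, 0])             # read out the aggregation token
        return nn.functional.normalize(g, dim=-1)

model = GlobalAggregator()
local_feats = torch.randn(2, 1024, 256)    # e.g. SuperPoint-like descriptors
global_desc = model(local_feats)           # (2, 512) retrieval descriptors
```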
The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., decrease as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error (from an ontology of 7 types) is introduced to a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find BUMP complements existing benchmarks in a number of ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) BUMP enables measuring the consistency of metrics, and reveals that the most discriminative metrics tend not to be the most consistent, 3) BUMP enables the measurement of metrics' performance on individual error types and highlights areas of weakness for future work.
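The consistency criterion that BUMP enables can be sketched in a few lines: over the minimal pairs, count how often a metric scores the faithful summary above its minimally corrupted counterpart. The toy token-overlap metric below is a stand-in for the actual metrics the benchmark evaluates.

```python
# Minimal sketch of the pairwise consistency check over minimal pairs.
from typing import Callable, List, Tuple

def consistency(metric: Callable[[str, str], float],
                pairs: List[Tuple[str, str, str]]) -> float:
    """pairs: (source document, faithful summary, unfaithful summary)."""
    hits = sum(metric(doc, good) > metric(doc, bad) for doc, good, bad in pairs)
    return hits / len(pairs)

def toy_metric(doc: str, summary: str) -> float:
    """Token-overlap stand-in for a faithfulness metric."""
    doc_tokens, sum_tokens = set(doc.split()), set(summary.split())
    return len(doc_tokens & sum_tokens) / max(len(sum_tokens), 1)

pairs = [("the cat sat on the mat",
          "the cat sat on the mat",
          "the dog sat on the mat")]
print(consistency(toy_metric, pairs))   # 1.0 on this toy pair
```

Because each pair differs by a single typed error, the same loop restricted to one error type yields the per-error-type performance the benchmark also reports.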
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Although generative models have emerged to make design automation no longer utopian, it remains non-trivial to customize designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground contents. In this study, we propose LayoutDETR, which inherits high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect, in a background image, the reasonable locations, scales, and spatial relations for multimodal elements in a layout. Experiments validate that our solution yields new state-of-the-art performance for layout generation on public benchmarks and on our newly curated ads banner dataset. For practical usage, we build our solution into a graphical system that facilitates user studies. We demonstrate that our designs attract more subjective preference than baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.
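The detection-style formulation can be sketched as a DETR-like decoder: learned element queries attend to background-image features and are decoded into normalized layout boxes. This schematic is an assumption-laden illustration, not the LayoutDETR architecture (which additionally conditions on foreground contents and generative objectives).

```python
# Minimal sketch (PyTorch): element queries cross-attend to background features
# and are decoded into layout boxes, in the spirit of a detection formulation.
import torch
import torch.nn as nn

class LayoutDecoder(nn.Module):
    def __init__(self, dim=256, n_queries=8, n_layers=4, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.box_head = nn.Linear(dim, 4)          # (cx, cy, w, h) per element

    def forward(self, bg_feats):                   # (batch, hw, dim) image tokens
        q = self.queries.expand(bg_feats.shape[0], -1, -1)
        h = self.decoder(q, bg_feats)
        return torch.sigmoid(self.box_head(h))     # normalized layout boxes

decoder = LayoutDecoder()
background = torch.randn(2, 196, 256)              # e.g. 14x14 CNN feature map
boxes = decoder(background)                        # (2, 8, 4) element boxes
```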