A computational graph in a deep neural network (DNN) is a data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs break down when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose a suite of visual simplification techniques, including a cycle-removing method, a module-based edge-pruning algorithm, and an isomorphic subgraph stacking strategy. We design and implement an interactive visualization system that is suitable for computational graphs with up to 10,000 elements. Experimental results and usage scenarios demonstrate that our tool reduces the number of displayed elements by 60% on average and hence makes it easier to recognize and diagnose DNN models. Our contributions are integrated into an open-source DNN visualization toolkit, namely, MindInsight [2].
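The abstract does not say how its cycle-removing method works. As a rough illustration only, one common way to make a directed graph acyclic is to drop the back edges found by a depth-first search. The sketch below assumes that approach; the function name `remove_cycles` and the edge-list input format are hypothetical and are not taken from MindInsight.

```python
from collections import defaultdict


def remove_cycles(nodes, edges):
    """Split edges into (kept, removed) so that the kept edges form a DAG.

    Assumption: cycles are broken by discarding DFS back edges. This is a
    generic technique, not necessarily the method used in the paper.
    Every endpoint in `edges` must also appear in `nodes`.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {n: WHITE for n in nodes}
    kept, removed = [], []

    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(adj[root]))]
        while stack:
            node, successors = stack[-1]
            descended = False
            for nxt in successors:
                if color[nxt] == GRAY:
                    # Edge points back to a node on the current path: it closes a cycle.
                    removed.append((node, nxt))
                else:
                    kept.append((node, nxt))
                    if color[nxt] == WHITE:
                        color[nxt] = GRAY
                        stack.append((nxt, iter(adj[nxt])))
                        descended = True
                        break
            if not descended:
                color[node] = BLACK
                stack.pop()
    return kept, removed


kept, removed = remove_cycles(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]
)
```

Here the edge from `c` back to `a` is the only one removed, so the remaining edges can be laid out top to bottom.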
Translated by Google Translate
Masked Modeling (MM) has demonstrated widespread success in various vision challenges by reconstructing masked visual patches. Yet applying MM to large-scale 3D scenes remains an open problem due to data sparsity and scene complexity. The conventional random masking paradigm used for 2D images often carries a high risk of ambiguity when recovering the masked regions of 3D scenes. To this end, we propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve representative structured points, effectively enhancing the pretext masking task for 3D scene understanding. Combined with a progressive reconstruction scheme, our method can concentrate on modeling regional geometry and suffers less ambiguity in masked reconstruction. In addition, scenes with progressive masking ratios can also be used to self-distill their intrinsic spatial consistency, which requires learning consistent representations from unmasked areas. By combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas, we obtain a unified framework called MM-3DScene. We conduct comprehensive experiments on a host of downstream tasks. The consistent improvement (e.g., +6.1 mAP@0.5 on object detection and +2.2% mIoU on semantic segmentation) demonstrates the superiority of our approach.
The materials science literature contains up-to-date and comprehensive scientific knowledge of materials. However, its content is unstructured and diverse, leaving a significant gap in providing sufficient information for material design and synthesis. To this end, we used natural language processing (NLP) and computer vision (CV) techniques based on convolutional neural networks (CNN) to discover valuable experiment-based information about nanomaterials and synthesis methods in energy-material-related publications. Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively. Our second system, GraphMaster, extracts data from tables and figures in publications with 98.3% classification accuracy and 4.3% data extraction mean square error. Our results show that these systems could assess the suitability of materials for a given application by evaluating synthesis insights and case analyses with detailed references. This work offers a fresh perspective on mining knowledge from scientific literature and a broad range of ways to accelerate nanomaterial research through CNN.
We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes the Markov decision process (MDP), the partially observable Markov decision process (POMDP), and the predictive state representation (PSR) as special cases. Toward finding the minimum assumption that enables sample-efficient learning, we propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, the bilinear class, low witness rank problems, the PO-bilinear class, and generalized regular PSR. Generalized regular PSR, a new tractable PSR class that we identify, includes nearly all known tractable POMDPs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values, and (ii) the log-likelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.
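The two modifications to posterior sampling described above can be illustrated over a finite hypothesis class. This is a minimal sketch under stated assumptions, not the paper's algorithm: the weighting `exp(gamma * value - eta * loss)`, the parameter names `gamma` and `eta`, and both function names are hypothetical choices made for illustration.

```python
import math
import random


def optimistic_posterior(values, cumulative_losses, gamma=1.0, eta=1.0):
    """Sampling distribution over a finite set of hypotheses.

    Assumed form: p(f) is proportional to exp(gamma * V_f - eta * L_f), where
    V_f is the value hypothesis f predicts (optimistic prior term) and L_f is
    its empirical loss on the historical data (stand-in for the log-likelihood).
    """
    scores = [gamma * v - eta * l for v, l in zip(values, cumulative_losses)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_hypothesis(probs, rng):
    """Draw the index of one hypothesis according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two hypotheses with equal loss: the one promising the higher value is favoured.
probs = optimistic_posterior(values=[2.0, 1.0], cumulative_losses=[0.0, 0.0])
chosen = sample_hypothesis(probs, random.Random(0))
```

With equal losses the optimistic term alone decides the ordering; as data accumulates, the loss term increasingly dominates and hypotheses that fit the history poorly are sampled less often.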
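Preoperative and noninvasive prediction of meningioma grade is important in clinical practice, as it directly influences clinical decision making. Moreover, brain invasion in meningioma (i.e., the presence of tumor tissue within the adjacent brain tissue) is an independent criterion for grading meningioma and influences the treatment strategy. Although efforts to address these two tasks have been reported, most of them rely on hand-crafted features, and none has attempted to exploit the two prediction tasks simultaneously. In this paper, we propose a novel task-aware contrastive learning algorithm to jointly predict meningioma grade and brain invasion from multi-modal MRIs. Building on a basic multi-task learning framework, our key idea is to adopt a contrastive learning strategy to disentangle the image features into task-specific features and task-common features, and to explicitly leverage their inherent connections to improve the feature representations for the two prediction tasks. In this retrospective study, an MRI dataset was collected from 800 patients (including 148 high-grade and 62 invasion cases) diagnosed with meningioma by pathological analysis. Experimental results show that the proposed algorithm outperforms alternative multi-task learning methods, achieving AUCs of 0.8870 and 0.9787 for predicting meningioma grade and brain invasion, respectively. The code is available at https://github.com/isdling/predicttcl.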
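Multi-modal MR imaging is routinely used in clinical practice to diagnose and investigate brain tumors by providing rich complementary information. Previous multi-modal MRI segmentation methods usually perform modal fusion by concatenating multi-modal MRIs at an early or middle stage of the network, which hardly explores the non-linear dependencies between modalities. In this work, we propose a novel nested modality-aware transformer (NestedFormer) to explicitly explore the intra-modality and inter-modality relationships of multi-modal MRIs for brain tumor segmentation. Built on a transformer-based multi-encoder and single-decoder structure, we perform nested multi-modal fusion on the high-level representations of different modalities and apply modality-sensitive gating (MSG) at lower scales for more effective skip connections. Specifically, the multi-modal fusion is conducted in our proposed nested modality-aware feature aggregation (NMAFA) module, which enhances long-term dependencies within individual modalities via a tri-orientated spatial-attention transformer, and further complements key contextual information among modalities via a cross-modality attention transformer. Extensive experiments on the BraTS2020 benchmark and a private meningioma segmentation (Maniseg) dataset show that NestedFormer clearly outperforms state-of-the-art methods. The code is available at https://github.com/920232796/nestedformer.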
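As a core part of autonomous driving systems, motion planning has received extensive attention from academia and industry. However, because of nonholonomic dynamics, no efficient trajectory planning solution is capable of spatial-temporal joint optimization, particularly in the presence of unstructured environments and dynamic obstacles. To bridge this gap, we propose a versatile, real-time trajectory optimization method that can generate high-quality feasible trajectories using a full vehicle model under arbitrary constraints. By leveraging the differential flatness property of car-like robots, we use flat outputs to analytically formulate all feasibility constraints, which simplifies the trajectory planning problem. Moreover, obstacle avoidance is achieved with full-dimensional polygons to generate less conservative trajectories with safety guarantees, especially in tightly constrained spaces. We present comprehensive benchmarks against state-of-the-art methods, which demonstrate the significance of the proposed method in terms of efficiency and trajectory quality. Real-world experiments verify the practicality of our algorithm. We will release our code as an open-source package for the reference of the research community.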
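Recently, a variety of vision transformers have been developed for their capability of modeling long-range dependencies. In current transformer-based backbones for medical image segmentation, convolutional layers are replaced with pure transformers, or transformers are added to the deepest encoder to learn global context. However, from a scale-wise perspective there are two main challenges: (1) the intra-scale problem: existing methods fall short in extracting local-global cues within each scale, which may impair the signal propagation of small objects; (2) the inter-scale problem: existing methods fail to explore distinctive information from multiple scales, which may hinder representation learning for objects that vary widely in size, shape, and location. To address these limitations, we propose a novel backbone, namely ScaleFormer, with two appealing designs: (1) a scale-wise intra-scale transformer is designed to couple the CNN-based local features with the transformer-based global cues in each scale, where the row-wise and column-wise global dependencies are extracted by a lightweight dual-axis MSA; (2) a simple and effective spatial-aware inter-scale transformer is designed to interact among consensual regions in multiple scales, which can highlight cross-scale dependencies and resolve complex scale variations. Experimental results on different benchmarks demonstrate that our ScaleFormer outperforms current state-of-the-art methods. The code is publicly available at https://github.com/zjugivelab/scaleformer.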
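Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.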
We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state spaces, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving for NEs based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality, which recovers a sublinear rate without assuming uniform coverage of the dataset. We also prove an information-theoretic lower bound, which suggests that the data-dependent term in the upper bound is intrinsic. Our theoretical results also highlight a notion of "relative uncertainty", which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.
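The idea of building a separate pessimistic estimate for each player can be illustrated on a single-stage matrix game. The sketch below is a deliberately simplified stand-in and not PMVI itself: it assumes a tabular payoff estimate `q_hat` with a per-entry uncertainty `bonus`, and it restricts both players to pure strategies, whereas solving for a Nash equilibrium in general requires mixed strategies (for example via a linear program). All names are hypothetical.

```python
def pessimistic_pure_strategies(q_hat, bonus):
    """Pick one action per player from pessimistic payoff estimates.

    q_hat[a][b] is the estimated payoff to the max-player when the max-player
    plays a and the min-player plays b; bonus[a][b] is the uncertainty of that
    estimate (larger where the offline data covers the pair poorly).
    Simplification: pure strategies only, no equilibrium solver.
    """
    rows, cols = len(q_hat), len(q_hat[0])
    # Pessimistic for the max-player: assume payoffs are as low as plausible.
    lower = [[q_hat[a][b] - bonus[a][b] for b in range(cols)] for a in range(rows)]
    # Pessimistic for the min-player: assume payoffs are as high as plausible.
    upper = [[q_hat[a][b] + bonus[a][b] for b in range(cols)] for a in range(rows)]

    # Max-player maximises its worst case under the lower estimate.
    a_star = max(range(rows), key=lambda a: min(lower[a]))
    # Min-player minimises its worst case under the upper estimate.
    b_star = min(range(cols), key=lambda b: max(upper[a][b] for a in range(rows)))
    return a_star, b_star, lower, upper


q_hat = [[1.0, 0.0], [0.5, 0.5]]
well_covered = [[0.0, 0.0], [0.0, 0.0]]
poorly_covered_row = [[0.0, 0.0], [1.0, 1.0]]

a_exact, b_exact, _, _ = pessimistic_pure_strategies(q_hat, well_covered)
a_pess, _, lower, upper = pessimistic_pure_strategies(q_hat, poorly_covered_row)
```

With no uncertainty the max-player chooses the second row, whose worst case is 0.5. Once that row carries a large bonus, its lower estimate drops to -0.5 and the max-player falls back to the first row, which shows how pessimism steers both players away from action pairs the dataset barely covers.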