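Lifelong learning occurs on timescales ranging from minutes to decades. People can lose themselves in a new skill, practicing for hours until exhausted. They can also pursue mastery over days or decades, perhaps abandoning old skills entirely to seek out new challenges. A full understanding of learning requires an account that integrates these timescales. Here, we present a minimal quantitative model that unifies the nested timescales of learning. Our dynamical model recovers classic accounts of skill acquisition and describes how learning emerges from the moment-to-moment dynamics of motivation, fatigue, and work, while also being situated within the longer-term dynamics of skill selection, mastery, and abandonment. We apply this model to explore the benefits and pitfalls of a variety of training regimes and to characterize individual differences in motivation and skill development. Our model connects previously disparate timescales (and the subdisciplines that typically study each timescale in isolation) to offer a unified account of the timecourse of skill acquisition.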
translated by 谷歌翻译
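The random Fourier features (RFFs) method is a powerful and popular technique for scaling up kernel methods. The theoretical foundation of RFFs is Bochner's theorem, which relates symmetric, positive definite (PD) functions to probability measures. This condition naturally excludes asymmetric functions that have a wide range of applications in practice, e.g., directed graphs, conditional probabilities, and asymmetric kernels. Nevertheless, asymmetric functions (kernels) and their scalability via RFFs remain poorly understood, both theoretically and empirically. In this paper, we introduce a complex measure whose real and imaginary parts correspond to four finite positive measures, which expands the application scope of Bochner's theorem. By doing so, the framework handles classical symmetric, PD kernels via one positive measure; symmetric, non-positive definite kernels via signed measures; and asymmetric kernels via complex measures, thereby unifying them into a general RFF framework named AsK-RFFs. This approximation scheme via complex measures enjoys theoretical guarantees from the perspective of uniform convergence. In the algorithmic implementation, to speed up the kernel approximation process, which is expensive because of the calculation of the total mass, we employ a subset-based fast estimation method that optimizes the total masses on a sub-training set. Our AsK-RFFs method is empirically validated on several typical large-scale datasets and achieves promising kernel approximation performance, which demonstrates the effectiveness of AsK-RFFs.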
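As background for the abstract above, the following is a minimal sketch of classical random Fourier features for the symmetric Gaussian kernel, the PD case that Bochner's theorem already covers. It is not the AsK-RFFs method, which the abstract does not specify in enough detail to reproduce. The function name `rff_features`, the `gamma` parameterization, and the toy data are assumptions made for illustration.

```python
import numpy as np


def rff_features(X, n_features, gamma, rng):
    """Map X of shape (n, d) to random Fourier features whose inner
    products approximate the Gaussian kernel exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration; classical RFFs only.
    """
    d = X.shape[1]
    # Bochner's theorem: this kernel is the Fourier transform of the
    # Gaussian measure N(0, 2 * gamma * I), so frequencies are drawn from it.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases make cos(w.x + b) * cos(w.y + b) an unbiased estimator.
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # toy data (assumption)
gamma = 0.5

Z = rff_features(X, n_features=20000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T

# Exact Gaussian kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)

max_abs_error = np.abs(K_approx - K_exact).max()
```

The approximation error shrinks as `n_features` grows, at the usual Monte Carlo rate. Handling non-PD or asymmetric kernels would require the signed or complex measures described in the abstract, which this sketch does not attempt.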
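Efficiently deploying a variety of deep learning (DL) models has boosted research on DL compilers. The difficulty of generating optimized tensor code drives DL compilers to rely on auto-tuning approaches, and growing demand calls for higher auto-tuning efficiency and quality. Currently, DL compilers partition the input DL model into several subgraphs and use auto-tuning to find the optimal tensor code for each subgraph. However, existing auto-tuning approaches usually treat subgraphs as individual units and overlook the similarities among them, and thus fail to find better tensor code under a limited time budget. We propose FamilySeer for DL compilers, which can generate better tensor code even with a limited time budget. FamilySeer exploits the similarities and differences among subgraphs to organize them into subgraph families, where tuning one subgraph can also improve the other subgraphs in the same family. The cost model of each family receives more purified training samples generated by that family and becomes more accurate, so that costly measurements on real hardware can be replaced with lightweight estimates from the cost model. Our experiments show that FamilySeer generates model code more efficiently than state-of-the-art auto-tuning frameworks.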
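The flourishing of deep learning frameworks and hardware platforms has created demand for an efficient compiler that can mask the diversity of both software and hardware in order to provide application portability. Among existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate thanks to its attractive computational power for both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose SWTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we exploit architectural features during compilation, such as core groups for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning workloads on Sunway. The experimental results show that the code generated by SWTVM achieves 1.79x on average across six representative benchmarks. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and Sunway processors, particularly with respect to productivity and efficiency. We believe this work will encourage more people to embrace the power of deep learning and Sunway many-core processors.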
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful aid to image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
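To make the greedy criterion concrete, here is a minimal sketch of the oracle version of the policy: given full access to a small discrete joint distribution, it picks the unobserved feature with the largest conditional mutual information with the label. This illustrates only the greedy rule the abstract names, not the authors' amortized learning approach. The function names, the table layout (feature axes first, label axis last), and the toy distribution are all assumptions.

```python
import numpy as np


def mutual_information(p_xy):
    """I(X; Y) in nats for a 2-D table of (possibly unnormalized) joint mass."""
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # skip zero-probability cells (0 * log 0 = 0)
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())


def greedy_next_feature(joint, observed):
    """Pick the unobserved feature maximizing I(y; x_i | observed values).

    joint    : array with axes (x_0, ..., x_{d-1}, y) holding probabilities.
    observed : dict {feature index: observed value}; assumed to have
               nonzero probability under `joint`.
    Returns (best feature index, its conditional mutual information).
    """
    d = joint.ndim - 1
    # Conditioning = slicing the table at the observed values.
    index = tuple(observed.get(i, slice(None)) for i in range(d)) + (slice(None),)
    cond = joint[index]
    remaining = [i for i in range(d) if i not in observed]
    best, best_mi = None, -1.0
    for pos, i in enumerate(remaining):
        # Marginalize out every other unobserved feature, keeping (x_i, y).
        other_axes = tuple(a for a in range(len(remaining)) if a != pos)
        mi = mutual_information(cond.sum(axis=other_axes))
        if mi > best_mi:
            best, best_mi = i, mi
    return best, best_mi


# Toy distribution (assumption): y copies x_0, and x_1 is independent noise.
joint = np.zeros((2, 2, 2))
for x0 in range(2):
    for x1 in range(2):
        joint[x0, x1, x0] = 0.25

first, first_mi = greedy_next_feature(joint, observed={})
second, second_mi = greedy_next_feature(joint, observed={0: 1})
```

In the toy example the first query is `x_0`, which fully determines the label; once it is observed, the remaining feature carries no further information. Real data offers no such table, which is why the abstract turns to a learned approximation.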
Human parsing aims to partition humans in images or videos into multiple pixel-level semantic parts. In the last decade, it has attracted significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.