Novel artificial intelligence (AI) technology has accelerated research in many scientific fields, e.g., cosmology, physics, and bioinformatics, and has inevitably become a significant category of workload on high performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications so as to evaluate the AI performance of HPC systems under a predefined problem size, in terms of datasets and AI models. Because their problem size does not scale, static AI benchmarks may be inadequate for understanding the performance trend of evolving AI applications on HPC systems, in particular scientific AI applications on large-scale systems. In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems by scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model scale continuously, we can investigate the trend and range of AI performance on HPC systems and further diagnose system bottlenecks. To verify our methodology, we augment a cosmological AI application to evaluate a real HPC system equipped with GPUs as a case study of SAIH.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for the automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
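Face images usually appear in a wide range of visual scales. Existing face representations handle scale variation through a multi-scale scheme that assembles a finite series of predefined scales. Such a multi-shot scheme adds an inference burden, and the predefined scales inevitably deviate from real data. Learning scale parameters from data and using them for one-shot feature inference is a better solution. To this end, we reform the conv layer by resorting to scale-space theory and achieve two benefits: 1) the conv layer learns a set of scales from the real data distribution, each of which is realized by a conv kernel; 2) the layer automatically highlights the feature at the proper channel and location corresponding to the scale of the input pattern and its presence. We then achieve hierarchical scale attention by stacking the reformed layers, building a novel architecture named SCale AttentioN Conv Neural Network (\textbf{SCAN-CNN}). We apply SCAN-CNN to the face recognition task and push the frontier of SOTA performance. The accuracy gain is more evident when the face images are blurry. Meanwhile, as a single-shot scheme, its inference is more efficient than multi-shot fusion. A set of tools is provided to ensure fast training of SCAN-CNN and zero increase in inference cost compared with a plain CNN.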
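Acne detection is crucial for interpretable diagnosis and precise treatment of skin diseases. The arbitrary boundaries and small size of acne lesions lead to a large number of poor-quality proposals in two-stage detection. In this paper, we propose a novel head structure for the Region Proposal Network that improves proposal quality in two ways. First, a Spatial Aware Double Head (SADH) structure is proposed to disentangle representation learning for classification and localization from two different spatial perspectives. The proposed SADH ensures a steeper classification confidence gradient and suppresses proposals that have a low intersection-over-union (IoU) with the matched ground truth. Then, we propose a Normalized Wasserstein Distance prediction branch to improve the correlation between the proposals' classification scores and their IoUs. In addition, to facilitate further research on acne detection, we construct a new dataset named AcneSCU, with high-resolution imagery, precise annotations, and fine-grained lesion categories. Extensive experiments are conducted on both AcneSCU and the public dataset ACNE04, and the results show that the proposed method improves proposal quality and consistently outperforms state-of-the-art approaches. The code and the collected dataset are available at https://github.com/pingguokiller/acnedetection.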
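In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads. To efficiently pre-train the proposed model jointly on unpaired images and text, we propose two novel techniques: (i) we use separately trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals for the joint training; (ii) we propose a novel gradient masking strategy to balance the parameter updates from the image and text pre-training losses. We evaluate the jointly pre-trained transformer by fine-tuning it on image classification tasks and natural language understanding tasks, respectively. The experiments show that the resulting unified foundation transformer works surprisingly well on both vision-only and text-only tasks, and that the proposed knowledge distillation and gradient masking strategy can effectively lift its performance toward the level of separately trained models.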
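Face recognition is one of the most popular and long-standing topics in computer vision. With the recent development of deep learning techniques and large-scale datasets, deep face recognition has made remarkable progress and is widely used in many real-world applications. Given a natural image or video frame as input, an end-to-end deep face recognition system outputs the face feature for recognition. To achieve this, a typical end-to-end system is generally built from three key elements: face detection, face alignment, and face representation. Face detection locates the faces in the image or frame. Face alignment then calibrates the faces to a canonical view and crops them to a normalized pixel size. Finally, in the face representation stage, discriminative features are extracted from the aligned face for recognition. Nowadays, all three elements are implemented with deep convolutional neural networks. In this survey article, we present a comprehensive review of recent advances in each element of end-to-end deep face recognition, since thriving deep learning techniques have greatly improved their capabilities. To start with, we give an overview of end-to-end deep face recognition. Then, we review the advances in each element in turn, covering many aspects such as up-to-date algorithm designs, evaluation metrics, datasets, performance comparisons, and promising directions for future research. Through this survey, we hope to contribute in two ways: first, readers can conveniently identify the strong baseline methods in each subcategory for further exploration; second, one can also adopt suitable methods to build a state-of-the-art end-to-end face recognition system from scratch.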
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. When executing SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, we can reach 60% sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches.
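To make the semi-structured 2:4 pattern mentioned above concrete, the following is a minimal sketch in plain Python. It is not SparseGPT: the selection rule here is simple weight magnitude, chosen only for illustration, and the names `prune_n_of_m` and `sparsity` are hypothetical. It shows only what the pattern constrains, namely that at most `n` of every `m` consecutive weights remain non-zero.

```python
def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude values in each consecutive group of m.

    Assumption: magnitude is used as the selection criterion purely for
    illustration; it is NOT the criterion used by SparseGPT.
    """
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n entries with the largest absolute value.
        by_magnitude = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(by_magnitude[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned_row = prune_n_of_m(row, n=2, m=4)
print(pruned_row)            # two non-zeros per group of four
print(sparsity(pruned_row))  # fraction of zeroed weights
```

With `n=2, m=4` every group of four keeps exactly two weights, so a dense row ends up at 50% sparsity; `n=4, m=8` gives the same ratio with a looser placement constraint.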
Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the stored knowledge in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize external knowledge to assist LLMs. Unfortunately, current methods for incorporating external knowledge often require additional training or fine-tuning, which can be costly and may not be feasible for LLMs. To address this issue, we propose a novel post-processing approach, rethinking with retrieval (RR), which retrieves relevant external knowledge based on the decomposed reasoning steps obtained from the chain-of-thought (CoT) prompting. This lightweight approach does not require additional training or fine-tuning and is not limited by the input length of LLMs. We evaluate the effectiveness of RR through extensive experiments with GPT-3 on three complex reasoning tasks: commonsense reasoning, temporal reasoning, and tabular reasoning. Our results show that RR can produce more faithful explanations and improve the performance of LLMs.
Model quantization enables the deployment of deep neural networks on resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size by up to 32$\times$, but at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning, and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels ($\sim$30$\times$, $\sim$40$\times$), our method significantly improves the test accuracy and reduces the model size.
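The quantity this abstract centers on, the cosine distance between full-precision and ternary weights, can be illustrated with a small sketch. This is not the HQ method itself: the threshold rule below (0.7 times the mean absolute weight) is an assumed heuristic for illustration only, and the function names `ternarize` and `cosine_distance` are hypothetical.

```python
import math


def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a magnitude threshold."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


weights = [0.8, -0.05, 0.02, -0.9, 0.4, -0.6]

# Assumed heuristic threshold, not taken from the abstract.
threshold = 0.7 * sum(abs(w) for w in weights) / len(weights)

ternary = ternarize(weights, threshold)
distance = cosine_distance(weights, ternary)
print(ternary)
print(distance)
```

Because cosine distance ignores vector length, no per-layer scaling factor is needed for the comparison: a small distance means the ternary vector points in nearly the same direction as the full-precision one, which is the condition the abstract links to a lower-bias straight-through estimator.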