As one of the most popular micro-mobility options, e-scooters are spreading in hundreds of big cities and college towns in the US and worldwide. In the meantime, e-scooters are also posing new challenges to traffic safety. In general, e-scooters are suggested to be ridden in bike lanes/sidewalks or share the road with cars at the maximum speed of about 15-20 mph, which is more flexible and much faster than the pedestrains and bicyclists. These features make e-scooters challenging for human drivers, pedestrians, vehicle active safety modules, and self-driving modules to see and interact. To study this new mobility option and address e-scooter riders' and other road users' safety concerns, this paper proposes a wearable data collection system for investigating the micro-level e-Scooter motion behavior in a Naturalistic road environment. An e-Scooter-based data acquisition system has been developed by integrating LiDAR, cameras, and GPS using the robot operating system (ROS). Software frameworks are developed to support hardware interfaces, sensor operation, sensor synchronization, and data saving. The integrated system can collect data continuously for hours, meeting all the requirements including calibration accuracy and capability of collecting the vehicle and e-Scooter encountering data.
translated by 谷歌翻译
Are extralinguistic signals such as image pixels crucial for inducing constituency grammars? While past work has shown substantial gains from multimodal cues, we investigate whether such gains persist in the presence of rich information from large language models (LLMs). We find that our approach, LLM-based C-PCFG (LC-PCFG), outperforms previous multi-modal methods on the task of unsupervised constituency parsing, achieving state-of-the-art performance on a variety of datasets. Moreover, LC-PCFG results in an over 50% reduction in parameter count, and speedups in training time of 1.7x for image-aided models and more than 5x for video-aided models, respectively. These results challenge the notion that extralinguistic signals such as image pixels are needed for unsupervised grammar induction, and point to the need for better text-only baselines in evaluating the need of multi-modality for the task.
translated by 谷歌翻译
Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model's robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.
translated by 谷歌翻译
Novel artificial intelligence (AI) technology has expedited various scientific research, e.g., cosmology, physics and bioinformatics, inevitably becoming a significant category of workload on high performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications, so as to evaluate the AI performance of HPC systems under predefined problem size, in terms of datasets and AI models. Due to lack of scalability on the problem size, static AI benchmarks might be under competent to help understand the performance trend of evolving AI applications on HPC systems, in particular, the scientific AI applications on large-scale systems. In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems with scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model constantly scale, we can investigate the trend and range of AI performance on HPC systems, and further diagnose system bottlenecks. To verify our methodology, we augment a cosmological AI application to evaluate a real HPC system equipped with GPUs as a case study of SAIH.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
开放世界对象检测是一个更具笼统和挑战性的目标,旨在识别和本地化由任意类别名称描述的对象。最近的工作GLIP通过将检测数据集的所有类别名称连接到句子中,从而将此问题作为接地问题,从而导致类别名称之间的效率低下的相互作用。本文介绍了Distclip,这是一种通过诉诸于设计概念词典的知识富集,是一种平行的视觉概念训练预训练方法,用于开放世界检测。为了提高学习效率,我们提出了一种新型的并行概念公式,该公式分别提取概念,以更好地利用异质数据集(即检测,接地和图像文本对)进行培训。我们进一步设计了来自各种在线资源和检测数据集的概念字典〜(带有描述),以提供每个概念的先验知识。通过用描述丰富这些概念,我们明确地建立了各种概念之间的关系,以促进开放域学习。所提出的概念词典进一步用于提供足够的负面概念,用于构建单词区域对齐损失\,并完成图像对文本对数据标题中缺少描述的对象的标签。所提出的框架显示出强烈的零射击性能性能,例如,在LVIS数据集上,我们的DETCLIP-T优于9.9%的地图GLIPT-T优于GLIP-T,并且与完全避免的型号相比,稀有类别的稀有类别提高了13.5%。作为我们的。
translated by 谷歌翻译
随着深度学习的进步,演讲者的验证取得了很高的准确性,并且在我们日常生活中的许多场景中,尤其是Web服务市场不断增长的一种生物识别验证选项,成为一种生物识别验证选项。与传统密码相比,“人声密码”更加方便,因为它们可以减轻人们记住不同密码的记忆。但是,新的机器学习攻击使这些语音身份验证系统处于危险之中。没有强大的安全保证,攻击者可以通过欺骗基于深神经网络(DNN)的语音识别模型来访问合法用户的Web帐户。在本文中,我们证明了对语音身份验证系统的易于实现的数据中毒攻击,这几乎无法通过现有的防御机制来捕获。因此,我们提出了一种更强大的防御方法,称为“卫报”,该方法是基于卷积神经网络的歧视者。监护人歧视者整合了一系列新型技术,包括减少偏见,输入增强和集成学习。我们的方法能够将约95%的攻击帐户与普通帐户区分开,这比仅准确性60%的现有方法更有效。
translated by 谷歌翻译
准确,快速的双核细胞(BC)检测在预测白血病和其他恶性肿瘤的风险中起着重要作用。但是,手动显微镜计数是耗时的,缺乏客观性。此外,由于bc显微镜整体幻灯片图像(WSIS)的染色质量和多样性的限制,传统的图像处理方法是无助的。为了克服这一挑战,我们提出了一种基于深度学习的结构启发的两阶段检测方法,该方法是基于深度学习的,该方法是在斑块级别的WSI-Level和细粒度分类处实施BCS粗略检测的级联。粗糙检测网络是基于用于细胞检测的圆形边界框的多任务检测框架,以及用于核检测的中心关键点。圆的表示降低了自由度,与通常的矩形盒子相比,减轻周围杂质的影响,并且在WSI中可能是旋转不变的。检测细胞核中的关键点可以帮助网络感知,并在后来的细粒分类中用于无监督的颜色层分割。精细的分类网络由基于颜色层掩模的监督和基于变压器的关键区域选择模块组成的背景区域抑制模块,其全局建模能力。此外,首先提出了无监督和未配对的细胞质发生器网络来扩展长尾分配数据集。最后,在BC多中心数据集上进行实验。拟议的BC罚款检测方法在几乎所有评估标准中都优于其他基准,从而为诸如癌症筛查等任务提供了澄清和支持。
translated by 谷歌翻译