We explore the use of knowledge distillation (KD) to learn compact and accurate models that can classify animal behavior from accelerometry data on wearable devices. To this end, we use a deep and complex convolutional neural network, called a residual neural network (ResNet), as the teacher model. ResNet is specifically designed for multivariate time-series classification. We use ResNet to distill the knowledge of animal behavior classification datasets into soft labels, which consist of pseudo-probabilities of each class for every data point. We then use the soft labels to train our significantly less complex student models, which are based on gated recurrent units (GRUs) and multilayer perceptrons (MLPs). Evaluation results using two real-world animal behavior classification datasets show that the classification accuracy of the student GRU-MLP models improves appreciably through KD, approaching that of the teacher ResNet model. To further reduce the computational and memory requirements of performing inference using the student models trained via KD, we utilize dynamic quantization through appropriate modification of the computational graphs of the models. We implement both unquantized and quantized versions of the developed KD-based models on the embedded systems of our purpose-built collar and ear-tag devices to classify animal behavior in situ and in real time. The results corroborate the effectiveness of KD and quantization in improving inference performance in terms of classification accuracy as well as computational and memory efficiency.
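The soft-label idea at the heart of KD can be illustrated in a few lines. This is a minimal sketch, not the authors' implementation: the temperature value, logits, and class count are illustrative, and a real training loop would combine this loss with a hard-label term and backpropagate through the student.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to (pseudo-)probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.

    The temperature-softened teacher outputs are the 'soft labels';
    minimizing this loss pulls the student towards the teacher.
    """
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A teacher confident the behavior is class 0 (e.g. 'grazing'):
teacher = [4.0, 1.0, 0.5]
# A student that has not yet learned the same ranking:
student = [1.0, 2.0, 0.5]
print(distillation_loss(teacher, student))  # positive; 0 only if the distributions match
```

The soft labels carry more information than one-hot targets (e.g. which behaviors the teacher considers confusable), which is what lets the small GRU-MLP student approach the ResNet teacher's accuracy.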
We classify animal behavior using data from multiple sensing modes, namely, accelerometry and global navigation satellite system (GNSS). We extract three new features from the GNSS data, namely, the distance from the water point, the median speed, and the median estimated horizontal position error. We consider two approaches for combining the information available from the accelerometry and GNSS data. The first approach is based on concatenating the features extracted from both sensors' data and feeding the concatenated feature vector into a multilayer perceptron (MLP) classifier. The second approach is based on fusing the posterior probabilities predicted by two MLP classifiers, each taking the features extracted from the data of one sensor as input. We evaluate the performance of the developed multimodal animal behavior classification algorithms using two real-world datasets collected via smart cattle collar and ear tags. The leave-one-animal-out cross-validation results show that both approaches improve the classification performance appreciably over using the data of only one sensing mode, particularly for the infrequent but important behaviors of walking and drinking. The algorithms developed based on both approaches require rather small computational and memory resources, hence are suitable for implementation on the embedded systems of our collar and ear tags. However, the multimodal animal behavior classification algorithm based on posterior probability fusion is preferable to the one based on feature concatenation, as it delivers better classification accuracy, has lower computational and memory complexity, is more robust to sensor data failure, and enjoys better modularity.
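Posterior probability fusion can be sketched as follows. This is a toy illustration under an assumed fusion rule (element-wise product with renormalization, which presumes conditional independence of the modalities); the abstract does not specify the exact rule used, and the class names and probability values below are invented for illustration.

```python
def fuse_posteriors(p_accel, p_gnss):
    """Fuse per-class posteriors from two classifiers by multiplying them
    element-wise and renormalizing (a product rule, assuming the two
    sensing modes are conditionally independent given the behavior)."""
    product = [a * g for a, g in zip(p_accel, p_gnss)]
    total = sum(product)
    if total == 0.0:  # degenerate case: fall back to averaging
        return [(a + g) / 2.0 for a, g in zip(p_accel, p_gnss)]
    return [p / total for p in product]

# Hypothetical classes: [grazing, walking, resting, drinking]
p_accel = [0.50, 0.30, 0.15, 0.05]   # accelerometry MLP posteriors
p_gnss  = [0.20, 0.30, 0.10, 0.40]   # GNSS MLP posteriors (animal near a water point)
fused = fuse_posteriors(p_accel, p_gnss)
# Modularity/robustness: if the GNSS stream fails, simply use p_accel alone.
```

The final comment hints at why fusion is the preferred design: each single-modality classifier remains usable on its own when the other sensor's data is unavailable.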
We study the classification of animal behavior using accelerometry data through various recurrent neural network (RNN) models. We evaluate the classification performance and complexity of the considered models, which feature long short-term memory (LSTM) or gated recurrent unit (GRU) architectures with varying depths and widths, using four datasets acquired from cattle via collar or ear tags. We also include two state-of-the-art convolutional neural network (CNN)-based time-series classification models in the evaluation. The results show that the RNN-based models can achieve similar or higher classification accuracy compared with the CNN-based models while having fewer computational and memory requirements. We also observe that the models with GRU architecture generally outperform those with LSTM architecture, despite being less complex. A single-layer unidirectional GRU model with 64 hidden units appears to offer a good balance between accuracy and complexity, making it suitable for implementation on edge/embedded devices.
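The GRU-vs-LSTM complexity gap follows directly from the gate counts. The sketch below uses the common single-bias-per-gate formulation (frameworks such as PyTorch store two bias vectors per gate, which changes the counts slightly), and the 3-channel input is an assumption corresponding to a tri-axial accelerometer.

```python
def gru_params(input_size, hidden_size):
    """Parameters of one unidirectional GRU layer: three gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

def lstm_params(input_size, hidden_size):
    """Same structure, but an LSTM layer has four gates instead of three."""
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

# Tri-axial accelerometry -> 3 input channels; 64 hidden units as in the abstract.
g = gru_params(3, 64)
l = lstm_params(3, 64)
print(g, l, l / g)  # the LSTM layer carries 4/3 the parameters of the GRU layer
```

At equal width and depth a GRU is therefore about 25% smaller than an LSTM, which is consistent with the observation that GRU models are less complex yet competitive in accuracy.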
We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior using accelerometry data on the embedded system of a wearable collar tag. The algorithm jointly performs feature extraction and classification utilizing a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The utilized IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm via two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good within- and across-dataset classification accuracy and outperforms its closest contenders, including two state-of-the-art CNN-based time-series classification algorithms, which are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior. We achieve real-time in-situ behavior inference from the accelerometry data without imposing any strain on the available computational, memory, or energy resources of the embedded system.
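The correspondence between filters and neural layers is easy to see in code. This is a minimal sketch: a first-order IIR filter and a 2-tap FIR filter applied to a unit impulse, with coefficients chosen for illustration rather than taken from the paper.

```python
def fir_filter(x, b):
    """FIR filter: each output is a weighted sum of recent inputs --
    the same operation as a 1-D convolutional layer's kernel."""
    y = []
    for n in range(len(x)):
        y.append(sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0))
    return y

def iir_filter(x, b0, a1):
    """First-order IIR filter: y[n] = b0*x[n] + a1*y[n-1].
    The feedback term makes it a (linear) recurrent layer."""
    y, prev = [], 0.0
    for xn in x:
        prev = b0 * xn + a1 * prev
        y.append(prev)
    return y

signal = [1.0, 0.0, 0.0, 0.0]          # unit impulse
print(fir_filter(signal, [0.5, 0.5]))  # finite response: [0.5, 0.5, 0.0, 0.0]
print(iir_filter(signal, 1.0, 0.5))    # infinite (decaying) response: [1.0, 0.5, 0.25, 0.125]
```

The FIR response dies out after the kernel length, like a convolution; the IIR response persists indefinitely through its recurrence, like an RNN state, which is why the two filter families map onto convolutional and recurrent layers.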
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
There is no settled universal 3D representation for geometry, with many alternatives such as point clouds, meshes, implicit functions, and voxels, to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthesis and editing. Loops are a non-local description of the underlying shape, as simple loop manipulations (such as shifts) result in significant structural changes to the geometry. This is in contrast to manipulating local primitives such as points in a point cloud or a triangle in a triangle mesh. We further demonstrate that loops are an intuitive and natural primitive for analyzing and editing shapes, both computationally and for users.
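A toy version of the loop representation makes the non-locality concrete. This sketch is not the paper's data structure (it has no hierarchy or autoregressive model); it only shows a shape stored as one closed loop per cutting plane, where shifting a single loop moves an entire cross-section.

```python
import math

def circle_loop(radius, n=16):
    """A closed cross-sectional loop sampled as n 2-D points."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# A sphere-like shape as a stack of loops, one closed loop per cutting plane z.
shape = [(z, circle_loop(math.sqrt(max(0.0, 1.0 - z * z))))
         for z in [-0.8, -0.4, 0.0, 0.4, 0.8]]

def shift_loop(shape, plane_index, dx, dy):
    """Shifting one loop translates a whole cross-section: a tiny edit in
    loop space produces a structural change to the geometry, unlike
    moving a single point in a point cloud."""
    z, loop = shape[plane_index]
    shape[plane_index] = (z, [(x + dx, y + dy) for x, y in loop])
    return shape

shape = shift_loop(shape, 4, 0.5, 0.0)  # bend the top of the shape sideways
```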
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.
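The chain-rule step can be illustrated with stand-ins: below, the "renderer" is a linear map (so its Jacobian is just the matrix), and the "2D score" is that of a Gaussian rather than a diffusion model's denoiser. Everything here is a toy under those assumptions; the paper's voxel radiance field and distribution-mismatch mechanism are not reproduced.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def score_2d(image, mean):
    """Score of an isotropic Gaussian image prior: grad log p(x) = -(x - mean).
    A stand-in for the denoiser-derived score of a diffusion model."""
    return [-(x - m) for x, m in zip(image, mean)]

def chained_3d_score(theta, cameras, means):
    """Back-propagate each 2-D score through the renderer's Jacobian
    (here the renderer is linear, so the Jacobian is the matrix A itself)
    and aggregate the per-view results into one 3-D score."""
    total = [0.0] * len(theta)
    for A, mean in zip(cameras, means):
        s = score_2d(matvec(A, theta), mean)  # 2-D score at this viewpoint
        g = matvec(transpose(A), s)           # J^T s: pull the score back to 3-D
        total = [t + gi for t, gi in zip(total, g)]
    return total

theta = [0.0, 0.0, 0.0]                         # toy 3-D scene parameters (voxels)
cameras = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # view 1 "renders" voxels 0 and 1
           [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]  # view 2 "renders" voxels 1 and 2
means = [[1.0, 1.0], [1.0, 1.0]]                # target images each view prefers
grad = chained_3d_score(theta, cameras, means)
# Ascending along `grad` updates the shared 3-D parameters to satisfy both views.
```

Note how the voxel seen by both cameras receives twice the gradient: the aggregation of 2D scores across viewpoints is what turns a pretrained 2D model into a supervisory signal for 3D.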
Very large language models such as GPT-3 have shown impressive performance across a wide variety of tasks, including text summarization. In this paper, we show that this strong performance extends to opinion summarization. We explore several pipeline methods for applying GPT-3 to summarize a large collection of user reviews in a zero-shot fashion, notably approaches based on recursive summarization and selecting salient content to summarize through supervised clustering or extraction. On two datasets, an aspect-oriented summarization dataset of hotel reviews and a generic summarization dataset of Amazon and Yelp reviews, we show that the GPT-3 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and evaluate against several new measures targeting faithfulness, factuality, and genericity to contrast these different methods.
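The recursive-summarization pipeline can be sketched without a model in the loop. Here `summarize` is a stub (it keeps the first clause of each input) standing in for a zero-shot GPT-3 call; the batching scheme, batch size, and reviews are invented for illustration, and the supervised clustering/extraction variants are not shown.

```python
def chunk(reviews, size):
    """Split the review collection into batches that fit in one prompt."""
    return [reviews[i:i + size] for i in range(0, len(reviews), size)]

def summarize(texts):
    """Stub standing in for a zero-shot LLM call such as
    'Summarize the following reviews: ...'."""
    return " ".join(t.split(".")[0] + "." for t in texts)

def recursive_summarize(reviews, batch_size=2):
    """Summarize each batch, then recursively summarize the partial
    summaries until everything fits in a single batch."""
    if len(reviews) <= batch_size:
        return summarize(reviews)
    partial = [summarize(b) for b in chunk(reviews, batch_size)]
    return recursive_summarize(partial, batch_size)

reviews = [
    "Great location. Steps from the beach.",
    "Rooms were clean. Staff was friendly.",
    "Breakfast was mediocre. Would still return.",
    "Quiet at night. Parking was expensive.",
]
print(recursive_summarize(reviews))
```

The recursion is what lets a fixed-context model cover an arbitrarily large review collection, at the cost of information loss at each level, which is one reason the paper also evaluates salient-content selection.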
In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. We present a novel neurosymbolic approach called Spice to solve this safe exploration problem. Compared to existing tools, Spice achieves a more precise safety analysis using an online shielding layer based on symbolic weakest preconditions, without unduly impacting the training process. We evaluate this approach on a suite of continuous control benchmarks and show that it can achieve comparable performance to existing safe learning techniques while incurring fewer safety violations. Additionally, we present theoretical results showing that Spice converges to the optimal safe policy under reasonable assumptions.
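The weakest-precondition shielding idea can be shown on a toy 1-D system. This is a hand-worked sketch, not Spice's symbolic analysis: the dynamics, invariant, and projection rule are all invented for illustration.

```python
def weakest_precondition(x, limit=1.0):
    """For dynamics x' = x + a and the safety invariant |x'| <= limit,
    the weakest precondition on the action a is  -limit - x <= a <= limit - x."""
    return (-limit - x, limit - x)

def shield(x, proposed_action, limit=1.0):
    """Online shield: pass the agent's action through unchanged when it
    already satisfies the precondition, otherwise project it onto the
    nearest safe action, so training is disturbed as little as possible."""
    lo, hi = weakest_precondition(x, limit)
    return min(max(proposed_action, lo), hi)

x = 0.7
for a in [0.1, 0.9, -2.5]:      # actions proposed by a (possibly unsafe) policy
    x = x + shield(x, a)        # shielded transition
    assert abs(x) <= 1.0        # the invariant holds at every step, incl. training
```

Because the precondition is the *weakest* one implying safety, the shield intervenes only when it must, which is how this style of approach avoids unduly biasing exploration.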