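Always-on TinyML perception tasks in IoT applications require very high energy efficiency. Analog compute-in-memory (CIM) using non-volatile memory (NVM) promises high efficiency and also provides self-contained on-chip model storage. However, analog CIM introduces new practical considerations, including conductance drift, read/write noise, and fixed analog-to-digital converter (ADC) gain. These additional constraints must be addressed to achieve models that can be deployed on analog CIM with acceptable accuracy loss. This work describes AnalogNets: TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW). The model architectures are specifically designed for analog CIM, and we detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities and low-precision data converters at inference time. We also describe AON-CIM, a programmable, minimal-area phase-change memory (PCM) analog CIM accelerator with a novel layer-serial approach that removes the cost of the complex interconnects associated with a fully pipelined design. We evaluate AnalogNets on a calibrated simulator as well as on real hardware, and find that accuracy degradation is limited to 0.8%/1.2% for KWS/VWW after 24 hours of PCM drift (8-bit). AnalogNets running on the 14nm AON-CIM accelerator were evaluated with 8-bit activations for KWS/VWW; with 4-bit activations, energy efficiency increases to 57.39/25.69 TOPS/W.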
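A common way to make a network tolerate these non-idealities is to expose it to them during training. The sketch below simulates one analog matrix-vector product under stated assumptions: a power-law PCM drift model, multiplicative Gaussian read noise, and a symmetric fixed-gain ADC. The function names, the drift exponent `nu`, the reference time `t0`, and the noise model are illustrative assumptions, not the AnalogNets training recipe.

```python
import random


def pcm_drift(g0, t, t0=1.0, nu=0.05):
    """Power-law PCM drift G(t) = G0 * (t / t0) ** (-nu).

    A signed weight is treated as a differential pair (G+ - G-) whose two
    devices drift with the same exponent, so the signed value scales the same
    way. nu and t0 (seconds) are illustrative assumptions.
    """
    return g0 * (t / t0) ** (-nu)


def adc_quantize(x, bits=8, full_scale=1.0):
    """Fixed-gain ADC: clip to [-full_scale, full_scale], round to a signed grid."""
    levels = 2 ** (bits - 1) - 1
    x = max(-full_scale, min(full_scale, x))
    return round(x / full_scale * levels) / levels * full_scale


def analog_mvm(weights, x, noise_std=0.0, t=1.0, bits=8, full_scale=1.0, rng=None):
    """Simulated analog matrix-vector product.

    Each weight is drifted to time t and perturbed by multiplicative Gaussian
    read noise; each accumulated output then passes through the fixed-gain ADC.
    """
    rng = rng or random.Random(0)
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            w_eff = pcm_drift(w, t) * (1.0 + rng.gauss(0.0, noise_std))
            acc += w_eff * xi
        out.append(adc_quantize(acc, bits, full_scale))
    return out


# Ideal vs. non-ideal output for a tiny 2x2 layer.
W = [[0.5, -0.25], [0.1, 0.3]]
x = [1.0, 1.0]
print(analog_mvm(W, x))                                     # no noise, no drift
print(analog_mvm(W, x, noise_std=0.05, t=86400.0, bits=4))  # 24 h drift, 4-bit ADC
```

Calling `analog_mvm` with `noise_std > 0` in the training forward pass is one way to inject noise; evaluating with `t=86400.0` (24 hours, given `t0 = 1` s) mimics the drift scenario above, and `bits=4` models a lower-precision converter.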
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
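The network takes the amplitude and phase of the WiFi channel state information (CSI) as input. Raw CSI phase is wrapped to (-π, π] and typically carries a linear offset across subcarriers, so it is usually sanitized first. The sketch below assumes one complex CSI value per subcarrier and uses unwrapping plus least-squares linear detrending; this is a common preprocessing choice, not necessarily the one used in the paper.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples (1-D unwrapping)."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def sanitize_phase(phases):
    """Unwrap, then subtract the least-squares line fitted across subcarriers."""
    un = unwrap(phases)
    n = len(un)
    k_mean = (n - 1) / 2
    p_mean = sum(un) / n
    sxx = sum((k - k_mean) ** 2 for k in range(n))
    sxy = sum((k - k_mean) * (p - p_mean) for k, p in enumerate(un))
    slope = sxy / sxx if sxx else 0.0
    return [p - p_mean - slope * (k - k_mean) for k, p in enumerate(un)]


def csi_features(csi):
    """Per-subcarrier amplitude and sanitized phase of complex CSI values."""
    amplitude = [abs(h) for h in csi]
    phase = sanitize_phase([cmath.phase(h) for h in csi])
    return amplitude, phase


# 30 subcarriers with constant gain and a purely linear phase ramp:
# after sanitization the residual phase is (numerically) zero.
csi = [2.0 * cmath.exp(1j * (0.5 * k + 0.3)) for k in range(30)]
amp, phase = csi_features(csi)
```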
Periocular refers to the region of the face that surrounds the eye socket. This feature-rich area can be used on its own to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired, as in unconstrained or uncooperative scenarios where the face may be partially occluded or the subject-to-camera distance may be large. It has also received renewed attention during the pandemic, because face masks leave the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state of the art in periocular biometrics, giving an overall framework of its most significant research aspects.
Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained from data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics community to take advantage of these advances.
Multi-class ensemble classification remains a popular focus of investigation within the research community. The popularization of cloud services has sped up its adoption due to the ease of deploying large-scale machine-learning models. It has also drawn the attention of the industrial sector because of its ability to identify common problems in production. However, building an ensemble classifier poses several challenges, namely the proper selection and effective training of the pool of classifiers, the definition of a suitable architecture for multi-class classification, and the quantification of the ensemble classifier's uncertainty. The robustness and effectiveness of the ensemble classifier lie in the selection of the pool of classifiers as well as in the learning process; hence, the selection and training procedure of the pool play a crucial role. An (ensemble) classifier learns to detect the classes that were used during supervised training. However, when fed data from unknown conditions, the trained classifier will tend to predict one of the classes learned during training. The uncertainty of the individual and ensemble classifiers can therefore be used to assess the learning capability. We present a novel approach for novelty detection using ensemble classification and evidence theory. A pool selection strategy is presented to build a solid ensemble classifier. We present an architecture for multi-class ensemble classification and an approach to quantify the uncertainty of the individual classifiers and the ensemble classifier. We use this uncertainty for anomaly detection. Finally, we run experiments on the Tennessee Eastman benchmark to test the ensemble classifier's prediction and anomaly detection capabilities.
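One standard way to fuse classifier outputs with evidence theory is Dempster's rule of combination. The sketch below assumes each classifier's class probabilities are turned into a mass function by Shafer discounting (a per-classifier `reliability`, with the remaining mass assigned to the whole frame Θ, i.e. "don't know"), and it reports the conflict K between sources. The discounting scheme and the function names are assumptions for illustration, not the paper's exact formulation.

```python
def discount(probs, reliability):
    """Turn one classifier's class probabilities into a mass function.

    Shafer discounting: a fraction `reliability` of the belief stays on the
    singleton classes; the remainder goes to the whole frame Theta.
    """
    singles = {c: reliability * p for c, p in probs.items()}
    theta = 1.0 - sum(singles.values())
    return singles, theta


def combine(m1, m2):
    """Dempster's rule for masses on singletons plus Theta.

    Returns the fused mass function and the conflict K (mass assigned to
    pairs of different classes before normalization).
    """
    s1, t1 = m1
    s2, t2 = m2
    classes = set(s1) | set(s2)
    raw = {
        c: s1.get(c, 0.0) * s2.get(c, 0.0) + s1.get(c, 0.0) * t2 + t1 * s2.get(c, 0.0)
        for c in classes
    }
    conflict = sum(
        s1.get(a, 0.0) * s2.get(b, 0.0) for a in classes for b in classes if a != b
    )
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return ({c: v / norm for c, v in raw.items()}, t1 * t2 / norm), conflict


def fuse(prob_list, reliabilities):
    """Fuse a pool of classifiers; also report the largest conflict met while fusing."""
    masses = [discount(p, r) for p, r in zip(prob_list, reliabilities)]
    fused, max_conflict = masses[0], 0.0
    for m in masses[1:]:
        fused, k = combine(fused, m)
        max_conflict = max(max_conflict, k)
    return fused, max_conflict
```

A sample could then be flagged as novel when the conflict or the residual mass on Θ exceeds a threshold tuned on validation data; that thresholding rule is itself an assumption here.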
This paper describes the design of an automated machine that cuts turbot specimens. Machine vision is a key part of the project, as it is used to compute a cutting curve for the specimen's head, a task that cannot be carried out by mechanical means. Machine vision detects the head boundary, and a robot cuts the head. Binarization and mathematical morphology are used to detect the fish boundary, which is then analyzed (using the Hough transform and the convex hull) to detect key points and thus define the cutting curve. Afterwards, mechanical systems slice the fish into a presentation that is easy for the end consumer (fish fillets that can be easily marketed and consumed).
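Two of the vision steps above, binarization and the convex hull of the detected region, can be sketched in plain Python. The sketch assumes an 8-bit greyscale image given as nested lists, uses Otsu's method as a stand-in for the binarization step, and computes the hull with Andrew's monotone-chain algorithm; the morphology and Hough-transform stages, and how key points on the hull become a cutting curve, are omitted.

```python
def otsu_threshold(pixels):
    """Otsu's method on 8-bit grey levels: maximize between-class variance
    (computed up to a constant factor, which does not change the argmax)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, w_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def convex_hull(points):
    """Andrew's monotone chain; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def foreground_hull(image):
    """Binarize with Otsu (pixels above the threshold are foreground), then hull."""
    t = otsu_threshold([p for row in image for p in row])
    pts = [(r, c) for r, row in enumerate(image) for c, p in enumerate(row) if p > t]
    return convex_hull(pts)


# Toy 6x6 image with a bright 3x3 block standing in for the fish.
img = [[0] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(2, 5):
        img[r][c] = 200
print(foreground_hull(img))
```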
Chatbots are expected to be knowledgeable across multiple domains, e.g. for daily chit-chat, exchange of information, and grounding in emotional situations. To effectively measure the quality of such conversational agents, a model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another. This calls for a dedicated network architecture for domain generalization. To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures the general knowledge of dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. To validate the idea, we construct a high-quality multi-domain dialogue dataset leveraging data augmentation and pseudo-labeling. The PoE network is comprehensively assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains. It achieves state-of-the-art performance in terms of mean Spearman correlation over all the evaluation datasets. It exhibits better zero-shot generalization than existing state-of-the-art ADEMs and the ability to easily adapt to new domains with few-shot transfer learning.
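The shared-encoder-plus-adapters layout can be illustrated with a toy forward pass. The sketch below assumes the common bottleneck adapter form h + W_up · relu(W_down · h) and a linear scoring head; the toy encoder, the domain names "chitchat" and "empathy", the head, and the weight values are all hypothetical placeholders rather than the PoE network's actual components.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


class Adapter:
    """Bottleneck adapter: h + W_up @ relu(W_down @ h) (residual connection)."""

    def __init__(self, w_down, w_up):
        self.w_down = w_down  # shape (bottleneck, hidden)
        self.w_up = w_up      # shape (hidden, bottleneck)

    def __call__(self, h):
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        return [a + b for a, b in zip(h, delta)]


class PanelOfExperts:
    """One shared encoder, one lightweight adapter (domain expert) per domain."""

    def __init__(self, encoder, adapters, head):
        self.encoder = encoder    # shared across all domains
        self.adapters = adapters  # domain name -> Adapter
        self.head = head          # linear scoring weights (hypothetical)

    def score(self, dialogue, domain):
        h = self.encoder(dialogue)
        h = self.adapters[domain](h)
        return sum(w * x for w, x in zip(self.head, h))


def toy_encoder(text):
    """Stand-in for a transformer's pooled output (hypothetical)."""
    return [float(len(text)), 1.0]


poe = PanelOfExperts(
    toy_encoder,
    {
        "chitchat": Adapter([[1.0, 0.0]], [[0.0], [0.5]]),
        "empathy": Adapter([[1.0, 0.0]], [[0.0], [0.0]]),  # zero W_up: identity
    },
    head=[1.0, 1.0],
)
print(poe.score("abc", "chitchat"), poe.score("abc", "empathy"))
```

Only the adapters differ between domains, so adding a domain means training one small adapter while the shared encoder stays as it is; initializing `W_up` to zero makes a new adapter start as the identity.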
Early recognition of clinical deterioration (CD) is vital to protecting patients from exacerbation or death. Electronic health record (EHR) data have been widely employed in Early Warning Scores (EWS) to measure CD risk in hospitalized patients. Recently, EHR data have also been used in machine learning (ML) models to predict mortality and CD, and these ML models have shown superior performance in CD prediction compared to EWS. Since EHR data are structured and tabular, conventional ML models are generally applied to them, and less effort has been put into evaluating artificial neural networks on EHR data. Thus, in this article, an extremely boosted neural network (XBNet) is used to predict CD, and its performance is compared to that of eXtreme Gradient Boosting (XGBoost) and random forest (RF) models. For this purpose, 103,105 samples from thirteen Brazilian hospitals are used to generate the models. Moreover, principal component analysis (PCA) is employed to verify whether it can improve the models' performance. The ML models and the Modified Early Warning Score (MEWS), an EWS candidate, are evaluated for CD prediction in terms of accuracy, precision, recall, F1-score, and geometric mean (G-mean) using 10-fold cross-validation. In the experiments, the XGBoost model obtained the best results in predicting CD on the Brazilian hospitals' data.
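The evaluation metrics listed above all derive from the binary confusion matrix, with G-mean defined here as the square root of sensitivity times specificity (a standard definition; the paper may compute it differently). The sketch below also includes a plain, unshuffled 10-fold index splitter; a real study would typically shuffle and stratify the folds, and the function names are illustrative.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and G-mean for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }


def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Averaging `binary_metrics` over the ten test folds gives the cross-validated scores; G-mean is useful here because deterioration events are usually much rarer than stable cases, and it penalizes a model that ignores the minority class.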
The estimation of the generalization error of classifiers often relies on a validation set. Such a set is rarely available in few-shot learning scenarios, a largely overlooked shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it into our numerical analysis. We show that our approach outperforms alternatives such as leave-one-out cross-validation in few-shot settings.
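Why an unbiased estimator matters: with few shots, the squared distance between sample class means overestimates the true distance, because E‖x̄_a − x̄_b‖² = ‖μ_a − μ_b‖² + tr(Σ_a)/n_a + tr(Σ_b)/n_b. Subtracting the sample-covariance traces (computed with the n − 1 divisor) removes this bias. The sketch below implements that correction and a plug-in two-class error Φ(−Δ/(2σ)) for a nearest-mean rule with known means and a shared isotropic covariance σ²I. Both the estimator form and the error model are simplifying assumptions for illustration; the paper's estimator and analysis may differ (for example, by accounting for the noise in the estimated centroids).

```python
import math


def mean(xs):
    d = len(xs[0])
    return [sum(x[j] for x in xs) / len(xs) for j in range(d)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def trace_cov(xs):
    """Trace of the unbiased sample covariance (divides by n - 1; needs n >= 2)."""
    m = mean(xs)
    return sum(sq_dist(x, m) for x in xs) / (len(xs) - 1)


def unbiased_sq_distance(xa, xb):
    """Estimate ||mu_a - mu_b||^2 without the few-shot upward bias."""
    return sq_dist(mean(xa), mean(xb)) - trace_cov(xa) / len(xa) - trace_cov(xb) / len(xb)


def predicted_ncm_error(sq_delta, sigma):
    """Phi(-Delta / (2 sigma)) for two equiprobable classes with known means
    and shared covariance sigma^2 I; negative estimates are clipped to 0."""
    delta = math.sqrt(max(sq_delta, 0.0))
    return 0.5 * math.erfc(delta / (2 * sigma) / math.sqrt(2))
```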
Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordances prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
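The "render orthographic views, score pixels, pick the best" loop can be sketched without a neural renderer. In the sketch below, a 3D point cloud stands in for the learned scene representation, rotations about the x-axis stand in for out-of-plane viewpoints, and `affordance` is a hypothetical placeholder for a learned pixel-wise affordance model; none of these names or choices come from MIRA itself.

```python
import math


def rot_x(angle):
    """Rotation about the x-axis: tilts the virtual camera out of the table plane."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def transform(R, points):
    return [tuple(sum(R[r][k] * p[k] for k in range(3)) for r in range(3)) for p in points]


def orthographic_heightmap(points, size, res):
    """Orthographic top-down render: each grid cell keeps the highest z (a z-buffer).

    Cells with no points stay at -inf.
    """
    hm = [[float("-inf")] * size for _ in range(size)]
    half = size * res / 2
    for x, y, z in points:
        i = math.floor((y + half) / res)
        j = math.floor((x + half) / res)
        if 0 <= i < size and 0 <= j < size:
            hm[i][j] = max(hm[i][j], z)
    return hm


def best_action(points, angles, affordance, size=8, res=0.05):
    """Search over candidate views and pixels for the highest affordance score.

    `affordance` maps a heightmap to a same-shaped grid of scores.
    Returns (score, view_angle, row, col).
    """
    best = None
    for a in angles:
        hm = orthographic_heightmap(transform(rot_x(a), points), size, res)
        scores = affordance(hm)
        for i in range(size):
            for j in range(size):
                if best is None or scores[i][j] > best[0]:
                    best = (scores[i][j], a, i, j)
    return best
```

Orthographic views keep pixel size independent of depth, so a pixel in any rendered view maps to a fixed-size region of the workspace; this is what makes a pixel-wise argmax over views a meaningful action search.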