Unhealthy dietary habits are considered the primary cause of multiple chronic diseases such as obesity and diabetes. An automatic food intake monitoring system has the potential to improve the quality of life (QoL) of people with diet-related diseases through dietary assessment. In this work, we propose a novel contactless radar-based food intake monitoring approach. Specifically, a Frequency Modulated Continuous Wave (FMCW) radar sensor is employed to recognize fine-grained eating and drinking gestures. A fine-grained eating or drinking gesture comprises a series of movements, from raising the hand to the mouth until moving the hand away from the mouth. A 3D temporal convolutional network (3D-TCN) is developed to detect and segment eating and drinking gestures in meal sessions by processing the Range-Doppler Cube (RD Cube). Unlike previous radar-based research, this work collects data in continuous meal sessions. We create a public dataset that contains 48 meal sessions (3121 eating gestures and 608 drinking gestures) from 48 participants with a total duration of 783 minutes. Four eating styles (fork & knife, chopsticks, spoon, hand) are included in this dataset. To validate the performance of the proposed approach, 8-fold cross-validation is applied. Experimental results show that our proposed 3D-TCN outperforms both a model that combines a convolutional neural network with a long short-term memory network (CNN-LSTM) and a CNN-Bidirectional LSTM model (CNN-BiLSTM) in eating and drinking gesture detection. The 3D-TCN model achieves segmental F1-scores of 0.887 and 0.844 for eating and drinking gestures, respectively. These results indicate the feasibility of using radar for fine-grained eating and drinking gesture detection and segmentation in meal sessions.
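The segmental F1-score reported above can be illustrated with a minimal sketch. The abstract does not define the matching rule, so this example assumes a common convention: a predicted segment counts as a true positive when its intersection-over-union (IoU) with a not-yet-matched ground-truth segment of the same class reaches a threshold. The function names, the 0.5 threshold, and the use of label 0 for background are all assumptions made for illustration, not the authors' evaluation code.

```python
def to_segments(labels, background=0):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                segments.append((labels[start], start, i))
            start = i
    return segments


def segmental_f1(pred, truth, target, iou_threshold=0.5, background=0):
    """Segment-level F1 for one class (assumed IoU-based matching rule)."""
    pred_segs = [s for s in to_segments(pred, background) if s[0] == target]
    true_segs = [s for s in to_segments(truth, background) if s[0] == target]
    matched = set()  # indices of ground-truth segments already claimed
    tp = 0
    for _, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, None
        for idx, (_, ts, te) in enumerate(true_segs):
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)  # exact union whenever inter > 0
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold and best_idx not in matched:
            tp += 1
            matched.add(best_idx)
    fp = len(pred_segs) - tp   # predicted segments with no match
    fn = len(true_segs) - tp   # ground-truth segments never matched
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 0 = other, 1 = eating, 2 = drinking.
    truth = [0, 1, 1, 1, 1, 0, 0, 2, 2, 0]
    pred = [0, 0, 1, 1, 1, 0, 1, 2, 2, 0]
    print(segmental_f1(pred, truth, target=1))
    print(segmental_f1(pred, truth, target=2))
```

Scoring at the segment level penalizes over-segmentation (one gesture split into several detections) more directly than frame-wise accuracy does, which is a common reason to report it for gesture segmentation.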
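Human gesture recognition using millimeter-wave (mmWave) signals enables attractive applications, including smart homes and in-car interfaces. While existing works achieve promising performance under controlled settings, practical applications remain limited because of the need for intensive data collection, the extra training effort required when adapting to new domains (i.e., environments, persons, and locations), and poor real-time recognition performance. In this paper, we propose Di-Gesture, a domain-independent and real-time mmWave gesture recognition system. Specifically, we first derive the signal variation corresponding to human gestures with spatial-temporal processing. To enhance the robustness of the system and reduce data collection effort, we design a data augmentation framework based on the correlation between signal patterns and gesture variations. Furthermore, we propose a dynamic window mechanism to perform gesture segmentation automatically and accurately, thereby enabling real-time recognition. Finally, we build a lightweight neural network to extract spatial information from the data for gesture classification. Extensive experimental results show that Di-Gesture achieves average accuracies of 97.92%, 99.18%, and 98.76% for new users, environments, and locations, respectively. In real-time scenarios, the accuracy of Di-Gesture exceeds 97% with an average inference time of 2.87 ms, which demonstrates the superior robustness and effectiveness of our system.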
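Detecting harmful carried objects plays a key role in intelligent surveillance systems and has widespread applications, for example in airport security. In this paper, we focus on the relatively unexplored area of using low-cost 77 GHz mmWave radar for the carried-object detection problem. The proposed system can detect three classes of objects (laptop, phone, and knife) in real time under both open-carry and concealed conditions, where the object is hidden by clothes or bags. This capability is achieved by initial signal processing for localization and for generating range-azimuth-elevation image cubes, followed by a deep-learning-based prediction network and a multi-shot post-processing module for detecting objects. Extensive experiments validating the system's performance in detecting open-carry and concealed objects are presented, using a self-built radar-camera testbed and dataset. Additionally, the influence of different inputs, factors, and parameters on system performance is analyzed, providing an intuitive understanding of the system. This system serves as the first baseline for future works that aim to detect carried objects using 77 GHz radar.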
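Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics. This work studies the problem of cross-modal human re-identification (ReID), in response to regular human movement across camera-allowed regions (e.g., streets) and camera-restricted regions (e.g., offices). By leveraging emerging low-cost RGB-D cameras and mmWave radars, we propose the first vision-RF system for simultaneous cross-modal multi-person ReID. First, to address the fundamental inter-modality discrepancy, we propose a novel signature synthesis algorithm based on the observed specular reflection model of the human body. Second, an effective cross-modal deep metric learning model is introduced to deal with interference caused by unsynchronized data between radars and cameras. Through extensive experiments in both indoor and outdoor environments, we demonstrate that our proposed system achieves approximately 92.5% top-1 accuracy and approximately 97.5% top-5 accuracy among 56 volunteers. We also show that the proposed system is able to re-identify subjects even when multiple subjects are present in the sensors' field of view.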
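In recent years, mmWave FMCW radar has attracted considerable research interest for human-centered applications, such as human gesture/activity recognition. Most existing pipelines are built on a hybrid of conventional discrete Fourier transform (DFT) preprocessing and a deep neural network classifier, and most previous works focus on designing the downstream classifier to improve overall accuracy. In this work, we take a step back and look at the preprocessing module. To avoid the drawbacks of conventional DFT preprocessing, we propose a learnable preprocessing module, named CubeLearn, that extracts features directly from raw radar signals, and we build an end-to-end deep neural network for mmWave FMCW radar motion recognition applications. Extensive experiments show that our CubeLearn module consistently improves the classification accuracy of different pipelines, especially benefiting previously weaker models. We provide ablation studies on the initialization methods and structure of the proposed module, as well as an evaluation of running time on PCs and edge devices. This work also serves as a comparison of different approaches to data cube slicing. Through our task-agnostic design, we take a first step towards a generic end-to-end solution for radar recognition problems.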
In this article we present SHARP, an original approach to human activity recognition (HAR) using commercial IEEE 802.11 (Wi-Fi) devices. SHARP makes it possible to discern the activities of different persons across different time spans and environments. To achieve this, we devise a new technique to clean and process the channel frequency response (CFR) phase of the Wi-Fi channel, obtaining an estimate of the Doppler shift at a radio monitor device. The Doppler shift reveals the presence of moving scatterers in the environment, while not being affected by (environment-specific) static objects. SHARP is trained on data collected as a person performs seven different activities in a single environment. It is then tested on different setups, to assess its performance as the person, the day and/or the environment change with respect to those considered at training time. In the worst-case scenario, it reaches an average accuracy higher than 95%, validating the effectiveness of the extracted Doppler information, used in conjunction with a learning algorithm based on a neural network, in recognizing human activities in a subject- and environment-independent way. The collected CFR dataset and the code are publicly available for replicability and benchmarking purposes.
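Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide variety of activities performed in a spontaneous manner. The dataset contains dense annotations, including elementary activities, composite activities, and activities involving interactions with objects. We provide an analysis of the real-world challenges featured by the dataset, highlighting the open issues for detection algorithms. We show that current state-of-the-art methods fail to achieve satisfactory performance on the TSU dataset. We therefore propose a new baseline method to tackle the novel challenges the dataset provides. This method leverages one modality (i.e., optical flow) to generate attention weights that guide another modality (i.e., RGB) to better detect activity boundaries. This is particularly beneficial for detecting activities characterized by high temporal variance. We show that the proposed method outperforms state-of-the-art methods on TSU and on another popular challenging dataset, Charades.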
Human Activity Recognition (HAR) is an emerging technology with several applications in the surveillance, security, and healthcare sectors. Noninvasive HAR systems based on Wi-Fi Channel State Information (CSI) signals can be developed by leveraging the rapid growth of ubiquitous Wi-Fi technologies and the correlation between CSI dynamics and body motions. In this paper, we propose the Principal Component-based Wavelet Convolutional Neural Network (PCWCNN), a novel approach that offers robustness and efficiency for practical real-time applications. Our proposed method incorporates two efficient preprocessing algorithms: Principal Component Analysis (PCA) and the Discrete Wavelet Transform (DWT). We employ an adaptive activity segmentation algorithm that is accurate and computationally light. Additionally, we use the Wavelet CNN for classification, which is a deep convolutional network analogous to the well-studied ResNet and DenseNet networks. We empirically show that our proposed PCWCNN model performs very well on a real dataset, outperforming existing approaches.
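Wearables that constantly collect various sensor data from their users increase the chances of inferring unintended and sensitive information, such as passwords typed on a physical keyboard. We take a thorough look at the potential of using electromyographic (EMG) data, a sensor modality that is new to the market but has recently gained attention in the context of wearables for augmented reality (AR), for a keyboard side-channel attack. Our approach is based on neural networks for a between-subject attack in a realistic scenario, using the Myo Armband to collect the sensor data. In our approach, EMG data proved to be the most prominent source of information compared to the accelerometer and gyroscope, increasing keystroke detection performance. For our end-to-end approach on raw data, we report the mean balanced accuracy for keystroke detection and a mean key identification accuracy of about 32% on 52 classes for passwords of varying strength. We created an extensive dataset including 310 000 keystrokes recorded from 37 volunteers, which is available as open access along with the source code used to produce the given results.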
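The commercial availability of low-cost millimeter-wave (mmWave) communication and radar devices is starting to increase the penetration of such technologies in consumer markets, paving the way for large-scale and dense deployments in fifth-generation (5G)-and-beyond as well as 6G networks. At the same time, pervasive mmWave access will enable device localization and device-free sensing with unprecedented accuracy, especially compared with sub-6 GHz commercial-grade devices. This paper surveys the state of the art in device-based localization and device-free sensing using mmWave communication and radar devices, with a focus on indoor deployments. We first overview key concepts of mmWave signal propagation and system design. We then provide a detailed account of the approaches and algorithms for localization and sensing enabled by mmWaves. Our analysis considers several dimensions, including the main objectives, techniques, and performance of each work, whether each study reached some degree of implementation, and which hardware platforms were used for this purpose. We conclude by discussing why better algorithms for consumer-grade devices, data fusion methods for dense deployments, and an educated application of machine learning methods are promising, relevant, and timely research directions.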
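Recently, the development of video-based applications for surgical purposes has been growing steadily. Some of these applications can work offline after the procedure ends, while others must react immediately. In some cases, however, the response should occur during the procedure but some delay is acceptable. The performance gap between online and offline operation is known in the literature. Our goal in this study is to learn the performance-delay trade-off and to design an MS-TCN++-based algorithm that can exploit it. To this end, we used an open surgery simulation dataset containing videos of 96 participants performing a suturing task on a variable tissue simulator. In this study, we used video data captured from the side view. The networks were trained to identify the surgical gestures being performed. The naive approach is to reduce the depth of MS-TCN++; as a result, the receptive field shrinks and the number of required future frames is reduced as well. We show that this method is sub-optimal, mainly in the small-delay cases. The second method is to limit the accessible future in each temporal convolution. This gives us flexibility in the network design, and as a result we achieve significantly better performance than with the naive approach.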
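The two ways of trading delay for performance described above can be made concrete with a small counting sketch. It assumes a stack of centered temporal convolutions with kernel size 3, where a layer with dilation d looks d frames into the future, so the total look-ahead of the stack is the sum over layers. The function name and the per-layer cap parameter are hypothetical; how the authors actually restrict the accessible future is not stated in the abstract.

```python
def required_future_frames(dilations, kernel_size=3, max_future_per_layer=None):
    """Total look-ahead (in frames) of a stack of centered dilated convolutions.

    Assumption: each layer reaches (kernel_size - 1) // 2 * dilation frames
    ahead; an optional cap limits how far any single layer may look ahead.
    """
    total = 0
    for d in dilations:
        future = (kernel_size - 1) // 2 * d
        if max_future_per_layer is not None:
            future = min(future, max_future_per_layer)
        total += future
    return total


if __name__ == "__main__":
    full = [1, 2, 4, 8]                      # hypothetical 4-layer stage
    print(required_future_frames(full))      # unrestricted stack
    print(required_future_frames(full[:2]))  # naive: shallower network
    print(required_future_frames(full, max_future_per_layer=2))  # capped future
```

In this toy setting, dropping layers cuts the look-ahead but also shrinks how far the network sees into the past, whereas capping the future per layer keeps all layers (and their view of past frames) while still bounding the delay.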
Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting time information (e.g., multiple frames) has been shown to help better capture the dynamics of objects and, therefore, the variation in their shape. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show the relevance of such a method for detecting objects in different radar representations (range-Doppler, range-angle); it outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive. The code will be available soon.
The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are more than an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
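To illustrate why dilated convolutions capture long-range temporal patterns, here is a minimal pure-Python sketch of a single-channel dilated temporal convolution and the receptive field of a stack of such layers. It is an assumption-laden toy (centered kernel, zero padding, one channel, fixed weights, dilations chosen by hand), not the Dilated TCN architecture from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D cross-correlation with a centered, dilated kernel.

    Positions outside the signal are treated as zeros (zero padding).
    """
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # tap spaced `dilation` frames apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Push a unit impulse through three layers with dilations 1, 2, 4.
    x = [0.0] * 17
    x[8] = 1.0
    for d in (1, 2, 4):
        x = dilated_conv1d(x, [1.0, 1.0, 1.0], d)
    spread = sum(1 for v in x if v != 0.0)
    print(spread, receptive_field(3, [1, 2, 4]))
```

With kernel size 3, doubling the dilation at each layer makes the receptive field grow exponentially with depth, while the number of weights grows only linearly.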
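This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses. The task is to detect gestures belonging to a dictionary of 16 classes characterized by different pose and motion features. The dataset features continuous sequences of hand tracking data in which the gestures are interleaved with non-significant motions. The data were captured using the HoloLens 2 finger tracking system in a realistic mixed reality interaction use case. The evaluation is based not only on detection performance but also on latency and false positives, making it possible to assess the feasibility of practical interaction tools based on the proposed algorithms. The outcomes of the contest evaluation demonstrate the need for further research to reduce recognition errors, while the computational cost of the proposed algorithms is sufficiently low.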
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.
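Sleep is a fundamental physiological process that is essential for maintaining a healthy body and mind. The gold standard for clinical sleep monitoring is polysomnography (PSG), based on which sleep can be categorized into five stages: wake, rapid eye movement sleep (REM sleep), non-REM sleep 1 (N1), non-REM sleep 2 (N2), and non-REM sleep 3 (N3). However, PSG is expensive, burdensome, and not suitable for daily use. For long-term sleep monitoring, ubiquitous sensing may be a solution. Recently, cardiac and movement sensing have become popular for classifying three-stage sleep, since both modalities can be obtained from research-grade or consumer-grade devices (e.g., Apple Watch). However, how best to fuse the data for maximum accuracy remains an open question. In this work, we comprehensively study deep learning (DL)-based advanced fusion techniques, comprising three fusion strategies alongside three fusion methods, for three-stage sleep classification on two publicly available datasets. Experimental results show that three-stage sleep can be reliably classified by fusing cardiac and movement sensing modalities, which may become a practical tool for large-scale sleep stage assessment studies or long-term self-tracking of sleep. To accelerate the progress of sleep research in the ubiquitous/wearable computing community, we have made this project open source; the code can be found at: https://github.com/bzhai/ubi-sleepnet.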
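Advances in machine learning and contactless sensors have made it possible to understand complex human behaviors in healthcare settings. In particular, several deep learning systems have been introduced to enable comprehensive analysis of neurodevelopmental conditions such as Autism Spectrum Disorder (ASD). This condition affects children from their early developmental stages, and diagnosis relies entirely on observing the child's behavior and detecting behavioral cues. However, the diagnostic process is time-consuming, as it requires long-term behavior observation, and specialists are scarce. We demonstrate the effect of a region-based computer vision system that helps clinicians and parents analyze a child's behavior. For this purpose, we adopt and enhance a dataset for analyzing autism-related actions using videos of children captured in uncontrolled environments (e.g., videos collected with consumer-grade cameras in varied environments). The data is preprocessed by detecting the target child in the video to reduce the impact of background noise. Motivated by the effectiveness of temporal convolutional models, we propose models capable of extracting action features from video frames and classifying autism-related behaviors by analyzing the relationships between frames in a video. Through extensive evaluation of feature extraction and learning strategies, we demonstrate that the best performance is achieved with an Inflated 3D ConvNet and a Multi-Stage Temporal Convolutional Network, reaching a weighted F1-score of 0.83 for classifying three autism-related actions and outperforming existing methods. We also propose a lightweight solution by employing the ESNet backbone in the same system, achieving a competitive weighted F1-score of 0.71 and enabling potential deployment on embedded systems.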
Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating framewise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
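Recent advances in 5G wireless technology and socioeconomic transformation have brought a paradigm shift in sensor applications. Wi-Fi signals exhibit a strong correlation between their temporal variation and body movements, which can be leveraged to recognize human activity. In this paper, we demonstrate the capability of a device-free mutual human-to-human interaction recognition method based on time-scale Wi-Fi channel state information. The mutual activities examined are steady state, approaching, departing, handshaking, high-five, hugging, kicking (left leg), kicking (right leg), pointing (left hand), pointing (right hand), punching (left hand), punching (right hand), and pushing. We explore and propose a self-attention-based bidirectional gated recurrent neural network model to classify 13 human-to-human mutual interaction types from the time-series data. Our proposed model can recognize the mutual interactions of a two-subject pair with a maximum benchmark accuracy of 94%. This has been extended to ten subject pairs, securing a benchmark accuracy of 88% with improved classification around the interaction-transition regions. In addition, an executable graphical user interface (GUI) was developed using the PyQt5 Python module to display the overall mutual human-interaction recognition procedure in real time. Finally, we briefly discuss possible solutions to the limitations that led to the performance reductions observed during the study. Such Wi-Fi channel perturbation pattern analysis is believed to be an efficient, economical, and privacy-friendly approach that could be used in mutual human-interaction recognition for indoor activity monitoring, surveillance systems, smart health monitoring systems, and independent assisted living.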
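Worldwide, 2019 million people have been infected and 4.5 million have lost their lives in the ongoing Covid-19 pandemic. Until vaccines became widely available, precautions and safety measures such as wearing masks, physical distancing, and avoiding face touching were among the primary means of curbing the spread of the virus. Face touching is a compulsive human behavior that cannot be prevented without continuous conscious effort, and even then it is inevitable. To address this problem, we designed a smartwatch-based solution, CovidAlert, that leverages a Random Forest algorithm trained on accelerometer and gyroscope data from the smartwatch to detect hand transitions to the face and send a quick haptic alert to the user. CovidAlert is highly energy efficient, as it employs the STA/LTA algorithm as a gatekeeper to curtail use of the Random Forest model on the watch when the user is inactive. The overall accuracy of our system is 88.4%, with low false negatives and false positives. We also demonstrated the system's viability by implementing it on a commercial Fossil Gen 5 smartwatch.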
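The gatekeeper idea can be sketched with the classic STA/LTA (short-term average over long-term average) ratio: the expensive classifier is invoked only when recent signal energy rises well above the longer-term background. This is a minimal illustration under stated assumptions; the window lengths, the threshold, the use of squared samples as energy, and the function names are all hypothetical, since the abstract does not give CovidAlert's actual settings.

```python
def sta_lta(samples, sta_len, lta_len):
    """STA/LTA energy ratio per sample; None until a full LTA window exists.

    Assumes sta_len <= lta_len and that both windows end at the current sample.
    """
    energy = [x * x for x in samples]
    ratios = [None] * len(samples)
    for t in range(lta_len - 1, len(samples)):
        sta = sum(energy[t - sta_len + 1 : t + 1]) / sta_len
        lta = sum(energy[t - lta_len + 1 : t + 1]) / lta_len
        ratios[t] = sta / lta if lta > 0 else 0.0
    return ratios


def is_active(samples, sta_len=5, lta_len=50, threshold=3.0):
    """True where the ratio reaches the (assumed) threshold, i.e. where a
    heavier model such as a Random Forest would be allowed to run."""
    return [r is not None and r >= threshold
            for r in sta_lta(samples, sta_len, lta_len)]


if __name__ == "__main__":
    # Hypothetical accelerometer magnitude: quiet, a short burst, quiet again.
    signal = [0.1] * 60 + [2.0] * 5 + [0.1] * 35
    active = is_active(signal)
    print(active.index(True), sum(active))
```

Because the ratio needs only running sums of squared samples, it is far cheaper than evaluating a tree ensemble on every window, which is the motivation for using it as a gate on a battery-powered watch.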