The growth of data has driven the need for dimension reduction techniques, especially for non-scalar variables arising in areas such as time series, natural language processing, and computer vision. In this paper, we investigate dimension reduction for time series through functional data analysis. Current dimension reduction methods for functional data, namely functional principal component analysis and functional autoencoders, are limited to linear mappings or to scalar representations of the time series, which is inefficient; in real data applications, the nature of the data is much more complex. We propose a non-linear function-on-function approach, consisting of a functional encoder and a functional decoder, that uses continuous hidden layers of continuous neurons to learn the structure inherent in functional data, thereby addressing the limitations of the existing approaches. Our approach yields a low-dimensional latent representation by reducing both the number of functional features and the number of timepoints at which the functions are observed. The effectiveness of the proposed model is demonstrated through multiple simulations and real data examples.
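As a point of contrast, the linear functional-PCA baseline mentioned above can be sketched in a few lines. This is an illustrative numpy sketch on synthetic curves, not the paper's non-linear functional autoencoder:

```python
import numpy as np

# Linear FPCA baseline via SVD (illustrative only): each row of X is one
# functional observation sampled on a common time grid.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)                                     # 50 timepoints
X = np.sin(2 * np.pi * np.outer(rng.uniform(1, 3, 100), t))   # 100 curves

Xc = X - X.mean(axis=0)                         # center the curves
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                           # latent dimension
scores = Xc @ Vt[:k].T                          # low-dimensional representation
X_hat = scores @ Vt[:k] + X.mean(axis=0)        # linear reconstruction

recon_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(scores.shape)   # (100, 2)
```

The reconstruction here is necessarily a linear map of the scores; the non-linear encoder/decoder described above is motivated precisely by what such a linear map cannot capture.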
Achieving accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is still often performed manually by experts, which is a laborious, expensive, and error-prone task. Manual annotation relies heavily on the experience and knowledge of these experts, and there is considerable intra- and inter-observer variation. It is therefore of great significance to develop a method that can automatically segment tumor target regions. In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of tumor detection in PET or CT; it uses multi-scale convolution operations to extract feature information, highlighting tumor-region location information while suppressing non-tumor-region location information. In addition, our network uses dual-channel inputs in the encoding stage and fuses them in the decoding stage, which takes advantage of the differences and complementarities between PET and CT. We validated the proposed ISA-Net on two clinical datasets, a soft tissue sarcoma (STS) dataset and a head and neck tumor (HECKTOR) dataset, and compared it with other attention methods for tumor segmentation. Dice similarity coefficient (DSC) scores of 0.8378 on the STS dataset and 0.8076 on the HECKTOR dataset show that ISA-Net achieves better segmentation performance and better generalization. Conclusions: the method proposed in this paper performs multimodal medical image tumor segmentation and can effectively exploit the differences and complementarities between modalities; with proper adjustment, it can also be applied to other multimodal or single-modal data.
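The spatial-attention idea can be illustrated with a toy gate. This is a hypothetical numpy sketch in which fixed mean filters stand in for ISA-Net's learned multi-scale convolutions:

```python
import numpy as np

# Toy spatial-attention gate (illustrative only, not the paper's module):
# pool features over channels, smooth at several scales, squash to (0, 1),
# and use the result to reweight the feature map.
def mean_filter(x, k):
    """Average each pixel over a k x k neighborhood (zero padding)."""
    p = k // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def spatial_attention(feat, scales=(3, 5)):
    """feat: (C, H, W) feature map -> attention-weighted feature map."""
    pooled = feat.mean(axis=0)                        # channel average
    multi = np.mean([mean_filter(pooled, k) for k in scales], axis=0)
    attn = 1.0 / (1.0 + np.exp(-multi))               # sigmoid gate in (0, 1)
    return feat * attn                                # emphasize salient regions

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 8, 8))
out = spatial_attention(feat)
print(out.shape)   # (4, 8, 8)
```

In the real network, the multi-scale filters and the gating function are learned end-to-end rather than fixed.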
Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent faces two kinds of uncertainty, called epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously offers the opportunity to improve the agent's final performance, accelerate training, and facilitate quality assurance after deployment. In this work, we extend the Deep Deterministic Policy Gradient algorithm (DDPG) into an uncertainty-aware reinforcement learning algorithm for continuous control tasks. It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn risk-sensitive policies. We conduct numerical experiments showing that our DDPG variant outperforms vanilla DDPG without uncertainty estimation on benchmark tasks in robot control and power-network optimization.
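One common way to obtain such epistemic estimates is disagreement across an ensemble of critics. The sketch below is an illustrative stand-in (the toy "critics", noise level, and penalty weight are assumptions, not the paper's implementation):

```python
import numpy as np

# Illustrative sketch: epistemic uncertainty as critic disagreement.
rng = np.random.default_rng(0)

def q_ensemble(state, n_critics=5):
    """Stand-in for an ensemble of critics: each returns a noisy Q-value."""
    return np.array([np.sin(state) + rng.normal(0, 0.1)
                     for _ in range(n_critics)])

qs = q_ensemble(state=0.5)
q_mean = qs.mean()        # value estimate used by the actor
epistemic = qs.std()      # disagreement across critics: can drive exploration
risk_adjusted = q_mean - 1.0 * epistemic   # risk-sensitive objective (penalty form)
print(q_mean, epistemic)
```

An exploration bonus would add the disagreement term instead of subtracting it; the sign of the uncertainty term is what switches between optimism-driven exploration and risk-averse control.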
Recently, the use of mobile health (mHealth) information services, which provide rich guidelines for improving physical activity, has grown. These guidelines stem from considering various personal behavioural factors, which often deviate from users' actual health conditions. Behavioural factors include changing fitness preferences, adherence problems, and uncertainty about future fitness outcomes, all of which can lead to a decline in the quality of mHealth information services. Because users' health conditions are dynamic, many of these mHealth services provide only limited fitness guidance. This paper seeks an adaptive approach, using deep reinforcement learning (DRL), to make personalized physical activity recommendations that are learned from retrospective physical activity data and can simulate realistic behavioural trajectories. We build a real-time interaction model for the mHealth information service system based on scientific knowledge about physical activity to evaluate users' exercise performance. The performance evaluation model considers the optimal exercise intensity under adaptation and fatigue effects, so as to avoid insufficient exercise or overload. Short-term activity plans are generated with deep reinforcement learning as individual health conditions change over time. With this method, we can dynamically update the physical activity recommendation strategy according to users' actual behaviour. Our DRL-based recommendation policy is validated by comparison with other benchmark policies. Experimental results show that this adaptive learning algorithm can improve recommendation performance by more than 4.13%.
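The underlying idea, learning an intensity-selection policy by trial and error, can be illustrated with a tiny tabular Q-learning loop. The states, actions, and reward below are invented stand-ins for the paper's deep RL setup:

```python
import numpy as np

# Toy sketch (illustrative only): choose a daily exercise intensity
# {0: low, 1: moderate, 2: high} given a fatigue level {0, 1, 2}.
rng = np.random.default_rng(0)
n_states, n_actions = 3, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

def step(state, action):
    # Made-up reward: moderate intensity is best; high intensity while
    # already fatigued is penalized (overload), and raises fatigue.
    reward = 1.0 - abs(action - 1) - (0.5 if state == 2 and action == 2 else 0.0)
    next_state = min(2, max(0, state + (1 if action == 2 else -1)))
    return reward, next_state

state = 0
for _ in range(2000):
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    reward, nxt = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt

print(int(Q[0].argmax()))   # learned intensity choice when not fatigued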
Conventional self-supervised monocular depth prediction methods are based on a static-environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motion. Existing dynamic-object-focused methods only partially solve the mismatch problem at the level of the training loss. In this paper, we therefore propose a novel multi-frame monocular depth prediction method that addresses these problems at both the prediction and the supervision-loss levels. Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motion and thereby solve the mismatch problem. Furthermore, a novel occlusion-aware cost volume and re-projection loss are designed to mitigate the occlusion effects of object motion. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in regions of dynamic objects. Code is available at https://github.com/autoailab/dynamicdepth
Automated audio captioning (AAC) aims to describe audio data with captions using natural language. Most existing AAC methods adopt an encoder-decoder structure, where attention-based mechanisms are a popular choice in the decoder (e.g., a Transformer decoder) for predicting captions from audio features. Such attention-based decoders can capture global information from the audio features; however, their ability to extract local information can be limited, which may lead to degraded quality in the generated captions. In this paper, we present an AAC method with an attention-free decoder, in which a PANNs-based encoder is used for audio feature extraction and the attention-free decoder is designed to introduce local information. The proposed method enables the effective use of both global and local information from audio signals. Experiments show that our method outperforms the state-of-the-art methods with standard attention-based decoders on Task 6 of the DCASE 2021 Challenge.
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, and AR/VR. Traditionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technology, precise 3D measurements have become available via point clouds, sparking new research on 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity of typical point cloud sampling patterns. One major issue related to irregular sampling is the randomness during point set abstraction/feature extraction, an elementary process in many flow estimation pipelines. Accordingly, a novel Spatial Abstraction with Attention (SA^2) layer is proposed to alleviate the unstable abstraction problem. In addition, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, which is beneficial for motions scaled over a larger range. Extensive analyses and experiments verify the motivation and the significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), compared with several state-of-the-art benchmarks for scene flow estimation.
High feature dimensionality is a challenge in music emotion recognition (MER). There is no common consensus on the relation between audio features and emotion. A MER system that uses all available features to recognize emotion is not an optimal solution, since the feature set contains irrelevant data acting as noise. In this paper, we introduce a feature selection approach to eliminate redundant features for MER. We created a Selected Feature Set (SFS) based on a feature selection algorithm (FSA) and benchmarked it by training two models, Support Vector Regression (SVR) and Random Forest (RF), and comparing them against the same models trained on the Complete Feature Set (CFS). The results indicate that using the SFS improves MER performance for both the RF and SVR models. We found that using the FSA can improve performance in all scenarios, with potential benefits for model efficiency and stability in the MER task.
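The core idea of a feature selection algorithm can be illustrated minimally, here with a hypothetical correlation-based ranking on synthetic data (the paper's actual FSA and audio features are not shown):

```python
import numpy as np

# Illustrative FSA sketch: rank candidate features by absolute correlation
# with the emotion target and keep the top k as the Selected Feature Set.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))                      # candidate features (the CFS)
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(0, 0.1, n)   # synthetic target

corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
selected = np.argsort(corr)[::-1][:2]            # the SFS (top-2 features)
print(sorted(selected.tolist()))                 # → [3, 7]
```

Because the other eight features are pure noise here, the ranking recovers exactly the two informative ones; real audio features are correlated with each other, which is why redundancy elimination matters.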
Autonomous cars are indispensable as humans go further down the hands-free route. Although the existing literature highlights that acceptance of autonomous cars will increase if they drive in a human-like manner, little research has examined the human likeness of current autonomous cars through a naturalistic experience from the passenger's seat. The present study tested whether an AI driver could create a human-like ride experience for passengers, based on 69 participants' feedback in a real-road scenario. We designed a ride-experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as passengers and judged whether the driver was human or AI. The AI driver failed to pass our test, because passengers detected the AI driver at a rate above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness-rating behaviour. We employed the affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength in our model. Results showed that passengers' ascription of humanness increased with greater affective transition. Our study suggests an important role for affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving research.
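The signal-detection component can be illustrated with the standard d' computation; the hit and false-alarm rates below are invented, not the study's data:

```python
from statistics import NormalDist

# Sensitivity index d' = z(hit rate) - z(false-alarm rate), using the
# inverse normal CDF from the Python standard library.
def d_prime(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# "Hit" = correctly judging the AI driver as AI. Detection above chance
# means hits exceed false alarms, giving a positive d'.
print(round(d_prime(0.75, 0.45), 3))
```

A d' of zero corresponds to chance-level judgement, which is what the passengers showed for the human driver.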
Detecting abnormal crowd motion emerging from complex interactions of individuals is paramount to ensuring the safety of crowds. Crowd-level abnormal behaviors (CABs), e.g., counter flow and crowd turbulence, are proven to be crucial causes of many crowd disasters. In the past decade, video anomaly detection (VAD) techniques have achieved remarkable success in detecting individual-level abnormal behaviors (e.g., sudden running, fighting, and stealing), but research on VAD for CABs is rather limited. Unlike individual-level anomalies, CABs usually do not exhibit a salient difference from normal behaviors when observed locally, and the scale of CABs can vary from one scenario to another. In this paper, we present a systematic study tackling the important problem of VAD for CABs with a novel crowd motion learning framework, the multi-scale motion consistency network (MSMC-Net). MSMC-Net first captures spatial and temporal crowd motion consistency information in a graph representation. Then, it simultaneously trains multiple feature graphs constructed at different scales to capture rich crowd patterns. An attention network is used to adaptively fuse the multi-scale features for better CAB detection. For the empirical study, we consider three large-scale crowd event datasets: UMN, Hajj, and Love Parade. Experimental results show that MSMC-Net substantially improves the state-of-the-art performance on all the datasets.
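The attention-based fusion step can be sketched as a learned convex combination over scales; the feature values and scores below are illustrative, since MSMC-Net learns both the per-scale features and the attention weights:

```python
import numpy as np

# Toy sketch of attention fusion over multi-scale features.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(features, scores):
    """features: (n_scales, d); scores: (n_scales,) -> fused (d,) vector."""
    w = softmax(scores)       # one attention weight per scale, summing to 1
    return w @ features       # convex combination of the scale features

feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])                     # 3 scales, d = 2
fused = fuse(feats, scores=np.array([2.0, 0.5, 0.5]))
print(fused.shape)   # (2,)
```

Because the weights form a convex combination, scales that match the observed crowd pattern can dominate without discarding the others outright.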