We apply computer vision pose estimation techniques developed expressly for the data-scarce infant domain to the study of torticollis, a common condition in infants for which early identification and treatment are critical. Specifically, we use a combination of facial landmark and body joint estimation techniques designed for infants to estimate a range of geometric measures pertaining to face and upper body symmetry, drawn from an array of sources in the physical therapy and ophthalmology research literature on torticollis. We gauge performance with a range of metrics and show that the estimates of most of these geometric measures are successful, yielding strong to very strong Spearman's $\rho$ correlation with ground truth values. Furthermore, we show that these estimates, derived from pose estimation neural networks designed for the infant domain, cleanly outperform estimates derived from more widely known networks designed for the adult domain.
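As an illustration of the evaluation metric named above, the following is a minimal sketch of Spearman's $\rho$ between estimated and ground truth values of a single geometric measure. The function names and the sample numbers are hypothetical and are not taken from the paper; the sketch only assumes that $\rho$ is computed as the Pearson correlation of average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(estimates, ground_truth):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    if len(estimates) != len(ground_truth) or len(estimates) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(estimates), average_ranks(ground_truth)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical head-tilt angles in degrees (illustrative values only).
estimated = [3.1, 7.8, 12.4, 15.0]
annotated = [2.5, 8.0, 11.9, 16.2]
rho = spearman_rho(estimated, annotated)
```

Because only ranks enter the computation, a measure can score a high $\rho$ even when its estimates carry a consistent offset from the ground truth, which is one reason to report it alongside other metrics.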
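Accurately annotated image datasets are an essential component of studying animal behavior. Compared with the number of species that we know of and that may exist, existing labeled pose datasets cover only a small fraction, and building comprehensive large-scale datasets is prohibitively expensive. Here, we present a highly data-efficient strategy for pose estimation in quadrupeds that requires only a small number of real images of the target animal. It is confirmed that a backbone network with weights pretrained on generic image datasets such as ImageNet can mitigate the high demand for target animal pose data and shorten training time by learning prior knowledge of object segmentation and keypoint estimation in advance. However, when faced with severe data scarcity (i.e., $<10^2$ real images), model performance remains unsatisfactory, particularly for limbs with considerable flexibility and several comparable parts. We therefore introduce a prior-aware synthetic animal data generation pipeline called PASyn to augment the animal pose data needed for robust pose estimation. PASyn generates a probabilistically valid synthetic pose dataset, SynAP, by training a variational generative model on several animated 3D animal models. In addition, a style transfer strategy is used to blend the synthetic animal images into real backgrounds. We evaluate the improvement made by our approach with three popular backbone networks and test their pose estimation accuracy on publicly available animal pose images as well as on images collected from real animals in a zoo.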
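3D human pose and shape estimation within a temporal sequence is critical for understanding human behavior. Despite significant recent progress in human pose estimation, which is often based on single images or videos, human motion estimation on live stream videos remains a rarely touched area, given its special requirements for real-time output and temporal consistency. To address this problem, we present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency of pose estimation in live stream videos. TePose uses previous predictions as a bridge to feed back the error for better estimation in the current frame, and to learn the correspondence between data frames and past predictions. A multi-scale spatio-temporal graph convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling. We propose a sequential data loading strategy to meet the special start-to-end data processing requirement of live streams. We demonstrate the importance of each proposed module with extensive experiments. The results show the effectiveness of TePose on widely used human pose benchmarks, with state-of-the-art performance.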
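Bilateral postural symmetry plays a key role as a potential risk marker for autism spectrum disorder (ASD) and as a symptom of congenital muscular torticollis (CMT) in infants, but current methods of assessing symmetry require laborious assessments by clinical experts. In this paper, we develop a computer vision based infant symmetry assessment system that leverages 3D human pose estimation for infants. Evaluation and calibration of our system against ground truth assessments is complicated by our finding, from a survey of human ratings of angle and symmetry, that such ratings exhibit low inter-rater reliability. To rectify this, we develop a Bayesian estimator of the ground truth, derived from a probabilistic graphical model of fallible human raters. We show that the 3D infant pose estimation model can achieve 68% area under the receiver operating characteristic curve in predicting the Bayesian aggregate labels, compared with only 61% for a 2D infant pose estimation model and 61% and 60% for a 3D adult pose estimation model, highlighting the importance of 3D poses and infant domain knowledge in assessing infant body symmetry. Our survey analysis also suggests that human ratings are susceptible to higher levels of bias and inconsistency; hence, our final 3D pose-based symmetry assessment system is calibrated against, but not directly supervised by, the Bayesian aggregate human ratings, yielding higher levels of consistency and lower levels of inter-limb assessment bias.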
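Infant motion analysis is a topic of critical importance in early childhood development studies. However, while the applications of human pose estimation have become increasingly broad, models trained on large-scale adult pose datasets are barely successful in estimating infant poses, owing to the significant differences in body ratios and the versatility of infant poses. Moreover, privacy and security considerations hinder the availability of the adequate infant pose data required to train a robust model from scratch. To address this problem, this paper presents (1) the building and public release of a hybrid synthetic and real infant pose (SyRIP) dataset, with a small yet diverse set of real infant images as well as generated synthetic infant poses, and (2) a multi-stage invariant representation learning strategy that can transfer knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structure, models trained on the SyRIP dataset show noticeable improvement over those trained on the only other public infant pose dataset. Integrated with pose estimation backbone networks of varying complexity, FiDIP performs consistently better than the fine-tuned versions of those models. One of our best infant pose estimation performers, built on the state-of-the-art DarkPose model, shows a mean average precision (mAP) of 93.6.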
Remote sensing imagery provides comprehensive views of the Earth, where different sensors collect complementary data at different spatial scales. Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average of a $5.0\%$ non-parametric kNN classification improvement across eight remote sensing datasets compared to current state-of-the-art and obtains a $0.9$ mIoU to $3.8$ mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
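To make the idea of a scale-dependent positional encoding concrete, here is a minimal sketch in which the standard sinusoidal encoding is stretched by the ratio of the image's ground sample distance to a reference value. This is one plausible reading of the description above, not the paper's confirmed formulation: the function name, the `gsd` and `reference_gsd` parameters, and the interleaved sine/cosine layout are all assumptions made for illustration.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sinusoidal positional encoding with positions scaled by gsd / reference_gsd.

    Assumption: scale enters as a multiplier on the position index, so an
    image covering more ground per pixel advances faster through the encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    if gsd <= 0 or reference_gsd <= 0:
        raise ValueError("ground sample distances must be positive")
    scale = gsd / reference_gsd
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            angle = scale * pos / (10000 ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])  # even index: sin, odd: cos
        table.append(row)
    return table


# Hypothetical example: the same 4-patch strip at two ground sample distances.
fine = gsd_positional_encoding(num_positions=4, dim=8, gsd=1.0)
coarse = gsd_positional_encoding(num_positions=4, dim=8, gsd=2.0)
```

Under this assumption, patch 1 of the coarse image receives the same encoding as patch 2 of the fine image, since both sit the same physical distance from the origin; pixel resolution alone does not affect the result.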
Anomaly detection on time series data is increasingly common across various industrial domains that monitor metrics in order to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually exclusive and can offer complementary perspectives on anomaly detection. This paper first highlights the successes and limitations of prediction-based and reconstruction-based methods with visualized time series signals and anomaly scores. We then propose AER (Auto-encoder with Regression), a joint model that combines a vanilla auto-encoder and an LSTM regressor to incorporate the successes and address the limitations of each method. Our model can produce bi-directional predictions while simultaneously reconstructing the original time series by optimizing a joint objective function. Furthermore, we propose several ways of combining the prediction and reconstruction errors through a series of ablation studies. Finally, we compare the performance of the AER architecture against two prediction-based methods and three reconstruction-based methods on 12 well-known univariate time series datasets from NASA, Yahoo, Numenta, and UCR. The results show that AER has the highest averaged F1 score across all datasets (a 23.5% improvement compared to ARIMA) while retaining a runtime similar to its vanilla auto-encoder and regressor components. Our model is available in Orion, an open-source benchmarking tool for time series anomaly detection.
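The idea of merging prediction and reconstruction errors into a single anomaly score can be sketched as follows. The two combination rules shown (a convex sum and a point-wise product of min-max normalized errors) are assumptions chosen for illustration; the abstract does not list the specific combinations studied, and the function names and the `weight` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine_errors(prediction_errors, reconstruction_errors, mode="sum", weight=0.5):
    """Combine per-timestamp errors into one anomaly score per timestamp.

    mode="sum":     weight * prediction + (1 - weight) * reconstruction
    mode="product": prediction * reconstruction
    Both inputs are min-max normalized first so neither dominates by scale.
    """
    if len(prediction_errors) != len(reconstruction_errors):
        raise ValueError("error sequences must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p = min_max_normalize(prediction_errors)
    r = min_max_normalize(reconstruction_errors)
    if mode == "sum":
        return [weight * a + (1 - weight) * b for a, b in zip(p, r)]
    if mode == "product":
        return [a * b for a, b in zip(p, r)]
    raise ValueError("mode must be 'sum' or 'product'")


# Hypothetical per-timestamp errors (illustrative values only).
scores_sum = combine_errors([0.0, 1.0, 2.0], [0.0, 2.0, 4.0], mode="sum")
scores_product = combine_errors([0.0, 1.0, 2.0], [0.0, 2.0, 4.0], mode="product")
```

A product suppresses timestamps that only one of the two error signals flags, whereas a weighted sum keeps them visible; which behavior is preferable depends on how the two methods fail on a given dataset.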
We present an update on the current architecture of the Zoea knowledge-based, Composable Inductive Programming system. The Zoea compiler is built using a modern variant of the blackboard architecture. Zoea integrates a large number of knowledge sources that encode different aspects of programming language and software development expertise. We describe the use of synthetic test cases as a ubiquitous form of knowledge and hypothesis representation that supports a variety of reasoning strategies. Some future plans are also outlined.
Quantum machine learning (QML) has received increasing attention due to its potential to outperform classical machine learning methods on various problems. A subclass of QML methods is quantum generative adversarial networks (QGANs), which have been studied as a quantum counterpart of the classical GANs widely used in image manipulation and generation tasks. Existing work on QGANs is still limited to small-scale proof-of-concept examples based on images with significant down-scaling. Here we integrate classical and quantum techniques to propose a new hybrid quantum-classical GAN framework. We demonstrate its superior learning capabilities by generating $28 \times 28$ pixel grey-scale images without dimensionality reduction or classical pre/post-processing on multiple classes of the standard MNIST and Fashion MNIST datasets, achieving results comparable to classical frameworks with 3 orders of magnitude fewer trainable generator parameters. To gain further insight into the workings of our hybrid approach, we systematically explore the impact of its parameter space by varying the number of qubits, the size of image patches, the number of layers in the generator, the shape of the patches, and the choice of prior distribution. Our results show that increasing the quantum generator size generally improves the learning capability of the network. The developed framework provides a foundation for the future design of QGANs with an optimal parameter set tailored for complex image generation tasks.
Datasets for training recommender systems are often subject to distribution shift induced by users' and recommenders' selection biases. In this paper, we study the impact of selection bias on datasets with different quantization. We then leverage two differently quantized datasets from different source distributions to mitigate distribution shift by applying the inverse probability scoring method from causal inference. Empirically, our approach gains significant performance improvement over single-dataset methods and alternative ways of combining two datasets.
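For readers unfamiliar with inverse probability scoring, the following minimal sketch shows the basic reweighting idea on its own: each observed rating is divided by the probability that it was observed, so that over-selected items do not dominate the estimate. It does not reproduce the paper's two-dataset method; the function name, the clipping threshold, and the sample numbers are hypothetical.

```python
def ips_mean_rating(observed_ratings, propensities, total_pairs, clip=1e-3):
    """Inverse-probability-weighted estimate of the mean rating over all pairs.

    observed_ratings: ratings that were actually logged
    propensities:     probability that each logged rating was observed
    total_pairs:      number of user-item pairs in the full population
    clip:             lower bound on propensities, limiting the variance
                      caused by very small probabilities (an assumption here)
    """
    if len(observed_ratings) != len(propensities):
        raise ValueError("ratings and propensities must have the same length")
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    weighted_sum = 0.0
    for rating, p in zip(observed_ratings, propensities):
        if not 0.0 < p <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        weighted_sum += rating / max(p, clip)
    return weighted_sum / total_pairs


# Hypothetical log: two of three user-item pairs were observed.
estimate = ips_mean_rating([5.0, 1.0], [1.0, 0.5], total_pairs=3)
naive = sum([5.0, 1.0]) / 2
```

In this toy log the low rating was observed only half the time, so it is counted twice; the weighted estimate (7/3) is therefore lower than the naive average of the logged ratings (3.0).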