Reconstructing 3D human motion has wide-ranging applications. Gold-standard motion capture (MoCap) systems are accurate but inaccessible to the general public because of their cost, hardware, and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap because they take single-view videos as input. Replacing multi-view MoCap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion, making exciting applications such as motion analysis and motion-driven animation accessible to the general public. However, the performance of existing HMR methods degrades when the video contains challenging, dynamic motion that is absent from the MoCap datasets used for training. This reduces their appeal, as dynamic motion is frequently the target of 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field, which is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action, and show that NeMo achieves better 3D reconstruction than various baselines.
Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, in this paper we study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods, which largely focus on searching for a single best architecture (i.e., point estimation), we propose GraphPNAS, a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), GraphPNAS can better capture the topologies of good neural architectures and the relations between the operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than that of other search spaces in the literature. We show that our proposed graph generator consistently outperforms the RNN-based one and achieves performance better than or comparable to state-of-the-art NAS methods.
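Deep learning has proven effective in many neuroimaging applications. In many scenarios, however, the number of imaging sequences capturing information related to small vessel disease is insufficient to support data-driven techniques. In addition, cohort-based studies may not always have the optimal or essential imaging sequences for accurate lesion detection. It is therefore necessary to determine which imaging sequences are essential for accurate detection. In this study, we aim to find the optimal combination of magnetic resonance imaging (MRI) sequences for deep learning-based detection of enlarged perivascular spaces (EPVS). To this end, we implemented an effective lightweight U-Net adapted for EPVS detection and comprehensively investigated different combinations of information from susceptibility-weighted imaging (SWI), fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1w), and T2-weighted (T2w) MRI sequences. We conclude that T2w MRI is the most important sequence for accurate EPVS detection, and that incorporating SWI, FLAIR, and T1w MRI into the deep neural network may yield only insignificant improvements in accuracy.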
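CBCT in image-guided radiotherapy provides crucial anatomical information for patient setup and plan evaluation. Longitudinal CBCT image registration can quantify inter-fractional anatomical changes. The purpose of this study is to propose an unsupervised deep learning-based method for CBCT-CBCT deformable image registration. The proposed deformable registration workflow consists of training and inference stages that share the same feed-forward path through a spatial transformation-based network (STN). The STN consists of a global generative adversarial network (GlobalGAN) and a local GAN (LocalGAN), which predict coarse-scale and fine-scale motion, respectively. The network was trained by minimizing an image similarity loss and a deformable vector field (DVF) regularization loss, without supervision from ground-truth DVFs. In the inference stage, the trained LocalGAN predicts patches of local DVF, which are fused to form a whole-image DVF. This local whole-image DVF is then combined with the DVF generated by GlobalGAN to obtain the final DVF. The method was evaluated in experiments using 100 fractional CBCTs from 20 abdominal cancer patients, and in a holdout test using 105 fractional CBCTs from a cohort of 21 different abdominal cancer patients. Qualitatively, the registration results show that the deformed CBCT images align with the target CBCT images. Quantitatively, the average target registration error (TRE) computed on fiducial markers and manually identified landmarks was 1.91 ± 1.11 mm. The average mean absolute error (MAE) and normalized cross-correlation (NCC) between the deformed CBCT and the target CBCT were 33.42 ± 7.48 HU and 0.94 ± 0.04, respectively. This promising registration method could provide fast and accurate longitudinal CBCT alignment to facilitate the analysis and prediction of inter-fractional anatomical changes.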
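The three evaluation metrics named in the registration abstract above (TRE, MAE, NCC) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the flat-list image representation, and the choice of the zero-mean variant of NCC are assumptions made for illustration only.

```python
import math


def mean_absolute_error(image_a, image_b):
    """Mean absolute intensity difference (e.g. in HU) between two images.

    Images are assumed to be flat sequences of voxel values of equal length.
    """
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(image_a, image_b)) / len(image_a)


def normalized_cross_correlation(image_a, image_b):
    """Zero-mean NCC (assumed variant): 1.0 means perfectly linearly related."""
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and of equal size")
    mean_a = sum(image_a) / len(image_a)
    mean_b = sum(image_b) / len(image_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(image_a, image_b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in image_a)
        * sum((y - mean_b) ** 2 for y in image_b)
    )
    if den == 0:
        raise ValueError("NCC is undefined for a constant image")
    return num / den


def target_registration_error(moved_points, target_points):
    """Mean Euclidean distance (e.g. in mm) between corresponding landmarks."""
    if len(moved_points) != len(target_points) or not moved_points:
        raise ValueError("landmark lists must be non-empty and of equal size")
    dists = [math.dist(p, q) for p, q in zip(moved_points, target_points)]
    return sum(dists) / len(dists)


# Toy usage with made-up values (not data from the study).
deformed = [0.0, 10.0, 20.0, 30.0]
target = [1.0, 11.0, 21.0, 31.0]
mae = mean_absolute_error(deformed, target)
ncc = normalized_cross_correlation(deformed, target)
tre = target_registration_error([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
```

A constant intensity offset, as in the toy data, leaves NCC at 1 while MAE is nonzero, which is one reason to report both alongside a landmark-based measure such as TRE.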
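With the introduction of machine learning into high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem. In response, many mathematical definitions of fairness have been proposed, and a variety of optimization techniques have been developed, all designed to maximize a given notion of fairness. However, fair solutions depend on the quality of the training data and can be highly sensitive to noise. Recent studies have shown that robustness (the ability of a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem, and measuring the robustness of these strategies has therefore become a fundamental problem. In this work, we propose a new criterion for measuring the robustness of various fairness optimization strategies: the \textit{robustness ratio}. We conduct multiple extensive experiments on five benchmark fairness datasets, using three of the most popular fairness strategies with respect to five of the most popular definitions of fairness. Our experiments show empirically that fairness methods relying on threshold optimization are very sensitive to noise across all the evaluated datasets, despite mostly outperforming the other methods. This is in contrast to the other two methods, which are less fair in low-noise scenarios but fairer in high-noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimization strategies. This can serve as a guideline for choosing the most suitable fairness strategy for various datasets.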
Many real-world problems are inherently multimodal, from the communicative modalities humans use to express social and emotional states to the force, proprioception, and visual sensors ubiquitous on robots. While there has been an explosion of interest in multimodal representation learning, these methods are still largely focused on a small set of modalities, primarily in the language, vision, and audio space. To accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios. Since adding new models for every new modality or task becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? We propose two new information-theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar two modalities $\{X_1,X_2\}$ are by measuring how much information can be transferred from $X_1$ to $X_2$, while (2) interaction heterogeneity studies how similarly pairs of modalities $\{X_1,X_2\}, \{X_3,X_4\}$ interact by measuring how much interaction information can be transferred from $\{X_1,X_2\}$ to $\{X_3,X_4\}$. We show the importance of these proposed metrics in high-modality scenarios as a way to automatically prioritize the fusion of modalities that contain unique information or interactions. The result is a single model, HighMMT, that scales up to $10$ modalities and $15$ tasks from $5$ different research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and the model transfers to entirely new modalities and tasks during fine-tuning.