Heart failure remains a major public health challenge with growing costs. Ejection fraction (EF) is a key metric for the diagnosis and management of heart failure; however, estimating EF with echocardiography remains expensive for the healthcare system and subject to intra- and inter-operator variability. While chest X-rays (CXRs) are quick, inexpensive, and require less expertise, they do not provide sufficient information to the human eye to estimate EF. This work explores the efficacy of computer vision techniques for predicting reduced EF solely from CXRs. We studied a dataset of 3,488 CXRs from the MIMIC CXR-jpg (MCR) dataset. Our work establishes benchmarks using multiple state-of-the-art convolutional neural network architectures. The subsequent analysis shows that increasing model size from 8M to 23M parameters improves classification performance without overfitting the dataset. We further show that data augmentation techniques such as CXR rotation and random cropping improve model performance by another ~5%. Finally, we conduct an error analysis using saliency maps and Grad-CAMs to better understand the failure modes of convolutional models on this task.
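As an illustration of the augmentation strategy described above, the following is a minimal PyTorch/torchvision sketch of a CXR training setup with rotation and random cropping; the rotation range, crop size, normalization statistics, and the ResNet-50 backbone are illustrative assumptions rather than the paper's exact settings.

```python
# Hypothetical CXR augmentation and model setup (PyTorch/torchvision);
# all parameter values below are illustrative assumptions, not the paper's.
import torch.nn as nn
from torchvision import models, transforms

train_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),          # CXRs are single-channel; replicate for RGB backbones
    transforms.RandomRotation(degrees=10),                 # CXR rotation augmentation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random cropping augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Illustrative backbone; the paper's exact architectures and parameter counts are not reproduced here.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)              # binary head: reduced EF vs. not
criterion = nn.BCEWithLogitsLoss()
```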
The mainstream of existing approaches to video prediction builds on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame in a recursive manner. This often leads to severe performance degradation when extrapolating over a longer future horizon, limiting the practical use of such prediction models. Alternatively, a Multi-In-Multi-Out (MIMO) architecture that outputs all future frames in one shot naturally breaks the recursive pattern and therefore prevents error accumulation. However, only a few MIMO models have been proposed for video prediction, and they have achieved only inferior performance to date. The real strength of MIMO models in this area is not well recognized and remains largely under-explored. Motivated by this, we conduct a comprehensive investigation to thoroughly explore how far a simple MIMO architecture can go. Surprisingly, our empirical studies reveal that a simple MIMO model can outperform the state of the art by a much larger margin than expected, especially in dealing with long-term error accumulation. After exploring a number of designs, we propose a new MIMO architecture, MIMO-VP, which extends the pure Transformer with local spatio-temporal blocks and a new multi-output decoder, establishing a new standard in video prediction. We evaluate our model on four highly competitive benchmarks (Moving MNIST, Human3.6M, Weather, KITTI). Extensive experiments show that our model ranks first on all benchmarks with remarkable performance gains and surpasses the best SISO model in all aspects, including efficiency as well as quantitative and qualitative results. We believe our model can serve as a new baseline to facilitate future research on video prediction. The code will be released.
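To make the SISO/MIMO distinction concrete, here is a toy PyTorch sketch contrasting recursive single-frame rollout with one-shot multi-frame prediction; the modules are deliberately trivial placeholders, not the MIMO-VP architecture.

```python
# Illustrative contrast between SISO rollout and MIMO one-shot prediction (toy modules).
import torch
import torch.nn as nn

class ToySISO(nn.Module):
    """Predicts one next frame; future frames come from recursive rollout."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, frame):
        return self.net(frame)

class ToyMIMO(nn.Module):
    """Predicts all T future frames in a single forward pass (no error accumulation)."""
    def __init__(self, channels=1, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Conv2d(channels, channels * horizon, kernel_size=3, padding=1)

    def forward(self, frame):
        b, c, h, w = frame.shape
        return self.net(frame).view(b, self.horizon, c, h, w)

x = torch.randn(2, 1, 64, 64)
siso, mimo = ToySISO(), ToyMIMO(horizon=10)

# SISO: each step's prediction error feeds into the next step.
preds, cur = [], x
for _ in range(10):
    cur = siso(cur)
    preds.append(cur)
siso_out = torch.stack(preds, dim=1)   # (B, 10, C, H, W)

mimo_out = mimo(x)                     # (B, 10, C, H, W), produced in one shot
```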
Skeleton-based human action recognition is a long-standing challenge due to its complex dynamics, and fine-grained details of those dynamics play a vital role in classification. Existing work mainly focuses on designing incremental neural networks with more complex adjacency matrices to capture the details of joint relationships. However, such models still struggle to distinguish actions that have broadly similar motion patterns but belong to different categories. Interestingly, we find that subtle differences in motion patterns can be significantly amplified and become easily distinguishable to an observer when viewed from specified directions, a property that has not been fully explored before. In sharp contrast to previous work, we boost performance by proposing a conceptually simple yet effective multi-view strategy that recognizes actions from a collection of dynamic view features. Specifically, we design a novel Skeleton-Anchor Proposal (SAP) module that contains a multi-head structure to learn a set of views. To learn features under different views, we introduce a new angle representation that transforms actions into different views and feeds the transformed representations to the baseline model. Our module can work seamlessly with existing action classification models. Combined with baseline models, our SAP module demonstrates clear performance gains on many challenging benchmarks. Moreover, comprehensive experiments show that our model consistently beats the state of the art and remains effective and robust when handling corrupted data. Related code will be available at https://github.com/ideal-idea/sap.
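The view-transformation idea can be illustrated with a small NumPy sketch that rotates a skeleton sequence into several specified view directions; the Euler-angle parameterization and the fixed view list are assumptions for illustration, not the learned SAP views.

```python
# Minimal sketch: viewing a skeleton sequence from specified directions by rotating
# 3D joint coordinates (illustrative; not the SAP module from the repository above).
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose a 3D rotation from Euler angles (radians)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def transform_view(skeleton, angles):
    """skeleton: (T, J, 3) joint coordinates; angles: (yaw, pitch, roll) defining the view."""
    R = rotation_matrix(*angles)
    return skeleton @ R.T

seq = np.random.randn(64, 25, 3)   # 64 frames, 25 joints (NTU-style skeleton, as an example)
views = [(0.0, 0.0, 0.0), (np.pi / 4, 0.0, 0.0), (np.pi / 2, np.pi / 6, 0.0)]
multi_view = np.stack([transform_view(seq, a) for a in views])   # (V, T, J, 3)
```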
We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate in daily life and cover a variety of everyday topics. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems.
We consider the end-to-end abstract-to-title generation problem, exploring seven recent transformer-based models (including ChatGPT) fine-tuned on more than 30k abstract-title pairs from NLP and machine learning venues. As an extension, we also consider the harder problem of generating humorous paper titles. For the latter, we compile the first large-scale humor-annotated dataset of scientific papers in the NLP/ML domains, comprising almost 2.5k titles. We evaluate all models using human and automatic metrics. Our human evaluation suggests that our best end-to-end system performs similarly to human authors (though arguably slightly worse). Generating funny titles is more difficult, however, and our automatic systems clearly underperform relative to humans and often learn dataset artefacts of humor. Finally, ChatGPT, without any fine-tuning, performs at the level of our best fine-tuned system.
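As a hedged illustration of the abstract-to-title setup, the sketch below runs a generic off-the-shelf seq2seq model from the transformers library; "t5-small" and the "summarize:" prefix are illustrative stand-ins, not one of the seven fine-tuned systems evaluated above, and fine-tuning on abstract-title pairs is omitted.

```python
# Illustrative abstract-to-title generation with a generic seq2seq model (not the paper's systems).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")      # stand-in checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

abstract = "We study end-to-end abstract-to-title generation with transformer models ..."
inputs = tokenizer("summarize: " + abstract, return_tensors="pt", truncation=True)
title_ids = model.generate(**inputs, max_new_tokens=24, num_beams=4)
print(tokenizer.decode(title_ids[0], skip_special_tokens=True))
```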
Recently proposed BERT-based evaluation metrics perform well on standard evaluation benchmarks but are vulnerable to adversarial attacks, e.g., those relating to factual errors. We argue that this is (in part) because they are models of semantic similarity. Instead, we develop evaluation metrics based on Natural Language Inference (NLI), which we argue is a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust than recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when we combine existing metrics with our NLI metrics, we obtain both higher adversarial robustness (+20% to +30%) and higher-quality metrics as measured on standard benchmarks (+5% to +25%).
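One simple way to combine an existing similarity-based metric with an NLI signal, as described above, is a convex combination of the two scores; the sketch below is a minimal illustration under that assumption and is not necessarily the paper's exact combination scheme.

```python
# Hedged sketch: blending a similarity-based metric with an NLI entailment probability.
# The 0.5 default weight is an illustrative assumption.
def combined_metric(similarity_score: float,
                    nli_entailment_prob: float,
                    weight: float = 0.5) -> float:
    """Blend a [0, 1]-scaled similarity metric (e.g., a BERT-based score)
    with the probability that the candidate is entailed by the reference."""
    assert 0.0 <= weight <= 1.0
    return weight * nli_entailment_prob + (1.0 - weight) * similarity_score

# A candidate scoring high on surface similarity but low on entailment
# (a possible factual error) is penalized by the combined metric.
print(combined_metric(similarity_score=0.92, nli_entailment_prob=0.30))
```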
Accuracy and stability are common requirements for quadrotor trajectory tracking systems. Designing an accurate and stable tracking controller remains challenging, especially in unknown and dynamic environments with complex aerodynamic disturbances. We propose a quantile-based distributional-reinforced uncertainty estimator (QuaDUE) to accurately identify the effects of aerodynamic disturbances, i.e., the uncertainties between the true and estimated Control Contraction Metrics (CCMs). Taking inspiration from contraction theory and integrating the uncertainties from QuaDUE, our novel CCM-based trajectory tracking framework tracks any feasible reference trajectory accurately while guaranteeing exponential convergence. More importantly, the convergence and training acceleration of the distributional RL are guaranteed and analyzed, respectively, from a theoretical perspective. We also demonstrate our system under unknown and diverse aerodynamic forces. Under large aerodynamic forces (>2 m/s^2), compared with classical data-driven approaches, our QuaDUE-CCM achieves at least a 56.6% improvement in tracking error. Compared with a distributional-RL-based quadrotor MPC approach, QuaDUE-CCM achieves at least a 3x improvement in contraction rate.
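The quantile-based distributional estimation can be illustrated with the standard quantile-Huber (pinball) loss from distributional RL; the sketch below is a generic PyTorch implementation of that loss, not the QuaDUE training code, and the quantile count and kappa value are illustrative.

```python
# Generic quantile-Huber loss for learning a set of quantiles of an uncertainty
# distribution, as in distributional RL (illustrative, not the paper's code).
import torch

def quantile_huber_loss(pred_quantiles, targets, taus, kappa=1.0):
    """pred_quantiles: (B, N) predicted quantile values; targets: (B, M) target samples;
    taus: (N,) quantile fractions in (0, 1)."""
    td = targets.unsqueeze(1) - pred_quantiles.unsqueeze(2)          # (B, N, M) residuals
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs() # asymmetric pinball weights
    return (weight * huber / kappa).mean()

taus = (torch.arange(8, dtype=torch.float32) + 0.5) / 8              # 8 quantile fractions
pred = torch.randn(4, 8, requires_grad=True)
target = torch.randn(4, 16)
loss = quantile_huber_loss(pred, target, taus)
loss.backward()
```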
This work combines Control Barrier Functions (CBFs) with a whole-body controller to enable self-collision avoidance on the MIT Humanoid. Existing reactive controllers for self-collision avoidance cannot guarantee collision-free trajectories because they do not leverage the robot's full dynamics, which compromises kinematic feasibility. In contrast, the proposed CBF-WBC controller can reason about the robot's underactuated dynamics in real time to guarantee collision-free motion. The effectiveness of the approach is validated in simulation. First, a simple hand-reaching experiment shows that the CBF-WBC allows the robot's hand to deviate from an infeasible reference trajectory to avoid self-collisions. Second, the CBF-WBC is combined with a linear model predictive controller (LMPC) designed for dynamic locomotion, with the CBF-WBC used to track the LMPC predictions. A centroidal momentum task is also used to generate arm motions that assist with humanoid locomotion and disturbance recovery. Walking experiments show that the CBFs allow the centroidal momentum task to generate feasible arm motions and avoid leg self-collisions when the footstep locations or swing trajectories provided by the high-level planner are infeasible for the real robot.
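The core CBF mechanism can be sketched on a toy single-integrator point rather than the humanoid's whole-body dynamics: a QP minimally modifies a desired velocity so that a distance-based barrier stays nonnegative. The use of cvxpy, the barrier parameters, and the scenario are assumptions for illustration.

```python
# Minimal CBF-QP sketch for a single-integrator point (not the whole-body formulation).
import numpy as np
import cvxpy as cp

def cbf_qp(x, x_obstacle, u_desired, d_min=0.2, alpha=5.0):
    """x, x_obstacle: positions (3,); u_desired: reference velocity (3,)."""
    h = float(np.dot(x - x_obstacle, x - x_obstacle) - d_min**2)   # barrier value h(x)
    grad_h = 2.0 * (x - x_obstacle)                                # dh/dx
    u = cp.Variable(3)
    objective = cp.Minimize(cp.sum_squares(u - u_desired))         # stay close to the reference
    constraint = [grad_h @ u >= -alpha * h]                        # CBF condition: h_dot >= -alpha * h
    cp.Problem(objective, constraint).solve()
    return u.value

# The reference motion pushes the point toward a "self-collision"; the QP deflects it.
print(cbf_qp(x=np.array([0.3, 0.0, 0.0]),
             x_obstacle=np.array([0.0, 0.0, 0.0]),
             u_desired=np.array([-1.0, 0.0, 0.0])))
```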
This paper proposes a model predictive control (MPC) framework to achieve dynamic gaits on the MIT Humanoid. In addition to adapting footstep locations and timing online, the proposed approach can also reason about height, contact wrenches, torso rotation, and kinematic limits, and negotiate uneven terrain. Specifically, a linear MPC (LMPC) optimizes the desired footstep locations by linearizing about the current footstep positions. A low-level task-space controller tracks the predicted state and control trajectories from the LMPC to leverage the full-body dynamics. Finally, an adaptive gait frequency scheme is employed to modify the stepping frequency and enhance the robustness of the walking controller. Both the LMPC and the task-space control can be solved efficiently as quadratic programs (QPs) and are therefore suitable for real-time applications. In simulation studies, the MIT Humanoid traverses a wave field and recovers from impulsive disturbances with the proposed approach.
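To show why such a linear MPC can be solved as a QP, here is a toy receding-horizon problem for a 1D double integrator written with cvxpy; the dynamics, horizon, weights, and limits are illustrative and unrelated to the humanoid's actual centroidal/footstep model.

```python
# Toy linear MPC solved as a QP (illustrative stand-in for the LMPC structure).
import numpy as np
import cvxpy as cp

dt, N = 0.05, 20                                   # timestep, horizon length
A = np.array([[1.0, dt], [0.0, 1.0]])              # state: [position, velocity]
B = np.array([[0.5 * dt**2], [dt]])

x0 = np.array([0.5, 0.0])                          # initial deviation from the reference
X = cp.Variable((2, N + 1))
U = cp.Variable((1, N))

cost, constraints = 0, [X[:, 0] == x0]
for k in range(N):
    cost += cp.sum_squares(X[:, k + 1]) + 0.1 * cp.sum_squares(U[:, k])
    constraints += [X[:, k + 1] == A @ X[:, k] + B @ U[:, k],   # linear dynamics
                    cp.abs(U[:, k]) <= 10.0]                    # input (actuation) limits

cp.Problem(cp.Minimize(cost), constraints).solve()
print("first control move:", U.value[:, 0])        # applied in receding-horizon fashion
```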
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
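A toy PyTorch sketch of the cross-modal token idea follows: image and point-cloud tokens are concatenated into one sequence that object queries attend to through a standard transformer decoder. Shapes, layer counts, and the box head are illustrative assumptions, not the released CMT implementation.

```python
# Toy sketch of cross-modal tokens decoded by object queries (illustrative only).
import torch
import torch.nn as nn

d_model, num_queries = 256, 100
image_tokens = torch.randn(2, 900, d_model)        # e.g., flattened multi-view image features
point_tokens = torch.randn(2, 500, d_model)        # e.g., voxel/pillar features from LiDAR
memory = torch.cat([image_tokens, point_tokens], dim=1)   # shared multi-modal token sequence

queries = nn.Embedding(num_queries, d_model).weight.unsqueeze(0).expand(2, -1, -1)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=6)
features = decoder(queries, memory)                # (B, num_queries, d_model)

box_head = nn.Linear(d_model, 10)                  # e.g., center, size, yaw, velocity
boxes = box_head(features)                         # one 3D box prediction per query
```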