稀疏的张量正在迅速成为现代深度学习工作负载的关键组成部分。但是,开发高性能的稀疏运营商可能很困难和乏味,现有的供应商库无法满足新运营商的不断升级要求。稀疏张量编译器简化了操作员的开发,但是对深度学习的有效稀疏编译仍然具有挑战性,因为单个稀疏格式无法最大程度地提高硬件效率,并且单次弹出编译器无法跟上最新的硬件和系统进步。我们表明,解决这两个挑战的关键是两种合成性。在本文中,我们提出了SparSetir,这是一种稀疏的张张汇编抽象,可为深度学习工作负载提供可合理的格式和可组合的转换。 Sparsetir在这些可组合组件上构建一个搜索空间,以进行性能调整。通过这些改进,SparSetir获得了单个操作员的GPU上的一致性能加速与供应商库:GNN操作员的1.1-3.3倍,稀疏变压器操作员的1.1-4.4x。 Sparsetir还以1.1-2.2倍的速度加速了端到端GNN,用于图形训练,而RGCN推断为0.9-26x。
translated by 谷歌翻译
在各种设备上部署深度学习模型已成为一个重要的话题。硬件专业化的浪潮为多维张量计算带来了一套多样化的加速度原始图。这些新的加速原始基原料以及新兴的机器学习模型带来了巨大的工程挑战。在本文中,我们提出了Tensorir,这是一种编译器抽象,用于通过这些张量计算原始素优化程序。Tensorir概括了现有机器学习编译器中使用的循环巢表示,以将张量计算作为一流的公民。最后,我们在抽象之上构建了一个端到端框架,以自动优化给定的张量计算原始图的深度学习模型。实验结果表明,Tensorir编译会自动使用给定硬件后端的张量计算原始图,并提供与跨平台的最新手工精制系统竞争性能的性能。
translated by 谷歌翻译
根据历史运动序列预测未来的运动是计算机视觉中的一个基本问题,并且在自主驾驶和机器人技术中具有广泛的应用。最近的一些作品表明,图形卷积网络(GCN)有助于对不同关节之间的关系进行建模。但是,考虑到人类运动数据中的变体和各种作用类型,由于解耦的建模策略,很难描绘时空关系的交叉依赖性,这也可能加剧了不足的概括问题。因此,我们提出时空门控速度ADJACENCY GCN(GAGCN)学习对各种作用类型的复杂时空依赖性。具体而言,我们采用门控网络来通过混合候选时空邻接矩阵获得的可训练的自适应邻接矩阵来增强GCN的概括。此外,GAGCN通过平衡时空建模的重量并融合了脱钩时空特征来解决空间和时间的交叉依赖性。对人类360万,积聚和3DPW的广泛实验表明,GAGCN在短期和长期预测中都能达到最先进的表现。
translated by 谷歌翻译
社会偏差可以减少Covid-19等呼吸流行病中的感染率。交通交叉路口特别适用于在大都市中监测和评估社会疏散行为。我们提出并评估了一个隐私保留的社会疏散分析系统(B-SDA),它使用鸟瞰观看跨越交通交叉口的行人的录像。我们设计用于视频预处理,对象检测和跟踪的算法,这些算法源于已知的计算机视觉和深度学习技术,而是修改以解决检测由高度升高的相机捕获的非常小的物体/行人的问题。我们提出了一种纳入行人分组以检测社会疏散侵权行为的方法。 B-SDA用于比较基于大都会区域前大流行和大流行视频的行人行为。完成的行人检测性能为63.0美元$ $ $ ap_ {50} $,跟踪性能为47.6美元\%$ mota。大流行期间的社会疏散违规率为15.6 \%$ 31.4 \%$ Pandemic基线,表明行人遵循CDC规定的社会休闲建议。建议的系统适用于现实世界应用中的部署。
translated by 谷歌翻译
Accurate determination of a small molecule candidate (ligand) binding pose in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the pocket flexibility of protein, while the more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software, and more importantly achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. 3T structure transformation is decoupled from the system physics, making future usage in other computational scientific domains possible.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译
In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images. Mainly, we investigate the question: what would be good road scene-level representations for these two tasks? We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around ego-vehicle while performing actions to their destinations. To this end, we introduce the representation of semantic regions, which are areas where ego-vehicles visit while taking an afforded action (e.g., left-turn at 4-way intersections). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. Extensive evaluations are conducted on the HDD and nuScenes datasets, and the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.
translated by 谷歌翻译
Non-line-of-sight (NLOS) imaging aims to reconstruct the three-dimensional hidden scenes from the data measured in the line-of-sight, which uses photon time-of-flight information encoded in light after multiple diffuse reflections. The under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a serious ill-posed inverse problem, the solution of which is of high possibility to be degraded due to noises and distortions. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed relying on the alternating direction method of multipliers (ADMM) with the backtracking stepsize rule, which are further accelerated by GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, which achieve state-of-the-art performance, especially in the compressed sensing setting. All our codes and data are available at https://github.com/Duanlab123/CurvNLOS.
translated by 谷歌翻译
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the training convergence yet not sacrificing the model generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
translated by 谷歌翻译
Reinforcement learning (RL) is one of the most important branches of AI. Due to its capacity for self-adaption and decision-making in dynamic environments, reinforcement learning has been widely applied in multiple areas, such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. A large number of studies have focused on these security and privacy problems in reinforcement learning. However, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions to keep up with the pace of emerging threats. Accordingly, we herein present such a comprehensive review to explain and summarize the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts related to this area. Next, we cover the security and privacy issues linked to the state, action, environment, and reward function of the MDP process, respectively. We further highlight the special characteristics of security and privacy methodologies related to reinforcement learning. Finally, we discuss the possible future research directions within this area.
translated by 谷歌翻译