隐式3D表示的最新进展,即神经辐射场(NERFS),以可区分的方式使准确且具有逼真的3D重建成为可能。这种新的表示可以有效地以一种紧凑的格式传达数百个高分辨率图像的信息,并允许对新观点的逼真综合。在这项工作中,使用NERF的变体称为全体氧,我们为感知任务创建了第一个大规模隐式表示数据集,称为Fustection,该数据集由两个部分组成,这些部分既包含以对象为中心和场景为中心的扫描,用于分类和分段, 。它显示了原始数据集的显着内存压缩率(96.4 \%),同时以统一形式包含2D和3D信息。我们构建了直接作为输入这种隐式格式的分类和分割模型,并提出了一种新颖的增强技术,以避免在图像的背景上过度拟合。代码和数据可在https://postech-cvlab.github.io/perfception中公开获得。
translated by 谷歌翻译
气候变化所扩大的极端天气正在造成全球日益毁灭性的影响。由于高计算成本和严格的时间到解决方案限制,目前基于物理的数值天气预测(NWP)的使用限制了精度。我们报告说,数据驱动的深度学习地球系统模拟器Fourcastnet可以预测全球天气,并在接近最先进的准确性的同时,比NWP更快地产生五个量子的预测。四个超级计算系统(Selene,Perlmutter和Juwels Booster高达3,808 nvidia a100 GPU)在三个超级计算系统上进行了优化,并有效地缩放,并在混合精度中获得140.8 PETAFLOPS(该规模的峰值为11.9%)。在3,072GPU上在Juwels Booster上测量的训练四界的时间到达的时间为67.4分钟,相对于最新的NWP,在推理中,相对于最先进的NWP的时间更快。 Fourcastnet提前一周可产生准确的瞬时天气预测,使巨大的合奏更好地捕捉了极端天气,并支持更高的全球预测决议。
translated by 谷歌翻译
我们提出了一种基于学习的方法,以通过穿越城市环境的移动机器人来重建当地地形进行机车。使用板载摄像头和机器人轨迹的深度测量流,该算法估计机器人附近的地形。这些相机的原始测量值嘈杂,仅提供部分和遮挡的观察结果,在许多情况下,这些观察结果并未显示机器人所占据的地形。因此,我们提出了一个3D重建模型,该模型忠实地重建了场景,尽管嘈杂的测量和大量丢失的数据来自相机布置的盲点。该模型由点云上的4D完全卷积网络组成,该网络学习了几何先验,以从上下文中完成场景和自动回归反馈,以利用时空的一致性并使用过去的证据。该网络只能通过合成数据对网络进行训练,并且由于广泛的增强,它在现实世界中是强大的,如四足机器人(Anymal)验证中所示,Anymal,遍历具有挑战性的设置。我们使用有效的稀疏张量实现在机器人的机载低功率计算机上运行管道,并表明所提出的方法的表现优于经典地图表示。
translated by 谷歌翻译
Quantum many-body problems are some of the most challenging problems in science and are central to demystifying some exotic quantum phenomena, e.g., high-temperature superconductors. The combination of neural networks (NN) for representing quantum states, coupled with the Variational Monte Carlo (VMC) algorithm, has been shown to be a promising method for solving such problems. However, the run-time of this approach scales quadratically with the number of simulated particles, constraining the practically usable NN to - in machine learning terms - minuscule sizes (<10M parameters). Considering the many breakthroughs brought by extreme NN in the +1B parameters scale to other domains, lifting this constraint could significantly expand the set of quantum systems we can accurately simulate on classical computers, both in size and complexity. We propose a NN architecture called Vector-Quantized Neural Quantum States (VQ-NQS) that utilizes vector-quantization techniques to leverage redundancies in the local-energy calculations of the VMC algorithm - the source of the quadratic scaling. In our preliminary experiments, we demonstrate VQ-NQS ability to reproduce the ground state of the 2D Heisenberg model across various system sizes, while reporting a significant reduction of about ${\times}10$ in the number of FLOPs in the local-energy calculation.
translated by 谷歌翻译
There is increasing adoption of artificial intelligence in drug discovery. However, existing works use machine learning to mainly utilize the chemical structures of molecules yet ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecule's chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
translated by 谷歌翻译
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
translated by 谷歌翻译
The physics-informed neural operator (PINO) is a machine learning architecture that has shown promising empirical results for learning partial differential equations. PINO uses the Fourier neural operator (FNO) architecture to overcome the optimization challenges often faced by physics-informed neural networks. Since the convolution operator in PINO uses the Fourier series representation, its gradient can be computed exactly on the Fourier space. While Fourier series cannot represent nonperiodic functions, PINO and FNO still have the expressivity to learn nonperiodic problems with Fourier extension via padding. However, computing the Fourier extension in the physics-informed optimization requires solving an ill-conditioned system, resulting in inaccurate derivatives which prevent effective optimization. In this work, we present an architecture that leverages Fourier continuation (FC) to apply the exact gradient method to PINO for nonperiodic problems. This paper investigates three different ways that FC can be incorporated into PINO by testing their performance on a 1D blowup problem. Experiments show that FC-PINO outperforms padded PINO, improving equation loss by several orders of magnitude, and it can accurately capture the third order derivatives of nonsmooth solution functions.
translated by 谷歌翻译
Recently, neural networks have proven their impressive ability to solve partial differential equations (PDEs). Among them, Fourier neural operator (FNO) has shown success in learning solution operators for highly non-linear problems such as turbulence flow. FNO is discretization-invariant, where it can be trained on low-resolution data and generalizes to problems with high-resolution. This property is related to the low-pass filters in FNO, where only a limited number of frequency modes are selected to propagate information. However, it is still a challenge to select an appropriate number of frequency modes and training resolution for different PDEs. Too few frequency modes and low-resolution data hurt generalization, while too many frequency modes and high-resolution data are computationally expensive and lead to over-fitting. To this end, we propose Incremental Fourier Neural Operator (IFNO), which augments both the frequency modes and data resolution incrementally during training. We show that IFNO achieves better generalization (around 15% reduction on testing L2 loss) while reducing the computational cost by 35%, compared to the standard FNO. In addition, we observe that IFNO follows the behavior of implicit regularization in FNO, which explains its excellent generalization ability.
translated by 谷歌翻译
State estimation is important for a variety of tasks, from forecasting to substituting for unmeasured states in feedback controllers. Performing real-time state estimation for PDEs using provably and rapidly converging observers, such as those based on PDE backstepping, is computationally expensive and in many cases prohibitive. We propose a framework for accelerating PDE observer computations using learning-based approaches that are much faster while maintaining accuracy. In particular, we employ the recently-developed Fourier Neural Operator (FNO) to learn the functional mapping from the initial observer state and boundary measurements to the state estimate. By employing backstepping observer gains for previously-designed observers with particular convergence rate guarantees, we provide numerical experiments that evaluate the increased computational efficiency gained with FNO. We consider the state estimation for three benchmark PDE examples motivated by applications: first, for a reaction-diffusion (parabolic) PDE whose state is estimated with an exponential rate of convergence; second, for a parabolic PDE with exact prescribed-time estimation; and, third, for a pair of coupled first-order hyperbolic PDEs that modeling traffic flow density and velocity. The ML-accelerated observers trained on simulation data sets for these PDEs achieves up to three orders of magnitude improvement in computational speed compared to classical methods. This demonstrates the attractiveness of the ML-accelerated observers for real-time state estimation and control.
translated by 谷歌翻译
This report describes the winning solution to the Robust Vision Challenge (RVC) semantic segmentation track at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses SegFormer as the segmentation framework. The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy. All the original labels are projected to a 256-class unified label space, and the model is trained using a cross-entropy loss. Without significant hyperparameter tuning or any specific loss weighting, our solution ranks the first place on all the testing semantic segmentation benchmarks from multiple domains (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, and WildDash 2). The proposed method can serve as a strong baseline for the multi-domain segmentation task and benefit future works. Code will be available at https://github.com/lambert-x/RVC_Segmentation.
translated by 谷歌翻译