由于维度的诅咒和训练数据的限制,即使对于强大的深度神经网络,近似高维功能是一个非常具有挑战性的任务。灵感来自使用可逆剩余网络(REVNET)的非线性级别集学习(NLL)方法,本文提出了一种通过学习级别集(钻头)的尺寸减少方法,用于函数近似。我们的方法包含两个主要组件:一个是伪可逆神经网络(PRNN)模块,有效地将高维输入变量转换为低维活动变量,另一个是基于变换的近似函数值的合成回归模块低维空间中的数据。 PRNN由于使用RevEN而言,PRNN不仅放宽了NLL方法中存在的非线性变换的可逆性约束,还可以自适应地重量每个样本的影响并控制函数对学习的活动变量的灵敏度。合成的回归使用输入空间中的欧几里德距离来选择相邻样本,其在活动变量的空间上的投影用于执行局部最小二乘性多项式拟合。这有助于解决传统本地和全球回归中存在的数值振荡问题。广泛的实验结果表明,我们的钻探方法优于NLL和有源子空间方法,特别是当目标函数在其输入域内部拥有临界点时。
translated by 谷歌翻译
部分微分方程通常用于模拟各种物理现象,例如热扩散,波传播,流体动力学,弹性,电动力学和图像处理,并且已经开发了许多分析方法或传统的数值方法并广泛用于其溶液。受深度学习对科学和工程研究的迅速影响的启发,在本文中,我们提出了一个新型的神经网络GF-NET,以无监督的方式学习绿色的线性反应扩散方程的功能。所提出的方法克服了通过使用物理信息的方法和绿色功能的对称性来查找任意域上方程函数的挑战。结果,它尤其导致了在不同边界条件和来源下解决目标方程的有效方法。我们还通过正方形,环形和L形域中的实验证明了所提出的方法的有效性。
translated by 谷歌翻译
We propose Hierarchical ProtoPNet: an interpretable network that explains its reasoning process by considering the hierarchical relationship between classes. Different from previous methods that explain their reasoning process by dissecting the input image and finding the prototypical parts responsible for the classification, we propose to explain the reasoning process for video action classification by dissecting the input video frames on multiple levels of the class hierarchy. The explanations leverage the hierarchy to deal with uncertainty, akin to human reasoning: When we observe water and human activity, but no definitive action it can be recognized as the water sports parent class. Only after observing a person swimming can we definitively refine it to the swimming action. Experiments on ActivityNet and UCF-101 show performance improvements while providing multi-level explanations.
translated by 谷歌翻译
Sparse principal component analysis (SPCA) has been widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite there are many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) based on the elastic net are still unknown. We aim to close this important theoretical gap in this paper. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. Also, we study a computationally more efficient variant of the SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case of SPCA. We provide the guarantees of convergence to a stationary point for both algorithms. We prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds of existing works or the minimax rates up to some logarithmic factors. Moreover, we demonstrate the numerical performance of both algorithms in simulation studies.
translated by 谷歌翻译
The counting task, which plays a fundamental rule in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, it is inevitable to encounter newly coming data with new classes in our real world. We name this scenario as \textit{evolving object counting}. In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-increment module. The class-agnostic mask module learns generic object occupation prior via predicting a class-agnostic binary mask (e.g., 1 denotes there exists an object at the considering position in an image and 0 otherwise). The class-increment module is used to handle new coming classes and provides discriminative class guidance for density map prediction. The combined outputs of class-agnostic mask module and image feature extractor are used to predict the final density map. When new classes come, we first add new neural nodes into the last regression and classification layers of this module. Then, instead of retraining the model from scratch, we utilize knowledge distilling to help the model remember what have already learned about previous object classes. We also employ a support sample bank to store a small number of typical training samples of each class, which are used to prevent the model from forgetting key information of old data. With this design, our model can efficiently and effectively adapt to new coming classes while keeping good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance.
translated by 谷歌翻译
Long-term non-prehensile planar manipulation is a challenging task for robot planning and feedback control. It is characterized by underactuation, hybrid control, and contact uncertainty. One main difficulty is to determine contact points and directions, which involves joint logic and geometrical reasoning in the modes of the dynamics model. To tackle this issue, we propose a demonstration-guided hierarchical optimization framework to achieve offline task and motion planning (TAMP). Our work extends the formulation of the dynamics model of the pusher-slider system to include separation mode with face switching cases, and solves a warm-started TAMP problem by exploiting human demonstrations. We show that our approach can cope well with the local minima problems currently present in the state-of-the-art solvers and determine a valid solution to the task. We validate our results in simulation and demonstrate its applicability on a pusher-slider system with real Franka Emika robot in the presence of external disturbances.
translated by 谷歌翻译
Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can largely facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they are not readily used in relighting problems. To promote research in both fields, in this paper, we present UltraStage, a new 3D human dataset that contains more than 2K high-quality human assets captured under both multi-view and multi-illumination settings. Specifically, for each example, we provide 32 surrounding views illuminated with one white light and two gradient illuminations. In addition to regular multi-view images, gradient illuminations help recover detailed surface normal and spatially-varying material maps, enabling various relighting applications. Inspired by recent advances in neural representation, we further interpret each example into a neural human asset which allows novel view synthesis under arbitrary lighting conditions. We show our neural human assets can achieve extremely high capture performance and are capable of representing fine details such as facial wrinkles and cloth folds. We also validate UltraStage in single image relighting tasks, training neural networks with virtual relighted data from neural assets and demonstrating realistic rendering improvements over prior arts. UltraStage will be publicly available to the community to stimulate significant future developments in various human modeling and rendering tasks.
translated by 谷歌翻译
Accurate spatial-temporal traffic flow forecasting is essential for helping traffic managers to take control measures and drivers to choose the optimal travel routes. Recently, graph convolutional networks (GCNs) have been widely used in traffic flow prediction owing to their powerful ability to capture spatial-temporal dependencies. The design of the spatial-temporal graph adjacency matrix is a key to the success of GCNs, and it is still an open question. This paper proposes reconstructing the binary adjacency matrix via tensor decomposition, and a traffic flow forecasting method is proposed. First, we reformulate the spatial-temporal fusion graph adjacency matrix into a three-way adjacency tensor. Then, we reconstructed the adjacency tensor via Tucker decomposition, wherein more informative and global spatial-temporal dependencies are encoded. Finally, a Spatial-temporal Synchronous Graph Convolutional module for localized spatial-temporal correlations learning and a Dilated Convolution module for global correlations learning are assembled to aggregate and learn the comprehensive spatial-temporal dependencies of the road network. Experimental results on four open-access datasets demonstrate that the proposed model outperforms state-of-the-art approaches in terms of the prediction performance and computational cost.
translated by 谷歌翻译
Existing approaches for vision-and-language navigation (VLN) are mainly based on cross-modal reasoning over discrete views. However, this scheme may hamper an agent's spatial and numerical reasoning because of incomplete objects within a single view and duplicate observations across views. A potential solution is mapping discrete views into a unified birds's-eye view, which can aggregate partial and duplicate observations. Existing metric maps could achieve this goal, but they suffer from less expressive semantics (e.g. usually predefined labels) and limited map size, which weakens an agent's language grounding and long-term planning ability. Inspired by the robotics community, we introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning. Beyond mapping with more expressive deep features, we further design a pre-training framework via the hybrid map to learn language-informed map representations, which enhances cross-modal grounding and facilitates the final language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based route for VLN, and the proposed method sets the new state-of-the-art on three VLN benchmarks.
translated by 谷歌翻译
Ongoing risks from climate change have impacted the livelihood of global nomadic communities, and are likely to lead to increased migratory movements in coming years. As a result, mobility considerations are becoming increasingly important in energy systems planning, particularly to achieve energy access in developing countries. Advanced Plug and Play control strategies have been recently developed with such a decentralized framework in mind, more easily allowing for the interconnection of nomadic communities, both to each other and to the main grid. In light of the above, the design and planning strategy of a mobile multi-energy supply system for a nomadic community is investigated in this work. Motivated by the scale and dimensionality of the associated uncertainties, impacting all major design and decision variables over the 30-year planning horizon, Deep Reinforcement Learning (DRL) is implemented for the design and planning problem tackled. DRL based solutions are benchmarked against several rigid baseline design options to compare expected performance under uncertainty. The results on a case study for ger communities in Mongolia suggest that mobile nomadic energy systems can be both technically and economically feasible, particularly when considering flexibility, although the degree of spatial dispersion among households is an important limiting factor. Key economic, sustainability and resilience indicators such as Cost, Equivalent Emissions and Total Unmet Load are measured, suggesting potential improvements compared to available baselines of up to 25%, 67% and 76%, respectively. Finally, the decomposition of values of flexibility and plug and play operation is presented using a variation of real options theory, with important implications for both nomadic communities and policymakers focused on enabling their energy access.
translated by 谷歌翻译