研究的一个快速增长的领域是使用机器学习方法(例如自动编码器)来降低科学应用中数据和模型的尺寸。我们表明,自动编码器的规范配方遭受了几种可能阻碍其性能的缺陷。使用元学习方法,我们将自动编码器问题重新制定为双层优化程序,该程序明确解决了降低降低任务。我们证明,新的配方用规范的自动编码器纠正了已确定的缺陷,提供了一种实用方法来求解它,并使用简单的数值说明来展示该配方的强度。
translated by 谷歌翻译
科学和工程学中的一个基本问题是设计最佳的控制政策,这些政策将给定的系统转向预期的结果。这项工作提出了同时求解给定系统状态和最佳控制信号的控制物理信息的神经网络(控制PINNS),在符合基础物理定律的一个阶段框架中。先前的方法使用两个阶段的框架,该框架首先建模然后按顺序控制系统。相比之下,控制PINN将所需的最佳条件纳入其体系结构和损耗函数中。通过解决以下开环的最佳控制问题来证明控制PINN的成功:(i)一个分析问题,(ii)一维热方程,以及(iii)二维捕食者捕食者问题。
translated by 谷歌翻译
许多操作数值天气预报系统中使用的数据同化程序基于4D-VAR算法的变体。解决4D-VAR问题的成本是由物理模型的前进和伴随评估的成本为主。这通过快速,近似代理模型来激励他们的替代。神经网络为代理模型的数据驱动创建提供了一个有希望的方法。已显示代理4D-VAR问题解决方案的准确性,明确地依赖于对其他代理建模方法和一般非线性设置的准确建模和伴随的准确建模。我们制定和分析若干方法,将衍生信息纳入神经网络替代品的构建。通过训练集数据和Lorenz-63系统上的顺序数据同化设置来测试生成的网络。与没有伴随信息的替代网络培训的代理网络相比,两种方法表现出卓越的性能,显示将伴随信息纳入训练过程的益处。
translated by 谷歌翻译
We address the challenge of building domain-specific knowledge models for industrial use cases, where labelled data and taxonomic information is initially scarce. Our focus is on inductive link prediction models as a basis for practical tools that support knowledge engineers with exploring text collections and discovering and linking new (so-called open-world) entities to the knowledge graph. We argue that - though neural approaches to text mining have yielded impressive results in the past years - current benchmarks do not reflect the typical challenges encountered in the industrial wild properly. Therefore, our first contribution is an open benchmark coined IRT2 (inductive reasoning with text) that (1) covers knowledge graphs of varying sizes (including very small ones), (2) comes with incidental, low-quality text mentions, and (3) includes not only triple completion but also ranking, which is relevant for supporting experts with discovery tasks. We investigate two neural models for inductive link prediction, one based on end-to-end learning and one that learns from the knowledge graph and text data in separate steps. These models compete with a strong bag-of-words baseline. The results show a significant advance in performance for the neural approaches as soon as the available graph data decreases for linking. For ranking, the results are promising, and the neural approaches outperform the sparse retriever by a wide margin.
translated by 谷歌翻译
Artificial Intelligence (AI) and Machine Learning (ML) are weaving their way into the fabric of society, where they are playing a crucial role in numerous facets of our lives. As we witness the increased deployment of AI and ML in various types of devices, we benefit from their use into energy-efficient algorithms for low powered devices. In this paper, we investigate a scale and medium that is far smaller than conventional devices as we move towards molecular systems that can be utilized to perform machine learning functions, i.e., Molecular Machine Learning (MML). Fundamental to the operation of MML is the transport, processing, and interpretation of information propagated by molecules through chemical reactions. We begin by reviewing the current approaches that have been developed for MML, before we move towards potential new directions that rely on gene regulatory networks inside biological organisms as well as their population interactions to create neural networks. We then investigate mechanisms for training machine learning structures in biological cells based on calcium signaling and demonstrate their application to build an Analog to Digital Converter (ADC). Lastly, we look at potential future directions as well as challenges that this area could solve.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanations methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or model parameters. By propagating a compact input or parameter set as symbolic intervals through the forwards and backwards computations of the neural network we can formally certify the robustness of gradient-based explanations. Our bounds are differentiable, hence we can incorporate provable explanation robustness into neural network training. Empirically, our method surpasses the robustness provided by previous heuristic approaches. We find that our training method is the only method able to learn neural networks with certificates of explanation robustness across all six datasets tested.
translated by 谷歌翻译
Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy.
translated by 谷歌翻译
Predicting the political polarity of news headlines is a challenging task that becomes even more challenging in a multilingual setting with low-resource languages. To deal with this, we propose to utilise the Inferential Commonsense Knowledge via a Translate-Retrieve-Translate strategy to introduce a learning framework. To begin with, we use the method of translation and retrieval to acquire the inferential knowledge in the target language. We then employ an attention mechanism to emphasise important inferences. We finally integrate the attended inferences into a multilingual pre-trained language model for the task of bias prediction. To evaluate the effectiveness of our framework, we present a dataset of over 62.6K multilingual news headlines in five European languages annotated with their respective political polarities. We evaluate several state-of-the-art multilingual pre-trained language models since their performance tends to vary across languages (low/high resource). Evaluation results demonstrate that our proposed framework is effective regardless of the models employed. Overall, the best performing model trained with only headlines show 0.90 accuracy and F1, and 0.83 jaccard score. With attended knowledge in our framework, the same model show an increase in 2.2% accuracy and F1, and 3.6% jaccard score. Extending our experiments to individual languages reveals that the models we analyze for Slovenian perform significantly worse than other languages in our dataset. To investigate this, we assess the effect of translation quality on prediction performance. It indicates that the disparity in performance is most likely due to poor translation quality. We release our dataset and scripts at: https://github.com/Swati17293/KG-Multi-Bias for future research. Our framework has the potential to benefit journalists, social scientists, news producers, and consumers.
translated by 谷歌翻译
Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining \textit{robust} feature attributions. However, the lack of considering the normalization of the attributions, which is essential in their visualizations, has been an obstacle to understanding and improving the robustness of feature attribution methods. In this paper, we provide new insights by taking such normalization into account. First, we show that for every non-negative homogeneous neural network, a naive $\ell_2$-robust criterion for gradients is \textit{not} normalization invariant, which means that two functions with the same normalized gradient can have different values. Second, we formulate a normalization invariant cosine distance-based criterion and derive its upper bound, which gives insight for why simply minimizing the Hessian norm at the input, as has been done in previous work, is not sufficient for attaining robust feature attribution. Finally, we propose to combine both $\ell_2$ and cosine distance-based criteria as regularization terms to leverage the advantages of both in aligning the local gradient. As a result, we experimentally show that models trained with our method produce much more robust interpretations on CIFAR-10 and ImageNet-100 without significantly hurting the accuracy, compared to the recent baselines. To the best of our knowledge, this is the first work to verify the robustness of interpretation on a larger-scale dataset beyond CIFAR-10, thanks to the computational efficiency of our method.
translated by 谷歌翻译