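Diffusion models have recently been studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via a Laplace approximation of posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient, without a strict measurement consistency projection step. Compared to previous studies, this yields a more desirable generative path in noisy settings. Our method demonstrates that diffusion models can incorporate various measurement noise statistics, such as Gaussian and Poisson, and can also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.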
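The blended update can be illustrated on a toy problem. The sketch below is a minimal, hypothetical example and not the authors' implementation. It makes four assumptions:

- a standard-normal prior x0 ~ N(0, I), for which the posterior-mean (Tweedie) denoiser is analytic, E[x0 | x_t] = sqrt(ᾱ_t) x_t, so no trained network is needed;
- a linear DDPM noise schedule;
- a linear forward operator A with noiseless measurements in the usage example;
- a constant guidance weight `zeta` (the paper's step-size rule may differ).

Each iteration takes an ordinary ancestral sampling step. It then subtracts the gradient, with respect to x_t, of the squared measurement residual evaluated at the denoised estimate x̂0(x_t). No projection onto measurement-consistent solutions is performed.

```python
import numpy as np


def dps_sample_toy(A, y, num_steps=1000, zeta=0.5, seed=0):
    """Toy diffusion posterior sampling with a standard-normal prior (hypothetical sketch).

    For x0 ~ N(0, I) every noisy marginal is N(0, I), the noise prediction is
    eps_hat = sqrt(1 - abar_t) * x_t, and the denoised estimate is
    x0_hat = sqrt(abar_t) * x_t, so d x0_hat / d x_t = sqrt(abar_t) * I.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear DDPM schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(A.shape[1])  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        abar = alpha_bars[t]
        eps_hat = np.sqrt(1.0 - abar) * x  # analytic stand-in for the network output
        x0_hat = (x - np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(abar)  # Tweedie estimate

        # Unconditional ancestral step (sigma_t^2 = beta_t, no noise on the last step).
        x_prev = (x - betas[t] / np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x_prev = x_prev + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

        # Manifold-constrained gradient of ||y - A x0_hat(x_t)||^2 with respect to x_t.
        grad = -2.0 * np.sqrt(abar) * A.T @ (y - A @ x0_hat)
        x = x_prev - zeta * grad
    return x


# Usage: observe the first half of a 16-dimensional signal.
A = np.eye(16)[:8]
y = A @ np.random.default_rng(1).standard_normal(16)
x_guided = dps_sample_toy(A, y, zeta=0.5)
x_free = dps_sample_toy(A, y, zeta=0.0)  # same seed, no guidance
print(np.linalg.norm(A @ x_guided - y), np.linalg.norm(A @ x_free - y))
```

With `zeta=0.0` the loop reduces to plain unconditional sampling, which makes the effect of the gradient term easy to compare.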
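We present a method for supervised learning of sparsity-promoting regularizers for denoising signals and images. Sparsity-promoting regularization is a key ingredient in solving modern signal reconstruction problems. However, the operators underlying these regularizers are usually either designed by hand or learned from data in an unsupervised way. The recent success of supervised learning (mainly convolutional neural networks) in solving image reconstruction problems suggests that it could be a fruitful approach to designing regularizers. Toward this end, we propose to denoise signals using a variational formulation with a parametric, sparsity-promoting regularizer. The parameters of the regularizer are learned to minimize the mean squared error of reconstructions on a training set of ground-truth image and measurement pairs. Training involves solving a challenging bilevel optimization problem. We derive an expression for the gradient of the training loss using the closed-form solution of the denoising problem, and we provide an accompanying gradient descent algorithm to minimize it. Our experiments with structured 1D signals and natural images show that the proposed method can learn an operator that outperforms well-known regularizers (total variation, DCT-sparsity, and unsupervised dictionary learning) and collaborative filtering for denoising. While the approach we present is specific to denoising, we believe that it could be adapted to the larger class of inverse problems with linear measurement models, giving it applicability in a wide range of signal reconstruction settings.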
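The bilevel idea can be sketched under a strong simplifying assumption that is ours, not the paper's. The regularizer is taken to be λ‖Wx‖₁ with a fixed orthonormal W, so the lower-level denoising problem has the closed-form solution x̂(y) = Wᵀ soft(Wy, λ), and only the scalar λ is learned. Differentiating that closed form gives the gradient of the supervised mean-squared-error loss with respect to λ, which plain gradient descent then minimizes. The paper learns the operator itself. All function names below are illustrative.

```python
import numpy as np


def soft_threshold(u, lam):
    """Proximal operator of lam * ||u||_1, applied elementwise."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def loss_and_grad(Y, X, W, lam):
    """Training loss 0.5 * mean ||x_hat - x||^2 and its derivative with respect to lam.

    Columns of Y are noisy measurements, columns of X the ground truth, and W is
    assumed orthonormal so that x_hat = W.T @ soft_threshold(W @ y, lam) exactly.
    """
    U = W @ Y
    X_hat = W.T @ soft_threshold(U, lam)
    R = X_hat - X
    loss = 0.5 * np.mean(np.sum(R**2, axis=0))
    # d soft(u, lam) / d lam = -sign(u) where |u| > lam, and 0 elsewhere.
    dU = -np.sign(U) * (np.abs(U) > lam)
    grad = np.mean(np.sum(R * (W.T @ dU), axis=0))
    return loss, grad


def fit_lambda(Y, X, W, lam=0.0, lr=0.01, steps=200):
    """Plain gradient descent on the upper-level (training) loss."""
    for _ in range(steps):
        _, g = loss_and_grad(Y, X, W, lam)
        lam = max(lam - lr * g, 0.0)  # keep the regularization weight nonnegative
    return lam


# Usage with synthetic signals that are sparse in a random orthonormal basis.
rng = np.random.default_rng(0)
n, N = 32, 200
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = rng.standard_normal((n, N)) * (rng.random((n, N)) < 0.1)
X = W.T @ S
Y = X + 0.1 * rng.standard_normal((n, N))
lam = fit_lambda(Y, X, W)
```

Within each region where the set of surviving coefficients is fixed, the loss is quadratic in λ. The analytic gradient can therefore be checked against a central finite difference.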
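Radiography is often used to probe complex, evolving density fields in dynamic systems and thereby gain insight into the underlying physics. The technique has been used in numerous fields, including materials science, shock physics, inertial confinement fusion, and other national security applications. In many of these applications, however, complications from noise, scatter, complex beam dynamics, and similar effects prevent the density reconstruction from being accurate enough to identify the underlying physics with sufficient confidence. As a result, density reconstruction from static/dynamic radiography has typically been limited to identifying discontinuous features such as cracks and voids in many of these applications. In this work, we propose a fundamentally new approach to reconstructing density from a sequence of radiographs. We take only the robust features identifiable in radiographs and combine them with the underlying hydrodynamic equations through a machine learning approach, namely conditional generative adversarial networks (cGANs), to determine the density fields from a dynamic sequence of radiographs. Next, we seek to further improve the hydrodynamic consistency of the ML-based density reconstruction through a process of parameter estimation and projection onto a hydrodynamic manifold. In this context, we note that the distance, in the parameter space considered, between the test data and the hydrodynamic manifold given by the training data serves two purposes. It is a diagnostic of the robustness of the predictions, and it is a means of augmenting the training database, with the expectation that the latter will further reduce future density reconstruction errors. Finally, we demonstrate that this method outperforms traditional radiographic reconstruction in capturing allowable hydrodynamic paths, even when relatively small amounts of scatter are present.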
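We present a method for supervised learning of sparsity-promoting regularizers, a key ingredient in many modern signal reconstruction problems. The parameters of the regularizer are learned to minimize the mean squared error of reconstruction on a training set of ground-truth signal and measurement pairs. Training involves solving a challenging bilevel optimization problem with a nonsmooth lower-level objective. We derive an expression for the gradient of the training loss using the implicit closed-form solution of the lower-level variational problem given by its dual problem. We also provide an accompanying gradient descent algorithm (dubbed BLORC) to minimize the loss. Our experiments on simple natural images and on denoising 1D signals show two things: the proposed method can learn meaningful operators, and its analytical gradients are faster to compute than those obtained with standard automatic differentiation methods. While the approach we present is applied to denoising, we believe that it can be adapted to a wide variety of inverse problems with linear measurement models, making it applicable in a wide range of scenarios.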
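The role of the dual problem can be illustrated with the ℓ1 analysis denoiser min_x ½‖x − y‖² + λ‖Wx‖₁ for a general (not necessarily orthonormal) W. The derivation has two steps:

1. Write λ‖Wx‖₁ as the maximum of zᵀWx over ‖z‖∞ ≤ λ.
2. Minimize over x. This gives the primal solution x* = y − Wᵀz*, where z* minimizes ½‖y − Wᵀz‖² subject to ‖z‖∞ ≤ λ.

The sketch below solves that box-constrained dual by projected gradient descent. The solver, iteration count, and names are our assumptions. This is not BLORC itself, which additionally differentiates through this solution to train W.

```python
import numpy as np


def denoise_via_dual(y, W, lam, iters=5000):
    """Solve min_x 0.5*||x - y||^2 + lam*||W x||_1 through its dual (hypothetical sketch).

    Dual: min_z 0.5*||y - W.T z||^2 subject to |z_i| <= lam; primal: x = y - W.T z.
    """
    step = 1.0 / np.linalg.norm(W, 2) ** 2  # 1 / Lipschitz constant of the dual gradient
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        grad = W @ (W.T @ z - y)
        z = np.clip(z - step * grad, -lam, lam)  # projection onto the l_inf ball
    return y - W.T @ z


def objective(x, y, W, lam):
    """Primal denoising objective."""
    return 0.5 * np.sum((x - y) ** 2) + lam * np.sum(np.abs(W @ x))


# Usage: 1D total-variation denoising, with W the finite-difference operator.
rng = np.random.default_rng(0)
n = 50
x_true = np.repeat([0.0, 1.0, -0.5, 0.5, 0.0], n // 5)
y = x_true + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(n), axis=0)  # (n-1) x n, rows compute x[i+1] - x[i]
x_hat = denoise_via_dual(y, D, lam=0.2)
```

For an orthonormal W, the dual iteration reaches its fixed point immediately and reproduces the familiar closed form Wᵀ soft(Wy, λ).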
In this paper, we propose a novel deep convolutional neural network (CNN)-based algorithm for solving ill-posed inverse problems. Regularized iterative algorithms have emerged as the standard approach to ill-posed inverse problems in the past few decades. These methods produce excellent results, but they can be challenging to deploy in practice. Two factors contribute: the high computational cost of the forward and adjoint operators, and the difficulty of hyperparameter selection. The starting point of our work is the observation that unrolled iterative methods have the form of a CNN (filtering followed by a point-wise nonlinearity) when the normal operator (H*H, the adjoint of H times H) of the forward model is a convolution. Based on this observation, we propose using direct inversion followed by a CNN to solve normal-convolutional inverse problems. The direct inversion encapsulates the physical model of the system, but it leads to artifacts when the problem is ill-posed. The CNN combines multiresolution decomposition and residual learning in order to learn to remove these artifacts while preserving image structure. We demonstrate the performance of the proposed network in sparse-view reconstruction (down to 50 views) on parallel-beam X-ray computed tomography, both on synthetic phantoms and on real experimental sinograms. The proposed network outperforms total-variation-regularized iterative reconstruction for the more realistic phantoms and requires less than a second to reconstruct a 512 × 512 image on a GPU. K.H. Jin acknowledges support from the "EPFL Fellows" fellowship program co-funded by Marie Curie from the European Union's Horizon 2020 Framework Programme for Research and Innovation under grant agreement 665667.
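The observation that drives the method can be checked numerically in a simplified setting. The sketch below assumes a 1D circular-convolution forward model H, a stand-in chosen for illustration rather than the paper's X-ray transform. Because H is diagonalized by the DFT, the normal operator HᵀH is itself a circular convolution whose frequency response is |ĥ|². One unrolled gradient step x − γ(HᵀHx − Hᵀy) is therefore a fixed filter applied to x plus a data-dependent bias. That is the filtering structure of a CNN layer before its nonlinearity. The kernel and function names are hypothetical.

```python
import numpy as np


def circulant(h):
    """Matrix of circular convolution with kernel h: (H @ x)[i] = sum_j h[(i - j) % n] x[j]."""
    n = len(h)
    return np.array([[h[(i - j) % n] for j in range(n)] for i in range(n)])


def normal_op_fft(h, x):
    """Apply H^T H to x as one convolution with frequency response |FFT(h)|^2."""
    return np.real(np.fft.ifft(np.abs(np.fft.fft(h)) ** 2 * np.fft.fft(x)))


def unrolled_step(h, x, y, gamma):
    """One gradient step on 0.5*||H x - y||^2: a filter applied to x plus a bias term."""
    Hty = np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(y)))  # H^T y
    return x - gamma * (normal_op_fft(h, x) - Hty)


rng = np.random.default_rng(0)
n = 64
h = np.zeros(n)
h[:5] = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical blur kernel
H = circulant(h)
x = rng.standard_normal(n)
y = H @ rng.standard_normal(n)
```

The explicit matrix `H` is built only to confirm that the FFT-based filter matches HᵀH. In practice, the convolutional form is what lets a learned CNN stand in for the unrolled iterations.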
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations. Compared to autoregressive models, such as Parti, Muse is more efficient due to its use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding. This translates to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M-parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The 3B-parameter Muse model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
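The abstract attributes Muse's efficiency over autoregressive models to parallel decoding. The sketch below shows one common form of that idea. It assumes a MaskGIT-style scheme with a cosine masking schedule; this is our assumption, not a description of Muse's exact sampler. At every step:

1. All masked tokens are predicted at once.
2. The most confident predictions are kept.
3. The rest are re-masked.

An image of N tokens is thus produced in a fixed, small number of model calls rather than N. The `predict` callable, token count, and codebook size are hypothetical stand-ins for the text-conditioned Transformer.

```python
import math

import numpy as np

MASK = -1  # sentinel id for a masked token position


def num_still_masked(step, num_steps, num_tokens):
    """Cosine schedule: how many tokens remain masked after `step` (0-based)."""
    return int(math.floor(num_tokens * math.cos(math.pi / 2 * (step + 1) / num_steps)))


def parallel_decode(predict, num_tokens, num_steps):
    """Iterative parallel decoding; `predict(tokens)` returns (token_ids, confidences)."""
    tokens = np.full(num_tokens, MASK)
    for step in range(num_steps):
        masked = tokens == MASK
        pred_ids, conf = predict(tokens)
        # Tokens decoded in earlier steps are kept and never re-masked.
        pred_ids = np.where(masked, pred_ids, tokens)
        conf = np.where(masked, conf, np.inf)
        k = num_still_masked(step, num_steps, num_tokens)
        tokens = pred_ids.copy()
        tokens[np.argsort(conf)[:k]] = MASK  # re-mask the k least confident predictions
    return tokens


# Usage with a random stand-in for the model (256 tokens, hypothetical 1024-entry codebook).
rng = np.random.default_rng(0)
calls = []


def random_predict(tokens):
    calls.append(1)
    return rng.integers(0, 1024, size=tokens.shape), rng.random(tokens.shape)


out = parallel_decode(random_predict, num_tokens=256, num_steps=12)
```

Because the schedule is non-increasing and reaches zero at the last step, every position is filled after exactly `num_steps` calls to the model.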
Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System, which classifies the DR grade, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we use transfer learning mechanisms to extract invariant feature representations using the Wasserstein distance and adversarial learning-based entropy minimization. Furthermore, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach remains robust to a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRiD and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly supervised manner.
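Two of the quantities named above can be made concrete. The sketch below is illustrative only and assumes one-dimensional feature samples of equal size. In that case, the empirical Wasserstein-1 distance reduces to the mean absolute difference between sorted samples. The entropy-minimization term is the average Shannon entropy of the softmax predictions, which is small when predictions are confident. DRG-Net's actual losses operate on learned high-dimensional features (the abstract pairs the Wasserstein distance with adversarial learning), so this is not the paper's implementation, and the function names are hypothetical.

```python
import numpy as np


def wasserstein1_1d(a, b):
    """Empirical W1 distance between two equal-size 1D samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("this shortcut assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def mean_prediction_entropy(logits):
    """Average Shannon entropy (in nats) of softmax(logits) over the rows."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-np.mean(np.sum(p * np.log(p + 1e-12), axis=1)))


# Usage: a target-domain feature sample shifted relative to the source domain.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
target = source + 2.0
shift = wasserstein1_1d(source, target)
```

Minimizing the distance pulls the two feature distributions together. Minimizing the entropy pushes the classifier toward confident predictions on the unlabeled domain.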
SchNetPack is a versatile neural network toolbox that addresses the requirements of both method development and the application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks, and a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training tasks such as the generation of 3D molecular structures.
Anthropogenic pressure (i.e., human influence) on the environment is one of the largest causes of the loss of biological diversity. Wilderness areas, in contrast, are home to undisturbed ecological processes. However, there is no biophysical definition of the term wilderness. Instead, wilderness is more of a philosophical or cultural concept and thus cannot be easily delineated or categorized in a technical manner. With this paper, we (i) introduce the task of wilderness mapping by means of machine learning applied to satellite imagery and (ii) publish MapInWild, a large-scale benchmark dataset curated for that task. MapInWild is a multi-modal dataset and comprises various geodata acquired and formed from a diverse set of Earth observation sensors. The dataset consists of 8144 images of 1920 × 1920 pixels each and is approximately 350 GB in size. The images are weakly annotated with three classes derived from the World Database on Protected Areas: Strict Nature Reserves, Wilderness Areas, and National Parks. The dataset shall serve as a testbed for developments in fields such as explainable machine learning and environmental remote sensing. With it, we hope to contribute to a deeper understanding of the question "What makes nature wild?".