Multispectral imaging has been used for numerous applications in e.g., environmental monitoring, aerospace, defense, and biomedicine. Here, we present a diffractive optical network-based multispectral imaging system trained using deep learning to create a virtual spectral filter array at the output image field-of-view. This diffractive multispectral imager performs spatially-coherent imaging over a large spectrum, and at the same time, routes a pre-determined set of spectral channels onto an array of pixels at the output plane, converting a monochrome focal plane array or image sensor into a multispectral imaging device without any spectral filters or image recovery algorithms. Furthermore, the spectral responsivity of this diffractive multispectral imager is not sensitive to input polarization states. Through numerical simulations, we present different diffractive network designs that achieve snapshot multispectral imaging with 4, 9 and 16 unique spectral bands within the visible spectrum, based on passive spatially-structured diffractive surfaces, with a compact design that axially spans ~72 times the mean wavelength of the spectral band of interest. Moreover, we experimentally demonstrate a diffractive multispectral imager based on a 3D-printed diffractive network that creates at its output image plane a spatially-repeating virtual spectral filter array with 2x2=4 unique bands in the terahertz part of the spectrum. Due to their compact form factor and computation-free, power-efficient and polarization-insensitive forward operation, diffractive multispectral imagers can be transformative for various imaging and sensing applications and be used at different parts of the electromagnetic spectrum where high-density and wide-area multispectral pixel arrays are not widely available.
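As a purely numerical aside (not the paper's trained design), diffractive networks like the one described above are typically simulated by alternating free-space propagation with phase-only layers. A minimal sketch of that forward model using the angular spectrum method follows; the layer count, wavelength, and random phase values here are illustrative placeholders, not the paper's optimized parameters:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex optical field a distance z via the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    propagating = arg > 0                       # drop evanescent components
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.exp(1j * kz * z) * propagating
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Toy "diffractive network": alternate free-space propagation with phase-only
# layers (random here, standing in for the deep learning-trained surfaces).
rng = np.random.default_rng(0)
wavelength, dx, layer_gap = 0.7e-6, 0.5e-6, 20e-6
field = np.ones((64, 64), dtype=complex)        # plane-wave input
for _ in range(3):
    phase_layer = np.exp(1j * rng.uniform(0, 2 * np.pi, field.shape))
    field = angular_spectrum_propagate(field * phase_layer, wavelength, dx, layer_gap)
output_intensity = np.abs(field) ** 2
```

In a trained design, the per-layer phase values would be the optimized variables, and the loss would reward routing each wavelength onto its assigned virtual-filter pixel at the output plane.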
A unidirectional imager would only permit image formation along one direction, from an input field-of-view (FOV) A to an output FOV B, and in the reverse path, the image formation would be blocked. Here, we report the first demonstration of unidirectional imagers, presenting polarization-insensitive and broadband unidirectional imaging based on successive diffractive layers that are linear and isotropic. These diffractive layers are optimized using deep learning and consist of hundreds of thousands of diffractive phase features, which collectively modulate the incoming fields and project an intensity image of the input onto an output FOV, while blocking the image formation in the reverse direction. After their deep learning-based training, the resulting diffractive layers are fabricated to form a unidirectional imager. As a reciprocal device, the diffractive unidirectional imager has asymmetric mode processing capabilities in the forward and backward directions, where the optical modes from B to A are selectively guided/scattered to miss the output FOV, whereas for the forward direction such modal losses are minimized, yielding an ideal imaging system between the input and output FOVs. Although trained using monochromatic illumination, the diffractive unidirectional imager maintains its functionality over a large spectral band and works under broadband illumination. We experimentally validated this unidirectional imager using terahertz radiation, with measurements that closely match our numerical results. Using the same deep learning-based design strategy, we also created a wavelength-selective unidirectional imager, where two unidirectional imaging operations, in reverse directions, are multiplexed through different illumination wavelengths. Diffractive unidirectional imaging using structured materials will have numerous applications in e.g., security, defense, telecommunications and privacy protection.
Permutation matrices constitute an important computational building block that is frequently used in various fields, including communications, information security, and data processing. Optical implementations of permutation operators with relatively large numbers of input-output interconnects, based on power-efficient, fast, and compact platforms, are highly desirable. Here, we present diffractive optical networks designed through deep learning to all-optically perform permutation operations that can scale to hundreds of thousands of interconnects between an input and an output field-of-view, using passive transmissive layers that are individually structured at the wavelength scale. Our findings indicate that the capacity of a diffractive optical network to approximate a given permutation operation is proportional to the number of diffractive layers and trainable transmission elements in the system. Such deeper diffractive network designs can pose practical challenges in terms of the physical alignment and output diffraction efficiency of the system. We address these challenges by designing misalignment-tolerant diffractive designs that can all-optically perform arbitrarily selected permutation operations, and experimentally demonstrate, for the first time, a diffractive permutation network operating in the THz part of the spectrum. Diffractive permutation networks may find various applications in e.g., security, image encryption, and data processing, as well as telecommunications; especially as the carrier frequencies of wireless communications approach the THz band, the presented diffractive permutation networks can potentially serve as channel routing and interconnect panels in wireless networks.
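As a numerical aside (separate from the optical implementation described above), the target of such a network is a permutation matrix acting on the input field-of-view; a minimal sketch of the operation the diffractive layers are trained to approximate, with an arbitrary example permutation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
perm = rng.permutation(n)        # arbitrarily selected permutation
P = np.eye(n)[perm]              # permutation matrix: output[i] = input[perm[i]]

x = np.arange(n, dtype=float)    # input values across n apertures/pixels
y = P @ x                        # routed output

assert np.allclose(y, x[perm])
assert np.allclose(P.T @ P, np.eye(n))   # orthogonal: permutation routing is lossless
```

The orthogonality check reflects why permutations suit a passive optical implementation: an ideal permutation only reroutes power between interconnects without amplifying or absorbing it.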
The limited space-bandwidth product (SBP) of wavefront modulators hinders the high-resolution synthesis/projection of images over a large field-of-view (FOV). We report a deep learning-enabled diffractive display design based on a jointly-trained pair of an electronic encoder and a diffractive optical decoder to synthesize/project super-resolved images using low-resolution wavefront modulators. The digital encoder, composed of a trained convolutional neural network (CNN), rapidly pre-processes the high-resolution images of interest so that their spatial information is encoded into low-resolution (LR) modulation patterns, projected via a low-SBP wavefront modulator. The diffractive decoder processes this LR-encoded information using thin transmissive layers that are structured using deep learning to all-optically synthesize and project super-resolved images at its output FOV. Our results indicate that this diffractive image display can achieve a super-resolution factor of ~4, demonstrating a ~16-fold increase in SBP. We also experimentally validated the success of this diffractive super-resolution display using a 3D-printed diffractive decoder at the THz spectrum. This diffractive image decoder can be scaled to operate at visible wavelengths and inspire the design of large-FOV and high-resolution displays that are compact, low-power, and computationally efficient.
Here, we demonstrate how machine learning enables the prediction of comonomer reactivity ratios based on the molecular structure of monomers. We combined multi-task learning, multiple inputs, and a Graph Attention Network to build a model capable of predicting reactivity ratios from the monomers' chemical structures.
Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reasoning outside of their training context, or are they simply memorizing their training corpus at finer granularity and have learnt to better understand their context? To tease apart these possibilities, we introduce ALERT, a benchmark and suite of analyses for assessing language models' reasoning ability, comparing pre-trained and finetuned models on complex tasks that require reasoning skills to solve. ALERT provides a test bed to assess any language model on fine-grained reasoning skills; it spans over 20 datasets and covers 10 different reasoning skills. We leverage ALERT to further investigate the role of finetuning. With extensive empirical analysis we find that language models learn more reasoning skills such as textual entailment, abductive reasoning, and analogical reasoning during the finetuning stage compared to the pretraining stage. We also find that when language models are finetuned they tend to overfit to the prompt template, which hurts the robustness of the models, causing generalization problems.
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that - across 6 architectures trained with 4 algorithms on massive datasets - they exhibit little compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark CREPE which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of three test datasets. The three test sets are designed to test models trained on three of the popular training datasets: CC-12M, YFCC-15M, and LAION-400M. They contain 385K, 385K, and 373K image-text pairs and 237K, 210K, and 178K hard negative captions. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 246K hard negative captions with atomic, swapping, and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 8%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training library (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
Intelligent transportation systems (ITS) are vital to the development of sustainable and green urban living. ITS is data-driven and enabled by a profusion of sensors ranging from pneumatic tubes to smart cameras. This work explores a novel data source for traffic analysis: fiber-optic-based distributed acoustic sensing (DAS). Its primary concerns are detecting the type of a vehicle and estimating vehicle occupancy. The first is motivated by the need to track, control, and forecast traffic flow. The second targets the regulation of high-occupancy vehicle lanes to reduce emissions and congestion. These tasks are conventionally performed by inspecting vehicles or using emerging computer vision technologies. The former is not scalable or efficient, whereas the latter is intrusive to passengers' privacy. To this end, we propose a deep learning technique to analyze DAS signals to address this challenge through continuous sensing and without exposing personal information. We propose a deep learning method for processing DAS signals and achieve 92% vehicle classification accuracy and 92-97% occupancy detection accuracy based on DAS data collected under controlled conditions.