本文介绍了$ SE(3)$上的雷达射测方法,该方法利用了恒定的加速运动。运动先验被整合到滑动窗口优化方案中。我们使用Magnus扩展来准确整合运动,同时保持实时性能。此外,我们采用极地测量模型来更好地表示雷达检测不确定性。使用原型高分辨率雷达传感器的大型现实世界数据集评估我们的估计器。新的运动先验和测量模型明显地改善了相对于先前工作的恒定速度运动和笛卡尔测量模型,尤其是在滚动,音高和高度上。
translated by 谷歌翻译
Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric. Observing a strong correlation between the ground truth search metric and self-supervised losses, we introduce self-supervised AutoFlow to handle real-world videos without ground truth labels. Using self-supervised loss as the search metric, our self-supervised AutoFlow performs on par with AutoFlow on Sintel and KITTI where ground truth is available, and performs better on the real-world DAVIS dataset. We further explore using self-supervised AutoFlow in the (semi-)supervised setting and obtain competitive results against the state of the art.
translated by 谷歌翻译
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic segmentation it first infers CLIP features for every 3D point and later classifies them based on similarities to embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.
translated by 谷歌翻译
Artificial intelligence (AI) has enormous potential to improve Air Force pilot training by providing actionable feedback to pilot trainees on the quality of their maneuvers and enabling instructor-less flying familiarization for early-stage trainees in low-cost simulators. Historically, AI challenges consisting of data, problem descriptions, and example code have been critical to fueling AI breakthroughs. The Department of the Air Force-Massachusetts Institute of Technology AI Accelerator (DAF-MIT AI Accelerator) developed such an AI challenge using real-world Air Force flight simulator data. The Maneuver ID challenge assembled thousands of virtual reality simulator flight recordings collected by actual Air Force student pilots at Pilot Training Next (PTN). This dataset has been publicly released at Maneuver-ID.mit.edu and represents the first of its kind public release of USAF flight training data. Using this dataset, we have applied a variety of AI methods to separate "good" vs "bad" simulator data and categorize and characterize maneuvers. These data, algorithms, and software are being released as baselines of model performance for others to build upon to enable the AI ecosystem for flight simulator training.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increase, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics possibly from similar tasks is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) to identify fat droplets in liver cells (FatChecker; FC) and (2) the identification of kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.
translated by 谷歌翻译
Artificial Intelligence (AI) is one of the most transformative technologies of the 21st century. The extent and scope of future AI capabilities remain a key uncertainty, with widespread disagreement on timelines and potential impacts. As nations and technology companies race toward greater complexity and autonomy in AI systems, there are concerns over the extent of integration and oversight of opaque AI decision processes. This is especially true in the subfield of machine learning (ML), where systems learn to optimize objectives without human assistance. Objectives can be imperfectly specified or executed in an unexpected or potentially harmful way. This becomes more concerning as systems increase in power and autonomy, where an abrupt capability jump could result in unexpected shifts in power dynamics or even catastrophic failures. This study presents a hierarchical complex systems framework to model AI risk and provide a template for alternative futures analysis. Survey data were collected from domain experts in the public and private sectors to classify AI impact and likelihood. The results show increased uncertainty over the powerful AI agent scenario, confidence in multiagent environments, and increased concern over AI alignment failures and influence-seeking behavior.
translated by 谷歌翻译
Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning. Yet, they fail miserably on the recently proposed Winoground dataset, which challenges models to match paired images and English captions, with items constructed to overlap lexically but differ in meaning (e.g., "there is a mug in some grass" vs. "there is some grass in a mug"). By annotating the dataset using new fine-grained tags, we show that solving the Winoground task requires not just compositional language understanding, but a host of other abilities like commonsense reasoning or locating small, out-of-focus objects in low-resolution images. In this paper, we identify the dataset's main challenges through a suite of experiments on related tasks (probing task, image retrieval task), data augmentation, and manual inspection of the dataset. Our analysis suggests that a main challenge in visuolinguistic models may lie in fusing visual and textual representations, rather than in compositional language understanding. We release our annotation and code at https://github.com/ajd12342/why-winoground-hard .
translated by 谷歌翻译
Accurate high-altitude wind forecasting is important for air traffic control. And the large volume of data available for this task makes deep neural network-based models a possibility. However, special methods are required because the data is measured only sparsely: along the main aircraft trajectories and arranged sparsely in space, namely along the main air corridors. Several deep learning approaches have been proposed, and in this work, we show that Transformers can fit this data efficiently and are able to extrapolate coherently from a context set. We show this by an extensive comparison of Transformers to numerous existing deep learning-based baselines in the literature. Besides high-altitude wind forecasting, we compare competing models on other dynamical physical systems, namely those modelled by partial differential equations, in particular the Poisson equation and Darcy Flow equation. For these experiments, in the case where the data is arranged non-regularly in space, Transformers outperform all the other evaluated methods. We also compared them in a more standard setup where the data is arranged on a grid and show that the Transformers are competitive with state-of-the-art methods, even though it does not require regular spacing. The code and datasets of the different experiments will be made publicly available at publication time.
translated by 谷歌翻译
Neural Radiance Field (NeRF), a new novel view synthesis with implicit scene representation has taken the field of Computer Vision by storm. As a novel view synthesis and 3D reconstruction method, NeRF models find applications in robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, and more. Since the original paper by Mildenhall et al., more than 250 preprints were published, with more than 100 eventually being accepted in tier one Computer Vision Conferences. Given NeRF popularity and the current interest in this research area, we believe it necessary to compile a comprehensive survey of NeRF papers from the past two years, which we organized into both architecture, and application based taxonomies. We also provide an introduction to the theory of NeRF based novel view synthesis, and a benchmark comparison of the performance and speed of key NeRF models. By creating this survey, we hope to introduce new researchers to NeRF, provide a helpful reference for influential works in this field, as well as motivate future research directions with our discussion section.
translated by 谷歌翻译