Smart Paper Notes

Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

Zian Wang , Wenzheng Chen , David Acuna , Jan Kautz , Sanja Fidler

Category: Computer Vision

2022-08-19

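We consider the challenging problem of outdoor lighting estimation, with the goal of photorealistic virtual object insertion into photographs. Existing work on outdoor lighting estimation typically simplifies the scene lighting into an environment map, which cannot capture the spatially varying lighting effects of outdoor scenes. In this work, we propose a neural approach that estimates a 5D HDR light field from a single image, together with a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism. Specifically, we design a hybrid lighting representation tailored to outdoor scenes, which contains an HDR sky dome that handles the extreme intensity of the sun and a volumetric lighting representation that models the spatially varying appearance of the surrounding scene. With the estimated lighting, our shadow-aware object insertion is fully differentiable, which enables adversarial training on the composited image to provide an additional supervisory signal for the lighting prediction. We demonstrate experimentally that our hybrid lighting representation outperforms existing outdoor lighting estimation methods. We further show the benefit of our AR object insertion in an autonomous driving application, where we obtain performance gains for a 3D object detector when it is trained on our augmented data.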

Translated by Google Translate

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

Rafid Mahmood , James Lucas , David Acuna , Daiqing Li , Jonah Philion , Jose M. Alvarez , Zhiding Yu , Sanja Fidler , Marc T. Law

Category: Computer Vision | Machine Learning

2022-07-04

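Given a small training dataset and a learning algorithm, how much more data is needed to reach a target validation or test performance? This question is critically important in applications such as autonomous driving or medical imaging, where collecting data is expensive and time-consuming. Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget. Prior work on neural scaling laws suggests that a power-law function can fit the validation performance curve and extrapolate it to larger dataset sizes. We find that this does not immediately translate to the more difficult downstream task of estimating the dataset size required to meet a target performance. In this work, we consider a broad class of computer vision tasks and systematically investigate a family of functions that generalize the power-law function to allow better estimation of data requirements. Finally, we show that incorporating a tuned correction factor and collecting data over multiple rounds significantly improves the performance of the data estimators. Using our guidelines, practitioners can accurately estimate the data requirements of machine learning systems and save both development time and data acquisition costs.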
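As a rough illustration of the power-law baseline mentioned in the abstract (not the authors' estimator), the sketch below fits error ≈ a · n^(−b) to a learning curve in log-log space and inverts the fit to get the dataset size for a target error. The function names, the synthetic curve, and the target value are all hypothetical, and the sketch omits the correction factor and multi-round collection that the abstract says are needed for reliable estimates.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def required_size(a, b, target_error):
    """Solve a * n**(-b) = target_error for the dataset size n."""
    return (a / target_error) ** (1.0 / b)


# Hypothetical learning curve: validation error measured at four dataset sizes.
# The values are synthetic and generated from a = 2.0, b = 0.5 for illustration.
sizes = np.array([1000.0, 2000.0, 4000.0, 8000.0])
errors = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errors)
n_needed = required_size(a, b, target_error=0.01)
print(f"a={a:.3f}, b={b:.3f}, estimated samples needed: {n_needed:.0f}")
```

On real curves the fit is noisy, and small errors in the exponent are amplified by the inversion. This is consistent with the abstract's observation that a good curve fit does not immediately give a good estimate of the required data.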

Scalable Neural Data Server: A Data Recommender for Transfer Learning

Tianshi Cao , Sasha Doubov , David Acuna , Sanja Fidler

Category: Machine Learning | Computer Vision

2022-06-19

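The lack of large-scale labeled data is a bottleneck to applying machine learning algorithms in practice. Transfer learning is a popular strategy for leveraging additional data to improve downstream performance, but finding the most relevant data to transfer from can be challenging. Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, was previously proposed to address this problem. NDS uses a mixture of experts trained on the data sources to estimate the similarity between each source and the downstream task. As a result, the computational cost to each user grows with the number of sources. To address these issues, we propose Scalable Neural Data Server (SNDS), a large-scale search engine that can theoretically index thousands of datasets to serve relevant ML data to end users. SNDS trains the mixture of experts on intermediary datasets during initialization, and represents both data sources and downstream tasks by their proximity to the intermediary datasets. The computational cost incurred by SNDS users therefore remains fixed as new datasets are added to the server. We validate SNDS on many real-world tasks and find that the data it recommends improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside the natural image setting.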

Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation

David Acuna , Jonah Philion , Sanja Fidler

Category: Computer Vision | Artificial Intelligence | Machine Learning

2021-11-15

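Autonomous driving relies on a huge volume of real-world data labeled to high precision. Alternative solutions seek to exploit driving simulators, which can generate large amounts of labeled data with many content variations. However, the domain gap between synthetic and real data remains, raising the following important question: what are the best ways to utilize a self-driving simulator for perception tasks? In this work, we build on recent advances in domain-adaptation theory and, from this perspective, propose ways to minimize the reality gap. We focus primarily on using labels in the synthetic domain alone. Our approach introduces both a principled way to learn neural-invariant representations and a theoretically inspired view on how to sample data from the simulator. Our method is easy to implement in practice because it is agnostic to the network architecture and the choice of simulator. We showcase our approach on multi-sensor data (cameras, LiDAR) using an open-source simulator (CARLA), and evaluate the entire framework on a real-world dataset (nuScenes). Last but not least, we show which types of variation (e.g. weather conditions, assets, map design, and color diversity) matter to perception networks trained with driving simulators, and which of them can be compensated for with our domain adaptation technique.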

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated but instead relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III , .

Category: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Posterior Collapse and Latent Variable Non-identifiability

Yixin Wang , David M. Blei , John P. Cunningham

Category: Machine Learning (Statistics) | Machine Learning

2023-01-02

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Category: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

Morphology-based non-rigid registration of coronary computed tomography and intravascular images through virtual catheter path optimization

Karim Kadry , Abhishek Karmakar , Andreas Schuh , Kersten Peterson , Michiel Schaap , David Marlevi , Charles Taylor , Elazer Edelman , Farhad Nezami

Category: Computer Vision

2022-12-30

Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid motion-induced distortions arising from distortions in the imaging catheter path. To address these issues, we here present a semi-automatic segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem in terms of finding the optimal virtual catheter path that samples the CCTA data to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of n=40 patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both longitudinal (mean mismatch: 3.3 frames) and rotational directions (mean mismatch: 28.6 degrees). By providing a differentiable framework for automatic multi-modal intravascular data fusion, our developed co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies while also providing a solid foundation for the development of machine learning-based co-registration approaches.

Controllable Mechanical-domain Energy Accumulators

Sung Y. Kim , David J. Braun

Category: Robotics

2022-12-29

Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock up to 1000 N force at any arbitrary deflection, unlock the spring in less than 10 ms with a control force less than 1 % of the maximal spring force, and provide an 80 % energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.