Smart Paper Notes

Beyond Low Earth Orbit: Biomonitoring, Artificial Intelligence, and Precision Space Health

Ryan T. Scott , Erik L. Antonsen , Lauren M. Sanders , Jaden J. A. Hastings , Seung-min Park , Graham Mackintosh , Robert J. Reynolds , Adrienne L. Hoarfrost , Aenor Sawyer , Casey S. Greene

Category: Machine Learning

2021-12-22

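Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate the myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth independence rather than Earth reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address these needs. We propose an appropriately autonomous and intelligent precision space health system that will monitor, aggregate, and assess biomedical statuses; analyze and predict personalized adverse health outcomes; adapt and respond to newly accumulated data; and provide preventive, actionable, and timely insights to individual deep space crew members and iterative decision support to their crew medical officer. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration (NASA) on future applications of artificial intelligence in space biology and health. In the next decade, biomonitoring technology, biomarker science, spacecraft hardware, intelligent software, and streamlined data management must mature and be woven together into a precision space health system to enable humanity to thrive in deep space.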

translated by Google Translate

Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs

Lauren M. Sanders , Jason H. Yang , Ryan T. Scott , Amina Ann Qutub , Hector Garcia Martin , Daniel C. Berrios , Jaden J. A. Hastings , Jon Rask , Graham Mackintosh , Adrienne L. Hoarfrost

Category: Machine Learning

2021-12-22

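Space biology research aims to understand the fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabilize the ecosystem of plants, crops, microbes, animals, and humans for sustained multi-planetary life. To advance these aims, the field leverages experiments, platforms, data, and model organisms from both spaceborne and ground-analog studies. As research is extended beyond low Earth orbit, experiments and platforms must be maximally autonomous, light, agile, and intelligent to expedite knowledge discovery. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration (NASA) on artificial intelligence, machine learning, and modeling applications that offer key solutions to these space biology challenges. In the next decade, the integration of artificial intelligence into the field of space biology will deepen the biological understanding of spaceflight effects, facilitate predictive modeling and analytics, support maximally autonomous and reproducible experiments, and efficiently manage spaceborne data and metadata, all with the goal of enabling life to thrive in deep space.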

Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery

Odhran O'Donoghue , Paul Duckworth , Giuseppe Ughi , Linus Scheibenreif , Kia Khezeli , Adrienne Hoarfrost , Samuel Budd , Patrick Foley , Nicholas Chia , John Kalantari

Category: Machine Learning

2021-11-14

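Human medical data can be challenging to obtain due to data privacy concerns, difficulties in conducting certain types of experiments, or prohibitive associated costs. In many settings, data from animal models or in-vitro cell lines are available to help augment our understanding of human data. However, such data are known to have low etiological validity compared to human data. In this work, we augment small human medical datasets with in-vitro data and animal models. We use Invariant Risk Minimisation (IRM) to elucidate invariant features by treating cross-organism data as belonging to different data-generating environments. Our models identify genes relevant to human cancer development. We observe a degree of consistency across varying amounts of human and mouse data used, but further work is required to obtain conclusive insights. As a secondary contribution, we enhance existing open-source datasets and provide two uniformly processed, cross-organism, homologue gene-matched datasets.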

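To make "treating cross-organism data as different environments" concrete, here is a minimal sketch of one common IRM formulation, the IRMv1 penalty of Arjovsky et al. (2019), for a linear predictor with squared loss. It is not the authors' implementation; the function names, the toy human/mouse environments, and the choice of squared loss are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# An environment is (feature rows, targets); e.g. one for human, one for mouse data.
Env = Tuple[List[List[float]], List[float]]


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor Phi(x) = beta . x (hypothetical model choice)."""
    return sum(b * xi for b, xi in zip(beta, x))


def risk_and_penalty(beta: Sequence[float], env: Env) -> Tuple[float, float]:
    """Squared-error risk R_e and IRMv1 penalty ||d/dw R_e(w * Phi)|_{w=1}||^2."""
    xs, ys = env
    n = len(ys)
    preds = [predict(beta, x) for x in xs]
    risk = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # d/dw mean((w*p - y)^2) evaluated at w = 1 equals mean(2 * (p - y) * p).
    grad_w = sum(2 * (p - y) * p for p, y in zip(preds, ys)) / n
    return risk, grad_w ** 2


def irm_objective(beta: Sequence[float], envs: Sequence[Env], lam: float) -> float:
    """Sum over environments of risk + lam * penalty (the IRMv1 objective)."""
    total = 0.0
    for env in envs:
        risk, penalty = risk_and_penalty(beta, env)
        total += risk + lam * penalty
    return total


# Toy data: feature 0 predicts y in both organisms; feature 1 flips sign across them.
human: Env = ([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [1.0, 2.0, 3.0])
mouse: Env = ([[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]], [1.0, 2.0, 3.0])
print(irm_objective([1.0, 0.0], [human, mouse], lam=10.0))  # -> 0.0
print(irm_objective([0.0, 1.0], [human, mouse], lam=10.0))  # large: spurious feature
```

Relying on feature 1 fits the human environment perfectly but fails on mouse data, so both its risk and its penalty are large; the invariant feature 0 scores zero in every environment. This is the mechanism by which IRM prefers features that behave the same across organisms.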

Large Language Models as Corporate Lobbyists

John J. Nay

Category: Natural Language Processing

2023-01-03

We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. However, we test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.

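The benchmark compares the model against "predicting the most common outcome of irrelevance". A minimal sketch of that comparison follows; the labels are hypothetical toy values, not the paper's data, and the function names are illustrative.

```python
from typing import List


def accuracy(preds: List[bool], labels: List[bool]) -> float:
    """Fraction of bills whose predicted relevance matches the ground-truth label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def majority_baseline(labels: List[bool]) -> List[bool]:
    """Predict the most common label for every bill (here: 'irrelevant' = False)."""
    majority = max(set(labels), key=labels.count)
    return [majority] * len(labels)


# Hypothetical toy labels: True = bill is relevant to the company.
labels = [False, False, False, True]
model_preds = [False, False, True, True]
print(accuracy(majority_baseline(labels), labels))  # -> 0.75
print(accuracy(model_preds, labels))  # -> 0.75
```

Because most bills are irrelevant to any given company, the always-irrelevant baseline already scores well; in this toy case a model that finds the one relevant bill but adds a false positive only ties it. This is why text-davinci-002 falling below the baseline is a meaningful negative result.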

Posterior Collapse and Latent Variable Non-identifiability

Yixin Wang , David M. Blei , John P. Cunningham

Category: Machine Learning (Statistics) | Machine Learning

2023-01-02

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.

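The claim that posterior collapse can occur "in classical probabilistic models even with exact inference" can be seen in a toy linear-Gaussian model, z ~ N(0, 1), x | z ~ N(theta * z, sigma^2). This example is our own illustration, not the paper's experiment: when theta = 0 the likelihood does not depend on z (the latent is non-identifiable), and the exact posterior equals the prior.

```python
import math
from typing import Tuple


def exact_posterior(x: float, theta: float, sigma: float) -> Tuple[float, float]:
    """Exact posterior p(z | x) for z ~ N(0, 1), x | z ~ N(theta * z, sigma^2)."""
    precision = 1.0 + theta ** 2 / sigma ** 2
    var = 1.0 / precision
    mean = var * theta * x / sigma ** 2
    return mean, var


def kl_to_prior(mean: float, var: float) -> float:
    """KL( N(mean, var) || N(0, 1) ); zero means the posterior equals the prior."""
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))


# theta = 0: p(x | z) no longer depends on z, so z is non-identifiable.
print(kl_to_prior(*exact_posterior(x=2.0, theta=0.0, sigma=1.0)))  # -> 0.0
# theta = 1: the likelihood depends on z and the posterior moves away from the prior.
print(kl_to_prior(*exact_posterior(x=2.0, theta=1.0, sigma=1.0)))
```

No neural network or variational approximation is involved, which matches the abstract's point that collapse is a property of the generative model rather than of the inference method.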

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson , William Qi , Tanmay Agarwal , John Lambert , Jagjeet Singh , Siddhesh Khandelwal , Bowen Pan , Ratnesh Kumar , Andrew Hartnett , Jhony Kaesemodel Pontes

Category: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2023-01-02

We introduce Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest collection of lidar sensor data to date and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.

PAC-Bayesian-Like Error Bound for a Class of Linear Time-Invariant Stochastic State-Space Models

Deividas Eringis , John Leth , Zheng-Hua Tan , Rafal Wisniewski , Mihaly Petreczky

Category: Machine Learning (Statistics) | Machine Learning

2022-12-30

In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, such systems represent a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.

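The abstract does not state the model equations. A common form of a stochastic LTI state-space model with inputs is x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + D u_t + v_t with Gaussian noise; the sketch below simulates that form under this assumption (the paper may use a different parameterization, such as an innovation form). Read as a recurrence on x_t, it is a recurrent network with linear activation, which is the connection the abstract mentions.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_lti(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
                 inputs: Sequence[Sequence[float]], x0: Sequence[float],
                 noise_std: float = 0.0, seed: int = 0) -> List[Vector]:
    """Simulate x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + D u_t + v_t.

    w_t and v_t are i.i.d. Gaussian with standard deviation noise_std (assumed).
    """
    rng = random.Random(seed)
    x = list(x0)
    outputs = []
    for u in inputs:
        y = [c + d + rng.gauss(0.0, noise_std) for c, d in zip(matvec(C, x), matvec(D, u))]
        outputs.append(y)
        x = [a + b + rng.gauss(0.0, noise_std) for a, b in zip(matvec(A, x), matvec(B, u))]
    return outputs


# Scalar example: a unit impulse decays through the state with factor 0.5.
ys = simulate_lti([[0.5]], [[1.0]], [[2.0]], [[0.0]],
                  inputs=[[1.0], [0.0], [0.0]], x0=[0.0])
print(ys)  # -> [[0.0], [2.0], [1.0]]
```

A learning problem of the kind the paper formalizes would fit (A, B, C, D) from observed input-output sequences generated this way; the error bound concerns how well such a fitted model generalizes.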

Synthetic Aperture Sensing for Occlusion Removal with Drone Swarms

Rakesh John Amala Arokia Nathan , Indrajit Kurmi , Oliver Bimber

Category: Robotics | Computer Vision

2022-12-30

We demonstrate how efficient autonomous drone swarms can be in detecting and tracking occluded targets in densely forested areas, such as lost people during search and rescue missions. Exploration and optimization of local viewing conditions, such as occlusion density and target view obliqueness, provide much faster and much more reliable results than previous blind sampling strategies based on predefined waypoints. An adapted real-time particle swarm optimization and a new objective function are presented that are able to deal with dynamic and highly random through-foliage conditions. Synthetic aperture sensing is our fundamental sampling principle, and drone swarms are employed to approximate the optical signals of extremely wide and adaptable airborne lenses.

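For readers unfamiliar with particle swarm optimization, the sketch below is a standard global-best PSO, not the paper's adapted real-time variant. The paper's objective (built from occlusion density and view obliqueness) is not reproduced; a simple sphere function stands in as a hypothetical objective, and the inertia and acceleration constants are common textbook-style choices, assumed here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[Sequence[float]], float], dim: int,
        bounds: Tuple[float, float], n_particles: int = 30, iters: int = 200,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO minimizing `objective` inside a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in objective: distance-squared from the origin.
best, best_val = pso(lambda p: sum(x * x for x in p), dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

In the drone setting, each particle would correspond to a drone's sampling position and the objective would be evaluated from live imagery, which is why the paper needs a real-time variant that tolerates a changing objective.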

MAUVE Scores for Generative Models: Theory and Practice

Krishna Pillutla , Lang Liu , John Thickstun , Sean Welleck , Swabha Swayamdipta , Rowan Zellers , Sewoong Oh , Yejin Choi , Zaid Harchaoui

Category: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-12-30

Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating their applications to other AI domains and discussing practical recommendations.

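As a sketch of the vector-quantization estimate with the KL divergence: once samples from the model (Q) and from human text (P) are quantized into the same k clusters, each side becomes a histogram. The divergence frontier traces the mixtures R_lambda = lambda P + (1 - lambda) Q through points (exp(-c KL(Q || R_lambda)), exp(-c KL(P || R_lambda))), and the score is the area under that curve. The scaling constant c = 5, the lambda grid, and the function names are assumptions of this sketch, which skips the embedding and clustering steps and is not the reference implementation.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], r: Sequence[float]) -> float:
    """KL(p || r) for histograms; bins with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)


def mauve_from_histograms(p: Sequence[float], q: Sequence[float],
                          c: float = 5.0, n_grid: int = 100) -> float:
    """Area under the divergence curve between histograms p and q."""
    points = [(1.0, 0.0)]  # endpoint closing the curve on the x-axis
    for i in range(1, n_grid + 1):
        lam = i / (n_grid + 1)  # strictly inside (0, 1), so every KL is finite
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    points.append((0.0, 1.0))  # endpoint closing the curve on the y-axis
    # x is non-increasing and y non-decreasing in lambda: integrate y dx by trapezoids.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2.0
    return area


print(mauve_from_histograms([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # identical: about 1
print(mauve_from_histograms([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # partial overlap
print(mauve_from_histograms([1.0, 0.0], [0.0, 1.0]))            # disjoint: near 0
```

The two coordinates capture the two error types: mass the model places where human text has none, and human text the model fails to cover. Identical histograms give the full unit area.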

From Single-Visit to Multi-Visit Image-Based Models: Single-Visit Models are Enough to Predict Obstructive Hydronephrosis

Stanley Bryan Z. Hua , Mandy Rickard , John Weaver , Alice Xiang , Daniel Alvarez , Kyla N. Velear , Kunj Sheth , Gregory E. Tasian , Armando J. Lorenzo , Anna Goldenberg

Category: Computer Vision | Artificial Intelligence

2022-12-27

Previous work has shown the potential of deep learning to predict renal obstruction using kidney ultrasound images. However, these image-based classifiers have been trained with the goal of single-visit inference in mind. We compare methods from video action recognition (i.e. convolutional pooling, LSTM, TSM) to adapt single-visit convolutional models to handle multiple visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for the prediction of obstructive hydronephrosis. Therefore, inclusion of prior ultrasounds is beneficial, but prediction based on the latest ultrasound is sufficient for patient risk stratification.

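One way to adapt a single-visit model to multiple visits, in the spirit of the convolutional pooling baseline from video action recognition, is to extract a feature vector per visit and max-pool across visits before the classifier. The sketch below assumes per-visit features are already computed by a single-visit CNN; the logistic head, weights, and function names are hypothetical, and the paper's exact architectures (including LSTM and TSM variants) are not reproduced.

```python
import math
from typing import List, Sequence


def max_pool_visits(visit_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over visits: one pooled feature vector per patient."""
    return [max(vals) for vals in zip(*visit_features)]


def predict_obstruction(visit_features: Sequence[Sequence[float]],
                        weights: Sequence[float], bias: float) -> float:
    """Logistic head on pooled features (hypothetical weights)."""
    pooled = max_pool_visits(visit_features)
    logit = sum(wt * f for wt, f in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two visits, two features each; pooling keeps the strongest response per feature.
print(max_pool_visits([[0.1, 0.9], [0.4, 0.2]]))  # -> [0.4, 0.9]
```

With a single visit, pooling is the identity, so the multi-visit model reduces to the single-visit one; the paper's finding is that adding earlier visits on top of this yields only a small gain.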