Smart Paper Notes

Transformers for prompt-level EMA non-response prediction

Supriya Nagesh, Alexander Moreno, Stephanie M. Carpenter, Jamie Yap, Soujanya Chatterjee, Steven Lloyd Lizotte, Neng Wan, Santosh Kumar, Cho Lam, David W. Wetter

Category: Machine Learning

2021-11-01

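Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be used to improve EMA delivery and to develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: (1) input representation, (2) encoding temporal information, and (3) the utility of pre-training for improving downstream prediction performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make a predictive model trained on a corpus of 40K EMA samples freely available to the research community, in order to facilitate the development of future transformer-based EMA analysis work.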
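
For readers unfamiliar with the AUC metric quoted in this abstract, the following is a minimal sketch of how it can be computed: AUC is the probability that a randomly chosen positive example (here, a non-response) receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `auc` and the toy labels and scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. non-response), 0 otherwise.
    scores: model scores, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two non-responses and two responses.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable on large datasets.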

translated by Google Translate

AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Aowabin Rahman, Arnab Bhattacharya, Thiagarajan Ramachandran, Sayak Mukherjee, Himanshu Sharma, Ted Fujimoto, Samrat Chatterjee

Category: Robotics | Machine Learning

2022-12-20

Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.

Observability-aware online multi-lidar extrinsic calibration

Sandipan Das, Ludvig af Klinteberg, Maurice Fallon, Saikat Chatterjee

Category: Robotics

2022-12-19

Accurate and robust extrinsic calibration is necessary for deploying autonomous systems which need multiple sensors for perception. In this paper, we present a robust system for real-time extrinsic calibration of multiple lidars in the vehicle base frame without the need for any fiducial markers or features. We base our approach on matching absolute GNSS poses and estimated lidar poses in real time. Comparing rotation components makes the solution more robust than the traditional least-squares approach, which compares translation components only. Additionally, instead of comparing all corresponding poses, we select poses comprising maximum mutual information based on our novel observability criteria. This allows us to identify a subset of the poses helpful for real-time calibration. We also provide stopping criteria for ensuring calibration completion. To validate our approach, extensive tests were carried out on data collected using Scania test vehicles (7 sequences for a total of ~6.5 km). The results presented in this paper show that our approach is able to accurately determine the extrinsic calibration for various combinations of sensor setups.

Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks

Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas A. Henzinger, Daniela Rus

Category: Machine Learning

2022-11-29

We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and that certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establishes a new state of the art for training and certifying the robustness of QNNs.
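
As background for the abstract above, here is a minimal sketch of plain interval bound propagation (IBP) for a floating-point linear layer followed by ReLU. It is not the quantization-aware QA-IBP method the paper proposes; it only illustrates the idea of pushing an input box through a network. The helper names, weights, input, and perturbation radius are all hypothetical.

```python
def linear_bounds(W, b, lower, upper):
    """Propagate the box [lower, upper] through y = W x + b.

    For each output, a positive weight takes the matching input bound and
    a negative weight takes the opposite one.
    """
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypothetical 2x2 layer and an L-infinity ball of radius eps around x.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [1.0, 1.0]
eps = 0.1
in_lo = [v - eps for v in x]
in_hi = [v + eps for v in x]

pre_lo, pre_hi = linear_bounds(W, b, in_lo, in_hi)
post_lo, post_hi = relu_bounds(pre_lo, pre_hi)
```

Every input inside the box is guaranteed to produce activations inside `[post_lo, post_hi]`, which is what makes such bounds usable for robustness certification; the bounds are sound but can be loose.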

Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources

Xinyan Velocity Yu, Akari Asai, Trina Chatterjee, Junjie Hu, Eunsol Choi

Category: Natural Language Processing | Artificial Intelligence

2022-11-28

While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.

A survey of some recent developments in measures of association

Sourav Chatterjee

Category: Machine Learning | Machine Learning (Statistics)

2022-11-09

This paper surveys some recent developments in measures of association related to a new coefficient of correlation introduced by the author. A straightforward extension of this coefficient to standard Borel spaces (which includes all Polish spaces), overlooked in the literature so far, is proposed at the end of the survey.
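
The coefficient referred to in this abstract is commonly known as Chatterjee's xi. As a minimal sketch, assuming no ties in either variable, it can be computed as xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1), where the pairs are sorted by x and r_i is the rank of the i-th y-value. The function name `xi_correlation` and the toy data are illustrative assumptions; the tie-handling variant and the Borel-space extension discussed in the survey are not covered here.

```python
def xi_correlation(x, y):
    """Chatterjee's rank correlation xi_n(X, Y), assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y-values so that the corresponding x-values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # Rank of each y-value among all y-values (1 = smallest).
    rank_of = {v: k + 1 for k, v in enumerate(sorted(y_sorted))}
    r = [rank_of[v] for v in y_sorted]
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


# Hypothetical toy data: y is a noiseless but non-monotone function of x.
xs = list(range(10))
ys = [(v - 4.3) ** 2 for v in xs]
xi_value = xi_correlation(xs, ys)
```

Unlike Pearson or Spearman correlation, the coefficient is not symmetric in x and y: it measures how well y can be described as a function of x, monotone or not.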

Centaur: Federated Learning for Constrained Edge Devices

Fan Mo, Mohammad Malekzadeh, Soumyajit Chatterjee, Fahim Kawsar, Akhil Mathur

Category: Machine Learning

2022-11-08

Federated learning (FL) on deep neural networks facilitates new applications at the edge, especially for wearable and Internet-of-Things devices. Such devices capture a large and diverse amount of data, but they have memory, compute, power, and connectivity constraints which hinder their participation in FL. We propose Centaur, a multitier FL framework that enables ultra-constrained devices to efficiently participate in FL on large neural nets. Centaur combines two major ideas: (i) a data selection scheme to choose a portion of samples that accelerates the learning, and (ii) a partition-based training algorithm that integrates both constrained and powerful devices owned by the same user. Evaluations on four benchmark neural nets and three datasets show that Centaur gains ~10% higher accuracy than local training on constrained devices, with ~58% energy saving on average. Our experimental results also demonstrate the superior efficiency of Centaur when dealing with imbalanced data, client participation heterogeneity, and various network connection probabilities.

Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees

Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, Krishnendu Chatterjee

Category: Machine Learning | Artificial Intelligence

2022-10-11

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.

WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs

Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Navonil Majumder, Amir Hussain, Lotfollah Najjar, Newton Howard, Soujanya Poria, Alexander Gelbukh

Category: Natural Language Processing

2022-09-27

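As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many natural language processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset for generating short descriptions of Wikipedia articles, framed as a text summarization problem. The dataset consists of 80K English samples on 6987 topics. We set up a two-phase summarization method, description generation (Phase I) and candidate ranking (Phase II), as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART show their superiority compared to other small-scale pre-trained models. By applying contrastive learning with the diverse input from beam search, the metric-based ranking models outperform the direct description generation models by up to 22 ROUGE in both the topic-exclusive split and the topic-independent split. Furthermore, the outcome descriptions in Phase II are supported by human evaluation, being chosen in over 45.33% of cases compared to 23.66% for Phase I, against the gold descriptions. In terms of sentiment analysis, the generated descriptions cannot effectively capture all sentiment polarities from the paragraphs, whereas the gold descriptions do this task better. The automatic generation of new descriptions reduces the human effort in creating them and enriches Wikidata-based knowledge graphs. Our paper has a practical impact on Wikipedia and Wikidata, since there are thousands of missing descriptions. Finally, we expect WikiDes to be a useful dataset for related work on capturing salient information from short paragraphs. The curated dataset is publicly available at: https://github.com/declare-lab/wikides.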

Multi-segmented Adaptive Feet for Versatile Legged Locomotion in Natural Terrain

Abhishek Chatterjee, An Mo, Bernadett Kiss, Emre Cemal Gonen, Alexander Badri-Spröwitz

Category: Robotics

2022-09-18

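Most legged robots are built with leg structures made of serially mounted links and actuators, and are controlled through complex controllers and sensor feedback. In comparison, animals developed multi-segment legs, mechanical coupling between joints, and multi-segmented feet. They run with agility over all terrains, arguably with simpler locomotion control. Here we focus on developing foot mechanisms that resist slipping and sinking, also in natural terrain. We present first results of multi-segment feet mounted on a bird-inspired robot leg with multi-joint mechanical tendon coupling. Our one- and two-segment, mechanically adaptive feet show increased viable horizontal forces on multiple soft and hard substrates before starting to slip. We also observe that segmented feet reduce sinking on soft substrates compared to ball-feet and cylinder-feet. We report how multi-segmented feet provide a large range of viable centre-of-pressure points well suited for bipedal robots, but also for quadruped robots on slopes and natural terrain. Our results also offer a functional understanding of segmented feet in animals such as ratite birds.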
