Smart Paper Notes

Reinforcement Learning for Agile Active Target Sensing with a UAV

Harsh Goel , Laura Jarin Lipschitz , Saurav Agarwal , Sandeep Manjanna , Vijay Kumar

Categories: Robotics | Artificial Intelligence

2022-12-16

Active target sensing is the task of discovering and classifying an unknown number of targets in an environment and is critical in search-and-rescue missions. This paper develops a deep reinforcement learning approach to plan informative trajectories that increase the likelihood for an uncrewed aerial vehicle (UAV) to discover missing targets. Our approach efficiently (1) explores the environment to discover new targets, (2) exploits its current belief of the target states and incorporates inaccurate sensor models for high-fidelity classification, and (3) generates dynamically feasible trajectories for an agile UAV by employing a motion primitive library. Extensive simulations on randomly generated environments show that our approach is more efficient in discovering and classifying targets than several other baselines. A unique characteristic of our approach, in contrast to heuristic informative path planning approaches, is that it is robust to varying amounts of deviations of the prior belief from the true target distribution, thereby alleviating the challenge of designing heuristics specific to the application conditions.
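Note: the abstract above mentions exploiting a belief over target states under an inaccurate sensor model. As a minimal sketch of that general idea (not the paper's method), a single-cell Bayesian belief update might look like the following. The function name `update_belief` and the sensor rates `p_true_pos` and `p_false_pos` are illustrative assumptions.

```python
def update_belief(prior: float, detected: bool,
                  p_true_pos: float = 0.9, p_false_pos: float = 0.1) -> float:
    """Posterior probability that a target is present after one noisy observation.

    The sensor rates are assumed values for illustration, not taken from the paper.
    """
    # Likelihood of the observation under each hypothesis.
    like_present = p_true_pos if detected else 1.0 - p_true_pos
    like_absent = p_false_pos if detected else 1.0 - p_false_pos
    # Bayes' rule: normalize by the total probability of the observation.
    evidence = like_present * prior + like_absent * (1.0 - prior)
    return like_present * prior / evidence


# Fuse a short sequence of noisy detections for one cell.
belief = 0.5
for observation in [True, True, False]:
    belief = update_belief(belief, observation)
```

Repeated observations sharpen the belief, while a contradicting observation pulls it back toward uncertainty. A planner could use such a belief to decide where another look is most informative.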

Translated by Google Translate

Multi-Robot Coordination and Cooperation with Task Precedence Relationships

Walker Gosrich , Siddharth Mayya , Saaketh Narayan , Matthew Malencia , Saurav Agarwal , Vijay Kumar

Categories: Robotics

2022-09-28

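We propose a new formulation for the multi-robot task planning and allocation problem that incorporates (a) precedence relationships between tasks; (b) coordination on tasks, allowing multiple robots to achieve increased efficiency; and (c) cooperation through the formation of robot coalitions for tasks that individual robots cannot perform alone. In our formulation, a task graph specifies the tasks and the relationships between them. We define a set of reward functions over the nodes and edges of the task graph. These functions model the effect of robot coalition size on task performance and incorporate the influence of one task's performance on a dependent task. Solving this problem optimally is NP-hard. However, the task graph formulation allows us to leverage min-cost network flow approaches to obtain approximate solutions efficiently. In addition, we explore a mixed-integer programming approach, which gives optimal solutions for small instances of the problem but is computationally expensive. We also develop a greedy heuristic algorithm as a baseline. Our modeling and solution approaches yield task plans that leverage task precedence relationships and robot coordination and cooperation to achieve high mission performance, even in large missions with many agents.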
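Note: the abstract above mentions a greedy heuristic baseline but does not spell it out. The sketch below is one hypothetical greedy allocator, not the authors' algorithm. It assumes a diminishing-returns reward in coalition size (`coalition_reward`) and treats a task as eligible only once each of its predecessors has at least one robot. All names and the reward form are illustrative assumptions, and the edge rewards described in the abstract are not modeled.

```python
import math


def coalition_reward(n_robots: int, scale: float) -> float:
    """Assumed diminishing-returns reward for a coalition of n_robots on one task."""
    return scale * (1.0 - math.exp(-n_robots))


def greedy_allocate(task_scales: dict[str, float],
                    precedence: dict[str, list[str]],
                    n_robots: int) -> dict[str, int]:
    """Assign robots one at a time to the eligible task with the largest marginal gain."""
    alloc = {task: 0 for task in task_scales}
    for _ in range(n_robots):
        best_task, best_gain = None, 0.0
        for task, scale in task_scales.items():
            # Skip tasks whose predecessors have not been staffed yet.
            if any(alloc[pred] == 0 for pred in precedence.get(task, [])):
                continue
            gain = (coalition_reward(alloc[task] + 1, scale)
                    - coalition_reward(alloc[task], scale))
            if gain > best_gain:
                best_task, best_gain = task, gain
        if best_task is None:
            break  # no eligible task improves the total reward
        alloc[best_task] += 1
    return alloc


# Hypothetical two-task mission: "dig" depends on "scout".
plan = greedy_allocate({"scout": 1.0, "dig": 3.0}, {"dig": ["scout"]}, n_robots=3)
```

In this toy mission the first robot must go to "scout" to unlock "dig", after which the higher-value task absorbs the remaining robots. A greedy pass like this is cheap but carries no optimality guarantee, which is consistent with its use only as a baseline.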


ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Himanshu Kumar , Abeer Banerjee , Sumeet Saurav , Sanjay Singh

Categories: Computer Vision

2022-08-17

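Grayscale image colorization is a fascinating application of AI to information restoration. The inherently ill-posed nature of the problem makes it even more challenging, since the outputs can be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore contextual information in the absence of clear figure-ground separation. In addition, the images suffer from color bleeding and desaturated backgrounds, because a single model trained on full-image features is insufficient for learning the diverse data modes. To address these issues, we present a parallel GAN-based colorization framework. In our approach, each separately tailored GAN pipeline colorizes either the foreground (using object-level features) or the background (using full-image features). The foreground pipeline employs a Residual U-Net with self-attention as its generator, trained using the full-image features and the corresponding object-level features from the COCO dataset. The background pipeline relies on full-image features and additional training examples from the Places dataset. We design a DenseFuse-based fusion network to obtain the final colorized image through feature-based fusion. We show the shortcomings of the non-perceptual evaluation metrics commonly used to assess multi-modal problems such as image colorization, and we perform an extensive performance evaluation of our framework using multiple perceptual metrics. Our approach outperforms most existing learning-based methods and produces results comparable to the state of the art. Further, we performed a runtime analysis and obtained an average inference time of 24 ms per image.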


"A Passage to India": Pre-trained Word Embeddings for Indian Languages

Kumar Saurav , Kumar Saunack , Diptesh Kanojia , Pushpak Bhattacharyya

Categories: Natural Language Processing

2021-12-27

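Dense word vectors, or "word embeddings," which encode the semantic properties of words, have now become integral to NLP tasks such as Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR). In this paper, we use various existing approaches to create multiple word embeddings for 14 Indian languages. We place the embeddings for all of these languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Odiya, Punjabi, Sanskrit, Tamil, and Telugu) in a single repository. Relatively newer approaches that emphasize catering to context (BERT, ELMo, etc.) have shown significant improvements but require a large amount of resources to produce usable models. We release pre-trained embeddings generated using both contextual and non-contextual approaches. We also use MUSE and XLM to train cross-lingual embeddings for all of the aforementioned languages. To show the efficacy of our embeddings, we evaluate our embedding models on XPOS, UPOS, and NER tasks for all of these languages. We release a total of 436 models using 8 different approaches. We hope they will be useful for resource-constrained Indian-language NLP. The title of this paper refers to E. M. Forster's famous novel "A Passage to India," first published in 1924.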


Controllable Response Generation for Assistive Use-cases

Shachi H Kumar , Hsuan Su , Ramesh Manuvinakurike , Saurav Sahay , Lama Nachman

Categories: Natural Language Processing

2021-12-04

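Conversational agents have become an integral part of everyday life for the general population in simple task-enabling situations. However, these systems have yet to have any social impact on diverse and minority populations, for example by helping people with neurological disorders such as ALS and people with speech, language, and social communication disorders. Language model technology can play a huge role in helping these users carry out daily communication and social interactions. To enable this population, we build a dialog system that users can control with cues or keywords. We build models that suggest relevant cues in the dialog response context, which are used to control response generation and can speed up communication. We also introduce a keyword loss to constrain the model output. We show both qualitatively and quantitatively that our models can effectively induce the keyword into the model response without degrading the quality of the response. In the context of using such systems for people with degenerative disorders, we present a human evaluation of our cue or keyword predictor and the controllable dialog system, and show that our models perform significantly better than models without control. Our study shows that keyword control of end-to-end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day-to-day communication.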


e-Inu: Simulating A Quadruped Robot With Emotional Sentience

Abhiruph Chakravarty , Jatin Karthik Tripathy , Sibi Chakkaravarthy S , Aswani Kumar Cherukuri , S. Anitha , Firuz Kamalov , Annapurna Jonnalagadda

Categories: Robotics | Machine Learning

2023-01-03

Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such a robot in a domestic setting is still largely a research topic. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but it still reached an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.


NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Santhosh Kumar Ramakrishnan , Ziad Al-Halah , Kristen Grauman

Categories: Computer Vision

2023-01-02

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature make it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state of the art for NLQ, we also demonstrate unique properties of our approach, such as gains on long-tail object queries and the ability to perform zero-shot and few-shot NLQ.


Statistical Machine Translation for Indic Languages

Sudhansu Bala Das , Divyajoti Panda , Tapas Kumar Mishra , Bidyut Kr. Patra

Categories: Natural Language Processing

2023-01-02

A Machine Translation (MT) system generally aims to automatically render a source language into a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. Among these NLP methods is Statistical Machine Translation (SMT), which uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper canvasses the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefly described as relevant to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of understanding the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
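Note: the abstract above names BLEU as one of its evaluation metrics. To make the metric concrete, here is a minimal single-reference, sentence-level sketch: clipped n-gram precision up to 4-grams, a geometric mean, and a brevity penalty. It is unsmoothed and whitespace-tokenized, so it is only an illustration and not the scoring setup used in the paper. The function names are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU for one sentence pair."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


score = sentence_bleu("the cat sat on the", "the cat sat on the mat")
```

Here every candidate n-gram appears in the reference, so the score is determined entirely by the brevity penalty. Reported BLEU figures are normally computed at corpus level with a standard tool, so values from this sketch are not comparable to published numbers.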


Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson , William Qi , Tanmay Agarwal , John Lambert , Jagjeet Singh , Siddhesh Khandelwal , Bowen Pan , Ratnesh Kumar , Andrew Hartnett , Jhony Kaesemodel Pontes

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2023-01-02

We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.


Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Categories: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
