智能论文笔记

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

分类：自然语言处理

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

translated by 谷歌翻译

Deep Convolutional Architectures for Extrapolative Forecast in Time-dependent Flow Problems

Pratyush Bhatt , Yash Kumar , Azzeddine Soulaimani

分类：机器学习

2022-09-18

动力学受部分微分方程（PDE）控制的物理系统在许多领域（从工程设计到天气预报）中找到了应用。从此类PDE中获取解决方案的过程对于大规模和参数化问题的计算昂贵。在这项工作中，使用LSTM和TCN等时间表预测开发的深度学习技术，或用于为CNN等空间功能提取而开发的，用于建模系统动力学，以占主导问题。这些模型将输入作为从PDE获得的连续时间步长的一系列高保真矢量解，并预测使用自动回归的后续时间步长的解决方案；从而减少获得此类高保真解决方案所需的计算时间和功率。这些模型经过数值基准测试（1D汉堡的方程式和Stoker的大坝断裂问题），以评估长期预测准确性，甚至在训练域之外（外推）。在向预测模型输入之前，使用非侵入性的降低订购建模技术（例如深度自动编码网络）来压缩高保真快照，以减少在线和离线阶段的复杂性和所需的计算。深层合奏被用来对预测模型进行不确定性量化，该模型提供了有关认知不确定性导致预测方差的信息。

translated by 谷歌翻译

JARVix at SemEval-2022 Task 2: It Takes One to Know One? Idiomaticity Detection using Zero and One Shot Learning

Yash Jakhotiya , Vaibhav Kumar , Ashwin Pathak , Raj Shah

分类：自然语言处理 | 人工智能 | 机器学习

2022-02-04

通过捕获文本表示的组成性，大型语言模型在各种自然语言处理任务中取得了成功。尽管它们取得了巨大的成功，但这些向量表示未能捕获惯用多字表达式（MWES）的含义。在本文中，我们专注于使用二进制分类检测惯用表达式。我们使用一个数据集，该数据集包括英语和葡萄牙语中MWE的字面用法和惯用性。此后，我们在两个不同的设置中执行分类：零射门和一个镜头，以确定给定的句子是否包含成语。 n个任务的n射击分类是由训练和测试集之间的n个常见成语数定义的。在本文中，我们在设置中训练多个大型语言模型，并在零射击设置中获得0.73的F1分数（宏），一个射击设置为0.85的F1分数（宏）。可以在https://github.com/ashwinpathak20/idiomation_detection_using_using_few_shot_learning上找到我们工作的实现。

translated by 谷歌翻译

Multi-Instance Training for Question Answering Across Table and Linked Text

Vishwajeet Kumar , Saneem Chemmengath , Yash Gupta , Jaydeep Sen , Samarth Bharadwaj , Soumen Chakrabarti

分类：自然语言处理 | 人工智能

2021-12-14

使用来自表格（TableQA）的信息回答自然语言问题是最近的兴趣。在许多应用程序中，表未孤立，但嵌入到非结构化文本中。通常，通过将其部分与表格单元格内容或非结构化文本跨度匹配，并从任一源中提取答案来最佳地回答问题。这导致了HybridQA数据集引入的TextableQA问题的新空间。现有的表格表示对基于变换器的阅读理解（RC）架构的适应性未通过单个系统解决两个表示的不同模式。培训此类系统因对遥远监督的需求而进一步挑战。为了降低认知负担，培训实例通常包括问题和答案，后者匹配多个表行和文本段。这导致嘈杂的多实例培训制度不仅涉及表的行，而且涵盖了链接文本的跨度。我们通过提出Mitqa来回应这些挑战，这是一个新的TextableQA系统，明确地模拟了表行选择和文本跨度选择的不同但密切相关的概率空间。与最近的基线相比，我们的实验表明了我们的方法的优越性。该方法目前在HybridQA排行榜的顶部，并进行了一个试验集，在以前公布的结果上实现了对em和f1的21％的绝对改善。

translated by 谷歌翻译

e-Inu: Simulating A Quadruped Robot With Emotional Sentience

Abhiruph Chakravarty , Jatin Karthik Tripathy , Sibi Chakkaravarthy S , Aswani Kumar Cherukuri , S. Anitha , Firuz Kamalov , Annapurna Jonnalagadda

分类：机器人 | 机器学习

2023-01-03

Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.

translated by 谷歌翻译

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Santhosh Kumar Ramakrishnan , Ziad Al-Halah , Kristen Grauman

分类：计算机视觉

2023-01-02

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.

translated by 谷歌翻译

Statistical Machine Translation for Indic Languages

Sudhansu Bala Das , Divyajoti Panda , Tapas Kumar Mishra , Bidyut Kr. Patra

分类：自然语言处理

2023-01-02

Machine Translation (MT) system generally aims at automatic representation of source language into target language retaining the originality of context using various Natural Language Processing (NLP) techniques. Among various NLP methods, Statistical Machine Translation(SMT). SMT uses probabilistic and statistical techniques to analyze information and conversion. This paper canvasses about the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of Samanantar and OPUS dataset for model building, along with standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. To create the system, MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim to understand the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES

translated by 谷歌翻译

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson , William Qi , Tanmay Agarwal , John Lambert , Jagjeet Singh , Siddhesh Khandelwal , Bowen Pan , Ratnesh Kumar , Andrew Hartnett , Jhony Kaesemodel Pontes

分类：计算机视觉 | 人工智能 | 机器学习 | 机器人

2023-01-02

We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.

translated by 谷歌翻译

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

分类：计算机视觉 | 机器学习

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

translated by 谷歌翻译

Linear programming word problems formulation using EnsembleCRF NER labeler and T5 text generator with data augmentations

JiangLong He , Mamatha N , Shiv Vignesh , Deepak Kumar , Akshay Uppal

分类：自然语言处理 | 人工智能

2022-12-30

We propose an ensemble approach to predict the labels in linear programming word problems. The entity identification and the meaning representation are two types of tasks to be solved in the NL4Opt competition. We propose the ensembleCRF method to identify the named entities for the first task. We found that single models didn't improve for the given task in our analysis. A set of prediction models predict the entities. The generated results are combined to form a consensus result in the ensembleCRF method. We present an ensemble text generator to produce the representation sentences for the second task. We thought of dividing the problem into multiple small tasks due to the overflow in the output. A single model generates different representations based on the prompt. All the generated text is combined to form an ensemble and produce a mathematical meaning of a linear programming problem.

translated by 谷歌翻译