智能论文笔记

Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets

Sai Sirisha Nadiminti , Pranav Kant Gaur , Abhilash Bhardwaj

分类：计算机视觉

2022-07-14

印度车辆板在尺寸，字体，脚本和形状方面的种类繁多。因此，自动数板识别（ANPR）解决方案的开发是具有挑战性的，因此需要一个多样化的数据集作为示例集合。但是，缺少印度情景的全面数据集，从而阻碍了在公开可用和可重现的ANPR解决方案方面的进展。许多国家已经投入了努力，为中国和面向应用程序的车牌（AOLP）数据集开发诸如中国城市停车数据集（CCPD）等全面的ANPR数据集为我们提供了努力。在这项工作中，我们发布了一个扩展的数据集，该数据集目前由1.5K图像组成，以及可扩展且可重复的程序，以增强该数据集以开发印度条件的ANPR解决方案。我们利用此数据集探索了印度场景的端到端（E2E）ANPR体系结构，该架构最初是根据CCPD数据集为中国车辆号码板识别的。当我们为数据集定制体系结构时，我们遇到了见解，我们在本文中讨论了这一点。我们报告了CCPD作者提供的模型直接可重复使用性的障碍，因为印度数字板的极端多样性以及相对于CCPD数据集的分布差异。在将印度数据集的特性与中国数据集对齐后，在LP检测中观察到了42.86％的改善。在这项工作中，我们还将E2E数板检测模型的性能与Yolov5模型进行了比较，并在可可数据集上进行了预训练，并在印度车辆图像上进行了微调。鉴于用于微调检测模块和Yolov5的数量印度车辆图像是相同的，我们得出的结论是，基于COCO数据集而不是CCPD数据集开发针对印度条件的ANPR解决方案更有效。

translated by 谷歌翻译

PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields

Zhengfei Kuang , Fujun Luan , Sai Bi , Zhixin Shu , Gordon Wetzstein , Kalyan Sunkavalli

分类：计算机视觉

2022-12-21

Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.

translated by 谷歌翻译

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

Ananya B. Sai , Vignesh Nagarajan , Tanay Dixit , Raj Dabre , Anoop Kunchukuttan , Pratyush Kumar , Mitesh M. Khapra

分类：自然语言处理

2022-12-20

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.

translated by 谷歌翻译

Learning Inter-Annual Flood Loss Risk Models From Historical Flood Insurance Claims and Extreme Rainfall Data

Joaquin Salas , Anamitra Saha , Sai Ravela

分类：机器学习 | (统计)机器学习

2022-12-15

Flooding is one of the most disastrous natural hazards, responsible for substantial economic losses. A predictive model for flood-induced financial damages is useful for many applications such as climate change adaptation planning and insurance underwriting. This research assesses the predictive capability of regressors constructed on the National Flood Insurance Program (NFIP) dataset using neural networks (Conditional Generative Adversarial Networks), decision trees (Extreme Gradient Boosting), and kernel-based regressors (Gaussian Process). The assessment highlights the most informative predictors for regression. The distribution for claims amount inference is modeled with a Burr distribution permitting the introduction of a bias correction scheme and increasing the regressor's predictive capability. Aiming to study the interaction with physical variables, we incorporate Daymet rainfall estimation to NFIP as an additional predictor. A study on the coastal counties in the eight US South-West states resulted in an $R^2=0.807$. Further analysis of 11 counties with a significant number of claims in the NFIP dataset reveals that Extreme Gradient Boosting provides the best results, that bias correction significantly improves the similarity with the reference distribution, and that the rainfall predictor strengthens the regressor performance.

translated by 谷歌翻译

Online Real-time Learning of Dynamical Systems from Noisy Streaming Data

S. Sinha , Sai P. Nandanoori , David Barajas-Solano

分类：机器学习

2022-12-10

Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of dynamical systems from noisy time-series data, which employs the Robust Koopman operator framework to mitigate the effect of measurement noise. The proposed algorithm has three main advantages: a) it allows for online real-time monitoring of a dynamical system; b) it obtains a linear representation of the underlying dynamical system, thus enabling the user to use linear systems theory for analysis and control of the system; c) it is computationally fast and less intensive than the popular Extended Dynamic Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the proposed algorithm by applying it to identify the Van der Pol oscillator, the IEEE 68 bus system, and a ring network of Van der Pol oscillators.

translated by 谷歌翻译

POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Sai Vallurupalli , Sayontan Ghosh , Katrin Erk , Niranjan Balasubramanian , Francis Ferraro

分类：自然语言处理

2022-12-05

Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowd workers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant's influence over the event culmination.

translated by 谷歌翻译

Downscaling Extreme Rainfall Using Physical-Statistical Generative Adversarial Learning

Anamitra Saha , Sai Ravela

分类：机器学习

2022-12-02

Modeling the risk of extreme weather events in a changing climate is essential for developing effective adaptation and mitigation strategies. Although the available low-resolution climate models capture different scenarios, accurate risk assessment for mitigation and adaption often demands detail that they typically cannot resolve. Here, we develop a dynamic data-driven downscaling (super-resolution) method that incorporates physics and statistics in a generative framework to learn the fine-scale spatial details of rainfall. Our method transforms coarse-resolution ($0.25^{\circ} \times 0.25^{\circ}$) climate model outputs into high-resolution ($0.01^{\circ} \times 0.01^{\circ}$) rainfall fields while efficaciously quantifying uncertainty. Results indicate that the downscaled rainfall fields closely match observed spatial fields and their risk distributions.

translated by 谷歌翻译

Deep Learning-Based Vehicle Speed Prediction for Ecological Adaptive Cruise Control in Urban and Highway Scenarios

Sai Krishna Chada , Daniel Görges , Achim Ebert , Roman Teutsch

分类：机器学习

2022-11-30

In a typical car-following scenario, target vehicle speed fluctuations act as an external disturbance to the host vehicle and in turn affect its energy consumption. To control a host vehicle in an energy-efficient manner using model predictive control (MPC), and moreover, enhance the performance of an ecological adaptive cruise control (EACC) strategy, forecasting the future velocities of a target vehicle is essential. For this purpose, a deep recurrent neural network-based vehicle speed prediction using long-short term memory (LSTM) and gated recurrent units (GRU) is studied in this work. Besides these, the physics-based constant velocity (CV) and constant acceleration (CA) models are discussed. The sequential time series data for training (e.g. speed trajectories of the target and its preceding vehicles obtained through vehicle-to-vehicle (V2V) communication, road speed limits, traffic light current and future phases collected using vehicle-to-infrastructure (V2I) communication) is gathered from both urban and highway networks created in the microscopic traffic simulator SUMO. The proposed speed prediction models are evaluated for long-term predictions (up to 10 s) of target vehicle future velocities. Moreover, the results revealed that the LSTM-based speed predictor outperformed other models in terms of achieving better prediction accuracy on unseen test datasets, and thereby showcasing better generalization ability. Furthermore, the performance of EACC-equipped host car on the predicted velocities is evaluated, and its energy-saving benefits for different prediction horizons are presented.

translated by 谷歌翻译

Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning

Sai Narayanan , Sathyanarayanan N. Aakur , Priyadharsini Ramamurthy , Arunkumar Bagavathi , Vishalini Ramnath , Akhilesh Ramachandran

分类：机器学习

2022-11-30

Next-generation sequencing technologies have enhanced the scope of Internet-of-Things (IoT) to include genomics for personalized medicine through the increased availability of an abundance of genome data collected from heterogeneous sources at a reduced cost. Given the sheer magnitude of the collected data and the significant challenges offered by the presence of highly similar genomic structure across species, there is a need for robust, scalable analysis platforms to extract actionable knowledge such as the presence of potentially zoonotic pathogens. The emergence of zoonotic diseases from novel pathogens, such as the influenza virus in 1918 and SARS-CoV-2 in 2019 that can jump species barriers and lead to pandemic underscores the need for scalable metagenome analysis. In this work, we propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone, to learn robust features from raw metagenome sequences for downstream biomedical tasks such as targeted and generalized pathogen detection. Extensive experiments on four increasingly challenging, yet realistic diagnostic settings, show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples with minimal human supervision in the form of labels. Further, we demonstrate that the learned representations can generalize to completely unrelated pathogens across diseases and species for large-scale metagenome analysis. We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning and provide a way forward for extracting and using robust vector representations from low-cost next generation sequencing to develop generalizable diagnostic tools.

translated by 谷歌翻译

Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation

Sai Shashank Kalakonda , Shubh Maheshwari , Ravi Kiran Sarvadevabhatla

分类：计算机视觉

2022-11-28

We introduce Action-GPT, a plug and play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models. Code, pretrained models and sample videos will be made available at https://actiongpt.github.io

translated by 谷歌翻译