Smart Paper Notes

Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation

Siddharth Agrawal, Keyur D. Joshi

Category: Computer Vision

2022-11-23

Detection and recognition of license plates is important when automating weighbridge services. While many large databases are available for Latin and Chinese alphanumeric license plates, data for Indian license plates is inadequate. In particular, databases of Indian commercial truck license plates are inadequate, even though commercial vehicle license plate recognition plays a major role in logistics management and weighbridge automation. Moreover, existing license plate recognition models do not generalise well to such data because of its challenging nature and the high frequency of handwritten license plates, which leads to diverse font styles. Thus, a database and effective models to detect and recognise such license plates are crucial. This paper provides a database of commercial truck license plates and uses state-of-the-art models for real-time object detection (You Only Look Once version 7, YOLOv7) and scene text recognition (Permuted Autoregressive Sequence models, PARSeq). Our method outperforms the other cited references, where the maximum accuracy obtained was less than 90%, achieving 95.82% accuracy on the presented challenging license plate dataset. Index Terms: Automatic License Plate Recognition, character recognition, license plate detection, vision transformer.

Merged-GHCIDR: Geometrical Approach to Reduce Image Data

Devvrat Joshi, Janvi Thakkar, Siddharth Soni, Shril Mody, Rohan Patil, Nipun Batra

Category: Machine Learning

2022-09-06

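The computational resources required to train a model have been increasing since the inception of deep networks. Training neural networks on massive datasets has become a challenging and time-consuming task, so there is a need to reduce the dataset without compromising accuracy. In this paper, we present novel variations of an earlier approach, reduction through homogeneous clustering, for reducing dataset size. The proposed methods are based on the idea of partitioning the dataset into homogeneous clusters and selecting the images that contribute significantly to accuracy. We propose two variations on the baseline algorithm, Reduction through Homogeneous Clustering (RHC), to achieve better accuracy and training time: Geometrical Homogeneous Clustering for Image Data Reduction (GHCIDR) and Merged-GHCIDR. The intuition behind GHCIDR is to select data points using cluster weights and the geometrical distribution of the training set. Merged-GHCIDR merges clusters that have the same label using complete linkage clustering. We used three deep learning models: Fully Connected Networks (FCN), VGG1, and VGG16. We experimented with the two variants on four datasets: MNIST, CIFAR10, Fashion-MNIST, and Tiny-Imagenet. With the same percentage reduction as RHC, Merged-GHCIDR showed an accuracy increase of 2.8%, 8.9%, 7.6%, and 3.5% on MNIST, Fashion-MNIST, CIFAR10, and Tiny-Imagenet, respectively.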

Geometrical Homogeneous Clustering for Image Data Reduction

Shril Mody, Janvi Thakkar, Devvrat Joshi, Siddharth Soni, Rohan Patil, Nipun Batra

Category: Machine Learning

2022-08-27

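In this paper, we present novel variations of an earlier approach, the homogeneous clustering algorithm, for reducing dataset size. The intuition behind the proposed approaches is to partition the dataset into homogeneous clusters and select a few images that contribute significantly to accuracy. The selected images are a proper subset of the training data and are therefore human-readable. We propose four variations on the baseline algorithm, RHC. The intuition behind the first approach is that boundary points contribute significantly to the representation of clusters; it selects the k farthest neighbours and the one nearest neighbour of each cluster centroid. In the next two approaches (KONCW and CWKC), we introduce the concept of cluster weights, based on the observation that larger clusters contribute more than smaller ones. The final variation is GHCIDR, which selects points based on the geometrical aspects of the data distribution. We ran experiments on two deep learning models, Fully Connected Networks (FCN) and VGG1, and tested the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66%, with training data reductions of 87.27%, 32.34%, and 76.80%, on MNIST, CIFAR10, and Fashion-MNIST, respectively.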
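
The selection rule of the first variant (the k farthest points plus the one nearest point to a cluster centroid) can be illustrated with a minimal sketch. The function names, the use of Euclidean distance, and the tie-breaking are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def select_boundary_points(points, k):
    """Return the point nearest the centroid followed by the k farthest points.

    Hypothetical helper: Euclidean distance and stable-sort tie-breaking
    are assumptions, not details taken from the paper.
    """
    c = centroid(points)
    ranked = sorted(points, key=lambda p: math.dist(p, c))
    nearest = ranked[:1]
    farthest = ranked[1:][-k:] if k > 0 else []
    return nearest + farthest

# One toy cluster in 2D; in the paper's setting each point would be an image.
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (1.0, 1.0)]
selected = select_boundary_points(cluster, k=2)
print(selected)
```

Applied per cluster, a rule like this keeps a few samples from each homogeneous cluster and discards the rest, which is where the reduction in training data comes from.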

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

Categories: Machine Learning | Computer Vision

2022-06-15

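The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques, including progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed-precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of quantized models that need less than 4.3 MB of weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address the learning instability that arises from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models, including EfficientNet-Lite0 (e.g., 4.14 MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of wgts. ... % accuracy).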
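
To make the memory-footprint side of this trade-off concrete, here is a minimal sketch of how per-tensor bit widths translate into megabytes, together with a plain symmetric uniform quantizer. The layer sizes, bit widths, the decimal definition of a megabyte, and both function names are assumptions for illustration; the paper's learned, differentiable quantizer is not reproduced here.

```python
def footprint_mb(tensors):
    """Total storage in MB for (num_elements, bit_width) pairs.

    Assumes 1 MB = 10**6 bytes; the source does not say which convention it uses.
    """
    total_bits = sum(n * bits for n, bits in tensors)
    return total_bits / 8 / 1e6

def quantize_uniform(x, bits, x_max):
    """Symmetric uniform quantizer for bits >= 2.

    Clips x to [-x_max, x_max], then rounds to the nearest of
    2**(bits - 1) - 1 evenly spaced levels on each side of zero.
    """
    levels = 2 ** (bits - 1) - 1
    step = x_max / levels
    clipped = max(-x_max, min(x_max, x))
    return round(clipped / step) * step

# Hypothetical mixed-precision model: three tensors at 8, 4, and 2 bits.
layers = [(1_000_000, 8), (2_000_000, 4), (500_000, 2)]
size = footprint_mb(layers)
print(f"{size:.3f} MB")
print(quantize_uniform(0.3, bits=3, x_max=1.0))
```

Lowering the bit width of the largest tensors shrinks the footprint fastest, which is why a per-tensor (mixed) precision assignment can beat a single global bit width at the same size budget.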

Selective Network Linearization for Efficient Private Inference

Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde

Category: Machine Learning

2022-02-04

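Private inference (PI) enables inference directly on cryptographically secure data. While it promises to address many privacy issues, it has seen limited use because of extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI the non-linear functions (namely ReLU) are the bottleneck. Practical PI therefore demands novel ReLU-aware optimizations. To reduce PI latency, we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results show up to 4.25% higher accuracy (at an iso-ReLU count of 50K) or 2.2× lower latency (at an iso-accuracy of 70%) than the current state of the art in the latency-accuracy space. To complement the empirical results, we present a "no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy. Public code is available at https://github.com/nyu-dice-lab/selective_network_linearization.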
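
The idea of linearizing only some ReLUs can be written as a masked activation: keep ReLU where a binary mask is 1 and pass the input through unchanged where it is 0. The sketch below shows only that forward computation with a hand-picked mask. The function names are hypothetical, and the gradient-based procedure that chooses the mask in the paper is not reproduced.

```python
def relu(x):
    """Standard rectified linear unit on a scalar."""
    return x if x > 0 else 0.0

def masked_activation(xs, mask):
    """Apply ReLU where mask is 1 and the identity (linear) map where mask is 0."""
    return [m * relu(x) + (1 - m) * x for x, m in zip(xs, mask)]

def relu_count(mask):
    """Number of ReLUs that remain non-linear, the cost that dominates PI latency."""
    return sum(mask)

# Toy pre-activations and an assumed mask; the last two units are linearized.
pre_activations = [-2.0, 0.5, -1.0, 3.0]
mask = [1, 1, 0, 0]
out = masked_activation(pre_activations, mask)
print(out, relu_count(mask))
```

Each 0 in the mask removes one ReLU from the private computation, so the ReLU count is the budget being traded against accuracy.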

Weakly-Supervised Semantic Segmentation of Ships Using Thermal Imagery

Rushil Joshi, Ethan Adams, Matthew Ziemann, Christopher A. Metzler

Category: Computer Vision

2022-12-26

The United States coastline spans 95,471 miles; a distance that cannot be effectively patrolled or secured by manual human effort alone. Unmanned Aerial Vehicles (UAVs) equipped with infrared cameras and deep-learning based algorithms represent a more efficient alternative for identifying and segmenting objects of interest - namely, ships. However, standard approaches to training these algorithms require large-scale datasets of densely labeled infrared maritime images. Such datasets are not publicly available and manually annotating every pixel in a large-scale dataset would have an extreme labor cost. In this work we demonstrate that, in the context of segmenting ships in infrared imagery, weakly-supervising an algorithm with sparsely labeled data can drastically reduce data labeling costs with minimal impact on system performance. We apply weakly-supervised learning to an unlabeled dataset of 7055 infrared images sourced from the Naval Air Warfare Center Aircraft Division (NAWCAD). We find that by sparsely labeling only 32 points per image, weakly-supervised segmentation models can still effectively detect and segment ships, with a Jaccard score of up to 0.756.
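
The Jaccard score reported above is the intersection over union of the predicted and ground-truth masks. A minimal sketch follows, assuming flat binary masks and treating two empty masks as a perfect match; both are illustrative conventions, not details from the paper.

```python
def jaccard(pred, target):
    """Intersection over union of two equally sized binary masks (flat 0/1 lists)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0

# Toy 6-pixel masks: 2 pixels overlap out of 4 marked in either mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = jaccard(pred, target)
print(score)
```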

Linear features segmentation from aerial images

Zhipeng Chang, Siddharth Jha, Yunfei Xia

Categories: Computer Vision | Artificial Intelligence

2022-12-23

Remote sensing technologies have developed rapidly and gained significant attention due to their ability to accurately localize, classify, and segment objects in aerial images. These technologies are commonly used in unmanned aerial vehicles (UAVs) equipped with high-resolution cameras or sensors to capture data over large areas. This data is useful for various applications, such as monitoring and inspecting cities, towns, and terrains. In this paper, we present a method for classifying and segmenting dashed traffic lines on city roads from aerial images using deep learning models such as U-Net and SegNet. The annotated data is used to train these models, which then classify and segment the aerial image into two classes: dashed lines and non-dashed lines. However, the deep learning model may not identify all dashed lines because of poor painting or occlusion by trees or shadows. To address this issue, we propose a method to add missed lines to the segmentation output. We also extract the x and y coordinates of each dashed line from the segmentation output, which city planners can use to construct a CAD file for digital visualization of the roads.
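
One simple way to obtain x and y coordinates for each segmented dash is to group foreground pixels into connected regions and take each region's centroid. The sketch below does this with a flood fill over a small binary grid. The choice of 4-connectivity, the centroid as the reported coordinate, and the function name are assumptions; the abstract does not describe the authors' extraction step.

```python
def component_centroids(mask):
    """Return (x, y) centroids of 4-connected foreground regions in a 2D 0/1 grid."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack, pixels = [(x0, y0)], []
            seen[y0][x0] = True
            while stack:
                x, y = stack.pop()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids

# Toy segmentation output with two separate dashes.
mask = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
coords = component_centroids(mask)
print(coords)
```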

Cross-Domain Consumer Review Analysis

Aditya Pandey, Kunal Joshi

Category: Machine Learning

2022-12-23

The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.

DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Categories: Natural Language Processing | Artificial Intelligence | Computer Vision

2022-12-20

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be used directly to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
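
The second step of this pipeline, reasoning over a linearized table, amounts to turning the table into text and placing it in an LLM prompt. The sketch below illustrates that hand-off. The " | " cell separator, the prompt wording, and both function names are assumptions for illustration; the abstract does not give DePlot's actual table format or prompt template, and no model is called here.

```python
def linearize_table(header, rows):
    """Flatten a table into text: one line per row, cells joined by ' | '."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_prompt(table_text, question):
    """Wrap the linearized table and a question into a single prompt string."""
    return f"Table:\n{table_text}\n\nQuestion: {question}\nAnswer:"

# Hypothetical plot-to-table output for a two-bar chart.
table_text = linearize_table(["Year", "Sales"], [[2020, 10], [2021, 15]])
prompt = build_prompt(table_text, "In which year were sales higher?")
print(prompt)
```

Because the table is plain text, the same prompt can be paired with one worked example to get the one-shot setting described above, without finetuning the language model.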

A Twitter BERT Approach for Offensive Language Detection in Marathi

Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Raviraj Joshi

Category: Natural Language Processing

2022-12-20

Automated offensive language detection is essential in combating the spread of hate speech, particularly on social media. This paper describes our work on offensive language identification in Marathi, a low-resource Indic language. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different monolingual and multilingual BERT models on this classification task, focusing on BERT models pre-trained on social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpora, HASOC 2021 and L3Cube-MahaHate. MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all other models when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), with an F1 score of 98.43 on the HASOC 2022 test set. With this, we also provide a new state-of-the-art result on the HASOC 2022 / MOLD v2 test set.
