智能论文笔记

Real Time Action Recognition from Video Footage

Tasnim Sakib Apon , Mushfiqul Islam Chowdhury , MD Zubair Reza , Arpita Datta , Syeda Tanjina Hasan , MD. Golam Rabiul Alam

分类：计算机视觉

2021-12-13

犯罪率与人口的增加率成比例地增加。最突出的方法是引入基于闭路电视（CCTV）相机的监视以解决问题。视频监控摄像机增加了一个新的维度来检测犯罪。目前正在进行自动安全摄像机监控的几项研究工作，基本目标是从视频饲料发现暴力活动。从技术方面来看，这是一个具有挑战性的问题，因为分析了一组帧，即时间维度的视频，以检测暴力可能需要仔细的机器学习模型训练，以减少错误的结果。本研究通过整合最先进的深度学习方法来重点介绍该问题，以确保用于检测暴力活动的自主监测的强大管道，例如，踢，冲压和拍打。最初，我们设计了这种特定兴趣的数据集，其中包含600个视频（每个动作200个）。稍后，我们已经利用现有的预先训练的模型架构来提取特征，后来使用深度学习网络进行分类。此外，我们在不同预先训练的架构上分类了我们的模型'准确性，以及像VGG16，Inceptionv3，Reset50，七峰和MobileNet V2的不同预先训练的架构中的混淆矩阵，其中VGG16和MobileNet V2更好。

translated by 谷歌翻译

BIO-CXRNET: A Robust Multimodal Stacking Machine Learning Technique for Mortality Risk Prediction of COVID-19 Patients using Chest X-Ray Images and Clinical Data

Tawsifur Rahman , Muhammad E. H. Chowdhury , Amith Khandakar , Zaid Bin Mahbub , Md Sakib Abrar Hossain , Abraham Alhatou , Eynas Abdalla , Sreekumar Muthiyal , Khandaker Farzana Islam , Saad Bin Abul Kashem

分类：计算机视觉 | 机器学习

2022-06-15

快速准确地检测该疾病可以大大帮助减少任何国家医疗机构对任何大流行期间死亡率降低死亡率的压力。这项工作的目的是使用新型的机器学习框架创建多模式系统，该框架同时使用胸部X射线（CXR）图像和临床数据来预测COVID-19患者的严重程度。此外，该研究还提出了一种基于nom图的评分技术，用于预测高危患者死亡的可能性。这项研究使用了25种生物标志物和CXR图像，以预测意大利第一波Covid-19（3月至6月2020年3月至6月）在930名Covid-19患者中的风险。提出的多模式堆叠技术分别产生了89.03％，90.44％和89.03％的精度，灵敏度和F1分数，以识别低风险或高危患者。与CXR图像或临床数据相比，这种多模式方法可提高准确性6％。最后，使用多元逻辑回归的列线图评分系统 - 用于对第一阶段确定的高风险患者的死亡风险进行分层。使用随机森林特征选择模型将乳酸脱氢酶（LDH），O2百分比，白细胞（WBC）计数，年龄和C反应蛋白（CRP）鉴定为有用的预测指标。开发了五个预测因素参数和基于CXR图像的列函数评分，以量化死亡的概率并将其分为两个风险组：分别存活（<50％）和死亡（> = 50％）。多模式技术能够预测F1评分为92.88％的高危患者的死亡概率。开发和验证队列曲线下的面积分别为0.981和0.939。

translated by 谷歌翻译

A Shallow U-Net Architecture for Reliably Predicting Blood Pressure (BP) from Photoplethysmogram (PPG) and Electrocardiogram (ECG) Signals

Sakib Mahmud , Nabil Ibtehaz , Amith Khandakar , Anas Tahir , Tawsifur Rahman , Khandaker Reajul Islam , Md Shafayet Hossain , M. Sohel Rahman , Mohammad Tariqul Islam , Muhammad E. H. Chowdhury

分类：机器学习

2021-11-12

心血管疾病是世界各地最常见的死亡原因。为了检测和治疗心脏相关的疾病，需要连续血压（BP）监测以及许多其他参数。为此目的开发了几种侵入性和非侵入性方法。用于持续监测BP的医院中使用的大多数现有方法是侵入性的。相反，基于袖带的BP监测方法，可以预测收缩压（SBP）和舒张压（DBP），不能用于连续监测。几项研究试图从非侵入性可收集信号（例如光学肌谱（PPG）和心电图（ECG））预测BP，其可用于连续监测。在这项研究中，我们探讨了自动化器在PPG和ECG信号中预测BP的适用性。在12,000岁的MIMIC-II数据集中进行了调查，发现了一个非常浅的一维AutoEncoder可以提取相关功能，以预测与最先进的SBP和DBP在非常大的数据集上的性能。从模拟-II数据集的一部分的独立测试分别为SBP和DBP提供了2.333和0.713的MAE。在40个主题的外部数据集上，模型在MIMIC-II数据集上培训，分别为SBP和DBP提供2.728和1.166的MAE。对于这种情况来说，结果达到了英国高血压协会（BHS）A级并超越了目前文学的研究。

translated by 谷歌翻译

Automatic Signboard Detection and Localization in Densely Populated Developing Cities

Md. Sadrul Islam Toaha , Sakib Bin Asad , Chowdhury Rafeed Rahman , S. M. Shahriar Haque , Mahfuz Ara Proma , Md. Ahsan Habib Shuvo , Tashin Ahmed , Md. Amimul Basher

分类：计算机视觉

2020-03-04

由于缺乏自动注释系统，大多数发展城市的城市机构都是数字未标记的。因此，在此类城市中，位置和轨迹服务（例如Google Maps，Uber等）仍然不足。自然场景图像中的准确招牌检测是从此类城市街道检索无错误的信息的最重要任务。然而，开发准确的招牌本地化系统仍然是尚未解决的挑战，因为它的外观包括文本图像和令人困惑的背景。我们提出了一种新型的对象检测方法，该方法可以自动检测招牌，适合此类城市。我们通过合并两种专业预处理方法和一种运行时效高参数值选择算法来使用更快的基于R-CNN的定位。我们采用了一种增量方法，通过使用我们构造的SVSO（Street View Signboard对象）签名板数据集，通过详细评估和与基线进行比较，以达到最终提出的方法，这些方法包含六个发展中国家的自然场景图像。我们在SVSO数据集和Open Image数据集上展示了我们提出的方法的最新性能。我们提出的方法可以准确地检测招牌（即使图像包含多种形状和颜色的多种嘈杂背景的招牌）在SVSO独立测试集上达到0.90 MAP（平均平均精度）得分。我们的实施可在以下网址获得：https：//github.com/sadrultoaha/signboard-detection

translated by 谷歌翻译

Internet of Things: Digital Footprints Carry A Device Identity

Rajarshi Roy Chowdhury , Azam Che Idris , Pg Emeroylariffion Abas

分类：机器学习

2023-01-01

The usage of technologically advanced devices has seen a boom in many domains, including education, automation, and healthcare; with most of the services requiring Internet connectivity. To secure a network, device identification plays key role. In this paper, a device fingerprinting (DFP) model, which is able to distinguish between Internet of Things (IoT) and non-IoT devices, as well as uniquely identify individual devices, has been proposed. Four statistical features have been extracted from the consecutive five device-originated packets, to generate individual device fingerprints. The method has been evaluated using the Random Forest (RF) classifier and different datasets. Experimental results have shown that the proposed method achieves up to 99.8% accuracy in distinguishing between IoT and non-IoT devices and over 97.6% in classifying individual devices. These signify that the proposed method is useful in assisting operators in making their networks more secure and robust to security breaches and unauthorized access.

translated by 谷歌翻译

Blind Restoration of Real-World Audio by 1D Operational GANs

Turker Ince , Serkan Kiranyaz , Ozer Can Devecioglu , Muhammad Salman Khan , Muhammad Chowdhury , Moncef Gabbouj

分类：机器学习

2022-12-30

Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and duration. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts each with a random severity to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository1.

translated by 谷歌翻译

Efficient Movie Scene Detection using State-Space Transformers

Md Mohaiminul Islam , Mahmudul Hasan , Kishan Shamsundar Athrey , Tony Braskich , Gedas Bertasius

分类：计算机视觉

2022-12-29

The ability to distinguish between different movie scenes is critical for understanding the storyline of a movie. However, accurately detecting movie scenes is often challenging as it requires the ability to reason over very long movie segments. This is in contrast to most existing video recognition models, which are typically designed for short-range video analysis. This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. Our model, dubbed TranS4mer, is built using a novel S4A building block, which combines the strengths of structured state-space sequence (S4) and self-attention (A) layers. Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies. Afterward, the state-space operation in the S4A block is used to aggregate long-range inter-shot cues. The final TranS4mer model, which can be trained end-to-end, is obtained by stacking the S4A blocks one after the other multiple times. Our proposed TranS4mer outperforms all prior methods in three movie scene detection datasets, including MovieNet, BBC, and OVSD, while also being $2\times$ faster and requiring $3\times$ less GPU memory than standard Transformer models. We will release our code and models.

translated by 谷歌翻译

Offline Policy Optimization in RL with Variance Regularizaton

Riashat Islam , Samarth Sinha , Homanga Bharadhwaj , Samin Yeasar Arnob , Zhuoran Yang , Animesh Garg , Zhaoran Wang , Lihong Li , Doina Precup

分类：机器学习

2022-12-29

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms. We show that the regularizer leads to a lower bound to the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms.

translated by 谷歌翻译

Representation Learning in Deep RL via Discrete Information Bottleneck

Riashat Islam , Hongyu Zang , Manan Tomar , Aniket Didolkar , Md Mofijul Islam , Samin Yeasar Arnob , Tariq Iqbal , Xin Li , Anirudh Goyal , Nicolas Heess

分类：机器学习

2022-12-28

Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.

translated by 谷歌翻译

Packing Privacy Budget Efficiently

Pierre Tholoniat , Kelly Kostopoulou , Mosharaf Chowdhury , Asaf Cidon , Roxana Geambasu , Mathias Lécuyer , Junfeng Yang

分类：机器学习

2022-12-26

Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents the scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.

translated by 谷歌翻译