User-specific prediction of future activities from previous ones in the healthcare domain can drastically improve the services nurses provide. The task is challenging because, unlike in other domains, activities in healthcare involve both nurses and patients and vary from hour to hour. In this paper, we employ various data processing techniques to organize and restructure the data, and we train an LSTM-based multi-label classifier with a novel 2-stage approach (user-agnostic pre-training followed by user-specific fine-tuning). Our experiment achieves a validation accuracy of 31.58%, precision of 57.94%, recall of 68.31%, and F1 score of 60.38%. We conclude that proper data pre-processing and the 2-stage training process result in better performance. This experiment is part of the "Fourth Nurse Care Activity Recognition Challenge", submitted by our team "Not A Fan of Local Minima".
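Automatic license plate recognition systems aim to detect, localize, and recognize the license plate characters of vehicles appearing in video frames. However, deploying such systems in the real world requires real-time performance in low-resource environments. In this paper, we propose a two-stage detection pipeline paired with a Vision API that provides real-time inference speed along with consistently accurate detection and recognition performance. We use a Haar-cascade classifier as a filter on top of our backbone MobileNet SSDv2 detection model. This reduces inference time by focusing only on high-confidence detections and using them for recognition. We also impose a temporal frame separation strategy to distinguish between multiple vehicle license plates in the same clip. Furthermore, because no Bangla license plate dataset is publicly available, we created an image dataset and a video dataset containing license plates in the wild. We trained our models on the image dataset and achieved an AP(0.5) score of 86%, then tested our pipeline on the video dataset and observed reasonable detection and recognition performance (82.7% detection rate and 60.8% OCR F1 score) at real-time processing speed (27.2 frames per second).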
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample that contains the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, very few works explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective there. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into it. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models and show that attacking a single modality is enough to achieve a high attack success rate.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of DL applications for UAV swarms across a wide range of scenarios.
Compared to regular cameras, Dynamic Vision Sensors, or event cameras, asynchronously output compact visual data based on intensity changes at each pixel location. In this paper, we study the application of current image-based SLAM techniques to these novel sensors. To this end, the information in adaptively selected event windows is processed to form motion-compensated images. These images are then used to reconstruct the scene and estimate the 6-DOF pose of the camera. We also propose an inertial version of the event-only pipeline to assess its capabilities. We compare the results of different configurations of the proposed algorithm against the ground truth for sequences from two publicly available event datasets. We also compare the results of the proposed event-inertial pipeline with the state of the art and show that it can produce comparable or more accurate results, provided the map estimate is reliable.
With Twitter's growth and popularity, users share a huge number of views on various topics, making the platform a valuable source of information on political, social, and economic issues. This paper investigates English tweets on the Russia-Ukraine war to analyze trends reflecting users' opinions and sentiments regarding the conflict. The tweets' positive and negative sentiments are analyzed using a BERT-based model, and time series of the frequency of positive and negative tweets are calculated for various countries. Then, we propose a method based on the neighborhood average for modeling and clustering the countries' time series. The clustering results provide valuable insight into public opinion regarding this conflict. Among other findings, users from the United States, Canada, the United Kingdom, and most Western European countries hold similar views, which contrast with the views shared by users from Eastern European, Scandinavian, Asian, and South American nations.
The performance of deep learning (DL) models depends on the quality of their labels. In some areas, the involvement of human annotators may introduce noise into the data. When these corrupted labels are blindly regarded as the ground truth (GT), DL models suffer degraded performance. This paper presents a method that aims to learn a confident model in the presence of noisy labels while simultaneously estimating the uncertainty of multiple annotators. We robustly estimate the predictions given only the noisy labels by adding an entropy- or information-based regularizer to the classifier network. We conduct our experiments on noisy versions of the MNIST, CIFAR-10, and FMNIST datasets. Our empirical results demonstrate the robustness of our method, which outperforms or performs comparably to other state-of-the-art (SOTA) methods. In addition, we evaluate the proposed method on a curated dataset in which the noise type and level of the various annotators depend on the input image style. We show that our approach performs well and is adept at learning annotators' confusion. Moreover, we demonstrate that our model is more confident in predicting the GT than other baselines. Finally, we assess our approach on a segmentation problem and showcase its effectiveness experimentally.
This paper deals with the problem of statistical and system heterogeneity in a cross-silo Federated Learning (FL) framework in which a limited number of Consumer Internet of Things (CIoT) devices exist in a smart building. We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed "G-Fedfilt". The proposed aggregator enables a structured flow of information based on the graph's topology. This behavior allows it to capture the interconnection of CIoT devices and to train domain-specific models. The embedded graph filter is equipped with a tunable parameter that enables a continuous trade-off between domain-agnostic and domain-specific FL. In the domain-agnostic case, it forces G-Fedfilt to act similarly to the conventional Federated Averaging (FedAvg) aggregation rule. The proposed G-Fedfilt also enables an intrinsic smooth clustering based on the graph connectivity, without the clusters being explicitly specified, which further boosts the personalization of the models in the framework. In addition, the proposed scheme uses communication-efficient time scheduling to alleviate system heterogeneity. This is accomplished by adaptively adjusting the number of training data samples and the sparsity of the models' gradients to reduce communication desynchronization and latency. Simulation results show that G-Fedfilt achieves up to 3.99% better classification accuracy than conventional FedAvg in terms of model personalization on statistically heterogeneous local datasets, and up to 2.41% higher accuracy than FedAvg when testing the generalization of the models.
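To make the idea of a tunable graph-filter aggregator concrete, the following is a minimal sketch under stated assumptions, not the G-Fedfilt filter itself (the abstract does not give its form). It assumes a simple one-step low-pass filter: each client's weights are blended with the mean over its graph neighbours (including itself). The function name `graph_filter_aggregate` and the parameter `alpha` are hypothetical. With `alpha = 0` every client keeps its own model; with `alpha = 1` on a fully connected graph every client receives the unweighted average, which is how FedAvg behaves when all clients carry equal weight.

```python
from typing import List

Vector = List[float]


def graph_filter_aggregate(
    weights: List[Vector],
    adjacency: List[List[float]],
    alpha: float,
) -> List[Vector]:
    """Blend each client's weights with its neighbourhood mean.

    Assumed (hypothetical) filter: H = (1 - alpha) * I + alpha * P, where P is
    the row-normalised adjacency matrix with self-loops added. The adjacency
    matrix is assumed to have a zero diagonal and non-negative entries.
    """
    n = len(weights)
    dim = len(weights[0])
    aggregated = []
    for i in range(n):
        # Edge weights from client i to every client, plus a self-loop.
        coeffs = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(coeffs)
        neighbour_mean = [
            sum(coeffs[j] * weights[j][d] for j in range(n)) / total
            for d in range(dim)
        ]
        # alpha interpolates between the local model and the neighbourhood mean.
        aggregated.append(
            [(1.0 - alpha) * weights[i][d] + alpha * neighbour_mean[d] for d in range(dim)]
        )
    return aggregated


if __name__ == "__main__":
    # Three illustrative clients on a fully connected graph.
    client_weights = [[0.0, 3.0], [3.0, 0.0], [6.0, 3.0]]
    complete_graph = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(graph_filter_aggregate(client_weights, complete_graph, alpha=1.0))
    print(graph_filter_aggregate(client_weights, complete_graph, alpha=0.0))
```

On a sparser graph, intermediate values of `alpha` pull each client only toward the clients it is connected to, which is the kind of topology-driven personalization the abstract describes.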
Learning models depend heavily on data to work effectively, and they perform better when trained on large datasets. A large body of research in the literature addresses the dataset adequacy issue. One promising approach to this issue is data augmentation (DA). In DA, the number of training instances is increased by applying different transformations to the available instances to generate new, correct, and representative ones. DA increases the dataset's size and variability, which enhances model performance and prediction accuracy. DA also helps address the class imbalance problem in classification. Few studies have recently considered DA for the Arabic language. These studies rely on traditional augmentation approaches, such as rule-based paraphrasing or noising-based techniques. In this paper, we propose a new Arabic DA method that employs a recent, powerful language model, AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU metrics. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance on the augmented Arabic datasets. The experiments were conducted on four Arabic sentiment datasets, namely AraSarcasm, ASTD, ATT, and MOVIE. The selected datasets vary in size, number of labels, and class balance. The results show that the proposed methodology enhanced Arabic sentiment text classification on all datasets, increasing the F1 score by 4% on AraSarcasm, 6% on ASTD, 9% on ATT, and 13% on MOVIE.
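Two of the measures named above, Jaccard and cosine similarity, can be illustrated with a short sketch. This is only an assumption about how such a comparison might be set up: it works on whitespace tokens and raw term counts, whereas the paper may compute these measures on other representations (for example, sentence embeddings). The function names are hypothetical, and English placeholder sentences are used purely for readability.

```python
import math
from collections import Counter


def jaccard_similarity(original: str, generated: str) -> float:
    """Share of distinct tokens that the two sentences have in common."""
    a, b = set(original.split()), set(generated.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine_similarity(original: str, generated: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(original.split()), Counter(generated.split())
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    source = "the film was great"
    augmented = "the film was wonderful"
    print(jaccard_similarity(source, augmented))
    print(cosine_similarity(source, augmented))
```

A generated sentence that scores close to 1 on both measures adds little diversity, while a score close to 0 suggests it may have drifted from the original meaning; an augmentation pipeline would typically look for values in between.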
Investigating and analyzing patient outcomes, including in-hospital mortality and length of stay, is crucial for helping clinicians anticipate a patient's likely outcome at the outset of hospitalization and for helping hospitals allocate their resources. This paper proposes an approach that combines the well-known gray wolf algorithm with frequent items extracted by association rule mining algorithms. First, the original features are combined with the discriminative extracted frequent items. The gray wolf algorithm is then used both to choose the best subset of these features and to tune the parameters of the classification algorithms. This framework was evaluated using a real dataset of 2816 patients from the Imam Ali Kermanshah Hospital in Iran. The study's findings indicate that low ejection fraction, old age, high CPK values, and high creatinine levels are the main contributors to patient mortality. Several significant and interesting rules related to in-hospital mortality and length of stay have also been extracted and presented. Additionally, the accuracy, sensitivity, specificity, and AUROC of the proposed framework for predicting in-hospital mortality using the SVM classifier were 0.9961, 0.9477, 0.9992, and 0.9734, respectively. According to the framework's findings, adding frequent items as features considerably improves classification accuracy.
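For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper's dataset; AUROC is omitted because it requires ranked prediction scores rather than counts.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp: deaths correctly predicted     fn: deaths missed
    tn: survivors correctly predicted  fp: survivors flagged as deaths
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of actual positives that were detected.
        "sensitivity": tp / (tp + fn),
        # Specificity: share of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the study.
    print(binary_metrics(tp=8, fp=5, tn=85, fn=2))
```

With a rare positive class such as in-hospital mortality, accuracy is dominated by the negatives, which is why sensitivity and specificity are reported alongside it.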