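Because model fine-tuning is central to modern NLP, we set out to improve its efficiency. Motivated by the observation that training examples are often redundant, we design an algorithm that filters examples in a streaming fashion. Our two key techniques are: (1) automatically determining a training-loss threshold for skipping backward propagation; and (2) maintaining a meta predictor for further skipping forward propagation. Instantiated as a three-stage process, our algorithm reduces the required training examples by $5\times$ on a diverse set of benchmarks while seeing only minor degradation on average. Our method is effective even with as few as one training epoch, in which each training example is encountered only once. It is easy to implement and is compatible with existing model fine-tuning optimizations such as layer freezing.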
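The first technique in this abstract, a loss threshold below which backward propagation is skipped, can be illustrated with a small streaming filter. This is only a sketch under stated assumptions: the abstract does not say how the threshold is determined, so the warm-up window, the quantile rule, and the name `select_for_backward` are all hypothetical, and the meta predictor for skipping forward propagation is not modeled.

```python
from typing import Iterable, List


def select_for_backward(losses: Iterable[float], warmup: int = 4,
                        quantile: float = 0.5) -> List[int]:
    """Return stream positions whose loss merits a backward pass.

    Assumption (not from the abstract): the threshold is a fixed quantile
    of the losses observed during a short warm-up window.
    """
    kept: List[int] = []
    seen: List[float] = []
    threshold = None
    for i, loss in enumerate(losses):
        if threshold is None:
            # Warm-up: train on every example while collecting loss statistics.
            kept.append(i)
            seen.append(loss)
            if len(seen) == warmup:
                ordered = sorted(seen)
                threshold = ordered[int(quantile * (warmup - 1))]
        elif loss >= threshold:
            # High-loss example: worth the cost of backward propagation.
            kept.append(i)
        # Otherwise the example is skipped (no backward pass).
    return kept


if __name__ == "__main__":
    stream = [1.0, 2.0, 3.0, 4.0, 0.5, 2.5, 1.9, 5.0]
    print(select_for_backward(stream))
```

In a real training loop the loss would come from the forward pass of each incoming example, and only the kept positions would trigger an optimizer step.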
Translated by Google Translate
The use of technologically advanced devices has boomed in many domains, including education, automation, and healthcare, with most services requiring Internet connectivity. Device identification plays a key role in securing a network. In this paper, we propose a device fingerprinting (DFP) model that can distinguish between Internet of Things (IoT) and non-IoT devices, as well as uniquely identify individual devices. Four statistical features are extracted from five consecutive device-originated packets to generate individual device fingerprints. The method is evaluated using the Random Forest (RF) classifier on different datasets. Experimental results show that the proposed method achieves up to 99.8% accuracy in distinguishing between IoT and non-IoT devices and over 97.6% in classifying individual devices. These results indicate that the proposed method can help operators make their networks more secure and more robust to security breaches and unauthorized access.
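As a rough illustration of building a fingerprint from five consecutive packets, the sketch below computes four statistics per window. The abstract does not name the four features, so the choice of minimum, maximum, mean, and standard deviation of packet size is an assumption, as are the non-overlapping windows and the function name `fingerprints`.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# (min, max, mean, std) of packet sizes: a hypothetical feature set.
Fingerprint = Tuple[float, float, float, float]


def fingerprints(packet_sizes: Sequence[int], window: int = 5) -> List[Fingerprint]:
    """Build one fingerprint per non-overlapping window of consecutive packets."""
    result: List[Fingerprint] = []
    # Trailing packets that do not fill a whole window are ignored.
    for start in range(0, len(packet_sizes) - window + 1, window):
        chunk = packet_sizes[start:start + window]
        result.append((
            float(min(chunk)),
            float(max(chunk)),
            float(mean(chunk)),
            float(pstdev(chunk)),  # population standard deviation
        ))
    return result


if __name__ == "__main__":
    sizes = [60, 60, 60, 60, 60, 100, 200, 300, 400, 500, 42]
    for fp in fingerprints(sizes):
        print(fp)
```

Each resulting tuple could then serve as one feature row for a classifier such as a random forest.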
Neglected tropical diseases (NTDs) continue to affect the livelihood of individuals in countries in the Southeast Asia and Western Pacific region. These diseases have existed for a long time and have caused devastating health problems and economic decline for people in low- and middle-income (developing) countries. An estimated 1.7 billion of the world's population suffer from one or more NTDs annually, which puts approximately one in five individuals at risk for NTDs. In addition to their health and social impact, NTDs inflict a significant financial burden on patients and their close relatives, and are responsible for billions of dollars in lost revenue from reduced labor productivity in developing countries alone. There is an urgent need to improve control and eradication or elimination efforts for NTDs. This can be achieved by using machine learning tools to improve surveillance, prediction, and detection programs, and to combat NTDs through the discovery of new therapeutics against these pathogens. This review surveys current applications of machine learning tools for NTDs and the challenges in advancing the state of the art of NTD surveillance, management, and treatment.
Cement is the most widely used construction material. The performance of cement hydrate depends, both qualitatively and quantitatively, on the constituent phases present in the cement clinker, viz. alite, belite, aluminate, and ferrites. Traditionally, clinker phases are analyzed from optical images, relying on a domain expert and simple image processing techniques. However, the non-uniformity of the images, variations in the geometry and size of the phases, and variability in the experimental approaches and imaging methods make it challenging to identify the phases. Here, we present a machine learning (ML) approach to detect clinker microstructure phases automatically. To this end, we create the first annotated dataset of cement clinker by segmenting alite and belite particles. Further, we use supervised ML methods to train models for identifying alite and belite regions. Specifically, we fine-tune the image detection and segmentation model Detectron-2 on the cement microstructure to develop a model for detecting the cement phases, namely, Cementron. We demonstrate that Cementron, trained only on literature data, works remarkably well on new images obtained from our experiments, indicating its generalizability. We make Cementron available for public use.
With recent developments in Social Computing, Natural Language Processing, and Clinical Psychology, the social NLP research community is addressing the challenge of automating the analysis of mental illness on social media. A recent extension to the problem of multi-class classification of mental health issues is to identify the cause behind the user's intention. However, multi-class causal categorization for mental health issues on social media faces a major challenge: wrong predictions due to overlapping causal explanations. There are two possible ways to mitigate this problem: (i) addressing inconsistency among causal explanations and inappropriate human-annotated inferences in the dataset, and (ii) in-depth analysis of arguments and stances in self-reported text using discourse analysis. In this work, we hypothesise that if there is inconsistency among the F1 scores of different classes, there must be inconsistency among the corresponding causal explanations as well. We fine-tune the classifiers and find explanations for multi-class causal categorization of mental illness on social media with the LIME and Integrated Gradient (IG) methods. We test our methods on the CAMS dataset and validate them against annotated interpretations. A key contribution of this work is to find the reason behind the inconsistency in accuracy of multi-class causal categorization. The effectiveness of our methods is evident from the results, with category-wise average scores of $81.29\%$ and $0.906$ using cosine similarity and word mover's distance, respectively.
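The cosine-similarity score mentioned in this abstract compares a model explanation with an annotated interpretation. The sketch below shows the metric itself on simple bag-of-words count vectors; the abstract does not state how the texts are vectorized, so the token-count representation and the function name `cosine_similarity` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """Cosine similarity between two token lists, using token counts as vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty inputs
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    explanation = ["lost", "job", "stress"]
    annotation = ["lost", "job"]
    print(round(cosine_similarity(explanation, annotation), 3))
```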
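Existing self-supervised learning strategies are constrained to either a limited set of objectives or generic downstream tasks that predominantly target uni-modal applications. This has isolated progress for imperative multi-modal applications that are diverse in complexity and domain affinity, such as meme analysis. Here, we introduce two self-supervised pre-training methods, namely Ext-PIE-Net and MM-SimCLR, that (i) employ off-the-shelf multi-modal hate-speech data during pre-training and (ii) perform self-supervised learning by incorporating multiple specialized pretext tasks, effectively catering to the complex multi-modal representation learning required for meme analysis. We experiment with different self-supervision strategies, including potential variants that could help learn rich cross-modal representations, and evaluate them using popular linear probing on the Hateful Memes task. The proposed solutions compete strongly with the fully supervised baseline via label-efficient training, while distinctly outperforming it on all three tasks of the Memotion challenge, with performance gains of 0.18%, 23.64%, and 0.93%, respectively. Further, we demonstrate the generalizability of the proposed solutions by reporting competitive performance on the HarMeme task. Finally, we empirically establish the quality of the learned representations by analyzing task-specific learning using fewer labeled training samples, and argue that the complexity of the self-supervision strategy and that of the downstream task at hand are correlated. Our efforts highlight the need for better multi-modal self-supervision methods involving specialized pretext tasks for efficient fine-tuning and generalizable performance.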
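Neural networks are powerful surrogates for numerous forward processes. The inversion of such surrogates is extremely valuable in science and engineering. The most important property of a successful neural inverse method is the performance of its solutions when deployed in the real world, i.e., on the native forward process (and not only the learned surrogate). We propose Autoinverse, a highly automated approach for inverting neural network surrogates. Our main insight is to seek inverse solutions in the vicinity of reliable data, that is, data which have been sampled from the forward process and used for training the surrogate model. Autoinverse finds such solutions by taking into account the predictive uncertainty of the surrogate and minimizing it during the inversion. Apart from high accuracy, Autoinverse enforces the feasibility of solutions, comes with embedded regularization, and is initialization-free. We verify our proposed method by addressing a set of real-world problems in control, fabrication, and design.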
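Buses and heavy vehicles have more blind spots than cars and other road vehicles because of their larger size. Accidents caused by these heavy vehicles are therefore more often fatal and result in severe injuries to other road users. These possible blind-spot collisions can be identified early using vision-based object detection approaches. Yet, existing state-of-the-art vision-based object detection models rely heavily on a single feature descriptor for making decisions. In this research, the design of two convolutional neural networks (CNNs) based on high-level feature descriptors, and their integration with Faster R-CNN, is proposed to detect blind-spot collisions for heavy vehicles. Moreover, a fusion approach is proposed to integrate two pre-trained networks (i.e., Resnet 50 and Resnet 101) for extracting high-level features for blind-spot vehicle detection. The fusion of features significantly improves the performance of Faster R-CNN and outperforms existing state-of-the-art methods. Both approaches are validated on a self-recorded blind-spot vehicle detection dataset for buses and on the online LISA dataset for vehicle detection. For the two proposed approaches, false detection rates (FDR) of 3.05% and 3.49% are obtained on the self-recorded dataset, making these approaches suitable for real-time applications.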
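For readers unfamiliar with the FDR figure quoted here, the sketch below shows one common way such a rate is computed: the fraction of all reported detections that are false. The abstract does not give its exact definition, so this formula is an assumption, and the counts in the usage example are made up for illustration and are not taken from the paper.

```python
def false_detection_rate(false_positives: int, true_positives: int) -> float:
    """Fraction of reported detections that are false: FP / (FP + TP).

    Assumption: this is the definition intended by "FDR"; the abstract
    does not spell it out.
    """
    total = false_positives + true_positives
    if total == 0:
        raise ValueError("FDR is undefined when there are no detections")
    return false_positives / total


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(f"{false_detection_rate(5, 95):.2%}")
```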
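Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and a known label space to fool unknown black-box models. These attacks are therefore restricted by the availability of an effective surrogate model. In this work, we relax this assumption and propose Adversarial Pixel Restoration as a self-supervised alternative for training an effective surrogate model from scratch with no labels and few data samples. Our training approach is based on a min-max objective that reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model. Our proposed attack is complementary to our adversarial pixel restoration and is independent of any task-specific objective, as it can be launched in a self-supervised manner. We successfully demonstrate the adversarial transferability of our approach to Vision Transformers as well as Convolutional Neural Networks for the tasks of classification, object detection, and video segmentation. Our code and pre-trained surrogate models are available at: https://github.com/hashmatshadab/apr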
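In this paper, we propose a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), to address the shortage of labeled data. ALCS employs a density-based clustering approach to explore the cluster structure of the data without requiring exhaustive parameter tuning. A bi-cluster boundary-based sample query procedure is introduced to improve learning performance when classifying highly overlapped classes. Additionally, we develop an effective diversity exploration strategy to address redundancy among queried samples. Our experimental results demonstrate the efficacy of the ALCS approach.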
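To make the idea of reducing redundancy among queried samples concrete, the sketch below uses greedy farthest-first selection. This is a generic stand-in technique, not the ALCS diversity strategy, whose details the abstract does not give; the function name `diverse_queries` and the choice of Euclidean distance are assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def diverse_queries(candidates: List[Point], budget: int) -> List[int]:
    """Pick up to `budget` candidate indices that are far from one another.

    Greedy farthest-first traversal (a swapped-in illustration, not ALCS).
    """
    if not candidates or budget <= 0:
        return []
    chosen = [0]  # arbitrary starting point: the first candidate
    # Distance from each candidate to its nearest already-chosen point.
    nearest = [math.dist(p, candidates[0]) for p in candidates]
    while len(chosen) < min(budget, len(candidates)):
        idx = max(range(len(candidates)), key=lambda i: nearest[i])
        if nearest[idx] == 0.0:
            break  # only duplicates of chosen points remain
        chosen.append(idx)
        nearest = [min(d, math.dist(p, candidates[idx]))
                   for d, p in zip(nearest, candidates)]
    return chosen


if __name__ == "__main__":
    pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
    print(diverse_queries(pool, budget=3))
```

Near-duplicate candidates such as `(0.0, 0.0)` and `(0.1, 0.0)` are unlikely to be selected together, which is the redundancy-avoiding behavior the abstract alludes to.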