Smart Paper Notes

Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

Mohammad Majid Akhtar, Bibhas Sharma, Ishan Karunanayake, Rahat Masood, Muhammad Ikram, Salil S. Kanhere

Category: Machine Learning

2022-09-07

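COVID-19 has affected every part of the world, yet misinformation about the outbreak has spread faster than the virus itself. Misinformation spreads through online social networks (OSNs) and often misleads people away from correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and launching cyber propaganda. Existing work neglects the presence of bots, which act as a catalyst in the spread, and focuses on detecting fake news in "articles shared in posts" rather than in the post (textual) content itself. Most work on misinformation detection relies on manually labeled datasets, which are hard to scale for building predictive models. In this research, we overcome this data-scarcity challenge by labeling a Twitter dataset using verified fact-checked statements. In addition, we combine textual features with user-level features (such as follower count and friend count) and tweet-level features (such as mentions, hashtags, and URLs in a tweet) to act as additional indicators for detecting misinformation. Moreover, we analyze the presence of bots in tweets and show that bots change their behavior over time and are most active in spreading misinformation. We collected 10.22 million COVID-19-related tweets and used our annotation model to build an extensive, original ground-truth dataset for classification. We use various machine learning models to detect misinformation accurately; our best classification model achieves a precision of 82%, a recall of 96%, and a false positive rate of 3.58%. Furthermore, our bot analysis shows that bots account for approximately 10% of misinformation tweets. Our methodology substantially exposes false information, thereby improving the trustworthiness of information disseminated through social media platforms.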

Translated by Google Translate
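
The abstract combines textual features with user-level and tweet-level signals. As a minimal sketch, assuming simple regex counting (the regexes and the function names tweet_level_features and combined_features are hypothetical, not the authors' code), the non-textual part of one feature vector might be built like this; a text representation such as TF-IDF would then be concatenated before classification.

```python
import re

# Hypothetical patterns for the tweet-level signals named in the abstract.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def tweet_level_features(text: str) -> dict:
    """Count mentions, hashtags and URLs in a single tweet."""
    return {
        "mentions": len(MENTION_RE.findall(text)),
        "hashtags": len(HASHTAG_RE.findall(text)),
        "urls": len(URL_RE.findall(text)),
    }


def combined_features(text: str, followers: int, friends: int) -> dict:
    """Merge tweet-level counts with user-level metadata into one feature dict."""
    features = tweet_level_features(text)
    features["followers"] = followers
    features["friends"] = friends
    return features


if __name__ == "__main__":
    print(combined_features("@who says #covid cure at https://t.co/x #fake", 120, 45))
```

A dict keeps the sketch readable; a real pipeline would more likely emit a fixed-order numeric vector for the classifier.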

PhishClone: Measuring the Efficacy of Cloning Evasion Attacks

Arthur Wong, Alsharif Abuadbba, Mahathir Almashor, Salil Kanhere

Category: Machine Learning

2022-09-04

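Web-based phishing accounts for over 90% of data breaches, and most web browsers and security vendors rely on machine learning (ML) models as a mitigation. Despite this, links posted regularly on anti-phishing aggregators such as PhishTank and VirusTotal have been shown to bypass existing detectors easily. Prior art suggests that automated website cloning with light mutations is gaining traction among attackers. This has received limited exposure in the current literature and leads to sub-optimal ML-based countermeasures. This work conducts the first empirical study that compiles and evaluates a variety of state-of-the-art cloning techniques in wide circulation. We collected 13,394 samples and found 8,566 confirmed phishing pages targeting 4 popular websites using 7 distinct cloning mechanisms. These samples were replicated, with malicious code removed, within a controlled platform fortified with precautions that prevent accidental access. We then reported our sites to VirusTotal and other platforms and polled the results regularly for 7 days to ascertain the efficacy of each cloning technique. The results show that no security vendor detected our clones, demonstrating the urgent need for more effective detectors. Finally, we propose 4 recommendations to help web developers and ML-based defences mitigate the risk of cloning attacks.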

Deception for Cyber Defence: Challenges and Opportunities

David Liebowitz, Surya Nepal, Kristen Moore, Cody J. Christopher, Salil S. Kanhere, David Nguyen, Roelien C. Timmer, Michael Longland, Keerth Rathakumar

Category: Machine Learning

2022-08-15

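Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to detect breaches and data theft quickly. One factor limiting the use of deception is the cost of generating realistic artefacts by hand. However, recent advances in machine learning have created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models that mimic many common elements of the IT stack to create deception effects.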

Transferable Graph Backdoor Attack

Shuiqiao Yang, Bao Gia Doan, Paul Montague, Olivier De Vel, Tamas Abraham, Seyit Camtepe, Damith C. Ranasinghe, Salil S. Kanhere

Category: Artificial Intelligence | Machine Learning

2022-06-21

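Graph neural networks (GNNs) have achieved tremendous success in many graph mining tasks, benefiting from a message-passing strategy that fuses local structure and node features for better graph representation learning. Despite this success, and similar to other types of deep neural networks, GNNs have been found to be vulnerable to unnoticeable perturbations of both graph structure and node features. Many adversarial attacks have been proposed to expose the fragility of GNNs under different perturbation strategies for creating adversarial examples. However, the vulnerability of GNNs to successful backdoor attacks was shown only recently. In this paper, we disclose the TRAP attack, a transferable graph backdoor attack. The core attack principle is to poison the training dataset with perturbation-based triggers, which can lead to an effective and transferable backdoor attack. The perturbation trigger for a graph is generated by performing perturbation actions on the graph structure via a gradient-based score matrix from a surrogate model. Compared with prior work, the TRAP attack differs in several ways: i) it exploits a surrogate graph convolutional network (GCN) model to generate perturbation triggers for a black-box backdoor attack; ii) it generates sample-specific perturbation triggers that have no fixed pattern; and iii) in the context of GNNs, the attack transfers to different GNN models when they are trained with the forged poisoned training dataset. Through extensive evaluations on four real-world datasets, we demonstrate the effectiveness of the TRAP attack in building transferable backdoors in four different popular GNNs.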
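
The abstract says the trigger comes from perturbing graph structure via a gradient-based score matrix from a surrogate GCN. As a minimal sketch, assuming that gradient matrix is already computed and using a common first-order edge-flip heuristic (not necessarily TRAP's exact scoring rule; select_edge_flips is a hypothetical name), the selection step might look like this:

```python
import numpy as np


def select_edge_flips(adj: np.ndarray, grad: np.ndarray, k: int) -> np.ndarray:
    """Flip the k node pairs with the highest first-order score.

    adj  : symmetric 0/1 adjacency matrix (n x n), no self-loops.
    grad : assumed precomputed gradient, w.r.t. adj, of an objective the
           attacker wants to increase (taken from a surrogate model).
    """
    # First-order change in the objective from flipping entry (i, j):
    # adding an edge (0 -> 1) contributes +grad, removing one (1 -> 0) contributes -grad.
    score = grad * (1 - 2 * adj)
    score = (score + score.T) / 2  # an undirected flip touches (i, j) and (j, i)
    iu, ju = np.triu_indices(adj.shape[0], k=1)  # each unordered pair once, no diagonal
    top = np.argsort(score[iu, ju])[::-1][:k]  # indices of the k best-scoring pairs
    trigger = adj.copy()
    for idx in top:
        i, j = iu[idx], ju[idx]
        trigger[i, j] = trigger[j, i] = 1 - trigger[i, j]
    return trigger
```

Because the scores depend on each graph's own gradient, different graphs receive different flips, which loosely mirrors the abstract's point that triggers are sample-specific rather than a fixed pattern.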

PublicCheck: Public Integrity Verification for Services of Run-time Deep Models

Shuo Wang, Sharif Abuadbba, Sidharth Agarwal, Kristen Moore, Ruoxi Sun, Minhui Xue, Surya Nepal, Seyit Camtepe, Salil Kanhere

Category: Artificial Intelligence

2022-03-21

Existing integrity verification approaches for deep models are designed for private verification (i.e., they assume the service provider is honest and has white-box access to model parameters). However, private verification approaches do not allow model users to verify the model at run time. Instead, users must trust the service provider, who may tamper with the verification results. In contrast, a public verification approach that accounts for possibly dishonest service providers can benefit a wider range of users. In this paper, we propose PublicCheck, a practical public integrity verification solution for services of run-time deep models. PublicCheck considers dishonest service providers and addresses three challenges of public verification: being lightweight, providing anti-counterfeiting protection, and having fingerprinting samples that appear smooth. To capture and fingerprint the inherent prediction behaviors of a run-time model, PublicCheck generates smoothly transformed and augmented encysted samples that are enclosed around the model's decision boundary, while ensuring that the verification queries are indistinguishable from normal queries. PublicCheck is also applicable when knowledge of the target model is limited (e.g., with no knowledge of gradients or model parameters). A thorough evaluation of PublicCheck demonstrates a strong capability for detecting model integrity breaches (100% detection accuracy with fewer than 10 black-box API queries) against various model integrity attacks and model compression attacks. PublicCheck also demonstrates the smooth appearance, feasibility, and efficiency of generating a plethora of encysted samples for fingerprinting.
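
The verification side of this idea can be illustrated with a label-only check. As a minimal sketch, assuming fingerprint samples and the labels the original model gave them are already available (how PublicCheck generates encysted samples is not reproduced; verify_integrity is a hypothetical name), a user could query the black-box API and flag any mismatch:

```python
from typing import Any, Callable, Sequence


def verify_integrity(
    query_api: Callable[[Any], int],
    fingerprints: Sequence[Any],
    expected_labels: Sequence[int],
) -> bool:
    """Return True if the deployed model answers every fingerprint as recorded.

    query_api       : black-box call to the remote model (label only, no gradients).
    fingerprints    : samples assumed to lie close to the original decision boundary.
    expected_labels : labels the original model assigned to those samples.
    """
    if len(fingerprints) != len(expected_labels):
        raise ValueError("one expected label is required per fingerprint")
    # Samples near the boundary are sensitive: tampering or compression tends to
    # flip at least one of them, so any single mismatch is reported as a breach.
    return all(query_api(x) == y for x, y in zip(fingerprints, expected_labels))


if __name__ == "__main__":
    original = lambda x: 1 if x > 0.5 else 0   # toy 1-D "model"
    tampered = lambda x: 1 if x > 0.52 else 0  # boundary shifted slightly
    fps = [0.49, 0.51]                          # points straddling the boundary
    labels = [original(x) for x in fps]
    print(verify_integrity(original, fps, labels))  # True
    print(verify_integrity(tampered, fps, labels))  # False
```

The toy shows why boundary-adjacent queries are useful: a shift too small to change most predictions still flips the sample at 0.51.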

Is this IoT Device Likely to be Secure? Risk Score Prediction for IoT Devices Using Gradient Boosting Machines

Carlos A. Rivera A., Arash Shaghaghi, David D. Nguyen, Salil S. Kanhere

Category: Machine Learning

2021-11-23

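Security risk assessment and prediction are critical for organizations deploying Internet of Things (IoT) devices. The absolute minimum requirement for enterprises is to verify the security risk of IoT devices against the vulnerabilities reported in the National Vulnerability Database (NVD). This paper proposes a novel risk prediction for IoT devices based on publicly available information about them. Our solution offers enterprises of all sizes a simple and cost-effective way to predict the security risk of deploying new IoT devices. After an extensive analysis of NVD records from the past eight years, we created a unique, systematic, and balanced dataset of vulnerable IoT devices, including key technical features complemented by functional and descriptive features available from public resources. We then apply machine learning classification models such as Gradient Boosting Decision Trees (GBDT) to this dataset and achieve 71% prediction accuracy in classifying the severity of device vulnerability scores.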
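
The abstract does not say how vulnerability scores are binned into severity classes. As a minimal sketch, assuming NVD CVSS v3.x base scores and the standard CVSS v3.x qualitative rating bands (the paper may use a different binning; cvss_v3_severity is a hypothetical name), the target labels for a GBDT classifier could be derived like this:

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores lie in [0.0, 10.0]")
    if score == 0.0:
        return "None"
    if score < 4.0:      # 0.1 - 3.9
        return "Low"
    if score < 7.0:      # 4.0 - 6.9
        return "Medium"
    if score < 9.0:      # 7.0 - 8.9
        return "High"
    return "Critical"    # 9.0 - 10.0


if __name__ == "__main__":
    for s in (0.0, 3.9, 4.0, 7.5, 9.8):
        print(s, cvss_v3_severity(s))
```

A gradient-boosting classifier would then be trained to predict these labels from the device features, so the reported accuracy refers to choosing the right band rather than regressing the exact score.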

From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

Category: Computer Vision | Machine Learning

2022-11-11

Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited clinical utility because of their narrow focus on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered, since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. Specifically, we use two separate CXR datasets, one annotated for the presence of pneumonia and the other for the presence of pneumothorax (two common and life-threatening conditions), to train a single FL classification model (`global`) capable of diagnosing both. We compare the performance of the global FL model with models trained separately on each dataset (`baseline`) for two different model architectures. On a standard, naive 3-layer CNN architecture, the global FL model achieved an AUROC of 0.84 and 0.81 for pneumonia and pneumothorax, respectively, compared to 0.85 and 0.82, respectively, for the baseline models (p>0.05). Similarly, on a pretrained DenseNet121 architecture, the global FL model achieved an AUROC of 0.88 and 0.91 for pneumonia and pneumothorax, respectively, compared to 0.89 and 0.91, respectively, for the baseline models (p>0.05). Our results suggest that FL can be used to create global `meta` models that make toy datasets from Kaggle clinically useful, a step toward bridging the gap from bench to bedside.
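
The abstract does not name the aggregation rule used to build the global model. As a minimal sketch, assuming standard Federated Averaging (FedAvg), where each client trains locally and the server averages per-layer weights in proportion to local dataset size (fedavg is a hypothetical helper, and local training is omitted), one aggregation round looks like this:

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average per-layer weights from several clients, weighted by local dataset size.

    client_weights : list over clients; each entry is a list of numpy arrays (one per layer).
    client_sizes   : number of local training samples at each client.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one dataset size is required per client")
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]


if __name__ == "__main__":
    # Two hypothetical clients (e.g. a pneumonia site and a pneumothorax site).
    site_a = [np.array([1.0, 1.0]), np.array([0.0])]
    site_b = [np.array([3.0, 3.0]), np.array([4.0])]
    print(fedavg([site_a, site_b], [1, 3]))  # layer 0 -> [2.5, 2.5], layer 1 -> [3.0]
```

Only weights leave each site, which is what lets the two single-disease datasets contribute to one model without pooling images; how the two label sets are reconciled in the output head is not covered by this sketch.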

Improving GNSS Positioning using Neural Network-based Corrections

Ashwin V. Kanhere, Shubh Gupta, Akshay Shetty, Grace Gao

Category: Robotics

2021-10-18

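Deep neural networks (DNNs) are a promising tool for Global Navigation Satellite System (GNSS) positioning in the presence of multipath and non-line-of-sight errors, owing to their ability to model complex errors from data. However, developing DNNs for GNSS positioning presents various challenges, such as 1) poor numerical conditioning caused by large variations in measurement and position values across the globe, 2) a varying number and ordering of satellite measurements due to changing satellite visibility, and 3) overfitting to the available data. In this work, we address these challenges and propose an approach to GNSS positioning that applies DNN-based corrections to an initial position guess. Our DNN learns to output a position correction using pseudorange residuals and satellite line-of-sight vectors as inputs. The limited variation in these input and output values improves the numerical conditioning of our DNN. We design our DNN architecture to combine information from the available GNSS measurements, which vary in both number and order, by leveraging recent advances in set-based deep learning methods. Furthermore, we present a data augmentation strategy that reduces overfitting in the DNN by randomizing the initial position guesses. We first perform simulations and show an improvement over the initial positioning error when our DNN-based corrections are applied. We then demonstrate that our approach outperforms a WLS baseline on real-world data. Our implementation is available at github.com/stanford-navlab/deep_gnss.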
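
The set-based design and the augmentation step can be illustrated with a toy network. As a minimal sketch, assuming a DeepSets-style encoder (a shared per-satellite layer followed by mean pooling) with random, untrained weights, four inputs per satellite (one pseudorange residual plus a 3-D line-of-sight vector), and uniform noise for randomizing the initial guess (all names, sizes, and the noise distribution are hypothetical, not the authors' architecture):

```python
import numpy as np

_init = np.random.default_rng(0)
# Hypothetical, untrained weights; a real model would learn these from data.
W_PHI = _init.normal(size=(4, 16))  # per-satellite encoder: [residual, los_x, los_y, los_z] -> 16
W_RHO = _init.normal(size=(16, 3))  # decoder: pooled embedding -> 3-D position correction


def predict_correction(sat_inputs: np.ndarray) -> np.ndarray:
    """Map an (m, 4) array of per-satellite inputs to a 3-D position correction.

    The same weights handle any number m of satellites, and mean pooling
    makes the output independent of the order of the rows.
    """
    embedded = np.maximum(sat_inputs @ W_PHI, 0.0)  # phi: shared ReLU layer per satellite
    pooled = embedded.mean(axis=0)                  # symmetric pooling over the set
    return pooled @ W_RHO                           # rho: pooled features -> correction


def augment_initial_guess(true_pos: np.ndarray, scale_m: float,
                          gen: np.random.Generator) -> np.ndarray:
    """Data augmentation: draw a random initial guess around the true position."""
    return true_pos + gen.uniform(-scale_m, scale_m, size=3)


if __name__ == "__main__":
    gen = np.random.default_rng(1)
    guess = augment_initial_guess(np.zeros(3), 10.0, gen)
    sats = gen.normal(size=(7, 4))  # stand-in for real residuals and LOS vectors
    corrected = guess + predict_correction(sats)
    print(corrected.shape)  # (3,)
```

Predicting a correction to a nearby guess, rather than an absolute position, keeps both inputs and outputs small, which is the numerical-conditioning argument the abstract makes.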
