Smart Paper Notes

Streaming Machine Learning and Online Active Learning for Automated Visual Inspection

Jože M. Rožanec , Elena Trajkova , Paulien Dam , Blaž Fortuna , Dunja Mladenić

Categories: Computer Vision

2021-10-15

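Quality control is a key activity performed by manufacturing companies to verify that products conform to requirements and specifications. Standardized quality control ensures that all products are evaluated under the same criteria. The decreasing cost of sensors and connectivity has increased the digitalization of manufacturing and provided greater data availability. This data availability has spurred the development of artificial intelligence models, which allow higher degrees of automation and reduced bias when inspecting products. Furthermore, the increased speed of inspection reduces the overall cost and time required for defect inspection. In this research, we compare five streaming machine learning algorithms applied to visual defect inspection with real-world data provided by Philips Consumer Lifestyle BV. Furthermore, we compare them in a streaming active learning context, which reduces the data labeling effort in a real-world setting. Our results show that, for the worst case, active learning reduces the data labeling effort by almost 15% while keeping an acceptable classification performance. The use of machine learning models for automated visual inspection is expected to speed up quality inspection by up to 40%.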


Active Learning and Approximate Model Calibration for Automated Visual Inspection in Manufacturing

Jože M. Rožanec , Luka Bizjak , Elena Trajkova , Patrik Zajec , Jelle Keizer , Blaž Fortuna , Dunja Mladenić

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-09-12

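Quality control is a crucial activity performed by manufacturing enterprises to ensure that their products meet quality standards and to avoid potential damage to the brand's reputation. The decreasing cost of sensors and connectivity has increased the digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing the overall cost and time required for defect inspection. This research compares three active learning approaches (with single and multiple oracles) for visual inspection. We propose a novel approach to the probability calibration of classification models and two new metrics to assess the performance of the calibration without the need for ground truth. We performed experiments on real-world data provided by Philips Consumer Lifestyle BV. Our results show that, considering a threshold of p = 0.95, the explored active learning settings can reduce the data labeling effort by 3% to 4% without detriment to the overall quality goals. Furthermore, we show that the proposed metrics successfully capture relevant information that is otherwise available only to metrics that rely on ground truth data. Therefore, the proposed metrics can be used to estimate the quality of a model's probability calibration without committing to a labeling effort to obtain ground truth data.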


Synthetic Data Augmentation Using GAN For Improved Automated Visual Inspection

Jože M. Rožanec , Patrik Zajec , Spyros Theodoropoulos , Erik Koehorst , Blaž Fortuna , Dunja Mladenić

Categories: Computer Vision | Artificial Intelligence

2022-12-19

Quality control is a crucial activity performed by manufacturing companies to ensure their products conform to the requirements and specifications. The introduction of artificial intelligence models makes it possible to automate visual quality inspection, speeding up the inspection process and ensuring all products are evaluated under the same criteria. In this research, we compare supervised and unsupervised defect detection techniques and explore data augmentation techniques to mitigate the data imbalance in the context of automated visual inspection. Furthermore, we use Generative Adversarial Networks for data augmentation to enhance the classifiers' discriminative performance. Our results show that state-of-the-art unsupervised defect detection does not match the performance of supervised models but can be used to reduce the labeling workload by more than 50%. Furthermore, the best classification performance was achieved with GAN-based data generation, with AUC ROC scores equal to or higher than 0.9898, even when increasing the dataset imbalance by leaving only 25% of the images denoting defective products. We performed the research with real-world data provided by Philips Consumer Lifestyle BV.


Robust Anomaly Map Assisted Multiple Defect Detection with Supervised Classification Techniques

Jože M. Rožanec , Patrik Zajec , Spyros Theodoropoulos , Erik Koehorst , Blaž Fortuna , Dunja Mladenić

Categories: Computer Vision | Machine Learning

2022-12-19

Industry 4.0 aims to optimize the manufacturing environment by leveraging new technological advances, such as new sensing capabilities and artificial intelligence. The DRAEM technique has shown state-of-the-art performance for unsupervised classification. The ability to create anomaly maps highlighting areas where defects probably lie can be leveraged to provide cues to supervised classification models and enhance their performance. Our research shows that the best performance is achieved when training a defect detection model by providing an image and the corresponding anomaly map as input. Furthermore, such a setting provides consistent performance when framing the defect detection as a binary or multiclass classification problem and is not affected by class balancing policies. We performed the experiments on three datasets with real-world data provided by Philips Consumer Lifestyle BV.


An adaptive human-in-the-loop approach to emission detection of Additive Manufacturing processes and active learning with computer vision

Xiao Liu , Alan F. Smeaton , Alessandra Mileo

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-12

Recent developments in in-situ monitoring and process control in Additive Manufacturing (AM), also known as 3D printing, allow the collection of large amounts of emission data during the build process of the parts being manufactured. This data can be used as input into 3D and 2D representations of the 3D-printed parts. However, the analysis and use, as well as the characterization, of this data still remain a manual process. The aim of this paper is to propose an adaptive human-in-the-loop approach using Machine Learning techniques that automatically inspect and annotate the emission data generated during the AM process. More specifically, this paper looks at two scenarios: first, using convolutional neural networks (CNNs) to automatically inspect and classify emission data collected by in-situ monitoring and, second, applying Active Learning techniques to the developed classification model to construct a human-in-the-loop mechanism that accelerates the labeling of the emission data. The CNN-based approach relies on transfer learning and fine-tuning, which makes the approach applicable to other industrial image patterns. The adaptive nature of the approach is enabled by an uncertainty sampling strategy that automatically selects the samples to be presented to human experts for annotation.
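
The uncertainty sampling idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the least-confidence score, the function names, and the example probabilities are all assumptions chosen for illustration. The model's class probabilities are scored by how far the top class is from certainty, and the least certain samples are the ones handed to a human expert.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def select_for_annotation(predictions, k):
    """Return the ids of the k samples the model is least sure about.

    predictions: dict mapping a sample id to its list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: least_confidence(predictions[sample_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled images (not paper data).
predictions = {
    "img_a": [0.98, 0.02],
    "img_b": [0.55, 0.45],
    "img_c": [0.70, 0.30],
    "img_d": [0.51, 0.49],
}

queried = select_for_annotation(predictions, k=2)
print(queried)  # ['img_d', 'img_b']
```

Other uncertainty scores (margin between the top two classes, or predictive entropy) could be swapped in for `least_confidence` without changing the selection loop.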


Label-Efficient Interactive Time-Series Anomaly Detection

Hong Guo , Yujing Wang , Jieyu Zhang , Zhengjie Lin , Yunhai Tong , Lei Yang , Luoxing Xiong , Congrui Huang

Categories: Machine Learning | Artificial Intelligence

2022-12-30

Time-series anomaly detection is an important task and has been widely applied in the industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables the customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions as the time-series data is numerically continuous and difficult to be understood. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection by performing only a small amount of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both weak supervision and active learning areas. Also, the system has been tested in a real scenario in industry to show its practicality.


Machine Beats Machine: Machine Learning Models to Defend Against Adversarial Attacks

Jože M. Rožanec , Dimitrios Papamartzivanos , Entso Veliou , Theodora Anastasiou , Jelle Keizer , Blaž Fortuna , Dunja Mladenić

Categories: Machine Learning | Artificial Intelligence

2022-09-28

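We propose using a two-layered deployment of machine learning models to prevent adversarial attacks. The first layer determines whether the data was tampered with, while the second layer solves a domain-specific problem. We explore three sets of features and three dataset variations to train the machine learning models. Our results show that clustering algorithms achieved promising results. In particular, we consider that the best results were obtained by applying the DBSCAN algorithm to the structural similarity index measure computed between the images and a white reference image.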


Deep Active Learning for Computer Vision: Past and Future

Rinyoichi Takezoe , Xu Liu , Shunan Mao , Marco Tianyu Chen , Zhanpeng Feng , Shiliang Zhang , Xiaoyu Wang

Categories: Machine Learning | Computer Vision

2022-11-27

As an important data selection schema, active learning emerges as the essential component when iterating an Artificial Intelligence (AI) model. It becomes even more critical given the dominance of deep neural network based models, which are composed of a large number of parameters and data hungry, in application. Despite its indispensable role for developing AI models, research on active learning is not as intensive as other research directions. In this paper, we present a review of active learning through deep active learning approaches from the following perspectives: 1) technical advancements in active learning, 2) applications of active learning in computer vision, 3) industrial systems leveraging or with potential to leverage active learning for data iteration, 4) current limitations and future research directions. We expect this paper to clarify the significance of active learning in a modern AI model manufacturing process and to bring additional research attention to active learning. By addressing data automation challenges and coping with automated machine learning systems, active learning will facilitate democratization of AI technologies by boosting model production at scale.


An overview of active learning methods for insurance with fairness appreciation

Romuald Elie , Caroline Hillairet , François Hu , Marc Juillard

Categories: Machine Learning (Statistics) | Machine Learning

2021-12-17

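This paper addresses some of the challenges in adopting machine learning in insurance as model deployment becomes democratized. The first challenge is reducing the labeling effort (and hence focusing on data quality) with the help of active learning, a feedback loop between model inference and an oracle: since unlabeled data is usually abundant in insurance, active learning can become a significant asset in reducing labeling costs. To that end, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is the fairness of model inferences. We introduce and integrate a post-processing fairness method for multi-class tasks into this active learning framework to address both issues. Finally, numerical experiments on unfair datasets highlight that the proposed setup offers a good compromise between model accuracy and fairness.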


Stream-based active learning with linear models

Davide Cacciarelli , Murat Kulahci , John Sølve Tyssedal

Categories: Machine Learning (Statistics) | Machine Learning

2022-07-20

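The proliferation of automated data collection schemes and advances in sensor technology are increasing the amount of data we are able to monitor in real time. However, given the high annotation costs and the time required by quality inspections, data is often available in unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature, but most of the focus has been on the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or to discard the instance. The approach is inspired by optimal experimental design theory, and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.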
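
A stream-based query rule of this general kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it uses the scaled prediction variance x^T (λI + X^T X)^{-1} x of a ridge-regularized linear model as a stand-in informativeness measure, a fixed threshold, and a hypothetical oracle; the class name `StreamLearner`, the threshold value, and the example stream are all invented for the sketch.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class StreamLearner:
    """Linear learner that queries a label only for informative instances."""

    def __init__(self, dim, threshold, ridge=1.0):
        # p_inv holds the inverse of (ridge * I + sum of x x^T over queried points).
        self.p_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.xty = [0.0] * dim  # running sum of y * x over queried points
        self.threshold = threshold
        self.n_queries = 0

    def informativeness(self, x):
        # Scaled prediction variance at x: large where the model has seen little data.
        return dot(x, mat_vec(self.p_inv, x))

    def offer(self, x, oracle):
        """Decide immediately: query the label (True) or discard x (False)."""
        if self.informativeness(x) < self.threshold:
            return False
        y = oracle(x)
        self.n_queries += 1
        # Sherman-Morrison rank-one update of the inverse after adding x x^T.
        px = mat_vec(self.p_inv, x)
        denom = 1.0 + dot(x, px)
        d = len(x)
        self.p_inv = [[self.p_inv[i][j] - px[i] * px[j] / denom for j in range(d)]
                      for i in range(d)]
        self.xty = [acc + y * xi for acc, xi in zip(self.xty, x)]
        return True

    def weights(self):
        """Ridge regression coefficients from the queried points so far."""
        return mat_vec(self.p_inv, self.xty)


# Hypothetical oracle (stands in for a quality check) and a short stream.
def oracle(x):
    return 2.0 * x[0] - 1.0 * x[1]


learner = StreamLearner(dim=2, threshold=0.6)
stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
decisions = [learner.offer(x, oracle) for x in stream]
print(decisions)  # [True, False, True, False, True]
```

Repeated instances are discarded because the model's variance in that direction has already dropped below the threshold, while a point in a new direction still triggers a query.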


Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling

Łukasz Korycki , Bartosz Krawczyk

Categories: Machine Learning

2021-12-21

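Mining data streams poses a number of challenges, including the continuous and non-stationary nature of the data, the massive volume of information to be processed, and constraints on computational resources. While a number of supervised solutions to this problem have been proposed in the literature, most of them assume that access to the ground truth (in the form of class labels) is unlimited and that such information can be used instantly when updating the learning system. This is far from realistic, as one must consider the underlying cost of acquiring labels. Therefore, solutions that reduce the ground-truth requirements in streaming scenarios are needed. In this paper, we propose a novel framework for mining drifting data streams on a budget by combining information from active learning and self-labeling. We introduce several strategies that take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift. Such a hybrid approach allows for efficient exploration and exploitation of streaming data structures within realistic labeling budgets. Since our framework works as a wrapper, it can be applied with different learning algorithms. An experimental study, carried out on a diverse set of real-world data streams with various types of concept drift, demonstrates the usefulness of the proposed strategies when access to class labels is highly limited. The presented hybrid approach is especially feasible when one cannot increase the labeling budget or replace an inefficient classifier. We deliver a set of recommendations regarding the areas of applicability of our strategies.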


When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development

Nghia Duong-Trung , Stefan Born , Jong Woo Kim , Marie-Therese Schermeyer , Katharina Paulick , Maxim Borisyak , Ernesto Martinez , Mariano Nicolas Cruz-Bournazou , Thorben Werner , Randolf Scholz

Categories: Machine Learning

2022-09-02

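Machine learning (ML) has contributed significantly to the development of bioprocess engineering, but its application is still limited, hampering the enormous potential for bioprocess automation. ML for model-building automation can be seen as a way of introducing another level of abstraction that focuses human experts on the most cognitive tasks of bioprocess development. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the limitations of existing ML solutions when applied to biotechnology and biopharma. On the other hand, it is essential to identify the missing links so that ML and Artificial Intelligence (AI) solutions can be easily implemented in valuable solutions for the bio-community. We summarize ML implementations across several important bioprocess systems and raise two crucial challenges that remain the bottleneck of bioprocess automation and of reducing uncertainty in biotechnology development. There is no one-size-fits-all procedure; however, this review should help identify potential automation that combines the biotechnology and ML domains.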


Smart Active Sampling to enhance Quality Assurance Efficiency

Clemens Heistracher , Stefan Stricker , Pedro Casas , Daniel Schall , Jana Kemnitz

Categories: Machine Learning

2022-09-23

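We propose a new sampling strategy, called Smart Active Sampling, for quality inspections outside the production line. Based on the principles of active learning, a machine learning model decides which samples are sent to quality inspection. On the one hand, this minimizes the production of scrap parts because quality violations are detected earlier. On the other hand, quality inspection costs are reduced during smooth operation.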


ALRt: An Active Learning Framework for Irregularly Sampled Temporal Data

Ronald Moore , Rishikesan Kamaleswaran

Categories: Machine Learning

2022-12-13

Sepsis is a deadly condition affecting many patients in the hospital. Recent studies have shown that patients diagnosed with sepsis have significant mortality and morbidity, resulting from the body's dysfunctional host response to infection. Clinicians often rely on the use of Sequential Organ Failure Assessment (SOFA), Systemic Inflammatory Response Syndrome (SIRS), and the Modified Early Warning Score (MEWS) to identify early signs of clinical deterioration requiring further work-up and treatment. However, many of these tools are manually computed and were not designed for automated computation. There have been different methods used for developing sepsis onset models, but many of these models must be trained on a sufficient number of patient observations in order to form accurate sepsis predictions. Additionally, the accurate annotation of patients with sepsis is a major ongoing challenge. In this paper, we propose the use of Active Learning Recurrent Neural Networks (ALRts) for short temporal horizons to improve the prediction of irregularly sampled temporal events such as sepsis. We show that an active learning RNN model trained on limited data can form robust sepsis predictions comparable to models using the entire training dataset.


Practical Active Learning with Model Selection for Small Data

Maryam Pardakhti , Nila Mandal , Anson W. K. Ma , Qian Yang

Categories: Machine Learning

2021-12-21

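Active learning is of great interest for many practical applications, especially in industry and the physical sciences, where there is a strong need to minimize the number of costly experiments required to train predictive models. However, significant challenges remain for the adoption of active learning methods in many practical applications. One important challenge is that many methods assume a fixed model, where the model hyperparameters are chosen a priori. In practice, a good model is rarely known in advance. Existing methods for active learning with model selection typically depend on a medium-sized labeling budget. In this work, we focus on the case of a very small labeling budget, on the order of a few dozen data points, and develop a simple and fast method for practical active learning with model selection. Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel. First, we show empirically that our method finds hyperparameters that lead to the best performance compared to an oracle model on less separable, difficult-to-classify datasets, and to reasonable performance on datasets that are more separable and easier to classify. Then, we demonstrate that our model selection method can be refined with a weighted approach that trades off optimal performance on easy-to-classify datasets against difficult-to-classify datasets, and that this trade-off can be tuned based on prior domain knowledge about the dataset.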


Deep Active Learning Using Barlow Twins

Jaya Krishna Mandivarapu , Blake Camp , Rolando Estrada

Categories: Computer Vision | Artificial Intelligence

2022-12-30

The generalisation performance of a convolutional neural network (CNN) is largely determined by the quantity, quality, and diversity of the training images. All the training data needs to be annotated beforehand, yet in many real-world applications data is easy to acquire but expensive and time-consuming to label. The goal of active learning is to draw the most informative samples from the unlabeled pool, which can then be used for training after annotation. With an entirely different objective, self-supervised learning (SSL) has been gaining meteoric popularity by closing the performance gap with supervised methods on large computer vision benchmarks. SSL methods have been shown to produce low-level representations that are invariant to distortions of the input sample and can encode invariance to artificially created distortions, e.g. rotation, solarization, and cropping. SSL approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of active learning using the self-supervised learning manifold and propose Deep Active Learning using Barlow Twins (DALBT), an active learning method for all datasets that combines a classifier trained along with the self-supervised loss framework of Barlow Twins, in a setting where the model can encode the invariance of artificially created distortions, e.g. rotation, solarization, and cropping.


Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data

Yuqicheng Zhu , Mohamed-Ali Tnani , Timo Jahnz , Klaus Diepold

Categories: Machine Learning

2022-09-28

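The paucity of labeled data is a typical challenge in the automotive industry. Annotating time-series measurements requires solid domain knowledge and in-depth exploratory data analysis, which implies a high labeling effort. Conventional Active Learning (AL) addresses this issue by actively querying the most informative instances based on the estimated classification probability and retraining the model iteratively. However, the learning efficiency relies strongly on the initial model, resulting in a trade-off between the size of the initial dataset and the number of queries. This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off by incorporating a Prototypical Network (ProtoNet) into the AL iterations. The results show an improvement, on the one hand, in robustness to the initial model and, on the other hand, in the learning efficiency of the ProtoNet through the active selection of the support set in each iteration. The framework was validated on the UCI HAR/HAPT dataset and a real-world braking maneuver dataset. The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.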


Improving Probabilistic Models in Text Classification via Active Learning

Mitchell Bosley , Saki Kuzushima , Ted Enamorado , Yuki Shiraito

Categories: Natural Language Processing

2022-02-05

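Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool, since it requires less human coding. However, scholars still need many human-labeled documents to train automated classifiers. To reduce labeling costs, we propose a new algorithm for text classification that combines a probabilistic model with active learning. The probabilistic model uses both labeled and unlabeled data, and active learning concentrates labeling efforts on documents that are difficult to classify. Our validation study shows that the classification performance of our algorithm is comparable to state-of-the-art methods at a fraction of the computational cost. Moreover, we replicate two recently published articles and reach the same substantive conclusions with only a small proportion of the original labeled data used in those studies. We provide activeText, an open-source software package that implements our method.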


Learning under Concept Drift: A Review

Jie Lu , Anjin Liu , Fan Dong , Feng Gu , Joao Gama , Guangquan Zhang

Categories:

2020-04-13

Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, it is necessary that a high quality, instructive review of current research developments and trends in the concept drift field is conducted. In addition, due to the rapid development of concept drift in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in literature. This paper reviews over 130 high quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aiming at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.


A Survey of Deep Active Learning

Pengzhen Ren , Yun Xiao , Xiaojun Chang , Po-Yao Huang , Zhihui Li , Brij B. Gupta , Xiaojiang Chen , Xin Wang

Categories: Machine Learning | Machine Learning (Statistics)

2020-08-30

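Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest samples. Deep learning (DL) is greedy for data and requires a large supply of data to optimize a massive number of parameters, so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we have entered an era of information torrents with massive amounts of data. As a result, DL has attracted strong interest from researchers and has developed rapidly. Compared with DL, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL could hardly demonstrate the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, acquiring large, high-quality annotated datasets consumes a great deal of manpower, which is not feasible in fields that require high expertise, such as speech recognition, information extraction, and medical imaging. Therefore, AL has gradually received the attention it is due. A natural question is whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. Deep active learning (DAL) has emerged as a result. Although the related research is quite abundant, a comprehensive survey of DAL is lacking. This article fills that gap: we provide a formal classification method for the existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from the perspective of applications. Finally, we discuss the confusion and open problems in DAL and suggest some possible directions for its development.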
