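This paper presents a crash frequency data augmentation method based on conditional generative adversarial networks (CGAN) to improve crash frequency models. The proposed method is evaluated by comparing the performance of Base safety performance functions (SPFs), developed using the original data, with that of Augmented SPFs, developed using the original data plus synthesised data, in terms of hotspot identification performance, model prediction accuracy, and dispersion parameter estimation accuracy. Experiments are conducted on simulated and real-world crash data sets. The results indicate that the crash data synthesised by the CGAN have the same distribution as the original data, and that the Augmented SPFs outperform the Base SPFs in almost all aspects, especially when the dispersion parameter is low.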
translated by 谷歌翻译
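In this paper, a new non-parametric empirical Bayes approach called CGAN-EB is proposed for approximating empirical Bayes (EB) estimates at traffic locations (e.g., road segments). The approach benefits from the modelling advantages of deep neural networks, and its performance is compared in a simulation study with the traditional approach based on the negative binomial model (NB-EB). NB-EB uses a negative binomial model to model the crash data and is the most common approach in practice. To model the crash data in the proposed CGAN-EB, a conditional generative adversarial network is used, a powerful deep-neural-network-based method that can model any type of distribution. A number of simulation experiments are designed and conducted to evaluate the performance of CGAN-EB under different conditions and to compare it with NB-EB. The results show that CGAN-EB performs as well as NB-EB when conditions favour the NB-EB model (i.e., the data conform to the assumptions of the NB model), and that it outperforms NB-EB in experiments reflecting conditions frequently encountered in practice, specifically low sample means and crash frequencies that do not follow a log-linear relationship with the covariates.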
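For orientation, the NB-EB baseline mentioned in the abstract above is commonly written as a weighted average of the model prediction and the observed count. The sketch below assumes the negative binomial variance form Var = mu + alpha * mu^2, where alpha is the dispersion parameter. The function name and the example values are illustrative and are not taken from the paper.

```python
def nb_eb_estimate(mu: float, observed: float, alpha: float) -> float:
    """Empirical Bayes estimate of expected crash frequency at one site.

    mu       -- crash frequency predicted by the NB model for the site
    observed -- crash count actually observed at the site
    alpha    -- NB dispersion parameter (assumed Var = mu + alpha * mu**2)
    """
    if mu < 0 or alpha < 0:
        raise ValueError("mu and alpha must be non-negative")
    # Weight on the model prediction; it shrinks as dispersion or mu grows,
    # so noisy model predictions lean more on the observed count.
    weight = 1.0 / (1.0 + alpha * mu)
    return weight * mu + (1.0 - weight) * observed


# Hypothetical site: the model predicts 2 crashes, 5 were observed.
estimate = nb_eb_estimate(mu=2.0, observed=5.0, alpha=0.5)
```

With alpha = 0 the weight is 1 and the estimate reduces to the model prediction, so a lower dispersion parameter means more trust in the model and less in the observed count.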
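The empirical Bayes (EB) method based on parametric statistical models such as the negative binomial (NB) has been widely used for ranking sites in the road network safety screening process. This paper presents a study of a novel non-parametric EB method, based on conditional generative adversarial networks (CGAN), for modelling crash frequency data. Unlike parametric approaches, the proposed CGAN-EB requires no pre-specified underlying relationship between the dependent and independent variables, and it can model any type of distribution. The proposed methodology is now applied to a real-world data set collected for road segments in Washington State from 2012 to 2017. The performance of CGAN-EB in terms of model fit, predictive performance, and network screening outcomes is compared with the conventional approach (NB-EB) as a benchmark. The results indicate that the proposed CGAN-EB approach outperforms NB-EB in terms of prediction power and hotspot identification tests.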
Classification using supervised learning requires annotating a large amount of class-balanced data for model training and testing. This has practically limited the scope of applications of supervised learning, in particular deep learning. To address the issues associated with limited and imbalanced data, this paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN), in which a conditional generative adversarial network (CGAN) is trained alongside the classifier and supplements the annotated data with semantics-conditioned, confidence-aware synthesized examples during the training process. In this setting, the CGAN not only serves as a co-supervisor but also provides complementary quality examples to aid the classifier training in an end-to-end fashion. Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN (EC-GAN) and a baseline ResNet-18 classifier. For the comparison, all classifiers in the above methods adopt the ResNet-18 architecture as the backbone. In particular, for the Street View House Numbers dataset, using 5% of the training data, SEC-CGAN achieves a test accuracy of 90.26%, as opposed to 88.59% for EC-GAN and 87.17% for the baseline classifier; for the highway image dataset, using 10% of the training data, SEC-CGAN achieves a test accuracy of 98.27%, compared to 97.84% for EC-GAN and 95.52% for the baseline classifier.
In data-driven systems, data exploration is imperative for making real-time decisions. However, big data is stored in massive databases from which it is difficult to retrieve. Approximate Query Processing (AQP) is a technique for providing approximate answers to aggregate queries based on a summary of the data (a synopsis) that closely replicates the behavior of the actual data. It is useful where an approximate answer, obtained in a fraction of the real execution time, would be acceptable. In this paper, we discuss the use of Generative Adversarial Networks (GANs) for generating tabular data that can be employed in AQP for synopsis construction. We first discuss the challenges associated with constructing synopses in relational databases and then introduce solutions to those challenges. Following that, we organize statistical metrics to evaluate the quality of the generated synopses. We conclude that the complexity of tabular data makes it difficult for algorithms to understand relational database semantics during training, and that improved versions of tabular GANs are capable of constructing synopses to revolutionize data-driven decision-making systems.
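The idea of answering aggregate queries from a synopsis can be illustrated with a short sketch. The abstract above builds the synopsis with a tabular GAN; here a plain uniform random sample stands in for it, which is a deliberate simplification. All function names and the toy table are hypothetical.

```python
import random


def build_synopsis(rows, k, seed=0):
    """Draw a uniform random sample of k rows to serve as the synopsis."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def approx_sum(synopsis, total_rows, key):
    """Estimate SUM over the full table by scaling the synopsis sum."""
    scale = total_rows / len(synopsis)
    return scale * sum(key(row) for row in synopsis)


def approx_avg(synopsis, key):
    """Estimate AVG over the full table by the synopsis mean."""
    return sum(key(row) for row in synopsis) / len(synopsis)


# Toy table with a single numeric column.
rows = [{"amount": float(i % 100)} for i in range(10_000)]
synopsis = build_synopsis(rows, k=500)

exact = sum(row["amount"] for row in rows)
estimate = approx_sum(synopsis, len(rows), key=lambda row: row["amount"])
relative_error = abs(estimate - exact) / exact
```

The query touches 500 rows instead of 10,000. A generated synopsis would be used the same way, with the added question of how faithfully it reproduces the original distribution.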
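Although generative adversarial networks (GANs) have achieved remarkable success on text, images, and videos, generating high-quality tabular data is still under development owing to some unique challenges, such as capturing dependencies in imbalanced data and optimizing the quality of synthetic patient data while preserving privacy. In this paper, we propose DP-CGAN, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and network training, to generate realistic and privacy-preserving tabular data. DP-CGAN distinguishes categorical and continuous variables and transforms them to the latent space separately. We then construct a conditional vector as an additional input, both to represent the minority class in imbalanced data and to capture the dependencies between variables. We inject statistical noise into the gradients during the network training process of DP-CGAN to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependencies between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world datasets, such as imbalanced variables, abnormal distributions, and data sparsity.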
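The abstract above states only that statistical noise is injected into the gradients during training. A minimal sketch of one common way to do this, per-example clipping followed by Gaussian noise as in DP-SGD, is shown below. Whether DP-CGAN uses exactly this recipe is an assumption, and the function name, parameters, and values are illustrative. Privacy accounting is omitted.

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient from per-example gradients.

    per_example_grads -- array of shape (batch_size, n_params)
    clip_norm         -- maximum L2 norm allowed for each example's gradient
    noise_multiplier  -- Gaussian noise std, in units of clip_norm
    rng               -- numpy.random.Generator used to draw the noise
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # Scale down any gradient whose L2 norm exceeds clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Add Gaussian noise to the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)


rng = np.random.default_rng(0)
batch_grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # toy per-example gradients
noisy_grad = privatize_gradients(batch_grads, clip_norm=1.0,
                                 noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single record can move the update, which is what lets the added noise translate into a formal privacy guarantee.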
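Data is commonly stored in tabular format. Several fields of research (e.g., biomedicine and fault/fraud detection) are prone to small, imbalanced tabular data. Supervised machine learning on such data is often difficult because of class imbalance, which adds further to the challenge. Synthetic data generation, i.e., oversampling, is a common remedy used to improve classifier performance. State-of-the-art linear interpolation approaches, such as LoRAS and ProWRAS, can be used to generate synthetic samples from the convex space of the minority class to improve classifier performance in such cases. Generative adversarial networks (GANs) are common deep learning approaches for synthetic sample generation. Although GANs are widely used for synthetic image generation, their scope on tabular data in the context of imbalanced classification has not been adequately explored. In this article, we show that, for imbalanced classification problems on small tabular datasets, existing deep generative models perform poorly compared with linear interpolation approaches that generate synthetic samples from the convex space of the minority class. We propose a deep generative model, ConvGeN, that combines the ideas of convex space learning and deep generative models. ConvGeN learns the coefficients for the convex combinations of the minority class samples, such that the synthetic data is sufficiently distinct from the majority class. We demonstrate that our proposed model ConvGeN improves imbalanced classification on such small datasets compared with existing deep generative models, while being on par with existing linear interpolation approaches. Moreover, we discuss how our model can be used for synthetic tabular data generation in general, even beyond the scope of data imbalance, and thus improves the overall applicability of convex space learning.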
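Sampling from the convex space of the minority class can be sketched in a few lines. The version below draws the convex coefficients at random from a Dirichlet distribution, whereas the abstract above describes learning those coefficients. It is therefore neither ConvGeN nor LoRAS, only an illustration of the underlying convex-combination step. The function name and parameters are hypothetical.

```python
import numpy as np


def convex_oversample(minority, n_new, neighborhood_size, rng):
    """Generate synthetic points as random convex combinations of minority samples.

    minority          -- array of shape (n_samples, n_features)
    n_new             -- number of synthetic samples to generate
    neighborhood_size -- how many minority samples are mixed per synthetic point
    rng               -- numpy.random.Generator
    """
    minority = np.asarray(minority, dtype=float)
    if neighborhood_size > len(minority):
        raise ValueError("neighborhood_size exceeds the number of minority samples")
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        # Pick a random subset of minority samples (no nearest-neighbor search here).
        idx = rng.choice(len(minority), size=neighborhood_size, replace=False)
        # Dirichlet weights are non-negative and sum to 1: a convex combination.
        weights = rng.dirichlet(np.ones(neighborhood_size))
        synthetic[i] = weights @ minority[idx]
    return synthetic


rng = np.random.default_rng(0)
minority_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = convex_oversample(minority_points, n_new=20,
                               neighborhood_size=3, rng=rng)
```

Because the weights are non-negative and sum to one, every synthetic point lies inside the convex hull of the chosen minority samples.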
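The amount of credit card fraud has been growing as technology advances and people find ways to exploit it. It is therefore very important to implement a robust and effective method for detecting such fraud. Machine learning algorithms are well suited to these tasks because they try to maximize the accuracy of predictions and can therefore be relied upon. However, machine learning models have an inherent flaw: they may perform poorly when the class distribution within the sample set is imbalanced. In many related tasks, the datasets contain very few observed fraud cases (sometimes only about 1% positive fraud instances). This imbalance can affect the behavior of any learning model by leading it to predict every label as the majority class, leaving no scope for generalization in the model's predictions. We trained a generative adversarial network (GAN) to produce a large number of convincing (and reliable) synthetic examples of the minority class, which can be used to alleviate the class imbalance within the training set and thus learn from the data more effectively.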
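The effect of class imbalance described above can be made concrete with a small worked example. It assumes a hypothetical dataset of 1,000 transactions with 1% fraud and a degenerate model that labels everything as the majority class. The numbers are illustrative only.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the positive class) for binary labels."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# 10 fraud cases (label 1) among 1,000 transactions.
y_true = [1] * 10 + [0] * 990
# A model that always predicts the majority class.
y_pred = [0] * 1000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

The model reaches 99% accuracy while detecting none of the fraud cases. This is why accuracy alone is misleading on such data and why rebalancing the training set with synthetic minority examples is attractive.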
Electronic Health Records (EHRs) are a valuable asset for facilitating clinical research and point-of-care applications; however, many challenges, such as data privacy concerns, impede their optimal utilization. Deep generative models, particularly Generative Adversarial Networks (GANs), show great promise in generating synthetic EHR data by learning the underlying data distributions while achieving excellent performance and addressing these challenges. This work reviews the major developments in various applications of GANs for EHRs and provides an overview of the proposed methodologies. For this purpose, we combine perspectives from healthcare applications and machine learning techniques in terms of source datasets and the fidelity and privacy evaluation of the generated synthetic datasets. We also compile a list of the metrics and datasets used by the reviewed works, which can be used as benchmarks for future research in the field. We conclude by discussing challenges in the development of GANs for EHRs and proposing recommended practices. We hope that this work motivates novel research directions at the intersection of healthcare and machine learning.
Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source, such as the biometric measurements of a medical patient. This leads to complex dynamical patterns between the individual time series that are hard for typical generative models such as GANs to learn. Those patterns contain valuable information that machine learning models can use to better classify, predict, or perform other downstream tasks. We propose a novel framework that takes the common origin of the time series into account and favors the preservation of channel/feature relationships. The two key points of our method are: 1) the individual time series are generated from a common point in latent space, and 2) a central discriminator favors the preservation of inter-channel/feature dynamics. We demonstrate empirically that our method helps preserve channel/feature correlations and that our synthetic data performs very well in downstream tasks with medical and financial data.
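A "trajectory" is the trace produced by a moving object in geographic space, usually represented by a series of chronologically ordered points, where each point consists of a set of geospatial coordinates and a timestamp. Rapid advances in location sensing and wireless communication technologies have made it possible to collect and store massive amounts of trajectory data. As a result, many researchers use trajectory data to analyze the mobility of various moving objects. This work focuses on "urban vehicle trajectories", i.e., the trajectories of vehicles in urban traffic networks, and on "urban vehicle trajectory analytics". Urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks, including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and many previous researchers have used various methods to understand this structure. In particular, deep learning models have attracted the attention of many researchers because of their powerful function approximation and feature representation abilities. The objective of this work is therefore to develop deep-learning-based models for urban vehicle trajectory analytics to better understand the mobility patterns of urban traffic networks. In particular, it focuses on two research topics of high necessity, importance, and applicability: next location prediction and synthetic trajectory generation. In this study, we propose various novel deep learning models for urban vehicle trajectory analytics.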
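The detection of exoplanets opens the door to the discovery of new habitable worlds and helps us understand how planets form. With the aim of finding Earth-like habitable planets, NASA launched the Kepler space telescope and its follow-up mission K2. Advances in observational capabilities have increased the volume of fresh data available for research, and processing it manually is both time-consuming and difficult. Machine learning and deep learning techniques can greatly reduce the human effort needed to process, economically and without bias, the vast amounts of data produced by the modern instruments of these exoplanet programs. However, care should be taken to detect all exoplanets accurately while minimizing the misclassification of non-exoplanet stars. In this paper, we use two variants of generative adversarial networks, namely the semi-supervised generative adversarial network and the auxiliary classifier generative adversarial network, to detect transiting exoplanets in K2 data. We find that these models can help classify stars that host exoplanets. Both techniques classify the light curves with a recall and precision of 1.00 on the test data. Our semi-supervised technique helps address the cumbersome task of creating a labeled dataset.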
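This paper proposes two important contributions for conditional generative adversarial networks (cGANs) to improve the wide variety of applications that exploit this architecture. The first main contribution is an analysis of cGANs showing that they are not explicitly conditional. In particular, it is shown that the discriminator, and consequently the cGAN, does not automatically learn the conditionality between inputs. The second contribution is a new method, called a contrario cGAN, that explicitly models conditionality for both parts of the adversarial architecture via a novel a contrario loss, which involves training the discriminator to learn unconditional (adverse) examples. This leads to a novel type of data augmentation approach for GANs (a contrario learning), which makes it possible to restrict the search space of the generator to conditional outputs using adverse examples. Extensive experiments are carried out to evaluate the conditionality of the discriminator by proposing a probability distribution analysis. Comparisons with the cGAN architecture for different applications show significant performance improvements on well-known datasets, covering semantic image synthesis, image segmentation, monocular depth prediction, and "single label"-to-image, using different metrics including the Fréchet Inception Distance (FID), mean Intersection over Union (mIoU), Root Mean Square Error log (RMSE log), and Number of statistically-Different Bins (NDB).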
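In this paper, we present a generative adversarial network (GAN) machine learning model that interpolates irregularly distributed measurements across the spatial domain to construct a smooth radio frequency map (RFMap), and then performs localization using a deep neural network. Monitoring the wireless spectrum over the spatial, temporal, and frequency domains will become a critical feature in facilitating dynamic spectrum access (DSA) in beyond-5G and 6G communication technologies. Localization, wireless signal detection, and spectrum policy-making are several of the applications in which distributed spectrum sensing will play a significant role. Detecting and positioning wireless emitters is a very challenging task over a large spectral and spatial area. Constructing a smooth RFMap database requires a large number of measurements, which can be very expensive and time-consuming. One approach to help realize these systems is to collect a finite number of localized measurements across a given area and then interpolate the measurement values to construct the database. Current methods in the literature employ channel modeling to construct the radio frequency map, which lacks the granularity needed for accurate localization, whereas our proposed approach reconstructs a new generalized RFMap. Localization results are presented and compared with those of conventional channel models.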
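Designing models that produce accurate predictions is a fundamental objective of machine learning. This work presents methods demonstrating that, when the derivatives of target variables with respect to the inputs can be extracted from the process of interest, they can be leveraged to improve the accuracy of differentiable machine learning models. Four key ideas are explored: (1) improving the predictive accuracy of linear regression models and feed-forward neural networks (NNs); (2) using the difference in performance between feed-forward NNs trained with and without gradient information to tune NN complexity (in the form of the number of hidden nodes); (3) using gradient information to regularize linear regression; and (4) using gradient information to improve generative image models. Across these applications, gradient information is shown to enhance each predictive model, demonstrating its value for a variety of applications.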
Modeling lies at the core of both the financial and the insurance industry for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often come with the requirement of large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible way to alleviate this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such class of models, and they increase our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, such as economics and insurance. The reason is that in these fields most questions are inherently about the identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the synthetic data they produce can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data produced by a GAN, with increasingly lenient assumptions. We consider the cross-sectional case, the time series case, and the case with a complete structural model. It is shown that in the simple cross-sectional scenario, where correlation equals causation, the GAN preserves causality, but that challenges arise for more advanced analyses.
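We present a mathematically well-founded approach to the synthetic modeling of turbulent flows using generative adversarial networks (GANs). Based on an analysis of chaotic, deterministic systems in terms of ergodicity, we outline a mathematical proof that a GAN can actually learn to sample state snapshots from the invariant measure of the chaotic system. Building on this analysis, we study a hierarchy of chaotic systems, starting with the Lorenz attractor and then moving on to the modeling of turbulent flows with GANs. As training data, we use fields of velocity fluctuations obtained from large eddy simulations (LES). Two architectures are investigated in detail. We use a deep convolutional GAN (DCGAN) to synthesize the turbulent flow around a cylinder. We also simulate the flow around a low-pressure turbine stator using the pix2pixHD architecture for a conditional DCGAN that is conditioned on the position of a rotating wake in front of the stator. The settings of adversarial training and the effects of using specific GAN architectures are explained. We thereby show that GANs are efficient at simulating turbulence in technically challenging flow problems on the basis of the training data. Compared with classical numerical methods, in particular LES, GAN training and inference times are significantly shorter, while turbulent flows are still provided at high resolution.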
In recent years, applying deep learning (DL) to assess structural damage has gained growing popularity in vision-based structural health monitoring (SHM). However, both data deficiency and class imbalance hinder the wide adoption of DL in practical applications of SHM. Common mitigation strategies include transfer learning, over-sampling, and under-sampling, yet these ad hoc methods provide only a limited performance boost that varies from one case to another. In this work, we introduce a variant of the Generative Adversarial Network (GAN), named the balanced semi-supervised GAN (BSS-GAN). It adopts the semi-supervised learning concept and applies balanced-batch sampling in training to resolve low-data and imbalanced-class problems. A series of computer experiments on concrete cracking and spalling classification were conducted under the low-data, imbalanced-class regime with limited computing power. The results show that the BSS-GAN achieves better damage detection in terms of recall and $F_\beta$ score than other conventional methods, indicating its state-of-the-art performance.
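For readers unfamiliar with the $F_\beta$ score used in the abstract above, the sketch below computes it from confusion-matrix counts using the standard definition. The function name and the example counts are illustrative and do not come from the paper.

```python
def f_beta(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts.

    beta > 1 weights recall more heavily than precision;
    beta < 1 weights precision more heavily; beta = 1 gives the F1 score.
    """
    beta_sq = beta * beta
    denominator = (1 + beta_sq) * true_pos + beta_sq * false_neg + false_pos
    if denominator == 0:
        return 0.0  # no positives predicted or present
    return (1 + beta_sq) * true_pos / denominator


# Hypothetical damage detector: 8 hits, 2 false alarms, 8 missed damages.
f1 = f_beta(true_pos=8, false_pos=2, false_neg=8, beta=1.0)
f2 = f_beta(true_pos=8, false_pos=2, false_neg=8, beta=2.0)
```

Choosing beta greater than 1 penalizes missed damage more than false alarms, which matches the abstract's emphasis on recall in an imbalanced setting.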
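Bearings are one of the vital components of rotating machines and are prone to unexpected faults. Bearing diagnosis and condition monitoring are therefore essential for reducing operating costs and downtime in many industries. Under various production conditions, bearings can operate across a range of loads and speeds, which produces different vibration patterns for each fault type. Normal data are plentiful because systems usually work under the desired conditions. Fault data, on the other hand, are rare, and in many conditions no data are recorded for the fault classes. Access to fault data is crucial for developing data-driven fault diagnosis tools that can improve both the performance and the safety of operations. To this end, a novel algorithm based on conditional generative adversarial networks (CGANs) is introduced. Trained on the normal and fault data of any actual fault condition, the algorithm generates fault data from the normal data of target conditions. The proposed method is validated on a real-world dataset, and fault data are generated for different conditions. Several state-of-the-art classifiers and visualization models are implemented to evaluate the quality of the synthesized data. The results demonstrate the efficacy of the proposed algorithm.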
Supervised classification methods have been widely used for quality assurance in advanced manufacturing processes, such as anomaly (defect) detection in additive manufacturing (AM). However, since abnormal states (with defects) occur much less frequently than normal ones (without defects) in the manufacturing process, the number of sensor data samples collected from a normal state far exceeds that from an abnormal state. This produces imbalanced training data for classification models and thus degrades their performance in detecting abnormal states in the process. It is beneficial to generate effective artificial sample data for the abnormal states to obtain a more balanced training set. To achieve this goal, this paper proposes a novel data augmentation method based on a generative adversarial network (GAN) using image sensor data from the additive manufacturing process. The novelty of our approach is that a standard GAN and a classifier are jointly optimized, with techniques to stabilize the learning process of the standard GAN. The diverse and high-quality generated samples provide balanced training data to the classifier. The iterative optimization between the GAN and the classifier yields a high-performance classifier. The effectiveness of the proposed method is validated on both open-source data and real-world case studies in polymer and metal AM processes.