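The execution of processes leaves traces of event data in information systems. These event data can be analyzed with process mining techniques. Traditional process mining techniques require each event to be associated with exactly one object, e.g., a customer of the company. The events related to one object form an event sequence called a case. A case describes an end-to-end run through the process. The cases contained in event data can be used to discover process models, detect frequent bottlenecks, or learn predictive models. However, events encountered in real-life information systems, e.g., ERP systems, can often be associated with multiple objects. The traditional sequential case concept falls short for such object-centric event data, because these data exhibit a graph structure. One might force object-centric event data into the traditional case concept by flattening it. However, flattening manipulates the data and removes information. A concept analogous to the case concept of traditional event logs is therefore necessary to apply different process mining tasks to object-centric event data. In this paper, we introduce the case concept for object-centric process mining: process executions. These are graph-based generalizations of cases as considered in traditional process mining. Furthermore, we provide techniques to extract process executions. Based on these executions, we determine equivalent process behavior with respect to an attribute using graph isomorphism. Process executions that are equivalent with respect to the event activity are object-centric variants, i.e., a generalization of variants in traditional process mining. We provide a visualization technique for object-centric variants. The scalability and efficiency of the contributions are extensively evaluated. Furthermore, we provide a case study showing the most frequent object-centric variants of a real-life event log.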
Translated by Google Translate
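The information loss caused by flattening, as described in the first abstract, can be illustrated with a toy example. This is a minimal sketch, not the authors' extraction technique: the event structure, the object types (`order`, `item`), and the `flatten` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical object-centric events, assumed to be in timestamp order.
# Each event can reference several objects of several types.
events = [
    {"id": "e1", "activity": "place order", "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "ship order", "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
]


def flatten(events, object_type):
    """Build one activity sequence (a traditional case) per object of one type."""
    cases = defaultdict(list)
    for event in events:
        for obj in event["objects"].get(object_type, []):
            cases[obj].append(event["activity"])
    return dict(cases)


by_order = flatten(events, "order")  # the "pick item" events disappear
by_item = flatten(events, "item")    # shared events are duplicated per item
```

Flattening on `order` drops both `pick item` events, while flattening on `item` copies `place order` and `ship order` into two cases, so four events become six case entries. Neither view preserves the graph structure that links the order to its items.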
Many business workflows require extracting important fields from form-like documents (e.g. bank statements, bills of lading, purchase orders, etc.). Recent techniques for automating this task work well only when trained with large datasets. In this work we propose a novel data augmentation technique to improve performance when training data is scarce, e.g. 10-250 documents. Our technique, which we call FieldSwap, works by swapping out the key phrases of a source field with the key phrases of a target field to generate new synthetic examples of the target field for use in training. We demonstrate that this approach can yield 1-7 F1 point improvements in extraction performance.
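The key-phrase swap behind FieldSwap can be sketched at the string level. This is a simplified illustration under stated assumptions: real form-like documents carry tokens with layout information, and the `field_swap` function, the phrase lists, and the `due_date` label are hypothetical, not taken from the paper.

```python
def field_swap(text, source_phrases, target_phrases, target_label):
    """Create synthetic target-field examples by replacing a source key phrase.

    Every source key phrase found in `text` is replaced by each target key
    phrase in turn, and the result is labeled as the target field.
    """
    synthetic = []
    for src in source_phrases:
        if src not in text:
            continue
        for tgt in target_phrases:
            synthetic.append({"text": text.replace(src, tgt), "label": target_label})
    return synthetic


# Hypothetical source example for an "invoice date" field.
example = "Invoice Date: 2021-03-04"
augmented = field_swap(example, ["Invoice Date"], ["Due Date", "Payment Due"], "due_date")
```

Here one labeled source example yields two synthetic examples of the target field, which is the sense in which the swap enlarges a small training set.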
Image-text multimodal representation learning aligns data across modalities and enables important medical applications, e.g., image classification, visual grounding, and cross-modal retrieval. In this work, we establish a connection between multimodal representation learning and multiple instance learning. Based on this connection, we propose a generic framework for constructing permutation-invariant score functions with many existing multimodal representation learning approaches as special cases. Furthermore, we use the framework to derive a novel contrastive learning approach and demonstrate that our method achieves state-of-the-art results on a number of downstream tasks.
Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication "language" through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will incur a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.
Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when aircraft fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited by several factors, primarily a lack of quality-labeled data. Recently, a large human-labeled Landsat-8 contrails dataset was proposed. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. Our work can also be used as an open benchmark for contrail segmentation and is publicly available.
As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals. While such recourses can be immensely beneficial to affected individuals, potential adversaries could also exploit these recourses to compromise privacy. In this work, we make the first attempt at investigating if and how an adversary can leverage recourses to infer private information about the underlying model's training data. To this end, we propose a series of novel membership inference attacks which leverage algorithmic recourse. More specifically, we extend the prior literature on membership inference attacks to the recourse setting by leveraging the distances between data instances and their corresponding counterfactuals output by state-of-the-art recourse methods. Extensive experimentation with real world and synthetic datasets demonstrates significant privacy leakage through recourses. Our work establishes unintended privacy leakage as an important risk in the widespread adoption of recourse methods.
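A distance-based membership inference attack of the kind the abstract mentions can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the Euclidean metric, the threshold value, and the direction of the rule (a larger distance to the counterfactual suggests a training member) are assumptions, and the function names are hypothetical.

```python
def counterfactual_distance(x, x_cf):
    """Euclidean distance between an instance and its counterfactual."""
    return sum((a - b) ** 2 for a, b in zip(x, x_cf)) ** 0.5


def infer_membership(x, x_cf, threshold):
    """Guess that x was a training point if its recourse distance is large.

    Assumption: training points tend to sit farther from the decision
    boundary, so their counterfactuals are farther away.
    """
    return counterfactual_distance(x, x_cf) >= threshold


# Hypothetical instances and counterfactuals returned by a recourse method.
far_guess = infer_membership([0.0, 0.0], [3.0, 4.0], threshold=2.0)
near_guess = infer_membership([0.0, 0.0], [0.3, 0.4], threshold=2.0)
```

In practice an attacker would have to calibrate the threshold, for example on data whose membership status is known; that calibration step is omitted here.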
Deep Ensemble Convolutional Neural Networks have become a methodology of choice for analyzing medical images with a diagnostic performance comparable to a physician, including the diagnosis of Diabetic Retinopathy. However, commonly used techniques are deterministic and are therefore unable to provide any estimate of predictive uncertainty. Quantifying model uncertainty is crucial for reducing the risk of misdiagnosis. A reliable architecture should be well-calibrated to avoid over-confident predictions. To address this, we propose UATTA-ENS: Uncertainty-Aware Test-Time Augmented Ensemble Technique for 5 Class PIRC Diabetic Retinopathy Classification to produce reliable and well-calibrated predictions.
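The abstract does not spell out how UATTA-ENS combines its components, so the following is only a generic sketch of test-time augmentation with an ensemble: class probabilities are averaged over every model and every augmentation, and the entropy of the averaged distribution serves as an uncertainty estimate. The stand-in models, the augmentations, and all function names are hypothetical.

```python
import math


def average_probs(prob_lists):
    """Element-wise mean of several class-probability vectors."""
    n = len(prob_lists)
    return [sum(p[k] for p in prob_lists) / n for k in range(len(prob_lists[0]))]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tta_ensemble_predict(models, augmentations, image):
    """Average predictions over all (model, augmentation) pairs."""
    all_probs = [model(aug(image)) for model in models for aug in augmentations]
    mean = average_probs(all_probs)
    return mean.index(max(mean)), mean, entropy(mean)


# Stand-in "models" that return fixed 5-class probabilities (illustration only).
models = [
    lambda img: [0.7, 0.1, 0.1, 0.05, 0.05],
    lambda img: [0.5, 0.3, 0.1, 0.05, 0.05],
]
# Stand-in augmentations: identity and a flip of a toy 1-D "image".
augmentations = [lambda img: img, lambda img: img[::-1]]

label, mean_probs, uncertainty = tta_ensemble_predict(models, augmentations, [1, 2, 3])
```

A high entropy of the averaged distribution flags predictions where the models or augmentations disagree, which is one common way to surface cases that deserve a second look.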
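Epidemic models are powerful tools for understanding infectious diseases. However, as they grow in size and complexity, they can quickly become computationally intractable. Recent advances in modeling methodology have shown that surrogate models can be used to emulate complex epidemic models with high-dimensional parameter spaces. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence-based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios several thousand times faster than the original model, making it well suited for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.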
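While online social media offers a way for ignored or stifled voices to be heard, it also gives users a platform to spread hateful speech. Such speech usually originates in fringe communities, yet it can spill over into mainstream channels. In this paper, we measure the impact of joining fringe hateful communities in terms of the hate speech propagated to the rest of the social network. We leverage data from Reddit to assess the effect of joining one type of echo chamber: a digital community of like-minded users exhibiting hateful behavior. We measure members' usage of hate speech outside the studied community before and after they become active participants. Using Interrupted Time Series (ITS) analysis as a causal inference method, we gauge the spillover effect, in which hateful language from within a certain community can spread outside that community, by using the level of out-of-community hate word usage as a proxy for learned hate. We investigate four different Reddit sub-communities (subreddits) covering three areas of hate speech: racism, misogyny, and fat-shaming. In all three cases, we find an increase in hate speech outside the originating community, implying that joining such a community leads to a spread of hate speech throughout the platform. Moreover, users are found to pick up this new hateful speech for months after initially joining the community. We show that harmful speech does not remain contained within the community. Our results provide new evidence of the harmful effects of echo chambers and of the potential benefit of moderating them to reduce the adoption of hateful speech.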
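We propose a certainty-equivalence scheme for adaptive control of scalar linear systems subject to additive i.i.d. Gaussian disturbances and bounded control input constraints, without requiring prior knowledge of bounds on the system parameters or of the control direction. Assuming that the system is at worst marginally stable, mean-square boundedness of the closed-loop system state is proven. Finally, numerical examples are presented to illustrate our results.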
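The certainty-equivalence idea can be sketched for a scalar system x_{t+1} = a x_t + b u_t + w_t with a saturated input. This is a toy simulation and not the scheme analyzed in the paper: the regularized least-squares estimator, the random probing input used while the estimate of b is near zero, and all parameter values are assumptions made for illustration, and the sketch carries no stability guarantee.

```python
import random


def saturate(u, u_max):
    """Clip the control input to the interval [-u_max, u_max]."""
    return max(-u_max, min(u_max, u))


def simulate(a=1.0, b=-0.5, u_max=2.0, steps=200, seed=0):
    """Simulate a certainty-equivalence controller with unknown (a, b)."""
    rng = random.Random(seed)
    # Regularized least-squares statistics for the parameter pair (a, b).
    sxx, sxu, suu = 1e-3, 0.0, 1e-3
    sxy, suy = 0.0, 0.0
    x, a_hat, b_hat = 0.0, 0.0, 0.0
    states, inputs = [], []
    for _ in range(steps):
        if abs(b_hat) > 1e-6:
            # Certainty equivalence: act as if the estimates were the truth.
            u = saturate(-a_hat * x / b_hat, u_max)
        else:
            # Assumption: probe with a random bounded input until b_hat is usable.
            u = rng.uniform(-u_max, u_max)
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        # Update the normal equations and solve the 2x2 system in closed form.
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        det = sxx * suu - sxu * sxu
        if abs(det) > 1e-9:
            a_hat = (suu * sxy - sxu * suy) / det
            b_hat = (sxx * suy - sxu * sxy) / det
        states.append(x)
        inputs.append(u)
        x = x_next
    return states, inputs, (a_hat, b_hat)


states, inputs, estimates = simulate()
```

The default `a=1.0` corresponds to a marginally stable system, and the negative `b` means the controller has to learn the control direction from data, since the estimator starts with no sign information.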