现有的分布(OOD)检测方法通常在具有平衡的类别分布的培训集中进行基准测试。但是,在实际应用程序中,培训集具有长尾分配是很常见的。在这项工作中,我们首先证明现有的OOD检测方法通常会在训练集分布式分布时遭受重大性能降解。通过分析,我们认为这是因为模型难以区分少数尾巴级分配样本与真实的OOD样本,从而使尾巴类更容易被错误地检测为OOD。为了解决这个问题,我们提出了部分和不对称的监督对比学习(PASCL),该学习明确鼓励该模型区分尾级分配样本和OOD样品。为了进一步提高分布分类的准确性,我们提出了辅助分支列式,该辅助分支列出了BN的两个单独分支和分类层分别用于异常检测和分布分类。直觉是,分布和OOD异常数据具有不同的基础分布。我们的方法的表现优于先前的最新方法$ 1.29 \%$,$ 1.45 \%$,$ 0.69 \%$ $ $ $ $ $ $异常检测误报(FPR)和$ 3.24 \%\%$,$ 4.06 \%$,$ 7.89 \%$ $ CIFAR10-LT,CIFAR100-LT和IMAGENET-LT的分布分类精度。代码和预培训模型可在https://github.com/amazon-research/long-tailed-ood-detection上找到。
translated by 谷歌翻译
只给出了积极的例子和未标记的例子(来自正面和负数),我们可能希望估计准确的正面与负分类器。正式地,该任务分为两个子任务:(i)混合比例估计(MPE) - 确定未标记数据中的正例的分数; (ii)PU-Learning - 鉴于这样的估计,学习所需的正面与负分类器。不幸的是,两个问题的古典方法在高维设置中分解。与此同时,最近提出的启发式缺乏理论一致性,并效力依赖于近双车调谐。在本文中,我们提出了两种简单的技术:最好的箱估计(BBE)(用于MPE);而有条件的价值忽略风险(CVIR),对PU学习的简单目标。这两种方法都主导了先前的方法,并且对于BBE,我们建立正式保证,每当我们可以培训模型来干净地分离出一小部分积极示例的担保。我们的最终算法(TED)$ ^ N $,两种程序之间交替,显着改善了我们的混合比例估计器和分类器
translated by 谷歌翻译
Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets), cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution p(y). BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE's consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images.
translated by 谷歌翻译
This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use semantic representation of a question as query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. The visualization of the attention layers illustrates the progress that the SAN locates the relevant visual clues that lead to the answer of the question layer-by-layer.
translated by 谷歌翻译
We derive a set of causal deep neural networks whose architectures are a consequence of tensor (multilinear) factor analysis. Forward causal questions are addressed with a neural network architecture composed of causal capsules and a tensor transformer. The former estimate a set of latent variables that represent the causal factors, and the latter governs their interaction. Causal capsules and tensor transformers may be implemented using shallow autoencoders, but for a scalable architecture we employ block algebra and derive a deep neural network composed of a hierarchy of autoencoders. An interleaved kernel hierarchy preprocesses the data resulting in a hierarchy of kernel tensor factor models. Inverse causal questions are addressed with a neural network that implements multilinear projection and estimates the causes of effects. As an alternative to aggressive bottleneck dimension reduction or regularized regression that may camouflage an inherently underdetermined inverse problem, we prescribe modeling different aspects of the mechanism of data formation with piecewise tensor models whose multilinear projections are well-defined and produce multiple candidate solutions. Our forward and inverse neural network architectures are suitable for asynchronous parallel computation.
translated by 谷歌翻译
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
translated by 谷歌翻译
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by 谷歌翻译
We propose reconstruction probing, a new analysis method for contextualized representations based on reconstruction probabilities in masked language models (MLMs). This method relies on comparing the reconstruction probabilities of tokens in a given sequence when conditioned on the representation of a single token that has been fully contextualized and when conditioned on only the decontextualized lexical prior of the model. This comparison can be understood as quantifying the contribution of contextualization towards reconstruction -- the difference in the reconstruction probabilities can only be attributed to the representational change of the single token induced by contextualization. We apply this analysis to three MLMs and find that contextualization boosts reconstructability of tokens that are close to the token being reconstructed in terms of linear and syntactic distance. Furthermore, we extend our analysis to finer-grained decomposition of contextualized representations, and we find that these boosts are largely attributable to static and positional embeddings at the input layer.
translated by 谷歌翻译
Diffusion models have achieved justifiable popularity by attaining state-of-the-art performance in generating realistic objects from seemingly arbitrarily complex data distributions, including when conditioning generation on labels. Unfortunately, however, their iterative nature renders them very computationally inefficient during the sampling process. For the multi-class conditional generation problem, we propose a novel, structurally unique framework of diffusion models which are hierarchically branched according to the inherent relationships between classes. In this work, we demonstrate that branched diffusion models offer major improvements in efficiently generating samples from multiple classes. We also showcase several other advantages of branched diffusion models, including ease of extension to novel classes in a continual-learning setting, and a unique interpretability that offers insight into these generative models. Branched diffusion models represent an alternative paradigm to their traditional linear counterparts, and can have large impacts in how we use diffusion models for efficient generation, online learning, and scientific discovery.
translated by 谷歌翻译
The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined on the sentence- or paragraph-level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 35K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
translated by 谷歌翻译