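One of the main challenges in semi-supervised domain adaptation (SSDA) is the skewed ratio between the number of labeled source and target samples, which biases the model towards the source domain. Recent work in SSDA shows that aligning only the labeled target samples with the source samples can leave the target domain incompletely aligned with the source domain. In our approach, to align the two domains, we leverage a contrastive loss to learn a semantically meaningful, domain-agnostic feature space using supervised samples from both domains. To mitigate the challenge caused by the skewed label ratio, we pseudo-label the unlabeled target samples by comparing their feature representations with those of the labeled samples from both the source and target domains. Furthermore, to increase the support of the target domain, these potentially noisy pseudo-labels are gradually injected into the labeled target dataset over the course of training. Specifically, we use a temperature-scaled cosine similarity measure to assign a soft pseudo-label to each unlabeled target sample. Additionally, we compute an exponential moving average of the soft pseudo-labels for each unlabeled sample. These pseudo-labels are progressively injected into (or removed from) the labeled target dataset based on a confidence threshold, to supplement the alignment of the source and target distributions. Finally, we use a supervised contrastive loss on the labeled and pseudo-labeled datasets to align the source and target distributions. Using our proposed approach, we demonstrate state-of-the-art performance on the SSDA benchmarks Office-Home, DomainNet, and Office-31.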
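The pseudo-labeling steps in this abstract (temperature-scaled cosine similarity, an exponential moving average, and a confidence threshold) can be illustrated with a minimal sketch. The abstract does not say how similarities are turned into class probabilities, so the use of one prototype vector per class, the softmax, and the values of `temperature`, `momentum`, and `threshold` are all assumptions made here for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def soft_pseudo_label(feature, prototypes, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities.

    `prototypes` holds one vector per class (assumed here to be, e.g., the
    mean feature of that class's labeled samples).
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ema_update(previous, current, momentum=0.9):
    """Exponential moving average of a sample's soft pseudo-label."""
    return [momentum * p + (1 - momentum) * c for p, c in zip(previous, current)]

def should_inject(soft_label, threshold=0.8):
    """Keep the sample in the labeled target set only while it is confident."""
    return max(soft_label) >= threshold
```

A sample whose averaged label later drops below the threshold would fail `should_inject` and could be removed again, which mirrors the "injected into (or removed from)" behavior the abstract describes.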
Recent advances in open-domain question answering (ODQA) have demonstrated impressive accuracy on standard Wikipedia-style benchmarks. However, it is less clear how robust these models are and how well they perform when applied to real-world applications in drastically different domains. While there has been some work investigating how well ODQA models perform when tested for out-of-domain (OOD) generalization, these studies have been conducted only under conservative shifts in data distribution and typically focus on a single component (i.e., retrieval) rather than an end-to-end system. In response, we propose a more realistic and challenging domain shift evaluation setting and, through extensive experiments, study end-to-end model performance. We find that not only do models fail to generalize, but high retrieval scores often still yield poor answer prediction accuracy. We then categorize different types of shifts and propose techniques that, when presented with a new dataset, predict whether intervention methods are likely to be successful. Finally, using insights from this analysis, we propose and evaluate several intervention methods that improve end-to-end answer F1 score by up to 24 points.
Research on text summarization for low-resource Indian languages has been limited by the scarce availability of relevant datasets. This paper presents a summary of various deep-learning approaches applied to the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 dataset consists of news articles written in Indian English, Hindi, and Gujarati, together with their ground-truth summaries. In our work, we explore different pre-trained seq2seq models and fine-tune them on the ILSUM 2022 datasets. In our experiments, the fine-tuned SoTA PEGASUS model worked best for English, the fine-tuned IndicBART model with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. The obtained inferences were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.
Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step, (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset which can be used to bootstrap a model's ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement of about 5% absolute F1 on a few-shot version of the DROP dataset when compared with a state-of-the-art model with the same supervision.
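The iterative decompose-then-answer loop described in this abstract can be sketched as follows. This is only an illustration of the control flow: `decompose` and `answer` are hypothetical callables standing in for the question-decomposition and question-answering models, the `None` return as a stop signal is an assumed convention, and the toy stubs replace any real language model.

```python
def successive_prompting(question, decompose, answer, max_steps=10):
    """Alternate between asking for the next simple question and answering it.

    `decompose(question, history)` returns the next simple question, or None
    once the complex question is resolved (assumed stop signal).
    `answer(sub_question, history)` returns the answer to a simple question.
    """
    history = []  # list of (sub_question, sub_answer) pairs
    for _ in range(max_steps):
        sub_question = decompose(question, history)
        if sub_question is None:
            break
        history.append((sub_question, answer(sub_question, history)))
    return history

# Toy stand-ins for the two models, hard-coded for a single example.
def toy_decompose(question, history):
    plan = ["What is 3 + 4?", "What is the previous answer times 2?"]
    return plan[len(history)] if len(history) < len(plan) else None

def toy_answer(sub_question, history):
    if sub_question == "What is 3 + 4?":
        return "7"
    return str(int(history[-1][1]) * 2)

steps = successive_prompting("What is (3 + 4) * 2?", toy_decompose, toy_answer)
final_answer = steps[-1][1]
```

Because the two callables are separate, each could be a differently prompted or fine-tuned component, which is the decoupling the abstract emphasizes.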
Federated deep learning frameworks can be used strategically to monitor land use locally and infer environmental impacts globally. Building a global model for land use classification requires distributed data from across the world. A federated approach suits this application domain because it avoids transferring data from distributed locations, which saves network bandwidth and reduces communication cost. We use a federated UNet model for semantic segmentation of satellite and street view images. The novelty of the proposed architecture is the integration of knowledge distillation to reduce communication cost and response time. The accuracy obtained was above 95%, and we also achieved significant model compression, of over 17 times and 62 times for street view and satellite images, respectively. Our proposed framework has the potential to be a game-changer in real-time tracking of climate change across the planet.
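Semantically meaningful sentence embeddings are important for many tasks in natural language processing. To obtain such embeddings, recent studies have explored the idea of using synthetically generated data from pretrained language models (PLMs) as a training corpus. However, PLMs often generate sentences that differ substantially from those written by humans. We hypothesize that treating all of these synthetic examples equally when training a deep neural network can adversely affect the learning of semantically meaningful embeddings. To analyze this, we first train a classifier that identifies machine-written sentences and observe that the linguistic features of sentences identified as machine-written differ significantly from those of human-written sentences. Based on this, we propose a novel approach that first trains the classifier to measure the importance of each sentence. The distilled information from the classifier is then used to train a reliable sentence embedding model. Through extensive evaluation on four real-world datasets, we demonstrate that our model trained on synthetic data generalizes well and outperforms the existing baselines. Our implementation is publicly available at https://github.com/ddehun/coling2022_reweighting_sts.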
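Despite recent advances in out-of-distribution (OOD) detection, anomaly detection, and uncertainty estimation tasks, there are no task-agnostic and post-hoc approaches. To address this limitation, we design a novel clustering-based ensembling method, called Task Agnostic and Post-hoc Unseen Distribution Detection (TAPUDD), which uses the features extracted from a model trained on a specific task. It explicitly comprises TAP-Mahalanobis, which clusters the training dataset's features and determines the minimum Mahalanobis distance of the test sample from all clusters. Further, we propose an ensembling module that aggregates the computation of iterative TAP-Mahalanobis for different numbers of clusters to provide reliable and efficient cluster computation. Through extensive experiments on synthetic and real-world datasets, we observe that our approach can detect unseen samples effectively across diverse tasks and performs better than or on par with the existing baselines. In doing so, we eliminate the need to determine the optimal number of clusters and demonstrate that our method is more viable for large-scale classification tasks.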
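The two ideas named in this abstract, a minimum Mahalanobis distance over clusters and an ensemble over several cluster counts, can be sketched in a few lines. Several simplifications here are assumptions and not taken from the paper: plain k-means stands in for whatever clustering the authors use, covariances are treated as diagonal to avoid matrix inversion, and the ensemble simply averages the scores.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center; an empty cluster keeps its old center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return groups

def cluster_stats(group, eps=1e-6):
    """Per-dimension mean and variance (diagonal covariance assumption)."""
    mean = [sum(col) / len(group) for col in zip(*group)]
    var = [sum((x - m) ** 2 for x in col) / len(group) + eps
           for col, m in zip(zip(*group), mean)]
    return mean, var

def tap_mahalanobis(train, x, k):
    """Minimum (diagonal) Mahalanobis distance from x to any of k clusters."""
    stats = [cluster_stats(g) for g in kmeans(train, k) if g]
    return min(
        math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)))
        for mean, var in stats
    )

def ensemble_score(train, x, ks=(1, 2)):
    """Average the score over several cluster counts (assumed aggregation)."""
    return sum(tap_mahalanobis(train, x, k) for k in ks) / len(ks)
```

A higher score suggests the sample is further from the training features. Averaging over several values of `k` is one way to avoid committing to a single cluster count, in the spirit of the ensembling module described above.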
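Explainable artificial intelligence (XAI) methods lack ground truth. In its place, method developers have relied on axioms to determine desirable properties for their explanations' behavior. For high-stakes uses of machine learning that require explainability, relying on axioms is not sufficient, because the implementation, or its usage, can fail to live up to the ideal. As a result, there is active research on validating the performance of XAI methods. The need for validation is especially magnified in domains that rely on XAI. An ablation study is a procedure frequently used to assess their utility and, to some extent, their fidelity. By perturbing the input variables in rank order of importance, the goal is to assess the sensitivity of the model's performance. Perturbing important variables should correlate with larger decreases in measures of model capability than perturbing less important features. While the intent is clear, the actual implementation details have not been studied rigorously for tabular data. Using five datasets, three XAI methods, four baselines, and three perturbations, we aim to show 1) how varying perturbations and adding simple guardrails can help to avoid potentially flawed conclusions, 2) how the treatment of categorical variables is an important consideration in both post-hoc explainability and ablation studies, and 3) how to identify useful baselines for XAI methods and viable perturbations for ablation studies.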
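The ablation procedure described in this abstract, perturbing input variables in rank order of importance and tracking model performance, can be illustrated with a small sketch. The toy model, the data, and the choice of replacing a feature with its column mean as the perturbation are all assumptions for illustration; the abstract does not specify which perturbations or baselines the authors use.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def ablation_curve(model, rows, labels, ranking, fill_values):
    """Accuracy after cumulatively perturbing features in `ranking` order.

    Each perturbation overwrites one feature with `fill_values[feature]`
    (here, an assumed mean-replacement perturbation).
    """
    perturbed = [list(r) for r in rows]  # copy so the input is not modified
    curve = [accuracy(model, perturbed, labels)]
    for feature in ranking:
        for row in perturbed:
            row[feature] = fill_values[feature]
        curve.append(accuracy(model, perturbed, labels))
    return curve

# Toy setup: the model depends only on feature 0, so feature 0 is "important".
toy_model = lambda row: int(row[0] > 0.5)
rows = [[0, 1], [1, 0], [0, 0], [1, 1]]
labels = [0, 1, 0, 1]
column_means = [0.5, 0.5]

important_first = ablation_curve(toy_model, rows, labels, [0, 1], column_means)
important_last = ablation_curve(toy_model, rows, labels, [1, 0], column_means)
```

In this toy case accuracy falls as soon as feature 0 is perturbed, so the important-first curve drops one step earlier than the reversed ordering. Comparing an explanation's ranking against such alternative orderings is one simple form of baseline.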
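Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not fully consider who is most likely to use FL in the medical domain. It is not the hospitals that are eager to adopt FL, but the service providers who want to develop machine learning models with real patient records. Moreover, service providers want to maximize the performance of their models at the lowest possible cost. In this work, we propose empirical benchmarks of FL methods that consider both performance and monetary cost on three real-world datasets: electronic health records, skin cancer images, and electrocardiogram datasets. We also propose Federated learning with Proximal regularization eXcept local Normalization (FedPxN), which, using a simple combination of FedProx and FedBN, outperforms all other FL algorithms while consuming only slightly more resources than the most efficient method.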
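The abstract describes FedPxN only as a simple combination of FedProx and FedBN, so the following is a rough sketch of those two ingredients and not the authors' algorithm. FedProx adds a proximal penalty that keeps local weights close to the global ones, and FedBN leaves normalization parameters out of server-side averaging. Representing each parameter as a single float, the `bn` name prefix, and the value of `mu` are assumptions made for brevity.

```python
def proximal_penalty(local, global_params, mu=0.01):
    """FedProx-style term: (mu / 2) * ||w_local - w_global||^2.

    Only parameters present in `global_params` (the shared ones) are penalised.
    """
    return 0.5 * mu * sum(
        (local[name] - global_params[name]) ** 2 for name in global_params
    )

def aggregate_except_norm(client_params, is_norm=lambda n: n.startswith("bn")):
    """FedBN-style averaging: skip normalization parameters, average the rest."""
    shared = [name for name in client_params[0] if not is_norm(name)]
    return {
        name: sum(c[name] for c in client_params) / len(client_params)
        for name in shared
    }

# Two toy clients, each with one shared weight and one local normalization scale.
clients = [{"fc.w": 1.0, "bn.g": 5.0}, {"fc.w": 3.0, "bn.g": 7.0}]
global_params = aggregate_except_norm(clients)
penalty = proximal_penalty(clients[0], global_params, mu=0.1)
```

During local training, a client would add `penalty` to its task loss while keeping its own `bn.*` values untouched between rounds.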
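While several types of post hoc explanation methods (e.g., feature attribution methods) have been proposed in recent literature, there is little to no work on systematically benchmarking these methods in an efficient and transparent manner. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, (ii) open-source implementations of 22 quantitative metrics for evaluating the faithfulness, stability (robustness), and fairness of explanation methods, and (iii) the first ever public XAI leaderboards to benchmark explanations. OpenXAI is easily extensible, as users can readily evaluate custom explanation methods and incorporate them into our leaderboards. Overall, OpenXAI provides an automated end-to-end pipeline that not only simplifies and standardizes the evaluation of post hoc explanation methods, but also promotes transparency and reproducibility in benchmarking these methods. OpenXAI datasets and data loaders, implementations of state-of-the-art explanation methods and evaluation metrics, as well as leaderboards are publicly available at https://open-xai.github.io/.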