当源和目标域之间存在较大差异时,无监督域适应性的有效性会降低。通过利用逐渐从源到目标转移的其他未标记数据,逐渐的域适应(GDA)是减轻此问题的一种有希望的方法。通过依次沿“索引”中间域调整模型,GDA显着提高了整体适应性性能。但是,实际上,额外的未标记数据可能不会分离为中间域并正确索引,从而限制了GDA的适用性。在本文中,我们研究了如何在尚未可用时发现中间域的序列。具体而言,我们提出了一个粗到精细的框架,该框架从通过渐进域鉴别训练的粗域发现步骤开始。然后,这种粗糙的域序列通过新的周期矛盾损失进行了精细的索引步骤,这鼓励下一个中间域,以保留对当前中间域的足够判别知识。然后可以通过GDA算法使用所得的域序列。在GDA的基准数据集上,我们表明,我们将其命名为中间域标签(偶像)的方法可以导致与预定义的域序列相比,可相当甚至更好的适应性性能,使GDA更适合质量,使GDA更适用和强大域序列。代码可从https://github.com/hongyouc/idol获得。
translated by 谷歌翻译
在大多数有关联合学习(FL)的文献中,神经网络都是随机重量初始化的。在本文中,我们介绍了一项关于预训练对FL的影响的实证研究。具体而言,我们旨在调查当客户的分散数据是非IID时,预训练是否可以减轻急剧精度下降。我们专注于FedAvg,这是基本和最广泛使用的FL算法。我们发现,在非IID数据下,预培训确实在很大程度上缩小了FedAvg和集中学习之间的差距,但这并不是由于减轻了FedAvg的本地培训中众所周知的模型漂移问题。相反,预培训如何通过使FedAvg的全球聚合更加稳定来帮助FedAvg。当使用真实数据的预训练对于FL不可行时,我们提出了一种新型的方法,可以预先培训合成数据。在各种图像数据集(包括用于分割的一个)上,我们使用合成预训练的方法导致了显着的增益,这实质上是为扩大现实世界应用程序的联合学习而迈出的关键步骤。
translated by 谷歌翻译
Federated Learning有望在不访问数据的情况下与多个客户进行协作培训模型的能力,但是当客户的数据分布彼此差异时脆弱。这种差异进一步导致了困境:“我们是否应该优先考虑学习模型的通用性能(用于服务器的将来使用)或其个性化绩效(对于每个客户端)?”这两个看似竞争的目标使社区分裂了专注于一个或另一个,但在本文中,我们表明可以同时实现这两者。具体而言,我们提出了一个新颖的联邦学习框架,该框架将模型的双重职责与两个预测任务相结合。一方面,我们介绍了一个损失家族,这些损失家庭对非相同的班级分布,使客户能够培训一个通用的预测指标,并以一致的目标培训。另一方面,我们将个性化预测变量作为一种轻巧的自适应模块,以最大程度地减少每个客户在通用预测指标上的经验风险。借助我们将联合强大的脱钩(FED-ROD)命名的两个损失的两次挑战框架,学识渊博的模型可以同时实现最先进的通用和个性化的性能,从而实质上弥补了这两个任务。
translated by 谷歌翻译
已知经过类不平衡数据培训的分类器在“次要”类的测试数据上表现不佳,我们的培训数据不足。在本文中,我们调查在这种情况下学习Convnet分类器。我们发现,Convnet显着夸大了次要类别,这与通常拟合的次要类别的传统机器学习算法完全相反。我们进行了一系列分析,并发现了特征偏差现象 - 学识渊博的Convnet在次要类别的训练和测试数据之间产生了偏差的特征 - 这解释了过度拟合的情况。为了补偿特征偏差的影响,将测试数据推向低决策价值区域,我们建议将依赖类的温度(CDT)纳入训练convnet。 CDT在训练阶段模拟特征偏差,迫使Convnet扩大次级数据的决策值,从而可以在测试阶段克服实际特征偏差。我们在基准数据集上验证我们的方法并实现有希望的性能。我们希望我们的见解能够激发解决阶级失去平衡深度学习的新思维方式。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译