开放型域适应(OSDA)假设目标域包含未知类,这些类未在源域中发现。现有的域对抗学习方法不适合OSDA,因为与\ textit {Unknown}类匹配的分布会导致负转移。以前的OSDA方法仅通过使用\ textit {已知}类而着重于匹配源和目标分布。但是,此\ textit {已知} - 仅匹配可能无法学习目标 - \ textit {unknown}特征空间。因此,我们提出了不知名的域对抗学习(uadal),\ textit {aligns} source and targe- \ textit {已知{已知{已知{已知{已知{功能对齐过程。我们提供了有关提出的\ textIt {unknown-ware}特征对齐的优化状态的理论分析,因此我们可以保证\ textit {Alignment}和\ textit {segregation}理论上。从经验上讲,我们在基准数据集上评估了Uadal,该数据集表明Uadal通过报告最先进的性能来优于其他具有更好特征对齐方式的方法。
translated by 谷歌翻译
尽管存在扩散模型的各种变化,但将线性扩散扩散到非线性扩散过程中仅由几项作品研究。非线性效应几乎没有被理解,但是直觉上,将有更多有希望的扩散模式来最佳地训练生成分布向数据分布。本文介绍了基于分数扩散模型的数据自适应和非线性扩散过程。提出的隐式非线性扩散模型(INDM)通过结合归一化流量和扩散过程来学习非线性扩散过程。具体而言,INDM通过通过流网络利用\ textIt {litex {litex {littent Space}的线性扩散来隐式构建\ textIt {data Space}的非线性扩散。由于非线性完全取决于流网络,因此该流网络是形成非线性扩散的关键。这种灵活的非线性是针对DDPM ++的非MLE训练,将INDM的学习曲线提高到了几乎最大的似然估计(MLE)训练,事实证明,这是具有身份流量的INDM的特殊情况。同样,训练非线性扩散可以通过离散的步骤大小产生采样鲁棒性。在实验中,INDM实现了Celeba的最新FID。
translated by 谷歌翻译
通过提供语义来改进字符序列,语言知识对现场文本识别带来了很大的好处。然而,由于语言知识已经单独应用于输出序列,因此之前的方法没有充分利用语义来理解文本识别的视觉线索。本文介绍了一种名为多模态文本识别网络(MITRN)的新方法,其能够实现视觉和语义特征之间的相互作用以获得更好的识别性能。具体地,Matrn识别视觉和语义特征对并将空间信息进行编码为语义特征。基于空间编码,通过参考其他模态的相关特征提高了视觉和语义特征。此外,通过隐藏与训练阶段中的角色相关的视觉线程来刺激基质特征将语义特征组合成视觉特征。我们的实验表明,在具有大边缘的七个基准上实现了最先进的表演,而两个方式的天真组合显示了边缘改善。进一步消融研究证明了我们所提出的组件的有效性。我们的实施将公开提供。
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Domain shift widely exists in the visual world, while modern deep neural networks commonly suffer from severe performance degradation under domain shift due to the poor generalization ability, which limits the real-world applications. The domain shift mainly lies in the limited source environmental variations and the large distribution gap between source and unseen target data. To this end, we propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle such domain shift in various visual tasks. Specifically, SHADE is constructed based on two consistency constraints, Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representation across style-diversified samples. RC leverages general visual knowledge to prevent the model from overfitting to source data and thus largely keeps the representation consistent between the source and general visual models. Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning. SHM selects basis styles from the source distribution, enabling the model to dynamically generate diverse and realistic samples during training. Extensive experiments demonstrate that our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection, with different models, i.e., ConvNets and Transformer.
translated by 谷歌翻译
Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle in the face of uncertainty for exploration. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models. Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings. Empirically, we demonstrate superior performance over current state-of-the-art algorithms across various benchmarks.
translated by 谷歌翻译
Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize the real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison proved that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm a robust performance, under a large uncertainty of the curtailed renewable energy, with a maximizing net profit and stable system. Action-mapping was performed for visually assessing the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns the way like what a human expert would do, suggesting reliable application of the proposed methodology.
translated by 谷歌翻译
Semantic segmentation in 3D indoor scenes has achieved remarkable performance under the supervision of large-scale annotated data. However, previous works rely on the assumption that the training and testing data are of the same distribution, which may suffer from performance degradation when evaluated on the out-of-distribution scenes. To alleviate the annotation cost and the performance degradation, this paper introduces the synthetic-to-real domain generalization setting to this task. Specifically, the domain gap between synthetic and real-world point cloud data mainly lies in the different layouts and point patterns. To address these problems, we first propose a clustering instance mix (CINMix) augmentation technique to diversify the layouts of the source data. In addition, we augment the point patterns of the source data and introduce non-parametric multi-prototypes to ameliorate the intra-class variance enlarged by the augmented point patterns. The multi-prototypes can model the intra-class variance and rectify the global classifier in both training and inference stages. Experiments on the synthetic-to-real benchmark demonstrate that both CINMix and multi-prototypes can narrow the distribution gap and thus improve the generalization ability on real-world datasets.
translated by 谷歌翻译
Open world object detection aims at detecting objects that are absent in the object classes of the training data as unknown objects without explicit supervision. Furthermore, the exact classes of the unknown objects must be identified without catastrophic forgetting of the previous known classes when the corresponding annotations of unknown objects are given incrementally. In this paper, we propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR. In the first stage, we pre-train a model on the current annotated data to detect objects from the current known classes, and concurrently train an additional binary classifier to classify predictions into foreground or background classes. This helps the model to build an unbiased feature representations that can facilitate the detection of unknown classes in subsequent process. In the second stage, we fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint. Furthermore, we alleviate catastrophic forgetting when the annotations of the unknown classes becomes available incrementally by using knowledge distillation and exemplar replay. Experimental results on PASCAL VOC and MS-COCO show that our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
translated by 谷歌翻译