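Existing few-shot learning (FSL) methods rely on training with a large labeled dataset, which prevents them from leveraging abundant unlabeled data. From an information-theoretic perspective, we propose an effective unsupervised FSL method that learns representations with self-supervision. Following the InfoMax principle, our method learns comprehensive representations by capturing the intrinsic structure of the data. Specifically, we maximize the mutual information (MI) between instances and their representations with a low-bias MI estimator to perform self-supervised pre-training. Unlike supervised pre-training, which focuses on the discriminative features of the seen classes, our self-supervised model is less biased toward the seen classes and therefore generalizes better to unseen classes. We explain that supervised pre-training and self-supervised pre-training are in fact maximizing different MI objectives. Extensive experiments are further conducted to analyze their FSL performance under various training settings. Surprisingly, the results show that self-supervised pre-training can outperform supervised pre-training under appropriate conditions. Compared with state-of-the-art FSL methods, our approach achieves comparable performance on widely used FSL benchmarks without any labels of the base classes.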
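As a rough illustration of maximizing MI between instances and their representations, the sketch below computes the InfoNCE lower bound between two views of a batch. InfoNCE is used here only as a familiar stand-in; the abstract does not name its low-bias estimator, and the function name, temperature, and toy embeddings are assumptions made for this example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_bound(anchors, positives, temperature=0.1):
    """InfoNCE lower bound (in nats) on the MI between two views.

    anchors[i] and positives[i] are embeddings of two views of instance i;
    every other positives[j] acts as a negative for anchors[i].
    The bound can never exceed log(batch size).
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    n = len(a)
    total = 0.0
    for i in range(n):
        logits = [dot(a[i], p[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += logits[i] - log_sum  # log-softmax of the matching pair
    return math.log(n) + total / n
```

Training would maximize this quantity with respect to the encoder that produces the embeddings; the sketch only evaluates it for fixed vectors.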
translated by 谷歌翻译
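Leveraging large volumes of web videos paired with their search queries or surrounding text (e.g., titles) offers an economical and scalable alternative to supervised video representation learning. Nevertheless, modeling such weak visual-textual connections is not trivial because of query polysemy (i.e., a query can have many possible meanings) and text isomorphism (i.e., different texts can share the same syntactic structure). In this paper, we introduce a new design of mutual calibration between query and text to boost weakly supervised video representation learning. Specifically, we present Bi-Calibration Networks (BCN), which couple two calibrations in a novel way to learn the amendment from text to query and vice versa. Technically, BCN performs clustering on all the titles of the videos searched by an identical query and takes the centroid of each cluster as a text prototype. The query vocabulary is built directly on the query words. The video-to-text and video-to-query predictions over the text prototypes and the query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to the query or the text. We also devise a selection scheme to balance the two corrections. Two large-scale web video datasets, in which each video is paired with its query and title, are newly collected for weakly supervised video representation learning and are named Yovo-3M and Yovo-10M, respectively. The video features that BCN learns on the 3M web videos obtain superior results under the linear model protocol on downstream tasks. More remarkably, BCN trained on the larger set of 10M web videos, with further fine-tuning, leads to gains of 1.6% and 1.8% in top-1 accuracy on the Kinetics-400 and Something-Something V2 datasets, respectively, over the state-of-the-art TDN and ACTION-Net methods with ImageNet pre-training. Source code and datasets are available at \url{https://github.com/fuchenustc/bcn}.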
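The text-prototype step (cluster the titles retrieved by one query, keep each cluster centroid) can be sketched with plain k-means. This is only an illustration: the abstract does not say which clustering algorithm or title encoder BCN uses, so k-means, the deterministic initialization, and the toy 2-D "title embeddings" are all assumptions.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def text_prototypes(title_embeddings, k, iterations=20):
    """Cluster the title embeddings of one query and return k centroids.

    Hypothetical helper: the first k embeddings seed the centroids so the
    result is deterministic; a real system would likely use a better init.
    """
    centroids = [list(v) for v in title_embeddings[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in title_embeddings:
            nearest = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_vector(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids
```

Each returned centroid plays the role of one text prototype for that query; a polysemous query would ideally yield one prototype per meaning.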
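Single image deraining (SID) in real scenarios has attracted increasing attention in recent years. Because real-world rainy/clean image pairs are difficult to obtain, previous real datasets suffer from low-resolution images, homogeneous rain streaks, limited background variation, and even misalignment of image pairs, which leads to incomplete evaluation of SID methods. To address these issues, we establish a new high-quality dataset named Realrain-1K, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively. The images in Realrain-1K are automatically generated, with alignment, from a large number of real-world rainy video clips through a simple yet effective rain-density-controllable filtering method. Realrain-1K also provides abundant rain streak layers as a byproduct, which enables us to build a large-scale synthetic dataset named Synrain-13K by pasting the rain streak layers onto abundant natural images. Based on these and existing datasets, we benchmark 10 representative SID methods on three tracks: (1) fully supervised learning on Realrain-1K, (2) domain generalization to real datasets, and (3) syn-to-real transfer learning. The experimental results (1) show the differences among representative methods in image restoration performance and model complexity, (2) validate the significance of the proposed datasets for model generalization, and (3) provide useful insights into the superiority of learning from diverse domains and shed light on future research on real-world SID. The datasets will be released at https://github.com/hiker-lw/realrain-1k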
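The synthetic-data idea (pasting a rain streak layer onto a clean natural image) can be sketched as a per-pixel composite. The additive blend with clipping, the `strength` parameter, and the nested-list grayscale images with values in [0, 1] are assumptions for illustration; the abstract does not specify the blending rule.

```python
def paste_rain_layer(clean, rain_layer, strength=1.0):
    """Composite a rain streak layer onto a clean grayscale image.

    Both inputs are equally sized nested lists with values in [0, 1].
    Assumed blend: add the scaled rain layer, then clip to [0, 1].
    """
    return [
        [min(1.0, max(0.0, c + strength * r)) for c, r in zip(clean_row, rain_row)]
        for clean_row, rain_row in zip(clean, rain_layer)
    ]
```

The clean input and the composite together form one aligned training pair, which is what makes this kind of synthetic data cheap to produce at scale.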
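Mainstream state-of-the-art domain generalization algorithms tend to prioritize the assumption of semantic invariance across domains. Meanwhile, the inherent intra-domain style invariance is usually underappreciated and put on the shelf. In this paper, we reveal that leveraging intra-domain style invariance is also of pivotal importance for improving the efficiency of domain generalization. We verify that it is critical for the network to be informed about which domain features are invariant and shared among instances, so that the network sharpens its understanding and improves its semantic discriminative ability. Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective at learning useful semantic feature commonalities among domains. Our complete model, called Steam, can be interpreted as a novel probabilistic graphical model whose implementation requires the convenient construction of two kinds of memory banks: a semantic feature bank and a style feature bank. Empirical results show that our proposed framework surpasses state-of-the-art methods by clear margins.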
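Self-supervised learning (SSL) has recently become a favorite among feature learning methodologies. It is therefore appealing for domain adaptation approaches to consider incorporating SSL. The intuition is to enforce instance-level consistency so that the predictor becomes invariant across domains. However, most existing SSL methods in the domain adaptation regime are treated as standalone auxiliary components, leaving the signatures of domain adaptation unattended. In fact, the optimal region where the domain gap vanishes and the instance-level constraint that SSL pursues may not coincide at all. From this point of view, we present a particular paradigm of self-supervised learning tailored for domain adaptation, namely Transferable Contrastive Learning (TCL), which links SSL and the desired cross-domain transferability congruently. We find contrastive learning to be intrinsically a suitable candidate for domain adaptation, since its instance invariance assumption can be conveniently promoted to the cross-domain class-level invariance favored by domain adaptation tasks. Based on particular memory bank constructions and pseudo-label strategies, TCL then penalizes the cross-domain intra-class discrepancy between source and target through a clean and novel contrastive loss. The free lunch is that, thanks to the incorporation of contrastive learning, TCL relies on a moving-averaged key encoder that naturally yields a temporally ensembled version of the pseudo labels for target data, which avoids pseudo-label error propagation at no extra cost. TCL therefore efficiently reduces cross-domain gaps. Through extensive experiments on benchmarks (Office-Home, Visda-2017, Digits-five, PACS, and Domainnet) for both single-source and multi-source domain adaptation tasks, TCL demonstrates state-of-the-art performance.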
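The moving-averaged key encoder mentioned above is commonly implemented as an exponential moving average of the query encoder's weights. The sketch below shows that update on flat parameter lists; the function name, the flat-list representation, and the momentum value are assumptions for illustration, not TCL's actual code.

```python
def momentum_update(key_params, query_params, momentum=0.999):
    """One exponential-moving-average step for the key encoder.

    The key encoder receives no gradients; after every optimizer step on
    the query encoder it drifts slowly toward the query weights.
    """
    return [
        momentum * k + (1.0 - momentum) * q
        for k, q in zip(key_params, query_params)
    ]
```

Because the key weights average many past query weights, predictions made with the key encoder behave like an ensemble over recent training steps, which is the property that stabilizes pseudo labels.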
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent, neural-network-based PDE solvers, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative example and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab in diverse applications of partial differential equations.
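For readers unfamiliar with the Koopman idea behind KNO, the toy sketch below shows how nonlinear dynamics can become linear in a well-chosen observable. It does not use KoopmanLab or its API; the squaring map, the log observable, and both function names are assumptions chosen so that the example fits in a few lines.

```python
import math


def simulate_squaring(x0, steps):
    """Trajectory of the nonlinear map x -> x**2 starting from x0 > 1."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] ** 2)
    return xs


def fit_koopman_scalar(observables):
    """Least-squares fit of K in g(x_{t+1}) ~ K * g(x_t) for one observable."""
    numerator = sum(g0 * g1 for g0, g1 in zip(observables[:-1], observables[1:]))
    denominator = sum(g0 * g0 for g0 in observables[:-1])
    return numerator / denominator


# In the observable g(x) = log(x) the squaring map is exactly linear,
# because log(x**2) = 2 * log(x).
trajectory = simulate_squaring(1.1, steps=5)
koopman_gain = fit_koopman_scalar([math.log(x) for x in trajectory])
```

Neural variants learn the observables with a network instead of picking them by hand, then advance the system with the fitted linear operator.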
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
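One way to read "reweighted stochastic query" is as importance reweighting: gradients queried at points drawn around a fixed center $\bar{x}$ are reused to estimate the gradient of the Gaussian-smoothed function at a nearby point $x$, with weight $\gamma_\rho(x - \bar{x} - \xi) / \gamma_\rho(\xi)$ for $\xi \sim \mathcal{N}(0, \rho^2 I)$. The sketch below implements that identity under this assumption; the paper's exact estimator and its variance analysis may differ, and the function name and the quadratic test function are hypothetical.

```python
import math
import random


def resque_gradient(grad, x, x_bar, rho, num_samples, rng):
    """Monte Carlo estimate of the gradient of (f * Gaussian_rho) at x.

    All gradient queries are made at x_bar + xi with xi ~ N(0, rho^2 I),
    then reweighted by gamma_rho(x - x_bar - xi) / gamma_rho(xi).
    The weights are well behaved only when x stays within roughly rho of x_bar.
    """
    d = len(x)
    delta = [xi - bi for xi, bi in zip(x, x_bar)]
    delta_sq = sum(v * v for v in delta)
    estimate = [0.0] * d
    for _ in range(num_samples):
        noise = [rng.gauss(0.0, rho) for _ in range(d)]
        # Log of the density ratio: (2 <delta, noise> - |delta|^2) / (2 rho^2).
        inner = sum(a * b for a, b in zip(delta, noise))
        weight = math.exp((2.0 * inner - delta_sq) / (2.0 * rho ** 2))
        g = grad([b + s for b, s in zip(x_bar, noise)])
        for i in range(d):
            estimate[i] += weight * g[i]
    return [e / num_samples for e in estimate]


def quadratic_grad(y):
    """Gradient of f(y) = 0.5 * |y|^2; its Gaussian smoothing has gradient x."""
    return list(y)
```

Since every query point depends only on $\bar{x}$, the same batch of queries can serve many nearby values of $x$, which hints at why such an estimator helps with parallel query depth.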
We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
Video semantic segmentation (VSS) is beneficial for dealing with dynamic scenes because the real-world environment is continuous. On the one hand, some methods alleviate the prediction inconsistency between consecutive frames. On the other hand, other methods employ the previous frame as prior information to assist in segmenting the current frame. Although these methods achieve superior performance on independent and identically distributed (i.i.d.) data, they cannot generalize well to unseen domains. Thus, we explore a new task, video generalizable semantic segmentation (VGSS), which considers both consecutive frames and domain generalization. In this paper, we propose a class-wise non-salient region generalized (CNSG) framework for the VGSS task. Concretely, we first define the class-wise non-salient feature, which describes features of the class-wise non-salient region that carry more generalizable information. Then, we propose a class-wise non-salient feature reasoning strategy to adaptively select and enhance the most generalizable channels. Finally, we propose an inter-frame non-salient centroid alignment loss to alleviate the prediction inconsistency in the VGSS task. We also extend our video-based framework to the image-based generalizable semantic segmentation (IGSS) task. Experiments demonstrate that our CNSG framework yields significant improvement in the VGSS and IGSS tasks.