We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules allow for learning of task-specific features from the global features, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be trained end-to-end, can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. We evaluate our approach on a variety of datasets, across both image-to-image predictions and image classification tasks. We show that our architecture is state-of-the-art in multi-task learning compared to existing methods, and is also less sensitive to various weighting schemes in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan.
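A minimal sketch of the per-task soft attention idea, assuming plain Python lists in place of feature maps. In MTAN the mask is produced by small convolutional blocks; here a hypothetical per-feature linear map (`weights`, `biases`) followed by a sigmoid stands in for them. The names `task_attention` and `task_params` are illustrative and not taken from the released code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def task_attention(shared, weights, biases):
    """Gate shared features with a task-specific soft mask whose values lie in (0, 1).

    The mask is computed from the shared features (here via a hypothetical
    per-feature linear map instead of MTAN's conv blocks) and applied
    element-wise, so every task reads from the same global feature pool.
    """
    mask = [sigmoid(w * f + b) for f, w, b in zip(shared, weights, biases)]
    return [m * f for m, f in zip(mask, shared)]


shared_features = [2.0, -1.0, 0.5, 4.0]  # one global feature pool
task_params = {  # hypothetical learned (weights, biases) per task
    "segmentation": ([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]),
    "depth": ([-1.0, -1.0, -1.0, -1.0], [0.0, 0.0, 0.0, 0.0]),
}
task_features = {
    task: task_attention(shared_features, w, b) for task, (w, b) in task_params.items()
}
```

Because each mask only rescales the shared pool, adding a task costs one small attention module rather than a new backbone, which is the parameter-efficiency argument made above.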
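We propose a unified view of jointly learning multiple vision tasks and visual domains through universal representations, i.e., a single deep neural network. Learning multiple problems simultaneously involves minimizing a weighted sum of multiple loss functions with different magnitudes and characteristics, which leads to an unbalanced state in which one loss dominates the optimization and yields poor results compared to learning a separate model for each problem. To this end, we propose distilling the knowledge of multiple task/domain-specific networks into a single deep neural network through small-capacity adapters. We rigorously show that universal representations achieve state-of-the-art performance in learning multiple dense prediction problems on NYU-v2 and Cityscapes, multiple image classification problems from diverse domains in the Visual Decathlon dataset, and cross-domain few-shot learning on Meta-Dataset. Finally, we also conduct multiple analyses through ablation and qualitative studies.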
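A rough sketch of distilling several task-specific teachers into one network through small adapters, assuming each teacher's feature for an input is already computed and frozen. The adapters are modeled as tiny matrices (standing in for 1x1 convolutions), and a squared-L2 feature-matching loss is a simplification rather than the paper's exact objective; `distillation_loss` and the example values are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a small dense matrix by a vector (stands in for a 1x1 conv adapter)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def distillation_loss(student_feat, adapters, teacher_feats):
    """Sum over tasks of the squared L2 distance between adapted student and frozen teacher features."""
    total = 0.0
    for task, teacher in teacher_feats.items():
        aligned = matvec(adapters[task], student_feat)
        total += sum((a - t) ** 2 for a, t in zip(aligned, teacher))
    return total


student = [1.0, 2.0]  # feature from the single universal network
adapters = {  # hypothetical small-capacity, task-specific adapters
    "segmentation": [[1.0, 0.0], [0.0, 1.0]],
    "depth": [[2.0, 0.0], [0.0, 2.0]],
}
teachers = {"segmentation": [1.0, 2.0], "depth": [2.0, 3.0]}
loss = distillation_loss(student, adapters, teachers)
```

Only the adapters are task-specific, so the universal network is pushed to hold a representation from which every teacher's features can be recovered cheaply.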
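The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvements w.r.t. the model's generalizability, performance, and training/inference memory footprint. These benefits become all the more indispensable when jointly training vision-related dense prediction tasks. In this work, we tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called the Cross-Channel Attention Module (CCAM), which facilitates effective feature sharing along each channel between the two tasks, leading to mutual performance gains with a negligible increase in trainable parameters. In a truly symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth, called AffineMix, and a simple depth augmentation using predicted semantics, called ColorAug. Finally, we validate the performance gains of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.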
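Multi-task learning (MTL) is an active field in deep learning in which a model jointly learns multiple tasks by exploiting the relationships between them. It has been shown that MTL helps the model share learned features between tasks and improves predictions compared to learning each task independently. We propose a new learning framework for the two-task MTL problem that uses the predictions of one task as input to another network to predict the other task. Inspired by cycle-consistency loss and contrastive learning, we define two new loss terms: an alignment loss and a cross-task consistency loss. Both losses are designed to force the model to align the predictions of the tasks so that it predicts consistently. We prove theoretically that both losses help the model learn more efficiently, and that the cross-task consistency loss is better in terms of alignment with the direct predictions. Experimental results also show that our proposed model achieves significant performance on the Cityscapes and NYU benchmark datasets.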
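One possible reading of the two loss terms, sketched under stated assumptions: a hypothetical mapping network turns the task-A prediction into a task-B prediction, the alignment term compares that mapped output with the task-B ground truth, and the cross-task consistency term compares it with the network's direct task-B prediction. The mean absolute error and the toy `double` mapping are placeholders, not the paper's definitions.

```python
def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def two_task_losses(pred_a, pred_b, target_b, map_a_to_b):
    """Compare the task-B output derived from task A with the B target and with the direct B prediction."""
    mapped_b = map_a_to_b(pred_a)  # hypothetical network: task-A prediction -> task-B prediction
    alignment = mean_abs_error(mapped_b, target_b)  # assumed: mapped output vs. ground truth
    cross_task = mean_abs_error(mapped_b, pred_b)  # assumed: mapped output vs. direct prediction
    return alignment, cross_task


def double(values):
    # Toy stand-in for the learned cross-task mapping network.
    return [2.0 * v for v in values]


alignment, cross_task = two_task_losses([1.0, 2.0], [2.0, 5.0], [2.0, 4.0], double)
```

Here the mapped prediction already matches the target, so only the disagreement with the direct task-B prediction contributes a loss.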
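Multi-task learning has recently become a promising solution for the comprehensive understanding of complex scenes. Besides being memory-efficient, multi-task models with an appropriate design can exchange complementary signals across tasks. In this work, we jointly address 2D semantic segmentation and two geometry-related tasks, namely dense depth and surface normal estimation, as well as edge estimation, showing their benefit on indoor and outdoor datasets. We propose a novel multi-task learning architecture that exploits pairwise cross-task exchange through correlation-guided attention and self-attention to improve representation learning on average across all tasks. We conduct extensive experiments in three multi-task setups, showing the benefit of our proposal compared to competitive baselines on both synthetic and real benchmarks. We also extend our method to the novel setting of multi-task unsupervised domain adaptation. Our code is available at https://github.com/cv-rits/densemtl.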
Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
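The weighting can be made concrete with a small worked example. The sketch assumes the common parameterization s_i = log(sigma_i^2) and uses the regression form of the objective for every task (the paper uses a slightly different factor for classification terms). Solving for the optimal s_i in closed form, rather than learning it by gradient descent, is done here only to show how the effective weight scales inversely with the task loss; the function names are illustrative.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Regression-form weighting with s_i = log(sigma_i^2):

    L = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i
    The first term down-weights noisy tasks; the second keeps s_i from growing without bound.
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s for loss, s in zip(task_losses, log_vars))


def best_log_var(task_loss):
    # Setting d/ds [0.5 * exp(-s) * L + 0.5 * s] = 0 gives s* = log(L).
    return math.log(task_loss)


losses = [4.0, 0.25]  # e.g. a depth loss and a segmentation loss on different scales
s_star = [best_log_var(l) for l in losses]
weights = [0.5 * math.exp(-s) for s in s_star]  # effective per-task weights at the optimum
```

At the optimum each weighted term 0.5 * exp(-s_i) * L_i equals 0.5, so a loss that is 16 times larger receives a weight that is 16 times smaller.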
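Despite recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. In this paper, we present a label-efficient approach and look at jointly learning multiple dense prediction tasks on partially annotated data, which we call multi-task partially-supervised learning. We propose a multi-task training procedure that successfully leverages task relations to supervise multi-task learning when data is partially annotated. In particular, we learn to map each task pair to a joint pairwise task space, which enables sharing information between the tasks in a computationally efficient way through another network conditioned on task pairs, and avoids learning trivial cross-task relations by retaining high-level information about the input image. We rigorously demonstrate that our proposed method effectively exploits images with unlabelled tasks and outperforms existing semi-supervised learning approaches and related methods on three standard benchmarks.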
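Multi-task learning (MTL) has achieved great success in various fields, but how to balance different tasks to avoid negative effects is still a key problem. To achieve task balancing, many works balance the task losses or gradients. In this paper, we unify eight representative task-balancing methods from the perspective of loss weighting and provide a consistent experimental comparison. Moreover, we surprisingly find that training an MTL model with random weights sampled from a distribution can achieve performance comparable to state-of-the-art baselines. Based on this finding, we propose a simple yet effective weighting strategy called Random Loss Weighting (RLW), which can be implemented with only one additional line of code over existing works. Theoretically, we analyze the convergence of RLW and reveal that RLW has a higher probability of escaping local minima than existing models with fixed task weights, resulting in better generalization ability. Empirically, we extensively evaluate the proposed RLW method on six image datasets and four multilingual tasks from the XTREME benchmark to show its effectiveness compared with state-of-the-art strategies.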
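A minimal sketch of random loss weighting, assuming the weights are resampled at every training step as the softmax of standard-normal draws; this is one of several possible sampling distributions, and the function names are illustrative. Seeding a `random.Random` instance keeps the example reproducible.

```python
import math
import random


def random_loss_weights(num_tasks, rng):
    """Sample task weights as the softmax of standard-normal draws (one assumed choice of distribution)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(num_tasks)]
    m = max(z)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rlw_loss(task_losses, rng):
    """Weight this step's task losses with freshly sampled random weights."""
    weights = random_loss_weights(len(task_losses), rng)
    return sum(w * l for w, l in zip(weights, task_losses))


rng = random.Random(0)
step_losses = [0.8, 1.5, 0.3]  # hypothetical per-task losses at one training step
total = rlw_loss(step_losses, rng)
```

Because the draws are i.i.d. and the softmax is symmetric, each weight has expectation 1/T for T tasks, so in expectation the sampled loss equals the equally weighted average; only the per-step emphasis between tasks changes.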
Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
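A cross-stitch unit takes the activations of two task networks at the same layer and forms two new linear combinations of them, one per task, with learned coefficients. The sketch below applies one 2x2 coefficient matrix per channel to flat activation lists; the function name and example coefficients are illustrative.

```python
def cross_stitch(x_a, x_b, alphas):
    """Mix two networks' activations channel by channel.

    alphas[c] = ((a_AA, a_AB), (a_BA, a_BB)) are the learned coefficients for channel c:
    out_a[c] = a_AA * x_a[c] + a_AB * x_b[c]
    out_b[c] = a_BA * x_a[c] + a_BB * x_b[c]
    """
    out_a, out_b = [], []
    for a, b, ((aa, ab), (ba, bb)) in zip(x_a, x_b, alphas):
        out_a.append(aa * a + ab * b)
        out_b.append(ba * a + bb * b)
    return out_a, out_b


x_a = [1.0, 2.0]  # activations of the task-A network (one value per channel)
x_b = [3.0, 4.0]  # activations of the task-B network
avg = [((0.5, 0.5), (0.5, 0.5))] * 2  # fully shared: both tasks see the same mixture
shared_a, shared_b = cross_stitch(x_a, x_b, avg)
```

With identity coefficients the two networks stay fully task-specific, and with all coefficients at 0.5 both receive the same shared activation, so training the coefficients lets the network settle anywhere between those extremes.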
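Multi-task learning (MTL) jointly learns a set of tasks by sharing parameters among them. It is a promising approach for reducing storage costs while improving task accuracy for many computer vision tasks. The effective adoption of MTL faces two main challenges. The first is determining which parameters to share across tasks so as to optimize both memory efficiency and task accuracy. The second is automatically applying MTL algorithms to an arbitrary CNN backbone without time-consuming manual re-implementation and significant domain expertise. This paper addresses these challenges by developing AutoMTL, the first programming framework that automates efficient MTL model development for vision tasks. AutoMTL takes as input an arbitrary backbone convolutional neural network (CNN) and a set of tasks to learn, and automatically produces a multi-task model that simultaneously achieves high accuracy and a small memory footprint. Experiments on three popular MTL benchmarks (Cityscapes, NYUv2, Tiny-Taskonomy) demonstrate the effectiveness of AutoMTL over state-of-the-art approaches as well as its generalizability across CNNs. AutoMTL is open source and available at https://github.com/zhanglijun95/automtl.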
Semantic segmentation is a classic computer vision problem dedicated to labeling each pixel with its corresponding category. As a building block for higher-level applications such as industrial quality inspection, remote sensing information extraction, medical diagnostic aid, and autonomous driving, semantic segmentation has long been developed in combination with deep learning, and a large body of work has accumulated. However, neither the classic FCN-based works nor the popular Transformer-based works have attained fine-grained localization of pixel labels, which remains the main challenge in this field. Recently, with the popularity of autonomous driving, the segmentation of road scenes has received increasing attention. Based on the cross-task consistency theory, we incorporate edge priors into semantic segmentation tasks to obtain better results. The main contribution is a model-agnostic method that improves the accuracy of semantic segmentation models with zero extra inference runtime overhead, verified on datasets of both road and non-road scenes. Our experimental results show that the method effectively improves semantic segmentation accuracy.
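The abstract does not spell out how the edge prior is built. One common, training-time-only way to obtain edge targets is to mark every pixel whose 4-neighborhood contains a different class in the ground-truth label map; the sketch below does that, and it is an assumption that may differ from the paper's procedure. Since such targets only feed an auxiliary training loss, they add no cost at inference.

```python
def label_edges(labels):
    """Return a binary edge map: 1 where a 4-neighbor has a different class label, else 0."""
    h, w = len(labels), len(labels[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != labels[i][j]:
                    edges[i][j] = 1
                    break
    return edges


# Hypothetical 3x3 label map with classes 0, 1 and 2.
edge_map = label_edges([[0, 0, 1], [0, 0, 1], [2, 2, 2]])
```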
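Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed as two-stream networks. In general, jointly reasoning about the color and geometric information of RGBD data is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively utilize multimodal information in both the encoder and the decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deep, multi-level, paired complementary information. To learn more robust deep representations and rich multimodal information, we introduce a dual-branch decoder that effectively leverages the correlations and complementary cues of different tasks. Extensive experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance against state-of-the-art methods.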
This work proposes Multi-task Meta Learning (MTML), integrating two learning paradigms, Multi-Task Learning (MTL) and meta learning, to bring together the best of both worlds. In particular, it focuses on simultaneous learning of multiple tasks, an element of MTL, and on promptly adapting to new tasks with fewer data, a quality of meta learning. It is important to highlight that we focus on heterogeneous tasks, which are of distinct kinds, in contrast to the typically considered homogeneous tasks (e.g., when all tasks are classification or all tasks are regression tasks). The fundamental idea is to train a multi-task model such that, when an unseen task is introduced, it can learn in fewer steps whilst offering performance at least as good as conventional single-task learning on the new task or its inclusion within the MTL. Through various experiments, we demonstrate this paradigm on two datasets and four tasks: NYU-v2 and the Taskonomy dataset, for which we perform semantic segmentation, depth estimation, surface normal estimation, and edge detection. MTML achieves state-of-the-art results for most of the tasks. Although semantic segmentation suffers quantitatively, our MTML method learns to identify segmentation classes absent in the pseudo-labelled ground truth of the Taskonomy dataset.
There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to perform well uniformly.
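A simplified sketch of a residual adapter, assuming a flat feature vector and a small domain-specific matrix in place of a 1x1 convolution: the shared block's output x is corrected to x + W x, so a zero adapter leaves the shared representation untouched. The helper `adapter_param_fraction` counts only convolution weights (no batch-norm or bias terms) to show why such adapters are cheap relative to a 3x3 layer; both names are hypothetical.

```python
def residual_adapter(x, adapter_weight):
    """Return x + W x, where W is a small domain-specific (1x1-conv-like) matrix."""
    wx = [sum(w * v for w, v in zip(row, x)) for row in adapter_weight]
    return [v + d for v, d in zip(x, wx)]


def adapter_param_fraction(channels, kernel_size=3):
    """Weights in a channels-to-channels 1x1 adapter relative to the shared k x k conv it accompanies."""
    shared = kernel_size * kernel_size * channels * channels
    adapter = channels * channels
    return adapter / shared


steered = residual_adapter([1.0, 2.0], [[0.5, 0.0], [0.0, -1.0]])  # hypothetical domain adapter
fraction = adapter_param_fraction(64)
```

For 64 channels, a 1x1 adapter adds 4,096 weights against 36,864 in the 3x3 layer it accompanies, about 11% per additional domain.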
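Monocular depth estimation and semantic segmentation are two fundamental goals of scene understanding. Because of the advantages of task interaction, many works study joint task learning algorithms. However, most existing methods fail to fully leverage the semantic labels: they ignore the contextual structure the labels provide and use them only to supervise the predictions of the segmentation branch, which limits the performance of both tasks. In this paper, we propose a network injected with contextual information (CI-Net) to solve this problem. Specifically, we introduce a self-attention block in the encoder to generate an attention map. With supervision from an ideal attention map created from the semantic labels, the network is embedded with contextual information, so that it can better understand the scene and utilize correlated features to make accurate predictions. In addition, a feature sharing module is constructed to deeply fuse the task-specific features, and a consistency loss is devised to make the features mutually guided. We evaluate the proposed CI-Net on the NYU-Depth-v2 and SUN-RGBD datasets. The experimental results validate that the proposed CI-Net effectively improves the accuracy of both semantic segmentation and depth estimation.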
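The ideal attention map can be illustrated over flattened pixel labels: entry (i, j) is 1 when pixels i and j share a semantic class and 0 otherwise. The sketch builds that map and scores a predicted attention map against it with binary cross-entropy; the choice of loss is an assumption, since the abstract does not name it, and the function names are illustrative.

```python
import math


def ideal_attention_map(labels):
    """labels: flattened per-pixel class ids -> N x N map, 1.0 where two pixels share a class."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def attention_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted attention map and the ideal one (assumed loss)."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            n += 1
    return total / n


target = ideal_attention_map([0, 1, 0])  # three pixels, two of which share class 0
```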
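Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction. Most existing works suffer from a severe limitation of local modeling due to their heavy use of convolution operations, whereas learning interactions and inference in a global spatial-position and multi-task context is critical for this problem. In this paper, we propose a novel end-to-end Inverted Pyramid multi-task Transformer (InvPT) to perform simultaneous modeling of spatial positions and multiple tasks in a unified framework. To the best of our knowledge, this is the first work that explores designing a transformer structure for multi-task dense prediction for scene understanding. Besides, it is widely demonstrated that a higher spatial resolution is remarkably beneficial for dense predictions, while it is very challenging for existing transformers to go deeper with higher resolutions due to the huge complexity at large spatial sizes. InvPT presents an efficient UP-Transformer block to learn multi-task feature interaction at gradually increased resolutions, which also incorporates effective self-attention message passing and multi-scale feature aggregation to produce task-specific predictions at a high resolution. Our method achieves superior multi-task performance on the NYUD-v2 and PASCAL-Context datasets and significantly outperforms the previous state of the art. The code is available at https://github.com/prismformore/invpt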
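Multi-task learning aims to improve the performance of a model by transferring and exploiting common knowledge among tasks. Existing MTL works mainly focus on the scenario in which the label sets of the multiple tasks (MTs) are the same, so that they can be utilized for learning across the tasks. However, few works explore the scenario in which each task has only a small number of training samples and the label sets are only partially overlapping or even disjoint. Learning such MTs is more challenging because less correlated information is available among these tasks. To this end, we propose a framework that learns these tasks by jointly leveraging both the abundant information from a learnt auxiliary big task, with enough classes to cover those of all these tasks, and the information shared among the partially overlapping tasks. In our implementation, which uses the same neural network architecture as the learnt auxiliary task to learn the individual tasks, the key idea is to use the available label information to adaptively prune the hidden-layer neurons of the auxiliary network, constructing a corresponding network for each task, while jointly learning across the individual tasks. Our experimental results demonstrate the effectiveness of the framework in comparison with state-of-the-art approaches.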
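Detection and localization of fire in images and videos are important in tackling fire incidents. Although semantic segmentation methods can be used to indicate the location of fire pixels in an image, their predictions are local, and they often fail to consider the global information about the presence of fire in the image, which is implicit in the image labels. We propose a Convolutional Neural Network (CNN) for joint classification and segmentation of fire in images, which improves the performance of fire segmentation. We use a spatial self-attention mechanism to capture long-range dependencies between pixels, and a new channel attention module that uses the classification probability as an attention weight. The network is jointly trained for both segmentation and classification, improving on the performance of single-task image segmentation methods and of previous methods proposed for fire segmentation.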
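A heavily simplified sketch of the joint design, under stated assumptions: the image-level fire probability from the classification head scales every feature channel (a stand-in for the paper's channel attention module, whose exact form the abstract does not give), and training sums a segmentation loss with a binary cross-entropy classification loss weighted by a hypothetical `lam`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_channel_attention(features, cls_logit):
    """Scale every channel by the image-level fire probability (hypothetical simplification)."""
    p = sigmoid(cls_logit)
    return [[p * v for v in channel] for channel in features], p


def joint_loss(seg_loss, cls_prob, cls_target, lam=1.0, eps=1e-7):
    """Segmentation loss plus a lam-weighted binary cross-entropy on the fire/no-fire label."""
    p = min(max(cls_prob, eps), 1 - eps)  # clamp to avoid log(0)
    cls_loss = -(cls_target * math.log(p) + (1 - cls_target) * math.log(1 - p))
    return seg_loss + lam * cls_loss


# Two toy channels with two values each; a zero logit means p(fire) = 0.5.
scaled, p_fire = prob_channel_attention([[2.0, 4.0], [1.0, -1.0]], 0.0)
```

When the classifier is confident that no fire is present, every channel is suppressed, which is one way the global image label can discourage scattered false-positive fire pixels.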
Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarities, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.
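Aggregating information from features across different layers is an essential operation for dense prediction models. Despite its limited expressiveness, feature concatenation dominates the choice of aggregation operations. In this paper, we introduce Attentive Feature Aggregation (AFA) to fuse different network layers with more expressive non-linear operations. AFA exploits both spatial and channel attention to compute a weighted average of the layer activations. Inspired by neural volume rendering, we extend AFA with Scale-Space Rendering (SSR) to perform late fusion of multi-scale predictions. AFA is applicable to a wide range of existing network designs. Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead. In particular, AFA improves the performance of the Deep Layer Aggregation (DLA) model by nearly 6% mIoU on Cityscapes. Our experimental analysis shows that AFA learns to progressively refine segmentation maps and improve boundary details, leading to new state-of-the-art results on the BSDS500 and NYUDv2 boundary detection benchmarks. Code and video resources are available at http://vis.xyz/pub/dla-afa.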
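A minimal sketch of replacing concatenation with an attention-weighted average of layer activations, assuming each layer provides a same-length activation list and a matching list of attention scores. A softmax over layers at each position is a simplification; AFA computes its weights with separate spatial and channel attention, which this sketch does not reproduce, and `attentive_aggregate` is a hypothetical name.

```python
import math


def attentive_aggregate(layer_feats, layer_scores):
    """Per-position softmax over layers, then a weighted average of the layer activations."""
    num_layers = len(layer_feats)
    length = len(layer_feats[0])
    out = []
    for pos in range(length):
        scores = [layer_scores[l][pos] for l in range(num_layers)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * layer_feats[l][pos] for l, e in enumerate(exps)))
    return out


feats = [[1.0, 2.0], [3.0, 6.0]]  # two layers, two positions each
uniform = attentive_aggregate(feats, [[0.0, 0.0], [0.0, 0.0]])  # equal attention: plain average
```

Unlike concatenation, the output keeps the width of a single layer, and the weights can favor different layers at different positions, for example deeper layers in object interiors and shallower layers near boundaries.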