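In this paper, we propose SuperYOLO, an accurate yet fast small object detection method for remote sensing images (RSI). It fuses multimodal data and performs high-resolution (HR) object detection with the help of auxiliary super-resolution (SR) learning, taking both detection accuracy and computational cost into account. First, we construct a compact baseline by removing the Focus module, which preserves HR features and significantly reduces missed detections of small objects. Second, we use pixel-level multimodal fusion (MF) to extract information from different data sources, yielding more suitable and effective features for small objects in RSI. Furthermore, we design a simple and flexible SR branch that learns HR feature representations able to distinguish small objects from vast backgrounds given low-resolution (LR) input, which further improves detection accuracy. Moreover, to avoid introducing additional computation, the SR branch is discarded at inference time, and the LR input reduces the computation of the network. Experimental results show that, on the widely used VEDAI RS dataset, SuperYOLO achieves an accuracy of 73.61% (in terms of mAP50), more than 10% higher than SOTA large models such as YOLOv5l, YOLOv5x, and the RS-designed YOLOrs. Meanwhile, the GFLOPs and parameter size of SuperYOLO are about 18.1 times and 4.2 times smaller than those of YOLOv5x, respectively. The proposed model shows a favorable accuracy-speed trade-off compared with state-of-the-art models. The code will be made available at https://github.com/icey-zhang/superyolo.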
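The mAP50 metric quoted above counts a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. As a minimal sketch of that matching criterion (the function names and the corner-format boxes are illustrative assumptions, not taken from the SuperYOLO code):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box, threshold=0.5):
    """True when the prediction would count as a hit under an IoU >= 0.5 rule."""
    return iou(pred_box, gt_box) >= threshold
```

For a 10 x 10 pixel object, shifting the predicted box by 5 pixels already drops the IoU to 1/3, which is below the threshold. This illustrates why localization errors of a few pixels matter so much for small objects.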
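Multimodal sentiment analysis and depression estimation are two important research topics that aim to predict human mental states from multimodal data. Previous research has focused on developing effective fusion strategies for exchanging and integrating mind-related information across modalities. Some MLP-based techniques have recently achieved considerable success in a variety of computer vision tasks. Inspired by this, we explore multimodal approaches from a feature-mixing perspective in this study. To this end, we introduce CubeMLP, a multimodal feature processing framework based entirely on MLPs. CubeMLP consists of three independent MLP units, each of which has two affine transformations. CubeMLP accepts all relevant modality features as input and mixes them along three axes. After feature extraction with CubeMLP, the mixed multimodal features are flattened for task prediction. Our experiments are conducted on the sentiment analysis datasets CMU-MOSI and CMU-MOSEI and on the depression estimation dataset AVEC2019. The results show that CubeMLP achieves state-of-the-art performance at a much lower computational cost.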
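Mixing a feature tensor along each of its three axes can be illustrated with a toy example in plain Python. This is a deliberately simplified sketch, not the CubeMLP implementation: each axis gets a single linear map instead of an MLP unit with two affine transformations, and the (sequence, modality, channel) reading of the axes and all names are assumptions.

```python
def mix_along_axis(x, weight, axis):
    """Apply a square linear map along one axis of a 3-D nested list.

    x has shape (L, M, D); weight is n x n, where n is the size of `axis`.
    """
    L, M, D = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * D for _ in range(M)] for _ in range(L)]
    for i in range(L):
        for j in range(M):
            for k in range(D):
                if axis == 0:    # mix across the first axis (e.g. sequence)
                    out[i][j][k] = sum(weight[i][s] * x[s][j][k] for s in range(L))
                elif axis == 1:  # mix across the second axis (e.g. modality)
                    out[i][j][k] = sum(weight[j][s] * x[i][s][k] for s in range(M))
                else:            # mix across the third axis (e.g. channel)
                    out[i][j][k] = sum(weight[k][s] * x[i][j][s] for s in range(D))
    return out


def flatten(x):
    """Flatten the mixed tensor into one vector for a prediction head."""
    return [value for plane in x for row in plane for value in row]


# One time step, two modalities, one channel.
features = [[[2.0], [4.0]]]
average = [[0.5, 0.5], [0.5, 0.5]]
mixed = mix_along_axis(features, average, axis=1)
```

Here `mixed` holds the mean of the two modality features in both modality slots, which is the simplest possible case of information being exchanged across modalities.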
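Prompt tuning is an emerging way of adapting pre-trained language models to downstream tasks. However, existing studies mainly add prompts to the input sequence. This approach does not work as expected because of the intermediate multi-head self-attention and feed-forward network computations, which make model optimization less smooth. We therefore propose a novel tuning approach called layer tuning, which adds learnable parameters inside the Transformer layers. Specifically, we focus on layer tuning for the feed-forward network in the Transformer, namely FL-tuning. It introduces additional units into the hidden layer of each feed-forward network. We conduct extensive experiments on the public CLUE benchmark. The results show that: 1) FL-tuning outperforms prompt tuning methods under both full-data and few-shot settings in almost all cases. In particular, compared with P-tuning v2, it improves accuracy on WSC 1.0 by 17.93% (full-data setting) and also improves F1 on CLUENER (few-shot setting). 2) FL-tuning is more stable and converges about 1.17 times faster than P-tuning v2. 3) With only about 3% of the Transformer's parameters to train, FL-tuning is comparable to fine-tuning on most datasets and significantly outperforms fine-tuning on several datasets (e.g., accuracy improves by 12.9% on WSC 1.1). The source code is available at https://github.com/genggui001/fl-tuning.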
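The idea of adding units to a feed-forward hidden layer can be illustrated with a toy example. The sketch below is plain Python under several assumptions and is not the authors' implementation: the ReLU activation, the omission of biases, the helper names, and the zero initialization of the new output weights are all hypothetical choices. It widens a frozen two-layer network so that only the newly added weights would need training.

```python
def relu(vector):
    return [max(0.0, value) for value in vector]


def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def ffn(x, w_in, w_out):
    """Two-layer feed-forward network: w_out @ relu(w_in @ x)."""
    return matvec(w_out, relu(matvec(w_in, x)))


def add_hidden_units(w_in, w_out, new_in_rows, new_out_cols):
    """Widen the hidden layer while leaving the original weights untouched.

    new_in_rows:  one input-weight row per added hidden unit.
    new_out_cols: for each output dimension, one weight per added unit.
    """
    wide_in = [row[:] for row in w_in] + [row[:] for row in new_in_rows]
    wide_out = [row[:] + extra[:] for row, extra in zip(w_out, new_out_cols)]
    return wide_in, wide_out


# Frozen "pre-trained" weights: model width 2, hidden width 2.
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, 1.0], [1.0, -1.0]]
x = [1.0, 2.0]

# One extra hidden unit whose output weights start at zero (an assumption).
wide_in, wide_out = add_hidden_units(w_in, w_out, [[1.0, 1.0]], [[0.0], [0.0]])
unchanged = ffn(x, wide_in, wide_out)

# After "training", the new unit contributes to the first output dimension.
tuned_in, tuned_out = add_hidden_units(w_in, w_out, [[1.0, 1.0]], [[0.5], [0.0]])
shifted = ffn(x, tuned_in, tuned_out)
```

With zero-initialized output weights the widened network reproduces the frozen network exactly, so tuning starts from the pre-trained behavior and only moves away from it as the new weights are updated.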
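Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs. They cannot represent well the abundant long-tail entities in real-world KGs, which have limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones and suffer from problems such as expensive negative sampling and restrictive description requirements. In this paper, we propose LMKE, which adopts language models to derive knowledge embeddings, aiming both to enrich the representations of long-tail entities and to solve the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks for link prediction and triple classification, especially for long-tail entities.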
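Contrastive learning with in-batch negatives reuses the other examples in a batch as negatives, which is one common way to avoid drawing separate negative samples for every triple. The following is a generic InfoNCE-style sketch under that assumption. It is not the exact LMKE objective, and the dot-product scoring, the temperature parameter, and all names are illustrative.

```python
import math


def info_nce(queries, targets, temperature=1.0):
    """Mean cross-entropy where targets[i] is the positive for queries[i].

    Every other target in the batch serves as a negative for that query.
    """
    losses = []
    for i, query in enumerate(queries):
        # Similarity of this query to every target in the batch.
        logits = [
            sum(a * b for a, b in zip(query, target)) / temperature
            for target in targets
        ]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Queries aligned with their own targets give a low loss.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[5.0, 0.0], [0.0, 5.0]])
# Swapping the targets makes each positive the least similar item.
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 5.0], [5.0, 0.0]])
```

Because all pairwise scores in the batch come from the same set of embeddings, each encoded description is scored against every query in the batch rather than being encoded again for each negative pair.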
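Neural math word problem (MWP) solvers struggle to handle small local variances. In the MWP task, some local changes preserve the original semantics, while others may completely change the underlying logic. Existing MWP datasets contain only a limited number of the samples that neural models need in order to learn to distinguish different kinds of local variances in a problem and solve it correctly. In this paper, we propose a set of novel data augmentation approaches that supplement existing datasets with data exhibiting different kinds of local variances and help improve the generalization ability of current neural models. New samples are generated by knowledge-guided entity replacement and logic-guided problem reorganization. The augmentation approaches are designed to keep the new data consistent with their labels. Experimental results demonstrate the necessity and effectiveness of our methods.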
Geometric camera calibration is often required for applications that need to understand the perspective of an image. We propose Perspective Fields as a representation that models the local perspective properties of an image. Perspective Fields contain per-pixel information about the camera view, parameterized as an up vector and a latitude value. This representation has a number of advantages: it makes minimal assumptions about the camera model and is invariant or equivariant to common image editing operations such as cropping, warping, and rotation. It is also more interpretable and better aligned with human perception. We train a neural network to predict Perspective Fields, and the predicted fields can easily be converted to calibration parameters. We demonstrate the robustness of our approach under various scenarios compared with camera calibration-based methods and show example applications in image compositing.
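To make the per-pixel latitude concrete, the sketch below computes it for a simple pinhole camera. Several things here are assumptions rather than details from the abstract: latitude is taken to be the angle between a pixel's viewing ray and the horizontal plane, the camera has zero roll, image y points down, and the function and parameter names are hypothetical. The up vector is not covered.

```python
import math


def latitude_at_pixel(u, v, focal, cx, cy, pitch):
    """Latitude in radians of the viewing ray through pixel (u, v).

    Assumes a pinhole camera with focal length `focal` (pixels), principal
    point (cx, cy), zero roll, and an upward tilt of `pitch` radians.
    """
    # Ray direction in camera coordinates: x right, y down, z forward.
    dx, dy, dz = (u - cx) / focal, (v - cy) / focal, 1.0
    # World "up" expressed in the coordinates of the pitched camera.
    up = (0.0, -math.cos(pitch), math.sin(pitch))
    dot = dx * up[0] + dy * up[1] + dz * up[2]
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.asin(max(-1.0, min(1.0, dot / norm)))


# A level camera looks along the horizon at the principal point.
center_level = latitude_at_pixel(320, 240, 500.0, 320, 240, pitch=0.0)
# Tilting the camera up by 30 degrees raises that ray by the same angle.
center_tilted = latitude_at_pixel(320, 240, 500.0, 320, 240, pitch=math.pi / 6)
```

Under these assumptions the latitude varies smoothly across the image and depends only on the ray direction, so cropping an image leaves the value stored at each remaining pixel unchanged.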
Open world object detection aims to detect objects that are absent from the object classes of the training data as unknown objects, without explicit supervision. Furthermore, the exact classes of the unknown objects must be identified without catastrophic forgetting of the previously known classes when the corresponding annotations of unknown objects are given incrementally. In this paper, we propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR. In the first stage, we pre-train a model on the current annotated data to detect objects from the current known classes, and concurrently train an additional binary classifier to classify predictions into foreground or background classes. This helps the model build unbiased feature representations that can facilitate the detection of unknown classes in the subsequent process. In the second stage, we fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint. Furthermore, we alleviate catastrophic forgetting when the annotations of the unknown classes become available incrementally by using knowledge distillation and exemplar replay. Experimental results on PASCAL VOC and MS-COCO show that our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works often use hand-crafted features and predefined rule-based functions to detect headings and resolve the hierarchical relationship between headings. Both the benchmark and research based on deep learning are still limited. Accordingly, in this paper, we first introduce a standard dataset, HierDoc, including image samples from 650 documents of scientific papers with their content labels. Then we propose a novel end-to-end model by using the multimodal tree decoder (MTD) for ToC as a benchmark for HierDoc. The MTD model is mainly composed of three parts, namely encoder, classifier, and decoder. The encoder fuses the multimodality features of vision, text, and layout information for each entity of the document. Then the classifier recognizes and selects the heading entities. Next, to parse the hierarchical relationship between the heading entities, a tree-structured decoder is designed. To evaluate the performance, both the metric of tree-edit-distance similarity (TEDS) and F1-Measure are adopted. Finally, our MTD approach achieves an average TEDS of 87.2% and an average F1-Measure of 88.1% on the test set of HierDoc. The code and dataset will be released at: https://github.com/Pengfei-Hu/MTD.
Recent aerial object detection models rely on a large amount of labeled training data, which incurs unaffordable manual labeling costs in large aerial scenes with dense objects. Active learning effectively reduces the data labeling cost by selectively querying informative and representative unlabeled samples. However, existing active learning methods mainly assume a class-balanced setting and use image-based querying for generic object detection tasks, which makes them less applicable to aerial object detection because of the long-tailed class distribution and dense small objects in aerial scenes. In this paper, we propose a novel active learning method for cost-effective aerial object detection. Specifically, both object-level and image-level informativeness are considered in the object selection to avoid redundant and myopic querying. In addition, an easy-to-use class-balancing criterion is incorporated to favor minority objects and alleviate the long-tailed class distribution problem in model training. To fully utilize the queried information, we further devise a training loss to mine the latent knowledge in the undiscovered image regions. Extensive experiments are conducted on the DOTA-v1.0 and DOTA-v2.0 benchmarks to validate the effectiveness of the proposed method. The results show that it can save more than 75% of the labeling cost while reaching the same performance as the baselines and state-of-the-art active object detection methods. Code is available at https://github.com/ZJW700/MUS-CDB
A common scenario in Multilingual Neural Machine Translation (MNMT) is that translation tasks arrive sequentially and the training data of previous tasks is unavailable. In this scenario, current methods suffer heavily from catastrophic forgetting (CF). To alleviate CF, we investigate knowledge distillation based life-long learning methods. Specifically, in the one-to-many scenario, we propose a multilingual distillation method that makes the new model (student) jointly learn the multilingual output of the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. Experimental results on twelve translation tasks show that the proposed methods better consolidate previous knowledge and sharply alleviate CF.
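Knowledge distillation in this setting trains the new model (student) to match the output distribution of the old model (teacher) while also fitting the reference tokens of the new task. A generic sketch of such a combined per-token loss follows. The interpolation weight `alpha`, the function names, and the use of cross-entropy against the teacher distribution are illustrative assumptions rather than the paper's exact formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index, alpha=0.5):
    """Blend the new-task loss with a loss that imitates the teacher.

    (1 - alpha) weights the negative log-likelihood of the gold token;
    alpha weights the cross-entropy against the teacher's distribution.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    hard_loss = -math.log(p_student[gold_index])
    soft_loss = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return (1.0 - alpha) * hard_loss + alpha * soft_loss


# A student that agrees with the teacher is penalized less than one that does not.
agree = distillation_loss([2.0, 0.0], [2.0, 0.0], gold_index=0, alpha=1.0)
disagree = distillation_loss([0.0, 2.0], [2.0, 0.0], gold_index=0, alpha=1.0)
```

The soft term needs only the teacher's outputs, not the original training data of earlier tasks, which is what makes distillation usable when that data is no longer available.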