We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially-designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.
translated by 谷歌翻译
Stairs are common building structures in urban environment, and stair detection is an important part of environment perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night and in the case of extremely fuzzy visual clues. To solve these problems, we propose a neural network architecture with inputs of both RGB map and depth map. Specifically, we design the selective module which can make the network learn the complementary relationship between RGB map and depth map and effectively combine the information from RGB map and depth map in different scenes. In addition, we also design a line clustering algorithm for the post-processing of detection results, which can make full use of the detection results to obtain the geometric parameters of stairs. Experiments on our dataset show that our method can achieve better accuracy and recall compared with the previous state-of-the-art deep learning method, which are 5.64% and 7.97%, respectively. Our method also has extremely fast detection speed, and a lightweight version can achieve 300 + frames per second with the same resolution, which can meet the needs of most real-time detection scenes.
translated by 谷歌翻译
Graph anomaly detection (GAD) is a vital task in graph-based machine learning and has been widely applied in many real-world applications. The primary goal of GAD is to capture anomalous nodes from graph datasets, which evidently deviate from the majority of nodes. Recent methods have paid attention to various scales of contrastive strategies for GAD, i.e., node-subgraph and node-node contrasts. However, they neglect the subgraph-subgraph comparison information which the normal and abnormal subgraph pairs behave differently in terms of embeddings and structures in GAD, resulting in sub-optimal task performance. In this paper, we fulfill the above idea in the proposed multi-view multi-scale contrastive learning framework with subgraph-subgraph contrast for the first practice. To be specific, we regard the original input graph as the first view and generate the second view by graph augmentation with edge modifications. With the guidance of maximizing the similarity of the subgraph pairs, the proposed subgraph-subgraph contrast contributes to more robust subgraph embeddings despite of the structure variation. Moreover, the introduced subgraph-subgraph contrast cooperates well with the widely-adopted node-subgraph and node-node contrastive counterparts for mutual GAD performance promotions. Besides, we also conduct sufficient experiments to investigate the impact of different graph augmentation approaches on detection performance. The comprehensive experimental results well demonstrate the superiority of our method compared with the state-of-the-art approaches and the effectiveness of the multi-view subgraph pair contrastive strategy for the GAD task.
translated by 谷歌翻译
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find \textit{untrained sparse subnetworks} at the initialization, that can match the performance of \textit{fully trained dense} GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness of input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).
translated by 谷歌翻译
Deep convolutional neural networks have proven their effectiveness, and have been acknowledged as the most dominant method for image classification. However, a severe drawback of deep convolutional neural networks is poor explainability. Unfortunately, in many real-world applications, users need to understand the rationale behind the predictions of deep convolutional neural networks when determining whether they should trust the predictions or not. To resolve this issue, a novel genetic algorithm-based method is proposed for the first time to automatically evolve local explanations that can assist users to assess the rationality of the predictions. Furthermore, the proposed method is model-agnostic, i.e., it can be utilised to explain any deep convolutional neural network models. In the experiments, ResNet is used as an example model to be explained, and the ImageNet dataset is selected as the benchmark dataset. DenseNet and MobileNet are further explained to demonstrate the model-agnostic characteristic of the proposed method. The evolved local explanations on four images, randomly selected from ImageNet, are presented, which show that the evolved local explanations are straightforward to be recognised by humans. Moreover, the evolved explanations can explain the predictions of deep convolutional neural networks on all four images very well by successfully capturing meaningful interpretable features of the sample images. Further analysis based on the 30 runs of the experiments exhibits that the evolved local explanations can also improve the probabilities/confidences of the deep convolutional neural network models in making the predictions. The proposed method can obtain local explanations within one minute, which is more than ten times faster than LIME (the state-of-the-art method).
translated by 谷歌翻译
Recent advances in neural approaches greatly improve task-oriented dialogue (TOD) systems which assist users to accomplish their goals. However, such systems rely on costly manually labeled dialogs which are not available in practical scenarios. In this paper, we present our models for Track 2 of the SereTOD 2022 challenge, which is the first challenge of building semi-supervised and reinforced TOD systems on a large-scale real-world Chinese TOD dataset MobileCS. We build a knowledge-grounded dialog model to formulate dialog history and local KB as input and predict the system response. And we perform semi-supervised pre-training both on the labeled and unlabeled data. Our system achieves the first place both in the automatic evaluation and human interaction, especially with higher BLEU (+7.64) and Success (+13.6\%) than the second place.
translated by 谷歌翻译
图像文本聚类(ITC)的目标是通过整合这些异质样品的多模式的互补和一致信息来找到正确的簇。但是,目前的大多数研究都根据理想的前提分析了ITC,即每种模式中的样本都是完整的。但是,在现实情况下,这种推定并不总是有效的。缺少的数据问题使图像文本特征学习性能退化,并最终会影响ITC任务中的概括能力。尽管已经提出了一系列方法来解决此不完整的图像文本群集问题(IITC),但仍然存在以下问题:1)大多数现有方法几乎不考虑异质特征域之间的明显差距。 2)对于缺少数据,很少保证由现有方法生成的表示形式适合聚类任务。 3)现有方法不利用内部和内部模式的潜在连接。在本文中,我们提出了一个聚类引起的生成不完整的图像文本聚类(CIGIT-C)网络,以应对上述挑战。更具体地说,我们首先使用特定于模态的编码器将原始功能映射到更独特的子空间。通过使用对抗生成网络在另一种模态上产生一种方式,可以彻底探索内部内部和模式之间的潜在连接。最后,我们使用两个KL DiverGence损失更新相应的模态特异性编码器。公共图像文本数据集的实验结果表明,建议的方法优于IITC作业更有效。
translated by 谷歌翻译
深度学习取得了长足的进步,用于图像中的对象检测。对象检测的检测准确性和计算成本取决于图像的空间分辨率,这可能会受到相机和存储注意事项的约束。压缩通常是通过减少空间或幅度分辨率或有时两者都对性能的众所周知的影响来实现的。检测精度还取决于感兴趣的对象与摄像机的距离。我们的工作研究了空间和振幅分辨率以及对象距离对物体检测准确性和计算成本的影响。我们开发了Yolov5(ra-Yolo)的分辨率 - 自适应变体,该变体基于输入图像的空间分辨率,它在特征金字塔和检测头中变化。为了训练和评估这种新方法,我们通过结合TJU和Eurocity数据集的图像来创建具有不同空间和振幅分辨率的图像数据集,并通过应用空间调整和压缩来生成不同的分辨率。我们首先表明Ra-Yolo在各种空间分辨率上实现了检测准确性和推理时间之间的良好权衡。然后,我们使用拟议的RA-YOLO模型评估空间和振幅分辨率对物体检测准确性的影响。我们证明,导致最高检测精度的最佳空间分辨率取决于“耐受性”图像大小。我们进一步评估了对象到摄像机对检测准确性的影响,并表明较高的空间分辨率可实现更大的检测范围。这些结果为选择图像空间分辨率和压缩设置提供了重要的指南,这些分辨率和压缩设置基于可用的带宽,存储,所需的推理时间和/或所需的检测范围,在实际应用中。
translated by 谷歌翻译
美国的意识形态分裂在日常交流中变得越来越突出。因此,关于政治两极分化的许多研究,包括最近采取计算观点的许多努力。通过检测文本语料库中的政治偏见,可以尝试描述和辨别该文本的两极分性。从直觉上讲,命名的实体(即,用作名词的名词和短语)和文本中的标签经常带有有关政治观点的信息。例如,使用“支持选择”一词的人可能是自由的,而使用“亲生生命”一词的人可能是保守的。在本文中,我们试图揭示社交媒体文本数据中的政治极性,并通过将极性得分分配给实体和标签来量化这些极性。尽管这个想法很简单,但很难以可信赖的定量方式进行这种推论。关键挑战包括少数已知标签,连续的政治观点,以及在嵌入单词媒介中的极性得分和极性中性语义含义的保存。为了克服这些挑战,我们提出了极性感知的嵌入多任务学习(PEM)模型。该模型包括(1)自制的上下文保护任务,(2)基于注意力的推文级别的极性推导任务,以及(3)对抗性学习任务,可促进嵌入式的极性维度及其语义之间的独立性方面。我们的实验结果表明,我们的PEM模型可以成功学习极性感知的嵌入。我们检查了各种应用,从而证明了PEM模型的有效性。我们还讨论了我们的工作的重要局限性,并在将PEM模型应用于现实世界情景时的压力谨慎。
translated by 谷歌翻译
室外(OOD)检测是面向任务的对话框系统中的关键组件,旨在确定查询是否不在预定义的支持的意图集之外。事实证明,先前基于软磁性的检测算法对OOD样品被过度自信。在本文中,我们分析了过度自信的OOD来自由于训练和测试分布之间的不匹配而导致的分布不确定性,这使得该模型无法自信地做出预测,因此可能导致异常软磁得分。我们提出了一个贝叶斯OOD检测框架,以使用Monte-Carlo辍学来校准分布不确定性。我们的方法是灵活的,并且可以轻松地插入现有的基于软磁性的基线和增益33.33 \%OOD F1改进,而与MSP相比仅增加了0.41 \%的推理时间。进一步的分析表明,贝叶斯学习对OOD检测的有效性。
translated by 谷歌翻译