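We present a novel architecture for dense correspondence. The current state-of-the-art methods are Transformer-based approaches that focus on either feature descriptors or cost volume aggregation. However, they generally aggregate one or the other but not both, even though joint aggregation would let each boost the other by supplying information that one has and the other lacks: the structural or semantic information of an image, or pixel-wise matching similarity. In this work, we propose a novel Transformer-based network that interleaves both forms of aggregation in a way that exploits their complementary information. Specifically, we design a self-attention layer that leverages the descriptors to disambiguate the noisy cost volume and that also uses the cost volume to aggregate features in a manner that promotes accurate matching. A subsequent cross-attention layer performs further aggregation, conditioned on the descriptors of the images and aided by the aggregated outputs of earlier layers. We further boost performance with hierarchical processing, in which coarser-level aggregations guide those at finer levels. We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks. Extensive ablation studies are also provided to validate our design choices.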
Translated by Google Translate
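This paper presents a novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation. The use of transformers can benefit correlation map aggregation through self-attention over a global receptive field. However, tokenizing a correlation map for transformer processing can be detrimental, because the discontinuity at token boundaries reduces the local context available near the token edges and weakens the inductive bias. To address this problem, we propose a 4D Convolutional Swin Transformer, in which a high-dimensional Swin Transformer is preceded by a series of small-kernel convolutions that impart local context to all pixels and introduce a convolutional inductive bias. In addition, we boost aggregation performance by applying transformers within a pyramidal structure, where aggregation at a coarser level guides aggregation at a finer level. Noise in the transformer output is then filtered in the subsequent decoder with the help of the query's appearance embedding. With this model, a new state of the art is set on all the standard benchmarks for few-shot segmentation. The results show that VAT also attains state-of-the-art performance for semantic correspondence, where cost aggregation likewise plays a central role.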
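We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), that tackles the few-shot segmentation task by using both convolutions and transformers to efficiently handle the high-dimensional correlation maps between query and support. Specifically, we propose an encoder consisting of a volume embedding module, which not only transforms the correlation maps into a more tractable size but also injects some convolutional inductive bias, and a volumetric transformer module for cost aggregation. Our encoder has a pyramidal structure that lets the coarser-level aggregation guide the finer level and enforces complementary matching scores. We then feed the output, along with the projected feature maps, into our affinity-aware decoder to guide the segmentation process. Combining these components, we conduct experiments to demonstrate the effectiveness of the proposed method, which sets a new state of the art on all the standard benchmarks for the few-shot segmentation task. Furthermore, we find that the proposed method attains state-of-the-art performance even on the standard benchmarks for the semantic correspondence task, although it was not specifically designed for that task. We also provide an extensive ablation study to validate our architectural choices. The trained weights and code are available at https://seokju-cho.github.io/vat/.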
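Conventional techniques for establishing dense correspondences across visually or semantically similar images have focused on designing a task-specific matching prior, which is difficult to model. To overcome this, recent learning-based methods have attempted to learn a good matching prior within the model itself from large training data. The performance improvement was apparent, but the need for sufficient training data and intensive learning hinders their applicability. Moreover, using a fixed model at test time does not account for the fact that a pair of images may require its own prior, which limits performance and leads to poor generalization to unseen images. In this paper, we show that an image-pair-specific prior can be captured solely by optimizing an untrained matching network on an input pair of images. Tailored to such test-time optimization for dense correspondence, we present a residual matching network and a confidence-aware contrastive loss to guarantee meaningful convergence. Experiments demonstrate that our framework, dubbed Deep Matching Prior (DMP), is competitive with, or even outperforms, the latest learning-based methods on several benchmarks for geometric matching and semantic matching, even though it requires neither large training data nor intensive learning. With pre-trained networks, DMP attains state-of-the-art performance on all benchmarks.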
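We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images, a task with additional challenges posed by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, since matching accuracy depends on the quality of its output. Hand-crafted methods for cost aggregation lack robustness to severe deformations, and CNN-based methods inherit the limitation of CNNs, which fail to discriminate incorrect matches because of their limited receptive fields. In contrast, CATs explore global consensus among the initial correlation maps with the help of architectural designs that allow us to fully leverage the self-attention mechanism. Specifically, we include appearance affinity modeling to aid the cost aggregation process in disambiguating the noisy initial correlation maps, and we propose multi-level aggregation to efficiently capture different semantics from hierarchical feature representations. We then combine these with a swapping self-attention technique and residual connections, not only to enforce consistent matching but also to ease the learning process, and we find that they yield an apparent performance boost. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models are available at https://github.com/sunghwanhong/cats.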
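As a rough illustration of the initial correlation map that cost aggregation starts from, the sketch below computes cosine similarities between two small sets of feature descriptors and reads off the best-scoring target position for each source position. The function names and toy descriptors are hypothetical, and this is not the CATs implementation: the abstract describes refining such maps with transformer-based aggregation, whereas this sketch only shows the raw, unaggregated scores.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def correlation_map(src_feats, tgt_feats):
    """Rows index source positions, columns index target positions."""
    return [[cosine(s, t) for t in tgt_feats] for s in src_feats]

def best_matches(corr):
    """Index of the highest-scoring target position for each source position."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]

# Toy descriptors (hypothetical): three positions per image, 2-D features.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.1, 0.9], [0.9, 0.9], [0.9, 0.1]]

corr = correlation_map(src, tgt)
matches = best_matches(corr)
print(matches)
```

Taking the per-row maximum is the simplest possible read-out; on real images the raw scores are noisy, which is the motivation the abstract gives for an aggregation stage.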
Existing federated classification algorithms typically assume that the local annotations at every client cover the same set of classes. In this paper, we aim to lift this assumption and focus on a more general yet practical non-IID setting in which every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients share a common goal: to build a global classification model that identifies the union of these classes. Such heterogeneity in client class sets poses a new challenge: how can we ensure that different clients operate in the same latent space, so as to avoid drift after aggregation? We observe that the classes can be described in natural language (i.e., by class names) and that these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations, and we break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and to distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
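The idea of classification as matching between a data representation and class representations can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the class vectors are hand-written stand-ins for the output of a label encoder applied to class names, the sample vector stands in for a data encoder's output, and `classify_by_matching` is a hypothetical helper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def classify_by_matching(data_repr, class_reprs):
    """Return the class name whose representation best matches data_repr."""
    return max(class_reprs, key=lambda name: dot(data_repr, class_reprs[name]))

# Hypothetical class representations keyed by shared class names.
# In the setting described above these would come from a label encoder.
class_reprs = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bird": [0.0, 0.0, 1.0],
}

# Hypothetical data representation, standing in for a data encoder's output.
sample = [0.2, 0.1, 0.9]
predicted = classify_by_matching(sample, class_reprs)
print(predicted)
```

Because prediction is a similarity lookup over named class vectors, a client that never saw "bird" locally can still score samples against it, which is the property the abstract relies on when annotating locally-unaware classes.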
This paper addresses smooth function approximation by neural networks (NNs). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions while comprising only a few weight parameters, by discussing several topics in regression. First, we reinterpret the inner workings of NNs for regression and, as a consequence, propose a new activation function, the integrated sigmoid linear unit (ISLU). Then, the special characteristics of metadata for regression, which differ from those of other data such as images or sound, are discussed with a view to improving the performance of neural networks. Finally, a simple hierarchical NN that generates models substituting for mathematical functions is presented, and a new batch concept, the "meta-batch", which improves NN performance several times over, is introduced. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and an NN structure that generates a compact multi-layer perceptron (MLP) are the essential elements of this study.
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify such changes, they still suffer from missing subtle changes, poor scalability, and/or sensitivity to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when there is a high dissimilarity score between two non-homogeneous, temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change points in data streams while tolerating noise points. Moreover, the proposed online and offline versions of iCID are able to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
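The scoring step described above, a dissimilarity between two temporally adjacent intervals, can be sketched with a generic distributional kernel. Note the substitution: this sketch uses a Gaussian kernel mean similarity, not the Isolation Distributional Kernel that iCID is built on, and the function names, interval width, and `sigma` value are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two scalars (stand-in for IDK, an assumption)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mean_similarity(a, b, sigma=1.0):
    """Average pairwise kernel value between the points of two intervals."""
    total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
    return total / (len(a) * len(b))

def dissimilarity(a, b, sigma=1.0):
    """One minus the normalised similarity of the two interval distributions."""
    cross = mean_similarity(a, b, sigma)
    norm = math.sqrt(mean_similarity(a, a, sigma) * mean_similarity(b, b, sigma))
    return 1.0 - cross / norm

def interval_scores(stream, width, sigma=1.0):
    """Score each boundary between consecutive non-overlapping intervals."""
    intervals = [stream[i:i + width]
                 for i in range(0, len(stream) - width + 1, width)]
    return [dissimilarity(intervals[i], intervals[i + 1], sigma)
            for i in range(len(intervals) - 1)]

# Toy stream (hypothetical): the level shifts from about 0 to about 5 halfway.
stream = [0.0, 0.1, -0.1, 0.05] * 2 + [5.0, 5.1, 4.9, 5.05] * 2
scores = interval_scores(stream, width=4)
change_at = max(range(len(scores)), key=scores.__getitem__)
print(change_at, scores)
```

Here `change_at` is the index of the boundary with the highest score, so boundary 1 separates the second and third intervals. The all-pairs kernel sum is quadratic in the interval width; the abstract attributes iCID's efficiency to IDK's finite feature map, which this sketch does not reproduce.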
Time-series anomaly detection is an important task that has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining a considerable number of labels at low cost, as it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions because time-series data is numerically continuous and difficult to understand. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection through only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically from only a few labeled data points. These techniques are complementary and reinforce one another. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
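To make the notion of labeling functions concrete, the sketch below hand-writes three heuristic rules for a toy series and combines their votes by simple majority. Everything here is hypothetical: the rules, thresholds, and tie-breaking choice are illustrative, and LEIAD is described as generating labeling functions automatically rather than relying on hand-written ones like these.

```python
ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def lf_high_value(series, i, threshold=10.0):
    """Flag points above a fixed threshold; otherwise abstain."""
    return ANOMALY if series[i] > threshold else ABSTAIN

def lf_sudden_jump(series, i, max_step=5.0):
    """Flag points that differ sharply from the previous point."""
    if i == 0:
        return ABSTAIN
    return ANOMALY if abs(series[i] - series[i - 1]) > max_step else NORMAL

def lf_near_median(series, i, tolerance=2.0):
    """Mark points close to the series median as normal; otherwise abstain."""
    median = sorted(series)[len(series) // 2]
    return NORMAL if abs(series[i] - median) <= tolerance else ABSTAIN

def majority_vote(series, i, labeling_functions):
    """Combine non-abstaining votes; ties resolve to NORMAL."""
    votes = [lf(series, i) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return ANOMALY if votes.count(ANOMALY) > votes.count(NORMAL) else NORMAL

# Toy series (hypothetical) with a single spike at index 3.
series = [1.0, 1.2, 0.9, 15.0, 1.1, 1.0]
lfs = [lf_high_value, lf_sudden_jump, lf_near_median]
weak_labels = [majority_vote(series, i, lfs) for i in range(len(series))]
print(weak_labels)
```

The point after the spike shows why several rules are combined: the jump rule alone would flag it, but the median rule votes normal and the tie resolves to normal.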
In the robotics and computer vision communities, surveillance tasks such as human detection, tracking, and motion recognition with a camera have been studied extensively. Deep learning algorithms are widely used in these tasks, as in other computer vision tasks. However, existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low illuminance. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and describe methods of exploiting it with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.