视频质量评估(VQA)仍然是一个重要而挑战性的问题,影响了最广泛的尺度的许多应用程序。移动设备和云计算技术的最新进展使得可以捕获,处理和共度高分辨率,高分辨率(HFR)视频几乎瞬间。能够监控和控制这些流式视频的质量可以使得能够提供更令人愉快的内容和感知的优化速率控制。因此,需要一种强迫需要开发可以在巨大尺度部署的VQA模型。虽然最近的一些效果已应用于可变帧速率和HFR视频质量的全参考(FR)分析,但是没有研究帧速率变化的无引用(NR)VQA算法的开发。在这里,我们提出了一种用于评估HFR视频的一级盲VQA模型,我们将其配给了帧群感知视频评估程序W / O参考(Faver)。 Faver使用扩展模型的空间自然场景统计数据,即包括节省空间小波分解的视频信号,进行有效的帧速率敏感质量预测。我们对几个HFR视频质量数据集的广泛实验表明,PEVER以合理的计算成本优于其他盲VQA算法。为了便于可重复的研究和公共评估,在线可以在线进行狂热的实施:\ url {https://github.com/uniqzheng/hfr-bvqa}。
translated by 谷歌翻译
5G毫米波(MMWAVE)信号与传播信道和传播环境具有固有的几何连接。因此,它们可用于共同本地化接收器并映射传播环境,该传播环境被称为同时定位和映射(SLAM)。5G SLAM中最重要的任务之一是处理测量模型的非线性。为了解决这个问题,现有的5G SLAM依赖于Sigma点或扩展卡尔曼滤波器,针对现有概率密度函数(PDF)线性化测量功能。在本文中,我们研究了关于后部PDF的测量功能的线性化,并将迭代后线性化滤波器实施到泊松多Bernoulli Slam滤波器中。仿真结果表明了所得SLAM过滤器的精度和精确改善。
translated by 谷歌翻译
学习的图像压缩技术近年来取得了相当大的发展。在本文中,我们发现性能瓶颈位于使用单个高度解码器,在这种情况下,三元高斯模型折叠到二进制文件。为了解决这个问题,我们建议使用三个高度解码器来分离混合参数的解码过程,以分散的高斯混合似然性,实现更准确的参数估计。实验结果表明,与最先进的方法相比,MS-SSSIM优化的所提出的方法实现了3.36%的BD速率。所提出的方法对编码时间和拖鞋的贡献可以忽略不计。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译
Point cloud registration (PCR) is a popular research topic in computer vision. Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 7 evolving registration approaches and 4 traditional registration approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.
translated by 谷歌翻译
The explosion of e-commerce has caused the need for processing and analysis of product titles, like entity typing in product titles. However, the rapid activity in e-commerce has led to the rapid emergence of new entities, which is difficult to be solved by general entity typing. Besides, product titles in e-commerce have very different language styles from text data in general domain. In order to handle new entities in product titles and address the special language styles problem of product titles in e-commerce domain, we propose our textual entailment model with continuous prompt tuning based hypotheses and fusion embeddings for e-commerce entity typing. First, we reformulate the entity typing task into a textual entailment problem to handle new entities that are not present during training. Second, we design a model to automatically generate textual entailment hypotheses using a continuous prompt tuning method, which can generate better textual entailment hypotheses without manual design. Third, we utilize the fusion embeddings of BERT embedding and CharacterBERT embedding with a two-layer MLP classifier to solve the problem that the language styles of product titles in e-commerce are different from that of general domain. To analyze the effect of each contribution, we compare the performance of entity typing and textual entailment model, and conduct ablation studies on continuous prompt tuning and fusion embeddings. We also evaluate the impact of different prompt template initialization for the continuous prompt tuning. We show our proposed model improves the average F1 score by around 2% compared to the baseline BERT entity typing model.
translated by 谷歌翻译
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators. We study and propose a novel framework that formulates batch active learning from the sparse approximation's perspective. Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart. We realize the framework as sparsity-constrained discontinuous optimization problems, which explicitly balance uncertainty and representation for large-scale applications and could be solved by greedy or proximal iterative hard thresholding algorithms. The proposed method can adapt to various settings, including both Bayesian and non-Bayesian neural networks. Numerical experiments show that our work achieves competitive performance across different settings with lower computational complexity.
translated by 谷歌翻译
人脸图像通常以广泛的视觉量表出现。现有的面部表示通过组装有限系列的预定尺度的多尺度方案来追求处理量表变化的带宽。这种多弹药方案带来了推理负担,而预定义的量表不可避免地从真实数据中差异。取而代之的是,从数据中学习比例参数,并将其用于单发功能推理是一个不错的解决方案。为此,我们通过诉诸规模空间理论并实现两倍的设施来改革Conv层:1)Conv层从真实数据分布中学习一组尺度,每个数据分布都由Conv内核来实现; 2)该图层自动在适当的通道和位置上突出显示与输入模式量表及其存在相对应的位置。然后,我们通过堆叠改革层的层来实现分层尺度的关注,建立一种名为“比例尺注意Cons Neurnet网络”(\ textbf {scan-cnn})的新颖风格。我们将扫描CNN应用于面部识别任务,并推动SOTA性能的前沿。当面部图像模糊时,准确性增长更为明显。同时,作为单发方案,该推断比多弹性融合更有效。与普通CNN相比,制造了一组工具,以确保对扫描CNN进行快速训练和推理成本的零增加。
translated by 谷歌翻译
现有的LIDAR基金​​标记系统具有使用限制。特别是,利达塔格(Lidartag)需要特定的标记放置和基于图像的激光雷达基准标记,要求从一个角度对点云进行采样。结果,随着点云从多个角度采样,基准标记的检测仍然是一个未解决的问题。在这封信中,我们开发了一种新颖的算法来检测多视点云中的基准标记。提出的算法包括两个阶段。首先,感兴趣的区域(ROI)检测发现可能包含基准标记的点簇。具体而言,由于从空间的角度来看,从强度的角度提取ROI的方法是引入的,即从空间角度来看,标记是纸张或薄板的床单,与它们所连接的平面是不可区分的。其次,标记检测验证候选ROI是否包含基金标记,并输出有效ROI中标记的ID号和顶点位置。特别是,将ROI传输到预定义的中间平面,目的是采用球形投影以生成强度图像,然后通过强度图像完成标记检测。提供定性和定量实验结果以验证所提出的算法。代码和结果可在以下网址获得:https://github.com/york-sdcnlab/marker?detection-general
translated by 谷歌翻译