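Quantum computers are next-generation devices that hold promise to perform calculations beyond the reach of classical computers. A leading approach to this goal is quantum machine learning, especially quantum generative learning. Because quantum mechanics is intrinsically probabilistic, it is reasonable to postulate that quantum generative learning models (QGLMs) may surpass their classical counterparts. As a result, QGLMs are receiving growing attention from the quantum physics and computer science communities, which have proposed various QGLMs that can be implemented efficiently on near-term quantum machines and that offer potential computational advantages. In this paper, we review the current progress of QGLMs from the perspective of machine learning. In particular, we interpret these QGLMs, covering quantum circuit Born machines, quantum generative adversarial networks, quantum Boltzmann machines, and quantum autoencoders, as quantum extensions of classical generative learning models. In this context, we explore their intrinsic relations and their fundamental differences. We further summarize the potential applications of QGLMs in both conventional machine learning tasks and quantum physics. Finally, we discuss the challenges and directions for further research on QGLMs.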
We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially-designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.
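Precise segmentation is a vital first step in analyzing the semantic information of the cardiac cycle and capturing anomalies in cardiovascular signals. However, in deep semantic segmentation, models are often unilaterally confounded by individual attributes of the data. For cardiovascular signals, quasi-periodicity is the essential characteristic to be learned, and it can be regarded as the synthesis of a morphology attribute (AM) and a rhythm attribute (AR). Our key insight is to suppress over-dependence on AM or AR while deep representations are being generated. To address this issue, we establish a structural causal model as the foundation for customizing intervention approaches on AM and AR, respectively. In this paper, we propose contrastive causal intervention (CCI) to form a novel training paradigm under a frame-level contrastive framework. The intervention can eliminate the implicit statistical bias introduced by a single attribute and lead to more objective representations. We conduct comprehensive experiments under controlled conditions for QRS location and heart sound segmentation. The final results indicate that our approach improves performance by up to 0.41% for QRS location and 2.73% for heart sound segmentation. The method's effectiveness generalizes to multiple databases and noisy signals.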
Environmental disturbances, such as sensor noise, varying lighting conditions, challenging weather, and external adversarial perturbations, are inevitable in real self-driving applications. Existing research and testing have shown that they can severely degrade a vehicle's perception ability and performance. One of the main issues is false positive detection, i.e., a ghost object that does not actually exist or that appears in the wrong position (such as a non-existent vehicle). Traditional navigation methods tend to avoid every detected object for safety; however, avoiding a ghost object may lead the vehicle into an even more dangerous situation, such as sudden braking on the highway. Given the variety of disturbance types, it is difficult to address this issue at the perception level. A potential solution is to detect ghosts through relation learning over the whole scenario and to develop an integrated end-to-end navigation system. Our underlying logic is that the behavior of every vehicle in the scene is influenced by its neighbors, and that normal vehicles behave in a logical way while ghost vehicles do not. By learning the spatio-temporal relations among surrounding vehicles, an information reliability representation is learned for each detected vehicle, and a robot navigation network is then developed. In contrast to existing work, we encourage the network to learn by itself how to represent reliability and how to aggregate all the uncertain information, thus increasing efficiency and generalizability. To the best of the authors' knowledge, this paper is the first work to use graph relation learning to achieve robust end-to-end navigation in the presence of ghost vehicles. Simulation results on the CARLA platform demonstrate the feasibility and effectiveness of the proposed method in various scenarios.
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and an imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers that encourages the network to learn features closer to this appealing structure in imbalanced semantic segmentation. Experimental results show that our method brings significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
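To make the "simplex equiangular tight frame" structure concrete, the following is a minimal sketch in plain Python. It builds K unit vectors whose pairwise cosine is exactly -1/(K-1), which is the maximally separated arrangement described above. The function names (`simplex_etf`, `max_structure_deviation`) are hypothetical, and the deviation measure is only an illustrative diagnostic; it is not the regularizer proposed in the paper, whose exact form the abstract does not give.

```python
import math


def simplex_etf(num_classes):
    """Return num_classes unit vectors (rows) forming a simplex ETF in R^num_classes.

    Uses the standard construction sqrt(K / (K - 1)) * (I - (1 / K) * ones).
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_structure_deviation(centers):
    """Hypothetical diagnostic: largest gap between the pairwise cosines of the
    globally centered class centers and the ideal ETF value -1 / (K - 1)."""
    k = len(centers)
    dim = len(centers[0])
    mean = [sum(c[j] for c in centers) / k for j in range(dim)]
    centered = [[x - m for x, m in zip(c, mean)] for c in centers]
    target = -1.0 / (k - 1)
    worst = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            norm_i = math.sqrt(dot(centered[i], centered[i]))
            norm_j = math.sqrt(dot(centered[j], centered[j]))
            cosine = dot(centered[i], centered[j]) / (norm_i * norm_j)
            worst = max(worst, abs(cosine - target))
    return worst


if __name__ == "__main__":
    etf = simplex_etf(4)
    print(max_structure_deviation(etf))
    # Two nearly parallel centers, as might arise from correlated classes.
    skewed = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(max_structure_deviation(skewed))
```

For a perfect ETF the deviation is zero up to floating-point error, while correlated or imbalanced class centers give a clearly positive value.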
Weakly-supervised object localization aims to indicate the category as well as the extent of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map so that it covers the whole object, yet they ignore the co-occurrence confounder of object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Moreover, the use of CAM creates a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we ask whether this idea can be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains a large amount of information irrelevant to decision making, which makes predominant pre-training approaches from general vision less suitable for autonomous driving. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our proposed approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. Query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie near the distribution boundary between classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor-based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which uses the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to the different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning-based method can outperform, or is at least comparable to, SOTA methods that need additional learning steps.
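The general idea of calibrating support samples with base-class prototypes before nearest-neighbor classification can be sketched as follows. This is an illustration under stated assumptions, not the P3DC-Shot operation itself: the abstract does not specify the calibration rule, so the blend with the single most similar base prototype, the `weight` parameter, the use of cosine similarity, and all function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def calibrate(sample, base_prototypes, weight=0.5):
    """Assumed rule: pull a support sample toward its most similar base prototype.

    weight=0 leaves the sample unchanged; weight=1 replaces it with the prototype.
    """
    nearest = max(base_prototypes, key=lambda p: cosine(sample, p))
    return [(1.0 - weight) * s + weight * p for s, p in zip(sample, nearest)]


def classify(query, support, base_prototypes, weight=0.5):
    """Return the label of the calibrated support sample most similar to query.

    support maps each novel-class label to a list of feature vectors.
    """
    best_label, best_sim = None, -math.inf
    for label, samples in support.items():
        for sample in samples:
            sim = cosine(query, calibrate(sample, base_prototypes, weight))
            if sim > best_sim:
                best_label, best_sim = label, sim
    return best_label


if __name__ == "__main__":
    # Toy 2-D features; both support samples sit close to the class boundary.
    base_prototypes = [[1.0, 0.0], [0.0, 1.0]]
    support = {"cat": [[0.6, 0.4]], "dog": [[0.4, 0.6]]}
    print(classify([0.9, 0.1], support, base_prototypes))
```

No training step is involved, which matches the non-learning character described above; only the features, the prototypes, and a similarity measure are needed.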
In this tutorial paper, we look into the evolution and prospects of network architecture and propose a novel conceptual architecture for 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). Holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. Pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence of 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussion and development on the potential architecture of 6G.