Prompting methods are regarded as one of the key advances in few-shot natural language processing. Recent research has moved from discrete token-based "hard prompts" to continuous "soft prompts", which employ learnable vectors as pseudo prompt tokens and achieve better performance. Though showing promise, these soft-prompting methods are observed to rely heavily on good initialization to take effect. Unfortunately, obtaining a perfect initialization for soft prompts requires an understanding of the inner workings of the language model and careful design, which is no easy task and has to be redone from scratch for each new task. To remedy this, we propose a generalized soft prompting method called MetaPrompting, which adopts the well-recognized model-agnostic meta-learning algorithm to automatically find a better prompt initialization that facilitates fast adaptation to new prompting tasks. Extensive experiments show that MetaPrompting tackles the soft prompt initialization problem and brings significant improvements on four different datasets (over 6 points higher accuracy in the 1-shot setting), achieving new state-of-the-art performance.
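(A minimal sketch of this idea in PyTorch; `tasks` and `task_loss` are hypothetical stand-ins for a few-shot task sampler and a prompt-based language-model loss, and this is not the authors' released code.)

```python
# Hypothetical MAML-style soft prompt initialization: meta-train the prompt
# embeddings so that a single inner gradient step adapts them to a new task.
import torch

def maml_prompt_init(tasks, task_loss, prompt_len=20, dim=768,
                     inner_lr=1e-2, meta_lr=1e-3, steps=1000):
    prompt = torch.randn(prompt_len, dim, requires_grad=True)  # soft prompt tokens
    meta_opt = torch.optim.Adam([prompt], lr=meta_lr)
    for _ in range(steps):
        meta_opt.zero_grad()
        for support, query in tasks():  # sample a batch of few-shot tasks
            # Inner loop: one adaptation step on the support set.
            grad, = torch.autograd.grad(task_loss(prompt, support), prompt,
                                        create_graph=True)
            adapted = prompt - inner_lr * grad
            # Outer loop: evaluate the adapted prompt on the query set and
            # backpropagate through the adaptation step (second-order MAML).
            task_loss(adapted, query).backward()
        meta_opt.step()
    return prompt.detach()  # an initialization that adapts quickly to new tasks
```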
Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration primitives, along with emerging machine learning models, pose significant engineering challenges. In this paper, we present TensorIR, a compiler abstraction for optimizing programs with these tensor computation primitives. TensorIR generalizes the loop-nest representation used in existing machine learning compilers to make tensor computation a first-class citizen. Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives. Experimental results show that TensorIR compilation automatically uses the tensor computation primitives for given hardware backends and delivers performance that is competitive with state-of-the-art hand-optimized systems across platforms.
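(For readers unfamiliar with the abstraction, the canonical matmul example in TVM's TVMScript frontend looks roughly like the following; exact API details may differ across TVM versions. The `block` construct is what elevates the tensorized computation to a first-class citizen.)

```python
from tvm.script import tir as T

@T.prim_func
def matmul(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.match_buffer(b, (128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):  # an isolated, schedulable unit of tensor computation
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])  # spatial/spatial/reduce axes
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]
```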
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It has been widely applied in computer vision and was then introduced to natural language processing, where it achieves improvements in many tasks. One of the main focuses of DA methods is to improve the diversity of training data, thereby helping the model generalize better to unseen test data. In this survey, we frame DA methods into three categories based on the diversity of the augmented data: paraphrasing, noising, and sampling. Our paper analyzes DA methods in detail according to the above categories, and further introduces their applications in NLP tasks as well as the remaining challenges.
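(A minimal illustration of the noising category, our own example rather than code from the survey: label-preserving token deletion and swapping.)

```python
# Noising-style text augmentation: perturb the surface form while keeping
# the label, which adds diversity to the training data.
import random

def noise_augment(tokens, p_delete=0.1, n_swaps=1, seed=None):
    rng = random.Random(seed)
    # Randomly delete tokens, always keeping at least one.
    kept = [t for t in tokens if rng.random() > p_delete] or tokens[:1]
    # Randomly swap adjacent tokens.
    for _ in range(n_swaps):
        if len(kept) > 1:
            i = rng.randrange(len(kept) - 1)
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept

print(noise_augment("the movie was surprisingly good".split(), seed=0))
```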
In the field of cross-modal retrieval, single encoder models tend to perform better than dual encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual encoder model called BagFormer that utilizes a cross-modal interaction mechanism to improve recall performance without sacrificing latency or throughput. BagFormer achieves this through the use of bag-wise interactions, which allow text to be transformed to a more appropriate granularity and entity knowledge to be incorporated into the model. Our experiments demonstrate that BagFormer achieves results comparable to state-of-the-art single encoder models in cross-modal retrieval tasks, while also offering efficient training and inference with 20.72 times lower latency and 25.74 times higher throughput.
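(A hedged sketch of what bag-wise late interaction could look like; this is our reading of the abstract with assumed shapes and mean-pooled bags, not BagFormer's released code.)

```python
# Bag-wise interaction: pool token embeddings into "bags" (e.g., entity
# spans), then score relevance as the sum over text bags of their
# best-matching image-region similarity.
import torch
import torch.nn.functional as F

def bag_wise_score(text_tokens, image_regions, bag_ids):
    """text_tokens: (T, d); image_regions: (R, d); bag_ids: (T,) bag index per token."""
    bags = [text_tokens[bag_ids == b].mean(dim=0)  # pool each bag into one vector
            for b in bag_ids.unique()]
    bags = F.normalize(torch.stack(bags), dim=-1)
    regions = F.normalize(image_regions, dim=-1)
    sim = bags @ regions.T               # (num_bags, R) bag-to-region similarities
    return sim.max(dim=1).values.sum()   # each bag matches its best region
```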
The past few years have witnessed the prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the community of 3D point cloud learning. Different from previous pre-training pipelines for 3D point clouds that generally fall into the scope of either generative modeling or contrastive learning, in this paper, we investigate a translative pre-training paradigm, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from an input 3D object point cloud to its diverse forms of 2D rendered images (e.g., silhouette, depth, contour). Specifically, we begin with deducing view-conditioned point-wise embeddings via the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which is further fed into the subsequent 2D convolutional translation heads for image generation. We conduct extensive experiments on common task scenarios of 3D shape analysis, where our PointVST shows consistent and prominent performance superiority over current state-of-the-art methods under diverse evaluation protocols. Our code will be made publicly available.
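(A schematic sketch of the translation head as we read the abstract; the layer sizes, output resolution, and max-pooling aggregation are assumptions, not the authors' code.)

```python
# Translative pretext task: condition point features on a viewpoint
# indicator, aggregate a view-specific global codeword, then decode it
# with a 2D convolutional head into a rendered-image prediction.
import torch
import torch.nn as nn

class ViewTranslationHead(nn.Module):
    def __init__(self, feat_dim=256, view_dim=3, img_size=32):
        super().__init__()
        self.condition = nn.Linear(feat_dim + view_dim, feat_dim)
        self.decode = nn.Sequential(  # codeword -> coarse 2D map -> upsampled image
            nn.Linear(feat_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, point_feats, view):
        # point_feats: (B, N, feat_dim); view: (B, view_dim) viewpoint indicator
        v = view.unsqueeze(1).expand(-1, point_feats.size(1), -1)
        cond = self.condition(torch.cat([point_feats, v], dim=-1))
        codeword = cond.max(dim=1).values      # view-specific global codeword
        return self.decode(codeword)           # (B, 1, 32, 32), e.g. depth/silhouette
```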
This paper utilizes an anomaly detection algorithm to check whether underwater gliders are operating normally in unknown ocean environments. Glider pilots can be warned of a detected glider anomaly in real time, allowing them to take over the glider appropriately and avoid further damage. The adopted algorithm is validated on two valuable datasets from real glider deployments: the University of South Florida (USF) glider Stella and the Skidaway Institute of Oceanography (SkIO) glider Angus.
Blind watermarking provides powerful evidence for copyright protection, image authentication, and tampering identification. However, it remains a challenge to design a watermarking model with high imperceptibility and robustness against strong noise attacks. To resolve this issue, we present a framework Combining the Invertible and Non-invertible (CIN) mechanisms. The CIN is composed of an invertible part to achieve high imperceptibility and a non-invertible part to strengthen robustness against strong noise attacks. For the invertible part, we develop a diffusion and extraction module (DEM) and a fusion and split module (FSM) to embed and extract watermarks symmetrically in an invertible way. For the non-invertible part, we introduce a non-invertible attention-based module (NIAM) and a noise-specific selection module (NSM) to handle asymmetric extraction under strong noise attacks. Extensive experiments demonstrate that our framework significantly outperforms current state-of-the-art methods in both imperceptibility and robustness. Our framework achieves an average of 99.99% accuracy and 67.66 dB PSNR under noise-free conditions, and 96.64% accuracy and 39.28 dB PSNR under combined strong noise attacks. The code will be available at https://github.com/rmpku/CIN.
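(To illustrate why invertibility yields symmetric embedding and extraction, here is a generic additive coupling step, not the paper's DEM/FSM internals: the inverse recovers both inputs exactly.)

```python
# Additive coupling: the same learned transforms embed the watermark in the
# forward direction and extract it exactly in reverse.
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def embed(self, image, mark):          # forward pass: hide the watermark
        image2 = image + self.f(mark)
        mark2 = mark + self.g(image2)
        return image2, mark2

    def extract(self, image2, mark2):      # exact inverse: recover both inputs
        mark = mark2 - self.g(image2)
        image = image2 - self.f(mark)
        return image, mark
```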
Our situated environment is full of uncertainty and highly dynamic, thus hindering the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real-world scenarios. This means IDM should have the capability of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI) by breaking the barriers between tasks and applications. Recent research has thoroughly examined the Transformer neural architecture as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as sequence decoding tasks using the Transformer architecture; this would be a promising solution for advancing the applications of IDM in more complex real-world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of an FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1), with 1.2 billion parameters, which achieves human-level performance on 453 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real-world IDM applications.
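(A hedged sketch of formulating decision-making as sequence decoding, in the spirit of a decision-transformer-style token layout; the interleaving scheme and layer sizes are our assumptions, not DB1's architecture.)

```python
# Cast decision-making as sequence decoding: interleave return, state, and
# action tokens and decode the next action autoregressively.
import torch
import torch.nn as nn

class SequenceDecisionModel(nn.Module):
    def __init__(self, state_dim, n_actions, d_model=128):
        super().__init__()
        self.embed_r = nn.Linear(1, d_model)
        self.embed_s = nn.Linear(state_dim, d_model)
        self.embed_a = nn.Embedding(n_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, returns, states, actions):
        # returns: (B, T); states: (B, T, state_dim); actions: (B, T) long.
        # Interleave (return, state, action) tokens into one (B, 3T, d) sequence.
        toks = torch.stack([self.embed_r(returns.unsqueeze(-1)),
                            self.embed_s(states),
                            self.embed_a(actions)], dim=2).flatten(1, 2)
        mask = nn.Transformer.generate_square_subsequent_mask(toks.size(1))
        h = self.backbone(toks, mask=mask)   # causal decoding over the sequence
        return self.head(h[:, 1::3])         # predict actions from state positions
```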
Transformer-based models have been widely demonstrated to be successful in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns, leading to the loss of local details (e.g., boundaries and small objects), which are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs, namely the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA module aggregates and represents both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module aggregates boundary characteristics from low-level features and semantic information from high-level features to better preserve boundary details and locate re-calibrated objects. Extensive experiments on six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images and of polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations, such as small object segmentation and ambiguous object boundaries.
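(A schematic two-branch aggregation illustrating the global-to-local idea; this is our reading of GLSA with assumed layers, not the released code.)

```python
# Global-to-local aggregation: a global branch summarizes long-range
# context while a local conv branch keeps boundary/small-object detail;
# the two are fused channel-wise.
import torch
import torch.nn as nn

class GlobalLocalAggregation(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)  # local detail
        self.global_fc = nn.Linear(channels, channels)            # global context
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        local = self.local(x)
        g = self.global_fc(x.mean(dim=(2, 3)))  # global average descriptor
        g = g[:, :, None, None].expand_as(x)    # broadcast back over the map
        return self.fuse(torch.cat([local, g], dim=1))
```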
The acquisition of high-quality human annotations through crowdsourcing platforms like Amazon Mechanical Turk (MTurk) is more challenging than expected. Annotation quality can be affected by various factors such as annotation instructions, Human Intelligence Task (HIT) design, and the wages paid to annotators. To avoid potentially low-quality annotations that could mislead the evaluation of automatic summarization system outputs, we investigate the recruitment of high-quality MTurk workers via a three-step qualification pipeline. We show that we can successfully filter out bad workers before they carry out the evaluations and obtain high-quality annotations while optimizing the use of resources. This paper can serve as a basis for the recruitment of qualified annotators in other challenging annotation tasks.