While large pre-trained models have transformed the field of natural language processing (NLP), the high training cost and low cross-lingual availability of such models prevent the new advances from being equally shared by users across all languages, especially the less spoken ones. To promote equal opportunities for all language speakers in NLP research and to reduce energy consumption for sustainability, this study proposes an effective and energy-efficient framework, GreenPLM, that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that this framework is comparable to, if not better than, other heuristics trained with high cost. In addition, when given a low computational cost (2.5%), the framework outperforms the original monolingual language models in six out of seven tested languages. We release language models in 50 languages translated from English and the source code here.
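The abstract does not spell out how a lexicon "translates" a model, so the following is only a minimal sketch of one plausible reading: the model's weights stay untouched and each vocabulary entry is re-keyed by its translation. The function name, the dictionary-based embedding format, and the toy English-to-German lexicon are all assumptions made for illustration, not GreenPLM's actual code.

```python
def translate_vocabulary(embeddings, lexicon):
    """Re-key source-language embeddings by their target-language translations.

    embeddings: dict mapping a source word to its embedding vector (hypothetical format)
    lexicon:    dict mapping a source word to one target-language word
    """
    translated = {}
    for source_word, vector in embeddings.items():
        # Words missing from the lexicon (digits, names, ...) keep their original form.
        target_word = lexicon.get(source_word, source_word)
        # If two source words map to the same target word, the first one wins.
        translated.setdefault(target_word, vector)
    return translated


# Toy data, invented for this example.
english_embeddings = {"cat": [0.1, 0.3], "dog": [0.2, 0.1], "2023": [0.5, 0.5]}
english_to_german = {"cat": "Katze", "dog": "Hund"}
german_embeddings = translate_vocabulary(english_embeddings, english_to_german)
```

Because only the vocabulary keys change, no gradient step is needed, which is consistent with the "(almost) no additional cost" claim; how the real framework resolves one-to-many translations or subword tokens is not stated in the abstract.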
translated by 谷歌翻译
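Because unimodal biometric systems are unstable and limited, multimodal systems have attracted growing attention from researchers. However, how to exploit the independent and complementary information between different modalities remains a key and challenging problem. In this paper, we propose a multimodal fusion recognition algorithm based on fingerprints and finger veins (Fingerprint Finger Veins-Channel Spatial Attention Fusion Module, FPV-CSAFM). Specifically, for each pair of fingerprint and finger vein images, we first propose a simple and effective convolutional neural network (CNN) to extract features. We then build a multimodal fusion module (Channel Spatial Attention Fusion Module, CSAFM) to fully fuse the complementary information between fingerprints and finger veins. Unlike existing fusion strategies, our fusion method dynamically adjusts the fusion weights according to the importance of each modality along the channel and spatial dimensions, so that information from the different modalities is combined more effectively and overall recognition performance improves. To evaluate our method, we conduct a series of experiments on multiple public datasets. The experimental results show that the proposed FPV-CSAFM achieves excellent recognition performance on three multimodal datasets based on fingerprints and finger veins.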
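As a rough illustration of dynamically weighted fusion, the sketch below computes one softmax weight per channel for each modality and blends the two feature maps accordingly. It is an assumption-laden simplification: it covers only the channel dimension, and the weights come from pooled activations rather than from the learned attention layers that a module such as CSAFM would use. The function name and the nested-list feature format are hypothetical.

```python
import math


def channel_attention_fuse(fingerprint_feat, vein_feat):
    """Fuse two feature maps given as [channel][position] nested lists.

    Assumption: the per-channel importance of a modality is approximated by
    its average activation; a trained model would learn this scoring instead.
    """
    fused = []
    for fp_channel, fv_channel in zip(fingerprint_feat, vein_feat):
        # Global average pooling: one descriptor per modality and channel.
        fp_score = sum(fp_channel) / len(fp_channel)
        fv_score = sum(fv_channel) / len(fv_channel)
        # Numerically stable softmax over the two modalities.
        top = max(fp_score, fv_score)
        fp_exp = math.exp(fp_score - top)
        fv_exp = math.exp(fv_score - top)
        fp_weight = fp_exp / (fp_exp + fv_exp)
        fv_weight = 1.0 - fp_weight
        # Weighted sum, position by position.
        fused.append([fp_weight * a + fv_weight * b
                      for a, b in zip(fp_channel, fv_channel)])
    return fused


# Toy 2-channel, 2-position feature maps, invented for this example.
fused_map = channel_attention_fuse([[1.0, 1.0], [0.0, 0.0]],
                                   [[1.0, 1.0], [10.0, 10.0]])
```

In the first channel both modalities score equally and are averaged; in the second, the finger vein channel dominates the blend because its pooled activation is much larger.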
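Variational quantum algorithms (VQAs) have shown great potential in the NISQ era. In the VQA workflow, the parameters of an ansatz are updated iteratively to approximate the desired quantum state. Various efforts have been made to design better ansatzes with fewer gates. On a quantum computer, a gate-level ansatz is eventually converted into control signals, such as microwave pulses on transmons, and these control pulses require careful calibration to minimize errors such as over-rotation and under-rotation. For VQAs this procedure introduces redundancy, because the variational nature of VQAs can handle over-rotation and under-rotation naturally by updating the amplitude and frequency parameters. We therefore propose PAN, a native-pulse ansatz generator framework for VQAs. We generate native-pulse ansatzes with trainable parameters for amplitudes and frequencies. In PAN, we tune parametric pulses, which are natively supported on NISQ computers. Because parameter-shift rules do not hold for native-pulse ansatzes, we need to deploy non-gradient optimizers. To limit the number of parameters sent to the optimizer, we generate the native-pulse ansatz progressively. Experiments are conducted on both simulators and quantum devices to validate our method. When deployed on NISQ machines, PAN reduces latency by 86% on average. PAN achieves 99.336% and 96.482% accuracy on VQE tasks for H2 and HeH+, respectively, even with considerable noise in NISQ machines.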
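Quantum noise is the key challenge in noisy intermediate-scale quantum (NISQ) computers. Previous work on noise mitigation has focused primarily on gate-level or pulse-level noise-adaptive compilation. However, little research has explored a higher level of optimization: making the quantum circuits themselves resilient to noise. We propose QuantumNAS, a comprehensive framework for noise-adaptive co-search of the variational circuit and the qubit mapping. Variational quantum circuits are a promising approach for building QML and quantum simulation. However, finding the best variational circuit and its optimal parameters is challenging because of the large design space and the cost of parameter training. We propose to decouple circuit search from parameter training by introducing a novel SuperCircuit. The SuperCircuit is constructed from multiple layers of pre-defined parameterized gates and is trained by iteratively sampling and updating subsets of its parameters (SubCircuits). It provides an accurate estimate of the performance of SubCircuits trained from scratch. We then perform an evolutionary co-search of the SubCircuit and its qubit mapping. SubCircuit performance is estimated with parameters inherited from the SuperCircuit and simulated with real device noise models. Finally, we perform iterative gate pruning and fine-tuning to remove redundant gates. Extensively evaluated on 12 QML and VQE benchmarks across 10 quantum computers, QuantumNAS significantly outperforms the baselines. For QML, QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real quantum computers. It also achieves the lowest eigenvalue on VQE tasks for H2, H2O, LiH, CH4, and BeH2 compared with UCCSD. We also open-source QuantumEngine (https://github.com/mit-han-lab/pytorch-quantum) for fast training of parameterized quantum circuits to facilitate future research.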
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network performs better as the target than the last layer when the depth of the student does not match that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification across the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
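The masked modeling task described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Muse's training code: the `MASK_ID` value, the fixed mask ratio, and the function name are hypothetical, and the abstract does not say how the masking rate is chosen.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the mask token


def mask_tokens(tokens, mask_ratio, rng):
    """Return (masked_tokens, masked_positions) for one training example."""
    num_to_mask = max(1, round(mask_ratio * len(tokens)))
    # Sample distinct positions uniformly at random.
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    for position in positions:
        masked[position] = MASK_ID
    return masked, positions


rng = random.Random(0)
image_tokens = [17, 4, 256, 90, 33, 8, 121, 64]  # toy discrete image tokens
masked, positions = mask_tokens(image_tokens, mask_ratio=0.5, rng=rng)
# A model would be trained to predict image_tokens[p] for every p in positions,
# conditioned on the text embedding and on the tokens left unmasked.
```

Since every masked position is predicted from the same input, several tokens can be filled in per forward pass at sampling time, which is the parallel decoding the abstract contrasts with autoregressive generation.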
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method still generates high-quality molecular graphs in a limited number of steps.
Learning with noisy labels has become an important research topic in computer vision where state-of-the-art (SOTA) methods explore: 1) prediction disagreement with a co-teaching strategy that updates two models when they disagree on the prediction of training samples; and 2) sample selection to divide the training set into clean and noisy sets based on small training loss. However, the quick convergence of co-teaching models to select the same clean subsets, combined with relatively fast overfitting of noisy labels, may induce the wrong selection of noisy-label samples as clean, leading to an inevitable confirmation bias that damages accuracy. In this paper, we introduce our noisy-label learning approach, called Asymmetric Co-teaching (AsyCo), which introduces a novel prediction disagreement that produces more consistently divergent results from the co-teaching models, and a new sample selection approach that does not require the small-loss assumption, enabling better robustness to confirmation bias than previous methods. More specifically, the new prediction disagreement is achieved with the use of different training strategies, where one model is trained with multi-class learning and the other with multi-label learning. Also, the new sample selection is based on multi-view consensus, which uses the label views from training labels and model predictions to divide the training set into clean and noisy sets for training the multi-class model and to re-label the training samples with multiple top-ranked labels for training the multi-label model. Extensive experiments on synthetic and real-world noisy-label datasets show that AsyCo improves over current SOTA methods.
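To make the idea of consensus-based selection concrete, here is a minimal sketch that splits a training set without looking at loss values. The agreement rule used here (a sample counts as clean only when its training label matches the multi-class prediction and also appears among the multi-label model's top-ranked labels) is an assumption chosen for illustration; the abstract does not give AsyCo's exact criterion, and the function and argument names are hypothetical.

```python
def split_by_consensus(train_labels, multiclass_preds, multilabel_topk):
    """Split sample indices into (clean, noisy) by agreement between label views.

    train_labels:     list of given (possibly noisy) labels
    multiclass_preds: list of top-1 predictions from the multi-class model
    multilabel_topk:  list of sets with the multi-label model's top-ranked labels
    """
    clean, noisy = [], []
    for index, label in enumerate(train_labels):
        agrees_multiclass = multiclass_preds[index] == label
        agrees_multilabel = label in multilabel_topk[index]
        # Assumed rule: all views must agree for a sample to count as clean.
        if agrees_multiclass and agrees_multilabel:
            clean.append(index)
        else:
            noisy.append(index)
    return clean, noisy


# Toy predictions for three samples, invented for this example.
clean_ids, noisy_ids = split_by_consensus(
    train_labels=[0, 1, 2],
    multiclass_preds=[0, 2, 2],
    multilabel_topk=[{0, 1}, {1, 2}, {0, 1}],
)
```

Only the first sample has all three views in agreement, so it alone is treated as clean; the selection depends on agreement between views rather than on a small-loss threshold.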
Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model from which the adversarial examples are generated. Techniques derived would aid forensic investigation of attack incidents and serve as deterrence to potential attacks. We consider the buyers-seller setting where a machine learning model is to be distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for the same classification task. This process injects unique characteristics into each copy so that adversarial examples generated have distinct and traceable features. We give a parallel structure which embeds a ``tracer'' in each copy, and a noise-sensitive training loss to achieve this goal. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we can effectively trace the potential adversarial copy by considering the output logits from each tracer. Empirical results show that it is possible to trace the origin of the adversarial example and that the mechanism can be applied to a wide range of architectures and datasets.
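The tracing stage can be pictured as a scoring problem over the candidate copies. The sketch below uses an assumed decision rule, picking the copy whose tracer gives the largest mean logit on the adversarial examples. The abstract says only that tracer logits are considered, so both this rule and the data layout are hypothetical stand-ins rather than the paper's procedure.

```python
def trace_source(tracer_logits):
    """Return the candidate copy with the largest mean tracer logit.

    tracer_logits: dict mapping a candidate copy's name to a list holding one
    tracer logit per adversarial example (hypothetical layout).
    """
    mean_response = {
        name: sum(logits) / len(logits)
        for name, logits in tracer_logits.items()
    }
    # Assumed rule: the source copy's tracer reacts most strongly.
    return max(mean_response, key=mean_response.get)


# Toy tracer outputs for three candidate copies, invented for this example.
suspected_copy = trace_source({
    "copy_1": [0.2, 0.1, 0.3],
    "copy_2": [1.5, 1.2, 1.8],
    "copy_3": [0.4, 0.2, 0.1],
})
```

Averaging over several adversarial examples makes the decision less sensitive to any single example; a different aggregation, or the opposite direction of comparison, would be needed if the trained tracers behave differently from what is assumed here.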