如图1所示,光学特征识别(OCR)技术已在各种场景中广泛使用。设计实用的OCR系统仍然是一项有意义但具有挑战性的任务。在以前的工作中,考虑到效率和准确性,我们提出了实用的超轻型OCR系统(PP-OCR)和优化的版本PP-OCRV2。为了进一步提高PP-OCRV2的性能,本文提出了更强大的OCR系统PP-OCRV3。 PP-OCRV3基于PP-OCRV2的9个方面升级了文本检测模型和文本识别模型。对于文本检测器,我们引入了一个带有大型接收场LK-PAN的锅模块,该模块是一个名为RSE-FPN的剩余注意机制的FPN模块和DML蒸馏策略。对于文本识别器,基本模型将从CRNN替换为SVTR,我们介绍了轻量级文本识别网络SVTR LCNET,通过注意力进行CTC的指导培训,数据增强策略TextConaug,由自我审查的TextRotnet,UDML和UDML和UDML和UDML和更好的预培训模型。 UIM加速模型并改善效果。实际数据上的实验表明,在可比的推理速度下,PP-OCRV3的Hmean比PP-OCRV2高5%。上述所有上述型号都是开源的,并且代码可在由PaddlePaddle供电的GitHub存储库Paddleocr中可用。
translated by 谷歌翻译
多标签遥感图像分类(MLRSIC)已获得越来越多的研究兴趣。将多个标签的辅助关系作为其他信息有助于提高此任务的性能。当前方法着重于使用它来限制卷积神经网络(CNN)的最终功能输出。一方面,这些方法不会充分利用标签相关来形成特征表示。另一方面,它们增加了系统的标签噪声灵敏度,导致稳健性差。在本文中,提出了一种称为语义交织的全球通道注意(Signa)的新颖方法。首先,根据数据集的统计信息获得标签共发生图。标签共发生图用作图形神经网络(GNN)的输入,以生成最佳特征表示。然后,语义特征和视觉特征交错,以指导图像从原始特征空间到具有嵌入式标签关系的语义特征空间的特征表达。 Signa在新的语义特征空间中触发了特征地图通道的全球关注,以提取更重要的视觉特征。提出了基于多头签名的功能自适应加权网络,以插件的方式对任何CNN作用。对于遥感图像,可以通过将CNN插入浅层层来实现更好的分类性能。我们对三个数据集进行了广泛的实验比较:UCM数据集,AID数据集和DFC15数据集。实验结果表明,与最新方法(SOTA)方法相比,所提出的Signa具有出色的分类性能。值得一提的是,本文的代码将向社区开放,以进行可重复性研究。我们的代码可在https://github.com/kyle-one/signa上找到。
translated by 谷歌翻译
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
translated by 谷歌翻译
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, despite that a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible enough and the general design helps in the wide applicability of the learned embeddings to suit the domain. We utilize publicly available benchmark datasets to evaluate our approach and against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE for a variety of important tasks including link prediction where we achieve an average gain of 15% while remaining competitive for node classification. Finally, we also demonstrate the utility of PersonaSAGE with a case study for personalized recommendation of different entity types in a data management platform.
translated by 谷歌翻译
With the development of natural language processing techniques(NLP), automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. It aims to evaluate the condition of both eyes of a patient respectively, and we formulate it as a particular multi-label classification task in this paper. Although there are a few related studies in other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparsity and clutter of information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical to provide explainability to the disease diagnosis model. To overcome those challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning the contextualized representations of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on the real dataset and comparison with several baseline models show the advantage and explainability of our framework.
translated by 谷歌翻译
Because of the necessity to obtain high-quality images with minimal radiation doses, such as in low-field magnetic resonance imaging, super-resolution reconstruction in medical imaging has become more popular (MRI). However, due to the complexity and high aesthetic requirements of medical imaging, image super-resolution reconstruction remains a difficult challenge. In this paper, we offer a deep learning-based strategy for reconstructing medical images from low resolutions utilizing Transformer and Generative Adversarial Networks (T-GAN). The integrated system can extract more precise texture information and focus more on important locations through global image matching after successfully inserting Transformer into the generative adversarial network for picture reconstruction. Furthermore, we weighted the combination of content loss, adversarial loss, and adversarial feature loss as the final multi-task loss function during the training of our proposed model T-GAN. In comparison to established measures like PSNR and SSIM, our suggested T-GAN achieves optimal performance and recovers more texture features in super-resolution reconstruction of MRI scanned images of the knees and belly.
translated by 谷歌翻译
In this paper, we target at the problem of learning a generalizable dynamic radiance field from monocular videos. Different from most existing NeRF methods that are based on multiple views, monocular videos only contain one view at each timestamp, thereby suffering from ambiguity along the view direction in estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits the generalization ability. As a result, these methods have to train one independent model for each scene and suffer from heavy computational costs when applying to increasing monocular videos in real-world applications. To address this, We propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectory from temporal features with Neural ODE, which is followed by a flow-based feature aggregation module to obtain spatial features along the point trajectory. We jointly optimize temporal and spatial features by training the network in an end-to-end manner. Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
translated by 谷歌翻译
Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve feature encoding of the networks. These approaches often require additional modules which are difficult to integrate into the original architecture. To improve the segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs (i) boundary localization, (ii) prediction of directions pointing to the object's interior, and (iii) semantic segmentation, in three parallel streams. The predicted boundaries and directions are fused to propagate the learned features to refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our proposed approach yields consistent improvements by reducing boundary errors. Our code is available at https://github.com/shenglandu/PushBoundary.
translated by 谷歌翻译
Real estate appraisal is a crucial issue for urban applications, which aims to value the properties on the market. Traditional methods perform appraisal based on the domain knowledge, but suffer from the efforts of hand-crafted design. Recently, several methods have been developed to automatize the valuation process by taking the property trading transaction into account when estimating the property value. However, existing methods only consider the real estate itself, ignoring the relation between the properties. Moreover, naively aggregating the information of neighbors fails to model the relationships between the transactions. To tackle these limitations, we propose a novel Neighbor Relation Graph Learning Framework (ReGram) by incorporating the relation between target transaction and surrounding neighbors with the attention mechanism. To model the influence between communities, we integrate the environmental information and the past price of each transaction from other communities. Moreover, since the target transactions in different regions share some similarities and differences of characteristics, we introduce a dynamic adapter to model the different distributions of the target transactions based on the input-related kernel weights. Extensive experiments on the real-world dataset with various scenarios demonstrate that ReGram robustly outperforms the state-of-the-art methods. Furthermore, comprehensive ablation studies were conducted to examine the effectiveness of each component in ReGram.
translated by 谷歌翻译