Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text. However, due to the complicated background textures and various text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate regions to be edited. Different from the implicit one by directly modifying all the pixels at image level, such explicit instructions filter out the distractions from background and guide the network to focus on editing rules of text regions. Secondly, we propose a Semi-supervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world datasets distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the blank of public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
translated by 谷歌翻译
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how to effectively model the linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. Firstly, the autonomous suggests enforcing explicitly language modeling by decoupling the recognizer into vision model and language model and blocking gradient flow between both models. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for the language model which can effectively alleviate the impact of noise input. Finally, to polish ABINet++ in long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module which integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments especially on low-quality images. Besides, extensive experiments including in English and Chinese also prove that, a text spotter that incorporates our language modeling method can significantly improve its performance both in accuracy and speed compared with commonly used attention-based recognizers.
translated by 谷歌翻译
分布式隐私的回归方案已在各个领域开发和扩展,在这些领域中,多方协作和私人运行优化算法,例如梯度下降,以学习一组最佳参数。但是,传统的基于梯度的方法无法解决包含具有L1正则化的客观功能的问题,例如LASSO回归。在本文中,我们介绍了一个名为FCD的新分布式方案联合坐标下降,旨在在多方场景下安全地解决此问题。具体而言,通过安全的聚合和添加的扰动,我们的方案确保:(1)没有向其他方泄漏本地信息,并且(2)全局模型参数不会暴露于云服务器。最终,各方可以消除附加的扰动,以得出具有高性能的全球模型。我们表明,FCD方案填补了多方安全坐标下降方法的空白,并且适用于一般线性回归,包括线性,脊和拉索回归。理论安全分析和实验结果表明,可以有效,有效地执行FCD,并以低MAE度量作为在现实世界UCI数据集的三种线性回归的任务下作为集中方法提供的低MAE度量。
translated by 谷歌翻译
选择第一次到达的Prestack收集时间被称为首次到达时间(FAT)采摘,这是地震数据处理中必不可少的一步,并且主要是手动解决的。随着当前地震数据收集密度的增加,手动采摘效率无法满足实际需求。因此,近几十年来,自动采摘方法已经大大开发出来,尤其是基于深度学习的方法。但是,当前有监督的基于深度学习的方法很少可以避免对标记样品的依赖。此外,由于收集数据是一组与自然图像大不相同的信号,因此当前方法在低信号与噪声比(SNR)的情况下很难解决脂肪拾取问题。在本文中,对于Hard Rock地震收集数据,我们提出了一个多阶段分割拾取网络(MSSPN),该网络解决了跨工作地点的概括问题以及在低SNR的情况下的采摘问题。在MSSPN中,有四个子模型可以模拟手动采摘处理,从而将其假定为从粗糙到细的四个阶段。具有不同质量的七个现场数据集的实验表明,我们的MSSPN的表现优于大幅度的基准。尤其是,在中等和高snrs的情况下,我们的方法可以实现超过90 \%的精确拾取,甚至精细模型也可以使用低SNR实现88 \%精确的数据集。
translated by 谷歌翻译
Gaits和Transitions是腿部运动的关键组件。对于腿机器人,描述和再现Gaits以及过渡仍然存在长期挑战。强化学习已成为制定腿机器人控制器的强大工具。然而,学习多次Gaits和Transitions,与多任务学习问题有关。在这项工作中,我们提出了一种新颖的框架,用于培训一个简单的控制策略,以便将四足机器人培训到各种GA足够的机器人。使用四个独立阶段作为步态发生器和控制策略之间的界面,其表征了四英尺的运动。由阶段引导,四叉机器人能够根据生成的遗传率,例如步行,小跑,起搏和边界,并在那些Gaits之间进行过渡。可以使用更多的一般阶段来产生复杂的Gaits,例如混合节奏跳舞。通过控制策略,黑豹机器人是一种中型狗大小的四足机器人,可以在自然环境中平滑且鲁棒地在速度和鲁棒方面进行速度下进行所有学习的电机技能。
translated by 谷歌翻译
最近,基于模板的跟踪器已成为领先的跟踪算法,在效率和准确性方面具有希望的性能。然而,查询特征与给定模板之间的相关操作仅利用准确的目标本地化,导致状态估计误差,特别是当目标遭受严重可变形变化时。为了解决这个问题,已经提出了基于分段的跟踪器,以便使用每像素匹配来有效地提高可变形物体的跟踪性能。然而,大多数现有跟踪器仅指初始帧中的目标特征,从而缺乏处理具有挑战性因素的辨别能力,例如,类似的分心,背景杂乱,外观变化等。在此目的,我们提出了一种动态的紧凑型存储器嵌入以增强基于分段的可变形视觉跟踪方法的辨别。具体而言,我们初始化与第一帧中的目标功能嵌入的内存嵌入。在跟踪过程中,与现有内存具有高相关的当前目标特征被更新为在线嵌入的内存。为了进一步提高可变形对象的分割精度,我们采用了点对集的匹配策略来测量像素 - 方向查询特征和整个模板之间的相关性,以捕获更详细的变形信息。关于六个具有挑战性的跟踪基准的广泛评估,包括VOT2016,VOT2018,VOT2019,GOT-10K,TrackingNet和莱斯特展示了我们对近期近似追踪者的方法的优势。此外,我们的方法优于基于出色的基于分段的跟踪器,即DVIS2017基准测试。
translated by 谷歌翻译
The detection of human body and its related parts (e.g., face, head or hands) have been intensively studied and greatly improved since the breakthrough of deep CNNs. However, most of these detectors are trained independently, making it a challenging task to associate detected body parts with people. This paper focuses on the problem of joint detection of human body and its corresponding parts. Specifically, we propose a novel extended object representation that integrates the center location offsets of body or its parts, and construct a dense single-stage anchor-based Body-Part Joint Detector (BPJDet). Body-part associations in BPJDet are embedded into the unified representation which contains both the semantic and geometric information. Therefore, BPJDet does not suffer from error-prone association post-matching, and has a better accuracy-speed trade-off. Furthermore, BPJDet can be seamlessly generalized to jointly detect any body part. To verify the effectiveness and superiority of our method, we conduct extensive experiments on the CityPersons, CrowdHuman and BodyHands datasets. The proposed BPJDet detector achieves state-of-the-art association performance on these three benchmarks while maintains high accuracy of detection. Code will be released to facilitate further studies.
translated by 谷歌翻译
Most recent head pose estimation (HPE) methods are dominated by the Euler angle representation. To avoid its inherent ambiguity problem of rotation labels, alternative quaternion-based and vector-based representations are introduced. However, they both are not visually intuitive, and often derived from equivocal Euler angle labels. In this paper, we present a novel single-stage keypoint-based method via an {\it intuitive} and {\it unconstrained} 2D cube representation for joint head detection and pose estimation. The 2D cube is an orthogonal projection of the 3D regular hexahedron label roughly surrounding one head, and itself contains the head location. It can reflect the head orientation straightforwardly and unambiguously in any rotation angle. Unlike the general 6-DoF object pose estimation, our 2D cube ignores the 3-DoF of head size but retains the 3-DoF of head pose. Based on the prior of equal side length, we can effortlessly obtain the closed-form solution of Euler angles from predicted 2D head cube instead of applying the error-prone PnP algorithm. In experiments, our proposed method achieves comparable results with other representative methods on the public AFLW2000 and BIWI datasets. Besides, a novel test on the CMU panoptic dataset shows that our method can be seamlessly adapted to the unconstrained full-view HPE task without modification.
translated by 谷歌翻译
As an effective method to deliver external materials into biological cells, microinjection has been widely applied in the biomedical field. However, the cognition of cell mechanical property is still inadequate, which greatly limits the efficiency and success rate of injection. Thus, a new rate-dependent mechanical model based on membrane theory is proposed for the first time. In this model, an analytical equilibrium equation between the injection force and cell deformation is established by considering the speed effect of microinjection. Different from the traditional membrane-theory-based model, the elastic coefficient of the constitutive material in the proposed model is modified as a function of the injection velocity and acceleration, effectively simulating the influence of speeds on the mechanical responses and providing a more generalized and practical model. Using this model, other mechanical responses at different speeds can be also accurately predicted, including the distribution of membrane tension and stress and the deformed shape. To verify the validity of the model, numerical simulations and experiments are carried out. The results show that the proposed model can match the real mechanical responses well at different injection speeds.
translated by 谷歌翻译
Each student matters, but it is hardly for instructors to observe all the students during the courses and provide helps to the needed ones immediately. In this paper, we present StuArt, a novel automatic system designed for the individualized classroom observation, which empowers instructors to concern the learning status of each student. StuArt can recognize five representative student behaviors (hand-raising, standing, sleeping, yawning, and smiling) that are highly related to the engagement and track their variation trends during the course. To protect the privacy of students, all the variation trends are indexed by the seat numbers without any personal identification information. Furthermore, StuArt adopts various user-friendly visualization designs to help instructors quickly understand the individual and whole learning status. Experimental results on real classroom videos have demonstrated the superiority and robustness of the embedded algorithms. We expect our system promoting the development of large-scale individualized guidance of students.
translated by 谷歌翻译