The generation of Chinese fonts has a wide range of applications. The currently predominant methods are mainly based on deep generative models, especially generative adversarial networks (GANs). However, existing GAN-based models usually suffer from the well-known mode collapse problem; when mode collapse happens, these models fail to yield the correct fonts. To address this issue, we introduce a one-bit stroke encoding and a few-shot semi-supervised scheme (i.e., using a few paired data as semi-supervised information) to exploit the local and global structure information of Chinese characters respectively, motivated by the intuition that strokes and characters directly embody certain local and global modes of Chinese characters. Based on these ideas, this paper proposes an effective model called \textit{StrokeGAN+}, which incorporates the stroke encoding and the few-shot semi-supervised scheme into the CycleGAN model. The effectiveness of the proposed model is demonstrated by extensive experiments. Experimental results show that the mode collapse issue can be effectively alleviated by the introduced one-bit stroke encoding and few-shot semi-supervised training scheme, and that the proposed model outperforms state-of-the-art models in fourteen font generation tasks in terms of four important evaluation metrics and the quality of generated characters. Besides CycleGAN, we also show that the proposed idea can be adapted to other existing models to improve their performance. The effectiveness of the proposed model for zero-shot traditional Chinese font generation is also evaluated in this paper.
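As a rough illustration of the one-bit stroke encoding, the sketch below builds a binary vector whose entries mark which basic stroke types occur in a character; the 32-type inventory and the character-to-stroke lookup are illustrative assumptions, not the paper's exact tables.

```python
# Minimal sketch of a one-bit stroke encoding (assumed 32 basic stroke types).
import numpy as np

NUM_STROKE_TYPES = 32  # assumed size of the basic stroke inventory

# Hypothetical lookup: which basic stroke types occur in each character.
CHAR_TO_STROKES = {
    "木": {0, 1, 2, 3},   # illustrative stroke-type indices only
    "林": {0, 1, 2, 3},
}

def one_bit_stroke_encoding(char: str) -> np.ndarray:
    """Return a binary vector whose k-th entry is 1 iff stroke type k occurs in `char`."""
    code = np.zeros(NUM_STROKE_TYPES, dtype=np.float32)
    for k in CHAR_TO_STROKES.get(char, set()):
        code[k] = 1.0
    return code

# In StrokeGAN-style models, such a code is attached to the generator input and
# reconstructed on the discriminator side so that local stroke modes are preserved.
print(one_bit_stroke_encoding("木")[:8])
```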
Automatic font generation without human experts is a practical and significant problem, especially for languages that consist of a large number of characters. Existing methods for font generation are often based on supervised learning. They require a large amount of paired data, which are labor-intensive and expensive to collect. In contrast, common unsupervised image-to-image translation methods are not applicable to font generation, as they often define style as the set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of FDSC are fed into a mixer to generate the final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarities and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to the adversarial loss, another two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of FDSC and the adopted loss functions, our model is able to maintain spatial information and generate high-quality character images in an unsupervised manner. Experiments demonstrate that our model generates character images of higher quality than state-of-the-art methods.
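A minimal sketch of how a feature deformation skip connection could be realized with deformable convolution is given below, assuming PyTorch/torchvision; the channel sizes and the placement of the module are illustrative assumptions rather than DG-Font++'s exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FDSC(nn.Module):
    """Sketch of a feature deformation skip connection: predict displacement maps
    and apply deformable convolution to low-level content features."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Two displacement values (x, y) per kernel sampling point.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, content_feat: torch.Tensor) -> torch.Tensor:
        offset = self.offset_pred(content_feat)   # predicted displacement maps
        return self.deform(content_feat, offset)  # deformed low-level content features

# The deformed features would then be passed, together with decoder features,
# into the mixer that produces the final glyph.
fdsc = FDSC(64)
out = fdsc(torch.randn(1, 64, 32, 32))  # -> torch.Size([1, 64, 32, 32])
```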
Font generation is a difficult and time-consuming task, especially in languages that use ideograms with complicated structures and a large number of characters, such as Chinese. To solve this problem, few-shot font generation and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods may still suffer from (i) the large cross-font gap challenge; (ii) the subtle cross-font variation problem; and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate the entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to develop a diffusion model for the font generation task. The well-trained Diff-Font is not only robust to font gaps and font variations, but also achieves promising performance on the generation of difficult characters. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.
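The sketch below illustrates, under stated assumptions, how content, style, and stroke information could be fused into a single condition vector for a denoising diffusion model; the embedding sizes, vocabulary counts, and additive fusion are hypothetical choices, not Diff-Font's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionEmbedding(nn.Module):
    """Hypothetical fusion of character, font-style, and stroke conditions."""
    def __init__(self, num_chars=6763, num_styles=400, stroke_dim=32, dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, dim)     # content: which character
        self.style_emb = nn.Embedding(num_styles, dim)   # style: which font
        self.stroke_proj = nn.Linear(stroke_dim, dim)    # stroke attribute vector

    def forward(self, char_id, style_id, stroke_vec):
        # The fused condition c would be given to a denoiser eps_theta(x_t, t, c),
        # trained with the usual DDPM objective || eps - eps_theta(x_t, t, c) ||^2.
        return self.char_emb(char_id) + self.style_emb(style_id) + self.stroke_proj(stroke_vec)

cond = ConditionEmbedding()
c = cond(torch.tensor([0]), torch.tensor([1]), torch.randn(1, 32))  # -> (1, 256)
```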
Few-shot font generation (FFG), which aims to generate new fonts from only a few examples, has attracted attention because it significantly reduces labor costs. A typical FFG pipeline treats characters in a standard font library as content glyphs and transfers them into a new target font by extracting style information from reference glyphs. Most existing solutions explicitly disentangle the content and style of reference glyphs either globally or component-wise. However, the style of a glyph mainly lies in local details; that is, the styles of radicals, components, and strokes together depict the style of the glyph. Therefore, even a single character can contain different styles distributed over its spatial locations. In this paper, we propose a new font generation method by learning 1) fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs. Each spatial location in the content glyph can thus be assigned the correct fine-grained style. To this end, we adopt cross-attention, with the representation of the content glyph as the query and the representations of the reference glyphs as the keys and values. Instead of explicit global or component-wise disentanglement, the cross-attention mechanism attends to the right local styles in the reference glyphs and aggregates the reference styles into a fine-grained style representation for the given content glyph. Experiments show that the proposed method outperforms state-of-the-art methods in FFG. In particular, user studies also demonstrate that the style consistency of our method significantly outperforms previous methods.
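A minimal sketch of the cross-attention idea is shown below, assuming PyTorch: flattened content-glyph features serve as queries and flattened reference-glyph features serve as keys and values, so each content location aggregates its own fine-grained local style. The shapes and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim = 256
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

content_tokens = torch.randn(1, 16 * 16, dim)        # content glyph feature map, flattened
reference_tokens = torch.randn(1, 3 * 16 * 16, dim)  # features of 3 reference glyphs

# Each spatial position of the content glyph attends to the reference glyphs and
# pools a position-specific (fine-grained) style representation.
fine_grained_style, attn_weights = attn(query=content_tokens,
                                        key=reference_tokens,
                                        value=reference_tokens)
print(fine_grained_style.shape)  # torch.Size([1, 256, 256])
```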
Generating new fonts is time-consuming and labor-intensive, especially for a language with a huge number of characters such as Chinese. Various deep learning models have demonstrated the ability to efficiently generate new fonts from a few reference characters of the target style. This project aims to develop a few-shot cross-lingual font generator based on AGIS-Net and to improve the aforementioned performance metrics. Our approaches include redesigning the encoder and the loss function. We will validate our method on multiple languages and the datasets mentioned.
Chinese characters carry a wealth of morphological and semantic information; therefore, the semantic enhancement of Chinese character morphology has attracted considerable attention. Previous methods aim to extract information directly from whole Chinese character images, which usually fails to capture global and local information simultaneously. In this paper, we develop a stroke-based autoencoder (SAE) to model the complex morphology of Chinese characters in a self-supervised manner. Following the canonical writing order, we first represent a Chinese character as a sequence of stroke images with a fixed writing order, and our SAE model is then trained to reconstruct this stroke image sequence. The pre-trained SAE model can handle unseen characters, as long as their strokes or radicals appear in the training set. We design two contrasting SAE architectures on different forms of stroke images. One is fine-tuned on existing stroke-based methods for zero-shot recognition of handwritten Chinese characters, and the other is applied to enrich Chinese word embeddings with morphological features. Experimental results verify that, after pre-training, our SAE architectures outperform other existing methods in zero-shot recognition and enhance the representations of Chinese characters with their rich morphological and semantic information.
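As a rough sketch of the stroke-based self-supervised objective, the toy model below encodes a character image once and reconstructs a fixed-length sequence of stroke images; the tiny CNN/GRU sizes and the fixed stroke budget are illustrative assumptions, not the paper's SAE architecture.

```python
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    """Toy stroke-based autoencoder: character image -> sequence of stroke images."""
    def __init__(self, max_strokes=24, hid=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, hid))
        self.rnn = nn.GRU(hid, hid, batch_first=True)
        self.decoder = nn.Linear(hid, 64 * 64)           # one 64x64 stroke image per step
        self.max_strokes = max_strokes

    def forward(self, char_img):                          # (B, 1, 64, 64)
        z = self.encoder(char_img)                        # (B, hid)
        steps = z.unsqueeze(1).repeat(1, self.max_strokes, 1)
        h, _ = self.rnn(steps)                            # one hidden state per stroke slot
        return self.decoder(h).view(-1, self.max_strokes, 1, 64, 64)

model = TinySAE()
recon = model(torch.randn(2, 1, 64, 64))
# Training target would be the ground-truth stroke image sequence in writing order.
loss = nn.functional.mse_loss(recon, torch.zeros_like(recon))
```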
This paper aims to provide a comprehensive study of the facial sketch synthesis (FSS) problem. However, due to the high cost of obtaining hand-drawn sketch datasets, there has been a lack of a complete benchmark for assessing the development of FSS algorithms over the last decade. We therefore first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability, and should thus facilitate the progress of FSS research. Second, we review 139 classical methods, including 34 handcrafted-feature-based facial sketch synthesis methods, 37 general neural style transfer methods, 43 deep image-to-image translation methods, and 35 image-to-sketch methods; in addition, we elaborate comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we summarize the lessons learned over the past years and point out several unsolved challenges. Our open-source code is available at https://github.com/dengpingfan/fsgan.
Automatic artistic text generation is an emerging topic that has received increasing attention due to its wide applications. Artistic text can be decomposed into three components: content, font, and texture. Existing artistic text generation models usually focus on manipulating only one of these components, which is a sub-optimal solution for controllable general artistic text generation. To address this issue, we propose a novel approach, namely GenText, to achieve general artistic text style transfer by migrating font and texture styles from different source images to target images. Specifically, our current work incorporates three different stages, namely stylization, destylization, and font transfer, into a unified platform with a single powerful encoder network and two separate style generator networks, one for font transfer and the other for stylization and destylization. The destylization stage first extracts the font style of the font reference image; the font transfer stage then generates the target content with the desired font style; finally, the stylization stage renders the resulting font image with the texture style of the reference image. Moreover, considering the difficulty of acquiring paired artistic text images, our model is designed under an unsupervised setting, where all stages can be effectively optimized from unpaired data. Qualitative and quantitative results on artistic text benchmarks demonstrate the superior performance of our proposed model. The code with the model will be made publicly available in the future.
The prevailing approach in self-supervised image generation is to operate on pixel-level representations. While this approach can produce high-quality images, it cannot benefit from the simplicity and innate quality of vectorization. Here we present a drawing agent that operates on stroke-level representations of images. At each time step, the agent first assesses the current canvas and decides whether to stop or to keep drawing. When a "draw" decision is made, the agent outputs a program indicating the stroke to be drawn. As a result, it produces a final raster image by drawing with a minimal number of strokes and dynamically deciding when to stop. We train our agent with reinforcement learning on the MNIST and Omniglot datasets for unconditional generation and parsing (reconstruction) tasks. We utilize our parsing agent for exemplar generation and type-conditioned concept generation in the Omniglot challenge without any further training. We present successful results on all three generation tasks and the parsing task. Crucially, we do not need any stroke-level or vector supervision; we only use raster images for training.
Scene text editing (STE) aims to replace text with the desired one while preserving the background and style of the original text. However, due to complicated background textures and various text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network that MOdifies Scene Text images at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate the regions to be edited. Different from the implicit approach of directly modifying all pixels at the image level, such explicit instructions filter out distractions from the background and guide the network to focus on the editing rules of text regions. Secondly, we propose a Semi-supervised Hybrid Learning scheme to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world data distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
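A minimal sketch of the stroke-guidance idea, assuming PyTorch, is shown below: a predicted soft stroke mask restricts editing to text-stroke regions while the background is carried over explicitly. The `guidance_net` and `editing_net` modules are hypothetical placeholders, not MOSTEL's actual sub-networks.

```python
import torch
import torch.nn as nn

# Hypothetical tiny sub-networks standing in for the real guidance/editing branches.
guidance_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
editing_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(8, 3, 3, padding=1), nn.Tanh())

source = torch.randn(1, 3, 64, 256)     # scene text image to be edited
guidance = guidance_net(source)         # soft stroke mask in [0, 1]
edited = editing_net(source)            # newly rendered text content

# Only stroke regions are replaced; the background is carried over explicitly,
# which is the "explicit decoupling" the abstract argues for.
output = guidance * edited + (1.0 - guidance) * source
```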
Despite significant advances in image-to-image (I2I) translation with generative adversarial networks (GANs), it remains challenging to effectively translate an image into a set of diverse images in multiple target domains using a single pair of generator and discriminator. Existing I2I translation methods adopt multiple domain-specific content encoders for different domains, where each domain-specific content encoder is trained only on images from the same domain. However, we argue that the content (domain-invariant) features should be learned from images of all domains. Consequently, each domain-specific content encoder of existing schemes fails to extract domain-invariant features efficiently. To address this issue, we present a flexible and general SoloGAN model for multimodal I2I translation with unpaired data across multiple domains. In contrast to existing methods, SoloGAN uses a single projection discriminator with an additional auxiliary classifier, and shares the encoder and generator across all domains. Thus, SoloGAN can be trained effectively with images from all domains, so that domain-invariant content representations can be extracted efficiently. Qualitative and quantitative results on multiple datasets, against multiple counterparts and variants of SoloGAN, demonstrate the merits of the method, especially for challenging I2I translation datasets, i.e., datasets involving extreme shape variations or requiring complex backgrounds to be preserved after translation. Furthermore, we demonstrate the contribution of each component in SoloGAN by ablation studies.
Cartoons are an important part of our entertainment culture. Though drawing a cartoon is not for everyone, creating it using an arrangement of basic geometric primitives that approximates the character is a fairly frequent technique in art. The key motivation behind this technique is that human bodies - as well as cartoon figures - can be broken down into various basic geometric primitives. Numerous tutorials are available that demonstrate how to draw figures using an appropriate arrangement of fundamental shapes, thus assisting us in creating cartoon characters. This technique is very beneficial for teaching children how to draw cartoons. In this paper, we develop a tool - shape2toon - that aims to automate this approach by utilizing a generative adversarial network that combines geometric primitives (i.e., circles) and generates a cartoon figure (e.g., Mickey Mouse) depending on the given approximation. For this purpose, we created a dataset of geometrically represented cartoon characters. We apply an image-to-image translation technique to our dataset and report the results in this paper. The experimental results show that our system can generate cartoon characters from an input layout of geometric shapes. In addition, we demonstrate a web-based tool as a practical implication of our work.
Learning to generate new images for a novel category based on only a few images, named few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have achieved impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformations, i.e., "deltas", between same-category pairs. The generation subnetwork generates sample-specific "deltas" for an input image, which are combined with this input image to generate new images within the same category. Besides, an adversarial delta matching loss is designed to link the above two subnetworks together. Extensive experiments on five few-shot image datasets demonstrate the effectiveness of our proposed method.
Learning to generate new images for a novel category based on only a few images, named few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have achieved impressive results, but the diversity is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures intra-category transformations, i.e., deltas, between same-category pairs. The generation subnetwork generates sample-specific deltas for an input image, which are combined with this input image to generate new images within the same category. Besides, an adversarial delta matching loss is designed to link the above two subnetworks together. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/bcmi/deltagan-few-shot-image-generation.
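Under stated assumptions, the sketch below illustrates the "delta" idea: a generation subnetwork maps an input image plus noise to a sample-specific delta, which is then combined with the input to produce a new same-category image. The modules and the simple additive combination are hypothetical placeholders, not DeltaGAN's actual networks.

```python
import torch
import torch.nn as nn

class DeltaGenerator(nn.Module):
    """Toy generation subnetwork: (image, noise) -> sample-specific delta -> new image."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.delta_net = nn.Sequential(
            nn.Conv2d(3 + z_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x, z):
        z_map = z[:, :, None, None].expand(-1, -1, x.shape[2], x.shape[3])
        delta = self.delta_net(torch.cat([x, z_map], dim=1))   # sample-specific delta
        # Additive combination is a placeholder for the paper's combination module.
        return torch.clamp(x + delta, -1.0, 1.0), delta

gen = DeltaGenerator()
new_img, delta = gen(torch.randn(4, 3, 64, 64), torch.randn(4, 64))
```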
With the development of convolutional neural networks, hundreds of deep-learning-based dehazing methods have been proposed. In this paper, we provide a comprehensive survey of supervised, semi-supervised, and unsupervised single image dehazing. We first discuss the physical model, datasets, network modules, loss functions, and evaluation metrics that are commonly used. Then, the main contributions of various dehazing algorithms are categorized and summarized. Further, quantitative and qualitative experiments on various baseline methods are carried out. Finally, the unsolved issues and challenges that can inspire future research are pointed out. A collection of useful dehazing materials is available at \url{https://github.com/Xiaofeng-life/AwesomeDehazing}.
The crucial procedure of haze image translation via adversarial training lies in extracting only the features involved in haze synthesis, i.e., the features representing the invariant semantic content, namely the content features. Previous methods separate the content features by utilizing them to classify haze images during training. However, in this paper we recognize the incompleteness of the content-style disentanglement in this technical routine: the defective style features, entangled with content information, inevitably degrade the rendering of haze images. To address this, we propose self-supervised style regression via random linear interpolation to reduce the content information in the style features. Ablation experiments demonstrate the completeness of the disentanglement and its superiority in density-aware haze image synthesis. Moreover, the generated haze data are applied to test the generalization of vehicle detectors. Further study of the relation between haze density and detection performance shows that haze has an evident impact on the generalization of vehicle detectors, and that this performance degradation is linearly correlated with the haze density, which in turn validates the effectiveness of the proposed method.
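One possible reading of self-supervised style regression via random linear interpolation is sketched below, assuming the supervision signal is the sampled mixing coefficient itself: two style codes are mixed by a random linear interpolation and a regressor recovers the coefficient. The modules and the exact regression target are assumptions, not necessarily the paper's formulation.

```python
import torch
import torch.nn as nn

style_dim = 8
# Hypothetical regressor that predicts the mixing coefficient from a mixed style code.
regressor = nn.Sequential(nn.Linear(style_dim, 32), nn.ReLU(), nn.Linear(32, 1))

def style_regression_loss(style_a: torch.Tensor, style_b: torch.Tensor) -> torch.Tensor:
    """Mix two style codes with a random alpha and regress alpha back from the mixture."""
    alpha = torch.rand(style_a.size(0), 1)
    mixed = alpha * style_a + (1.0 - alpha) * style_b
    return nn.functional.mse_loss(regressor(mixed), alpha)

loss = style_regression_loss(torch.randn(16, style_dim), torch.randn(16, style_dim))
```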
Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross-entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ² divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. We evaluate LSGANs on five scene datasets, and the experimental results show that the images generated by LSGANs are of better quality than those generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.
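For reference, the least-squares losses with the common 0/1 label coding can be written as in the sketch below (fake label 0, real/target label 1); this label choice is one of the standard settings discussed for LSGANs, not the only one.

```python
import torch

def d_loss_lsgan(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Discriminator: 0.5 * E[(D(x) - 1)^2] + 0.5 * E[D(G(z))^2]
    return 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def g_loss_lsgan(d_fake: torch.Tensor) -> torch.Tensor:
    # Generator: 0.5 * E[(D(G(z)) - 1)^2]
    return 0.5 * ((d_fake - 1.0) ** 2).mean()
```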
The long-tail effect is a common problem that limits the performance of deep learning models on real-world datasets. Due to differences in character usage frequency, the development of character image datasets also suffers from such an imbalanced data distribution. Therefore, current character recognition methods perform poorly when applied to real-world datasets, especially on the character categories in the tail that lack training samples, e.g., uncommon characters or characters in historical documents. In this paper, we propose a zero-shot character recognition framework via radical extraction, i.e., REZCR, to improve the recognition performance on few-sample character categories, in which we exploit radical-level information by decomposing and reconstructing characters following orthography. REZCR consists of an attention-based radical information extractor (RIE) and a knowledge-graph-based character reasoner (KGR). The RIE aims to recognize candidate radicals and their possible structural relations from character images. The results are then fed into the KGR to recognize the target character by using a pre-designed character knowledge graph. We validate our method on multiple datasets, and REZCR shows promising experimental results, especially on few-sample character datasets.
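The knowledge-graph reasoning step can be pictured as matching predicted radicals and structure against a decomposition table, as in the toy sketch below; the entries and the exact matching rule are illustrative assumptions, and the real KGR operates over a pre-designed character knowledge graph.

```python
# Toy decomposition table: character -> (structure, ordered radicals).
KNOWLEDGE_GRAPH = {
    "好": ("left-right", ("女", "子")),
    "林": ("left-right", ("木", "木")),
    "森": ("top-bottom", ("木", "林")),
}

def reason_character(pred_structure: str, pred_radicals: tuple):
    """Return the character whose decomposition matches the extractor's predictions."""
    for char, (structure, radicals) in KNOWLEDGE_GRAPH.items():
        if structure == pred_structure and radicals == pred_radicals:
            return char
    return None

print(reason_character("left-right", ("木", "木")))  # -> 林
```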
Constructing high-quality character image datasets is challenging because real-world images are often affected by image degradation. There are limitations when applying current image restoration methods to such real-world character images, since (i) the categories of noise in character images differ from those in general images; and (ii) real-world character images usually contain more complex image degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, in which character skeleton information and scale-ensemble feature extraction are utilized to obtain better restoration performance. The proposed method consists of a skeleton extractor (SENet) and a character image restorer (CIRNet). SENet aims to preserve the structural consistency of the character and normalize complex noise. CIRNet then reconstructs clean images from the degraded character images and their skeletons. Due to the lack of benchmarks for real-world character image restoration, we construct a dataset containing 1,606 character images with real-world degradation to evaluate the effectiveness of the proposed method. Experimental results show that RCRN outperforms state-of-the-art methods quantitatively and qualitatively.
Lightweight crowd counting models, especially knowledge distillation (KD) based models, have attracted attention in recent years due to their superiority in computational efficiency and hardware requirements. However, existing KD-based models usually suffer from the capacity gap problem, which causes the performance of the student network to be limited by the teacher network. In this paper, we address this problem by introducing a novel review mechanism inspired by the way humans review during learning; the proposed model is thus called ReviewKD. The proposed model consists of an instruction phase and a review phase: we first exploit a well-trained heavy teacher network to transfer its latent features to a lightweight student network in the instruction phase, and then a refined density map is produced in the review phase via the review mechanism based on the learned features. The effectiveness of ReviewKD is demonstrated by a set of experiments on six benchmark datasets in comparison with state-of-the-art models. Numerical results show that ReviewKD outperforms existing lightweight models for crowd counting and can effectively alleviate the capacity gap problem, in particular achieving performance beyond that of the teacher network. Besides lightweight models, we also show that the proposed review mechanism can be used as a plug-and-play module to further boost the performance of a kind of heavy crowd counting models, without modifying the neural network architecture or introducing any additional model parameters.