Jamdani is the strikingly patterned textile heritage of Bangladesh. The exclusive geometric motifs woven on the fabric are the most attractive part of this craftsmanship and have a remarkable influence on textile and fine art. In this paper, we develop a technique based on the Generative Adversarial Network that learns to generate entirely new Jamdani patterns from a collection of Jamdani motifs that we assembled; the newly formed motifs mimic the appearance of the original designs. Users can input the skeleton of a desired pattern as rough strokes, and our system completes the input by generating a full motif that follows the geometric structure of real Jamdani motifs. To serve this purpose, we collected and preprocessed a dataset containing a large number of Jamdani motif images from authentic sources via fieldwork and applied a state-of-the-art method called pix2pix to it. To the best of our knowledge, this dataset is currently the only available dataset of Jamdani motifs in digital format for computer vision research. Our experimental results with the pix2pix model on this dataset show satisfactory computer-generated images of Jamdani motifs, and we believe that our work will open a new avenue for further research.
Cartoons are an important part of our entertainment culture. Though drawing a cartoon is not for everyone, building a character from an arrangement of basic geometric primitives that approximates it is a fairly common technique in art. The key motivation behind this technique is that human bodies, as well as cartoon figures, can be broken down into various basic geometric primitives. Numerous tutorials demonstrate how to draw figures using an appropriate arrangement of fundamental shapes, thus assisting us in creating cartoon characters. This technique is very beneficial for teaching children how to draw cartoons. In this paper, we develop a tool, shape2toon, that aims to automate this approach by using a generative adversarial network that combines geometric primitives (e.g., circles) and generates a cartoon figure (e.g., Mickey Mouse) from the given approximation. For this purpose, we created a dataset of geometrically represented cartoon characters. We apply an image-to-image translation technique to our dataset and report the results in this paper. The experimental results show that our system can generate cartoon characters from an input layout of geometric shapes. In addition, we demonstrate a web-based tool as a practical application of our work.
The cover is the face of a book and is a point of attraction for the readers. Designing book covers is an essential task in the publishing industry. One of the main challenges in creating a book cover is representing the theme of the book's content in a single image. In this research, we explore ways to produce a book cover using artificial intelligence based on the fact that there exists a relationship between the summary of the book and its cover. Our key motivation is the application of text-to-image synthesis methods to generate images from given text or captions. We explore several existing text-to-image conversion techniques for this purpose and propose an approach to exploit these frameworks for producing book covers from provided summaries. We construct a dataset of English books that contains a large number of samples of summaries of existing books and their cover images. In this paper, we describe our approach to collecting, organizing, and pre-processing the dataset to use it for training models. We apply different text-to-image synthesis techniques to generate book covers from the summary and exhibit the results in this paper.
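Bangladeshi Sign Language (BDSL), like other sign languages, is difficult for the general public to learn, especially when it comes to expressing letters. In this poster, we propose PerSign, a system that reproduces a person's image with a sign gesture introduced into it. We make this operation personalized: the generated image keeps the person's initial image profile (face, skin tone, attire, background) unchanged while appropriately altering the hand, palm, and finger positions. We use an image-to-image translation technique and build a corresponding unique dataset to accomplish the task. We believe the translated images can reduce the communication gap between signers (people who use sign language) and non-signers, without requiring prior knowledge of BDSL.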
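Every day, more and more people are turning to online learning, which has altered our traditional classroom method. Recording lectures has always been a normal task for online educators, and it has recently become even more important during the pandemic because in-person classes are still being postponed in several countries. When recording lectures, a graphics tablet is a great substitute for a whiteboard because of its portability and its ability to interface with computers. However, such a graphics tablet is too expensive for most instructors. In this paper, we propose a computer vision-based alternative to the graphics tablet for instructors and educators, which functions largely in the same way as a graphics tablet but requires only a pen, paper, and a laptop's webcam. We call it the "Do-It-Yourself Graphics Tab" or "DIY Graphics Tab". Our system receives as input a sequence of images, acquired by a camera, of a person writing on paper, and outputs a screen containing the contents written on the paper. The task is not straightforward, since there are many obstacles such as occlusion by the person's hand, random movement of the paper, poor lighting conditions, and perspective distortion due to the viewing angle. A pipeline routes the input recording through our system, which performs instance segmentation and preprocessing before generating the appropriate output. We also conducted user experience evaluations with teachers and students, and we examine their responses in this paper.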
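The face and its expressions are among the most potent subjects for digital images. Detecting emotions from images is a long-standing task in the field of computer vision; however, performing the reverse, synthesizing facial expressions in images, is quite new. Such operations, regenerating images with different facial expressions or altering an existing expression in an image, require a Generative Adversarial Network (GAN). In this paper, we aim to change the facial expression in an image using a GAN, where an input image with an initial expression (e.g., happy) is altered to a different expression (e.g., disgusted) for the same person. We used the StarGAN technique on a modified version of the MUG dataset to accomplish this objective. Moreover, we extended our work further by remodeling the facial expression in an image according to the emotion indicated by a given text. To this end, we applied a Long Short-Term Memory (LSTM) method to extract the emotion from the text and forwarded it to our expression-altering module. As a demonstration of our working pipeline, we also created an application prototype of a blog that regenerates the profile picture with different expressions based on the emotion in the user's text.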
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, very few works explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.
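As a rough illustration of the two temporal extensions mentioned above, the following sketch stamps a square patch trigger onto a toy grayscale video. It assumes one plausible reading that the abstract does not spell out: "static" means the same patch at the same position in every frame, and "dynamic" means the patch position changes from frame to frame. The helper names (`stamp`, `poison_static`, `poison_dynamic`, `make_video`), the patch shape, and the sliding rule are all hypothetical and are not taken from the paper.

```python
def stamp(frame, top, left, size=2, value=255):
    """Return a copy of one grayscale frame with a size x size patch set to `value`."""
    out = [row[:] for row in frame]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = value
    return out


def poison_static(video, size=2):
    """Assumed 'static' extension: identical patch position in every frame."""
    return [stamp(frame, 0, 0, size) for frame in video]


def poison_dynamic(video, size=2):
    """Assumed 'dynamic' extension: the patch slides one column per frame, wrapping."""
    width = len(video[0][0])
    span = width - size + 1  # number of valid left offsets
    return [stamp(frame, 0, t % span, size) for t, frame in enumerate(video)]


def make_video(frames=4, height=4, width=6):
    """Toy all-black video stored as frames x rows x columns of ints."""
    return [[[0] * width for _ in range(height)] for _ in range(frames)]


video = make_video()
static_video = poison_static(video)
dynamic_video = poison_dynamic(video)
```

In a poisoned-label setting, the label of each stamped training video would also be replaced with the attacker's target label; that step is omitted here to keep the sketch short.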
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey provides a comprehensive review of various DL applications for UAV swarms in a wide range of scenarios.
Unlike regular cameras, Dynamic Vision Sensors, or event cameras, asynchronously output compact visual data based on intensity changes at each pixel location. In this paper, we study the application of current image-based SLAM techniques to these novel sensors. To this end, the information in adaptively selected event windows is processed to form motion-compensated images. These images are then used to reconstruct the scene and estimate the 6-DOF pose of the camera. We also propose an inertial version of the event-only pipeline to assess its capabilities. We compare the results of different configurations of the proposed algorithm against the ground truth for sequences from two publicly available event datasets. We also compare the results of the proposed event-inertial pipeline with the state of the art and show that it can produce comparable or more accurate results, provided the map estimate is reliable.
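To make the idea of turning event windows into images concrete, here is a minimal sketch that groups events into windows and sums polarities per pixel. It is a deliberate simplification: the windows here have a fixed event count rather than being adaptively selected, and no motion compensation is applied. The `Event` tuple and both function names are hypothetical and are not taken from the paper.

```python
from collections import namedtuple

# Hypothetical event record: timestamp, pixel column, pixel row, polarity (+1 or -1).
Event = namedtuple("Event", "t x y polarity")


def split_into_windows(events, window_size):
    """Group a time-ordered event list into consecutive fixed-count windows."""
    return [events[i:i + window_size] for i in range(0, len(events), window_size)]


def accumulate(window, height, width):
    """Sum event polarities per pixel to form a simple event frame."""
    frame = [[0] * width for _ in range(height)]
    for event in window:
        frame[event.y][event.x] += event.polarity
    return frame


events = [
    Event(0.0, 0, 0, 1),
    Event(0.1, 0, 0, 1),
    Event(0.2, 1, 0, -1),
    Event(0.3, 1, 1, 1),
]
frames = [accumulate(w, height=2, width=2) for w in split_into_windows(events, 2)]
```

A motion-compensated version would first warp each event's pixel coordinates to a common reference time using an estimate of the camera motion, and only then accumulate.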
With Twitter's growth and popularity, users share a huge number of views on various topics, making this platform a valuable source of information on political, social, and economic issues. This paper investigates English tweets on the Russia-Ukraine war to analyze trends reflecting users' opinions and sentiments regarding the conflict. The tweets' positive and negative sentiments are analyzed using a BERT-based model, and the time series of the frequency of positive and negative tweets is calculated for various countries. Then, we propose a method based on the neighborhood average for modeling and clustering the countries' time series. The clustering results provide valuable insight into public opinion regarding this conflict. Among other findings, users from the United States, Canada, the United Kingdom, and most Western European countries hold similar views, in contrast to the views shared by users from Eastern European, Scandinavian, Asian, and South American nations.
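The abstract does not define "neighborhood average", so the sketch below reads it as a centered moving average over each country's daily tweet counts, followed by a plain Euclidean distance that a clustering step could use. Both the interpretation and the function names are assumptions for illustration, not the paper's method.

```python
def neighborhood_average(series, radius=1):
    """Replace each point with the mean of itself and up to `radius` neighbors per side."""
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - radius)
        hi = min(len(series), i + radius + 1)
        window = series[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def distance(a, b):
    """Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical daily counts of positive tweets for two countries.
country_a = neighborhood_average([0, 3, 6])
country_b = neighborhood_average([1, 3, 5])
gap = distance(country_a, country_b)
```

Smoothing before comparing series reduces the influence of single-day spikes, so countries are grouped by overall trend rather than by noise.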