Smart Paper Notes

A Taxonomy of Prompt Modifiers for Text-To-Image Generation

Jonas Oppenlaender

Category: Natural Language Processing

2022-04-20

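Text-to-image generation has attracted growing interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI-generated art have quickly emerged. Based on a three-month ethnographic study, this paper identifies six types of prompt modifiers used by practitioners in the online community. The novel taxonomy of prompt modifiers gives researchers a conceptual starting point for investigating the practice of text-to-image generation, and may also help practitioners of AI-generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of the broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI-generated art.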

Translated by Google Translate

The Infinite Index: Information Retrieval on Generative Text-To-Image Models

Niklas Deckers, Maik Fröbe, Johannes Kiesel, Gianluca Pandolfo, Christopher Schröder, Benno Stein, Martin Potthast

Category: Natural Language Processing | Computer Vision

2022-12-14

The text-to-image model Stable Diffusion has recently become very popular. Only weeks after its open source release, millions are experimenting with image generation. This is due to its ease of use, since all it takes is a brief description of the desired image to "prompt" the generative model. Rarely do the images generated for a new prompt immediately meet the user's expectations. Usually, an iterative refinement of the prompt ("prompt engineering") is necessary for satisfying images. As a new perspective, we recast image prompt engineering as interactive image retrieval - on an "infinite index". Thereby, a prompt corresponds to a query and prompt engineering to query refinement. Selected image-prompt pairs allow direct relevance feedback, as the model can modify an image for the refined prompt. This is a form of one-sided interactive retrieval, where the initiative is on the user side, whereas the server side remains stateless. In light of an extensive literature review, we develop these parallels in detail and apply the findings to a case study of a creative search task on such a model. We note that the uncertainty in searching an infinite index is virtually never-ending. We also discuss future research opportunities related to retrieval models specialized for generative models and interactive generative image retrieval. The application of IR technology, such as query reformulation and relevance feedback, will contribute to improved workflows when using generative models, while the notion of an infinite index raises new challenges in IR research.

How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models

Hai Dang, Lukas Mecke, Florian Lehmann, Sven Goller, Daniel Buschek

Category: Natural Language Processing

2022-09-03

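Deep generative models have the potential to fundamentally change the way we create high-fidelity digital content, but they are often hard to control. Prompting a generative model is a promising recent development that, in principle, lets end users creatively leverage zero-shot and few-shot learning to assign new tasks to an AI ad hoc, simply by writing them down. However, for most end users, writing effective prompts is currently largely a trial-and-error process. To address this, we discuss the key opportunities and challenges for interactive creative applications that use prompting as a new paradigm for human-AI interaction. Based on our analysis, we propose four design goals for user interfaces that support prompting. We illustrate these with concrete UI design sketches, focusing on the use case of creative writing. The HCI and AI research communities can take these as starting points for developing adequate user interfaces for models capable of zero- and few-shot learning.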

VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff

Category: Computer Vision

2022-04-18

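Generating and editing images from open-domain text prompts is a challenging task that has so far required expensive and specially trained models. We demonstrate a novel methodology for both tasks that is capable of producing images of high visual quality from text prompts of significant semantic complexity by using a multimodal encoder to guide image generation. We demonstrate on a variety of tasks how using CLIP [37] to guide VQGAN [11] produces higher visual quality outputs than prior, less flexible approaches such as DALL-E [38], GLIDE [33], and Open-Edit [24], despite not being trained for the tasks presented. Our code is available in a public repository.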

What is it like to program with artificial intelligence?

Advait Sarkar, Andrew D. Gordon, Carina Negreanu, Christian Poelitz, Sruti Srinivasa Ragavan, Ben Zorn

Category: Artificial Intelligence

2022-08-12

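Large language models, such as OpenAI's Codex and DeepMind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted programming, as well as prior usability and design studies. We find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences in both the technical possibilities and the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges. Finally, we draw upon observations from a user study in which non-expert end-user programmers use LLM-assisted tools to solve data tasks in spreadsheets. We discuss the issues that might arise when applying large language models to end-user programming, particularly for users with little or no programming expertise.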

AI Art in Architecture

Joern Ploennigs, Markus Berger

Category: Artificial Intelligence

2022-12-19

Recent diffusion-based AI art platforms are able to create impressive images from simple text descriptions. This makes them powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true for the early stages of architectural design, with its multiple stages of ideation, sketching and modelling. In this paper, we investigate how applicable diffusion-based models already are to these tasks. We research the applicability of the platforms Midjourney, DALL-E 2 and Stable Diffusion to a series of common use cases in architectural design to determine which are already solvable or might soon be. We also analyze how they are already being used by applying NLP methods to a data set of 40 million Midjourney queries to extract common usage patterns. With these insights, we derive a workflow for interior and exterior design that combines the strengths of the individual platforms.

When Creators Meet the Metaverse: A Survey on Computational Arts

Lik-Hang Lee, Zijun Lin, Rui Hu, Zhengya Gong, Abhishek Kumar, Tangyao Li, Sijia Li, Pan Hui

Category: Artificial Intelligence | Machine Learning

2021-11-26

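The metaverse, an enormous virtual-physical cyberspace, has brought artists unprecedented opportunities to blend every corner of our physical surroundings with digital creativity. This article presents a comprehensive survey of computational arts, in which seven critical topics relevant to the metaverse describe novel artworks in blended virtual-physical realities. The topics first cover the building elements of the metaverse, such as virtual scenes and characters, and auditory and textual elements. Next, several remarkable types of novel creations are covered, such as immersive arts, robotic arts, and other user-centric approaches. Finally, we propose several research agendas: democratising computational arts, digital privacy and safety for metaverse artists, ownership recognition for digital artworks, technological challenges, and so on. The survey also serves as introductory material for artists and metaverse technologists who want to begin creating in the realm of surrealistic cyberspace.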

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans

John J. Nay

Category: Artificial Intelligence | Machine Learning

2022-09-14

We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Law-making and legal interpretation form a computational engine that converts opaque human values into legible directives. "Law Informs Code" is the research agenda capturing complex computational legal processes, and embedding them in AI. Similar to how parties to a legal contract cannot foresee every potential contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), leveraged as an expression of how humans communicate their goals, and what society values, Law Informs Code. We describe how data generated by legal processes (methods of law-making, statutory interpretation, contract drafting, applications of legal standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals. This increases human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment. Although law is partly a reflection of historically contingent political power - and thus not a perfect aggregation of citizen preferences - if properly parsed, its distillation offers the most legitimate computational comprehension of societal values available. If law eventually informs powerful AI, engaging in the deliberative political process to improve law takes on even more meaning.

Structured Like a Language Model: Analysing AI as an Automated Subject

Liam Magee, Vanicka Arora, Luke Munn

Category: Artificial Intelligence

2022-12-08

Drawing from the resources of psychoanalysis and critical media studies, in this paper we develop an analysis of Large Language Models (LLMs) as automated subjects. We argue the intentional fictional projection of subjectivity onto LLMs can yield an alternate frame through which AI behaviour, including its productions of bias and harm, can be analysed. First, we introduce language models, discuss their significance and risks, and outline our case for interpreting model design and outputs with support from psychoanalytic concepts. We trace a brief history of language models, culminating with the releases, in 2022, of systems that realise state-of-the-art natural language processing performance. We engage with one such system, OpenAI's InstructGPT, as a case study, detailing the layers of its construction and conducting exploratory and semi-structured interviews with chatbots. These interviews probe the model's moral imperatives to be helpful, truthful and harmless by design. The model acts, we argue, as the condensation of often competing social desires, articulated through the internet and harvested into training data, which must then be regulated and repressed. This foundational structure can however be redirected via prompting, so that the model comes to identify with, and transfer, its commitments to the immediate human subject before it. In turn, these automated productions of language can lead to the human subject projecting agency upon the model, effecting occasionally further forms of countertransference. We conclude that critical media methods and psychoanalytic theory together offer a productive frame for grasping the powerful new capacities of AI-driven language systems.

AI in HCI Design and User Experience

Wei Xu

Category: Artificial Intelligence

2023-01-03

In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.

The Biased Artist: Exploiting Cultural Biases via Homoglyphs in Text-Guided Image Generation Models

Lukas Struppek, Dominik Hintersdorf, Kristian Kersting

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-19

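Text-guided image generation models, such as DALL-E 2 and Stable Diffusion, have recently received much attention from academia and the general public. Provided with textual descriptions, these models are capable of generating high-quality images depicting a wide variety of concepts and styles. However, such models are trained on large amounts of public data and implicitly learn relationships from their training data that are not immediately apparent. We demonstrate that common multimodal models have implicitly learned biases that can be triggered and injected into the generated images simply by replacing single characters in the textual description with visually similar non-Latin characters. These so-called homoglyph replacements enable malicious users or service providers to induce biases into the generated images and even to render the whole generation process useless. We practically illustrate such attacks on DALL-E 2 and Stable Diffusion as examples of text-guided image generation models, and further show that CLIP behaves similarly. Our results further indicate that text encoders trained on multilingual data provide a way to mitigate the effects of homoglyph replacements.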

Steps towards prompt-based creation of virtual worlds

Jasmine Roberts, Andrzej Banburski-Fahey, Jaron Lanier

Category: Artificial Intelligence | Machine Learning

2022-11-10

Large language models trained for code generation can be applied to speaking virtual worlds into existence (creating virtual worlds). In this work we show that prompt-based methods can both accelerate in-VR level editing and become part of gameplay rather than just part of game development. As an example, we present Codex VR Pong, which shows non-deterministic game mechanics using generative processes to create not only static content but also non-trivial interactions between 3D objects. This demonstration naturally leads to an integral discussion on how one would evaluate and benchmark experiences created by generative models, as there are no qualitative or quantitative metrics that apply in these scenarios. We conclude by discussing impending challenges of AI-assisted co-creation in VR.

Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers

Daphne Ippolito, Ann Yuan, Andy Coenen, Sehmon Burnam

Category: Natural Language Processing

2022-11-09

Recent developments in natural language generation (NLG) using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools. However, most prior work on human-AI collaboration in the creative writing domain has evaluated new systems with amateur writers, typically in contrived user studies of limited scope. In this work, we commissioned 13 professional, published writers from a diverse set of creative writing backgrounds to craft stories using Wordcraft, a text editor with built-in AI-powered writing assistance tools. Using interviews and participant journals, we discuss the potential of NLG to have significant impact in the creative writing domain--especially with respect to brainstorming, generation of story details, world-building, and research assistance. Experienced writers, more so than amateurs, typically have well-developed systems and methodologies for writing, as well as distinctive voices and target audiences. Our work highlights the challenges in building for these writers; NLG technologies struggle to preserve style and authorial voice, and they lack deep understanding of story contents. In order for AI-powered writing assistants to realize their full potential, it is essential that they take into account the diverse goals and expertise of human writers.

Survey of Generative Methods for Social Media Analysis

Stan Matwin, Aristides Milios, Paweł Prałat, Amilcar Soares, François Théberge

Category: Machine Learning

2021-12-13

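This survey draws a broad, panoramic picture of the state of the art (SoTA) of research in generative methods for the analysis of social media data. It fills a gap, as the existing survey articles are either narrower in scope or dated. We include two important aspects that are currently gaining importance in mining and modeling social media: dynamics and networks. Social dynamics are important for understanding the spread of influence or disease, the formation of friendships, and so on. Networks, on the other hand, can capture various complex relationships, providing additional insight and identifying important patterns that would otherwise go unnoticed.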

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan

Category: Computer Vision | Machine Learning

2022-06-22

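We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally draw on prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple. First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and a fine-tuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1,600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight the limitations of our models in order to define and exemplify key areas of focus for further improvement. For high-resolution images, see https://parti.research.google/.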

Generative Transformers for Design Concept Generation

Qihao Zhu, Jianxi Luo

Category: Natural Language Processing

2022-11-07

Generating novel and useful concepts is essential during the early design stage to explore a large variety of design opportunities, which usually requires advanced design thinking ability and a wide range of knowledge from designers. A growing body of work on computer-aided tools has explored the retrieval of knowledge and heuristics from design data. However, these tools only provide stimuli that inspire designers in limited respects. This study explores recent advances in natural language generation (NLG) techniques in the artificial intelligence (AI) field to automate early-stage design concept generation. Specifically, a novel approach utilizing the generative pre-trained transformer (GPT) is proposed to leverage the knowledge and reasoning from textual data and transform them into new concepts in understandable language. Three concept generation tasks are defined to leverage different knowledge and reasoning: domain knowledge synthesis, problem-driven synthesis, and analogy-driven synthesis. The experiments with both human and data-driven evaluation show good performance in generating novel and useful concepts.

Foundation models in brief: A historical, socio-technical focus

Johannes Schneider

Category: Artificial Intelligence

2022-12-17

Foundation models can be disruptive for future AI development by scaling up deep learning in terms of model size and the breadth and size of training data. These models achieve state-of-the-art performance (often through further adaptation) on a variety of tasks in domains such as natural language processing and computer vision. Foundation models exhibit a novel emergent behavior: in-context learning enables users to provide a query and a few examples from which a model derives an answer without being trained on such queries. Additionally, homogenization of models might replace a myriad of task-specific models with fewer very large models controlled by few corporations, leading to a shift in power and control over AI. This paper provides a short introduction to foundation models. It contributes by crafting a crisp distinction between foundation models and prior deep learning models, providing a history of machine learning leading to foundation models, elaborating more on socio-technical aspects, i.e., organizational issues and end-user interaction, and a discussion of future research.

Artificial Intelligence for Health Message Generation: Theory, Method, and an Empirical Study Using Prompt Engineering

Sue Lim, Ralf Schmälzle

Category: Natural Language Processing

2022-12-14

This study introduces and examines the potential of an AI system to generate health awareness messages. The topic of folic acid, a vitamin that is critical during pregnancy, served as a test case. Using prompt engineering, we generated messages that could be used to raise awareness and compared them to retweeted human-generated messages via computational and human evaluation methods. The system was easy to use and prolific, and computational analyses revealed that the AI-generated messages were on par with human-generated ones in terms of sentiment, reading ease, and semantic content. Also, the human evaluation study showed that AI-generated messages ranked higher in message quality and clarity. We discuss the theoretical, practical, and ethical implications of these results.

The Turing Deception

David Noever, Matt Ciolino

Category: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-12-09

This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges -- summarization, and question answering -- prompt ChatGPT to produce original content (98-99%) from a single text entry and also sequential questions originally posed by Turing in 1950. The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and simple grammatical set for understanding the writing mechanics of chatbots in evaluating their readability and statistical clarity, engagement, delivery, and overall quality. While Turing's original prose scores at least 14% below the machine-generated output, the question of whether an algorithm displays hints of Turing's truly original thoughts (the "Lovelace 2.0" test) remains unanswered and potentially unanswerable for now.

Creative Wand: A System to Study Effects of Communications in Co-Creative Settings

Zhiyu Lin, Rohan Agarwal, Mark Riedl

Category: Artificial Intelligence

2022-08-04

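Recent neural generation systems have demonstrated the potential for procedurally generating game content, images, stories, and more. However, most neural generation algorithms are "uncontrolled" in the sense that the user has little say in creative decisions beyond the initial prompt specification. Co-creative, mixed-initiative systems require user-centric means of influencing the algorithm, especially when users are unlikely to have machine learning expertise. The key to co-creative systems is the ability to communicate ideas and intent from the user to the agent, as well as from the agent to the user. Key questions in co-creative AI include: How can users express their creative intentions? How can creative AI systems communicate their beliefs, explain their moves, or instruct users to act on their behalf? When should creative AI systems take the initiative? Answers to these and further questions will enable us to develop better co-creative systems that make humans more capable of expressing their creative intent. We introduce Creative-Wand, a customizable framework for investigating co-creative mixed-initiative generation. Creative-Wand enables plug-and-play injection of generative models and human-agent communication channels into a chat-based interface. It provides a number of dimensions along which an AI generator and humans can communicate during the co-creative process. We illustrate the Creative-Wand framework by using it to study one dimension of co-creative communication, global versus local creative intent specification by the user, in the context of storytelling.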
