Smart Paper Notes

Generative Colorization of Structured Mobile Web Pages

Kotaro Kikuchi , Naoto Inoue , Mayu Otani , Edgar Simo-Serra , Kota Yamaguchi

Category: Computer Vision

2022-12-22

Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due to the lack of a formalization of the web page colorization problem, datasets, and evaluation protocols. In this work, we propose a new dataset consisting of e-commerce mobile web pages in a tractable format, which are created by simplifying the pages and extracting canonical color styles with a common web browser. The web page colorization problem is then formalized as a task of estimating plausible color styles for a given web page content with a given hierarchical structure of the elements. We present several Transformer-based methods that are adapted to this task by prepending structural message passing to capture hierarchical relationships between elements. Experimental results, including a quantitative evaluation designed for this task, demonstrate the advantages of our methods over statistical and image colorization methods. The code is available at https://github.com/CyberAgentAILab/webcolor.

E-commerce users' preferences for delivery options

Yuki Oyama , Daisuke Fukuda , Naoto Imura , Katsuhiro Nishinari

Category: Machine Learning

2022-12-30

Many e-commerce marketplaces offer their users fast delivery options for free to meet the increasing needs of users, imposing an excessive burden on city logistics. Therefore, understanding e-commerce users' preference for delivery options is key to designing logistics policies. To this end, this study designs a stated choice survey in which respondents are faced with choice tasks among different delivery options and time slots, which was completed by 4,062 users from the three major metropolitan areas in Japan. To analyze the data, mixed logit models capturing taste heterogeneity as well as flexible substitution patterns have been estimated. The model estimation results indicate that delivery attributes including fee, time, and time slot size are significant determinants of the delivery option choices. Associations between users' preferences and socio-demographic characteristics, such as age, gender, teleworking frequency and the presence of a delivery box, were also suggested. Moreover, we analyzed two willingness-to-pay measures for delivery, namely, the value of delivery time savings (VODT) and the value of time slot shortening (VOTS), and applied a semi-nonparametric approach to estimate their distributions in a data-oriented manner. Although VODT has a large heterogeneity among respondents, the estimated median VODT is 25.6 JPY/day, implying that more than half of the respondents would wait an additional day if the delivery fee were increased by only 26 JPY, that is, they do not necessarily need a fast delivery option but often request it when cheap or almost free. Moreover, VOTS was found to be low, distributed with the median of 5.0 JPY/hour; that is, users do not highly value the reduction in time slot size in monetary terms. These findings on e-commerce users' preferences can help in designing levels of service for last-mile delivery to significantly improve its efficiency.
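A willingness-to-pay measure such as VODT is conventionally read off a choice model as the ratio of an attribute coefficient to the fee coefficient. The sketch below illustrates that ratio and how a median is taken over respondents; it assumes a linear-in-parameters utility, and every coefficient value in it is hypothetical rather than an estimate from the study.

```python
import statistics


def willingness_to_pay(beta_attribute: float, beta_fee: float) -> float:
    """Ratio of an attribute coefficient to the fee coefficient (JPY per unit)."""
    if beta_fee == 0:
        raise ValueError("fee coefficient must be non-zero")
    return beta_attribute / beta_fee


# Hypothetical coefficients for illustration only (not results from the paper).
beta_fee = -0.01                                   # utility per JPY of delivery fee
beta_days = [-0.05, -0.15, -0.25, -0.60, -1.20]    # utility per extra delivery day

# One VODT value per simulated respondent, in JPY per day.
vodt = [willingness_to_pay(b, beta_fee) for b in beta_days]
median_vodt = statistics.median(vodt)

# Share of respondents who would rather wait a day than pay 26 JPY more.
share_below = sum(v < 26 for v in vodt) / len(vodt)
```

In a mixed logit model the coefficients vary across respondents, so the ratio yields a distribution rather than a single number, which is why the abstract reports a median.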

Local Differential Privacy Image Generation Using Flow-based Deep Generative Models

Hisaichi Shibata , Shouhei Hanaoka , Yang Cao , Masatoshi Yoshikawa , Tomomi Takenaga , Yukihiro Nomura , Naoto Hayashi , Osamu Abe

Category: Computer Vision

2022-12-20

Diagnostic radiologists need artificial intelligence (AI) for medical imaging, but access to medical images required for training in AI has become increasingly restrictive. To release and use medical images, we need an algorithm that can simultaneously protect privacy and preserve pathologies in medical images. To develop such an algorithm, here, we propose DP-GLOW, a hybrid of a local differential privacy (LDP) algorithm and one of the flow-based deep generative models (GLOW). By applying a GLOW model, we disentangle the pixelwise correlation of images, which makes it difficult to protect privacy with straightforward LDP algorithms for images. Specifically, we map images onto the latent vector of the GLOW model, each element of which follows an independent normal distribution, and we apply the Laplace mechanism to the latent vector. Moreover, we applied DP-GLOW to chest X-ray images to generate LDP images while preserving pathologies.
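The step of applying the Laplace mechanism to a latent vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: the GLOW encoder and decoder are omitted, and the clipping bound `clip` and the resulting L1 sensitivity of `2 * clip * dimension` are assumptions introduced here so that the noise scale is well defined.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize_latent(z, epsilon, clip=3.0, rng=None):
    """Clip a latent vector element-wise, then add Laplace noise.

    Assumption: after clipping to [-clip, clip], any two latent vectors differ
    by at most 2 * clip * len(z) in L1 norm, which is used as the sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [max(-clip, min(clip, v)) for v in z]
    sensitivity = 2.0 * clip * len(clipped)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


# Hypothetical latent vector standing in for a GLOW encoding of an image.
latent = [0.5, -4.0, 1.2]
noisy_latent = privatize_latent(latent, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of fidelity; in the described pipeline the noisy latent vector would then be decoded back into an image.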

Fixed-Weight Difference Target Propagation

Tatsukichi Shibuya , Nakamasa Inoue , Rei Kawakami , Ikuro Sato

Category: Neural and Evolutionary Computing | Machine Learning

2022-12-19

Target Propagation (TP) is a biologically more plausible algorithm than error backpropagation (BP) for training deep networks, and improving the practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter tuning is required to synchronize the feedforward and feedback training, and the feedback path usually requires more frequent updates than the feedforward path. Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work? We answer this question by presenting Fixed-Weight Difference Target Propagation (FW-DTP) that keeps the feedback weights constant during training. We confirmed that this simple method, which naturally resolves the abovementioned problems of TP, can still deliver informative target values to hidden layers for a given task; indeed, FW-DTP consistently achieves higher test performance than a baseline, the Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP to analyze FW-DTP.
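How a target reaches a hidden layer through a feedback path whose weights never change can be sketched as below. The difference-corrected target rule used here (lower activation plus the feedback of the upper target minus the feedback of the upper activation) is the commonly cited DTP form and is not spelled out in the abstract; the tanh feedback function, the layer sizes, and all names are assumptions for illustration.

```python
import math
import random


def feedback(B, h_upper):
    """Feedback mapping g(h) = tanh(B h); B has one row per lower-layer unit."""
    return [math.tanh(sum(w * x for w, x in zip(row, h_upper))) for row in B]


def dtp_target(B, h_lower, h_upper, t_upper):
    """Difference-corrected target for the lower layer."""
    g_target = feedback(B, t_upper)
    g_actual = feedback(B, h_upper)
    return [h + gt - ga for h, gt, ga in zip(h_lower, g_target, g_actual)]


rng = random.Random(0)
# Fixed feedback weights: drawn once here and never updated afterwards.
B = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]

h_lower = [0.2, -0.1]         # hypothetical lower-layer activations
h_upper = [0.5, 0.0, -0.3]    # hypothetical upper-layer activations
t_upper = [0.6, 0.1, -0.3]    # hypothetical upper-layer target

t_lower = dtp_target(B, h_lower, h_upper, t_upper)
unchanged = dtp_target(B, h_lower, h_upper, h_upper)
```

When the upper-layer target equals the upper-layer activation, the correction cancels and the lower-layer target equals its activation, regardless of how well `B` inverts the forward path. That cancellation is what makes it plausible to ask whether the feedback weights need training at all.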

Learning State Transition Rules from Hidden Layers of Restricted Boltzmann Machines

Koji Watanabe , Katsumi Inoue

Category: Machine Learning | Artificial Intelligence

2022-12-07

Understanding the dynamics of a system is important in many scientific and engineering domains. This problem can be approached by learning state transition rules from observations using machine learning techniques. Such observed time-series data often consist of sequences of many continuous variables with noise and ambiguity, but we often need rules of dynamics that can be modeled with a few essential variables. In this work, we propose a method for extracting a small number of essential hidden variables from high-dimensional time-series data and for learning state transition rules between these hidden variables. The proposed method is based on the Restricted Boltzmann Machine (RBM), which treats observable data in the visible layer and latent features in the hidden layer. However, real-world data, such as video and audio, include both discrete and continuous variables, and these variables have temporal relationships. Therefore, we propose Recurrent Temporal Gaussian-Bernoulli Restricted Boltzmann Machine (RTGB-RBM), which combines Gaussian-Bernoulli Restricted Boltzmann Machine (GB-RBM) to handle continuous visible variables, and Recurrent Temporal Restricted Boltzmann Machine (RT-RBM) to capture time dependence between discrete hidden variables. We also propose a rule-based method that extracts essential information as hidden variables and represents state transition rules in interpretable form. We conduct experiments on Bouncing Ball and Moving MNIST datasets to evaluate our proposed method. Experimental results show that our method can learn the dynamics of those physical systems as state transition rules between hidden variables and can predict unobserved future states from observed state transitions.

Camelira: An Arabic Multi-Dialect Morphological Disambiguator

Ossama Obeid , Go Inoue , Nizar Habash

Category: Natural Language Processing

2022-11-30

We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine. Camelira offers a user-friendly web interface that allows researchers and language learners to explore various linguistic information, such as part-of-speech, morphological features, and lemmas. Our system also provides an option to automatically choose an appropriate dialect-specific disambiguator based on the prediction of a dialect identification component. Camelira is publicly accessible at http://camelira.camel-lab.com.

Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following

Yuki Inoue , Hiroki Ohashi

Category: Robotics | Computer Vision

2022-11-07

Embodied Instruction Following (EIF) studies how mobile manipulator robots should be controlled to accomplish long-horizon tasks specified by natural language instructions. While most research on EIF is conducted in simulators, the ultimate goal of the field is to deploy the agents in real life. As such, it is important to minimize the data cost required for training an agent, to help the transition from sim to real. However, many studies focus only on performance and overlook the data cost: modules that require separate training on extra data are often introduced without consideration of deployability. In this work, we propose FILM++, which extends the existing work FILM with modifications that do not require extra data. While all data-driven modules are kept constant, FILM++ more than doubles FILM's performance. Furthermore, we propose Prompter, which replaces FILM++'s semantic search module with language model prompting. Unlike FILM++'s implementation, which requires training on extra sets of data, our prompting-based implementation needs no training while achieving better or at least comparable performance. Prompter achieves 42.64% and 45.72% on the ALFRED benchmark with high-level instructions only and with step-by-step instructions, respectively, outperforming the previous state of the art by 6.57% and 10.31%.

EOD: The IEEE GRSS Earth Observation Database

Michael Schmitt , Pedram Ghamisi , Naoto Yokoya , Ronny Hänsch

Category: Computer Vision

2022-09-26

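In the era of deep learning, annotated datasets have become a crucial asset to the remote sensing community. Over the last decade, many different datasets have been published, each designed for a specific data type and for a specific task or application. In the jungle of remote sensing datasets, it is hard to keep track of what is already available. In this paper, we introduce EOD, the IEEE GRSS Earth Observation Database, an interactive online platform for cataloguing different types of datasets that leverage remote sensing imagery.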

MR4MR: Mixed Reality for Melody Reincarnation

Atsuya Kobayashi , Ryogo Ishino , Ryuku Nobusue , Takumi Inoue , Keisuke Okazaki , Shoma Sawa , Nao Tokui

Category: Artificial Intelligence

2022-09-15

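There is a long history of efforts to explore musical elements through the entities and spaces around us, such as musique concrète and ambient music. In the context of computer music and digital art, interactive experiences that focus on surrounding objects and physical spaces have also been designed. In recent years, with the development and spread of devices, a growing number of works have been designed in extended reality to create such musical experiences. In this paper, we describe MR4MR, a sound installation work that lets users experience melodies produced by interactions with their surrounding space in the context of mixed reality (MR). Using HoloLens, users can bump virtual objects against real objects in their surroundings. Then, by following the sounds emitted by the objects and using a music generation machine learning model to vary the melody randomly and change it gradually, users can feel the melody of their environment "reincarnating".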

CMOS-based area-and-power-efficient neuron and synapse circuits for time-domain analog spiking neural networks

Xiangyu Chen , Takeaki Yajima , Hisashi Inoue , Isao H. Inoue , Zolboo Byambadorj , Tetsuya Iizuka

Category: Neural and Evolutionary Computing

2022-08-25

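Conventional neural structures tend to communicate through analog quantities such as currents or voltages. However, as CMOS devices shrink and supply voltages decrease, the dynamic range of voltage/current-domain analog circuits becomes narrower, the available margin becomes smaller, and noise immunity decreases. Moreover, the use of operational amplifiers (op-amps) and clocked or asynchronous comparators in conventional designs leads to high energy consumption and large chip area, which is detrimental to building spiking neural networks. In view of this, we propose a neural structure for generating and transmitting time-domain signals, comprising a neuron module, a synapse module, and two weight modules. The proposed neural structure is driven by leakage current in the triode region of transistors and uses neither operational amplifiers nor comparators, and can therefore provide higher energy and area efficiency than conventional designs. In addition, because internal communication takes place through time-domain signals, the structure provides greater noise immunity, which simplifies the wiring between modules. The proposed neural structure is fabricated using TSMC 65 nm CMOS technology. The proposed neuron and synapse occupy areas of 127 µm² and 231 µm², respectively, while achieving millisecond time constants. Measurements on the actual chip show that the proposed structure successfully implements the temporal signal communication function with millisecond time constants, a critical step toward hardware reservoir computing for human-computer interaction.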
