This paper proposes a realistic image generation method for visualization in endoscope simulation systems. Endoscopic diagnosis and treatment are performed in many hospitals. To reduce complications related to endoscope insertion, endoscope simulation systems are used for training or rehearsal of endoscope insertions. However, current simulation systems generate non-realistic virtual endoscopic images. To increase the value of simulation systems, the realism of their generated images needs to be improved. We propose a realistic image generation method for endoscope simulation systems. Virtual endoscopic images are generated by a volume rendering method from a CT volume of a patient. We improve the realism of the virtual endoscopic images using a virtual-to-real image-domain translation technique. The image-domain translator is implemented as a fully convolutional network (FCN). We train the FCN by minimizing a cycle-consistency loss function, using unpaired virtual and real endoscopic images. To obtain high-quality image-domain translation results, we perform image cleansing on the real endoscopic image set. We tested a shallow U-Net, a U-Net, a deep U-Net, and a U-Net with residual units as the image-domain translator. The deep U-Net and the U-Net with residual units generated highly realistic images.
translated by Google Translate
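The cycle-consistency objective used to train the translator can be illustrated with a minimal sketch. The two translators below are toy invertible pixel maps standing in for the paper's FCN variants, and all names are hypothetical; this shows only the structure of the loss, not the actual training setup.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two image batches."""
    return np.mean(np.abs(a - b))

def cycle_consistency_loss(G, F, virtual_batch, real_batch, lam=10.0):
    """CycleGAN-style cycle-consistency term for unpaired training.

    G: virtual -> real translator, F: real -> virtual translator.
    Reconstructing each image through the opposite translator and
    penalizing the L1 difference lets unpaired image sets supervise
    both directions without paired ground truth.
    """
    loss_virtual = l1(F(G(virtual_batch)), virtual_batch)  # virtual -> real -> virtual
    loss_real = l1(G(F(real_batch)), real_batch)           # real -> virtual -> real
    return lam * (loss_virtual + loss_real)

# Toy "translators": exact inverses of each other, standing in for the FCNs.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0

virtual = np.random.rand(4, 64, 64)
real = np.random.rand(4, 64, 64)
print(cycle_consistency_loss(G, F, virtual, real))  # ~0.0 for exact inverses
```

In the actual method, the adversarial losses of the two domains are minimized alongside this term; the cycle term is what keeps the translation content-preserving.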
We propose a depth estimation method from a single monocular endoscopic image using Lambertian surface translation by domain adaptation and depth estimation with a multi-scale edge loss. We employ a two-step estimation process: Lambertian surface translation from unpaired data, followed by depth estimation. Textures and specular reflections on organ surfaces reduce the accuracy of depth estimation. We apply Lambertian surface translation to an endoscopic image to remove these textures and reflections. We then estimate the depth using a fully convolutional network (FCN). During the training of the FCN, improving the object-edge similarity between the estimated image and the ground-truth depth image is important for obtaining better results. We introduce a multi-scale edge loss function to improve the accuracy of depth estimation. We quantitatively evaluated the proposed method using real colonoscopic images. The estimated depth values were proportional to the real depth values. Furthermore, we applied the estimated depth images to automated anatomical-location identification of colonoscopic images using a convolutional neural network. By using the estimated depth images, the identification accuracy of the network improved from 69.2% to 74.1%.
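A minimal sketch of what a multi-scale edge loss can look like, assuming simple gradient-magnitude edge maps and 2x average-pool downsampling; the paper's exact edge operator and scale schedule may differ.

```python
import numpy as np

def edges(img):
    """Gradient-magnitude edge map via forward differences."""
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return np.abs(gx) + np.abs(gy)

def downsample(img):
    """2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_edge_loss(pred, gt, scales=3):
    """L1 distance between the edge maps of the predicted and
    ground-truth depth, accumulated over progressively coarser copies,
    so that both fine and large-scale object boundaries are penalized."""
    loss = 0.0
    for _ in range(scales):
        loss += np.mean(np.abs(edges(pred) - edges(gt)))
        pred, gt = downsample(pred), downsample(gt)
    return loss

depth_gt = np.random.rand(64, 64)
print(multiscale_edge_loss(depth_gt, depth_gt))  # 0.0 for identical depth maps
```

In training, such a term would be added to the pixel-wise depth loss; the coarser scales are what encourage agreement on large organ boundaries rather than only fine texture edges.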
External visual inspections of rolling stock's underfloor equipment are currently performed by human inspectors. In this study, we attempt to partly automate this visual inspection by investigating anomaly-inspection algorithms that use image-processing technology. Because railroad maintenance studies tend to have little anomaly data, unsupervised learning methods are usually preferred for anomaly detection; however, training cost and accuracy remain challenges. Additionally, previous research created anomalous images from normal images by adding noise and similar perturbations, but the anomaly targeted in this study, the rotation of piping cocks, is difficult to create using noise. Therefore, we propose a new method that applies style conversion via generative adversarial networks to three-dimensional computer graphics and imitates anomaly images, so that anomaly detection based on supervised learning can be applied. A geometry-consistent style-conversion model was used to convert the images; as a result, their color and texture successfully imitated real images while the anomalous shape was maintained. Using the generated anomaly images as supervised data, the anomaly-detection model can be trained easily, without complex adjustments, and successfully detects anomalies.
The development of deep neural networks has improved representation learning in various domains, including textual, graph-structural, and relational-triple representations. This development has opened the door to relation extraction beyond traditional text-oriented relation extraction. However, the effectiveness of considering multiple heterogeneous domains simultaneously is still under exploration, and if a model can take advantage of integrating heterogeneous information, it can be expected to contribute significantly to many real-world problems. This thesis takes Drug-Drug Interactions (DDIs) extracted from the literature as a case study and realizes relation extraction that utilizes heterogeneous domain information. First, a deep neural relation extraction model is prepared and its attention mechanism is analyzed. Next, a method to combine drug molecular structure information and drug description information with the input sentence information is proposed, and the effectiveness of utilizing drug molecular structures and drug descriptions for the relation extraction task is shown. Then, in order to further exploit the heterogeneous information, drug-related items, such as protein entries, medical terms, and pathways, are collected from multiple existing databases, and a new data set in the form of a knowledge graph (KG) is constructed. A link-prediction task on the constructed data set is conducted to obtain embedding representations of drugs that contain the heterogeneous domain information. Finally, a method that integrates the input sentence information and the heterogeneous KG information is proposed. The proposed model is trained and evaluated on a widely used data set, and the results show that utilizing heterogeneous domain information significantly improves the performance of relation extraction from the literature.
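The integration step can be sketched abstractly: a sentence representation and KG-derived drug embeddings are concatenated before classification. The dimensions and the linear scorer below are hypothetical placeholders for the thesis's trained networks; only the fusion pattern is illustrated.

```python
import numpy as np

rng = np.random.default_rng(3)

def ddi_scores(sent_vec, drug1_kg, drug2_kg, W, b):
    """Score DDI classes from the concatenation of the sentence
    representation and the two drugs' KG embeddings. In a trained
    model, W and b would come from a learned classification head."""
    x = np.concatenate([sent_vec, drug1_kg, drug2_kg])
    return W @ x + b

dim_sent, dim_kg, n_classes = 8, 4, 5
W = rng.normal(size=(n_classes, dim_sent + 2 * dim_kg))
b = np.zeros(n_classes)
scores = ddi_scores(rng.normal(size=dim_sent),
                    rng.normal(size=dim_kg),
                    rng.normal(size=dim_kg), W, b)
print(int(np.argmax(scores)))
```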
To simulate bosons on a qubit- or qudit-based quantum computer, one has to regularize the theory by truncating infinite-dimensional local Hilbert spaces to finite dimensions. In the search for practical quantum applications, it is important to know how big the truncation errors can be. In general, it is not easy to estimate errors unless we have a good quantum computer. In this paper we show that traditional sampling methods on classical devices, specifically Markov Chain Monte Carlo, can address this issue with a reasonable amount of computational resources available today. As a demonstration, we apply this idea to the scalar field theory on a two-dimensional lattice, with a size that goes beyond what is achievable using exact diagonalization methods. This method can be used to estimate the resources needed for realistic quantum simulations of bosonic theories, and also, to check the validity of the results of the corresponding quantum simulations.
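The classical sampling idea can be sketched in its simplest form: a Metropolis update of a real scalar field on a small 2D periodic lattice. The action, lattice size, and parameters below are illustrative, not the paper's setup; from such samples one can estimate, for example, how often the field amplitude leaves a candidate truncation range.

```python
import numpy as np

rng = np.random.default_rng(0)

def action_diff(phi, i, j, new, m2=1.0):
    """Change in the free scalar lattice action when site (i, j)
    is set to `new`, using periodic boundary conditions."""
    L = phi.shape[0]
    nbrs = (phi[(i + 1) % L, j] + phi[(i - 1) % L, j]
            + phi[i, (j + 1) % L] + phi[i, (j - 1) % L])
    s = lambda v: (4.0 + m2) * 0.5 * v * v - v * nbrs
    return s(new) - s(phi[i, j])

def metropolis_sweep(phi, step=0.5):
    """One Metropolis sweep over all lattice sites."""
    L = phi.shape[0]
    for i in range(L):
        for j in range(L):
            new = phi[i, j] + step * rng.uniform(-1.0, 1.0)
            d = action_diff(phi, i, j, new)
            if d <= 0.0 or rng.random() < np.exp(-d):
                phi[i, j] = new
    return phi

L = 8
phi = np.zeros((L, L))
samples = []
for sweep in range(600):
    phi = metropolis_sweep(phi)
    if sweep >= 100:                       # discard thermalization
        samples.append(float(np.abs(phi).max()))
# Fraction of sampled configurations whose field amplitude exceeds a
# candidate truncation cutoff -- a rough proxy for truncation effects.
cutoff = 3.0
print(np.mean(np.array(samples) > cutoff))
```

This is far from the paper's full analysis, but it shows why ordinary Markov Chain Monte Carlo on a classical machine can probe the field-amplitude distribution that truncation would clip.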
Hyperparameter optimization (HPO) is essential for obtaining the best performance from deep learning, and practitioners often need to consider the trade-off between multiple metrics, such as error rate, latency, memory requirements, robustness, and algorithmic fairness. Due to this demand and the heavy computation of deep learning, the acceleration of multi-objective (MO) optimization becomes ever more important. Although meta-learning has been extensively studied to speed up HPO, existing methods are not applicable to the MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting, using a task similarity defined by the overlap in the promising domains of each task. In a comprehensive set of experiments, we demonstrate that our method accelerates MO-TPE on tabular HPO benchmarks and yields state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".
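The task-similarity idea can be sketched in a hypothetical one-dimensional form: each task's "promising domain" is taken to be the hyperparameter interval covered by its top-quantile observations, and similarity is the overlap of those intervals. This is an illustrative simplification, not the paper's exact measure.

```python
import numpy as np

def promising_interval(configs, losses, gamma=0.25):
    """Interval spanned by the top-gamma fraction of observed configs
    (TPE-style split into 'good' and 'bad' observations)."""
    k = max(1, int(np.ceil(gamma * len(losses))))
    best = np.argsort(losses)[:k]
    return float(configs[best].min()), float(configs[best].max())

def overlap_similarity(iv_a, iv_b):
    """Length of intersection over length of union of two intervals."""
    inter = max(0.0, min(iv_a[1], iv_b[1]) - max(iv_a[0], iv_b[0]))
    union = max(iv_a[1], iv_b[1]) - min(iv_a[0], iv_b[0])
    return inter / union if union > 0 else 1.0

rng = np.random.default_rng(1)
x_a = rng.uniform(0, 1, 50)
x_b = rng.uniform(0, 1, 50)
# Two tasks whose optima sit near 0.30 and 0.35: promising regions overlap,
# so observations from one task are useful for the other.
sim = overlap_similarity(promising_interval(x_a, (x_a - 0.30) ** 2),
                         promising_interval(x_b, (x_b - 0.35) ** 2))
print(sim)
```

In the meta-learning setting, such a similarity would weight how strongly each previous task's observations influence the acquisition function on the target task.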
Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually entail high computational complexity; however, mobile platforms have only limited hardware resources to keep their power consumption low, which makes it difficult to maintain both an acceptable processing speed and accuracy on mobile platforms. To resolve this trade-off, we herein propose a novel acceleration approach for the well-known zero-mean normalized cross-correlation (ZNCC) matching-cost calculation algorithm on a Jetson TX2 embedded GPU. In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels; this reduces the amount of data transmission and increases the utilization of on-chip registers, thus increasing the processing speed. As a result, our method is 2x faster than the traditional image-scanning method and 26% faster than the latest NCC method. By combining this technique with the domain transformation (DT) algorithm, our system shows a real-time processing speed of 32 fps on a Jetson TX2 GPU for 1,280x384-pixel images with a maximum disparity of 128. Additionally, the evaluation results on the KITTI 2015 benchmark show that our combined system is 7.26% more accurate than the same algorithm combined with the census transform, while maintaining almost the same processing speed.
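The ZNCC matching cost being accelerated is standard; a plain CPU reference version is sketched below so the quantity is concrete. The zigzag register-reuse scheme itself is a GPU-specific optimization not reproduced here.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross-correlation between two patches.

    Returns a score in [-1, 1]; 1 means the patches match up to an
    affine brightness change, which is why ZNCC is robust to exposure
    differences between the left and right stereo cameras.
    """
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

left = np.random.default_rng(2).random((7, 7))
right = 0.5 * left + 0.2            # brightness/contrast change only
print(round(zncc(left, right), 6))  # → 1.0
```

In stereo matching this score is evaluated per pixel for every candidate disparity; the paper's contribution is reusing the per-pixel sums across neighboring pixels rather than changing the cost itself.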
To ensure the safety of railroad operations, it is important to monitor and forecast track geometry irregularities. Higher safety requires forecasting at a higher spatiotemporal frequency, which in turn requires capturing spatial correlations. Additionally, track geometry irregularities are influenced by multiple exogenous factors. In this study, we propose a method to forecast one type of track geometry irregularity, the vertical alignment, by incorporating spatial and exogenous-factor calculations. The proposed method embeds exogenous factors and captures spatiotemporal correlations using a convolutional long short-term memory (ConvLSTM) network. In the experiments, we compared the proposed method with other methods in terms of forecasting performance. Additionally, we conducted an ablation study on the exogenous factors to examine their contribution to the forecasting performance. The results reveal that the spatial calculations and maintenance-record data improve the forecasting of the vertical alignment.
This paper tackles recipe generation from unsegmented cooking videos, a task that requires an agent to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims to detect events thoroughly and generate sentences for them. However, unlike DVC, recipe story awareness is crucial in recipe generation: a model should output an appropriate number of key events in the correct order. We analyzed the outputs of DVC models and observed that, although (1) several events are adoptable as a recipe story, (2) the generated sentences for such events are not grounded in the visual content. Based on this, we hypothesize that a correct recipe can be obtained by selecting oracle events from the output events of a DVC model and re-generating sentences for them. To achieve this, we propose a novel transformer-based joint approach that trains an event selector and a sentence generator for selecting oracle events from the outputs of a DVC model and generating grounded sentences for the events, respectively. In addition, we extend the model by including ingredients to generate more accurate recipes. Experimental results show that the proposed method outperforms state-of-the-art DVC models. We also confirm that, by modeling the recipe in a story-aware manner, the proposed model outputs an appropriate number of events in the correct order.
Prompt learning has been shown to achieve near fine-tuning performance on most text classification tasks with very few training examples, which is advantageous for NLP tasks where samples are scarce. In this paper, we attempt to apply it to a practical scenario, namely resume information extraction, and to enhance existing methods to make them better suited to the resume information extraction task. Specifically, we created multiple sets of manual templates and verbalizers based on the textual characteristics of resumes. In addition, we compared the performance of masked language model (MLM) pre-trained language models (PLMs) and Seq2Seq PLMs on this task. Furthermore, we improve the verbalizer design method for knowledgeable prompt tuning, providing an example for the design of prompt templates and verbalizers for other application-oriented NLP tasks. In this context, we propose the concept of the Manual Knowledgeable Verbalizer (MKV), a rule for constructing knowledgeable verbalizers corresponding to an application scenario. Experiments show that templates and verbalizers designed based on our rules are more effective and robust than existing manual templates and automatically generated prompt methods. We also find that currently available automatic prompting methods cannot compete with manually designed prompt templates in some realistic task scenarios. The results of the final confusion matrix indicate that our proposed MKV significantly alleviates the sample imbalance problem.
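The verbalizer mechanism being designed can be sketched generically: the MLM's logits at the masked slot are aggregated over each class's label words, and the highest-scoring class wins. The label words and logits below are hypothetical; the paper's MKV concerns rules for choosing such words from the application domain.

```python
import numpy as np

# Hypothetical verbalizer for a resume-field classifier: each class is
# mapped to several label words (a knowledgeable verbalizer uses many).
verbalizer = {
    "education": ["university", "degree", "bachelor"],
    "experience": ["company", "engineer", "manager"],
}

def classify(mask_logits, verbalizer):
    """Average the [MASK]-position logits of each class's label words
    and return the class with the highest aggregated score."""
    scores = {cls: float(np.mean([mask_logits[w] for w in words]))
              for cls, words in verbalizer.items()}
    return max(scores, key=scores.get)

# Toy MLM output: one logit per vocabulary word at the [MASK] position.
mask_logits = {"university": 4.1, "degree": 3.8, "bachelor": 2.9,
               "company": 1.2, "engineer": 0.7, "manager": 0.3}
print(classify(mask_logits, verbalizer))  # → education
```

Because each class is backed by several domain words rather than one, rare classes can still accumulate evidence, which is one intuition for why a well-constructed verbalizer helps with sample imbalance.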