Smart Paper Notes

3D scene reconstruction from monocular spherical video with motion parallax

Kenji Tanaka

Categories: Computer Vision | Robotics

2022-06-14

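In this paper, we describe a method for capturing nearly fully spherical (360-degree) depth information using two adjacent frames from a single spherical video with motion parallax. After illustrating spherical depth retrieval using two spherical cameras, we demonstrate monocular spherical stereo using stabilized first-person video footage. Experiments show that depth information was retrieved over 97% of the entire sphere, measured in solid angle. At a speed of 30 km/h, we were able to estimate the depth of objects more than 30 m from the camera. We also reconstructed 3D structures (point clouds) from the obtained depth data and confirmed that the structures can be clearly observed. This method can be applied to retrieving the 3D structure of surrounding environments, for example for 1) previsualization and location hunting/planning for film, 2) compositing real scenes with computer graphics, and 3) motion capture. Thanks to its simplicity, the method can be applied to a wide variety of videos. Because the only precondition is a 360-degree video with motion parallax, any 360-degree video, including those on the Internet, can be used to reconstruct the surrounding environment. The camera can be light enough to mount on a drone. We also demonstrate such applications.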
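The geometry behind depth from motion parallax can be illustrated with a planar triangulation. This is a minimal sketch and not the paper's pipeline: it assumes a static scene point, a camera that translates in a straight line by a known baseline between the two frames, and bearing angles measured from the direction of travel. The function name and the 30 fps frame rate in the usage note are assumptions made for illustration.

```python
import math


def depth_from_parallax(baseline_m, theta1_rad, theta2_rad):
    """Distance from the first camera position to a static point.

    theta1_rad / theta2_rad: angle between the direction of travel and the
    ray to the point, seen from the first / second camera position.
    Law of sines in the triangle (camera 1, camera 2, point):
        r1 = baseline * sin(theta2) / sin(theta2 - theta1)
    """
    parallax = theta2_rad - theta1_rad
    if abs(math.sin(parallax)) < 1e-12:
        # Rays along the direction of travel show no parallax,
        # so depth cannot be triangulated there.
        raise ValueError("no measurable parallax for this ray")
    return baseline_m * math.sin(theta2_rad) / math.sin(parallax)


# Assumed example: 30 km/h at an assumed 30 frames per second.
baseline = (30.0 / 3.6) / 30.0          # metres travelled between frames
point = (10.0, 30.0)                    # hypothetical scene point (x, y) in metres
theta1 = math.atan2(point[1], point[0])
theta2 = math.atan2(point[1], point[0] - baseline)
estimated = depth_from_parallax(baseline, theta1, theta2)
```

With these assumed numbers the baseline is roughly 0.28 m per frame, so a point about 30 m away produces a parallax of only a fraction of a degree. That is why accurate angular matching between the two frames matters at this range.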

Translated by Google Translate

Attempt to Predict Failure Case Classification in a Failure Database by using Neural Network Models

Koichi Bando , Kenji Tanaka

Categories: Machine Learning

2021-08-29

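With recent advances in information technology, the use of networked information systems has expanded rapidly. Electronic commerce and electronic payments between banks and companies, as well as online shopping and social networking services used by the general public, are examples of such systems. To maintain and improve the dependability of these systems, we are building a failure database from past failure cases. When new failure cases are imported into the database, they must be classified by failure type. The problems are the accuracy and efficiency of this classification. In particular, when several people do the classification, it needs to be made consistent. We therefore attempt to automate the classification using machine learning. As models to evaluate, we selected three neural network models: the multilayer perceptron (MLP), the convolutional neural network (CNN), and the recurrent neural network (RNN). In terms of accuracy, the best model was the MLP, followed by the CNN, and the processing time for classification was practical.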

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Vikas Verma , Sarthak Mittal , Wai Hoh Tang , Hieu Pham , Juho Kannala , Yoshua Bengio , Arno Solin , Kenji Kawaguchi

Categories: Machine Learning | Computer Vision

2022-12-27

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
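The interpolation that the abstract describes can be sketched in a few lines. This shows standard Mixup only, not the improvement proposed in the paper. The function names are hypothetical, the inputs are plain Python lists standing in for tensors, and drawing the mixing weight from a Beta(alpha, alpha) distribution follows common Mixup practice rather than anything stated in this abstract.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Two toy samples with one-hot labels for a two-class problem.
mixed_x, mixed_y = mixup_pair([0.0, 2.0], [1.0, 0.0],
                              [4.0, 6.0], [0.0, 1.0], lam=0.25)
```

The mixed label is a soft target, so the training loss has to accept label distributions rather than class indices.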

Co-evolving morphology and control of soft robots using a single genome

Fabio Tanaka , Claus Aranha

Categories: Artificial Intelligence

2022-12-22

When simulating soft robots, both their morphology and their controllers play important roles in task performance. This paper introduces a new method to co-evolve these two components in the same process. We do that by using the HyperNEAT algorithm to generate two separate neural networks in one pass, one responsible for the design of the robot body structure and the other for the control of the robot. The key difference between our method and most existing approaches is that it does not treat the development of the morphology and the controller as separate processes. Similar to nature, our method derives both the "brain" and the "body" of an agent from a single genome and develops them together. While our approach is more realistic and doesn't require an arbitrary separation of processes during evolution, it also makes the problem more complex because the search space for this single genome becomes larger and any mutation to the genome affects the "brain" and the "body" at the same time. Additionally, we present a new speciation function that takes into consideration both the genotypic distance, as is the standard for NEAT, and the similarity between robot bodies. By using this function, agents with very different bodies are more likely to be in different species. This allows robots with different morphologies to have more specialized controllers, since they won't cross over with other robots that are too different from them. We evaluate the presented methods on four tasks and observe that even though the search space is larger, having a single genome makes the evolution process converge faster than having separate genomes for body and control. The agents in our population also show morphologies with a high degree of regularity and controllers capable of coordinating the voxels to produce the necessary movements.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

On Mini-Batch Training with Varying Length Time Series

Brian Kenji Iwana

Categories: Machine Learning | Computer Vision

2022-12-13

In real-world time series recognition applications, it is possible to have data with varying-length patterns. However, when using artificial neural networks (ANN), it is standard practice to use fixed-size mini-batches. To do this, time series data with varying lengths are typically normalized so that all the patterns are the same length. Normally, this is done using zero padding or truncation without much consideration. We propose a novel method of normalizing the lengths of the time series in a dataset by exploiting the dynamic matching ability of Dynamic Time Warping (DTW). In this way, the time series lengths in a dataset can be set to a fixed size while maintaining features typical of the dataset. In the experiments, all 11 datasets with varying-length time series from the 2018 UCR Time Series Archive are used. We evaluate the proposed method by comparing it with 18 other length normalization methods on a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and a Bidirectional LSTM (BLSTM).
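Two building blocks named in the abstract can be sketched as follows: the zero padding or truncation baseline, and the classic DTW recurrence whose elastic matching the proposed method exploits. This is a minimal sketch and not the paper's normalization procedure. It uses 1-D series, absolute difference as the local cost, and no warping window, all of which are assumptions. The function names are hypothetical.

```python
def zero_pad_or_truncate(series, length):
    """Baseline: cut the series to `length` or pad its end with zeros."""
    clipped = list(series[:length])
    return clipped + [0.0] * (length - len(clipped))


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1-D series of any lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A series and a time-stretched copy of it align at zero cost.
stretched_cost = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Because DTW matches elements elastically, two series of different lengths can be compared directly, which padding and truncation cannot do without distorting one of them.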

Smoothly Connected Preemptive Impact Reduction and Contact Impedance Control

Hikaru Arita , Hayato Nakamura , Takuto Fujiki , Kenji Tahara

Categories: Robotics

2022-12-07

This study proposes novel control methods that lower impact force by preemptive movement and smoothly transition to conventional contact impedance control. These suggested techniques are for force control-based robots and position/velocity control-based robots, respectively. Strong impact forces have a negative influence on multiple robotic tasks. Recently, preemptive impact reduction techniques that expand conventional contact impedance control by using proximity sensors have been examined. However, a seamless transition from impact reduction to contact impedance control has not yet been accomplished. The proposed methods utilize a serial combined impedance control framework to solve this problem. The preemptive impact reduction feature can be added to the already implemented impedance controller because the parameter design is divided into impact reduction and contact impedance control. There is no undesirable contact force during the transition. Furthermore, even though the preemptive impact reduction employs a crude optical proximity sensor, the influence of reflectance is minimized using a virtual viscous force. Analyses and real-world experiments confirm these benefits.
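For readers unfamiliar with the conventional contact impedance control that the proposed methods extend, a one-dimensional spring-damper law gives the basic idea. This is a generic textbook sketch and not the paper's serial combined impedance framework or its proximity-sensor stage. The function names, gains, and the unit-mass simulation are assumptions chosen for illustration.

```python
def impedance_force(stiffness, damping, x_desired, x, v_desired, v):
    """Virtual spring-damper: force pulling the state toward the target."""
    return stiffness * (x_desired - x) + damping * (v_desired - v)


def simulate_free_motion(mass, stiffness, damping, x_desired, steps, dt):
    """Point mass driven only by the impedance force (no contact).

    Integrated with semi-implicit Euler; starts at rest at x = 0.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        force = impedance_force(stiffness, damping, x_desired, x, 0.0, v)
        v += (force / mass) * dt
        x += v * dt
    return x


# Critically damped gains for a 1 kg mass: damping = 2 * sqrt(stiffness * mass).
final_x = simulate_free_motion(mass=1.0, stiffness=100.0, damping=20.0,
                               x_desired=0.1, steps=5000, dt=0.001)
```

Lower stiffness makes the virtual spring more compliant on contact, at the price of slower tracking in free motion.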

Resilience Evaluation of Entropy Regularized Logistic Networks with Probabilistic Cost

Koshi Oishi , Yota Hashizume , Tomohiko Jimbo , Hirotaka Kaji , Kenji Kashima

Categories: Machine Learning

2022-12-05

The demand for resilient logistics networks has increased because of recent disasters. When we consider optimization problems, entropy regularization is a powerful tool for the diversification of a solution. In this study, we proposed a method for designing a resilient logistics network based on entropy regularization. Moreover, we proposed analytical resilience criteria to reduce the ambiguity of resilience. First, we modeled the logistics network, including factories, distribution bases, and sales outlets, in an efficient framework using entropy regularization. Next, we formulated a resilience criterion based on probabilistic cost and Kullback-Leibler divergence. Finally, our method was applied to a simple logistics network, and the resilience of the three logistics plans designed by entropy regularization was demonstrated.
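How entropy regularization diversifies a solution can be seen in a toy route-choice problem. Minimizing the expected cost minus epsilon times the Shannon entropy over a probability vector has the closed-form solution p_i proportional to exp(-c_i / epsilon). This is a generic illustration and not the paper's network model or resilience criterion. The function names and cost values are hypothetical.

```python
import math


def entropy_regularized_plan(costs, epsilon):
    """Minimize sum(p*c) - epsilon * H(p) over the probability simplex.

    Closed form: p_i proportional to exp(-c_i / epsilon).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    lowest = min(costs)  # shift costs for numerical stability
    weights = [math.exp(-(c - lowest) / epsilon) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


route_costs = [1.0, 1.2, 3.0]                              # hypothetical costs
greedy = entropy_regularized_plan(route_costs, 0.01)       # nearly one route
diverse = entropy_regularized_plan(route_costs, 1.0)       # spread over routes
shift = kl_divergence(diverse, greedy)
```

A small epsilon concentrates almost all flow on the cheapest route, while a larger epsilon spreads it across alternatives, so the plan depends less on any single route.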

Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Naoki Matsunaga , Masato Ishii , Akio Hayakawa , Kenji Suzuki , Takuya Narihira

Categories: Computer Vision | Machine Learning

2022-12-05

Generative models, particularly GANs, have been utilized for image editing. Although GAN-based methods perform well on generating reasonable contents aligned with the user's intentions, they struggle to strictly preserve the contents outside the editing region. To address this issue, we use diffusion models instead of GANs and propose a novel image-editing method, based on pixel-wise guidance. Specifically, we first train pixel-classifiers with few annotated data and then estimate the semantic segmentation map of a target image. Users then manipulate the map to instruct how the image is to be edited. The diffusion model generates an edited image via guidance by pixel-wise classifiers, such that the resultant image aligns with the manipulated map. As the guidance is conducted pixel-wise, the proposed method can create reasonable contents in the editing region while preserving the contents outside this region. The experimental results validate the advantages of the proposed method both quantitatively and qualitatively.

Component Segmentation of Engineering Drawings Using Graph Convolutional Networks

Wentai Zhang , Joe Joseph , Yue Yin , Liuyue Xie , Tomotake Furuhata , Soji Yamakawa , Kenji Shimada , Levent Burak Kara

Categories: Computer Vision | Machine Learning

2022-12-01

We present a data-driven framework to automate the vectorization and machine interpretation of 2D engineering part drawings. In industrial settings, most manufacturing engineers still rely on manual reads to identify the topological and manufacturing requirements from drawings submitted by designers. The interpretation process is laborious and time-consuming, which severely inhibits the efficiency of part quotation and manufacturing tasks. While recent advances in image-based computer vision methods have demonstrated great potential in interpreting natural images through semantic segmentation approaches, the application of such methods in parsing engineering technical drawings into semantically accurate components remains a significant challenge. The severe pixel sparsity in engineering drawings also restricts the effective featurization of image-based data-driven methods. To overcome these challenges, we propose a deep-learning-based framework that predicts the semantic type of each vectorized component. Taking a raster image as input, we vectorize all components through thinning, stroke tracing, and cubic Bézier fitting. Then a graph of such components is generated based on the connectivity between the components. Finally, a graph convolutional neural network is trained on this graph data to identify the semantic type of each component. We test our framework in the context of semantic segmentation of text, dimension, and contour components in engineering drawings. Results show that our method yields the best performance compared to recent image-based and graph-based segmentation methods.
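The last two stages, building a graph from component connectivity and applying a graph convolution, can be sketched in plain Python. The abstract does not specify the layer type, so this sketch assumes the widely used symmetric-normalized form H' = ReLU(A_hat H W), where A_hat is the adjacency matrix with self-loops, normalized by node degree. The function names, toy graph, features, and weights are hypothetical.

```python
import math


def normalized_adjacency(n_nodes, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)]
         for i in range(n_nodes)]
    for i, j in edges:              # components that touch each other
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]


def matmul(x, y):
    """Dense matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Two connected strokes, each with a single scalar feature.
a_hat = normalized_adjacency(2, [(0, 1)])
output = gcn_layer(a_hat, [[1.0], [3.0]], [[1.0]])
```

Each node's output mixes its own feature with those of connected components, which is how connectivity can inform the predicted semantic type. A per-node classifier on top of stacked layers of this kind would then produce labels such as text, dimension, or contour.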
