Smart Paper Notes

BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

Sunwoo Kim , Minje Kim

Category: Machine Learning

2021-11-17

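In this paper, we present a blockwise optimization method for masking-based networks (BLOOM-Net) for training scalable speech enhancement networks. We design the network with a residual learning scheme and train its internal separator blocks sequentially to obtain a scalable masking-based deep neural network for speech enhancement. This scalability lets the network adjust its run-time complexity to test-time resource constraints: once deployed, the model can change its complexity dynamically depending on the test-time environment. To this end, we modularize our models so that they can flexibly accommodate varying requirements on enhancement performance and resource constraints, incurring minimal memory or training overhead from the added scalability. Our speech enhancement experiments show that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to the corresponding models trained end-to-end.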

Translated by Google Translate

Efficient Personalized Speech Enhancement through Self-Supervised Learning

Aswin Sivaraman , Minje Kim

Category: Machine Learning

2021-04-05

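This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement function to a particular speaker's voice and thus expect to solve a narrower problem. Hence, specialists can achieve better performance in addition to reducing computational complexity. However, naive personalization methods can require clean speech from the target user, which is inconvenient to acquire, e.g., due to subpar recording conditions. To this end, we pose personalization either as a zero-shot task, in which no additional clean speech of the target speaker is used for training, or as a few-shot learning task, in which the goal is to minimize the duration of the clean speech used for transfer learning. In this paper, we propose self-supervised learning methods as a solution to both the zero-shot and few-shot personalization tasks. The proposed methods are designed to learn personalized speech features from unlabeled data (i.e., in-the-wild noisy recordings from the target user) without knowing the corresponding clean sources. Our experiments investigate three different self-supervised learning mechanisms. The results show that the self-supervised models achieve zero-shot and few-shot personalization with fewer model parameters and less clean data from the target user, meeting both the data-efficiency and model-compression goals.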


Scalable and Efficient Neural Speech Coding: A Hybrid Design

Kai Zhen , Jongmo Sung , Mi Suk Lee , Seungkwon Beak , Minje Kim

Category: Machine Learning

2021-03-27

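We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, in which a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as trainable modules, so coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact model components into the NWC, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models have a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore any reconstruction loss that its preceding modules have created. CMRL can also scale down to cover lower bitrates, for which it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC's quantization as a differentiable process, so that the system can be trained end-to-end. The decoder of the proposed system uses either one NWC (0.12 million parameters) in the low-to-medium bitrate range (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although the decoding complexity is not yet as low as that of conventional speech codecs, it is significantly lower than that of other neural speech coders, such as a WaveNet-based vocoder. For wideband speech coding quality, our system yields performance comparable or superior to AMR-WB on test utterances at low and medium bitrates. The proposed system can scale up to higher bitrates to achieve near-transparent performance.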
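The residual coding cascade behind CMRL can be illustrated with a toy sketch. This is not the authors' system: a plain uniform scalar quantizer stands in for each CNN-based NWC module, and the names `quantize`, `cascade_encode`, and `cascade_decode` are hypothetical. It only shows that each stage codes what the preceding stages failed to reconstruct, and that decoding with fewer stages gives a coarser (lower-bitrate) reconstruction.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], step: float) -> List[float]:
    """Uniform scalar quantizer (assumed stand-in for one NWC module)."""
    return [round(v / step) * step for v in values]


def cascade_encode(signal: Sequence[float], steps: Sequence[float]) -> List[List[float]]:
    """Each stage codes the residual left over by all preceding stages."""
    residual = list(signal)
    codes: List[List[float]] = []
    for step in steps:
        coded = quantize(residual, step)
        codes.append(coded)
        # What this stage could not represent is passed on to the next one.
        residual = [r - c for r, c in zip(residual, coded)]
    return codes


def cascade_decode(codes: Sequence[Sequence[float]], num_stages: int) -> List[float]:
    """Sum the outputs of the first num_stages stages."""
    out = [0.0] * len(codes[0])
    for stage in codes[:num_stages]:
        out = [o + s for o, s in zip(out, stage)]
    return out


def max_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


signal = [0.83, -0.41, 0.27, -0.96, 0.05, 0.62]
codes = cascade_encode(signal, steps=[0.5, 0.125, 0.03125])
# Reconstruction error when decoding with 1, 2, or 3 stages.
errors = [max_abs_error(signal, cascade_decode(codes, k)) for k in (1, 2, 3)]
print(errors)
```

In this toy setting the error shrinks as stages are added, which mirrors how the abstract describes using one module at low-to-medium bitrates and two at the high bitrate.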


Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

Aswin Sivaraman , Minje Kim

Category: Machine Learning

2020-11-06

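This work explores how self-supervised learning can be universally used to discover speaker-specific features toward enabling personalized speech enhancement models. We specifically address the few-shot learning scenario, where access to clean recordings of the test-time speaker is limited to a few seconds but noisy recordings of the speaker are abundant. We develop a simple contrastive learning procedure that treats the abundant noisy data as makeshift training targets through pairwise noise injection: the model is pretrained to maximize agreement between pairs of differently deformed identical utterances and to minimize agreement between pairs of similarly deformed nonidentical utterances. Our experiments compare the proposed pretraining approach with two baseline alternatives: speaker-agnostic pretraining, and speaker-specific self-supervised pretraining without contrastive loss terms. Of the three approaches, the proposed method using contrastive mixtures is found to be the most robust to model compression (using fewer parameters) and to reduced clean speech (requiring only 3 seconds).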
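The pairwise noise injection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the loss, so `agreement_gap` (cosine agreement of the negative pair minus that of the positive pair) is an assumed placeholder objective, and all function names and the identity embedding are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Signal = List[float]
Pair = Tuple[Signal, Signal]


def mix(recording: Signal, noise: Signal) -> Signal:
    """Inject noise into a recording by sample-wise addition."""
    return [r + n for r, n in zip(recording, noise)]


def make_pairs(rec_a: Signal, rec_b: Signal,
               noise_1: Signal, noise_2: Signal) -> Tuple[Pair, Pair]:
    # Positive pair: the same utterance deformed by two different noises.
    positive = (mix(rec_a, noise_1), mix(rec_a, noise_2))
    # Negative pair: two different utterances deformed by the same noise.
    negative = (mix(rec_a, noise_1), mix(rec_b, noise_1))
    return positive, negative


def cosine(x: Signal, y: Signal) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def agreement_gap(embed: Callable[[Signal], Signal],
                  positive: Pair, negative: Pair) -> float:
    """Assumed objective: lower when positives agree and negatives do not."""
    pos = cosine(embed(positive[0]), embed(positive[1]))
    neg = cosine(embed(negative[0]), embed(negative[1]))
    return neg - pos


rng = random.Random(0)


def rand_signal(n: int = 16) -> Signal:
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rec_a, rec_b, noise_1, noise_2 = (rand_signal() for _ in range(4))
positive, negative = make_pairs(rec_a, rec_b, noise_1, noise_2)
# The identity function stands in for the enhancement model's embedding.
loss = agreement_gap(lambda s: s, positive, negative)
```

During pretraining, a real model would replace the identity embedding, and its parameters would be updated to push this kind of gap down over many such pairs.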


StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Yunjey Choi , Minje Choi , Munyoung Kim , Jung-Woo Ha , Sunghun Kim , Jaegul Choo

Category:

2017-11-24

Figure 1. Multi-domain image-to-image translation results on the CelebA dataset via transferring knowledge learned from the RaFD dataset. The first and sixth columns show input images while the remaining columns are images generated by StarGAN. Note that the images are generated by a single generator network, and facial expression labels such as angry, happy, and fearful are from RaFD, not CelebA.


Class-Continuous Conditional Generative Neural Radiance Field

Jiwook Kim , Minhyeok Lee

Category: Computer Vision | Artificial Intelligence

2023-01-03

3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fréchet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $128^{2}$ resolution. Additionally, we provide FIDs of the generated 3D-aware images for each class of the datasets, since $\text{C}^{3}$G-NeRF can synthesize class-conditional images.


A contrastive learning approach for individual re-identification in a wild fish population

Ørjan Langøy Olsen , Tonje Knutsen Sørdalen , Morten Goodwin , Ketil Malde , Kristian Muri Knausgård , Kim Tallaksen Halvorsen

Category: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique for corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88, on our dataset.


Learning to Maximize Mutual Information for Dynamic Feature Selection

Ian Covert , Wei Qiu , Mingyu Lu , Nayoon Kim , Nathan White , Su-In Lee

Category: Machine Learning | Machine Learning (Statistics)

2023-01-02

Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
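The greedy policy mentioned above can be sketched for a small discrete problem where the full joint distribution is known, which is exactly the oracle access the abstract says is unavailable in practice. The sketch does not cover the paper's amortized learning approach. The table format, the function names `conditional_mi` and `greedy_next_feature`, and the toy distribution are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Tuple

# (feature tuple, label) -> probability; an assumed oracle representation.
Joint = Dict[Tuple[Tuple[int, ...], int], float]


def conditional_mi(joint: Joint, candidate: int, observed: Dict[int, int]) -> float:
    """I(Y; X_candidate | X_S = x_S) in nats, read off the joint table."""
    p_xy: Dict[Tuple[int, int], float] = defaultdict(float)
    for (x, y), p in joint.items():
        # Keep only outcomes consistent with the features observed so far.
        if all(x[i] == v for i, v in observed.items()):
            p_xy[(x[candidate], y)] += p
    total = sum(p_xy.values())
    if total == 0.0:
        return 0.0
    p_x: Dict[int, float] = defaultdict(float)
    p_y: Dict[int, float] = defaultdict(float)
    for (xi, y), p in p_xy.items():
        p_x[xi] += p / total
        p_y[y] += p / total
    mi = 0.0
    for (xi, y), p in p_xy.items():
        if p > 0.0:
            q = p / total
            mi += q * math.log(q / (p_x[xi] * p_y[y]))
    return mi


def greedy_next_feature(joint: Joint, num_features: int, observed: Dict[int, int]) -> int:
    """Pick the unobserved feature with the largest conditional mutual information."""
    remaining = [i for i in range(num_features) if i not in observed]
    return max(remaining, key=lambda i: conditional_mi(joint, i, observed))


# Toy distribution: three independent fair bits; x0 decides which other bit
# the label depends on (y = x1 if x0 == 0, else y = 2 + x2).
joint: Joint = {}
for x0, x1, x2 in product([0, 1], repeat=3):
    y = x1 if x0 == 0 else 2 + x2
    joint[((x0, x1, x2), y)] = 1.0 / 8.0

print(greedy_next_feature(joint, 3, {}))       # 0
print(greedy_next_feature(joint, 3, {0: 0}))   # 1
print(greedy_next_feature(joint, 3, {0: 1}))   # 2
```

The second query depends on the value observed for the first feature, which is what distinguishes dynamic selection from a static feature subset.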


Design, Modeling, and Evaluation of Separable Tendon-Driven Robotic Manipulator with Long, Passive, Flexible Proximal Section

Christian DeBuys , Florin C. Ghesu , Jagadeesan Jayender , Reza Langari , Young-Ho Kim

Category: Robotics

2023-01-01

The purpose of this work was to tackle practical issues that arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot that overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input that resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased with increasing proximal section angle under all testing conditions, with an average error reduction of 41.48% for re-tension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing time delay, and re-tension compensation for removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.


Situation-Aware Deep Reinforcement Learning for Autonomous Nonlinear Mobility Control in Cyber-Physical Loitering Munition Systems

Hyunsoo Lee , Soohyun Park , Won Joon Yun , Soyi Jung , Joongheon Kim

Category: Robotics

2022-12-31

With the rapid development of drone technologies, drones are widely used in many applications, including military domains. This paper proposes a novel situation-aware, DRL-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. Therefore, the approach in this paper is to construct a cyber-physical virtual environment in Unity. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles, which are harmful to linear trajectory control. Thus, the proposed autonomous nonlinear drone mobility control algorithm uses situation-aware components that are implemented with a Raycast function in the Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight, which helps it avoid obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior to the other, linear mobility control algorithms.
