智能论文笔记

High-resolution embedding extractor for speaker diarisation

Hee-Soo Heo , Youngki Kwon , Bong-Jin Lee , You Jin Kim , Jee-weon Jung

分类：自然语言处理

2022-11-08

Speaker embedding extractors significantly influence the performance of clustering-based speaker diarisation systems. Conventionally, only one embedding is extracted from each speech segment. However, because of the sliding window approach, a segment easily includes two or more speakers owing to speaker change points. This study proposes a novel embedding extractor architecture, referred to as a high-resolution embedding extractor (HEE), which extracts multiple high-resolution embeddings from each speech segment. Hee consists of a feature-map extractor and an enhancer, where the enhancer with the self-attention mechanism is the key to success. The enhancer of HEE replaces the aggregation process; instead of a global pooling layer, the enhancer combines relative information to each frame via attention leveraging the global context. Extracted dense frame-level embeddings can each represent a speaker. Thus, multiple speakers can be represented by different frame-level features in each segment. We also propose an artificially generating mixture data training framework to train the proposed HEE. Through experiments on five evaluation sets, including four public datasets, the proposed HEE demonstrates at least 10% improvement on each evaluation set, except for one dataset, which we analyse that rapid speaker changes less exist.

translated by 谷歌翻译

Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

Sung Hwan Mun , Jee-weon Jung , Min Hyun Han , Nam Soo Kim

分类：人工智能

2022-04-03

大多数最新的说话者验证架构都采用了多尺度处理和频道注意机制。这些模型的卷积层通常具有固定的内核大小，例如3或5。在本研究中，我们进一步为这一研究采用了选择性核心注意（SKA）机制。SKA机制允许每个卷积层以数据驱动的方式自适应地选择内核大小。它基于利用频率和通道域的注意机制。我们首先将现有的SKA模块应用于我们的基线。然后，我们提出了两个SKA变体，其中第一个变体在ECAPA-TDNN模型的前面应用，另一个变体与RES2NET骨干块结合使用。通过广泛的实验，我们证明了我们提出的两个SKA变体始终提高性能，并在三个不同的评估方案上进行测试时是互补的。

translated by 谷歌翻译

Attentive max feature map and joint training for acoustic scene classification

Hye-jin Shim , Jee-weon Jung , Ju-ho Kim , Ha-Jin Yu

分类：机器学习 | 人工智能

2021-04-15

各种关注机制被广泛应用于声学场景分类。但是，我们经验发现，尽管提高了性能，所以注意力机制可能会过度丢弃潜在的有价值的信息。我们提出了殷勤的最大特征图，即结合了两个有效的技术，关注和最大特征图，进一步详细阐述了注意力机制并减轻了上述现象。我们还探讨了各种联合培训方法，包括多任务学习，为每个音频录制分配额外的抽象标签。我们所提出的系统通过使用相对较少的参数应用两种提出的技术来展示DCASE 2020挑战的子任务A上的单个系统的最先进的性能。此外，采用拟议的秘密最大特征图，我们的团队在最近的DCEAD 2021挑战中排名第四。

translated by 谷歌翻译

Situation-Aware Deep Reinforcement Learning for Autonomous Nonlinear Mobility Control in Cyber-Physical Loitering Munition Systems

Hyunsoo Lee , Soohyun Park , Won Joon Yun , Soyi Jung , Joongheon Kim

分类：机器人

2022-12-31

According to the rapid development of drone technologies, drones are widely used in many applications including military domains. In this paper, a novel situation-aware DRL- based autonomous nonlinear drone mobility control algorithm in cyber-physical loitering munition applications. On the battlefield, the design of DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not available. Therefore, the approach in this paper is that cyber-physical virtual environment is constructed with Unity environment. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, many obstacles exist which is harmful for linear trajectory control in real-world battlefield scenarios. Thus, our proposed autonomous nonlinear drone mobility control algorithm utilizes situation-aware components those are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. Therefore, this approach is obviously beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior from the other linear mobility control algorithms.

translated by 谷歌翻译

HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization

Sungyeon Kim , Boseung Jung , Suha Kwak

分类：计算机视觉 | 人工智能

2022-12-29

Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances of the field. In this regard, we propose a new regularization method, dubbed HIER, to discover the latent semantic hierarchy of training data, and to deploy the hierarchy to provide richer and more fine-grained supervision than inter-class separability induced by common metric learning losses. HIER achieved this goal with no annotation for the semantic hierarchy but by learning hierarchical proxies in hyperbolic spaces. The hierarchical proxies are learnable parameters, and each of them is trained to serve as an ancestor of a group of data or other proxies to approximate the semantic hierarchy among them. HIER deals with the proxies along with data in hyperbolic space since geometric properties of the space are well-suited to represent their hierarchical structure. The efficacy of HIER was evaluated on four standard benchmarks, where it consistently improved performance of conventional methods when integrated with them, and consequently achieved the best records, surpassing even the existing hyperbolic metric learning technique, in almost all settings.

translated by 谷歌翻译

Critic-Guided Decoding for Controlled Text Generation

Minbeom Kim , Hwanhee Lee , Kang Min Yoo , Joonsuk Park , Hwaran Lee , Kyomin Jung

分类：自然语言处理

2022-12-21

Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective approaches to achieve a higher level of language control and quality with pros and cons. In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. And similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using called critic, improving training efficiency and stability. Evaluation of our method on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and well-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.

translated by 谷歌翻译

Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?

Sang-Woo Lee , Sungdong Kim , Donghyeon Ko , Donghoon Ham , Youngki Hong , Shin Ah Oh , Hyunhoon Jung , Wangkyo Jung , Kyunghyun Cho , Donghyun Kwak

分类：自然语言处理

2022-12-20

Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way from unraveling the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.

translated by 谷歌翻译

HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

HyunJun Jung , Shun-Cheng Wu , Patrick Ruhkamp , Hannah Schieber , Pengyuan Wang , Giulia Rizzoli , Hongcheng Zhao , Sven Damian Meier , Daniel Roth , Nassir Navab

分类：计算机视觉

2022-12-20

Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Since the promising outcomes from instance-level pose estimation, the research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB+P and Depth, 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large scale scenes with extensive viewpoint coverage, 5) Checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.

translated by 谷歌翻译

Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning

Kyuyong Shin , Hanock Kwak , Wonjae Kim , Jisu Jeong , Seungjae Jung , Kyung-Min Kim , Jung-Woo Ha , Sang-Woo Lee

分类：自然语言处理

2022-12-07

Recent studies have proposed a unified user modeling framework that leverages user behavior data from various applications. Most benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.

translated by 谷歌翻译

Quantum Federated Learning with Entanglement Controlled Circuits and Superposition Coding

Won Joon Yun , Jae Pyoung Kim , Hankyul Baek , Soyi Jung , Jihong Park , Mehdi Bennis , Joongheon Kim

分类：机器学习

2022-12-04

While witnessing the noisy intermediate-scale quantum (NISQ) era and beyond, quantum federated learning (QFL) has recently become an emerging field of study. In QFL, each quantum computer or device locally trains its quantum neural network (QNN) with trainable gates, and communicates only these gate parameters over classical channels, without costly quantum communications. Towards enabling QFL under various channel conditions, in this article we develop a depth-controllable architecture of entangled slimmable quantum neural networks (eSQNNs), and propose an entangled slimmable QFL (eSQFL) that communicates the superposition-coded parameters of eS-QNNs. Compared to the existing depth-fixed QNNs, training the depth-controllable eSQNN architecture is more challenging due to high entanglement entropy and inter-depth interference, which are mitigated by introducing entanglement controlled universal (CU) gates and an inplace fidelity distillation (IPFD) regularizer penalizing inter-depth quantum state differences, respectively. Furthermore, we optimize the superposition coding power allocation by deriving and minimizing the convergence bound of eSQFL. In an image classification task, extensive simulations corroborate the effectiveness of eSQFL in terms of prediction accuracy, fidelity, and entropy compared to Vanilla QFL as well as under different channel conditions and various data distributions.

translated by 谷歌翻译