Smart Paper Notes

HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement

Pavel Andreev , Aibek Alanov , Oleg Ivanov , Dmitry Vetrov

Category: Machine Learning

2022-03-24

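Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other conditional audio generation tasks. In particular, building on HiFi vocoders, we propose HiFi++, a novel general framework for bandwidth extension and speech enhancement. We show that, with an improved generator architecture and simplified multi-discriminator training, HiFi++ performs better than or on par with the state of the art in these tasks while using significantly fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.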

Translated by Google Translate

Optimal-er Auctions through Attention

Dmitry Ivanov , Iskander Safiulin , Igor Filippov , Ksenia Balabaeva

Category: Machine Learning

2022-02-26

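RegretNet is a recent breakthrough in the automated design of revenue-maximizing auctions. It combines the expressivity of deep learning with a regret-based approach to relax the incentive compatibility constraint (that participants benefit from bidding truthfully). We propose two independent modifications of RegretNet: a neural architecture based on the attention mechanism, called RegretFormer, and an interpretable loss function that is less sensitive to hyperparameters. We investigate both modifications in an extensive experimental study that includes settings with constant and variable numbers of items and participants, novel validation procedures, and out-of-setting generalization. We find that RegretFormer consistently outperforms existing architectures in revenue and, unlike existing architectures, is applicable when the input size is variable. Regarding our loss modification, we confirm its effectiveness in controlling the revenue-regret trade-off by varying a single interpretable hyperparameter.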

Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations

Dmitry Ivanov , Mikhail Kiselev , Denis Larionov

Category: Machine Learning | Neural and Evolutionary Computing

2022-01-07

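This paper proposes a method based on sparse computations for optimizing neural networks used in reinforcement learning (RL) tasks. The method combines two ideas: neural network pruning, and taking input data correlations into account, so that neuron states are updated only when their changes exceed a certain threshold. This significantly reduces the number of multiplications needed to run the network. We tested the method on different RL tasks and achieved a 20-150x reduction in the number of multiplications. There was no substantial loss in performance; in some cases performance even improved.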
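The two ideas in this abstract (pruning plus threshold-gated updates) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `prune_by_magnitude` and `ThresholdedLinear` are hypothetical, magnitude-based pruning is an assumed pruning rule, and gating on changes in a layer's inputs is one possible reading of "update neuron states only when the change exceeds a threshold".

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (assumed pruning rule).

    Ties at the cutoff are kept, so slightly more weights may survive.
    """
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = int(round(keep_fraction * len(magnitudes)))
    if keep == 0:
        return [[0.0 for _ in row] for row in weights]
    cutoff = magnitudes[keep - 1]
    return [[float(w) if abs(w) >= cutoff else 0.0 for w in row] for row in weights]


class ThresholdedLinear:
    """Linear layer that caches its output and applies only large input changes."""

    def __init__(self, weights, threshold):
        self.weights = [[float(w) for w in row] for row in weights]  # rows = outputs
        self.threshold = float(threshold)
        self.last_input = [0.0] * len(self.weights[0])  # last input values applied
        self.output = [0.0] * len(self.weights)         # cached weights @ last_input
        self.multiplications = 0                        # multiplications performed

    def forward(self, x):
        for j, value in enumerate(x):
            delta = float(value) - self.last_input[j]
            if abs(delta) <= self.threshold:
                continue  # change too small: keep the cached contribution
            for i, row in enumerate(self.weights):
                if row[j] != 0.0:  # pruned (zero) weights cost no multiplication
                    self.output[i] += row[j] * delta
                    self.multiplications += 1
            self.last_input[j] = float(value)
        return list(self.output)


# Consecutive RL observations are often similar, so few inputs cross the threshold.
layer = ThresholdedLinear([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]], threshold=0.5)
first = layer.forward([1.0, 1.0, 1.0])   # every input is new: 3 multiplications
second = layer.forward([1.2, 1.0, 3.0])  # only the third input changed enough
```

The cached output is exact with respect to the last applied inputs but only approximate with respect to the true inputs; the threshold trades that approximation error against the number of multiplications saved.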

HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Dmitry Yudin , Yaroslav Solomentsev , Ruslan Musaev , Aleksei Staroverov , Aleksandr I. Panov

Category: Computer Vision | Artificial Intelligence

2022-12-30

We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place ("Point") at different angles. The dataset is based on the popular Habitat simulator, in which it is possible to generate photorealistic indoor scenes using both one's own sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we propose a new modular approach named PNTR. It first performs image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been previously studied in existing publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.

An Optimal Algorithm for Strongly Convex Min-min Optimization

Dmitry Kovalev , Alexander Gasnikov , Grigory Malinovsky

Category: Machine Learning

2022-12-29

In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log 1/\epsilon)$ computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $\kappa_x$ and $\kappa_y$ are the condition numbers with respect to the variable blocks $x$ and $y$. We propose a new algorithm that requires only $\mathcal{O}(\sqrt{\kappa_x} \log 1/\epsilon)$ computations of $\nabla_x f(x,y)$ and $\mathcal{O}(\sqrt{\kappa_y} \log 1/\epsilon)$ computations of $\nabla_y f(x,y)$. In some applications $\kappa_x \gg \kappa_y$, and computation of $\nabla_y f(x,y)$ is significantly cheaper than computation of $\nabla_x f(x,y)$. In this case, our algorithm substantially outperforms the existing state-of-the-art methods.
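To make the two complexity bounds concrete, the sketch below evaluates them for sample values. It is illustrative arithmetic only: the function name `gradient_call_orders` and the values of $\kappa_x$, $\kappa_y$ and $\epsilon$ are assumptions, and the constants hidden by $\mathcal{O}(\cdot)$ are set to 1, so the numbers indicate orders of growth rather than real iteration counts.

```python
import math


def gradient_call_orders(kappa_x, kappa_y, eps):
    """Evaluate the stated bounds with all hidden constants assumed to be 1."""
    log_term = math.log(1.0 / eps)
    # Existing optimal methods: the same bound applies to both gradient blocks.
    baseline = math.sqrt(max(kappa_x, kappa_y)) * log_term
    return {
        "baseline_grad_x": baseline,
        "baseline_grad_y": baseline,
        # Proposed algorithm: each block is governed by its own condition number.
        "proposed_grad_x": math.sqrt(kappa_x) * log_term,
        "proposed_grad_y": math.sqrt(kappa_y) * log_term,
    }


# Hypothetical example values with kappa_x much larger than kappa_y.
orders = gradient_call_orders(kappa_x=1e4, kappa_y=1e2, eps=1e-6)
ratio = orders["baseline_grad_y"] / orders["proposed_grad_y"]
print(f"grad_y bound shrinks by a factor of about {ratio:.1f}")
```

With these sample values the bound on $\nabla_x f$ computations is unchanged, while the bound on $\nabla_y f$ computations shrinks by $\sqrt{\kappa_x/\kappa_y}$.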

StyleDomain: Analysis of StyleSpace for Domain Adaptation of StyleGAN

Aibek Alanov , Vadim Titov , Maksim Nakhodnov , Dmitry Vetrov

Category: Computer Vision | Machine Learning

2022-12-20

Domain adaptation of GANs is the problem of fine-tuning state-of-the-art GAN models (e.g. StyleGAN) pretrained on a large dataset to a specific domain with few samples (e.g. painting faces, sketches, etc.). While a great number of methods tackle this problem in different ways, many important questions remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. First, we perform a detailed exploration of the most important parts of StyleGAN that are responsible for adapting the generator to a new domain depending on the similarity between the source and target domains. In particular, we show that affine layers of StyleGAN can be sufficient for fine-tuning to similar domains. Second, inspired by these findings, we investigate StyleSpace to utilize it for domain adaptation. We show that there exist directions in the StyleSpace that can adapt StyleGAN to new domains. Further, we examine these directions and discover their many surprising properties. Finally, we leverage our analysis and findings to deliver practical improvements and applications in such standard tasks as image-to-image translation and cross-domain morphing.

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Esaú Villatoro-Tello , Srikanth Madikeri , Juan Zuluaga-Gomez , Bidisha Sharma , Seyyed Saeed Sarfjoo , Iuliia Nigmatulina , Petr Motlicek , Alexei V. Ivanov , Aravind Ganapathiraju

Category: Natural Language Processing | Artificial Intelligence

2022-12-16

In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically- vs. manually-generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs allows SLU systems to improve in comparison to the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup, and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative to overcome the limitations of working with purely automatically generated textual data.

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan , Noah Brown , Justice Carbajal , Yevgen Chebotar , Joseph Dabis , Chelsea Finn , Keerthana Gopalakrishnan , Karol Hausman , Alex Herzog , Jasmine Hsu

Category: Robotics | Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning

2022-12-13

By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io

Deep Learning Generates Synthetic Cancer Histology for Explainability and Education

James M. Dolezal , Rachelle Wolk , Hanna M. Hieromnimon , Frederick M. Howard , Andrew Srisuwananukorn , Dmitry Karpeyev , Siddhi Ramesh , Sara Kochanny , Jung Woo Kwon , Meghana Agni

Category: Computer Vision

2022-11-12

Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.

Toward Human-AI Co-creation to Accelerate Material Discovery

Dmitry Zubarev , Carlos Raoni Mendes , Emilio Vital Brazil , Renato Cerqueira , Kristin Schmidt , Vinicius Segura , Juliana Jansen Ferreira , Dan Sanders

Category: Machine Learning | Artificial Intelligence

2022-11-05

There is an increasing need in our society to achieve faster advances in science to tackle urgent problems such as climate change, environmental hazards, sustainable energy systems, and pandemics, among others. In certain domains like chemistry, scientific discovery carries the extra burden of assessing the risks of the proposed novel solutions before moving to the experimental stage. Despite several recent advances in Machine Learning and AI to address some of these challenges, there is still a gap in technologies to support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling knowledge consumption and production in a timely and efficient way for subject matter experts (SMEs). Furthermore, the discovery of novel functional materials strongly relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention within the scientific community due to their ability to generate enormous volumes of novel molecules across material domains. These models exhibit extreme creativity that often translates into low viability of the generated candidates. In this work, we propose a workbench framework that aims at enabling human-AI co-creation to reduce the time until the first discovery and the opportunity costs involved. This framework relies on a knowledge base with domain and process knowledge, and user-interaction components to acquire knowledge and advise the SMEs. Currently, the framework supports four main activities: generative modeling, dataset triage, molecule adjudication, and risk assessment.
