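In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects, such as gain adjustment, equalization, and reverberation. These systems can be controlled through visual interfaces, by providing audio examples, with knobs, and with semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of a neural network that translates words into EQ settings. With this technique, the machine learning model can also generate EQ settings for semantic descriptors it has never seen before. We compare human EQ settings with the neural network's predictions to evaluate the quality of the predictions. The results show that the embedding layer enables the neural network to understand semantic descriptors. We observe that models with an embedding layer perform better than those without one, but still not as well as human labels.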
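A minimal sketch of the word-to-EQ idea, assuming a single dense layer on top of the embedding. The 3-dimensional embeddings, the weights, the three-band (low/mid/high) output, and the names `EMBEDDINGS`, `WEIGHTS`, and `eq_from_word` are all hypothetical. The paper's network and embedding size are not described here. The point is only that an unseen descriptor whose embedding lies close to a known one receives a similar EQ curve without any extra training.

```python
import math

# Hypothetical 3-d word embeddings; a real system would load pretrained vectors
# (typically hundreds of dimensions) trained on a large text corpus.
EMBEDDINGS = {
    "warm": [0.9, 0.1, 0.0],
    "bright": [0.0, 0.2, 0.9],
    "cozy": [0.8, 0.2, 0.1],  # assumed absent from the EQ training labels
}

# Hypothetical weights of one dense layer: one row per EQ band (low, mid, high),
# one column per embedding dimension. The output is a gain in dB per band.
WEIGHTS = [
    [6.0, 0.0, -4.0],
    [0.0, 2.0, 0.0],
    [-3.0, 0.0, 6.0],
]
BIAS = [0.0, 0.0, 0.0]


def eq_from_word(word):
    """Map a descriptor to per-band gains: embedding lookup, then a linear layer."""
    emb = EMBEDDINGS[word]
    return [sum(w * e for w, e in zip(row, emb)) + b for row, b in zip(WEIGHTS, BIAS)]


def distance(a, b):
    """Euclidean distance between two EQ settings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    for word in EMBEDDINGS:
        print(word, [round(g, 2) for g in eq_from_word(word)])
```

With these made-up numbers, "cozy" ends up with a low-shelf boost and a high cut, as "warm" does, simply because its embedding is nearby.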
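Audio segmentation and sound event detection are key topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification. This technique divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), inspired by the YOLO algorithm popularly adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons that detect the presence of an audio class and predict its start and end points. Compared with a state-of-the-art convolutional recurrent neural network, YOHO's relative improvement in F-measure ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. Because YOHO's output is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, because this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.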
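A sketch of how YOHO-style regression outputs could be decoded into events, assuming each output time bin carries, for every class, a presence probability plus start and end offsets expressed as fractions of the bin. This layout, the 0.5 threshold, and the merge rule are assumptions for illustration rather than the paper's exact post-processing. Because boundaries are read off directly, decoding is a single linear pass with no frame-level smoothing.

```python
def decode_yoho(outputs, bin_sec, threshold=0.5):
    """Turn per-bin (presence, start_frac, end_frac) triples into events.

    outputs[b][c] holds the triple for time bin b and class c (assumed layout).
    Returns {class_index: [(start_sec, end_sec), ...]}.
    """
    events = {}
    for b, per_class in enumerate(outputs):
        for c, (presence, start_frac, end_frac) in enumerate(per_class):
            if presence < threshold:
                continue  # class judged absent in this bin
            start = (b + start_frac) * bin_sec
            end = (b + end_frac) * bin_sec
            evs = events.setdefault(c, [])
            # Merge with the previous event of the same class if they touch or overlap.
            if evs and start <= evs[-1][1] + 1e-9:
                evs[-1] = (evs[-1][0], max(evs[-1][1], end))
            else:
                evs.append((start, end))
    return events
```

For example, a class present from half-way through bin 0 to a quarter of the way into bin 1 (with 1-second bins) decodes to a single event from 0.5 s to 1.25 s.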
Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of the loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of the Hessian of the class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further demonstrate, theoretically and empirically, that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to flat minima, can be effectively used to escape saddle points for minority classes. Using SAM yields a 6.2% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
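The SAM update referred to above takes two gradient evaluations per step. The first finds a perturbation of radius `rho` in the locally worst-case direction. The second gradient, evaluated at that perturbed point, is then applied to the original weights. The sketch below shows only this update rule on plain Python lists with a user-supplied gradient function. The learning rate and `rho` values are arbitrary. It is not the authors' code, which lives in the linked repository.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step on a list of weights."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Ascent: move to the first-order worst-case point inside an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent: update the ORIGINAL weights with the gradient taken at w_adv.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    loss = lambda w: sum(x * x for x in w)
    grad = lambda w: [2 * x for x in w]
    w = [1.0, -2.0]
    for _ in range(5):
        w = sam_step(w, grad)
    print(w, loss(w))
```

Because the descent direction is computed at a nearby point with higher loss, the update penalizes sharp regions. This is the mechanism the abstract credits with pushing minority-class losses away from saddle points.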
Semantic segmentation is the computer vision task of assigning each pixel of an image to a class, and it should be performed both accurately and efficiently. Most existing deep fully convolutional networks (FCNs) require heavy computation and are very power hungry, making them unsuitable for real-time applications on portable devices. This project analyzes current semantic segmentation models to explore the feasibility of applying them to emergency response during catastrophic events. We compare the performance of real-time semantic segmentation models with their non-real-time counterparts on aerial images under oppositional settings. Furthermore, we train several models on the Flood-Net dataset, which contains UAV images captured after Hurricane Harvey, and benchmark them on specific classes such as flooded vs. non-flooded buildings and flooded vs. non-flooded roads. Finally, we developed a real-time UNet-based model and deployed it on a Jetson AGX Xavier module.
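Per-class benchmarking of the kind described above (flooded vs. non-flooded buildings or roads) is commonly reported as intersection-over-union per class. The abstract does not name its metric, so IoU is an assumption here. The class indices in the sketch are hypothetical stand-ins for labels such as "flooded building" and "non-flooded building". The function works on flattened label maps.

```python
def per_class_iou(pred, target, num_classes):
    """IoU for each class from flattened predicted and ground-truth label maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mislabeled pixel enlarges the union of both the predicted and the true class.
            union[p] += 1
            union[t] += 1
    # Classes absent from both maps get NaN rather than a misleading 0 or 1.
    return [i / u if u else float("nan") for i, u in zip(inter, union)]


if __name__ == "__main__":
    # Hypothetical labels: 0 = background, 1 = flooded building, 2 = non-flooded building.
    print(per_class_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
```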
The problem of generating an optimal coalition structure for a given coalition game of rational agents is to find a partition that maximizes their social welfare, and it is known to be NP-hard. This paper proposes GCS-Q, a novel quantum-supported solution for Induced Subgraph Games (ISGs) in coalition structure generation. GCS-Q starts with the grand coalition as the initial coalition structure and proceeds by iteratively splitting coalitions into two nonempty subsets to obtain a coalition structure with a higher coalition value. In particular, given an $n$-agent ISG, GCS-Q solves the optimal split problem $\mathcal{O}(n)$ times using a quantum annealing device, exploring $\mathcal{O}(2^n)$ partitions at each step. We show that GCS-Q outperforms the currently best classical solvers, with a runtime on the order of $n^2$ and an expected worst-case approximation ratio of $93\%$ on standard benchmark datasets.
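In an ISG, a coalition's value is the sum of the edge weights among its members. A split of a coalition therefore helps exactly when the total weight on the edges it cuts is negative. The sketch below follows the top-down splitting loop described above. It swaps in brute-force enumeration as a classical stand-in for the quantum annealer, so each split costs $\mathcal{O}(2^n)$ classically. The function names and the edge-dictionary format are hypothetical.

```python
from itertools import combinations


def coalition_value(coalition, weights):
    """ISG value: sum of edge weights among members; weights maps frozenset({i, j}) -> w."""
    return sum(weights.get(frozenset(e), 0) for e in combinations(sorted(coalition), 2))


def best_split(coalition, weights):
    """Brute-force optimal bipartition (stand-in for the annealer); None if no split helps."""
    members = sorted(coalition)
    first, rest = members[0], members[1:]
    base = coalition_value(coalition, weights)
    best, best_gain = None, 0
    # Keep `first` in S so each unordered bipartition is enumerated once.
    for mask in range(2 ** len(rest)):
        s = {first} | {m for k, m in enumerate(rest) if mask >> k & 1}
        t = set(members) - s
        if not t:
            continue
        gain = coalition_value(s, weights) + coalition_value(t, weights) - base
        if gain > best_gain:
            best, best_gain = (frozenset(s), frozenset(t)), gain
    return best


def split_until_stable(agents, weights):
    """Start from the grand coalition and keep splitting while a split increases value."""
    queue, structure = [frozenset(agents)], []
    while queue:
        c = queue.pop()
        split = best_split(c, weights) if len(c) > 1 else None
        if split is None:
            structure.append(c)
        else:
            queue.extend(split)
    return structure
```

In the paper's method, the inner `best_split` search is the part handed to quantum annealing. The outer loop is the same kind of greedy top-down refinement.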
Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases reported globally in $2020$. While TB is treatable, non-adherence to the medication regimen is a significant cause of morbidity and mortality. Thus, proactively identifying patients at risk of dropping off their medication regimen enables corrective measures to mitigate adverse outcomes. Using a proxy measure of extreme non-adherence and a dataset of nearly $700,000$ patients from four states in India, we formulate and solve the machine learning (ML) problem of early prediction of non-adherence based on a custom rank-based metric. We train ML models and evaluate them against baselines, achieving a $\sim 100\%$ lift over rule-based baselines and a $\sim 214\%$ lift over a random classifier, with the evaluation designed for future large-scale, country-wide deployment. Along the way, we address several issues, including data quality, high-cardinality categorical data, low target prevalence, distribution shift, variation across cohorts, algorithmic fairness, and the need for robustness and explainability. Our findings indicate that risk stratification of non-adherent patients is a viable ML solution that can be deployed at scale.
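The abstract's custom rank-based metric is not specified. The sketch below therefore assumes a common stand-in: precision among the top-$k$ ranked patients, which reflects how many true positives a fixed outreach capacity reaches. It also computes relative lift as a ratio minus one. Both function names and the toy data are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true positives among the k highest-scoring patients."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def relative_lift(model_metric, baseline_metric):
    """Relative improvement, e.g. 1.0 means a 100% lift over the baseline."""
    return model_metric / baseline_metric - 1.0


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical: 1 = became non-adherent
    model = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4]
    rule_based = [0.2, 0.9, 0.8, 0.1, 0.3, 0.4, 0.5, 0.6]
    m, b = precision_at_k(model, labels, 3), precision_at_k(rule_based, labels, 3)
    print(m, b, relative_lift(m, b))
```

A random classifier's expected precision at any $k$ equals the target prevalence. Low prevalence is why even modest absolute precision can translate into large relative lifts over a random baseline.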
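We propose a \underline{d}oubly \underline{o}ptimistic strategy for the \underline{s}afe-\underline{l}inear-\underline{b}andit problem, DOSLB. The safe linear bandit problem is to optimize an unknown linear reward while satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback of the reward and safety risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control over round-wise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises supreme optimism, using optimistic estimates of both reward and safety scores to select actions. Yet, surprisingly, we show that DOSLB rarely takes risky actions and obtains $\tilde{O}(d \sqrt{T})$ regret, where our notion of regret accounts for both the inefficiency and the lack of safety of actions. For a restricted class of domains, we first show that the $\sqrt{T}$ regret bound cannot be improved even with large gaps, and then identify a notion of regret for which we show tight instance-dependent $O(\log^2 T)$ bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded by $O(\log^2 T)$.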
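The contrast between "doubly optimistic" and optimistic-pessimistic selection comes down to which end of the safety confidence interval is compared with the threshold. The sketch below takes each action's estimated reward, estimated risk, and a confidence width as given. In a linear bandit, the width would come from the design matrix, for example $\beta \lVert x_a \rVert_{V^{-1}}$. The sketch uses one shared width for reward and risk, which is a simplifying assumption. The function name and the `optimistic_safety` flag are hypothetical.

```python
def select_action(reward_hat, risk_hat, width, alpha, optimistic_safety=True):
    """Return the index of the action with the highest optimistic reward among admissible ones.

    optimistic_safety=True  -> admit an action if its LOWEST plausible risk <= alpha
                               (optimism on both reward and safety).
    optimistic_safety=False -> admit only if its HIGHEST plausible risk <= alpha
                               (the optimistic-pessimistic rule).
    Returns None if no action is admissible.
    """
    best, best_ucb = None, float("-inf")
    for a, (r, s, w) in enumerate(zip(reward_hat, risk_hat, width)):
        risk_bound = s - w if optimistic_safety else s + w
        if risk_bound > alpha:
            continue  # ruled out as unsafe under the chosen rule
        ucb = r + w  # optimistic reward estimate
        if ucb > best_ucb:
            best, best_ucb = a, ucb
    return best


if __name__ == "__main__":
    rewards, risks, widths = [1.0, 0.8, 0.5], [0.9, 0.55, 0.1], [0.1, 0.1, 0.1]
    print(select_action(rewards, risks, widths, alpha=0.5))
    print(select_action(rewards, risks, widths, alpha=0.5, optimistic_safety=False))
```

In this toy instance, the doubly optimistic rule tries the higher-reward action whose safety is still uncertain. The pessimistic rule falls back to the clearly safe action. The abstract's result is that, despite this willingness, DOSLB rarely ends up playing risky actions.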
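Deep ensembles have been shown to extend the positive effects seen in typical ensemble learning to neural networks and to reinforcement learning (RL). However, much remains to be done to improve the efficiency of such ensemble models. In this work, we present Diverse Ensembles for Fast Transfer in RL (DEFT), a new ensemble-based method for reinforcement learning in highly multimodal environments and for improved transfer to unseen environments. The algorithm is divided into two main phases: training of the ensemble members, and synthesis (or fine-tuning) of the ensemble members into a policy that works in a new environment. The first phase of the algorithm involves training regular policy gradient or actor-critic agents in parallel, with an added loss term that encourages these policies to differ from each other. This causes the individual unimodal agents to explore the space of optimal policies and to capture more of the environment's multimodality than a single actor could. The second phase of DEFT involves synthesizing the component policies into a new policy that works well in a modified environment, in one of two ways. To evaluate the performance of DEFT, we start with a basic version of the Proximal Policy Optimization (PPO) algorithm and extend it with the modifications for DEFT. Our results show that the pretraining phase is effective at producing diverse policies in multimodal environments. DEFT often converges to a high reward significantly faster than alternatives, such as random initialization without DEFT and fine-tuning of ensemble members. While there is certainly more work to be done to analyze DEFT theoretically and extend it to be more robust, we believe it provides a strong framework for capturing multimodality in environments while still using RL methods with simple policy representations.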
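The first DEFT phase adds a loss term that rewards ensemble members for differing from one another. The abstract does not give the exact form. The sketch below assumes one plausible choice: for member $i$, subtract $\lambda$ times the mean KL divergence between its action distribution and those of the other members at the same state, from that member's usual policy-gradient loss. The names, the KL choice, and $\lambda = 0.1$ are assumptions.

```python
import math


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two categorical action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def diversity_term(probs, i):
    """Mean KL from member i's action distribution to every other member's."""
    others = [q for j, q in enumerate(probs) if j != i]
    return sum(kl(probs[i], q) for q in others) / len(others)


def deft_member_loss(pg_loss, probs, i, lam=0.1):
    """Member i's loss: the usual policy-gradient loss minus a diversity bonus."""
    return pg_loss - lam * diversity_term(probs, i)


if __name__ == "__main__":
    # Action probabilities of three ensemble members at one state (hypothetical).
    members = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
    print([round(deft_member_loss(1.0, members, i), 4) for i in range(3)])
```

Minimizing this loss pushes each member toward a different mode of the optimal-policy space. When all members agree, the bonus vanishes and the loss reduces to plain PPO or actor-critic training.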
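Data-free knowledge distillation (DFKD) has recently attracted attention thanks to its appealing ability to transfer knowledge from a teacher network to a student network without using training data. The main idea is to use a generator to synthesize data for training the student. As the generator is updated, the distribution of the synthetic data changes. This distribution shift can be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired at previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD), which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be viewed as an ensemble of the generator's old versions and usually changes less per update than the generator, training on its synthetic samples helps the student recall past knowledge and prevents it from adapting too quickly to the generator's new updates. Our experiments on six benchmark datasets, including large datasets such as ImageNet and Places365, show that MAD outperforms competing methods at handling the large distribution shift problem. Our method also compares favorably with existing DFKD methods and even achieves state-of-the-art results in some cases.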
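The EMA copy at the heart of MAD is updated as $\theta_{\text{ema}} \leftarrow m\,\theta_{\text{ema}} + (1 - m)\,\theta$ after each generator step. Parameters are plain lists in the sketch below. The momentum value and the `student_batch` helper, which simply concatenates samples from both generators, are hypothetical illustrations of the idea rather than the authors' implementation.

```python
def ema_update(ema_params, params, momentum=0.999):
    """EMA generator update: momentum * ema + (1 - momentum) * current generator."""
    return [momentum * e + (1.0 - momentum) * p for e, p in zip(ema_params, params)]


def student_batch(generator, ema_generator, noise):
    """Hypothetical helper: the student trains on samples from BOTH generators."""
    return [generator(z) for z in noise] + [ema_generator(z) for z in noise]


if __name__ == "__main__":
    ema, current = [0.0], [1.0]
    for step in range(3):
        ema = ema_update(ema, current, momentum=0.9)
        print(step, ema)  # the EMA copy lags behind the generator's new parameters
```

The lag is the point. Samples from the slowly moving EMA copy keep older regions of the synthetic distribution in the student's training data while the adversarial generator moves on.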
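Transposed convolution has become prominent in many deep learning applications. However, transposed convolution layers are computationally intensive because the feature map grows when zeros are inserted after each element in every row and column. As a result, the convolution performed on the expanded input feature map leads to poor utilization of hardware resources. The main source of unnecessary multiplications is the zeros at predetermined positions in the input feature map. We propose an algorithm-level optimization technique for an efficient transposed convolution implementation to address these problems. Based on kernel activations, we segregate the original kernel into four sub-kernels. This scheme reduces memory requirements and unnecessary multiplications. Our proposed method achieved $3.09\times$ ($3.02\times$) faster computation on a Titan X GPU (Intel Dual Core CPU) using the flower dataset from the Kaggle website. Furthermore, the proposed optimization method can be applied to existing devices without additional hardware requirements. A simple deep learning model containing one transposed convolution layer was used to evaluate the optimization method. It achieved $2.2\times$ faster training on the MNIST dataset with an Intel Dual Core CPU.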
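For stride 2, each output pixel $(p, q)$ receives contributions only from kernel taps whose row and column indices have the same parity as $p$ and $q$. Splitting the kernel by parity therefore yields four sub-kernels that skip every multiplication against an inserted zero. The sketch below assumes a square kernel and no output padding. It implements this standard parity (polyphase) decomposition next to the zero-insertion reference so the two can be checked against each other. It illustrates the general idea, and the paper's exact "kernel activation" scheme may differ in detail. Both function names are hypothetical.

```python
def transposed_conv_zero_insert(x, k, s=2):
    """Reference: insert s-1 zeros between inputs, pad by K-1, then a full convolution."""
    H, W, K = len(x), len(x[0]), len(k)
    pad = K - 1
    grid = [[0] * ((W - 1) * s + 1 + 2 * pad) for _ in range((H - 1) * s + 1 + 2 * pad)]
    for i in range(H):
        for j in range(W):
            grid[pad + i * s][pad + j * s] = x[i][j]
    out_h, out_w = (H - 1) * s + K, (W - 1) * s + K
    out = [[0] * out_w for _ in range(out_h)]
    for p in range(out_h):
        for q in range(out_w):
            # K*K multiplications per output, most of them against inserted zeros.
            out[p][q] = sum(grid[p + a][q + b] * k[K - 1 - a][K - 1 - b]
                            for a in range(K) for b in range(K))
    return out


def transposed_conv_subkernels(x, k):
    """Stride-2 transposed convolution with four parity sub-kernels and no zero insertion."""
    H, W, K = len(x), len(x[0]), len(k)
    # Sub-kernel (r, c) holds the taps k[a][b] with a % 2 == r and b % 2 == c.
    subs = {(r, c): [row[c::2] for row in k[r::2]] for r in (0, 1) for c in (0, 1)}
    out_h, out_w = (H - 1) * 2 + K, (W - 1) * 2 + K
    out = [[0] * out_w for _ in range(out_h)]
    for p in range(out_h):
        for q in range(out_w):
            r, c = p % 2, q % 2
            acc = 0
            for u, row in enumerate(subs[(r, c)]):
                i = (p - r) // 2 - u  # input row reached through tap a = r + 2u
                if not 0 <= i < H:
                    continue
                for v, w in enumerate(row):
                    j = (q - c) // 2 - v
                    if 0 <= j < W:
                        acc += x[i][j] * w
            out[p][q] = acc
    return out
```

Each output pixel touches roughly a quarter of the kernel taps, and the enlarged zero-filled feature map is never materialized. This is where the memory and multiplication savings come from.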