Smart Paper Notes

Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations

Jaejin Cho , Raghavendra Pappagari , Piotr Żelasko , Laureano Moro-Velazquez , Jesús Villalba , Najim Dehak

Category: Machine Learning

2022-08-10

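Given the abundance of unlabeled speech data and the high cost of labeling, unsupervised learning methods can be essential for better system development. Among the most successful are contrastive self-supervised methods, which require negative sampling: drawing alternative samples to contrast with the current sample (the anchor). Without labels, however, it is hard to ensure that every negative sample belongs to a class different from the anchor's. This paper applies a non-contrastive self-supervised learning method to an unlabeled speech corpus to learn utterance-level embeddings. We use DIstillation with NO labels (DINO), proposed in computer vision, and adapt it to the speech domain. Unlike contrastive methods, DINO does not require negative sampling. The embeddings are evaluated on speaker verification and emotion recognition. In speaker verification, the unsupervised DINO embedding with cosine scoring achieves 4.38% EER on the VoxCeleb1 test trials. This outperforms the best contrastive self-supervised method by 40% relative in EER. An iterative pseudo-labeling training pipeline that requires no speaker labels further improves the EER to 1.89%. In emotion recognition, the DINO embedding obtains micro-F1 scores of 60.87%, 79.21%, and 56.98% on IEMOCAP, Crema-D, and MSP-Podcast, respectively. The results suggest that the DINO embedding generalizes to different speech applications.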
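The evaluation described above (cosine scoring of embedding pairs, summarized as an equal error rate) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the threshold sweep, and the toy scores are assumptions made for illustration, and real evaluations typically interpolate the error curves rather than picking the closest observed threshold.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false-acceptance rate (FAR) and the
    false-rejection rate (FRR) are closest, as the mean of the two rates.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up): same-speaker trials and different-speaker trials.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(same_speaker, different_speaker)
```

With these toy scores, one of four same-speaker trials is rejected and one of four different-speaker trials is accepted at the best threshold, so the sketch reports an EER of 25%.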


Finger-NestNet: Interpretable Fingerphoto Verification on Smartphone using Deep Nested Residual Network

Raghavendra Ramachandra , Hailin Li

Category: Computer Vision

2022-12-09

Fingerphoto images captured with a smartphone have been used successfully to verify individuals, enabling several applications. This work presents a novel algorithm for fingerphoto verification using a nested residual block: Finger-NestNet. The proposed Finger-NestNet architecture is designed with three consecutive convolution blocks followed by a series of nested residual blocks to achieve reliable fingerphoto verification. This paper also examines the interpretability of the proposed method using four different visualization techniques that shed light on the critical regions of the fingerphoto that contribute to its reliable verification performance. Extensive experiments are performed on a fingerphoto dataset comprising 196 unique fingers collected from 52 unique data subjects using an iPhone 6S. Experimental results indicate improved verification performance for the proposed method compared to six different existing methods, with EER = 1.15%.


Computationally Light Spectrally Normalized Memory Neuron Network based Estimator for GPS-Denied operation of Micro UAV

Nishanth Rao , Suresh Sundaram , Varun Raghavendra

Category: Robotics

2022-11-12

This paper addresses the problem of position estimation for UAVs operating in a cluttered environment where GPS information is unavailable. A model learning-based approach is proposed that takes the rotor RPMs and past state as input and predicts the one-step-ahead position of the UAV using a novel spectral-normalized memory neural network (SN-MNN). The spectral normalization guarantees stable and reliable prediction performance. The predicted position is transformed to the global coordinate frame and then fused with the odometry of other peripheral sensors (IMU, barometer, compass, etc.) using the onboard extended Kalman filter to estimate the states of the UAV. Experimental flight data collected from a motion capture facility using a micro-UAV is used to train the SN-MNN. The PX4-ECL library is used to replay the flight data with the proposed algorithm, and the estimated position is compared with ground truth data. The proposed algorithm does not require any additional onboard sensors and is computationally light. Compared with current state-of-the-art GPS-denied algorithms, the proposed algorithm has the lowest RMSE for position estimates.
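As a generic illustration of spectral normalization (not the SN-MNN itself, whose details the abstract does not give), the sketch below estimates the largest singular value of a weight matrix by power iteration and rescales the matrix so that this value becomes 1. The function names, the iteration count, and the toy matrix are assumptions.

```python
import math


def spectral_norm(W, n_iter=50):
    """Estimate the largest singular value of W (a list of rows) by power iteration."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols  # arbitrary unit-length start vector
    sigma = 0.0
    for _ in range(n_iter):
        # u = W v, normalized to unit length
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u))
        if u_norm == 0.0:
            return 0.0
        u = [x / u_norm for x in u]
        # v = W^T u; its length converges to the largest singular value
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            return 0.0
        v = [x / sigma for x in v]
    return sigma


def spectral_normalize(W):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W)
    if sigma == 0.0:
        return [row[:] for row in W]
    return [[w / sigma for w in row] for row in W]


# Toy weight matrix (made up) with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectral_normalize(W)
```

After normalization the linear map cannot stretch any input by more than a factor of 1, which is the usual argument for why this kind of rescaling helps keep predictions stable.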


Stochastic optimization on matrices and a graphon McKean-Vlasov limit

Zaid Harchaoui , Sewoong Oh , Soumik Pal , Raghav Somani , Raghavendra Tripathi

Category: Machine Learning | Machine Learning (Statistics)

2022-10-02

We consider stochastic gradient descents on the space of large symmetric matrices of suitable functions that are invariant under permuting the rows and columns using the same permutation. We establish deterministic limits of these random curves as the dimensions of the matrices go to infinity while the entries remain bounded. Under a "small noise" assumption the limit is shown to be the gradient flow of functions on graphons whose existence was established in arXiv:2111.09459. We also consider limits of stochastic gradient descents with added properly scaled reflected Brownian noise. The limiting curve of graphons is characterized by a family of stochastic differential equations with reflections and can be thought of as an extension of the classical McKean-Vlasov limit for interacting diffusions. The proofs introduce a family of infinite-dimensional exchangeable arrays of reflected diffusions and a novel notion of propagation of chaos for large matrices of interacting diffusions.


A Uniform Representation Learning Method for OCT-based Fingerprint Presentation Attack Detection and Reconstruction

Wentian Zhang , Haozhe Liu , Feng Liu , Raghavendra Ramachandra

Category: Computer Vision

2022-09-25

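Applying optical coherence tomography (OCT) to fingerprint imaging opens up new research potential for fingerprint recognition, owing to its ability to capture depth information from the skin layers. Robust, high-security automated fingerprint recognition systems (AFRSs) can be developed if this depth information is fully exploited. In existing studies, however, presentation attack detection (PAD) and subsurface fingerprint reconstruction based on depth information are treated as two independent branches, which makes building an AFRS computationally expensive and complex. This paper therefore proposes a uniform representation model for OCT-based fingerprint PAD and subsurface fingerprint reconstruction. First, we design a novel semantic segmentation network, trained only on real finger slices of OCT-based fingerprints, to extract multiple subsurface structures from those slices (also known as B-scans). The latent codes derived from the network are used directly to detect presentation attacks (PAs) effectively, since they contain rich subsurface biological information that is independent of PA materials and highly robust to unknown PAs. Meanwhile, the segmented subsurface structures are used to reconstruct multiple subsurface 2D fingerprints. Recognition can then be achieved easily with existing mature techniques for traditional 2D fingerprints. Extensive experiments are carried out on our own database, the largest OCT-based fingerprint database, with 2449 volumes. In the PAD task, our method improves accuracy (Acc) by 0.33% over the state-of-the-art method. For reconstruction, our method achieves the best performance, with 0.834 mIoU and 0.937 PA. A comparison with recognition performance on surface 2D fingerprints further demonstrates the effectiveness of the proposed method for high-quality subsurface fingerprint reconstruction.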
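The two reconstruction metrics can be illustrated with a short sketch, assuming that mIoU is the mean intersection over union across classes and that PA in this context denotes pixel accuracy (the abstract does not spell either out). The function names and the toy label lists are made up for illustration.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union over the classes that occur at all."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union > 0:  # skip classes absent from both prediction and reference
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy flattened segmentation maps (made up) with three classes.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
pa = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth, num_classes=3)
```

In this toy case four of six pixels match (pixel accuracy of about 0.667), while the per-class IoUs of 1/3, 2/3, and 1/2 average to an mIoU of 0.5, which shows why the two numbers can differ noticeably on the same prediction.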


Efficient Self-Supervision using Patch-based Contrastive Learning for Histopathology Image Segmentation

Nicklas Boserup , Raghavendra Selvan

Category: Computer Vision | Machine Learning

2022-08-23

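Learning discriminative representations of unlabeled data is a challenging task. Contrastive self-supervised learning provides a framework for learning meaningful representations using similarity measures derived from simple pretext tasks. In this work, we propose a simple and efficient framework for self-supervised image segmentation using contrastive learning on image patches, without explicit pretext tasks or any further labeled fine-tuning. A fully convolutional neural network (FCNN) is trained in a self-supervised manner to discern features in the input images and to produce confidence maps that capture the network's belief about which objects belong to the same class. Positive and negative patches are sampled for contrastive learning based on the average entropy in the confidence maps. Convergence is assumed when the information separation between positive patches is small and that between positive-negative pairs is large. We evaluate the method on the task of segmenting nuclei in multiple histopathology datasets and show performance comparable to relevant self-supervised and supervised methods. The proposed model consists only of a simple FCNN with 10.8k parameters and needs about 5 minutes to converge on the high-resolution microscopy datasets, which is orders of magnitude less than relevant self-supervised methods require to attain similar performance.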
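The quantity that drives patch sampling, the average entropy of a confidence-map patch, can be sketched as follows. This assumes a binary (foreground versus background) confidence map with values in [0, 1] and entropy measured in bits; the abstract does not state how patches are then selected from these values, so the sketch stops at computing the score. All names and the toy patches are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident pixel carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mean_patch_entropy(patch):
    """Average per-pixel entropy of a 2D patch of confidence values."""
    values = [p for row in patch for p in row]
    return sum(binary_entropy(p) for p in values) / len(values)


# Toy patches (made up): one where the network is confident, one where it is not.
confident_patch = [[0.0, 1.0], [1.0, 0.0]]
uncertain_patch = [[0.5, 0.5], [0.5, 0.5]]
low = mean_patch_entropy(confident_patch)
high = mean_patch_entropy(uncertain_patch)
```

A patch of fully confident pixels scores 0 bits and a patch of maximally uncertain pixels scores 1 bit, so the average entropy gives a single number per patch that a sampling rule could rank on.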


Time flies by: Analyzing the Impact of Face Ageing on the Recognition Performance with Synthetic Data

Marcel Grimmer , Haoyu Zhang , Raghavendra Ramachandra , Kiran Raja , Christoph Busch

Category: Computer Vision | Artificial Intelligence

2022-08-17

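Rapid progress in image synthesis enables the generation of high-resolution, photorealistic facial images. In biometric applications, the main motivation for using synthetic data is to address the shortage of publicly available biometric data while reducing privacy risks when processing such sensitive information. This work exploits these advantages by simulating human face ageing with recent face age modification algorithms to generate mated samples, and thereby studies the impact of ageing on the performance of an open-source biometric recognition system. In addition, a real dataset is used to evaluate the effects of short-term ageing, comparing biometric performance with that in the synthetic domain. The main findings indicate that short-term ageing in the range of 1 to 5 years has only a minor effect on general recognition performance. However, correct verification of mated faces with long-term age differences of more than 20 years remains a significant challenge and requires further investigation.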


SYN-MAD 2022: Competition on Face Morphing Attack Detection Based on Privacy-aware Synthetic Training Data

Marco Huber , Fadi Boutros , Anh Thi Luu , Kiran Raja , Raghavendra Ramachandra , Naser Damer , Pedro C. Neto , Tiago Gonçalves , Ana F. Sequeira , Jaime S. Cardoso

Category: Computer Vision

2022-08-15

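This paper presents a summary of the Competition on Face Morphing Attack Detection Based on Privacy-aware Synthetic Training Data (SYN-MAD), held at the 2022 International Joint Conference on Biometrics (IJCB 2022). The competition attracted 12 participating teams from academia and industry, based in 11 different countries. In the end, the participating teams submitted seven valid submissions, which were evaluated by the organizers. The competition was held to present and attract solutions for detecting face morphing attacks while protecting people's privacy for ethical and legal reasons. To ensure this, the training data was limited to synthetic data provided by the organizers. The submitted solutions introduced innovations that outperformed the considered baseline in many experimental settings. The evaluation benchmark is now available at: https://github.com/marcohuber/syn-mad-2022.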


Bayesian predictive modeling of multi-source multi-way data

Jonathan Kim , Brian J. Sandri , Raghavendra B. Rao , Eric F. Lock

Category: Machine Learning (Statistics)

2022-08-05

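We develop a Bayesian approach to predict a continuous or binary outcome from data collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example, we consider molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence, and we model the variance of the coefficients separately for each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations show that our model performs as expected in terms of misclassification rates and the correlation of estimated coefficients with true coefficients, with steady gains in performance from incorporating the multi-way structure and modest gains from accounting for differing signal sizes across sources. Moreover, it provides robust classification of ID monkeys in our motivating application. Software in the form of R code is available at https://github.com/biostatskim/bayesmsmw.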


Interpreting Latent Spaces of Generative Models for Medical Images using Unsupervised Methods

Julian Schön , Raghavendra Selvan , Jens Petersen

Category: Computer Vision | Machine Learning

2022-07-20

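Generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs) play an increasingly important role in medical image analysis. The latent spaces of these models often exhibit semantically meaningful directions that correspond to human-interpretable image transformations. So far, however, their exploration for medical images has been limited by the need for supervised data. Several methods for the unsupervised discovery of interpretable directions in GAN latent spaces have shown interesting results on natural images. This work explores the potential of applying these techniques to medical images by training a GAN and a VAE on thoracic CT scans and using an unsupervised method to discover interpretable directions in the resulting latent spaces. We find several directions that correspond to non-trivial image transformations, such as rotation or breast size. Furthermore, the directions show that the generative models capture 3D structure despite being presented only with 2D data. The results show that unsupervised methods for discovering interpretable directions in GANs generalize to VAEs and can be applied to medical images. This opens up a wide range of future work using these methods in medical image analysis.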
