The classical machine learning paradigm requires the aggregation of user data in a central location where machine learning practitioners can preprocess the data, compute features, tune models, and evaluate performance. The advantages of this approach include leveraging high-performance hardware (such as GPUs) and the ability of machine learning practitioners to do deep data analysis to improve model performance. However, these advantages may come at the cost of data privacy: user data is collected, aggregated, and stored on centralized servers for model development. Centralization of data poses risks, including a heightened risk of internal and external security incidents as well as accidental data misuse. Federated learning with differential privacy is designed to avoid the server-side centralization pitfall by bringing the ML learning step to users' devices. Learning is done in a federated manner, where each mobile device runs a training loop on a local copy of the model. Updates from on-device models are sent to the server via encrypted communication and with differential privacy to improve the global model. In this paradigm, users' personal data remain on their devices. Surprisingly, models trained in this way show only a small degradation in performance. However, federated learning comes with many other challenges due to its distributed nature, heterogeneous compute environments, and lack of data visibility. This paper explores those challenges and outlines the architectural design solutions we are exploring and testing to productionize federated learning at Meta.
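The round structure described above — clip each device's update, average the clipped updates, and add noise before applying them to the global model — can be sketched as follows. This is a minimal illustration of differentially private federated averaging, not Meta's production implementation; the function names, the clipping norm, and the per-round noise scale are all assumptions for illustration.

```python
import random

def clip(update, max_norm):
    # Clip the update's L2 norm to bound each client's contribution.
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [u * scale for u in update]

def dp_fedavg_round(global_model, client_updates, max_norm=1.0,
                    noise_std=0.1, rng=None):
    """One round of federated averaging with clipped updates and
    Gaussian noise added to the aggregate (illustrative sketch)."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    # Average the clipped per-client updates coordinate-wise.
    avg = [sum(us) / n for us in zip(*clipped)]
    # Add Gaussian noise to the aggregate for differential privacy.
    noisy = [a + rng.gauss(0.0, noise_std / n) for a in avg]
    return [w + d for w, d in zip(global_model, noisy)]
```

With the noise scale set to zero, the round reduces to plain federated averaging of clipped updates, which makes the mechanism easy to unit-test.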
Independent component analysis (ICA) is an unsupervised learning method used to compute independent components (ICs) from a multivariate signal or data matrix. It is evaluated by multiplying a weight matrix with the multivariate data matrix. This study proposes a novel memristor crossbar array for implementing ICA and FastICA for blind source separation. Data inputs are applied to the crossbar array in the form of pulse-width-modulated voltages, and the weights of the implemented neural network are stored in the memristors. The output charge from the memristor columns is used to compute the weight update, which is performed by applying voltages above the memristors' SET/RESET voltage. To demonstrate its potential application, the proposed memristor-crossbar-based ICA architecture is applied to an image source separation problem. Experimental results show that the proposed method is highly effective at separating image sources, improving image contrast with a structural similarity percentage of 67.27% compared with software-based implementations of conventional ICA and FastICA algorithms.
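For reference, a software baseline of the FastICA algorithm that the crossbar array accelerates can be sketched in a few lines. This is a generic deflation-based FastICA with a tanh nonlinearity, not the paper's hardware implementation; in the proposed design, the matrix-vector products below would be carried out in the analog domain as charge readouts from the memristor columns.

```python
import numpy as np

def whiten(X):
    # Center and decorrelate the observed signals (rows = channels).
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    return (E @ np.diag(d ** -0.5) @ E.T) @ X

def fast_ica(X, n_iter=200, seed=0):
    """Deflation-based FastICA with a tanh nonlinearity (software sketch)."""
    Z = whiten(X)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            g, g_prime = np.tanh(y), 1 - np.tanh(y) ** 2
            # Fixed-point update: w <- E[Z g(w'Z)] - E[g'(w'Z)] w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflate: remove projections onto components found so far.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < 1e-9
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z  # estimated independent components
```

On a toy mixture of a sine and a square wave, the recovered components correlate strongly with the original sources (up to sign and permutation, the usual ICA ambiguities).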
Unsupervised semantic segmentation requires assigning a label to every pixel without any human annotation. Despite advances in self-supervised representation learning for single images, unsupervised semantic segmentation with pixel-level representations remains a challenging task and is still underexplored. In this work, we propose a self-supervised pixel representation learning method for semantic segmentation that uses visual concepts, i.e., groups of pixels with semantic meaning such as parts, objects, and scenes. To guide the self-supervised learning, we exploit three types of relationships between pixels and concepts: the relationship between pixels and local concepts, between local and global concepts, and the co-occurrence of concepts. We evaluate the learned pixel embeddings and visual concepts on three datasets, including PASCAL VOC 2012, COCO 2017, and DAVIS 2017. Our results show that the proposed method yields consistent and substantial improvements over recent unsupervised semantic segmentation approaches, and demonstrate that visual concepts can reveal insights into image datasets.
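One plausible way to operationalize a pixel-to-concept relationship is a contrastive (InfoNCE-style) objective that pulls each pixel embedding toward its assigned concept prototype and pushes it away from the others. The sketch below is an illustrative assumption about how such a term could look, not the paper's exact loss; the prototype assignment and temperature are hypothetical.

```python
import numpy as np

def pixel_concept_loss(pixel_emb, concept_protos, pos_idx, temp=0.1):
    """InfoNCE-style loss between pixel embeddings and concept prototypes.

    pixel_emb:       (P, D) array of pixel embeddings.
    concept_protos:  (C, D) array of concept prototype vectors.
    pos_idx:         index of the positive concept for each pixel.
    """
    # Cosine similarities via L2-normalized embeddings.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    c = concept_protos / np.linalg.norm(concept_protos, axis=1, keepdims=True)
    logits = p @ c.T / temp
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each pixel's positive concept.
    return float(-log_probs[np.arange(len(pos_idx)), pos_idx].mean())
```

When pixels are aligned with their assigned prototypes the loss is near zero; misassigned pixels drive it up, which is the gradient signal a pixel-level encoder would train on.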
The advent of convolutional neural networks (CNNs) has led to their application in several domains. One notable application is the perception system for autonomous driving, which relies on predictions from CNNs. Practitioners evaluate the generalization ability of such CNNs by computing various metrics on an independent test dataset, which is typically chosen based only on the precondition that its elements are not part of the training data. Such a dataset may contain objects that are both similar and novel with respect to the training dataset. Nevertheless, existing works do not estimate the novelty of test samples and treat them all equally during evaluation. Such novelty-based evaluation is important for validating the fitness of a CNN for autonomous driving applications. We therefore propose a CNN generalization scoring framework that accounts for the novelty of the objects in the test dataset. We start with a representation learning technique that reduces the image data into a low-dimensional space. In this space, we estimate the novelty of the test samples. Finally, we compute the generalization score as a combination of predictive performance on the test data and novelty. We conduct an experimental study on our traffic-light detection application. In addition, we systematically visualize the results for an interpretable notion of novelty.
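A minimal instantiation of such a framework might use PCA for the low-dimensional embedding, nearest-neighbor distance to the training set as the novelty estimate, and novelty-weighted accuracy as the combined score. The specific choices below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def pca_embed(X_train, X_test, k=2):
    # Fit a k-dimensional PCA projection on training features and
    # apply it to both splits.
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:k].T
    return (X_train - mu) @ P, (X_test - mu) @ P

def novelty(train_emb, test_emb):
    # Novelty of each test sample: distance to its nearest training sample.
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    return d.min(axis=1)

def generalization_score(correct, nov):
    # Weight each test prediction by its normalized novelty, so correct
    # predictions on novel samples count more than on familiar ones.
    w = nov / (nov.sum() + 1e-12)
    return float((w * correct).sum())
```

With uniform novelty the score reduces to plain accuracy; as novelty concentrates on a few samples, the score is dominated by performance on exactly those samples.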
Machine learning is beginning to provide state-of-the-art performance in a range of environmental applications, such as streamflow prediction in hydrologic basins. However, building accurate large-scale models remains challenging in practice due to the variability of the dominant hydrologic processes, which are captured by sets of process-related basin characteristics. Existing basin characteristics suffer from noise and uncertainty, among many other things, which adversely affects model performance. To tackle the above challenges, in this paper we propose a novel knowledge-guided self-supervised learning (KGSSL) inverse framework to extract system characteristics from driver and response data. This first-of-its-kind framework achieves robust performance even when the characteristics are corrupted. We show that KGSSL achieves state-of-the-art results for streamflow modeling on CAMELS (Catchment Attributes and MEteorology for Large-sample Studies), a widely used hydrology benchmark dataset. Specifically, KGSSL outperforms other methods by up to 16% in reconstructing characteristics. Furthermore, we show that KGSSL is relatively more robust than baseline methods, and that it outperforms the baseline model by 35% when plugging in KGSSL-inferred characteristics.
Ontology alignment is an important research problem with applications in various fields, such as data integration, data migration, data preparation, etc. State-of-the-art (SOTA) ontology alignment systems typically use naive domain-dependent approaches with handcrafted rules or domain-specific architectures, making them unscalable and inefficient. In this work, we propose a deep-learning-based model that uses a novel dual-attention mechanism to compute contextualized representations of concepts, which in turn are used to discover alignments. By doing so, our approach is not only able to exploit both the syntactic and semantic information encoded in ontologies, it is also, by design, flexible and scalable to different domains with minimal effort. We evaluate our model on four different datasets from different domains and languages, and establish its superiority through these results as well as a detailed ablation study. The code and datasets used are available at https://github.com/remorax/vealigh.
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. At present, however, the use of such a robot in a domestic setting is still largely a research topic. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but still managed an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to the generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
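At the heart of PPO's on-policy update is the clipped surrogate objective, which bounds how far a single gradient step can move the policy away from the one that collected the data. A minimal per-sample version of that objective (a generic sketch of the standard PPO formulation, not this paper's training code) looks like:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one (state, action) sample.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio.
    advantage: estimated advantage of the sampled action.
    eps:       clip range (0.2 is the commonly used default).
    """
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting.
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # Taking the minimum makes the objective pessimistic, so large
    # policy changes gain no extra credit.
    return min(unclipped, clipped)
```

In training, this quantity is averaged over a batch and maximized; the clipping is what keeps each update conservative and the learning stable.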
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
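The core of the NaQ idea — turning timestamped video-text narrations into (query, temporal window) training pairs for a localization model — can be sketched as below. The input format and the fixed symmetric window are illustrative assumptions, not the exact heuristic used on Ego4D.

```python
def narrations_to_queries(narrations, window=2.0):
    """Convert timestamped narrations into localization training pairs.

    narrations: assumed to be a list of (timestamp_sec, text) tuples,
                e.g. [(12.3, "C opens the fridge")].
    window:     hypothetical half-width (seconds) of the temporal span
                attributed to each narration.
    """
    samples = []
    for t, text in narrations:
        start, end = max(0.0, t - window), t + window
        # Each narration becomes a free-form query paired with a
        # temporal window, mimicking NLQ-style supervision.
        samples.append({"query": text, "start": start, "end": end})
    return samples
```

Because narrations are cheap and plentiful compared with manually annotated NLQ examples, this conversion massively expands the training pool, which is the source of the gains reported above.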
Machine Translation (MT) systems generally aim at the automatic representation of a source language in a target language, retaining the originality of context, using various Natural Language Processing (NLP) techniques. Among the various NLP methods, this work focuses on Statistical Machine Translation (SMT), which uses probabilistic and statistical techniques to analyze information and perform conversion. This paper canvasses the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of understanding the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
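Of the metrics mentioned, BLEU is the most widely used; a minimal single-reference version (uniform n-gram weights plus a brevity penalty) can be implemented directly. This is a simplified sketch of the standard definition, not the smoothed variants typically used for evaluation campaigns.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal single-reference BLEU with uniform weights and
    the standard brevity penalty (no smoothing)."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / max(1, sum(h.values())))
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(1, len(hyp)))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0 and disjoint sentences score 0.0; corpus-level BLEU, as reported in the experiments, aggregates the n-gram counts over all sentence pairs before computing the precisions.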
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.