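Pedestrian intention prediction is the problem of estimating whether a target pedestrian will cross the street. State-of-the-art approaches rely heavily on visual information collected by the ego vehicle's front-facing camera to predict the pedestrian's intention. As a result, the performance of existing methods degrades significantly when the visual information is unreliable, for example when the pedestrian is far from the ego vehicle or the lighting conditions are poor. In this paper, we design, implement, and evaluate the first pedestrian intention prediction model that integrates motion sensor data collected from the pedestrian's smartwatch (or smartphone). We propose a novel machine learning architecture that effectively incorporates motion sensor data to reinforce the visual information. This significantly improves performance in adverse situations where the visual information may be unreliable. We also conduct a large-scale data collection and present the first pedestrian intention prediction dataset integrated with time-synchronized motion sensor data. The dataset consists of 128 video clips in total, covering different distances and varying levels of lighting. We train our model on the widely used JAAD dataset and on our own dataset, and compare its performance with state-of-the-art models. The results show that our model outperforms state-of-the-art methods, particularly when the pedestrian is far away (more than 70 m) and the lighting conditions are poor.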
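Communication has become a bottleneck in various distributed machine learning settings. Here, we propose a novel training framework that leads to highly efficient communication of models between agents. In short, we train our network as a linear combination of many pseudo-randomly generated frozen models. For communication, the source agent transmits only the "seed" scalar used to generate the pseudo-random networks, along with the learned linear mixture coefficients. Our method, dubbed PRANC, learns almost $100\times$ fewer parameters than a deep model and still performs well on several datasets and architectures. PRANC enables 1) efficient communication of models between agents, 2) efficient model storage, and 3) accelerated inference by generating layer-wise weights on the fly. We test PRANC on CIFAR-10, CIFAR-100, TinyImageNet, and ImageNet-100 with various architectures such as AlexNet, LeNet, ResNet18, ResNet20, and ResNet56. On these benchmark datasets it achieves a large reduction in the number of parameters while maintaining satisfactory performance. The code is available at https://github.com/ucdvision/pranc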
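A minimal NumPy sketch of the seed-plus-coefficients idea, assuming a toy linear model in place of a deep network. The weights are `alphas @ B`, and the frozen basis `B` is regenerated from an integer seed, so only the seed and the k coefficients need to be sent. The function names (`random_basis`, `weights_from`, `train_alphas`) are hypothetical. The closed-form least-squares fit of the coefficients stands in for the gradient-based training a real network would use. This is an illustration, not the released PRANC code.

```python
import numpy as np

def random_basis(seed: int, num_models: int, num_params: int) -> np.ndarray:
    """Regenerate `num_models` frozen pseudo-random weight vectors from one seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_models, num_params))

def weights_from(seed: int, alphas: np.ndarray, num_params: int) -> np.ndarray:
    """Model weights = linear combination of the seeded random basis (w = alphas @ B)."""
    return alphas @ random_basis(seed, len(alphas), num_params)

def train_alphas(seed: int, num_models: int, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit only the mixing coefficients of a toy linear model y ~ X @ w.

    With w = B.T @ alphas the squared loss is a least-squares problem in alphas;
    this closed-form fit stands in for gradient training of a deep network.
    """
    B = random_basis(seed, num_models, X.shape[1])
    alphas, *_ = np.linalg.lstsq(X @ B.T, y, rcond=None)
    return alphas

# Sender: train 10 coefficients for a 50-parameter model.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X @ rng.standard_normal(50)
seed, k = 1234, 10
alphas = train_alphas(seed, k, X, y)

# Only (seed, alphas) is transmitted; the receiver rebuilds the weights locally.
w_received = weights_from(seed, alphas, X.shape[1])
```

The receiver calls the same generator with the same seed, so it recovers the identical basis. In this toy case the message is 10 coefficients plus a seed instead of 50 weights.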
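Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information when they learn new information. This phenomenon, known as "catastrophic forgetting", is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically inspired mechanisms, based on sparsity and heterogeneous dropout, that significantly improve a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We use k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task. Together with these, we apply inter-task heterogeneous dropout, which encourages the network to use non-overlapping activation patterns across different tasks. In addition, we introduce two new benchmarks for continual learning under distribution shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Finally, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.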
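The two mechanisms can be sketched in NumPy as follows. `k_winners` keeps the k largest activations per sample, a standard k-winners-take-all operation. The abstract gives no formula for heterogeneous dropout. `heterogeneous_keep_mask` therefore assumes a hypothetical form: a unit's keep probability decays exponentially with how often it was active on earlier tasks, scaled by a hypothetical `rho`. The paper's exact rule may differ.

```python
import numpy as np

def k_winners(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest activations in each row of x and zero out the rest."""
    idx = np.argpartition(x, -k, axis=-1)[..., -k:]  # indices of the top-k per row
    out = np.zeros_like(x)
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out

def heterogeneous_keep_mask(counts: np.ndarray, rho: float, rng) -> np.ndarray:
    """Hypothetical heterogeneous dropout: units that fired often on earlier tasks
    are kept with lower probability, nudging a new task toward less-used units."""
    keep_prob = np.exp(-rho * counts / max(counts.max(), 1))
    return rng.random(counts.shape) < keep_prob

x = np.array([[0.1, 0.9, -0.3, 0.5]])
sparse = k_winners(x, k=2)  # keeps 0.9 and 0.5, zeros the rest

# Activation counts of 4 units on previous tasks; units 0 and 3 were never used.
counts = np.array([0, 5, 10, 0])
mask = heterogeneous_keep_mask(counts, rho=2.0, rng=np.random.default_rng(0))
```

In this assumed form, never-used units always have keep probability 1. Heavily used units are dropped more often, which pushes each new task toward activation patterns that overlap less with earlier ones.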
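Leveraging machine learning to facilitate the optimization process is an emerging field. It holds the promise of bypassing the fundamental computational bottleneck that classic iterative solvers create in critical applications requiring near-real-time optimization. Most existing approaches focus on learning data-driven optimizers that need fewer iterations to solve an optimization problem. In this paper, we take a different approach. We propose to replace the iterative solvers altogether with a trainable parametric set function that outputs the optimal arguments/parameters of an optimization problem in a single feed-forward pass. We call our method Learning to Optimize the Optimization Process (LOOP). We show that such parametric functions can feasibly be learned to solve various classic optimization problems. These include linear/nonlinear regression, principal component analysis, transport-based coresets, and quadratic programming in supply management applications. In addition, we propose two alternative approaches for learning such parametric functions, with and without a solver in the LOOP. Finally, through various numerical experiments, we show that the trained solvers can be orders of magnitude faster than classic iterative solvers while providing near-optimal solutions.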
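Over the past few decades, many attempts have been made to recover a high-resolution (HR) facial image from its corresponding low-resolution (LR) counterpart, a task commonly referred to as face hallucination. Position-patch and deep learning-based methods have achieved impressive performance, yet most of these techniques still fail to recover person-specific facial features. The former group of algorithms often produces blurry and oversmoothed outputs, particularly at higher levels of degradation. The latter sometimes generates faces that bear no resemblance to the individuals in the input images. In this paper, we introduce a novel face super-resolution approach in which the hallucinated face is forced to lie in the subspace spanned by the available training faces. Thanks to this face-subspace prior, the reconstruction aims to recover person-specific facial features rather than merely increase quantitative image scores, unlike most existing face hallucination techniques. Furthermore, inspired by recent advances in 3D face reconstruction, we present an efficient 3D dictionary alignment scheme that enables the algorithm to handle low-resolution faces captured in uncontrolled conditions. We conducted extensive experiments on several well-known face datasets. The proposed algorithm generates detailed, close-to-ground-truth results and outperforms state-of-the-art face hallucination algorithms by significant margins in both quantitative and qualitative evaluations.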
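The following generic NumPy sketch illustrates what it means for the hallucinated face to lie in the span of the training faces. It is not the paper's algorithm and omits the 3D dictionary alignment. The LR input is expressed as a least-squares combination of downsampled training faces. The same coefficients are then applied to the HR training faces, so the output is, by construction, a linear combination of HR training faces. The block-average `downsample` degradation model and the function names are assumptions made for illustration.

```python
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Block-average downsampling, an assumed stand-in for the real degradation."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def hallucinate(lr_face: np.ndarray, hr_train: np.ndarray, factor: int):
    """Return an HR estimate that lies in the span of the HR training faces.

    hr_train has shape (N, H, W). Coefficients are fit in the LR domain and
    reused in the HR domain.
    """
    n = len(hr_train)
    A_lr = np.stack([downsample(f, factor).ravel() for f in hr_train], axis=1)  # (h*w, N)
    coeffs, *_ = np.linalg.lstsq(A_lr, lr_face.ravel(), rcond=None)
    A_hr = hr_train.reshape(n, -1).T                                             # (H*W, N)
    return (A_hr @ coeffs).reshape(hr_train.shape[1:]), coeffs

# Synthetic check: if the true HR face is a combination of training faces,
# that combination is recovered from its 4x-downsampled version.
rng = np.random.default_rng(0)
hr_train = rng.random((20, 32, 32))
true_coeffs = rng.standard_normal(20)
gt = np.tensordot(true_coeffs, hr_train, axes=1)
sr, coeffs = hallucinate(downsample(gt, 4), hr_train, factor=4)
```

Because the downsampling is linear and the 64 LR pixels exceed the 20 training faces, the synthetic example recovers the exact coefficients. Real faces are only approximately in the span, which is where the subspace acts as a prior.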
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open-source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when given a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively, through a large-scale user study. Finally, we offer lessons learned and discuss the potential of multimodal models to enhance future techniques for automated software documentation.
In this paper, we consider a complete signed graph with $n$ vertices and $m$ positive edges. For any given value of $\varepsilon$, we reduce the complexity of approximating the correlation clustering problem on such a graph from $O(m \cdot (2 + \alpha(G)) + n)$ to $O(m + n)$, where $\alpha(G)$ is the arboricity of the graph. Our approach produces the same output as the original algorithm. It also makes it possible to implement the algorithm in a fully dynamic setting, where edge sign flips and vertex additions/removals are allowed. Constructing the index costs $O(m)$ memory and $O(m \cdot \alpha(G))$ time. We also study the structural properties of the non-agreement measure used in the approximation algorithm. The theoretical results are accompanied by a full set of experiments on seven real-world graphs. These results show that our index-based algorithm outperforms the non-index-based one, reducing running time by 34% on average.
This paper proposes a novel self-supervised Cut-and-Paste GAN that performs foreground object segmentation and generates realistic composite images without manual annotations. We accomplish this goal with a simple yet effective self-supervised approach coupled with a U-Net-based discriminator. Standard discriminators learn only global data representations via real/fake classification. The proposed method extends them to also learn semantic and structural information through pseudo-labels created by the self-supervised task. As a result, the discriminator provides informative per-pixel as well as global image feedback, which forces the generator to create meaningful masks. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods on standard benchmark datasets.
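The compositing step that a cut-and-paste generator relies on can be sketched as a per-pixel blend. The predicted mask selects foreground pixels, optionally after shifting the object and its mask together, and the background fills the rest. This NumPy sketch assumes a soft mask in [0, 1] and uses a circular shift (`np.roll`) for simplicity. `composite` is a hypothetical name, and the GAN, the U-Net discriminator, and the self-supervised pseudo-labels are not shown.

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, mask: np.ndarray, shift=(0, 0)) -> np.ndarray:
    """Paste the masked foreground onto the background.

    fg, bg: (H, W, 3) images; mask: (H, W) soft mask in [0, 1].
    The object and its mask are shifted together (circularly, for simplicity).
    """
    fg_s = np.roll(fg, shift, axis=(0, 1))
    m = np.roll(mask, shift, axis=(0, 1))[..., None]  # broadcast over color channels
    return m * fg_s + (1.0 - m) * bg

fg = np.ones((4, 4, 3))
bg = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[0, 0] = 1.0
out = composite(fg, bg, mask, shift=(1, 1))  # the single foreground pixel lands at (1, 1)
```

Moving the object before pasting matters because a mask that does not match the object's true shape yields an implausible composite, which gives the generator a reason to produce accurate masks.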
Machine learning models are typically evaluated by computing their similarity to reference annotations and trained by maximizing that similarity. Especially in the biomedical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to suboptimal predictions even when the model achieves high similarity scores. Here, we introduce the theoretical concept of Peak Ground Truth (PGT). PGT marks the point beyond which an increase in similarity to the reference annotation stops translating into better Real World Model Performance (RWMP). Additionally, we propose a quantitative technique that approximates PGT by computing inter- and intra-rater reliability. Finally, we review three categories of PGT-aware strategies to evaluate and improve model performance.
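As a rough illustration of approximating this ceiling through inter-rater reliability, the sketch below computes the mean pairwise Dice coefficient between several raters' binary masks. Under the PGT view, model-vs-reference scores that approach this level may no longer indicate better real-world performance. The choice of Dice and of pairwise averaging is an assumption; the paper may use other similarity measures or aggregation schemes.

```python
import numpy as np
from itertools import combinations

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity between two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def mean_pairwise_dice(masks) -> float:
    """Inter-rater reliability as the mean Dice over all pairs of raters."""
    return float(np.mean([dice(a, b) for a, b in combinations(masks, 2)]))

raters = [np.array([1, 1, 1, 0, 0]),
          np.array([1, 1, 0, 0, 0]),
          np.array([1, 1, 1, 1, 0])]
irr = mean_pairwise_dice(raters)  # (0.8 + 6/7 + 2/3) / 3, about 0.775
```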
In special-care applications, it is of great significance to find and localize conceptual changes between two images of the same scene taken at different times, in terms of objects that have been added or removed. This is mainly because adding or removing important objects can be harmful in some environments. As a result, there is a need for a system that locates these differences using machine vision. The most important challenges of this problem are changes in lighting conditions and the presence of shadows in the scene, so the proposed methods must be robust to them. In this article, we introduce a method based on deep convolutional neural networks with transfer learning, trained using an intelligent data synthesis process. The method is tested on a dataset prepared for this purpose, and the results are presented. They show that the presented method is more efficient than other methods and can be used in a variety of real industrial environments.