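Deep neural networks have recently succeeded in digital halftoning using vanilla convolutional layers with high parallelism. However, existing deep methods fail to generate halftones with a satisfactory blue-noise property and require complex training schemes. In this paper, we propose a halftoning method based on multi-agent deep reinforcement learning, called Halftoners, which learns a shared policy to generate high-quality halftone images. Specifically, we view the decision on each binary pixel value as the action of a virtual agent, whose policy is trained by a low-variance policy gradient. Moreover, the blue-noise property is achieved by a novel anisotropy-suppressing loss function. Experiments show that our halftoning method produces high-quality halftones while remaining relatively fast.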
Translated by Google Translate
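Synthesizing pseudo samples is currently the most effective way to solve the generalized zero-shot learning (GZSL) problem. Most models achieve competitive performance but still suffer from two problems: (1) feature confounding: the overall representations confound task-correlated and task-independent features, and existing models disentangle them in a generative way, which cannot reasonably synthesize reliable pseudo samples when samples are limited; (2) distribution uncertainty: existing models need massive data to synthesize samples from an uncertain distribution, which leads to poor performance when seen-class samples are limited. In this paper, we propose a non-generative model that addresses these problems with two corresponding modules: (1) task-correlated feature disentanglement, which separates task-correlated features from task-independent ones through adversarial learning for domain adaptation, enabling reasonable synthesis; (2) controllable pseudo-sample synthesis, which synthesizes edge-pseudo and center-pseudo samples with specific characteristics to produce more diversity and more intuitive transfer. In addition, to describe the new scenario in which seen-class samples are limited during training, we formulate a new ZSL task named "few-shot seen class and zero-shot unseen class learning" (FSZU). Extensive experiments on four benchmarks verify that the proposed method is competitive on both the GZSL and FSZU tasks.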
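Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users and systems, e.g., compute infrastructure. For broader adoption, this practice must (i) accommodate software engineers without an ML background, and (ii) provide mechanisms to optimize for product goals. In this work, we describe general principles and a specific end-to-end ML platform, Looper, which offers easy-to-use APIs for decision-making and feedback collection. Looper supports the full end-to-end ML lifecycle, from online data collection to model training, deployment, and inference, and extends support to evaluation and tuning against product goals. We outline the platform architecture and the overall impact of production deployment: Looper currently hosts 700 ML models and makes 6 million decisions per second. We also describe the learning curve and summarize the experiences of platform adopters.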
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while preserving photorealism. In this paper, we argue that the summary-statistics matching scheme in existing algorithms is what leads to unrealistic stylization. To avoid the popular Gram loss, we propose a self-supervised style transfer framework that contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers them in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization, which decomposes feature transformation into style whitening and restylization. It works well in ColoristaNet and transfers image styles efficiently while preserving photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
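The split into style whitening and restylization can be illustrated with a minimal sketch. This is not the paper's decoupled instance normalization: it is a plain per-channel statistics swap (in the spirit of adaptive instance normalization) on a single flattened channel, and the names `channel_stats`, `whiten`, and `restylize` are hypothetical.

```python
import math


def channel_stats(values):
    """Return the mean and population standard deviation of one channel."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def whiten(content, eps=1e-5):
    """Style whitening (assumed form): remove the channel's own mean and scale."""
    mean, std = channel_stats(content)
    return [(v - mean) / (std + eps) for v in content]


def restylize(whitened, style):
    """Restylization (assumed form): impose the style channel's mean and scale."""
    s_mean, s_std = channel_stats(style)
    return [w * s_std + s_mean for w in whitened]
```

Calling `restylize(whiten(content), style)` yields a channel whose mean and spread approximately follow `style` while the relative ordering of the `content` values is kept.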
The space-air-ground integrated network (SAGIN), one of the key technologies for next-generation mobile communication systems, can facilitate data transmission for users all over the world, especially in remote areas where vast amounts of informative data are collected by Internet of Remote Things (IoRT) devices to support various data-driven artificial intelligence (AI) services. However, training AI models centrally with the assistance of SAGIN faces the challenges of a highly constrained network topology, inefficient data transmission, and privacy issues. To tackle these challenges, we first propose a novel topology-aware federated learning framework for the SAGIN, namely Olive Branch Learning (OBL). Specifically, the IoRT devices in the ground layer leverage their private data to perform model training locally, while the air nodes in the air layer and the ring-structured low earth orbit (LEO) satellite constellation in the space layer are in charge of model aggregation (synchronization) at different scales. To further enhance the communication efficiency and inference performance of OBL, an efficient Communication and Non-IID-aware Air node-Satellite Assignment (CNASA) algorithm is designed by taking into account the data class distribution of the air nodes as well as their geographic locations. Furthermore, we extend our OBL framework and CNASA algorithm to adapt to more complex multi-orbit satellite networks. We analyze the convergence of our OBL framework and conclude that the CNASA algorithm contributes to the fast convergence of the global model. Extensive experiments based on realistic datasets corroborate the superior performance of our algorithm over the benchmark policies.
Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS, which infers the depth information from the pixel movement between two views. The core of DispMVS is to construct a 2D cost volume on the image plane along the epipolar line for each pair (the reference image and one of several source images) for pixel matching, and to fuse uncountable depths triangulated from each pair by multi-view geometry to ensure multi-view consistency. To be robust, DispMVS starts from a randomly initialized depth map and iteratively refines it with a coarse-to-fine strategy. Experiments on the DTUMVS and Tanks&Temple datasets show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.
Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization framework by treating the graph and feature data as two Hilbert spaces. We also show how numerous kernel-based models on graphs are instances of our design. A kernel defined this way has transductive properties, and this leads to improved ability to learn on fewer training points, as well as better handling of highly non-Euclidean data. We demonstrate these advantages using synthetic data where the distribution of the whole graph can inform the pattern of the labels. Finally, by utilizing a flexible polynomial of the graph Laplacian within the kernel, the model also performed effectively in semi-supervised classification on graphs of various levels of homophily.
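As a rough illustration of what "a polynomial of the graph Laplacian" means, the sketch below evaluates p(L) = c_0 I + c_1 L + c_2 L^2 + ... for a small undirected graph in pure Python. It assumes the combinatorial Laplacian L = D - A and makes no claim about how the paper combines this matrix with node features or how it chooses the coefficients; the function names are hypothetical.

```python
def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def laplacian_polynomial(adjacency, coefficients):
    """Evaluate sum_k coefficients[k] * L^k, with L^0 the identity."""
    lap = graph_laplacian(adjacency)
    n = len(lap)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for c in coefficients:
        for i in range(n):
            for j in range(n):
                result[i][j] += c * power[i][j]
        power = matmul(power, lap)  # advance to the next power of L
    return result
```

Because L is symmetric positive semidefinite, any such polynomial with non-negative coefficients is also symmetric positive semidefinite, which is the property a kernel matrix needs.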
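We present EasyRec, an easy-to-use, extensible, and efficient recommendation framework for building industrial recommendation systems. EasyRec is superior in the following respects: first, it adopts a modular and pluggable design pattern to reduce the effort of building custom models; second, it implements hyper-parameter optimization and feature selection algorithms to improve model performance automatically; third, it applies online learning to adapt quickly to ever-changing data distributions. The code is released at https://github.com/alibaba/easyrec.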
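Panoramic images present complete information about the surrounding environment at once and offer many advantages in virtual tourism, games, robotics, and other applications. However, progress in panoramic depth estimation has not fully resolved the distortion and discontinuity problems caused by commonly used projection methods. This paper proposes SphereDepth, a novel panoramic depth estimation method that predicts depth directly on a spherical mesh without projection preprocessing. The core idea is to establish the relationship between the panoramic image and the spherical mesh, and then use a deep neural network to extract features in the spherical domain to predict depth. To address the efficiency challenges posed by high-resolution panoramic data, we introduce two hyper-parameters to balance inference speed and accuracy. Validated on three public panoramic datasets, SphereDepth achieves results comparable to state-of-the-art panoramic depth estimation methods. Benefiting from the spherical-domain setting, SphereDepth can generate high-quality point clouds and significantly alleviates the distortion and discontinuity problems.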
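Recent advanced studies have spent considerable human effort on optimizing network architectures for stereo matching, but they can hardly achieve both high accuracy and fast inference speed. To ease the workload of network design, neural architecture search (NAS) has been applied with great success to various sparse prediction tasks, such as image classification and object detection. However, existing NAS studies on dense prediction tasks, especially stereo matching, still cannot be deployed efficiently on devices with different computing capabilities. To this end, we propose to train an elastic and accurate network for stereo matching (EASNet) that supports various 3D architectural settings on devices with different computing capabilities. Given the deployment latency constraint of the target device, we can quickly extract a sub-network from the full EASNet without additional training, while still maintaining the sub-network's accuracy. Extensive experiments show that EASNet outperforms state-of-the-art human-designed and NAS-based architectures on the Scene Flow and MPI Sintel datasets in terms of both model accuracy and inference speed. In particular, deployed on an inference GPU, EASNet achieves a new SOTA EPE on the Scene Flow dataset in 100 ms, 4.5$\times$ faster than LEAStereo and with a better-quality model.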
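For readers unfamiliar with the metric, EPE (end-point error) in stereo matching is conventionally the mean absolute difference between predicted and ground-truth disparities over valid pixels. The sketch below assumes that convention and represents disparity maps as nested lists; the function name and the optional `valid` mask are illustrative and not taken from the paper.

```python
def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity error, optionally restricted to valid pixels."""
    total = 0.0
    count = 0
    for r, (pred_row, gt_row) in enumerate(zip(pred, gt)):
        for c, (p, g) in enumerate(zip(pred_row, gt_row)):
            if valid is not None and not valid[r][c]:
                continue  # skip pixels without ground truth
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("no valid pixels to evaluate")
    return total / count
```

Lower values are better, and the result is expressed in pixels of disparity.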