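State-of-the-art methods for self-supervised sequential action alignment rely on deep networks that find correspondences across videos in time. They either learn a frame-to-frame mapping across sequences, which does not exploit temporal information, or assume a monotonic alignment between each pair of videos, which ignores variations in the order of actions. As a result, these methods cannot handle common real-world scenarios that involve background frames or videos containing non-monotonic action sequences. In this paper, we propose an approach to align sequential actions in the wild that involve diverse temporal variations. To this end, we enforce temporal priors on the optimal transport matrix, which exploits temporal consistency while still allowing the order of actions to vary. Our model accounts for both monotonic and non-monotonic sequences and handles background frames that should not be aligned. We show that our approach consistently outperforms the state of the art in self-supervised sequential action representation learning on four different benchmark datasets.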
translated by 谷歌翻译
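The idea of placing a temporal prior on an optimal transport matrix can be illustrated with a small sketch. This is not the authors' formulation: it assumes uniform marginals, a Gaussian prior on normalized frame positions, and plain Sinkhorn iterations, and it does not model background frames. The names `temporal_prior` and `sinkhorn` and all parameter values are hypothetical.

```python
import math

def temporal_prior(n, m, sigma=1.0):
    """Gaussian weight that peaks where the normalized frame positions agree."""
    prior = []
    for i in range(n):
        row = []
        for j in range(m):
            d = i / max(n - 1, 1) - j / max(m - 1, 1)
            row.append(math.exp(-d * d / (2.0 * sigma ** 2)))
        prior.append(row)
    return prior

def sinkhorn(cost, prior, eps=0.5, iters=300):
    """Entropic optimal transport with uniform marginals and a multiplicative prior."""
    n, m = len(cost), len(cost[0])
    # Kernel combines feature similarity (from the cost) with the temporal prior.
    kernel = [[prior[i][j] * math.exp(-cost[i][j] / eps) for j in range(m)]
              for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example (assumed data): frames 1 and 2 of the second video are swapped.
match = [0, 2, 1, 3]
cost = [[0.0 if j == match[i] else 1.0 for j in range(4)] for i in range(4)]
plan = sinkhorn(cost, temporal_prior(4, 4))
best = [max(range(4), key=lambda j: row[j]) for row in plan]
```

With a wide prior (`sigma=1.0`) the plan can still follow the swapped order; a much narrower prior would push the plan back toward the diagonal, which is the trade-off between temporal consistency and order variation that the abstract describes.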
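The lack of large-scale annotated real-world datasets makes transfer learning a necessity for video activity understanding. We aim to develop an effective method for few-shot transfer learning for action classification. We leverage independently trained local visual cues to learn representations that can be transferred from a source domain to a different target domain using only a few examples. The visual cues we use include object-object interactions, hand grasps, and motion within regions that are a function of hand locations. We adopt a framework based on meta-learning to extract the distinctive and domain-invariant components of the deployed visual cues. This enables the transfer of action classification models across public datasets captured with different scene and action configurations. We present comparative results for our transfer learning method and report results superior to state-of-the-art action classification approaches for both inter-class and inter-dataset transfer.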
We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task [11] that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster, running at 50 fps on a Titan X (Pascal) GPU, and more suitable for real-time processing. The key component of our method is a new CNN architecture inspired by [28,29] that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. For single-object and multiple-object pose estimation on the LINEMOD and OCCLUSION datasets, our approach substantially outperforms other recent methods [26] when they are all used without post-processing. During post-processing, a pose refinement step can be used to boost the accuracy of these two methods, but at 10 fps or less, they are much slower than our method.
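To make the 2D-3D correspondence concrete, the sketch below projects the eight corners of a 3D bounding box into the image with a pinhole camera model. A PnP solver (for example OpenCV's `cv2.solvePnP`) solves the inverse problem: given such 2D points and the known 3D corners, it recovers the rotation and translation. The box size, pose, and intrinsics used here are made-up values, and `bbox_corners` and `project` are hypothetical helper names.

```python
from itertools import product

def bbox_corners(size):
    """Eight corners of an axis-aligned box centered at the object origin."""
    w, h, d = size
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx, sy, sz in product((-1, 1), repeat=3)]

def project(points, R, t, fx, fy, cx, cy):
    """Pinhole projection of object-frame points given rotation R and translation t."""
    pixels = []
    for p in points:
        # Camera-frame coordinates: X_c = R @ p + t
        xc, yc, zc = (sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels

# Assumed pose and intrinsics, for illustration only.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 2.0)
corners = bbox_corners((1.0, 1.0, 1.0))
pixels = project(corners, R, t, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

A network of the kind described above would output something like `pixels` directly; the pose then follows from pairing each predicted 2D point with its corresponding entry in `corners`.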
Traditional approaches to extrinsic calibration use fiducial markers, and learning-based approaches rely heavily on simulation data. In this work, we present a learning-based markerless extrinsic calibration system that uses a depth camera and does not rely on simulation data. We learn models for end-effector (EE) segmentation, single-frame rotation prediction, and keypoint detection from automatically generated real-world data. We use a transformation trick to get EE pose estimates from rotation predictions and a matching algorithm to get EE pose estimates from keypoint predictions. We further use the iterative closest point algorithm, multiple frames, filtering, and outlier detection to increase calibration robustness. Our evaluations with training data from multiple camera poses and test data from previously unseen poses give sub-centimeter and sub-deciradian average calibration and pose estimation errors. We also show that a carefully selected single training pose gives comparable results.
Multi-label ranking maps instances to a ranked set of predicted labels from multiple possible classes. The ranking approach to multi-label learning has received attention for its success in multi-label classification, with pairwise label ranking being one of the well-known approaches. However, most existing methods assume that only partial information about the preference relation is known, inferred from the partition of labels into a positive and a negative set, and then treat all labels as equally important. In this paper, we focus on the unique challenge of ranking when the order of the true label set is provided. We propose a novel dedicated loss function that optimizes models by penalizing incorrectly ranked pairs and makes use of the ranking information present in the input. Our method achieves the best reported performance measures on both synthetic and real-world ranked datasets and shows improvements in the overall ranking of labels. Our experimental results demonstrate that our approach generalizes to a variety of multi-label classification and ranking tasks, while revealing a calibration towards a certain ranking ordering.
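One way to picture a loss that penalizes incorrectly ranked pairs is a pairwise hinge loss over all label pairs whose true order is known. The sketch below is a generic illustration, not the loss proposed in the paper; the function name, the margin, and the choice to average over pairs are assumptions.

```python
def pairwise_ranking_loss(scores, ranked_labels, margin=1.0):
    """Mean hinge penalty over label pairs whose true order is known.

    scores: dict mapping every candidate label to a model score.
    ranked_labels: the true labels, ordered from most to least relevant.
    """
    negatives = [label for label in scores if label not in ranked_labels]
    # Each true label should outscore every true label ranked below it ...
    ordered_pairs = [(ranked_labels[a], ranked_labels[b])
                     for a in range(len(ranked_labels))
                     for b in range(a + 1, len(ranked_labels))]
    # ... and every true label should outscore every label outside the true set.
    ordered_pairs += [(pos, neg) for pos in ranked_labels for neg in negatives]
    if not ordered_pairs:
        return 0.0
    penalties = [max(0.0, margin - (scores[hi] - scores[lo]))
                 for hi, lo in ordered_pairs]
    return sum(penalties) / len(penalties)

# Assumed toy scores for three candidate labels.
scores = {"a": 3.0, "b": 2.0, "c": 0.0}
loss_correct = pairwise_ranking_loss(scores, ["a", "b"])
loss_swapped = pairwise_ranking_loss(scores, ["b", "a"])
```

Unlike a loss built only on the positive/negative partition, this variant also penalizes the model when two true labels are scored in the wrong order, which is the extra information an ordered label set provides.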
CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that can ultimately increase loading throughput during training. It leverages three key observations from our benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then proposes a tradeoff between lightweight computational imbalance and heavyweight loading workload imbalance to speed up the overall training. Finally, it optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs shows that SOLAR can achieve up to 24.4X speedup over PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
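The benefit of fixing the shuffled index list ahead of time can be sketched as follows: once the full access order is known, a buffer can evict the sample whose next use lies farthest in the future. This uses the classic farthest-next-use rule as a stand-in and is not SOLAR's actual eviction scheme; the function names are hypothetical and `capacity` is assumed to be at least 1.

```python
import random

def shuffled_epochs(num_samples, num_epochs, seed=0):
    """Pre-determined access order: one seeded shuffle per epoch, concatenated."""
    rng = random.Random(seed)
    order = []
    for _ in range(num_epochs):
        epoch = list(range(num_samples))
        rng.shuffle(epoch)
        order.extend(epoch)
    return order

def hits_with_lookahead(order, capacity):
    """Count buffer hits when eviction uses knowledge of future accesses."""
    # next_use[pos] = position of the next access to order[pos], or infinity.
    next_use = [0.0] * len(order)
    last_seen = {}
    for pos in range(len(order) - 1, -1, -1):
        next_use[pos] = last_seen.get(order[pos], float("inf"))
        last_seen[order[pos]] = pos
    buffer = {}  # sample index -> position of its next use
    hits = 0
    for pos, sample in enumerate(order):
        if sample in buffer:
            hits += 1
        elif len(buffer) >= capacity:
            # Evict the buffered sample that will be needed last.
            victim = max(buffer, key=buffer.get)
            del buffer[victim]
        buffer[sample] = next_use[pos]
    return hits

order = shuffled_epochs(num_samples=8, num_epochs=3, seed=1)
full_hits = hits_with_lookahead(order, capacity=8)
small_hits = hits_with_lookahead(order, capacity=2)
```

A loader that shuffles on the fly cannot build `next_use`, which is why committing to the index list in advance opens the door to this kind of reuse-aware buffering.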
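Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, and from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and computing needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than traditional methods require.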
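Electromyography (EMG) signals can be used as training data for machine learning models that classify various gestures. We sought to build a model that can classify six different hand gestures from a limited number of samples and that generalizes well to a wider audience, while comparing the effect of our feature extraction on model accuracy against other, more conventional methods, such as using AR parameters on a sliding window across the signal channels. We resort to a set of more basic methods, such as using random bounds on a signal, and aim to show the power these methods can have in an online setting where EMG classification is being performed, as opposed to more complicated methods such as the Fourier transform. To augment our limited training data, we used a standard technique known as jitter, in which random noise is added to each observation in a channel-wise manner. Once all datasets had been produced with the above methods, we performed grid searches with random forest and XGBoost to ultimately create a high-accuracy model. For human-computer interface purposes, high-accuracy classification is particularly important to their functioning. Given the difficulty and cost of amassing any kind of biomedical data in large volumes, techniques that work with a small number of high-quality samples, together with less expensive feature extraction methods that can run reliably in online applications, are valuable.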
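The jitter augmentation described above can be sketched in a few lines. The abstract does not give the noise distribution or its scale, so this version assumes zero-mean Gaussian noise whose standard deviation is a fraction (`scale`) of each channel's own standard deviation; the function name and defaults are hypothetical.

```python
import random
import statistics

def jitter(window, scale=0.1, copies=3, seed=0):
    """Return `copies` noisy versions of a multi-channel window.

    window: list of channels, each a list of float samples.
    Noise is drawn per channel so that quiet channels receive less noise.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(copies):
        noisy = []
        for channel in window:
            sigma = scale * statistics.pstdev(channel)
            noisy.append([x + rng.gauss(0.0, sigma) for x in channel])
        augmented.append(noisy)
    return augmented

# Assumed toy window: one varying channel and one constant channel.
window = [[0.0, 1.0, 0.0, -1.0], [5.0, 5.0, 5.0, 5.0]]
augmented = jitter(window)
```

Each augmented copy keeps the label of the original observation, so a small dataset can be expanded several-fold before the grid search over classifiers.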
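Websites of online travel agencies (OTAs) advertise on metasearch bidding engines. Predicting the number of clicks a hotel will receive for a given bid amount is an important step in managing an OTA's advertising campaign on a metasearch engine, because the bid multiplied by the number of clicks determines the cost incurred. In this work, various regressors are ensembled to improve click prediction performance. Following preprocessing, the feature set is divided into training and test groups according to the logging date of the samples. The data is then subjected to XGBoost-based dimensionality reduction, which significantly reduces the number of features. The optimal hyperparameters are then found by applying Bayesian hyperparameter optimization to the XGBoost, LightGBM, and SGD models. Ten different machine learning models are tested individually and then combined to create ensemble models. Three alternative ensemble solutions are proposed. The same test set is used to evaluate both the individual and the ensemble models, and the results of 46 model combinations show that the stacking ensemble models achieve the best R2 score of all. In conclusion, the ensemble model improves prediction performance by about 10%.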
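The date-based split and the R2 metric mentioned above can be sketched as follows. The record layout (a `date` field plus `bid` and `clicks`), the cutoff, and the helper names are assumptions made for illustration; the abstract does not specify them.

```python
from datetime import date

def split_by_date(rows, cutoff):
    """Chronological split: samples logged before `cutoff` train, the rest test."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Assumed toy log of bids and clicks over six days.
rows = [{"date": date(2022, 1, d), "bid": 0.1 * d, "clicks": 2 * d}
        for d in range(1, 7)]
train, test = split_by_date(rows, cutoff=date(2022, 1, 5))

y_test = [r["clicks"] for r in test]
perfect = r2_score(y_test, y_test)
baseline = r2_score(y_test, [sum(y_test) / len(y_test)] * len(y_test))
```

Splitting by logging date rather than at random keeps future samples out of the training set, so the R2 measured on the test group reflects how a model would perform on bids placed after it was trained.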
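Multi-armed bandits (MAB) are extensively studied in various settings where the objective is to \textit{maximize} a measure (i.e., the reward) over time. Since safety is crucial in many real-world problems, safe versions of MAB algorithms have also attracted considerable interest. In this work, we tackle a different critical task through the lens of \textit{linear stochastic bandits}, where the aim is to keep the outcomes of the actions close to a target level while respecting a \textit{two-sided} safety constraint, which we call \textit{leveling}. Such a task is prevalent in many domains. For example, many healthcare problems require keeping a physiological variable within a range and preferably close to a target level. This radical change in objective requires a new acquisition strategy, which is at the heart of a MAB algorithm. We propose SALE-LTS: the Safe Leveling via Linear Thompson Sampling algorithm, with a novel acquisition strategy adapted to our task, and show that it achieves sublinear regret with the same time and dimension dependence as previous work on the classical reward maximization problem without any safety constraint. We demonstrate and discuss the empirical performance of our algorithm through thorough experiments.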