Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule \citep{oja1982simplified}), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks \citep{baldi1989neural}. In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset \citep{lecun2010mnist} and the reinforcement learning domain PuddleWorld \citep{sutton1995generalization} demonstrating the usefulness of our approach.
translated by 谷歌翻译
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful designs of the optimization dynamics are critical to learning meaningful representations. We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse. Then in an idealized setup, we show self-predictive learning dynamics carries out spectral decomposition on the state transition matrix, effectively capturing information of the transition dynamics. Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
translated by 谷歌翻译
在模板和搜索区域之间学习强大的功能匹配对于3D暹罗跟踪至关重要。暹罗功能匹配的核心是如何在模板和搜索区域之间的相应点上分配高特征相似性,以进行精确的对象本地化。在本文中,我们提出了一个新颖的点云登记驱动的暹罗跟踪框架,直觉是空间对齐相应点(通过3D注册)倾向于实现一致的特征表示。具体而言,我们的方法由两个模块组成,包括特定于特定的非局部注册模块和一个注册辅助的sindhorn模板 - 特征聚合模块。登记模块在模板和搜索区域之间的精确空间对齐中进行目标。提出了跟踪特异性的空间距离约束,以优化非局部模块中的交叉注意权重,以进行判别特征学习。然后,我们使用加权SVD来计算模板和搜索区域之间的刚性转换,并对齐它们以实现所需的空间对齐相应点。对于特征聚合模型,我们将转换模板和搜索区域之间的特征匹配作为最佳传输问题,并利用Sinkhorn优化来搜索异常型匹配匹配解决方案。同样,建造了登记辅助空间距离图,以改善无法区分的区域(例如光滑的表面)的匹配鲁棒性。最后,在获得的功能匹配地图的指导下,我们将目标信息从模板中汇总到搜索区域中以构建特定于目标的特征,然后将其馈送到一个类似中心点的检测头中以进行对象定位。关于Kitti,Nuscenes和Waymo数据集的广泛实验验证了我们提出的方法的有效性。
translated by 谷歌翻译
基于暹罗网络的跟踪器将3D单一对象跟踪作为模板和搜索区域的点特征之间的互相关学习。由于跟踪过程中模板和搜索区域之间的外观差异很大,因此如何学习它们之间的稳健跨相关性以识别搜索区域中的潜在目标仍然是一个挑战性的问题。在本文中,我们明确使用变压器形成一个3D Siamese变压器网络,以学习模板和点云的搜索区域之间的强大互相关。具体来说,我们开发了一个暹罗点变压器网络,以了解目标的形状上下文信息。它的编码器使用自我注意力来捕获点云的非本地信息来表征对象的形状信息,而解码器则利用交叉注意来提取歧视点特征。之后,我们开发了一个迭代的粗到加密相关网络,以了解模板与搜索区域之间的稳健跨相关性。它通过交叉注意将模板与搜索区域中的潜在目标联系起来,制定了交叉功能的增强。为了进一步增强潜在目标,它采用了自我功能增强,该增强功能将自我注意力应用于特征空间的本地K-NN图来汇总目标特征。 Kitti,Nuscenes和Waymo数据集的实验表明,我们的方法在3D单一对象跟踪任务上实现了最先进的性能。
translated by 谷歌翻译
在本文中,我们使用混合整数编程(MIP)探索基于模型的培训鲁棒和可解释的二金属化回归模型的培训鲁棒和可解释的二值化回归模型。我们的MIP模型通过使用加权目标来余额来实现预测边距和模型大小的优化,即:最大限度地减少错误分类的培训实例的总余量,最大限度地提高了正确分类的培训实例的总余量,并最大限度地提高了整体模型正则化。我们进行两组实验,以便在多个分类数据集的标准和损坏版本上测试MIP模型的分类准确性。在第一组实验中,我们表明我们的MIP模型优于等效的伪布尔优化(PBO)模型,并在标准数据集中的分类精度方面实现了对逻辑回归(LR)和梯度下降(GD)的竞争结果。在第二组实验中,我们表明我们的MIP模型在分类准确性方面优于大多数损坏的数据集的分类准确性。最后,我们在目视展示了MIP模型在其在MNIST DataSet上的学习参数方面的可解释性。总体而言,我们展示了使用MIP培训培训稳健和可解释的二值化回归模型的有效性。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
In contrast to the control-theoretic methods, the lack of stability guarantee remains a significant problem for model-free reinforcement learning (RL) methods. Jointly learning a policy and a Lyapunov function has recently become a promising approach to ensuring the whole system with a stability guarantee. However, the classical Lyapunov constraints researchers introduced cannot stabilize the system during the sampling-based optimization. Therefore, we propose the Adaptive Stability Certification (ASC), making the system reach sampling-based stability. Because the ASC condition can search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on the ASC condition. Meanwhile, our algorithm avoids the optimization problem that a variety of constraints are coupled into the objective in current approaches. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.
translated by 谷歌翻译
Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision making problems ranging from healthcare to technology industries. Most of the work in existing literature is focused on evaluating the mean outcome of a given policy, and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
translated by 谷歌翻译
Mitosis nuclei count is one of the important indicators for the pathological diagnosis of breast cancer. The manual annotation needs experienced pathologists, which is very time-consuming and inefficient. With the development of deep learning methods, some models with good performance have emerged, but the generalization ability should be further strengthened. In this paper, we propose a two-stage mitosis segmentation and classification method, named SCMitosis. Firstly, the segmentation performance with a high recall rate is achieved by the proposed depthwise separable convolution residual block and channel-spatial attention gate. Then, a classification network is cascaded to further improve the detection performance of mitosis nuclei. The proposed model is verified on the ICPR 2012 dataset, and the highest F-score value of 0.8687 is obtained compared with the current state-of-the-art algorithms. In addition, the model also achieves good performance on GZMH dataset, which is prepared by our group and will be firstly released with the publication of this paper. The code will be available at: https://github.com/antifen/mitosis-nuclei-segmentation.
translated by 谷歌翻译
Motivated by the human-machine interaction such as training chatbots for improving customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) We cannot collect Bob's private information, leading to a confounding bias when using standard RL methods, and (ii) a distributional mismatch between the behavior policy used to collect data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's previous action as an instrumental variable for Alice's current decision making so as to adjust for the unmeasured confounding. We develop a novel identification result and use it to propose a new off-policy evaluation (OPE) method for evaluating policy pairs in this two-player turn-based game. To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob. Finally, we prove that under mild assumptions such as partial coverage of the offline data, the policy pair obtained through our method converges to the optimal one at a satisfactory rate.
translated by 谷歌翻译