Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Following the promising results of instance-level pose estimation, research is moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, the available category-level datasets fall short in annotation quality and in the number of provided poses. We propose the new category-level 6D pose dataset HouseCat6D, featuring 1) multi-modality of polarimetric RGB (RGB+P) and depth, 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage, and 5) a checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
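The abstract does not state how annotation error or benchmark accuracy is measured. As one common way to compare two 6D poses (a rotation R plus a translation t), the sketch below computes the geodesic rotation angle and the Euclidean translation distance. The function names and the degree/distance threshold check are illustrative assumptions, not the dataset's official evaluation code, and object symmetries are ignored.

```python
import math
from typing import Sequence

Matrix3 = Sequence[Sequence[float]]  # row-major 3x3 rotation matrix
Vector3 = Sequence[float]


def rotation_error_deg(r_gt: Matrix3, r_pred: Matrix3) -> float:
    """Geodesic angle between two rotations: acos((trace(R_gt^T R_pred) - 1) / 2)."""
    # trace(A^T B) equals the sum of the element-wise products of A and B
    trace = sum(r_gt[i][j] * r_pred[i][j] for i in range(3) for j in range(3))
    # clamp against floating-point drift slightly outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_gt: Vector3, t_pred: Vector3) -> float:
    """Euclidean distance between translations, in the same unit as the inputs."""
    return math.dist(t_gt, t_pred)


def within_threshold(r_gt, t_gt, r_pred, t_pred, max_deg: float, max_dist: float) -> bool:
    """Hypothetical success check: rotation error <= max_deg AND translation error <= max_dist."""
    return (rotation_error_deg(r_gt, r_pred) <= max_deg
            and translation_error(t_gt, t_pred) <= max_dist)
```

For example, the identity rotation and a 90° rotation about the z-axis differ by exactly 90 degrees, so such a prediction fails a 5-degree threshold regardless of its translation.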
Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes the expected cumulative return while satisfying a given constraint. Most previous constrained RL works consider the expected cumulative sum cost as the constraint. However, optimization under this constraint cannot guarantee a target probability of the outage event, i.e., the event that the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), that constrains the quantile of the distribution of the cumulative sum cost, which is a necessary and sufficient condition for satisfying the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile, and it provides theoretical results for approximating the gradient of the quantile. Based on the derived theoretical results and the Lagrange multiplier technique, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). To implement QCPO, we use distributional RL with the Large Deviation Principle (LDP) to estimate the quantiles and the tail probability of the cumulative sum cost. The implemented algorithm satisfies the outage probability constraint after the training period.
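The equivalence between the quantile constraint and the outage constraint can be checked on samples. With the lower quantile q_{1-ε} = inf{x : F(x) ≥ 1-ε}, the outage constraint P(C > d) ≤ ε holds exactly when q_{1-ε} ≤ d. The sketch below verifies this for empirical distributions of sampled cumulative costs; it only illustrates the constraint itself, not QCPO's gradient approximation or its LDP-based tail estimation, and all function names are hypothetical. Exact fractions are used so that floating-point rounding cannot break the equivalence.

```python
import math
from fractions import Fraction
from typing import Sequence


def empirical_quantile(samples: Sequence[float], level: Fraction) -> float:
    """Lower empirical quantile: the smallest sample x with F_n(x) >= level (0 < level <= 1)."""
    xs = sorted(samples)
    k = math.ceil(len(xs) * Fraction(level))  # need at least k samples <= x
    return xs[k - 1]


def outage_probability(samples: Sequence[float], threshold: float) -> Fraction:
    """Empirical P(C > threshold) as an exact fraction."""
    return Fraction(sum(1 for c in samples if c > threshold), len(samples))


def quantile_constraint_ok(samples: Sequence[float], threshold: float, eps: float) -> bool:
    """Quantile form of the constraint (0 < eps < 1): q_{1-eps}(C) <= threshold."""
    return empirical_quantile(samples, 1 - Fraction(eps)) <= threshold


def outage_constraint_ok(samples: Sequence[float], threshold: float, eps: float) -> bool:
    """Outage form of the constraint: P(C > threshold) <= eps."""
    return outage_probability(samples, threshold) <= Fraction(eps)
```

For the costs 1, 2, ..., 10 and ε = 1/10, the 0.9-quantile is 9; a threshold of 9 is met (one outage in ten), while a threshold of 8 is violated under both formulations.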
Artificial intelligence methods, including deep neural networks (DNNs), can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds that of human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when the corresponding histologic features are poorly defined. Here, we present a method for improving the explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology to augment the education of pathologists in training, showing that these intuitive visualizations can reinforce and improve understanding of the histologic manifestations of tumor biology.
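The abstract does not specify how class and layer blending are implemented. A minimal sketch, assuming the cGAN generator receives a class-embedding vector in each of its layers: class blending is shown as linear interpolation between two class embeddings, and layer blending as applying that interpolated embedding only in a chosen subset of layers while the remaining layers keep class A. The function names and this per-layer conditioning scheme are hypothetical.

```python
from typing import List, Sequence, Set


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation between two embedding vectors (t=0 -> a, t=1 -> b)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def layer_blended_conditioning(emb_a: Sequence[float], emb_b: Sequence[float],
                               num_layers: int, blend_layers: Set[int],
                               t: float) -> List[List[float]]:
    """Hypothetical per-layer conditioning: blended embedding in blend_layers, class A elsewhere."""
    return [lerp(emb_a, emb_b, t) if layer in blend_layers else list(emb_a)
            for layer in range(num_layers)]
```

Sweeping t from 0 to 1 with all layers selected would morph one subtype into the other; restricting the blend to early or late layers would, under this assumption, isolate coarse versus fine morphologic differences.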
Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground-truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes overfitting in both the discriminators and the generator, leading to periodicity artifacts in the generated audio signal. In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis, which rotates the phase of each frequency bin to simulate one-to-many mapping. With our proposed method, we outperform baselines without any architecture modification. Code and audio samples will be available at https://github.com/mindslab-ai/phaseaug.
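The per-bin phase rotation can be illustrated on a single frame. The sketch below is a simplified, non-differentiable, pure-Python stand-in, not the PhaseAug implementation: it uses a full-frame DFT rather than an STFT, and it draws phases uniformly at random, which is an assumption and not necessarily the distribution PhaseAug uses. Bin k and its mirror bin N-k are rotated by opposite angles so that the augmented waveform stays real, while the magnitude spectrum of the frame is unchanged.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (adequate for a short illustrative frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def rotate_phase(frame: Sequence[float], phases: Sequence[float]) -> List[float]:
    """Rotate bin k (1 <= k <= (N-1)//2) by phases[k-1] and bin N-k by the opposite angle.

    DC and (for even N) Nyquist bins are left untouched so the output stays real."""
    n = len(frame)
    if len(phases) != (n - 1) // 2:
        raise ValueError("need one phase per positive-frequency bin below Nyquist")
    spec = dft(frame)
    for k, phi in enumerate(phases, start=1):
        spec[k] *= cmath.exp(1j * phi)
        spec[n - k] *= cmath.exp(-1j * phi)  # conjugate symmetry keeps the signal real
    return [v.real for v in idft(spec)]


def phase_augment(frame: Sequence[float], rng: random.Random) -> List[float]:
    """Hypothetical augmentation: one uniformly random phase per frequency bin."""
    phases = [rng.uniform(-math.pi, math.pi) for _ in range((len(frame) - 1) // 2)]
    return rotate_phase(frame, phases)
```

For example, rotating the single bin of a one-cycle cosine by -π/2 turns it into the corresponding sine: a different waveform with an identical magnitude spectrum, which is the one-to-many ambiguity the augmentation exposes to the discriminator.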
Recommender systems are a long-standing research problem in data mining and machine learning. They are incremental in nature, as new user-item interaction logs keep arriving. In real-world applications, we need to periodically train a collaborative filtering algorithm to extract user/item embedding vectors; therefore, a time series of embedding vectors can be naturally defined. We present a time-series forecasting-based upgrade kit (TimeKit), which works as follows: it i) first selects a base collaborative filtering algorithm, ii) incrementally extracts user/item embedding vectors with the base algorithm from user-item interaction logs, e.g., every month, iii) trains our time-series forecasting model with the extracted time series of embedding vectors, and then iv) forecasts the future embedding vectors and makes recommendations with their dot-product scores. This forecasting step relies on a recent breakthrough in processing complicated time-series data, i.e., neural controlled differential equations (NCDEs). Our experiments with four real-world benchmark datasets show that the proposed time-series forecasting-based upgrade kit can significantly enhance existing popular collaborative filtering algorithms.
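Steps iii) and iv) can be sketched as follows, under two stated assumptions: the embedding snapshots from successive training runs live in a comparable vector space, and a simple linear extrapolation stands in for the NCDE forecaster that TimeKit actually uses. The function names are hypothetical; only the overall flow (forecast each embedding, then rank items by dot product) follows the text.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def forecast_next(history: Sequence[Vector]) -> Vector:
    """Placeholder forecaster (NOT an NCDE): linearly extrapolate from the last two snapshots."""
    if len(history) == 1:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def recommend(user_history: Sequence[Vector],
              item_histories: Dict[str, Sequence[Vector]], k: int) -> List[str]:
    """Rank items by the dot product of the forecast user and item embeddings."""
    user_vec = forecast_next(user_history)
    scores = {item: dot(user_vec, forecast_next(h)) for item, h in item_histories.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Replacing `forecast_next` with a learned sequence model leaves `recommend` unchanged, which mirrors the "upgrade kit" idea: the base collaborative filtering algorithm only supplies the embedding snapshots.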
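6-DoF robotic grasping is a long-standing but unsolved problem. Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors, demonstrating superior accuracy on common objects but performing unsatisfactorily on photometrically challenging objects (e.g., objects made of transparent or reflective materials). The bottleneck lies in the fact that the surfaces of these objects cannot reflect accurate depth due to the absorption or refraction of light. In this paper, instead of exploiting inaccurate depth data, we propose the first RGB-only 6-DoF grasping pipeline, called MonoGraspNet, which uses stable 2D features to simultaneously handle arbitrary object grasping and overcome the problems induced by photometrically challenging objects. MonoGraspNet leverages a keypoint heatmap and a normal map to recover 6-DoF grasping poses, expressed in our novel representation parameterized by 2D keypoints with corresponding depth, grasping direction, grasping width, and angle. Extensive experiments in real scenes demonstrate that our method achieves competitive results in grasping common objects and surpasses depth-based competitors by a large margin in grasping photometrically challenging objects. To further stimulate robotic manipulation research, we additionally annotate and open-source a multi-view and multi-scene real-world grasping dataset containing 120 objects of mixed photometric complexity with 20M accurate grasping labels.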
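The abstract lists the components of the grasp representation (2D keypoint, depth, grasping direction, width, angle) but not how they are assembled into a 6-DoF pose. The sketch below shows one plausible assembly under hypothetical conventions: the keypoint is back-projected with pinhole intrinsics, the grasping direction becomes the gripper's approach (z) axis, and the angle rotates the closing (x) axis about that approach axis. All names and conventions here are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a: Sequence[float]) -> Vec3:
    norm = math.sqrt(dot(a, a))
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """Pinhole back-projection of pixel (u, v) at the given depth into camera coordinates."""
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


def grasp_pose(u, v, depth, approach, angle, width, k: Intrinsics):
    """Hypothetical 6-DoF grasp: 3D center, gripper axes (x: closing, y, z: approach), width."""
    center = backproject(u, v, depth, k)
    z_axis = normalize(approach)
    # any reference vector not parallel to the approach axis yields a perpendicular basis
    ref = (1.0, 0.0, 0.0) if abs(z_axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    x0 = normalize(cross(ref, z_axis))
    y0 = cross(z_axis, x0)
    # rotate the closing direction about the approach axis by the in-plane angle
    x_axis = tuple(math.cos(angle) * x0[i] + math.sin(angle) * y0[i] for i in range(3))
    y_axis = cross(z_axis, x_axis)
    return center, (x_axis, y_axis, z_axis), width
```

Because only the keypoint's depth is needed (not a full depth map), a representation of this kind can be predicted from RGB alone, which is consistent with the motivation stated above.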
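Digital terrain models (DTMs) are fundamental geospatial data for various studies in urban, environmental, and Earth science. The reliability of the results obtained from such studies can be considerably affected by errors and uncertainties in the underlying DTM. Numerous algorithms have been developed to mitigate the errors and uncertainties of DTMs. However, most algorithms involve tricky parameter selection and complicated procedures that obscure the algorithm's decision rules, so it is often difficult to explain and predict the errors and uncertainties of the resulting DTM. Also, previous algorithms often consider the local neighborhood of each point to distinguish non-ground objects, which limits the search radius and contextual understanding and can be error-prone, especially when point density varies. This study proposes an open-source DTM generation algorithm for airborne LiDAR data that can consider information beyond the local neighborhood and produces explainable, predictable, and reliable results. The key assumption of the algorithm is that the ground is smoothly connected, while non-ground objects are surrounded by areas with abrupt height changes. The robustness and uniqueness of the proposed algorithm were evaluated through a tile-based evaluation in comparison with other state-of-the-art algorithms.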
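The key assumption (ground is smoothly connected; non-ground objects are bounded by abrupt height changes) can be illustrated with region growing on a height raster. This is not the proposed algorithm: the sketch assumes a gridded surface, takes the lowest cell as a single ground seed, and labels as ground every cell reachable through 4-neighbor steps whose height change stays below a hypothetical `max_step` parameter. A tile whose ground is split into disconnected patches would need more than one seed.

```python
from collections import deque
from typing import List, Sequence


def ground_mask(grid: Sequence[Sequence[float]], max_step: float) -> List[List[bool]]:
    """Mark cells connected to the lowest cell through height steps <= max_step as ground."""
    rows, cols = len(grid), len(grid[0])
    # assumption: the lowest cell in the tile lies on the ground
    seed = min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: grid[rc[0]][rc[1]])
    ground = [[False] * cols for _ in range(rows)]
    ground[seed[0]][seed[1]] = True
    queue = deque([seed])
    while queue:  # breadth-first growth over smoothly connected cells
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not ground[nr][nc]
                    and abs(grid[nr][nc] - grid[r][c]) <= max_step):
                ground[nr][nc] = True
                queue.append((nr, nc))
    return ground
```

Because the growth is not bounded by a fixed search radius, a large building is excluded as long as its walls form a height jump, which reflects the "beyond the local neighborhood" point in the abstract.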
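With the introduction of deep learning (DL), the performance of commonly used electrocardiogram (ECG) diagnosis models has improved. However, the effect of various combinations of multiple DL components and/or data augmentation techniques on diagnosis has not been sufficiently studied. This study proposes an ensemble-based multi-view learning approach with ECG augmentation techniques that achieves higher performance than traditional 12-lead ECG diagnosis methods. Data analysis results show that the proposed model achieves an F1 score of 0.840, outperforming existing state-of-the-art methods in the literature.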
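The abstract does not describe how the ensemble combines its members or which F1 variant is reported. As a minimal sketch of the two ingredients it names, the code below averages per-sample probabilities across ensemble members (soft voting, assumed here) and computes binary F1; the multi-label or macro-averaged setting the paper likely uses is not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average each sample's positive-class probability across ensemble members (one list per model)."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, two models (say, one per ECG view) with probabilities [0.9, 0.2, 0.6, 0.4] and [0.7, 0.4, 0.2, 0.8] average to [0.8, 0.3, 0.4, 0.6]; thresholding at 0.5 against labels [1, 0, 1, 1] gives precision 1, recall 2/3, and F1 = 0.8.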
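Defocus blur is a physical consequence of the optical sensors used in most cameras. Although it can be used as a photographic style, it is commonly regarded as an image degradation, modeled as the convolution of a sharp image with a spatially varying blur kernel. Motivated by the progress of blur estimation methods over the past few years, we propose a non-blind approach to image deblurring that can deal with spatially varying kernels. We introduce a network with two encoder sub-networks, fed with the blurry image and the estimated blur map, respectively, which produces the deblurred (deconvolved) image as output. Each sub-network presents several skip connections that allow data propagation from layers spread apart, as well as cross skip connections that ease the communication between the modules. The network is trained with synthetically blurred kernels that are augmented to emulate blur maps produced by existing blur estimation methods, and our experimental results show that our method works well when combined with a variety of blur estimation methods.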
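The degradation model that such a non-blind method inverts can be sketched in one dimension: each output sample is blurred with a kernel whose size is read from a per-sample blur map. The box kernel, the integer radius convention, and the function name are simplifying assumptions; real defocus kernels are closer to discs or Gaussians, and this sketch does not reproduce the paper's training-data synthesis.

```python
from typing import List, Sequence


def spatially_varying_blur(signal: Sequence[float], blur_map: Sequence[int]) -> List[float]:
    """Blur a 1D signal with a per-sample box kernel whose radius is read from blur_map."""
    if len(signal) != len(blur_map):
        raise ValueError("signal and blur_map must have the same length")
    n = len(signal)
    out = []
    for i, radius in enumerate(blur_map):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)  # window clipped at the borders
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out
```

A radius of 0 leaves a sample sharp, while larger radii average over wider neighborhoods. A non-blind deblurring network receives both the blurred signal and the blur map, so it knows where and how strongly to sharpen.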
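We propose and study a novel graph clustering method for data with an intrinsic network structure. Similar to spectral clustering, we exploit the intrinsic network structure of the data to construct Euclidean feature vectors. These feature vectors can then be fed into basic clustering methods such as k-means or Gaussian mixture model (GMM)-based soft clustering. What sets our method apart from spectral clustering is that we do not use the eigenvectors of a graph Laplacian to construct the feature vectors. Instead, we use the solutions of total variation minimization problems to construct feature vectors that reflect the connectivity between data points. Our motivation is that the solutions of total variation minimization are piecewise constant around a given set of seed nodes. These seed nodes can be obtained from domain knowledge or from simple heuristics based on the network structure of the data. Our results indicate that our clustering method can cope with certain graph structures that are challenging for spectral clustering methods.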
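The abstract does not give the exact total variation (TV) problem. The sketch below assumes one concrete formulation: for each seed s, minimize 0.5·||x − e_s||² + λ·Σ w_ij·|x_i − x_j|, a graph TV-denoising problem whose solution is piecewise constant. It solves this with the Chambolle-Pock primal-dual method. Each node's feature vector then collects its values across all seeds, and nodes are assigned by argmax instead of k-means to keep the example short. The formulation, step sizes, and names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (node i, node j, edge weight w_ij)


def tv_denoise(n: int, edges: Sequence[Edge], y: Sequence[float], lam: float,
               iters: int = 5000) -> List[float]:
    """Approximately solve min_x 0.5*||x - y||^2 + lam * sum_(i,j) w_ij * |x_i - x_j|
    with the Chambolle-Pock primal-dual method (one dual variable per edge)."""
    degree = [0] * n
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    # ||D||^2 (largest Laplacian eigenvalue) is bounded by the max over edges of deg_i + deg_j
    norm_sq = max(degree[i] + degree[j] for i, j, _ in edges)
    tau = sigma = 0.9 / math.sqrt(norm_sq)  # ensures tau * sigma * ||D||^2 < 1
    x, x_bar, u = list(y), list(y), [0.0] * len(edges)
    for _ in range(iters):
        # dual step: ascend along edge differences, then clip to |u_e| <= lam * w_e
        for e, (i, j, w) in enumerate(edges):
            cap = lam * w
            u[e] = max(-cap, min(cap, u[e] + sigma * (x_bar[i] - x_bar[j])))
        # D^T u: each edge adds +u_e at node i and -u_e at node j
        dtu = [0.0] * n
        for e, (i, j, _) in enumerate(edges):
            dtu[i] += u[e]
            dtu[j] -= u[e]
        # primal step: closed-form prox of 0.5*||x - y||^2
        x_new = [(x[k] - tau * dtu[k] + tau * y[k]) / (1.0 + tau) for k in range(n)]
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x


def tv_cluster(n: int, edges: Sequence[Edge], seeds: Sequence[int], lam: float) -> List[int]:
    """One TV solution per seed gives each node a feature vector; assign nodes by argmax."""
    features = [tv_denoise(n, edges, [1.0 if k == s else 0.0 for k in range(n)], lam)
                for s in seeds]
    return [max(range(len(seeds)), key=lambda c: features[c][i]) for i in range(n)]
```

On two triangles joined by a single edge, with one seed in each triangle, the solutions are (approximately) constant on each side of the bridge, so the argmax recovers the two triangles. If λ is too large, every solution collapses to a constant and the features stop carrying cluster information.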