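Stacking increases storage efficiency on shelves, but the lack of visibility and accessibility makes the mechanical search problem of revealing and extracting target objects difficult for robots. In this paper, we extend the lateral-access mechanical search problem to shelves with stacked items and introduce two novel policies that use destacking and restacking actions: Distribution Area Reduction for Stacked Scenes (DARSS) and Monte Carlo Tree Search for Stacked Scenes (MCTSSS). MCTSSS improves on prior lookahead policies by considering future states after each potential action. Experiments in 1200 simulated and 18 physical trials, using a robot equipped with a blade and suction cup, suggest that destacking and restacking actions can reveal the target object with 82-100% success in simulation and 66-100% success in physical experiments, and that they are critical for searching densely packed shelves. In the simulation experiments, both policies outperform a baseline and achieve similar success rates, but take more steps than an oracle policy that has full state information. In both simulation and physical experiments, DARSS outperforms MCTSSS in the median number of steps needed to reveal the target, but MCTSSS has a higher success rate in physical experiments, suggesting robustness to perception noise. See https://sites.google.com/berkeley.edu/stax-ray for supplementary material.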
translated by 谷歌翻译
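Federated learning is a widely adopted method for training neural networks over distributed data. One main limitation is the performance degradation that occurs when data is heterogeneously distributed. While many works have attempted to address this problem, these methods underperform because they rest on a limited understanding of neural networks. In this work, we verify that only certain important layers in a neural network require regularization for effective training. We also verify that Centered Kernel Alignment (CKA) most accurately calculates the similarity between layers of neural networks trained on different data. By applying CKA-based regularization to important layers during training, we significantly improve performance in heterogeneous settings. We present FedCKA: a simple framework that outperforms previous state-of-the-art methods on various deep learning tasks while also improving efficiency and scalability.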
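To make the similarity measure concrete, the following is a minimal sketch of the commonly used linear form of CKA between two sets of layer activations. It is an illustration only: the abstract does not say which CKA variant or which regularization term FedCKA uses, and the function name `linear_cka` and the `(n_samples, n_features)` input layout are assumptions made here.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices.

    Both inputs are assumed (hypothetically) to have shape
    (n_samples, n_features) and to come from the same n_samples inputs;
    the feature dimensions may differ.
    """
    # Center every feature column so the implied Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Cross-covariance energy, normalized by each representation's own energy.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

The score lies in [0, 1] and is unchanged by orthogonal transformations and isotropic scaling of either representation, which is one reason it is convenient for comparing layers of models trained on different data. A regularizer built on it could, for example, penalize `1 - linear_cka(local_acts, global_acts)` on selected layers, though that particular form is an assumption rather than something stated in the abstract.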
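Soft, compliant jaw tips are almost universally used with parallel-jaw robot grippers because they increase the contact area and friction between the jaws and the object to be manipulated. However, interactions between compliant surfaces and rigid objects are difficult to model. We introduce IPC-GraspSim, a novel simulator that uses Incremental Potential Contact (IPC), a deformation model developed in 2020 for computer graphics, to model both the dynamics and the deformation of compliant jaw tips during grasping. IPC-GraspSim is evaluated using a set of 2,000 physical grasps across 16 adversarial objects on which standard analytic models perform poorly. In comparison with both analytic quasistatic contact models (soft point contact, REACH, 6DFC) and dynamic grasp simulators (Isaac Gym with the FleX backend), results suggest that IPC-GraspSim models real-world grasps more accurately, increasing F1 score by 9%. All data, code, videos, and supplementary material are available at https://sites.google.com/berkeley.edu/ipcgraspsim.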
In the robotics and computer vision communities, surveillance tasks such as human detection, tracking, and motion recognition with a camera have been studied extensively. Deep learning algorithms are widely used for these tasks, as in other computer vision tasks. Existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low-illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD combined with random initialization has been witnessed in many different problems, in both theory and practice. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this work, we study rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that, within a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee convergence and show that a larger initialization can be used as more samples become available. We observe that the implicit regularization effect of GD plays a critical role in the analysis: for the entire trajectory, it prevents each entry from becoming much larger than the others.
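As an illustration of the setting described above, here is a minimal sketch of plain gradient descent on a rank-1 symmetric matrix completion objective, started from a small random vector. The loss scaling, the step size, the initialization scale, and the function name `gd_rank1_completion` are all assumptions chosen for this example; the abstract does not specify them, and the sketch makes no claim to match the paper's exact conditions.

```python
import numpy as np


def gd_rank1_completion(m_obs, mask, init_scale=1e-3, step=0.2, iters=500, seed=0):
    """Gradient descent for rank-1 symmetric matrix completion.

    Minimizes f(x) = 1/(4p) * sum over observed (i, j) of (x_i x_j - M_ij)^2,
    where `mask` is a symmetric 0/1 matrix of observed entries and p is the
    empirical sampling rate. No regularizer or spectral initialization is used.
    """
    rng = np.random.default_rng(seed)
    n = m_obs.shape[0]
    p = mask.mean()
    # Small random initialization: a Gaussian vector with norm about init_scale.
    x = init_scale * rng.standard_normal(n) / np.sqrt(n)
    for _ in range(iters):
        # Residual restricted to the observed entries.
        residual = mask * (np.outer(x, x) - m_obs)
        # For a symmetric mask, the gradient of f is (residual @ x) / p.
        x = x - step * (residual @ x) / p
    return x


# Hypothetical usage: a unit-norm ground truth with half of the entries observed.
rng = np.random.default_rng(1)
n, p = 100, 0.5
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
upper = rng.random((n, n)) < p
mask = (np.triu(upper) | np.triu(upper, 1).T).astype(float)  # symmetric mask
m_obs = mask * np.outer(x_star, x_star)
x_hat = gd_rank1_completion(m_obs, mask)
# The sign of x is not identifiable from x x^T, so compare against both signs.
error = min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star))
```

Early on, the iterate is tiny and the update behaves roughly like a power iteration on the rescaled observed matrix, so the component along the ground truth grows geometrically. This is the intuition behind reaching the local convergence region in a logarithmic number of iterations.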
The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can distinguish between inherent artifacts or noise and diseases, restricting the use of this imaging modality. The development of artificial intelligence (AI)-based computer-aided diagnosis methods for CBCT to overcome the shortage of experienced physicians has attracted substantial attention. However, advanced AI-based diagnosis addressing intrinsic noise in CBCT has not been devised, discouraging the practical use of AI solutions for CBCT. To address this issue, we propose an AI-based computer-aided diagnosis method using CBCT with a denoising module. This module is implemented before diagnosis to reconstruct the internal ground-truth full-dose scan corresponding to an input CBCT image and thereby improve the diagnostic performance. The external validation results for the unified diagnosis of sinus fungal ball, chronic rhinosinusitis, and normal cases show that the proposed method improves the micro-, macro-average AUC, and accuracy by 7.4, 5.6, and 9.6% (from 86.2, 87.0, and 73.4 to 93.6, 92.6, and 83.0%), respectively, compared with a baseline while improving human diagnosis accuracy by 11% (from 71.7 to 83.0%), demonstrating technical differentiation and clinical effectiveness. This pioneering study on AI-based diagnosis using CBCT indicates denoising can improve diagnostic performance and reader interpretability in images from the sinonasal area, thereby providing a new approach and direction to radiographic image reconstruction regarding the development of AI-based diagnostic solutions.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem to find pseudo-labels that maximize the mutual information between the label and data while being close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables simple yet effective clustering-based representation learning without incorporating extra training techniques or artificial constraints such as sampling strategies or equipartition constraints. With a relatively small number of training epochs, the representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including linear/k-NN evaluation and transfer learning. In particular, with only 400 epochs, our method applied to the ImageNet dataset with a ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
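Diffusion models have recently been studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via a Laplace approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold-constrained gradient, without a strict measurement consistency projection step, and it yields a more desirable generative path in noisy settings than previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics, such as Gaussian and Poisson, and can also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.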
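This paper analyzes dysarthric speech datasets from three languages with different prosodic systems: English, Korean, and Tamil. We inspect 39 acoustic measurements that reflect three speech dimensions: voice quality, pronunciation, and prosody. As a multilingual analysis, we examine the mean values of the acoustic measurements by intelligibility level. In addition, automatic intelligibility classification is performed to identify the optimal feature set for each language. The analyses suggest that pronunciation features, such as Percentage of Correct Consonants, Percentage of Correct Vowels, and Percentage of Correct Phonemes, are language-independent measurements. Voice quality and prosody features, however, generally behave differently across languages. Experimental results also show that different speech dimensions play a greater role for different languages: prosody for English, pronunciation for Korean, and both prosody and pronunciation for Tamil. This paper contributes to speech pathology in that it differentiates between language-independent and language-dependent measurements in intelligibility classification for English, Korean, and Tamil dysarthric speech.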