数字地形模型(DTM)是城市,环境和地球科学各种研究的基本地理空间数据。从此类研究获得的结果的可靠性可能会受到基础DTM的错误和不确定性的重大影响。已经开发了许多算法来减轻DTM的错误和不确定性。但是,大多数算法都涉及棘手的参数选择和使该算法的决策规则掩盖的复杂过程,因此通常很难解释和预测所得DTM的错误和不确定性。同样,以前的算法通常考虑每个点的局部邻域以区分非地面对象,这限制了搜索半径和上下文理解,并且可能会易于错误,尤其是在点密度变化的情况下。这项研究提出了一种用于机载激光雷达数据的开源DTM生成算法,该算法可以考虑到本地邻域之外,并且易于解释,可预测和可靠的结果。该算法的关键假设是,地面是平稳连接的,而非地面被带有急剧高度变化的区域所包围。与其他最先进的算法相比,通过平铺评估评估了所提出算法的鲁棒性和独特性。
translated by 谷歌翻译
2D和3D建筑图提供了宝贵的信息,以了解人类活动及其对地球及其环境的影响。尽管为提高建筑地图的质量而做出了巨大努力,但自动化方法产生的当前大规模建筑地图仍存在许多错误和不确定性,并且通常仅限于提供2D建筑信息。这项研究提出了一种开源无监督的2D和3D建筑物提取算法,并带有适用于大型建筑物映射的机载LIDAR数据。我们的算法以完全无监督的方式运行,不需要任何培训标签或培训程序。我们的算法由形态过滤和基于平面的过滤组成。因此,计算是有效的,结果易于预测,这可以大大减少所得建筑图中的不确定性。丹佛和纽约市的大规模数据集(> 550 $ km^2 $)的定量和定性评估表明,我们的算法比通过基于深度学习的方法生成的Microsoft Building Footprints可以产生更准确的建筑图。在不同条件下进行的广泛评估证实,我们的算法是可扩展的,可以通过适当的参数选择进一步改进。我们还详细介绍了参数和潜在错误来源的影响,以帮助我们算法的潜在用户。我们的基于激光雷达的算法具有优势,即生成2D和3D构建图在计算上有效,而它产生了准确且可解释的结果。我们提出的算法为带有机载激光雷达数据的全球尺度2D和3D建筑物映射提供了巨大的潜力。
translated by 谷歌翻译
According to the rapid development of drone technologies, drones are widely used in many applications including military domains. In this paper, a novel situation-aware DRL- based autonomous nonlinear drone mobility control algorithm in cyber-physical loitering munition applications. On the battlefield, the design of DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not available. Therefore, the approach in this paper is that cyber-physical virtual environment is constructed with Unity environment. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, many obstacles exist which is harmful for linear trajectory control in real-world battlefield scenarios. Thus, our proposed autonomous nonlinear drone mobility control algorithm utilizes situation-aware components those are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. Therefore, this approach is obviously beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior from the other linear mobility control algorithms.
translated by 谷歌翻译
Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances of the field. In this regard, we propose a new regularization method, dubbed HIER, to discover the latent semantic hierarchy of training data, and to deploy the hierarchy to provide richer and more fine-grained supervision than inter-class separability induced by common metric learning losses. HIER achieved this goal with no annotation for the semantic hierarchy but by learning hierarchical proxies in hyperbolic spaces. The hierarchical proxies are learnable parameters, and each of them is trained to serve as an ancestor of a group of data or other proxies to approximate the semantic hierarchy among them. HIER deals with the proxies along with data in hyperbolic space since geometric properties of the space are well-suited to represent their hierarchical structure. The efficacy of HIER was evaluated on four standard benchmarks, where it consistently improved performance of conventional methods when integrated with them, and consequently achieved the best records, surpassing even the existing hyperbolic metric learning technique, in almost all settings.
translated by 谷歌翻译
Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective approaches to achieve a higher level of language control and quality with pros and cons. In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. And similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using called critic, improving training efficiency and stability. Evaluation of our method on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and well-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.
translated by 谷歌翻译
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way from unraveling the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.
translated by 谷歌翻译
Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Since the promising outcomes from instance-level pose estimation, the research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB+P and Depth, 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large scale scenes with extensive viewpoint coverage, 5) Checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
translated by 谷歌翻译
Recent studies have proposed a unified user modeling framework that leverages user behavior data from various applications. Most benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.
translated by 谷歌翻译
While witnessing the noisy intermediate-scale quantum (NISQ) era and beyond, quantum federated learning (QFL) has recently become an emerging field of study. In QFL, each quantum computer or device locally trains its quantum neural network (QNN) with trainable gates, and communicates only these gate parameters over classical channels, without costly quantum communications. Towards enabling QFL under various channel conditions, in this article we develop a depth-controllable architecture of entangled slimmable quantum neural networks (eSQNNs), and propose an entangled slimmable QFL (eSQFL) that communicates the superposition-coded parameters of eS-QNNs. Compared to the existing depth-fixed QNNs, training the depth-controllable eSQNN architecture is more challenging due to high entanglement entropy and inter-depth interference, which are mitigated by introducing entanglement controlled universal (CU) gates and an inplace fidelity distillation (IPFD) regularizer penalizing inter-depth quantum state differences, respectively. Furthermore, we optimize the superposition coding power allocation by deriving and minimizing the convergence bound of eSQFL. In an image classification task, extensive simulations corroborate the effectiveness of eSQFL in terms of prediction accuracy, fidelity, and entropy compared to Vanilla QFL as well as under different channel conditions and various data distributions.
translated by 谷歌翻译
The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can distinguish between inherent artifacts or noise and diseases, restricting the use of this imaging modality. The development of artificial intelligence (AI)-based computer-aided diagnosis methods for CBCT to overcome the shortage of experienced physicians has attracted substantial attention. However, advanced AI-based diagnosis addressing intrinsic noise in CBCT has not been devised, discouraging the practical use of AI solutions for CBCT. To address this issue, we propose an AI-based computer-aided diagnosis method using CBCT with a denoising module. This module is implemented before diagnosis to reconstruct the internal ground-truth full-dose scan corresponding to an input CBCT image and thereby improve the diagnostic performance. The external validation results for the unified diagnosis of sinus fungal ball, chronic rhinosinusitis, and normal cases show that the proposed method improves the micro-, macro-average AUC, and accuracy by 7.4, 5.6, and 9.6% (from 86.2, 87.0, and 73.4 to 93.6, 92.6, and 83.0%), respectively, compared with a baseline while improving human diagnosis accuracy by 11% (from 71.7 to 83.0%), demonstrating technical differentiation and clinical effectiveness. This pioneering study on AI-based diagnosis using CBCT indicates denoising can improve diagnostic performance and reader interpretability in images from the sinonasal area, thereby providing a new approach and direction to radiographic image reconstruction regarding the development of AI-based diagnostic solutions.
translated by 谷歌翻译