Stairs are common structures in urban environments, and stair detection is an important part of environment perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night or when visual cues are extremely blurred. To solve these problems, we propose a neural network architecture that takes both an RGB map and a depth map as input. Specifically, we design a selective module that lets the network learn the complementary relationship between the RGB map and the depth map and combine their information effectively in different scenes. In addition, we design a line clustering algorithm for post-processing the detection results, which makes full use of those results to obtain the geometric parameters of stairs. Experiments on our dataset show that our method achieves better accuracy and recall than the previous state-of-the-art deep learning method, with improvements of 5.64% and 7.97%, respectively. Our method is also extremely fast: a lightweight version reaches 300+ frames per second at the same resolution, which meets the needs of most real-time detection scenarios.
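Recently, self-supervised learning techniques have been applied to estimate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption in self-supervised learning of depth and ego-motion is that image brightness remains constant across nearby frames. Unfortunately, endoscopic scenes do not satisfy this assumption, because illumination variations, non-Lambertian reflections, and interreflections during data collection cause severe brightness fluctuations, and these fluctuations inevitably degrade the accuracy of depth and ego-motion estimation. In this work, we introduce a novel concept, referred to as appearance flow, to address the brightness inconsistency problem. The appearance flow takes into account any variation in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes. It comprises a structure module, a motion module, an appearance module, and a correspondence module, which together accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments are conducted on the SCARED dataset and the EndoSLAM dataset, and the proposed unified framework exceeds other self-supervised approaches by a large margin. To validate the generalization ability of our framework across different patients and cameras, we train our model on SCARED but test it on the Serv-CT and Hamlyn datasets without any fine-tuning, and the superior results reveal its strong generalization ability. Code will be available at https://github.com/shuweishao/af-sfmlearner.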
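The role of a brightness correction term can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that the appearance flow acts as an additive per-pixel brightness offset on the warped source frame, and the function name `photometric_residual` and the toy pixel values are hypothetical.

```python
def photometric_residual(target, warped_source, appearance_flow=None):
    """Mean absolute brightness difference between two flattened images.

    appearance_flow is an optional per-pixel brightness offset added to the
    warped source before comparison (an assumed additive form, chosen only
    for illustration). With no offset this reduces to the plain
    brightness-constancy residual.
    """
    if len(target) != len(warped_source) or not target:
        raise ValueError("images must be non-empty and of equal size")
    if appearance_flow is None:
        appearance_flow = [0.0] * len(target)
    diffs = [
        abs(t - (s + a))
        for t, s, a in zip(target, warped_source, appearance_flow)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical toy data: the same scene, but the source frame is darker by 0.25.
target = [0.50, 0.75, 1.00]
warped = [0.25, 0.50, 0.75]
flow = [0.25, 0.25, 0.25]

plain = photometric_residual(target, warped)            # brightness constancy
corrected = photometric_residual(target, warped, flow)  # with brightness offset
```

In this toy case the plain residual penalizes a pure lighting change even though the geometry is perfectly aligned, while the corrected residual does not.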
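Depth estimation is gaining widespread popularity in the computer vision community, yet it remains difficult to recover an accurate depth map from a single RGB image. In this work, we observe that existing methods tend to exhibit asymmetric errors, a phenomenon that may open up a new direction for accurate and robust depth estimation. We carefully investigate this phenomenon and construct a two-level ensemble scheme, NENet, that integrates multiple predictions from diverse base predictors. NENet forms a more reliable depth estimator, which substantially boosts performance over the base predictors. Notably, to the best of our knowledge, this is the first attempt to introduce ensemble learning into monocular depth estimation and evaluate its utility there. Extensive experiments show that the proposed NENet achieves better results than previous state-of-the-art approaches on the NYU-Depth-v2 and KITTI datasets. In particular, our method improves the RMSE of the previous state-of-the-art methods on the NYU dataset from 0.365 to 0.349. To validate generalizability across cameras, we directly apply the models trained on the NYU dataset to the SUN RGB-D dataset without any fine-tuning and achieve superior results, which indicates strong generalizability. The source code and trained models will be made publicly available upon acceptance.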
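As a rough illustration of the RMSE metric and of why combining base predictors can help, the sketch below averages two hypothetical predictors whose errors point in opposite directions. Plain averaging is a stand-in chosen for brevity, not the two-level NENet scheme, and all names and values are assumptions.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length depth sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(predictions):
    """Per-pixel mean of several base predictions (simple stand-in, not NENet)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    n = len(predictions)
    return [sum(values) / n for values in zip(*predictions)]


# Hypothetical depths in metres.
truth = [2.0, 4.0, 6.0]
over = [2.5, 4.5, 6.5]    # base predictor that overestimates
under = [1.5, 3.5, 5.5]   # base predictor that underestimates

fused = average_ensemble([over, under])
```

Here each base predictor has an RMSE of 0.5, while the averaged prediction matches the ground truth because the opposite-signed errors cancel. Real predictors rarely cancel this cleanly.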
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the regression problem in BIQA harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). Together they help the model learn complex distortion patterns and optimize the regression problem in an easy-to-hard order that mirrors the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous best-performing agents on both in- and out-of-domain datasets. In contrast to most existing crowdsourced, small-scale dialogue corpora, we distill 1.5M socially-grounded dialogues from a pre-trained language model (InstructGPT; Ouyang et al., 2022). Dialogues are distilled by contextualizing social commonsense knowledge from a knowledge graph (Atomic10x; West et al., 2022). Human evaluation shows that dialogues in SODA are more consistent, specific, and (surprisingly) natural than prior human-authored datasets - e.g., DailyDialog (Li et al., 2017), BlendedSkillTalk (Smith et al., 2020). In addition, extensive evaluations show that COSMO is significantly more natural and consistent on unseen datasets than best-performing dialogue models - e.g., GODEL (Peng et al., 2022), BlenderBot (Roller et al., 2021), DialoGPT (Zhang et al., 2020). Furthermore, it is sometimes even preferred to the original human-written gold responses. We make our data, models, and code public.
We propose a novel task, G4C (Goal-driven Guidance Generation in Grounded Communication), for studying goal-driven and grounded natural language interactions. Specifically, we choose Dungeons and Dragons (D&D) -- a role-playing game consisting of multiple player characters and a Dungeon Master (DM) who collaborate to achieve a set of goals that are beneficial to the players -- as a testbed for this task. Here, each of the player characters is a student, with their own personas and abilities, and the DM is the teacher, an arbitrator of the rules of the world and responsible for assisting and guiding the students towards a global goal. We propose a theory-of-mind-inspired methodology for training such a DM with reinforcement learning (RL), where a DM: (1) learns to predict how the players will react to its utterances using a dataset of D&D dialogue transcripts; and (2) uses this prediction as a reward function providing feedback on how effective these utterances are at guiding the players towards a goal. Human and automated evaluations show that a DM trained with RL to generate guidance by incorporating a theory-of-mind of the players significantly improves the players' ability to achieve goals grounded in their shared world.
To address the current problems in the teaching of postgraduate software engineering courses, namely an emphasis on theory over practice and a lack of innovation ability, a multi-stage feedback teaching mode for software engineering postgraduates driven by competition projects is proposed. The mode is driven by the competition project, and implementation suggestions are given for the stage allocation of course tasks and ability cultivation, competition case design, and the improvement of process evaluation. Through the implementation of this teaching mode, students' enthusiasm and initiative are expected to be stimulated, and the overall development of students' professional skills and comprehension ability should improve, meeting society's demand for software engineering talent.
We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially-designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.
Survival analysis on histological whole-slide images (WSIs) is one of the most important means of estimating patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and the requirement for full supervision. As a result, these models provide patients only with a fully certain point estimate of time-to-event, and they can learn only from well-annotated WSI data, which is currently available at a small scale. To tackle these problems, we propose a novel adversarial multiple instance learning (AdvMIL) framework. This framework is based on adversarial time-to-event modeling, and it integrates multiple instance learning (MIL), which is essential for WSI representation learning. The framework is plug-and-play, so most existing WSI-based models with embedding-level MIL networks can be easily upgraded by applying it, gaining improved abilities in survival distribution estimation and semi-supervised learning. Our extensive experiments show that AdvMIL not only brings performance improvements to mainstream WSI models at a relatively low computational cost, but also enables these models to learn from unlabeled data through semi-supervised learning. Our AdvMIL framework could promote research on time-to-event modeling in computational pathology with its novel paradigm of adversarial MIL.
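The embedding-level MIL step mentioned above can be sketched in a few lines. The sketch covers only the aggregation of patch embeddings into a slide-level embedding followed by a linear head. It omits the adversarial time-to-event modeling entirely, and mean pooling, the function names, and all numbers are assumptions made for illustration rather than the AdvMIL design.

```python
def mean_pool(bag):
    """Aggregate a bag of instance (patch) embeddings into one bag-level embedding."""
    if not bag:
        raise ValueError("bag must contain at least one instance")
    n = len(bag)
    return [sum(dim_values) / n for dim_values in zip(*bag)]


def linear_head(embedding, weights, bias=0.0):
    """Map a bag-level embedding to a single scalar output."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    return sum(e * w for e, w in zip(embedding, weights)) + bias


# Hypothetical slide with two patches, each a 2-dimensional embedding.
slide_patches = [[1.0, 2.0], [3.0, 4.0]]
slide_embedding = mean_pool(slide_patches)           # one vector per slide
score = linear_head(slide_embedding, [0.5, 1.0])     # one scalar per slide
```

The point of the embedding-level design is that only the slide needs a label: individual patches are never scored on their own.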