UAS Navigation in the Real World Using Visual Observation

Yuci Han, Jianli Wei, Alper Yilmaz, Photogrammetric Computer Vision Lab., The Ohio State University, Columbus, OH, USA
{han.1489, wei.909, yilmaz.15}@osu.edu

Abstract

This paper presents a novel end-to-end Unmanned Aerial System (UAS) navigation approach for long-range visual navigation in the real world. Inspired by the dual-process nature of human visual navigation, namely environment understanding and landmark recognition, we formulate the UAS navigation task in the same two phases. Our system combines reinforcement learning (RL) and image matching. First, the agent learns the navigation policy using RL in a specified environment; to this end, we design an interactive environment, UASNAV, for training. Once the agent has learned the navigation policy, i.e., has ‘familiarized itself with the environment’, we fly the UAS in the real world, where it recognizes landmarks using image matching and acts according to the learned policy. During navigation, a single onboard camera is the UAS's only visual sensor. We demonstrate that the UAS can learn to navigate to a destination hundreds of meters away from the starting point along the shortest path in a real-world scenario.

deep reinforcement learning, image matching, visual navigation

I Introduction

UAS navigation in GPS-denied environments is a challenging cognitive task that remained largely out of reach for decades. With the development of modern deep learning, recent research has demonstrated visual navigation without GPS in virtual simulation environments such as AI2-THOR [11] and Gibson [10]. Unfortunately, the real world is more complicated and challenging because of illumination variation, seasonal differences, and construction changes, factors that remain constant in virtual environments. Because visual navigation relies heavily on robust visual feature representations of the environment, these variations make real-world UAS navigation even harder.

Fig. 1: Two phases of our UAS navigation system. The agent learns the navigation policy using RL during the training process. While navigating in the real world, the agent recognizes the landmarks and takes actions based on the learned policy.

In this work, we propose an architecture that integrates reinforcement learning and image matching to address the challenging real-world UAS navigation task using only visual observations and no map. It is inspired by the human ability to navigate in a familiar environment. Imagine walking to a place near home by following directions in Google Maps: even if we lose GPS and can no longer localize ourselves on the map, we can still find the way by recognizing landmarks we know and acting on what we see until we reach the destination. Our navigation system is developed in two phases (see Fig.1). First, we build the UASNAV environment and train the agent to learn the navigation policy by interacting with the environment through reinforcement learning. During learning, the agent familiarizes itself with the predefined landmarks in this environment and derives a navigation policy based on these landmarks to find the shortest path to the target destination. Then, we validate the navigation ability of our agent in a real scenario using a UAS. While flying at a constant altitude, the UAS continuously observes the environment with a downward-looking camera. We adopt the SuperGlue [9] algorithm to match the observed image with our predefined landmarks. Although the appearance of a location changes over time due to illumination variation or moving vehicles, SuperGlue can still match two images that share similar texture. Once the observation matches a landmark, the UAS takes an action based on the learned policy, moving forward, backward, left, or right, until it reaches the goal point. Our experiment shows that the agent can navigate to a target location hundreds of meters away.

The contributions of this paper are (a) an RL framework for the UAS navigation task, (b) an interactive environment built from a real-world scene for the learning process, and (c) a real-world test of the proposed approach using image matching for landmark recognition.

II Related Work

II-A Visual Navigation

Visual navigation research in virtual environments has shown promising results in reaching the target along the shortest trajectory [11], [2], [4]. Some recent works introduce memory into the navigation task, such as a graph attention memory (GAM) based system [6] and a visual graph memory (VGM) structure that embeds the agent's navigation history [5]. However, most work in this field relies on simulation environments and lacks validation in the real world.

II-B Image Matching

Image matching, a fundamental problem in computer vision and pattern recognition, is the general name for keypoint detection, description, and matching. The goal is to recognize the same item in images taken from different viewpoints, under different lighting, and at different scales. Image matching methods are generally categorized as handcrafted or deep learning based. Handcrafted methods require expert knowledge to model local keypoint texture so that it is invariant to rotation and scale. Over the past decades, a growing number of methods have been proposed [8]. SIFT [7] and SURF [1] are among the best known of them and are still widely used today. Apart from handcrafted methods, deep descriptors have become mainstream in recent years; they keep the same vector form but are learned by deep networks. Sarlin et al. proposed SuperGlue [9], an end-to-end graph neural network that finds correspondences between the local features of two images.

III Methodology

Our proposed method consists of two phases: policy learning and environment recognition. This section introduces the RL model, the UASNAV environment, and the image matching model.

III-A Policy Learning using Deep Reinforcement Learning

We formulate the navigation task as a Markov decision process (MDP) and solve it with RL. The RL algorithm we adopt is the DCQN model proposed in [3]. The goal of the agent is to learn a policy that reaches the destination from any starting location. The navigation process consists of a sequence of states, actions, and rewards $(s_0, a_0, r_0, s_1, \ldots, a_{N-1}, r_{N-1}, s_N)$. The state in this task is the UAS's observation, an RGB image. The UAS decides where to go only when it encounters a landmark. The actions are moving forward, backward, left, and right. The reward has three cases: +0.1 upon reaching the goal, -0.001 upon collision, and a time penalty of -0.0001 to encourage finding the shortest trajectory. The agent continuously interacts with the environment, searching for the optimal navigation policy by trial and error so as to maximize the cumulative reward.
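The reward scheme above can be made concrete with a small environment step. This is a minimal sketch, not the authors' implementation: it assumes that the state is the agent's (row, col) index on the 10 by 10 landmark grid instead of an RGB image, that ‘forward’ means one landmark north, and that a ‘collision’ is an attempt to move off the grid, after which the agent stays in place. The names `step`, `ACTIONS`, and the reward constants are hypothetical.

```python
from typing import Tuple

State = Tuple[int, int]  # hypothetical stand-in for the RGB observation: (row, col) on the landmark grid

# Assumed action encoding as (row change, col change); row 0 is the northern edge.
ACTIONS = {
    "forward": (-1, 0),
    "backward": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}

GOAL_REWARD = 0.1            # reaching the goal
COLLISION_REWARD = -0.001    # assumed collision: trying to leave the grid
TIME_PENALTY = -0.0001       # every other move, to favor short trajectories


def step(state: State, action: str, goal: State,
         rows: int = 10, cols: int = 10) -> Tuple[State, float, bool]:
    """Apply one action and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < rows and 0 <= c < cols):
        return state, COLLISION_REWARD, False   # stay in place on collision
    if (r, c) == goal:
        return (r, c), GOAL_REWARD, True
    return (r, c), TIME_PENALTY, False
```

For example, `step((9, 8), "right", goal=(9, 9))` ends the episode with reward 0.1, while `step((0, 0), "left", goal=(9, 9))` leaves the agent at (0, 0) with the collision penalty.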

III-B UASNAV Environment

The UASNAV environment is designed for training the agent in the policy learning phase. It spans a 400 meters by 300 meters residential area. We select 100 landmarks distributed in a 10 by 10 pattern over this area, with a horizontal spacing of 40 meters and a vertical spacing of 30 meters between adjacent landmarks. These landmarks act as ground control points that guide the UAS to the goal. Each landmark observation is represented by a satellite image collected from Google Maps at that location. This image is the input to the RL algorithm, and the output of the DRL module is the optimal action the agent should take at that location (see Fig.2). The raw image resolution is $1280 \times 720$; we resize the images to $640 \times 480$ for the image matching phase.
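The landmark layout can be sketched as follows. The paper gives only the grid size and spacing, so the placement of each landmark at the center of a 40 m by 30 m cell (which makes the 10 by 10 grid tile the 400 m by 300 m area exactly), the local coordinate frame, and the helper names `landmark_position` and `neighbors` are assumptions for illustration.

```python
from typing import Dict, Tuple

ROWS, COLS = 10, 10   # 10 x 10 landmark pattern
DX, DY = 40.0, 30.0   # horizontal / vertical spacing in meters


def landmark_position(row: int, col: int) -> Tuple[float, float]:
    """(x, y) in meters from the area's north-west corner; x points east, y south.

    Assumption: each landmark sits at the center of a 40 m x 30 m cell.
    """
    return (DX / 2 + DX * col, DY / 2 + DY * row)


def neighbors(row: int, col: int) -> Dict[str, Tuple[int, int]]:
    """Front/back/left/right neighbors inside the grid for a north-oriented UAS."""
    candidates = {
        "front": (row - 1, col),   # assumed: front = north
        "back": (row + 1, col),
        "left": (row, col - 1),
        "right": (row, col + 1),
    }
    return {name: (r, c) for name, (r, c) in candidates.items()
            if 0 <= r < ROWS and 0 <= c < COLS}
```

For instance, `landmark_position(9, 9)` gives (380.0, 285.0), and a corner landmark such as (0, 0) has only two neighbors, so fewer candidate descriptors need to be matched there.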

Fig. 2: The UASNAV environment spans a 400 meters by 300 meters residential area. We select 100 landmarks and obtain the corresponding satellite images. Each image is used as the input to the RL algorithm, and the output of the DRL module is the optimal action the agent should take at that location.

III-C Environment Recognition using Image Matching

In the UASNAV environment, each landmark is described by its satellite image. While navigating in the real environment, the UAS captures real-time observations with its onboard downward-looking camera at regular intervals. To decide on actions at these ground control points based on the learned policy, the UAS must recognize the landmarks during navigation. Because the UAS observation and the satellite image of the same landmark contain similar texture, we use image matching to match the landmark descriptor (satellite image) with the UAS observation (real-time image).

Fig. 3: The landmark satellite image and UAS observations in different seasons

During navigation, the UAS continuously matches the current observation against the descriptors of the four neighboring landmarks (front, back, left, right) and selects the one with the most matching points as the matched landmark. To ensure that the UAS has arrived exactly at the matched landmark position, we use an affine matrix estimated from the matches to compute the distance between the centers of the current observation and the landmark satellite image. When this center distance reaches its minimum, we consider the UAS to have arrived at the landmark position, and it takes the corresponding action.
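The recognition step can be sketched as follows, assuming the matched keypoint coordinates for each neighboring landmark are already available (for example, from SuperGlue). The paper does not specify how the affine matrix is estimated or in which direction it maps, so this sketch fits a plain least-squares affine transform from observation to satellite coordinates with NumPy, and it interprets ‘shortest center distance’ as a local minimum over consecutive frames. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def select_landmark(match_counts: Dict[str, int]) -> str:
    """Return the neighbor ('front', 'back', 'left', 'right') with the most matches."""
    return max(match_counts, key=match_counts.get)


def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine A such that dst ~ A @ [x, y, 1]^T (needs >= 3 points).

    No outlier rejection here; a robust (e.g. RANSAC) fit would be safer in practice.
    """
    X = np.hstack([src, np.ones((len(src), 1))])    # N x 3 homogeneous points
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)   # 3 x 2 least-squares solution
    return sol.T                                    # 2 x 3 affine matrix


def center_distance(obs_pts: Sequence[Tuple[float, float]],
                    sat_pts: Sequence[Tuple[float, float]],
                    obs_size: Tuple[int, int],
                    sat_size: Tuple[int, int]) -> float:
    """Pixel distance between the observation center, mapped into the
    satellite image, and the satellite image center. Sizes are (width, height)."""
    A = fit_affine(np.asarray(obs_pts, dtype=float), np.asarray(sat_pts, dtype=float))
    obs_center = np.array([obs_size[0] / 2, obs_size[1] / 2, 1.0])
    sat_center = np.array([sat_size[0] / 2, sat_size[1] / 2])
    return float(np.linalg.norm(A @ obs_center - sat_center))


def reached_minimum(history: List[float]) -> bool:
    """True once the previous frame was a local minimum of the center distance."""
    return (len(history) >= 3
            and history[-2] < history[-3]
            and history[-1] > history[-2])
```

With a pure translation of (5, -3) pixels between the matched points, for example, the mapped observation center lands √34 ≈ 5.83 pixels from the satellite image center. Declaring arrival one frame after the minimum is a design choice that avoids reacting to a single noisy distance.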

We adopt SuperGlue [9], a state-of-the-art image matching algorithm that is robust to illumination variation and seasonal changes (see Fig.3). Our method inherits this robustness to environmental variation and is therefore applicable in different seasons and most weather conditions.

IV Experiments

We conduct the experiment in two phases. We first train the agent in the UASNAV environment to learn the navigation policy. Then, to validate our approach, we fly the UAS on a goal-reaching task in the real-world environment to demonstrate its navigation capability without a map.

IV-A Navigation Policy Learning

We implement the DCQN [3] model for policy learning and observe that the agent successfully learns the navigation task. The training curve in Fig.4 shows that our algorithm converges quickly. The reward in this figure is the cumulative reward of each episode, computed when the agent reaches the destination. As shown in Fig.4, the agent learns the navigation policy within 200 episodes. We also evaluate the agent for 100 episodes in the UASNAV environment; it reaches the target in 6.53 steps on average.
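The paper trains DCQN [3]; the sketch below swaps in plain tabular Q-learning, a simpler RL method, only to illustrate the trial-and-error loop on a landmark grid. States are (row, col) indices instead of satellite images, leaving the grid counts as a collision, and the hyperparameters and function names (`train_q`, `greedy_steps`) are assumptions. It does not reproduce the reported training curve or the 6.53-step average.

```python
import random
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # forward, backward, left, right (assumed north-up grid)


def env_step(s: State, a: int, goal: State, rows: int, cols: int) -> Tuple[State, float, bool]:
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= r < rows and 0 <= c < cols):
        return s, -0.001, False       # assumed "collision": leaving the grid
    if (r, c) == goal:
        return (r, c), 0.1, True      # goal reward
    return (r, c), -0.0001, False     # time penalty


def train_q(rows: int, cols: int, goal: State, episodes: int = 3000,
            alpha: float = 0.5, gamma: float = 0.9, eps: float = 0.2,
            max_steps: int = 100, seed: int = 0) -> Dict[State, List[float]]:
    """Epsilon-greedy tabular Q-learning from random start landmarks."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(rows) for c in range(cols)}
    starts = [s for s in q if s != goal]
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.randrange(4)                          # explore
            else:
                a = max(range(4), key=lambda i: q[s][i])      # exploit
            s2, r, done = env_step(s, a, goal, rows, cols)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])             # Q-learning update
            s = s2
            if done:
                break
    return q


def greedy_steps(q: Dict[State, List[float]], start: State, goal: State,
                 rows: int, cols: int, limit: int = 100) -> Optional[int]:
    """Number of greedy steps from start to goal, or None if the goal is not reached."""
    s = start
    for t in range(1, limit + 1):
        a = max(range(4), key=lambda i: q[s][i])
        s, _, done = env_step(s, a, goal, rows, cols)
        if done:
            return t
    return None
```

On a small 4 by 4 grid, for instance, `greedy_steps(train_q(4, 4, goal=(3, 3)), (0, 0), (3, 3), 4, 4)` should return the Manhattan distance of 6 once the table has converged. A table with one entry per landmark and action is feasible here only because the state is a grid index; the image-based state in the paper is why a deep model such as DCQN is used instead.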

Fig. 4: Training curve tracking the agent’s episode reward

IV-B Real World Navigation

For the real-world test, we fly the UAS over the same residential area that the UASNAV environment covers. The UAS is first placed at a known starting point and then uses only its downward-looking camera for observation throughout the flight. Whenever it matches a landmark, the drone chooses to fly forward, backward, left, or right based on the policy output by the RL algorithm. We use the pretrained SuperGlue model for image matching without additional training. The UAS trajectory and the image matching results at landmark locations are shown in Fig.5.
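The real-world procedure reduces to a landmark-triggered control loop. In this sketch every hook is a hypothetical placeholder: `recognize` stands for SuperGlue matching against the neighboring landmarks plus the arrival test, `policy` for the trained RL model, and `send_command` for the flight controller interface, none of which is specified in the paper. The toy run at the end uses stub hooks only to show the control flow.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Landmark = Tuple[int, int]  # (row, col) index on the landmark grid


def navigate(start: Landmark,
             goal: Landmark,
             frames: Iterable[object],
             neighbors: Callable[[Landmark], Dict[str, Landmark]],
             recognize: Callable[[object, Dict[str, Landmark]], Optional[Landmark]],
             policy: Callable[[Landmark], str],
             send_command: Callable[[str], None]) -> List[Landmark]:
    """Fly from a known start landmark to the goal; returns the visited landmarks.

    recognize() returns the neighboring landmark the UAS has arrived at,
    or None for frames captured between landmarks.
    """
    current = start
    path = [current]
    if current == goal:
        return path
    send_command(policy(current))           # act immediately at the known start
    for frame in frames:
        arrived_at = recognize(frame, neighbors(current))
        if arrived_at is None:
            continue                        # keep executing the last command
        current = arrived_at
        path.append(current)
        if current == goal:
            break
        send_command(policy(current))       # query the learned policy at each landmark
    return path


# Toy run with stub hooks: a 1 x 3 corridor whose frames are already "recognized".
commands: List[str] = []
path = navigate(
    start=(0, 0), goal=(0, 2),
    frames=[None, (0, 1), None, (0, 2)],
    neighbors=lambda lm: {"left": (lm[0], lm[1] - 1), "right": (lm[0], lm[1] + 1)},
    recognize=lambda f, nb: f if f in nb.values() else None,
    policy=lambda lm: "right",
    send_command=commands.append,
)
print(path, commands)  # [(0, 0), (0, 1), (0, 2)] ['right', 'right']
```

Passing the hooks in as functions keeps the loop independent of the matcher, the RL model, and the drone SDK, so each can be tested or replaced separately.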

Fig. 5: UAS navigation trajectory in the real environment and image matching results at example landmark locations (landmark satellite image vs. UAS observation)

V Conclusion

We propose a two-phase system for UAS navigation in the real world. We formulate the navigation task as an MDP and use RL for policy learning. For real-world deployment, we introduce the ‘ground control point’ concept and adopt image matching for landmark recognition. Our approach achieves end-to-end, shortest-path UAS navigation without GPS and is robust to scene changes caused by seasonal and weather variation. We demonstrate the UAS's ability to reach a goal hundreds of meters away. During the real-world test, we noticed that the UAS has only four action choices at each landmark location, which is inflexible and prevents it from flying the straight-line shortest path. The UAS is also kept north-oriented during the test, which limits its practical applicability. In addition, although our approach can be applied to larger environments, doing so requires more landmarks, and with the limited action set the flying distance grows, which is inefficient. Adding more landmarks also makes policy learning more time-consuming. In future work, we will tackle these problems by enlarging the action space and optimizing the RL algorithm.
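The cost of the four-action restriction can be quantified from the 40 m by 30 m landmark spacing. A shortest path that uses only forward, backward, left, and right moves has the Manhattan length of the landmark offset, whereas a UAS with free heading could fly the straight line. For a corner-to-corner trip across the 10 by 10 grid, that is 9 × 40 + 9 × 30 = 630 m versus √(360² + 270²) = 450 m, i.e., 40% longer; trips along a single axis incur no overhead. The function names below are illustrative only.

```python
import math

DX, DY = 40.0, 30.0  # landmark spacing in meters (UASNAV grid)


def four_action_length(d_cols: int, d_rows: int) -> float:
    """Shortest path using only forward/backward/left/right moves between landmarks."""
    return abs(d_cols) * DX + abs(d_rows) * DY


def straight_line_length(d_cols: int, d_rows: int) -> float:
    """Euclidean distance between the same two landmarks."""
    return math.hypot(d_cols * DX, d_rows * DY)


# Corner-to-corner trip across the 10 x 10 grid (9 columns and 9 rows apart).
grid_m = four_action_length(9, 9)
direct_m = straight_line_length(9, 9)
print(grid_m, direct_m, grid_m / direct_m)
```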

VI Acknowledgements

This work is supported by funding from the US Army Office of Research, grant AWD-110906.

References

[1] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool (2008) Speeded-up robust features (SURF). Computer Vision and Image Understanding 110 (3), pp. 346–359.
[2] J. Bruce, N. Sünderhauf, P. W. Mirowski, R. Hadsell, and M. Milford (2017) One-shot reinforcement learning for robot navigation with interactive replay. ArXiv abs/1711.10137.
[3] Y. Han and A. Yilmaz (2021-06) Dynamic Routing for Navigation in Changing Unknown Maps Using Deep Reinforcement Learning. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 51, pp. 145–150.
[4] J. Kulhánek, E. Derner, and R. Babuška (2021) Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning. IEEE Robotics and Automation Letters 6 (3), pp. 4345–4352.
[5] O. Kwon, N. Kim, Y. Choi, H. Yoo, J. Park, and S. Oh (2021-10) Visual graph memory with unsupervised representation for visual navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15890–15899.
[6] D. Li, D. Zhao, Q. Zhang, Y. Zhuang, and B. Wang (2019) Graph attention memory for visual navigation. ArXiv abs/1905.13315.
[7] D. G. Lowe (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2), pp. 91–110.
[8] J. Ma, X. Jiang, A. Fan, J. Jiang, and J. Yan (2021) Image matching from handcrafted to deep features: a survey. International Journal of Computer Vision 129 (1), pp. 23–79.
[9] P. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich (2020) SuperGlue: learning feature matching with graph neural networks. In CVPR.
[10] F. Xia, A. R. Zamir, Z. He, A. Sax, J. Malik, and S. Savarese (2018-06) Gibson env: real-world perception for embodied agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. K. Gupta, L. Fei-Fei, and A. Farhadi (2017) Target-driven visual navigation in indoor scenes using deep reinforcement learning. 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3357–3364.