Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Besides, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirable low-resolution feature maps. To obtain high-resolution depth maps, skip-connections or multi-layer deconvolution networks are required, which complicates network training and consumes much more computation. To eliminate or at least largely reduce these problems, we introduce a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem. By training the network with an ordinal regression loss, our method achieves much higher accuracy and faster convergence simultaneously. Furthermore, we adopt a multi-scale network structure which avoids unnecessary spatial pooling and captures multi-scale information in parallel. The method described in this paper achieves state-of-the-art results on four challenging benchmarks, i.e., KITTI [17], ScanNet [9], Make3D [50], and NYU Depth v2 [42], and wins the 1st prize in Robust Vision Challenge 2018. Code has been made available at: https://github.com/hufu6371/DORN.
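As a rough illustration of the spacing-increasing discretization idea, the sketch below (not taken from the authors' code) places bin thresholds uniformly in log-depth, so that bins near the camera are narrow and bins far away are wide. The depth range [1 m, 80 m] and the bin count K = 80 are illustrative assumptions, not values prescribed by the abstract.

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """Spacing-increasing discretization: K+1 thresholds splitting the depth
    range [alpha, beta] into K bins whose widths grow with depth.
    Assumes thresholds uniform in log-depth: t_i = exp(log(alpha) + i*log(beta/alpha)/K)."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

def depth_to_ordinal_label(depth, thresholds):
    """Map a continuous depth value to its discrete bin index (ordinal label)."""
    # searchsorted finds the first threshold greater than depth; subtract 1 for the bin index
    return np.clip(np.searchsorted(thresholds, depth, side="right") - 1,
                   0, len(thresholds) - 2)

# Illustrative usage: discretize a KITTI-like range [1 m, 80 m] into 80 ordinal bins.
t = sid_thresholds(alpha=1.0, beta=80.0, K=80)
print(t[:5])    # near-range bins are narrow ...
print(t[-5:])   # ... far-range bins are wide
print(depth_to_ordinal_label(10.0, t))
```

A network trained under this scheme predicts, for each pixel, which of the K ordinal bins the depth falls into rather than a raw continuous value, which is what lets an ordinal regression loss replace mean squared error.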