Title

Visual Localization Based on Deep Learning: A Case Study of the Southern Branch of the National Palace Museum

Parallel Title

VISUAL LOCALIZATION BASED ON DEEP LEARNING - TAKE SOUTHERN BRANCH OF THE NATIONAL PALACE MUSEUM FOR EXAMPLE

DOI

10.6652/JoCICHE.202205_34(3).0004

Authors

Chia-Hao Tu (凃嘉濠); Eric Hsueh-Chan Lu (呂學展)

Keywords

visual localization; deep learning; convolutional neural network

Journal

Journal of the Chinese Institute of Civil and Hydraulic Engineering (中國土木水利工程學刊)

Volume/Issue (Publication Date)

Vol. 34, No. 3 (2022/05/01)

Pages

215-220

Language

Traditional Chinese

Chinese Abstract

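Visual localization regresses a camera's position and orientation from images and has many applications in computer vision, such as autonomous driving, augmented reality (AR), and virtual reality (VR). Convolutional neural networks (CNNs), which mimic biological vision, are good at extracting image features, so applying them to visual localization can extract features more effectively and thereby improve regression accuracy. Although our team has previously built a deep-learning-based image indoor localization model for the Southern Branch of the National Palace Museum, better CNNs and visual localization loss functions have since been proposed, so this paper tries a newer network and loss function to obtain better localization accuracy. We use ResNet-50 as the backbone network, change the output layer to a 3-dimensional position and a 4-dimensional orientation quaternion, and combine position and orientation with a loss function that has learnable weights. Using the dataset previously collected at the Southern Branch of the National Palace Museum, we compare how different pretrained models and normalization methods affect localization accuracy; the best experimental result improves localization accuracy by about 60%.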
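As a minimal sketch of a position/orientation loss with learnable weights, the function below assumes the formulation of Kendall and Cipolla (2017, reference 5), L = L_x * exp(-s_x) + s_x + L_q * exp(-s_q) + s_q, with an L1 distance for both terms. The record does not state the exact loss, norm, or initial weight values used in this paper, so these are assumptions, and the function and parameter names are hypothetical. In real training, s_x and s_q would be trainable scalars updated by gradient descent alongside the network; here they are plain floats.

```python
import math


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize_quaternion(q):
    """Scale a 4-vector to unit length so it represents a valid rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def learnable_weight_pose_loss(pred_xyz, pred_q, true_xyz, true_q, s_x, s_q):
    """Pose loss with learnable weights (assumed Kendall & Cipolla 2017 form).

    loss = L_x * exp(-s_x) + s_x + L_q * exp(-s_q) + s_q
    Assumes true_q is already unit-norm; pred_q is normalized here.
    s_x and s_q are hypothetical names for the learnable log-scale weights.
    """
    loss_x = l1_distance(pred_xyz, true_xyz)
    loss_q = l1_distance(normalize_quaternion(pred_q), true_q)
    return loss_x * math.exp(-s_x) + s_x + loss_q * math.exp(-s_q) + s_q


if __name__ == "__main__":
    # Perfect prediction: both distance terms vanish, leaving s_x + s_q.
    print(learnable_weight_pose_loss([1, 2, 3], [1, 0, 0, 0],
                                     [1, 2, 3], [1, 0, 0, 0], 0.0, -3.0))
```

Setting the derivative with respect to s_x to zero gives exp(s_x) = L_x, so each weight drifts toward the log of its term's typical magnitude. This is what lets the two terms, measured in position units and in quaternion units, be balanced without hand-tuning a fixed scale factor between them.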

English Abstract

Visual localization regresses a camera's position and orientation from images and has many applications in computer vision, such as autonomous driving, augmented reality (AR), and virtual reality (VR). Convolutional neural networks mimic biological vision and extract image features well, so using them for visual localization can improve regression accuracy. Although our team has previously built an image-based indoor localization model for the Southern Branch of the National Palace Museum, better networks and loss functions have since been proposed, so this paper tries a newer network and loss function to achieve better positioning accuracy. We use ResNet-50 as the backbone network, change the output layer to a 3-dimensional position and a 4-dimensional orientation quaternion, and use a loss function with learnable weights. Using the previously collected Southern Branch dataset, we compare different pretrained models and normalization methods; the best result improves positioning accuracy by about 60%.
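The record does not say which metric the roughly 60% improvement is measured in. A common convention in the camera-relocalization literature, for example PoseNet (reference 6), is to report the position error (Euclidean distance) and the orientation error (rotation angle between predicted and true quaternions), usually as medians over the test set. The sketch below assumes that convention, a quaternion ordering of (w, x, y, z), and a 7-value output laid out as 3 position values followed by 4 quaternion values; all function names are hypothetical.

```python
import math
import statistics


def split_pose_output(output):
    """Split a 7-value network output into (x, y, z) and a unit quaternion (w, x, y, z)."""
    if len(output) != 7:
        raise ValueError("expected 7 values: 3 position + 4 quaternion")
    xyz = list(output[:3])
    q = list(output[3:])
    norm = math.sqrt(sum(c * c for c in q))
    return xyz, [c / norm for c in q]


def position_error(pred_xyz, true_xyz):
    """Euclidean distance between predicted and true camera positions."""
    return math.dist(pred_xyz, true_xyz)


def orientation_error_deg(pred_q, true_q):
    """Rotation angle in degrees between two unit quaternions.

    abs() treats q and -q as the same rotation; min() guards against
    rounding that pushes the dot product slightly above 1.
    """
    dot = min(1.0, abs(sum(a * b for a, b in zip(pred_q, true_q))))
    return math.degrees(2.0 * math.acos(dot))


def median_errors(outputs, true_poses):
    """Median position and orientation errors over a test set.

    outputs: list of 7-value network outputs.
    true_poses: list of (xyz, unit_quaternion) ground-truth pairs.
    """
    pos_errs, ang_errs = [], []
    for out, (true_xyz, true_q) in zip(outputs, true_poses):
        xyz, q = split_pose_output(out)
        pos_errs.append(position_error(xyz, true_xyz))
        ang_errs.append(orientation_error_deg(q, true_q))
    return statistics.median(pos_errs), statistics.median(ang_errs)
```

Normalizing the quaternion part before measuring is needed because a regression head outputs an arbitrary 4-vector, and only unit quaternions correspond to rotations.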

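Subject Classification: Engineering > Civil and Architectural Engineering
Engineering > Hydraulic Engineering
Engineering > Municipal and Environmental Engineering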
References
  1. Brahmbhatt, S., Gu, J., Kim, K., Hays, J., Kautz, J. (2018). Geometry-aware learning of maps for camera localization. IEEE Conference on Computer Vision and Pattern Recognition.
  2. Clark, R., Wang, S., Markham, A., Trigoni, N., Wen, H. (2017). VidLoc: A deep spatio-temporal model for 6-DoF video-clip relocalization. IEEE Conference on Computer Vision and Pattern Recognition.
  3. He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  4. Kendall, A., Cipolla, R. (2016). Modelling uncertainty in deep learning for camera relocalization. 2016 IEEE International Conference on Robotics and Automation (ICRA).
  5. Kendall, A., Cipolla, R. (2017). Geometric loss functions for camera pose regression with deep learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  6. Kendall, A., Grimes, M., Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DoF camera relocalization. Proceedings of the IEEE International Conference on Computer Vision.
  7. Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
  8. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
  9. Lu, E. H. C., Ciou, J. M. (2020). Integration of convolutional neural network and error correction for indoor positioning. ISPRS International Journal of Geo-Information, 9(2), 74.
  10. Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E. (2017). Image-based localization using hourglass networks. IEEE International Conference on Computer Vision Workshops.
  11. Naseer, T., Burgard, W. (2017). Deep regression for monocular camera-based 6-DoF global localization in outdoor environments. IEEE/RSJ International Conference on Intelligent Robots and Systems.
  12. Radwan, N., Valada, A., Burgard, W. (2018). VLocNet++: Deep multitask learning for semantic visual localization and odometry. IEEE Robotics and Automation Letters, 3(4), 4407-4414.
  13. Simonyan, K., Zisserman, A. (2014). Unpublished.
  14. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  15. Valada, A., Radwan, N., Burgard, W. (2018). Deep auxiliary learning for visual localization and odometry. IEEE International Conference on Robotics and Automation.
  16. Walch, F., Hazirbas, C., Leal-Taixé, L., Sattler, T., Hilsenbeck, S., Cremers, D. (2017). Image-based localization using LSTMs for structured feature correlation. IEEE International Conference on Computer Vision.
  17. Wang, S., Clark, R., Wen, H., Trigoni, N. (2017). DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. IEEE International Conference on Robotics and Automation.
  18. Wu, J., Ma, L., Hu, X. (2017). Delving deeper into convolutional neural networks for camera relocalization. IEEE International Conference on Robotics and Automation.