Title

以多視圖序列學習作基於圖像之三維模型跨域搜索

Parallel Title

Cross-Domain Image-Based 3D Shape Retrieval by View Sequence Learning

DOI

10.6342/NTU201703061

Author

李唐

Keywords

三維模型 ; 卷積神經網路 ; 三元神經網路 ; 跨域度量學習 ; 3D Shape ; Convolutional Neural Network ; Triplet Neural Network ; Cross-Domain Metric Learning

Publication Name

National Taiwan University, Department of Electrical Engineering, Theses

Volume and Issue / Publication Date

2017

Degree

Master's

Advisor

徐宏民

Language of Content

English

Chinese Abstract (translated)

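We propose a method for cross-domain, natural-image-based 3D shape retrieval that learns a joint feature space for images and 3D shapes end to end. Retrieval is driven by the similarity between an image and a 3D shape, which is obtained from the distance between the two in that feature space. First, we propose a feature extraction method for 3D shapes called cross-view convolution (CVC). CVC combines the features of 2D views rendered from different angles of a 3D shape, according to their order, to produce an overall feature for the shape. To narrow the domain gap between 2D natural-image features and 3D shape features, we propose the cross-domain triplet neural network (CDTNN). This model adds a transformation layer to the neural network so that image features, once transformed, can be compared directly with 3D shape features. The model can be trained end to end. Finally, we propose an accelerated version of CDTNN training that greatly reduces training time. To test the model's effectiveness, we built a large dataset containing natural images and 3D shapes. Experimental results show that our method outperforms other current state-of-the-art methods. We also experimented with a variety of network architecture designs to reduce memory and computational resource usage.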

English Abstract

We propose a cross-domain image-based 3D shape retrieval method that learns a joint embedding space for natural images and 3D shapes in an end-to-end manner. The similarity between an image and a 3D shape is computed from their distance in this embedding space. To better encode a 3D shape, we propose a new feature aggregation method, Cross-View Convolution (CVC), which models a 3D shape as a sequence of rendered views. To bridge the gap between images and 3D shapes, we propose a Cross-Domain Triplet Neural Network (CDTNN) that incorporates an adaptation layer to better match features from the two domains and can be trained end-to-end. In addition, we speed up triplet training with a new fast cross-domain triplet neural network architecture. We evaluate our method on a new image-to-3D-shape dataset. Experimental results demonstrate that our method outperforms state-of-the-art approaches in retrieval performance. We also provide an in-depth analysis of various design choices that further reduce memory and computational cost.

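Subject Classification College of Electrical Engineering and Computer Science > Department of Electrical Engineering
Engineering > Electrical Engineering
References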
  1. [1] M. Allen, L. Girod, R. Newton, S. Madden, D. T. Blumstein, and D. Estrin. Voxnet: An interactive, rapidly-deployable acoustic monitoring platform. In IPSN, 2008.
  2. [3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
  3. [4] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In ECCV, 2016.
  4. [6] R. Girdhar, D. F. Fouhey, M. Rodriguez, and A. Gupta. Learning a predictable and generative vector representation for objects. In ECCV, 2016.
  5. [8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  6. [9] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck, M. Aono, M. Burtscher, Q. Chen, N. K. Chowdhury, B. Fang, et al. A comparison of 3d shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
  7. [11] Y. Li, H. Su, C. R. Qi, N. Fish, D. Cohen-Or, and L. J. Guibas. Joint embeddings of shapes and images via cnn image purification. ACM Transactions on Graphics, 2015.
  8. [12] J. J. Lim, H. Pirsiavash, and A. Torralba. Parsing ikea objects: Fine pose estimation. In ICCV, 2013.
  9. [13] F. Massa, B. Russell, and M. Aubry. Deep exemplar 2d-3d detection by adapting from real to rendered views. In CVPR, 2016.
  10. [14] B. T. Phong. Illumination for computer generated pictures. Communications of the ACM, 1975.
  11. [15] C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view cnns for object classification on 3d data. arXiv preprint arXiv:1604.03265, 2016.
  12. [16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. IJCV, 2015.
  13. [18] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In CVPR, 2015.
  14. [19] B. Shi, S. Bai, Z. Zhou, and X. Bai. Deeppano: Deep panoramic representation for 3d shape recognition. IEEE Signal Processing Letters, 2015.
  15. [20] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In ICCV, 2015.
  16. [21] F. P. Tasse and N. Dodgson. Shape2vec: semantic-based descriptors for 3d shapes, sketches and images. TOG, 2016.
  17. [22] F. Wang, L. Kang, and Y. Li. Sketch-based 3d shape retrieval using convolutional neural networks. In CVPR, 2015.
  18. [23] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In CVPR, 2014.
  19. [24] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In CVPR, 2015.
  20. [25] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond pascal: A benchmark for 3d object detection in the wild. In WACV, 2014.
  21. [2] B. Amos, B. Ludwiczuk, and M. Satyanarayanan. Openface: A general-purpose face recognition library with mobile applications. Technical report, CMU-CS-16-118, CMU School of Computer Science, 2016.
  22. [5] M. A. et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems, 2015.
  23. [7] E. Hoffer and N. Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, 2015.
  24. [10] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck, M. Aono, M. Burtscher, H. Fu, T. Furuya, H. Johan, et al. Shrec’14 track: extended large scale sketch-based 3d shape retrieval. In Eurographics workshop on 3D object retrieval, volume 2014, 2014.
  25. [17] A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.
  26. [26] Q. Yu, F. Liu, Y.-Z. Song, T. Xiang, T. M. Hospedales, and C. C. Loy. Sketch me that shoe. In CVPR, 2016.