Title

利用立體相機之三維互動使用者介面之演算法與硬體架構設計

Parallel Title

Algorithm and Architecture Design of 3D Interactive User Interface by Stereo Camera

DOI

10.6342/NTU.2013.00124

Author

柯政遠 (Cheng-Yuan Ko)

Keywords

3D user interface; distance estimation; user interface; 3DUI

Source

Theses of the Graduate Institute of Electronics Engineering, National Taiwan University

Volume/Publication Date

2013

Degree

Master's

Advisor

陳良基 (Liang-Gee Chen)

Language

English

Chinese Abstract

Digital video technology plays an important role in daily life. As display technology evolves, displays offer ever-better viewing quality, and stereoscopic displays give users a better viewing experience than conventional flat displays. Stereoscopic imaging enriches the content of many applications, such as TV broadcasting, movies, gaming, photography, and education. Now that stereoscopic imagery is so realistic, people are no longer satisfied with merely watching stereoscopic video; users want to interact with these lifelike virtual 3D images, for example by throwing, touching, or pushing them.

In this thesis, we propose the concept of "virtual touch" interaction using a stereo camera. The common interaction style today is for the user to make specific hand or body gestures in front of a TV or other device; the system recognizes the gesture and produces the corresponding response. A large body of such research already exists, and in our view it functions more like a replacement for the remote control. We propose a stereo-camera-based 3D interactive user interface that detects the user's distance and the hand's distance. When the 3D position of the user's hand coincides with that of a stereoscopic virtual object, the system judges that the user has achieved a virtual touch, then recognizes the user's operation and produces the corresponding virtual-touch response. The 3D interactive user interface is discussed in two parts: calibration-free user distance estimation, and 3D hand localization using belief propagation.

Calibration-free user distance estimation is the first step of the 3D interactive user interface. The main idea is to treat the user as an object and, from the left and right images captured by the stereo camera, compute a disparity that represents the user. From this disparity, the user's distance can be computed.

3D hand localization using belief propagation is the other part of the interface. With only the user's distance, the system supports only very simple interaction. Because the hand is the most intuitive and effective way for humans to interact with machines, the system must obtain the 3D position of the hand so that the user can perform more complex or precise interactions. We use depth and color information to achieve 3D hand localization and to recognize a few simple gestures.

We also propose a three-stage pipelined hardware architecture. Implementation results show that, at an operating frequency of 200 MHz, the architecture achieves real-time performance of 30 fps with 1080p left and right input images.

English Abstract

Digital video technology has played an important role in our daily life. With the evolution of display technologies, display systems can provide higher visual quality to enrich human life. Immersive 3D displays provide a better visual experience than conventional 2D displays, and 3D technology enriches the content of many applications, such as broadcasting, movies, gaming, photography, camcorders, and education. Now that stereoscopic displays are mature and the imagery is realistic, users want to interact with three-dimensional virtual objects, for example by slapping, sliding, or throwing them.

In this thesis, we propose "virtual touch" interaction using a stereo camera. The common interactive approach is for the user to perform hand or body gestures in front of a TV or other device; the system then recognizes the gesture and produces the corresponding reaction. This kind of research is already quite mature, and its function resembles that of a remote control. We instead propose a 3D interactive user interface based on a stereo camera that can detect the locations of the user's body and hand. When the position of the user's hand is consistent with the position of a virtual object, the system considers the "virtual touch" achieved, recognizes the user's operation, and gives the corresponding virtual-touch reaction. The interface is discussed in two parts: distance estimation from calibration-free captures, and 3D hand localization using belief propagation.

Distance estimation from calibration-free captures is the first step of the 3D interactive user interface. The main concept is to treat the user as an object: from the left and right captures of the stereo camera, we compute the disparity of the user, from which the user's distance can be estimated.

3D hand localization using belief propagation is the other part of the interface. With only the user's distance, the system can support only simple interaction. Because hand gestures are among the most intuitive and natural ways for people to communicate with machines, the system must obtain the 3D location of the user's hand so that the user can perform more complex or precise control. We use only depth and color information to localize the hand in 3D and to recognize a few simple gestures.

We also propose a three-stage pipelined hardware architecture. Implementation results show that the architecture achieves real-time interaction with Full-HD 1080p stereo input at 30 fps when operating at 200 MHz.
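
The abstract does not spell out how "consistent" hand and object positions are decided; the following is a minimal sketch of one plausible trigger test, assuming a simple Euclidean tolerance (the function name and the radius value are hypothetical, not from the thesis):

```python
import numpy as np

def virtual_touch(hand_xyz, object_xyz, radius=0.05):
    """Return True when the hand's 3D position agrees with the virtual
    object's 3D position to within `radius` metres (hypothetical tolerance)."""
    hand = np.asarray(hand_xyz, dtype=float)
    obj = np.asarray(object_xyz, dtype=float)
    return bool(np.linalg.norm(hand - obj) < radius)
```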
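
The distance-from-disparity step rests on standard stereo triangulation for a rectified pair: Z = f·b/d, with focal length f in pixels, baseline b in metres, and disparity d in pixels. A sketch with illustrative camera parameters (not the thesis camera's calibration):

```python
def distance_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.06):
    """Triangulation for a rectified stereo pair: Z = f * b / d.
    focal_px and baseline_m are illustrative values, not calibrated ones."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. a 21 px disparity with these parameters puts the user at ~2 m
print(distance_from_disparity(21.0))  # 2.0
```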
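
The abstract names belief propagation for the depth estimation behind hand localization but not its exact formulation; below is a generic min-sum loopy BP stereo sketch on a 4-connected grid, as a software reference only (the thesis implements a hardware design, and all parameter values here are illustrative):

```python
import numpy as np

def abs_diff_cost(left, right, D=16, tau=20.0):
    """Truncated absolute-difference data term: the right image is shifted by
    each candidate disparity and compared against the left image."""
    H, W = left.shape
    cost = np.empty((H, W, D))
    for disp in range(D):
        shifted = np.roll(right.astype(float), disp, axis=1)
        cost[..., disp] = np.minimum(np.abs(left.astype(float) - shifted), tau)
    return cost

def bp_disparity(data_cost, iters=10, lam=1.0, trunc=4.0):
    """Min-sum loopy belief propagation on a 4-connected pixel grid.
    data_cost: (H, W, D) matching cost per pixel and disparity label.
    Returns an (H, W) winner-take-all disparity map."""
    H, W, D = data_cost.shape
    d = np.arange(D)
    V = lam * np.minimum(np.abs(d[:, None] - d[None, :]), trunc)  # smoothness term

    # m_r[y, x] is the message pixel (y, x) sends to its right neighbour, etc.
    m_r = np.zeros((H, W, D)); m_l = np.zeros_like(m_r)
    m_d = np.zeros_like(m_r);  m_u = np.zeros_like(m_r)

    def incoming(mr, ml, md, mu):
        # the message arriving from the left neighbour is m_r shifted right, etc.
        in_l = np.roll(mr, 1, axis=1);  in_l[:, 0] = 0
        in_r = np.roll(ml, -1, axis=1); in_r[:, -1] = 0
        in_u = np.roll(md, 1, axis=0);  in_u[0, :] = 0
        in_d = np.roll(mu, -1, axis=0); in_d[-1, :] = 0
        return in_l, in_r, in_u, in_d

    for _ in range(iters):
        in_l, in_r, in_u, in_d = incoming(m_r, m_l, m_d, m_u)
        total = data_cost + in_l + in_r + in_u + in_d

        def msg(exclude):
            h = total - exclude                        # drop the target's own reply
            m = (h[..., None, :] + V).min(axis=-1)     # minimise over source label
            return m - m.mean(axis=-1, keepdims=True)  # normalise for stability

        m_r, m_l = msg(in_r), msg(in_l)
        m_d, m_u = msg(in_d), msg(in_u)

    in_l, in_r, in_u, in_d = incoming(m_r, m_l, m_d, m_u)
    return (data_cost + in_l + in_r + in_u + in_d).argmin(axis=-1)
```

For the interface described above, the resulting disparity map would then be fused with color (skin) cues to isolate the hand; that fusion step is not sketched here.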
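
A back-of-the-envelope check of the reported real-time figure: at 200 MHz, one 1080p view at 30 fps leaves roughly 3.2 clock cycles per pixel (about 1.6 per pixel across the stereo pair), the kind of budget that motivates a pipelined architecture, assuming roughly one pixel accepted per pipeline beat:

```python
# pixel-rate budget for Full-HD stereo at 30 fps and a 200 MHz clock
clock_hz = 200e6                 # reported operating frequency
pixels_per_frame = 1920 * 1080   # one Full-HD view
fps = 30
cycles_per_pixel = clock_hz / (pixels_per_frame * fps)
print(f"{cycles_per_pixel:.2f} cycles/pixel per view")  # ~3.21; ~1.61 for the pair
```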

Subject Classification: College of Electrical Engineering and Computer Science > Graduate Institute of Electronics Engineering
Engineering > Electrical Engineering