Abstract
Digital video technology plays an important role in our daily lives. As display technologies evolve, display systems provide higher visual quality that enriches human life. Immersive 3D displays offer a better visual experience than conventional 2D displays, and 3D technology enriches the content of many applications, such as broadcasting, movies, gaming, photography, camcorders, and education. Now that stereoscopic displays have matured and their images have become quite realistic, users will want to interact with three-dimensional virtual objects directly, for example by slapping, sliding, or throwing them.
In this thesis, we propose a “virtual touch” interaction based on a stereo camera. In the common interaction paradigm, the user performs a hand or body gesture in front of a TV or another device; the system recognizes the gesture and triggers the reaction associated with it. This kind of research is already quite mature, and its function resembles that of a remote control. A “virtual touch”, by contrast, means interacting directly with three-dimensional virtual objects, for example by slapping, sliding, or throwing them. We propose a 3D interactive user interface that uses a stereo camera to detect the locations of the user's hand and body. When the position of the user's hand coincides with the position of a virtual object, the system considers that the user has achieved a “virtual touch”; it then recognizes the user's operation and responds to it, giving the user a “virtual touch” interaction. The stereo-camera-based 3D interactive user interface is discussed in two parts: distance estimation from calibration-free captures, and 3D hand localization using belief propagation.
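The touch test itself reduces to comparing two 3D positions. The sketch below is a minimal illustration, assuming each virtual object is approximated by a sphere in the same metric coordinate frame as the localized hand; `VirtualObject`, `is_virtual_touch`, and the 5 cm `tolerance` are hypothetical names and values, not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualObject:
    # Hypothetical model: the object is approximated by a sphere whose centre
    # is given in the same metric frame (metres) as the localized hand.
    center: tuple[float, float, float]
    radius: float


def is_virtual_touch(hand: tuple[float, float, float],
                     obj: VirtualObject,
                     tolerance: float = 0.05) -> bool:
    """Return True when the hand lies inside the object's sphere enlarged by
    `tolerance`, an assumed margin that absorbs hand-localization error."""
    return math.dist(hand, obj.center) <= obj.radius + tolerance


# Usage: a ball 1 m in front of the viewer, hand 12 cm from its centre.
ball = VirtualObject(center=(0.0, 0.0, 1.0), radius=0.1)
print(is_virtual_touch((0.0, 0.0, 1.12), ball))  # -> True
```

A sphere keeps the test to a single distance comparison per object; objects with other shapes would need a different containment test.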
Distance estimation from calibration-free captures is the first step of the 3D interactive user interface. The main idea is to treat the user as a single object, compute the user's disparity between the left and right captures of the stereo camera, and finally estimate the user's distance from that disparity.
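For rectified stereo captures with parallel optical axes, the pinhole model gives Z = f·B/d, where f is the focal length in pixels, B is the baseline, and d is the disparity. The sketch below assumes that the user's per-pixel disparities are already available and summarizes them with a median. As one possible way to avoid full calibration, it fits the single constant k = f·B from one capture at a known distance. This fitting step and all function names are illustrative assumptions, not the thesis's exact procedure.

```python
import statistics


def user_disparity(disparities: list[float]) -> float:
    """Summarize the per-pixel disparities (pixels) measured on the user
    region by their median, which is robust to occasional mismatches.
    Non-positive values are treated as invalid matches and skipped."""
    valid = [d for d in disparities if d > 0]
    if not valid:
        raise ValueError("no valid disparity on the user region")
    return statistics.median(valid)


def fit_scale(ref_distance: float, ref_disparity: float) -> float:
    """Rectified, parallel stereo: Z = f * B / d, so k = f * B = Z * d.
    Assumption: k is fitted from one capture at a known distance instead of
    calibrating f and B separately."""
    return ref_distance * ref_disparity


def estimate_distance(k: float, disparity: float) -> float:
    """Distance in the unit of the reference distance (here metres)."""
    return k / disparity


k = fit_scale(ref_distance=2.0, ref_disparity=40)  # k = 80
d = user_disparity([0, 49, 50, 51])                # median of valid values: 50
print(estimate_distance(k, d))                     # -> 1.6
```

Because Z is inversely proportional to d, a one-pixel disparity error shifts the estimate more for distant users than for near ones.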
3D hand localization using belief propagation is the other part of the interactive 3D user interface. Knowing only the user's distance from the system allows only simple interactions. Hand gestures are one of the most intuitive and natural ways for people to communicate with machines. The system must therefore obtain the 3D location of the user's hand so that the user can perform more complex control of, or interaction with, the system. We use only depth and color information to obtain the hand's 3D location, and we perform simple gesture recognition to determine the system's reaction.
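As a rough illustration of fusing color and depth, the sketch below keeps skin-colored pixels that have valid depth and treats the nearest of them as the hand. It assumes that an outstretched hand is the skin region closest to the camera. The YCbCr skin thresholds, the 10 cm depth `band`, and the function names are assumptions for illustration. The sketch does not implement the belief-propagation stage described in the thesis.

```python
def is_skin(r: int, g: int, b: int) -> bool:
    # Commonly used YCbCr skin heuristic (assumed thresholds, not from the thesis).
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def locate_hand(rgb, depth, band: float = 0.10):
    """rgb: rows of (r, g, b) tuples; depth: rows of distances in metres,
    with 0 marking invalid depth. Returns the hand centroid (x, y, z) or None.
    Assumption: the hand is the skin-colored region nearest to the camera."""
    candidates = [
        (x, y, depth[y][x])
        for y, row in enumerate(rgb)
        for x, (r, g, b) in enumerate(row)
        if depth[y][x] > 0 and is_skin(r, g, b)
    ]
    if not candidates:
        return None
    nearest = min(z for _, _, z in candidates)
    # Keep skin pixels within `band` metres of the nearest one.
    hand = [c for c in candidates if c[2] <= nearest + band]
    n = len(hand)
    return (sum(c[0] for c in hand) / n,
            sum(c[1] for c in hand) / n,
            sum(c[2] for c in hand) / n)
```

The returned (x, y) is in pixel coordinates and z is in metres. A real system would convert (x, y) to metric coordinates before comparing the hand with virtual objects.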
We also propose a three-stage hardware architecture. The implementation results show that it achieves real-time interaction for Full-HD 1080p@30fps stereo input when operating at 200 MHz.
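A quick budget shows what the 200 MHz figure implies. Assuming both 1920×1080 views are processed at 30 fps and blanking intervals are ignored, the input rate is 1920 × 1080 × 30 × 2 ≈ 124.4 Mpixel/s. That leaves roughly 1.6 clock cycles per pixel, or about 3.2 per pixel of each view. This arithmetic is our own illustration, not a figure reported in the thesis.

```python
def cycles_per_pixel(width: int, height: int, fps: float,
                     clock_hz: float, views: int = 2) -> float:
    """Clock cycles available per input pixel when `views` frames of
    width x height must all be processed within each frame period.
    Assumption: only active pixels count; blanking intervals are ignored."""
    pixel_rate = width * height * fps * views  # pixels per second
    return clock_hz / pixel_rate


budget = cycles_per_pixel(1920, 1080, 30, 200e6)
print(f"{budget:.2f} cycles per pixel")  # -> 1.61 cycles per pixel
```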