Airiti Academic Literature Database

Title	視訊辨識技術應用於智慧型監控系統之研究
Parallel title	Vision Sensing Techniques for Intelligent Surveillance System
DOI	10.6342/NTU201603459
Author	陳宣輯
Keywords	Computer Vision; Intelligent Surveillance System; Object Detection; Person Re-Identification; Face Alignment; Face Recognition; Camera Tampering Detection; Abandoned Luggage Detection; Intelligent Visualization
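Journal	Degree Theses of the Department of Computer Science and Information Engineering, National Taiwan University
Volume/issue, publication date	2016
Degree	Doctoral
Advisor	洪一平 (Yi-Ping Hung)
Language	English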
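Chinese abstract	With the development of intelligent surveillance systems, image analysis and recognition have become their most important core technologies. Aiming to build a comprehensive intelligent surveillance system, this research proposes several front-end image recognition techniques: camera interference detection, pedestrian and object detection, facial landmark localization, abandoned object detection, pedestrian re-identification, and an intelligent surveillance human-machine interface. An intelligent video surveillance system takes cameras as its primary signal input and achieves automatic monitoring through computer vision, so protecting the cameras is the first task. We propose a real-time camera interference detection technique that determines from the input video whether a camera has been deliberately covered, turned, defocused, disconnected, or otherwise sabotaged. The method detects key points in the image and monitors their changes; it has low computational cost and gives stable results on several real test videos, with a low false alarm rate and high accuracy. With the cameras secured, we exploit a property of fixed-camera scenes, namely that pedestrians at a given location in the camera view show consistent features. By automatically sampling the pedestrians and objects seen by the camera, we learn their salient features and train multiple region-specific refined pedestrian detectors, each responsible for a local area of the view. Experimental comparison with current single-detector methods shows that our method substantially improves object detection accuracy. We also propose a pedestrian re-identification technique: given a photo of a suspect taken by a camera, it combines local and holistic appearance features to achieve high matching accuracy and quickly finds matches among all pedestrian data in the camera network. Besides pedestrian appearance, facial information is an indispensable visual cue. We propose a facial landmark alignment technique that uses face training images with depth information to build a 3D face model offline and fits it to 2D images at detection time. Comparison with current methods that use only 2D information shows that the 3D model makes alignment more accurate. Moreover, because we have a 3D face model, we obtain the face rotation directly after alignment, which gives the surveillance system more information about pedestrians; for example, in an intelligent people-flow analysis system, face angle and walking path can be used to estimate the areas and products a pedestrian is paying attention to. Beyond pedestrian detection and recognition, we also analyze human behavior under the camera, taking abandoned object detection as an example. We build a foreground/background finite-state machine at every pixel and analyze the pixel's state transitions to decide whether a stationary foreground object has appeared in the scene. To analyze the abandonment event fully, we back-trace the trajectories of moving objects over a preceding period and verify that the owner has actually left the object, which reduces false alarms. The method outperforms related work on two public benchmark datasets (PETS2006 and AVSS2007) and on the NTU dataset. Finally, based on these core technologies, we propose two advanced human-machine display methods that let operators quickly understand, view, and search all pedestrians and events in a multi-camera network.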
English abstract	With the development of intelligent surveillance systems, video analysis and recognition have become the core technologies of the field. To construct a more intelligent surveillance system, this research proposes a number of video recognition techniques, including camera interference/tampering detection, pedestrian detection, abandoned luggage detection, pedestrian re-identification, and an intelligent visualization interface. Video surveillance uses cameras as the primary input sensor to achieve automatic monitoring, so protecting the camera is the top priority. We propose a real-time camera sabotage/tampering detection technique that analyzes the video to quickly detect whether a camera has been deliberately covered, redirected, defocused, disconnected, or otherwise damaged. We first locate key points whose appearance is relatively stable; monitoring changes in these key points and in the scene structure detects tampering events precisely and efficiently. Our method has a lower computational cost and achieves higher stability and accuracy than existing methods. With the cameras protected, we propose scene-specific pedestrian detection and object classification. Our approach is location-based: it discovers scene-dependent discriminative features for identifying foreground objects of different categories (e.g., pedestrians, bicycles, and vehicles). We incorporate a similarity grouping procedure that gathers more consistent training examples from a considerably larger neighboring area, and we train a specific pedestrian detector for each grouped local area. Our approach significantly improves detection and classification compared with traditional generic object detectors and classifiers.
We also propose an ensemble of invariant features (EIF) that properly handles color variations and changes in human pose and viewpoint when matching pedestrian images observed by different cameras. Our method is a direct method, which requires no domain learning. The features combine holistic and region-based features. The holistic features are extracted with a publicly available pre-trained deep convolutional neural network (DCNN) designed for generic object classification. The region-based features are extracted with our proposed two-way Gaussian mixture model fitting (2WGMMF), which overcomes self-occlusion and pose variations. In addition to appearance features, face information is indispensable in video surveillance. We propose a 3D face alignment algorithm for 2D images based on the Active Shape Model. We train a 3D shape model offline, with different view-based local texture models, from a 3D database, and then fit these models online to a face in a 2D image. This method adds depth information to the traditional 2D image alignment problem and yields a promising improvement over existing model-based and regression-based approaches. Since human poses and gaze directions are especially valuable to a surveillance system, the head pose can be estimated directly from the alignment result of the proposed 3D model. Building on robust pedestrian detection and re-identification, we also address event detection in surveillance cameras. We take abandoned luggage detection as an example, since it is one of the most critical and challenging problems in video surveillance. We propose a complementary background model that combines short- and long-term background models to classify each pixel with a 2-bit code, where each bit indicates foreground or background.
Subsequently, we introduce a finite-state machine framework that identifies static foreground regions based on the temporal transitions of code patterns and determines whether a selected area contains an abandoned object by analyzing the back-traced trajectories of luggage owners. Experiments were conducted on video from the 2006 Performance Evaluation of Tracking and Surveillance (PETS2006) and 2007 Advanced Video and Signal-based Surveillance (AVSS2007) databases and on an NTU data set that we collected ourselves. The results show that the proposed approach is effective at detecting abandoned luggage and outperforms previous methods. Finally, based on the above core technologies, we propose two advanced visualization interfaces that help operators quickly observe and search for pedestrians and incidents within a camera network.
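
The 2-bit pixel code and the finite-state machine described in the abstract can be illustrated with a minimal single-pixel sketch. The abstract does not specify the background model, the bit order, or the transition rule, so everything concrete here is an assumption made for illustration: running-average backgrounds, the learning rates and threshold, the convention that the high bit is the long-term model, and the rule that a pixel becomes static foreground after a "moving" code (11) is followed by several consecutive "long-term foreground only" codes (10). The names `PixelModel`, `StaticForegroundFSM`, and `run_demo` are hypothetical, and the owner back-tracing step is not included.

```python
from dataclasses import dataclass

MOVING = 0b11     # foreground in both models (assumed bit order: long, short)
CANDIDATE = 0b10  # foreground in the long-term model only


@dataclass
class PixelModel:
    """Short- and long-term running-average backgrounds for one pixel."""
    short_bg: float
    long_bg: float
    fast_rate: float = 0.5   # hypothetical learning rates and threshold
    slow_rate: float = 0.01
    threshold: float = 30.0

    def code(self, value: float) -> int:
        """Return the 2-bit code for this frame, then update both models."""
        long_fg = abs(value - self.long_bg) > self.threshold
        short_fg = abs(value - self.short_bg) > self.threshold
        self.short_bg += self.fast_rate * (value - self.short_bg)
        self.long_bg += self.slow_rate * (value - self.long_bg)
        return (int(long_fg) << 1) | int(short_fg)


@dataclass
class StaticForegroundFSM:
    """Flags a pixel once MOVING is followed by enough CANDIDATE codes."""
    min_static_frames: int = 3
    was_moving: bool = False
    static_count: int = 0

    def step(self, code: int) -> bool:
        # Count a CANDIDATE code only if it continues a run that began
        # right after a MOVING code; any other code resets the run.
        if code == CANDIDATE and (self.was_moving or self.static_count > 0):
            self.static_count += 1
        else:
            self.static_count = 0
        self.was_moving = code == MOVING
        return self.static_count >= self.min_static_frames


def run_demo():
    """A bright object (200) is left on a dark (0) pixel for six frames."""
    model = PixelModel(short_bg=0.0, long_bg=0.0)
    fsm = StaticForegroundFSM()
    codes, flags = [], []
    for value in [200.0] * 6:
        code = model.code(value)
        codes.append(code)
        flags.append(fsm.step(code))
    return codes, flags


if __name__ == "__main__":
    codes, flags = run_demo()
    print(codes)
    print(flags)
```

In this toy run the short-term model absorbs the object after three frames while the long-term model still reports foreground, so the code sequence moves from 11 to 10 and the pixel is flagged on the sixth frame. A full system would run this per pixel, group flagged pixels into regions, and then verify the owner's trajectory as the abstract describes.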
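Subject classification	Basic and Applied Sciences > Information Science; College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering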
参考文献	[2] Yu Su Bingpeng Ma and Frederic Jurie. Covariance descriptor based on bio-inspired features for person re-identification and face verification. Image and Vision Computing, 32(6):379–390, 2014. 連結： [5] Douglas Gray and Hai Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In Proc. of the European Conference on Computer Vision (ECCV), pages 262–275, Marseille, France, 2008. 連結： [6] Davide Baltieri, Roberto Vezzani, and Rita Cucchiara. Mapping appearance descriptors on 3d body models for people re-identification. International Journal of Computer Vision (IJCV), 111(3):345–364, 2015. 連結： [7] Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. Deepreid: Deep filter pairing neural network for person re-identification. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 152–159, Columbus, OH, 2014. 連結： [9] Fatih Porikli, Yuri Ivanov, and Tetsuji Haga. Robust abandoned object detection using dual foregrounds. EURASIP Journal on Advances in Signal Processing, 2008:30, 2008. 連結： [11] Markus Enzweiler and Darieu M Gavrila. Monocular pedestrian detection: Survey and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 31(12):2179–2195, 2009. 連結： [12] Apurva Bedagkar-Gala and Shishir K Shah. A survey of approaches and trends in person re-identification. Image and Vision Computing, 32(4):270–286, 2014. 連結： [15] Nigel JB McFarlane and C Paddy Schofield. Segmentation and tracking of piglets in images. Machine Vision and Applications, 8(3):187–193, 1995. 連結： [16] Chikahito Nakajima, Massimiliano Pontil, Bernd Heisele, and Tomaso Poggio. Full-body person recognition system. Pattern Recognition, 36(9): 1997–2006, 2003. 連結： [19] Zhaoxiang Zhang, Kaiqi Huang, Yunhong Wang, and Min Li. View independent object classification by exploring scene consistency information for traffic scene surveillance. Neurocomputing, 99:250–260, 2013. 
連結： [21] Sabine Sternig, Peter M Roth, and Horst Bischof. On-line inverse multiple instance boosting for classifier grids. Pattern Recognition Letters, 33(7):890–897, 2012. 連結： [24] Martin Hirzer, Csaba Beleznai, Peter M Roth, and Horst Bischof. Person re-identification by descriptive and discriminative classification. In Image Analysis, pages 91–102. Springer, 2011. 連結： [25] Taiqing Wang, Shaogang Gong, Xiatian Zhu, and Shengjin Wang. Person re- identification by video ranking. In Proc. of the European Conference on Computer Vision (ECCV), pages 688–703, Zurich, Switzerland, 2014. 連結： [31] Alexis Mignon and Frédéric Jurie. Pcca: A new approach for distance learning from sparse pairwise constraints. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2666–2672, Providence, RI, 2012. 連結： [33] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. Active shape models-their training and application. Computer Vision and Image Understanding (CVIU), 61(1):38–59, 1995. 連結： [34] Timothy F Cootes, Gareth J Edwards, and Christopher J Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 23(6):681–685, 2001. 連結： [35] David Cristinacce and Tim Cootes. Automatic feature localisation with constrained local models. Pattern Recognition, 41(10):3054–3067, 2008. 連結： [36] Jason M Saragih, Simon Lucey, and Jeffrey F Cohn. Deformable model fitting by regularized landmark mean-shift. International Journal of Computer Vision (IJCV), 91(2):200–215, 2011. 連結： [37] Yan Tong, Yang Wang, Zhiwei Zhu, and Qiang Ji. Robust facial feature tracking under varying face pose and facial expression. Pattern Recognition, 40(11):3195– 3208, 2007. 連結： [39] Xiangxin Zhu and Deva Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2879–2886, Providence, RI, 2012. 
連結： [43] Xudong Cao, Yichen Wei, Fang Wen, and Jian Sun. Face alignment by explicit shape regression. International Journal of Computer Vision (IJCV), 107(2):177– 190, 2014. 連結： [47] Iain Matthews and Simon Baker. Active appearance models revisited. International Journal of Computer Vision (IJCV), 60(2):135–164, 2004. 連結： [49] Dianle Zhou, Dijana Petrovska-Delacrétaz, and Bernadette Dorizzi. 3d active shape model for automatic facial landmark location trained with automatically generated landmark points. In IEEE International Conference on Pattern Recognition (ICPR), pages 3801–3805, Istanbul, Turkey, 2010. 連結： [50] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2):91–110, 2004. 連結： [52] Jesús Martínez-del Rincón, J Elías Herrero-Jaraba, J Raúl Gómez, and Carlos Orrite-Urunuela. Automatic left luggage detection and tracking using multi-camera ukf. In Proc. of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pages 59–66, New York City, NY, 2006. 連結： [54] YingLi Tian, Rogerio Schmidt Feris, Haowei Liu, Arun Hampapur, and Ming-Ting Sun. Robust detection of abandoned and removed objects in complex surveillance videos. IEEE TSMC Part C, 41(5):565–576, 2011. 連結： [55] Quanfu Fan and Sharath Pankanti. Modeling of temporarily static objects for robust abandoned object detection in urban surveillance. In IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), pages 36–41, Klagenfurt, Austria, 2011. 連結： [57] Huei-Hung Liao, Jing-Ying Chang, and Liang-Gee Chen. A localized approach to abandoned luggage detection with foreground-mask sampling. In IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), pages 132–139, Santa Fe, New Mexico, 2008. 連結： [59] Fengjun Lv, Xuefeng Song, Bo Wu, Vivek Kumar Singh, and Ramakant Nevatia. Left-luggage detection using bayesian inference. In Proc. 
of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pages 83–90, New York City, NY, 2006. 連結： [61] Edouard Auvinet, Etienne Grossmann, Caroline Rougier, Mohamed Dahmane, and Jean Meunier. Left-luggage detection using homographies and simple heuristics. In IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pages 51–58, New York City, NY, 2006. 連結： [64] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 951–958, Miami, FL, 2009. 連結： [65] Liyuan Li, Weimin Huang, IY-H Gu, and Qi Tian. Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11):1459–1472, 2004. 連結： [68] Kyungnam Kim, Thanarat H Chalidabhongse, David Harwood, and Larry Davis. Real-time foreground–background segmentation using codebook model. Real-Time Imaging, 11(3):172–185, 2005. 連結： [69] Zoran Zivkovic. Improved adaptive gaussian mixture model for background subtraction. In IAPR International Conference on Pattern Recognition (ICPR), pages 28–31, Cambridge, UK, 2004. 連結： [70] Yu-Ting Chen, Chu-Song Chen, Chun-Rong Huang, and Yi-Ping Hung. Efficient hierarchical method for background subtraction. Pattern Recognition, 40(10): 2706–2715, 2007. 連結： [73] David W Scott. Multivariate density estimation: theory, practice, and visualization, volume 383. John Wiley & Sons, 2009. 連結： [74] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 23(11):1222–1239, 2001. 連結： [77] Shivani Agarwal, Aatif Awan, and Dan Roth. Learning to detect objects in images via a sparse, part-based representation. 
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(11):1475–1490, 2004. 連結： [78] Shuicheng Yan Shaogang Gong, Marco Cristani and Chen Change Loy. Person Re-Identification, volume 1. Springer, 2014. 連結： [80] Davide Baltieri, Roberto Vezzani, and Rita Cucchiara. 3dpes: 3d people dataset for surveillance and forensics. In Proc. of the Joint ACM Workshop on Human Gesture and Behavior Understanding, pages 59–64, Scottsdale, AZ, 2011. 連結： [81] Tarak Gandhi and Mohan Manubhai Trivedi. Person tracking and reidentification: Introducing panoramic appearance map (pam) for feature representation. Machine Vision and Applications, 18(3-4):207–220, 2007. 連結： [82] Omar Hamdoun, Fabien Moutarde, Bogdan Stanciulescu, and Bruno Steux. Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences. In ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pages 1–6, Stanford University, CA, 2008. 連結： [83] Weiming Hu, Min Hu, Xue Zhou, Tieniu Tan, Jianguang Lou, and Steve Maybank. Principal axis-based correspondence between multiple cameras for people tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 28(4): 663–671, 2006. 連結： [84] Douglas Gray, Shane Brennan, and Hai Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), volume 3, Rio de Janeiro, Brazil, 2007. 連結： [85] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580–587, 2014. 連結： [88] Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring mid-level image representations using convolutional neural networks. 
In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1717–1724, Columbus, OH, 2014. 連結： [93] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014. 連結： [95] Yi Sun, Yuheng Chen, Xiaogang Wang, and Xiaoou Tang. Deep learning face representation by joint identification-verification. In Proc. of the Advances in Neural Information Processing Systems (NIPS), pages 1988–1996, Montreal, Canda, 2014. 連結： [96] Kideog Jeong and Christopher Jaynes. Object matching in disjoint cameras using a color transfer approach. Machine Vision and Applications, 19(5-6):443–455, 2008. 連結： [97] Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951. 連結： [98] Geoffrey McLachlan and David Peel. Finite mixture models. John Wiley & Sons, 2004. 連結： [99] Kathryn Roeder and Larry Wasserman. Practical bayesian density estimation using mixtures of normals. Journal of the American Statistical Association, 92(439): 894–902, 1997. 連結： [100] Jonathan G Campbell, Chris Fraley, Fionn Murtagh, and Adrian E Raftery. Linear flaw detection in woven textiles using model-based clustering. Pattern Recognition Letters, 18(14):1539–1548, 1997. 連結： [101] Abhijit Dasgupta and Adrian E Raftery. Detecting features in spatial point pro- cesses with clutter via model-based clustering. Journal of the American Statistical Association, 93(441):294–302, 1998. 連結： [104] Nebojsa Jojic, Alessandro Perina, Matteo Cristani, Vittorio Murino, and Brendan Frey. Stel component analysis: Modeling spatial correlations in image class structure. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2044–2051, Miami, FL, 2009. 連結： [105] Bingpeng Ma, Yu Su, and Frédéric Jurie. 
Bicov: a novel image representation for person re-identification and face verification. In Proc. of the British Machine Vision Conference (BMVC), pages 1–11, Surrey, UK, 2012. 連結： [106] Roberto Vezzani, Costantino Grana, and Rita Cucchiara. Probabilistic people tracking with appearance models and occlusion classification: The ad-hoc system. Pattern Recognition Letters, 32(6):867–877, April 2011. 連結： [107] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 86(11):2278–2324, 1998. 連結： [113] Chun-Wei Chen and Chieh-Chih Wang. 3d active appearance model for aligning faces in 2d images. In International Conference on Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ, pages 3133–3139, Nice, France, 2008. 連結： [115] K Somani Arun, Thomas S Huang, and Steven D Blostein. Least-squares fitting of two 3-d point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), PAMI-9(5):698–700, 1987. 連結： [118] Zhengyou Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(11):1330–1334, 2000. 連結： [122] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Fluids Engineering, 82(1):35–45, 1960. 連結： [125] Anil Aksay, Alptekin Temizel, et al. Camera tamper detection using wavelet analysis for video surveillance. In IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), pages 558–562, London, UK, 2007. 連結： [126] Ali Sağlam and Alptekin Temizel. Real-time adaptive camera tamper detection for video surveillance. In IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), pages 2727–2734, Genoa, Italy, 2009. 連結： [127] Nobuyuki Otsu. A threshold selection method from gray-level histograms. Automatica, 11(285-296):23–27, 1975. 連結： [131] Advanced Video and Signal based Surveillance. 
i-lids bag and vehicle detection challenge. http://www.eecs.qmul.ac.uk/~andrea/avss2007_ d.html, 2007. 連結： [132] Kuan-Wen Chen, Chih-Chuan Lai, Pei-Jyun Lee, Chu-Song Chen, and Yi-Ping Hung. Adaptive learning for target tracking and true linking discovering across multiple non-overlapping cameras. IEEE Transactions on Multimedia, 13(4):625– 638, 2011. 連結： [134] Harpreet S Sawhney, Aydin Arpa, Rakesh Kumar, Supun Samarasekera, Manoj Aggarwal, Steve Hsu, David Nister, and K Hanna. Video flashlights: real time rendering of multiple videos for immersive model visualization. In Proc. of the ACM International Conference Proceeding Series, volume 28, pages 157–168, 2002. 連結： [135] Sven Fleck, Florian Busch, Peter Biber, and Wolfgang Straber. 3d surveillance a distributed network of smart cameras for real-time tracking and its visualization in 3d. In Workshop of the IEEE Conference on Computer Vision and Pattern Recognition (CVPRW), pages 118–118, 2006. 連結： [136] Yung-Cheng Cheng, Kai-Ying Lin, Yong-Sheng Chen, Jenn-Hwan Tarng, Chii-Yah Yuan, and Chen-Ying Kao. Accurate planar image registration for an integrated video surveillance system. In IEEE Workshop on Computational Intelligence for Visual Intelligence (CIVI), pages 37–43, 2009. 連結： [137] Ulrich Neumann, Suya You, Jinhui Hu, Bolan Jiang, and JongWeon Lee. Augmented virtual environments (ave): Dynamic fusion of imagery and 3d models. In Proc. of the IEEE Virtual Reality, pages 61–67, 2003. 連結： [138] Yi-Yuan Chen, Yung-Huang Huang, Yung-Cheng Cheng, Yong-Sheng Chen, et al.Integration of multiple views for a 3-d indoor surveillance system. INFORMATION-An InternationalInterdisciplinary Journal, 13(6): 2039–2057, 連結： [140] Kuan-Wen Chen, Chih-Wei Lin, Tzu-Hsuan Chiu, Mike Yen-Yang Chen, and Yi-Ping Hung. Multi-resolution design for large-scale and high-resolution monitoring. IEEE Transactions on Multimedia, 13(6):1256–1268, 2011. 連結： [142] Martin A Fischler and Robert C Bolles. 
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981. 連結： [143] Tianzhu Zhang, Hanqing Lu, and Stan Z Li. Learning semantic scene models by object classification and trajectory clustering. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1940–1947, Miami, FL, 2009. 連結： [1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Proc. of the Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, South Lake Tahoe, NA, 2012. [3] Igor Kviatkovsky, Amit Adam, and Ehud Rivlin. Color invariants for person reidentification. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(7):1622–1634, 2013. [4] Michela Farenzena, Loris Bazzani, Alessandro Perina, Vittorio Murino, and Marco Cristani. Person re-identification by symmetry-driven accumulation of local features. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2360–2367, San Francisco, CA, 2010. [8] Dong Seon Cheng, Marco Cristani, Michele Stoppa, Loris Bazzani, and Vittorio Murino. Custom pictorial structures for re-identification. In Proc. of the British Machine Vision Conference (BMVC), pages 6–16, Dundee, UK, 2011. [10] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 886–893, San Diego, CA, 2005. [13] Piotr Dollár, Serge Belongie, and Pietro Perona. The fastest pedestrian detector in the west. In Proc. of the British Machine Vision Conference (BMVC), volume 2, page 7, Aberystwyth, UK, 2010. [14] Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. 
In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, Anchorage, AL, 2008. [17] Meng Wang and Xiaogang Wang. Automatic adaptation of a generic pedestrian detector to a specific traffic scene. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3401–3408, Colorado Springs, CO, 2011. [18] Meng Wang, Wei Li, and Xiaogang Wang. Transferring a generic pedestrian detector towards specific scenes. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3274–3281, Providence, RI, 2012. [20] Helmut Grabner, Peter M Roth, and Horst Bischof. Is pedestrian detection really a hard task. In IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pages 1–8, Rio de Janeiro, Brazil, 2007. [22] Sabine Sternig, Peter M Roth, Helmut Grabner, and Horst Bischof. Robust adaptive classifier grids for object detection from static cameras. In Proc. of the Computer Vision Winter Workshop, volume 2009, Snowbird, UT, 2009. [23] Peter M Roth, Sabine Sternig, Helmut Grabner, and Horst Bischof. Classifier grids for robust adaptive object detection. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2727–2734, Miami, FL, 2009. [26] Bryan Prosser, Wei-Shi Zheng, Shaogang Gong, Tao Xiang, and Q Mary. Person re-identification by support vector ranking. In Proc. of the British Machine Vision Conference (BMVC), volume 2, page 6, Aberystwyth, UK, 2010. [27] Rui Zhao, Wanli Ouyang, and Xiaogang Wang. Person re-identification by salience matching. In Proc.of IEEE International Conference on Computer Vision (ICCV), pages 2528–2535, Sydney, Australia, 2013. [28] Rui Zhao, Wanli Ouyang, and Xiaogang Wang. Unsupervised salience learning for person re-identification. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3586–3593, Portland, OR, 2013. 
[29] Mert Dikmen, Emre Akbas, Thomas S Huang, and Narendra Ahuja. Pedestrian recognition with a learned metric. In Asian Conference on Computer Vision (ACCV), pages 501–512, Pondicherry, India, 2011. [30] Martin Koestinger, Martin Hirzer, Paul Wohlhart, Peter M Roth, and Horst Bischof. Large scale metric learning from equivalence constraints. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2288–2295, Providence, RI, 2012. [32] Ejaz Ahmed, Michael Jones, and Tim K. Marks. An improved deep learning architecture for person re-identification. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3908–3916, Boston, MA, 2015. [38] Paul Viola and Michael J Jones. Robust real-time face detection. International Journal of Computer Vision (IJCV), 57(2):137–154, 2004. [40] Xuehan Xiong and Fernando De la Torre. Supervised descent method and its applications to face alignment. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 532–539, Portland, OR, 2013. [41] Xavier P Burgos-Artizzu, Pietro Perona, and Piotr Dollár. Robust face landmark estimation under occlusion. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 1513–1520, Sydney, Australia, 2013. [42] Shaoqing Ren, Xudong Cao, Yichen Wei, and Jian Sun. Face alignment at 3000 fps via regressing local binary features. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1685–1692, Columbus, OH, 2014. [44] Junjie Yan, Zhen Lei, Dong Yi, and Stan Z Li. Learn to combine multiple hypotheses for accurate face alignment. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 392–396, Sydney, Australia, 2013. [45] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Proc. of the International Society for Optics and Photonics of SPIE, pages 586–606, 1992. [46] Jing Xiao, Simon Baker, Iain Matthews, and Takeo Kanade. 
Real-time combined 2d+ 3d active appearance models. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 535–542, Washington, D.C., 2004. [48] Christian Vogler, Zhiguo Li, Atul Kanaujia, Siome Goldenstein, and Dimitris Metaxas. The best of both worlds: Combining 3d deformable models with active shape models. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 1–7, Rio de Janeiro, Brazil, 2007. [51] Seth Koterba, Simon Baker, Iain Matthews, Changbo Hu, Jing Xiao, Jeffrey Cohn, and Takeo Kanade. Multi-view aam fitting and camera calibration. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 511–518, Beijing, China, 2005. [53] Ruben Heras Evangelio, Tobias Senst, and Thomas Sikora. Detection of static objects for the task of video surveillance. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 534–540, Kona, HI, 2011. [56] Quanfu Fan, Prasad Gabbur, and Sharath Pankanti. Relative attributes for large-scale abandoned object detection. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 2736–2743, Sydney, Australia, 2013. [58] Jiyan Pan, Quanfu Fan, and Sharath Pankanti. Robust abandoned object detection using region-level analysis. In Proc.of the IEEE International Conference on Image Processing (ICIP), pages 3597–3600, Brussels, Belgium, 2011. [60] Liyuan Li, Ruijiang Luo, Ruihua Ma, Weimin Huang, and Karianto Leman. Evaluation of an ivs system for abandoned object detection on pets 2006 datasets. In Proc. of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pages 91–98, New York City, NY, 2006. [62] Lisa M Brown. View independent vehicle/person classification. In Proc. of the ACM 2nd international workshop on Video surveillance and sensor networks, pages 114–123, New York, NY, 2004. [63] Ping-Han Lee, Tzu-Hsuan Chiu, Yen-Liang Lin, and Yi-Ping Hung. 
Real-time pedestrian and vehicle detection in video using 3d cues. In IEEE International Conference on Multimedia and Expo (ICME), pages 614–617, Cancun, 2009. [66] Chris Stauffer and W. Eric L. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(8):747–757, 2000. [67] Chris Stauffer and W Eric L Grimson. Adaptive background mixture models for real-time tracking. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Fort Collins, CO, 1999. [71] Cheng-Hao Kuo, Chang Huang, and Ram Nevatia. Multi-target tracking by on-line learned discriminative appearance models. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 685–692, San Francisco, CA, 2010. [72] Pedro Martins Jorge Batista Joao F. Henriques, Rui Caseiro. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(3):583–596, 2015. [75] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages I–511, Kauai, HI, 2001. [76] Bastian Leibe, Konrad Schindler, and Luc Van Gool. Coupled detection and trajectory estimation for multi-object tracking. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 1–8, Rio de Janeiro, Brazil, 2007. [79] Slawomir Bak, Etienne Corvee, Francois Brémond, and Monique Thonnat. Person re-identification using haar-based and dcd-based signature. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–8, Boston, MA, 2010. [86] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Proc. of the European Conference on Computer Vision (ECCV), pages 818–833, Zurich, Switzerland, 2014. 
[87] Pierre Sermanet, Koray Kavukcuoglu, Sandhya Chintala, and Yann LeCun. Pedestrian detection with unsupervised multi-stage feature learning. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3626– 3633, Portland, OR, 2013. [89] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. [90] Pierre-André Savalle, Stavros Tsogkas, George Papandreou, and Iasonas Kokkinos. Deformable part models with cnn features. In European Conference on Computer Vision (ECCV), Parts and Attributes Workshop, Zurich, Switzerland, 2014. [91] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. Hierarchical convolutional features for visual tracking. In Proc.of the IEEE International Conference on Computer Vision (ICCV), pages 3074–3082, Santiago, Chile, 2015. [92] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In Proc. of the British Machine Vision Conference (BMVC), pages 3–15, Nottingham., UK, 2014. [94] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 1735–1742, New York City, NY, 2006. [102] Simone Franzini and Jezekiel Ben-Arie. Speech recognition by indexing and sequencing. In International Conference of Soft Computing and Pattern Recognition (SoCPaR), pages 93–98, Cergy-Pontoise, France, 2010. [103] Kai Ma and Jezekiel Ben-Arie. Vector array based multi-view face detection with compound exemplars. In Proc.of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3186–3193, Providence, RI, 2012. [108] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 
How transferable are features in deep neural networks? In Proc. of the Advances in Neural Information Processing Systems, pages 3320–3328, 2014. [109] Steven C Mitchell, Boudewijn PF Lelieveldt, Rob J van der Geest, Jorrit Schaap, Johan HC Reiber, and Milan Sonka. Segmentation of cardiac mr images: An active appearance model approach. In Proc. of the SPIE, Medical Imaging: Image Processing, pages 224–234, San Diego, CA, 2000. International Society for Optics and Photonics. [110] Dianle Zhou, Dijana Petrovska-Delacrétaz, and Bernadette Dorizzi. Automatic landmark location with a combined active shape model. In IEEE International Conference on Biometrics: Theory, Applications, and Systems, pages 1–7, Arlington, VA, 2009. [111] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. In Proc. of the Conference on Computer Graphics and Interactive Techniques, pages 187–194, Los Angeles, CA, 1999. [112] Lie Gu and Takeo Kanade. 3d alignment of face in a single image. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 1305–1312, New York City, NY, 2006. [114] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In Proc. of the European Conference on Computer Vision (ECCV), pages 404–417, Graz, Austria, 2006. [116] G. Bradski. Opencv open source computer vision. Dr. Dobb’s Journal of Software Tools, 2000. http://opencv.org/. [117] National Laboratory of Pattern Recognition (NLPR). Biometrics ideal test (bit): casia-3d facev1. http://biometrics.idealtest.org/, 2010. [119] Xing Chen. Active shape model library (asmlibrary) sdk. http://code.google.com/p/asmlib-opencv/, 2012. [120] Radhika Vathsan. Active appearance models library (aamlibrary) sdk. https://code.google.com/p/aam-opencv/, 2012. [121] Erik Murphy-Chutorian and Mohan Manubhai Trivedi. Head pose estimation in computer vision: A survey. 
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 31(4):607–626, 2009. [123] Evan Ribnick, Stefan Atev, Osama Masoud, Nikolaos Papanikolopoulos, and Richard Voyles. Real-time detection of camera tampering. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), page 10, Sydney, Australia, 2006. [124] Pedro Gil-Jiménez, R López-Sastre, Philip Siegmann, Javier Acevedo-Rodríguez, and Saturnino Maldonado-Bascón. Automatic control of video surveillance camera sabotage. In Proc. of the International Work-Conference on Nature Inspired Problem-Solving Methods in Knowledge Engineering, pages 222–231. Springer, 2007. [128] Ismail Haritaoglu, David Harwood, and Larry S Davis. W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(8):809–830, 2000. [129] Yael Pritch, Alex Rav-Acha, and Shmuel Peleg. Nonchronological video synopsis and indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 30(11):1971–1984, 2008. [130] IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS). Pets 2006 benchmark data. http://www.cvg.reading.ac.uk/PETS2006/data.html, 2006. [133] Peter M Roth, Volker Settgast, Peter Widhalm, Marcel Lancelle, Josef Birchbauer, Norbert Brändle, Sven Havemann, and Horst Bischof. Next-generation 3d visualization for visual surveillance. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 343–348, 2011. [139] Philip DeCamp, George Shaw, Rony Kubat, and Deb Roy. An immersive system for browsing and visualizing surveillance video. In Proc. of the ACM International Conference on Multimedia, pages 371–380. ACM, 2010. [141] David G Lowe. Object recognition from local scale-invariant features. In Proc. of the IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1150–1157, 1999.