Title

以中文文本分析為主的線上社交訊息作者辨識 (Authorship Identification of Online Social Messages Based Primarily on Chinese Text Analysis)

Parallel title

Toward to a stylometric analysis model for the authorship verification of online social message

Authors

Guan-Ting Ke (柯冠廷); Kuo-Hui Yeh (葉國暉); Li-Hsuan Lo (駱立軒)

Keywords

Authentication (身份鑑別); Social Media (社群網路); Semantic Analysis Model (語意模型); Support Vector Machine (支援向量機); Multilayer Perceptron (多層感知器)

Journal

資訊安全通訊 (Communications of the CCISA)

Volume and issue / Publication date

Vol. 24, No. 4 (2018/10/01)

Pages

15 - 30

Language of content

Traditional Chinese

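Chinese abstract

This study examines identity authentication based on the text of social chat messages. Online social fraud has become frequent in recent years, and in most cases social engineering is used to take over personal accounts. Taking this phenomenon as its subject, the study aims to build an efficient authentication system that determines whether the user behind a text message is genuine and legitimate. The method uses users' social text messages as the authentication data source and applies a Semantic Analysis Model, a Multilayer Perceptron (MLP), and a Support Vector Machine (SVM) as the main analysis algorithms to generate user authentication tokens and to measure authentication accuracy. The results show that in the semantic model experiment, 65% of test cases had a similarity below 70%, while the MLP and SVM analyses reached authentication accuracies of 80% and 88%, respectively.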

English abstract

Cases of scamming on social media have recently surged. Most involve hacked social media accounts whose owners had their identities stolen through social engineering. In this research, we focus on how users' instant messages can be used to defeat identity thieves. We propose an authentication system based on the stylometry of users' instant messages, which can tell whether the current user of an account is its genuine and legitimate owner. We collect users' instant messages as raw training data and build classifiers with Latent Semantic Analysis (LSA), Multilayer Perceptron (MLP), and Support Vector Machine (SVM). The results show that, with the LSA model alone, 65% of test cases yield a similarity below 70%, while MLP and SVM reach 80% and 88% accuracy, respectively.
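
The similarity-based check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' LSA pipeline: it swaps in plain character-bigram counts with cosine similarity (no SVD step), and the function names, the bigram features, and the use of 0.70 as an accept/reject cutoff are all assumptions made here for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty message carries no stylistic evidence
    return dot / (norm_a * norm_b)


def is_same_author(profile_msgs, new_msg, threshold=0.70):
    """Accept new_msg when it is at least `threshold` similar to the profile.

    The 0.70 default mirrors the 70% similarity figure in the abstract;
    treating it as a decision cutoff is an assumption of this sketch.
    """
    profile = Counter()
    for msg in profile_msgs:
        profile.update(char_ngrams(msg))
    return cosine_similarity(profile, char_ngrams(new_msg)) >= threshold
```

Character n-grams are used because they need no word segmentation, which makes them convenient for Chinese chat text. A fuller version would replace the raw counts with an LSA projection or feed the features to an MLP or SVM classifier, as the abstract describes.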

Subject classification: Basic and Applied Sciences > Information Science