English Abstract
This study performs Chinese semantic analysis on social media data, using a genetic algorithm for word segmentation. Semantic analysis is harder for Chinese than for English because the grammatical structures differ: written Chinese does not separate words with spaces, and an idea that English expresses with a single word may require a combination of several Chinese words. Word segmentation can generally be made easy and precise with a dictionary that covers a large vocabulary, but building such a dictionary is prohibitively expensive, which puts it out of reach for individual analysts. This study therefore developed a system that uses a genetic algorithm to build the Chinese word-segmentation dictionary automatically. We collected more than 400 posts from the Tai-traveling board of the PTT social network, and we used TF-IDF (term frequency-inverse document frequency) to extract the keywords of each post. The results show high precision in Chinese word segmentation and yield a travel ranking for Taichung, Taiwan.
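The abstract does not specify the chromosome encoding or the fitness function, so the following is only a minimal sketch under stated assumptions. Each chromosome is a bit string marking word boundaries between adjacent characters, and a hypothetical fitness rewards multi-character words that recur in the unsegmented corpus, weighted by length squared. The names `ngram_counts`, `decode`, `fitness`, `ga_segment`, and `build_dictionary`, and all parameter values, are illustrative and are not taken from the thesis.

```python
import random
from collections import Counter


def ngram_counts(corpus, max_len=4):
    """Count every substring of 2..max_len characters in the unsegmented corpus."""
    counts = Counter()
    for text in corpus:
        for size in range(2, max_len + 1):
            for i in range(len(text) - size + 1):
                counts[text[i:i + size]] += 1
    return counts


def decode(sentence, bits):
    """bits[i] == 1 puts a word boundary between sentence[i] and sentence[i + 1]."""
    words, start = [], 0
    for i, bit in enumerate(bits):
        if bit:
            words.append(sentence[start:i + 1])
            start = i + 1
    words.append(sentence[start:])
    return words


def fitness(words, counts, min_count=2):
    """Hypothetical fitness: a multi-character word seen at least min_count times
    in the corpus earns len(word) ** 2, so long recurring words beat their parts."""
    return sum(len(w) ** 2 for w in words if len(w) > 1 and counts[w] >= min_count)


def ga_segment(sentence, counts, pop_size=30, generations=40,
               mutation_rate=0.1, seed=0):
    """Search word-boundary bit strings with a simple GA; pop_size must be >= 3."""
    rng = random.Random(seed)
    n_bits = len(sentence) - 1
    if n_bits <= 0:
        return [sentence]

    def score(bits):
        return fitness(decode(sentence, bits), counts)

    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the best chromosome over unchanged.
        next_gen = [max(population, key=score)[:]]
        while len(next_gen) < pop_size:
            # Tournament selection: keep the fittest of three random chromosomes.
            parent_a = max(rng.sample(population, 3), key=score)
            parent_b = max(rng.sample(population, 3), key=score)
            # One-point crossover, then independent bit-flip mutation.
            cut = rng.randint(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return decode(sentence, max(population, key=score))


def build_dictionary(corpus, **ga_options):
    """Segment every text with the GA and collect the multi-character words."""
    counts = ngram_counts(corpus)
    dictionary = set()
    for text in corpus:
        for word in ga_segment(text, counts, **ga_options):
            if len(word) > 1:
                dictionary.add(word)
    return dictionary
```

The length-squared weighting is one simple way to let a recurring long word such as 逢甲夜市 (score 16) outscore its fragments 逢甲 and 夜市 (4 + 4 = 8); the thesis's actual fitness may differ. On a real corpus, the set returned by `build_dictionary` would serve as the automatically constructed dictionary for segmenting the remaining posts.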
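The abstract names TF-IDF for keyword extraction but not the exact weighting. The sketch below assumes the common variant tf = count / post length and idf = log(N / df), applied to posts that have already been segmented into words. The function name `tfidf_keywords` and any sample posts are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k (term, score) pairs for each tokenized document.

    docs is a list of posts, each already segmented into a list of words.
    Assumed weighting: tf = count / doc_length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of posts containing each word at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    results = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores = {
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken by the term itself for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results
```

A word that appears in every post, such as a city name in a board devoted to that city, gets idf = log(1) = 0 and therefore never ranks as a keyword, which is the intended effect of the inverse-document-frequency term.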
References
- 林千翔, 張嘉惠, 陳貞伶 (2010). 結合長詞優先與序列標記之中文斷詞研究 [Chinese word segmentation combining longest-word-first matching and sequence labeling]. International Journal of Computational Linguistics & Chinese Language Processing.
- 陳稼興, 謝佳倫, 許芳誠 (2000). 以遺傳演算法為基礎的中文斷詞研究 [A study of Chinese word segmentation based on genetic algorithms]. Journal of Information Management Research.
- 批踢踢實業坊 (PTT) BBS.
- Chen, K. J., & Liu, S. H. (1992). Word Identification for Mandarin Chinese Sentences. Proceedings of COLING-92, the 14th International Conference on Computational Linguistics.
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
- Hofmann, T. (1999). Probabilistic latent semantic indexing. Proceedings of the Twenty-second Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1999), Berkeley, CA, USA.
- Liang, N. Y. (1990). The Knowledge of Chinese Word Segmentation. Journal of Chinese Information Processing, 4, 42-49.
- Nie, J. Y., & Brisebois, M. (1996). On Chinese Text Retrieval. Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
- Salton, G., & McGill, M. J. (1983). Introduction to Modern Information Retrieval. New York: McGraw-Hill.
- Yang, S., Xu, G., Wang, Z., & Zhou, F. (2015). The Parallel Improved Apriori Algorithm Research Based on Spark. Ninth International Conference on Frontier of Computer Science and Technology.
- 方心伶 (2008). Hsinchu City: Institute of Statistics, National Tsing Hua University.
- 王彥叡 (2014). New Taipei City: Graduate Institute of Information Management, National Taipei University.
- 沈育信 (2015). New Taipei City: In-Service Master's Program, Department of Information Management, Tamkang University.
- 許菱祥 (2006). 中文文法 [Chinese Grammar]. Taipei: 大中國圖書公司.
- 陳永德 (1997). Taipei City: Graduate Institute of Psychology, National Taiwan University.
- 陳克健, 陳正佳, 林隆基 (1986). Technical report, Institute of Information Science, Academia Sinica.
- 陳鍾誠, 許聞廉 (1998). 結合統計與規則的多層次中文斷詞系統 [A multi-level Chinese word segmentation system combining statistics and rules]. Proceedings of the 11th Conference on Computational Linguistics.