This study performs Chinese semantic analysis on data from social network media using a genetic algorithm. Chinese semantic analysis is more difficult than English semantic analysis because the two languages differ in grammatical structure: Chinese text is written without spaces between words, and an idea that English expresses with a single word may be expressed in Chinese by combining several words. Word segmentation can generally be performed easily and precisely with a dictionary that contains a large vocabulary, but building such a dictionary is prohibitively expensive, which makes it impractical for individual analysts. This study therefore developed a system that uses a genetic algorithm to build the dictionary database for Chinese word segmentation automatically. The data consist of more than 400 posts collected from the Tai-traveling board of the social network PTT. The TF-IDF (Term Frequency-Inverse Document Frequency) method was then used to extract the keywords of each post. The results show high precision in Chinese word segmentation and yield a travel ranking for Taichung, Taiwan.
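The abstract does not spell out how the dictionary is applied or how keywords are scored, so the following Python sketch shows one conventional pipeline under stated assumptions. The dictionary is taken as given (it stands in for the one the genetic algorithm would produce; the genetic algorithm itself is not shown). Posts are segmented by greedy forward maximum matching, a swapped-in technique that is not necessarily the one used in this study. Keywords are ranked by TF-IDF with tf = count / post length and idf = log(N / df). The function names `forward_max_match` and `tfidf_keywords`, the `max_len` and `top_k` parameters, and the sample words are all hypothetical.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (assumed segmentation method).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each tokenized post, ranked by TF-IDF.

    tf  = occurrences of the term in the post / number of tokens in the post
    idf = log(number of posts / number of posts containing the term)
    Ties are broken by the term string so the output is deterministic.
    """
    n_docs = len(docs)
    # Document frequency: how many posts contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results
```

For example, with a hypothetical dictionary containing 台中, 逢甲夜市 and 好吃, `forward_max_match("台中逢甲夜市好吃", dictionary)` returns `["台中", "逢甲夜市", "好吃"]`. A term that appears in every post gets an idf of zero, so place names shared by all posts (such as 台中 on a Taichung travel board) drop out of the keyword lists, and distinctive terms rise to the top.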
PTT (批踢踢實業坊) BBS