Title

基於決策樹與二元語言模型的網路用語轉譯系統

Parallel Title

An Internet Slang Translator Based on Decision Tree and Bi-gram Language Model

DOI

10.6188/JEB.2015.17(1).02

Authors

楊亨利(Heng-Li Yang);黃泓彰(Hung-Chang Huang);林青峰(Qing-Feng Lin)

Keywords

Internet slang; Internet buzzwords; text normalization; decision tree; bi-gram language model

Journal

電子商務學報 (Journal of e-Business)

Volume/Issue (Publication Date)

Vol. 17, No. 1 (2015/03/01)

Pages

25 - 48

Language

Traditional Chinese

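Chinese Abstract (translated)

The Internet slang and Internet buzzwords found in online articles obstruct text analysis that targets formal Chinese; translating Internet slang into formal Chinese would help extract more usable information. To do so, this study collected definitions of Internet slang terms together with online articles, categorized the slang, and applied translation methods based on a decision tree and a language model to translate each category appropriately. The translation system detected and translated about 81% of the Internet slang, with a translation precision of about 90%. The proposed system, based on a decision tree and a language model, therefore appears suitable for translating Internet slang.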
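The two figures above can be read as a recall-like rate (the share of all slang occurrences that were detected and correctly translated) and a precision (the share of the system's translations that were correct). Assuming the standard information-retrieval definitions (as in Manning et al., 2008, listed in the references), they can be written as below; the symbols are hypothetical labels introduced here, and the exact counting unit used in the study is not stated in this record.

```latex
\text{Recall} = \frac{N_{\text{correct}}}{N_{\text{slang}}} \approx 0.81,
\qquad
\text{Precision} = \frac{N_{\text{correct}}}{N_{\text{output}}} \approx 0.90
```

Here $N_{\text{correct}}$ is the number of slang occurrences detected and translated correctly, $N_{\text{slang}}$ the number of slang occurrences in the test texts, and $N_{\text{output}}$ the number of translations the system produced.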

English Abstract

When text mining is conducted on Chinese content, Internet slang is a problem because it lowers the accuracy of text segmentation. Translating Internet slang into formal Chinese would help segmentation and, in addition, reveal the implicit information carried by the slang. To translate Internet slang, this study first collected the meanings of slang terms and a set of web texts. Next, the Internet slang was categorized, and translation methods, mainly based on a decision tree and a bi-gram language model, were developed for each category. The translator was then implemented. Eighty-one percent of the Internet slang in the web texts was correctly detected and translated, with a precision of ninety percent. It is concluded that the proposed methods are well suited to Internet slang translation.
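The bi-gram language model named in the abstract can be illustrated with a small sketch. It is not the authors' system: it assumes pre-segmented input, a hypothetical slang dictionary (`CANDIDATES`) that maps each slang token to candidate formal replacements, a toy training corpus (`CORPUS`), and add-one (Laplace) smoothing, none of which are specified in this record. The decision-tree component is not modeled. For each slang token, the sketch keeps the candidate that gives the whole sentence the highest bi-gram log-probability.

```python
import math
from collections import Counter

# Hypothetical toy corpus of pre-segmented formal-Chinese sentences (illustrative only).
CORPUS = [
    ["今天", "很", "開心"],
    ["我", "很", "忙"],
    ["他", "買", "粉"],
]

# Hypothetical slang dictionary: slang token -> candidate replacements.
# The token itself is listed too, so a literal reading can still win.
CANDIDATES = {"粉": ["粉", "很"]}


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    score = 0.0
    for prev, word in zip(padded, padded[1:]):
        # Counter returns 0 for unseen keys without inserting them.
        score += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return score


def translate(tokens, candidates, unigrams, bigrams):
    """Greedy left-to-right pass: replace each slang token with the
    candidate that maximizes the whole sentence's bigram score."""
    result = list(tokens)
    for i, token in enumerate(tokens):
        options = candidates.get(token)
        if not options:
            continue  # not in the slang dictionary; keep as-is
        result[i] = max(
            options,
            key=lambda c: log_prob(result[:i] + [c] + result[i + 1:], unigrams, bigrams),
        )
    return result


unigrams, bigrams = train_bigram(CORPUS)
print(translate(["今天", "粉", "開心"], CANDIDATES, unigrams, bigrams))  # ['今天', '很', '開心']
print(translate(["他", "買", "粉"], CANDIDATES, unigrams, bigrams))      # ['他', '買', '粉']
```

粉 (literally "powder") is common slang for 很 ("very"). The sketch rewrites it after 今天 and before 開心, but it leaves the token unchanged in 他 買 粉, where the bigram counts favor the literal reading. The greedy pass keeps the example short. A Viterbi-style search over all slang positions would be the natural extension when several slang tokens occur close together.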

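Subject Classification

Humanities > General Humanities
Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > General Social Sciences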
References
  1. 張慧美 (2006). 網路語言之語言風格研究 [A study of the linguistic style of Internet language]. 彰化師大國文學誌, 13, 331-359.
  2. 周鳳五 (2006). 火星文的美麗與哀愁 [The beauty and sorrow of "Martian language"]. Retrieved November 18, 2012, from http://www.taipei.gov.tw/public/MMO/TRAD/950804_home.ppt
  3. Aw, A., Zhang, M., Xiao, J., Su, J. (2006). A phrase-based statistical model for SMS text normalization. Proceedings of the COLING/ACL 2006 Main Conference, Sydney, Australia.
  4. Goutte, C., Cancedda, N., Dymetman, M., Foster, G. (2009). Learning machine translation. Cambridge: The MIT Press.
  5. Khan, O. A., Karim, A. (2012). A rule-based model for normalization of SMS text. Proceedings of the 2012 IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece.
  6. Kouloumpis, E., Wilson, T., Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG!. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, Barcelona, Spain.
  7. Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10, 707-710.
  8. Liu, F., Weng, F., Wang, B., Liu, Y. (2011). Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, USA.
  9. Liu, W., Allison, B., Guthrie, L. (2008). Professor or screaming beast? Detecting words misuse in Chinese. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco.
  10. Manning, C. D., Raghavan, P., Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.
  11. Nießen, S., Och, F. J., Leusch, G., Ney, H. (2000). An evaluation tool for machine translation: Fast evaluation for MT research. Proceedings of the 2nd Language Resources and Evaluation Conference (LREC), Athens, Greece.
  12. Pennell, D. L., Liu, Y. (2010). Normalization of text messages for text-to-speech. Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, Texas, USA.
  13. Pennell, D. L., Liu, Y. (2011). Toward text message normalization: Modeling abbreviation generation. Proceedings of the 2011 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Prague, Czech Republic.
  14. Sasu, L. (2011). A probabilistic model for spelling correction. Bulletin of the Transilvania University of Brasov, 4(2), 141-146.
  15. Schwarm, S., Ostendorf, M. (2002). Text normalization with varied data sources for conversational speech language modeling. Proceedings of the 2002 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Orlando, Florida, USA.
  16. Sproat, R., Black, A. W., Chen, S., Kumar, S., Ostendorf, M., Richards, C. (2001). Normalization of non-standard words. Computer Speech and Language, 15(3), 287-333.
  17. Wu, W., Zhang, B., Ostendorf, M. (2010). Automatic generation of personalized annotation tags for Twitter users. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, Los Angeles, California, USA.
  18. Yang, S., Zhao, H., Wang, X., Lu, B. (2012). Spell checking for Chinese. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
  19. 王貞英 (2010). Master's thesis. Hsinchu, Taiwan: National Tsing Hua University, In-Service Master's Program in Taiwan Studies for Teachers, Linguistics Division.
  20. 張有軍 (2009). 口頭語?書面語?:網路語言對語體二分法的挑戰 [Spoken or written? The challenge Internet language poses to the spoken/written register dichotomy]. US-China Foreign Language, 7(11), 5-8.
Cited By
  1. 歐仁彬, 楊盛琮, 黃天受, 郝沛毅 (2018). 網路直播聊天室情緒探勘-使用模糊支持向量機 [Emotion mining in live-streaming chat rooms using a fuzzy support vector machine]. 資訊管理學報, 25(2), 185-218.