Title

A Rule Set to Select Representative Nouns from a Noun Synonym Set for a Japanese Fishing Website

DOI

10.7903/ijecs.1125

Authors

Kenji Kawabata; Kunihiko Kaneko

Keywords

Noun Synonym; Japanese Syntax Analysis; Keyword Dictionary

Journal

International Journal of Electronic Commerce Studies

Volume and Issue / Publication Date

Vol. 4, No. 2 (2013/12/01)

Pages

323-336

Language

English

Abstract

Japanese documents contain noun synonyms: the same word may be written in kanji, hiragana, or katakana. Words may also have alternate expressions, such as alternate kanji, alternate names for the same object, or different suffixes attached to the kanji. For this reason, Japanese nouns are grouped into noun synonym sets. Thesauruses and dictionaries can be used to select a representative expression from a noun synonym set, but these references do not take the type of document into account, and the representative noun often differs by document type. For example, newspaper articles prefer kanji, whereas encyclopedia articles prefer katakana. The problem addressed here is to form a rule set that selects a representative noun from a noun synonym set while taking the type of document into account. We propose a rule set arranged for the WEB Fish Encyclopedia (in Japanese, Sakanazukan). We introduce a keyword category into the rule set to increase the correctness of the selected representative noun. As a result, most of the representative expressions were selected appropriately from the noun synonyms. We also expressed these noun synonyms as feature vectors: three numerical values and four Boolean values were sufficient to express all noun synonyms.
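The idea of choosing a representative noun by document type can be illustrated with a minimal Python sketch. It is not the authors' rule set: the abstract does not list the rules, the keyword categories, or the seven feature-vector components, so none of those are reproduced here. The function names `script_of` and `select_representative`, the `PREFERENCE` table, and the sample synonym set are all hypothetical. Only the first preference per document type (kanji for newspapers, katakana for encyclopedias) comes from the abstract; the remaining ordering is an assumption made so the example is complete.

```python
import unicodedata

def script_of(word: str) -> str:
    """Classify a word as 'kanji', 'hiragana', 'katakana', or 'mixed'
    by looking at the Unicode character name of each character."""
    kinds = set()
    for ch in word:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            kinds.add("kanji")
        elif name.startswith("HIRAGANA"):
            kinds.add("hiragana")
        elif name.startswith("KATAKANA"):
            kinds.add("katakana")
        else:
            kinds.add("other")
    # A word written in exactly one script gets that label; anything else is 'mixed'.
    return kinds.pop() if len(kinds) == 1 else "mixed"

# Hypothetical preference table. Only the first entry of each list is taken
# from the abstract; the rest of each ordering is an assumption.
PREFERENCE = {
    "newspaper": ["kanji", "katakana", "hiragana"],
    "encyclopedia": ["katakana", "kanji", "hiragana"],
}

def select_representative(synonyms: list[str], doc_type: str) -> str:
    """Return the synonym whose script ranks highest for the document type."""
    order = PREFERENCE[doc_type]

    def rank(word: str) -> int:
        script = script_of(word)
        # Scripts not in the preference list (e.g. 'mixed') rank last.
        return order.index(script) if script in order else len(order)

    return min(synonyms, key=rank)

# Hypothetical synonym set: three notations of the same fish name.
tuna = ["まぐろ", "マグロ", "鮪"]
print(select_representative(tuna, "newspaper"))
print(select_representative(tuna, "encyclopedia"))
```

With this table, the newspaper setting selects the kanji form and the encyclopedia setting selects the katakana form. A document-specific rule set such as the one proposed for Sakanazukan would replace the fixed preference table with its own rules.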

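Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Economics
Social Sciences > Finance and Accounting
Social Sciences > Management

References
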
  1. Fishing-Forum. The WEB fish encyclopedia (in Japanese, Sakanazukan). Retrieved on January 31, 2012, from http://www.fishing-forum.org/zukan/
  2. Halpern, J. (2002). Lexicon-based orthographic disambiguation in CJK intelligent information retrieval. Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization, Stroudsburg:
  3. Huang, Z., Eidelman, V., Harper, M. (2009). Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Stroudsburg:
  4. Kawahara, D., Kurohashi, S. (2006). Case frame compilation from the web using high-performance computing. Proceedings of the 5th International Conference on Language Resources and Evaluation, Italy:
  5. Lafferty, J., McCallum, A., Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning, San Francisco:
  6. Ravi, S., Knight, K. (2009). Minimized models for unsupervised part-of-speech tagging. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language, Singapore:
  7. Shinmura, I. (1991). Kojien fourth edition (Japanese Dictionary). Tokyo: Iwanami.
  8. Wikimedia Foundation, Inc. Wikipedia. Retrieved on January 31, 2012, from http://ja.wikipedia.org/