


Research and Development on Automatic Information Organization and Subject Analysis in Recent Decades




曾元顯(Yuen-Hsien Tseng)


Keyword extraction; Association analysis; Document clustering; Topic categorization; Information retrieval




Vol. 51, Special Issue (2014/12/01)


pp. 3-26






Information organization and subject analysis (IOSA) is an important issue in the field of library and information science (LIS). With the rapid advance of information technology, digital documents are emerging at such a pace that automated IOSA has become inevitable. This article first introduces the development of related automatic techniques in recent decades and promotes a traditional viewpoint based on the workflow of (1) data collection and aggregation, (2) cataloguing, (3) regulation, (4) archiving, and (5) usage, to govern the whole process when applying automated techniques to any IOSA task. Some application examples are then described to give readers a feel for the feasibility of these techniques; specifically, applications of keyword extraction, association analysis, document clustering, and topic categorization are mentioned. We conclude that the related techniques and applications are still developing so quickly that only a small percentage of them can be mentioned. This article is intended to promote cooperation between LIS and other fields.

Subject classification: Humanities > Library and Information Science