Title

中文文本可讀性探討:指標選取、模型建立與效度驗證

Parallel title

Investigating Chinese Text Readability: Linguistic Features, Modeling, and Validation

DOI

10.6129/CJP.20120621

Authors

宋曜廷(Yao-Ting Sung);陳茹玲(Ju-Ling Chen);李宜憲(Yi-Shian Lee);查日龢(Jih-Ho Cha);曾厚強(Hou-Chiang Tseng);林維駿(Wei-Chun Lin);張道行(Tao-Hsing Chang);張國恩(Kuo-En Chang)

Keywords

readability ; accuracy ; stepwise regression ; SVM mathematical model ; support vector machine

Journal

Chinese Journal of Psychology (中華心理學刊)

Volume and issue / publication date

Vol. 55, No. 1 (2013/03/01)

Pages

75-106

Language of content

Traditional Chinese

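Chinese abstract (translated)

This study develops readability indicators based on the characteristics of the Chinese language, then builds mathematical models of Chinese text readability and validates them. Using the 24 readability indicators it developed as predictor variables and the grade levels of 386 textbook articles as the criterion variable, the study builds stepwise regression and SVM readability models, then validates them on 96 new articles used as test data. The results show that, in the stepwise regression model, the number of difficult words, the proportion of simple sentences, the mean log frequency of content words, and the number of personal pronouns are the important predictors. The important predictors identified by the F-score method for the SVM model include the number of difficult words, the number of two-character words, the number of characters, and the number of intermediate-stroke characters. The prediction accuracy on the new articles is 55.21% for the stepwise regression model and 72.92% for the SVM model, and both models predict lower-grade articles more accurately than higher-grade articles.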
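The abstract says the important SVM predictors were selected with an F-score method but does not give the formula. The following is a minimal sketch, assuming the common two-class F-score (between-class separation of a feature divided by the sum of its within-class sample variances). How the study handled its multiple grade levels is not stated here, and the feature names and values are hypothetical toy data, not data from the study.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed formulation).

    Numerator: squared distance of each class mean from the overall mean.
    Denominator: sum of the two within-class sample variances (n - 1).
    A larger value suggests the feature separates the classes better.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    return between / within


# Hypothetical toy values for two indicators in lower- and higher-grade texts.
features = {
    "difficult_words": ([2, 3, 4, 3], [9, 11, 10, 12]),
    "sentence_count": ([8, 12, 10, 9], [9, 11, 10, 12]),
}

scores = {name: f_score(low, high) for name, (low, high) in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.4f}")
```

In this toy data the difficult-word count differs sharply between the two groups while the sentence count barely does, so the former ranks first.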

English abstract

This study aims to (a) develop readability indicators based on the textual factors that influence reading comprehension; (b) construct readability models for Chinese text; and (c) validate the proposed models. The models are built with stepwise regression and a support vector machine (SVM), using 24 readability indicators as predictor variables and the grade levels of 386 textbook articles as the criterion variable. The proposed models are then validated on an additional 96 texts. The results show that in the stepwise regression model, the critical predictors are the number of complex words, the proportion of simple sentences, the average logarithm of content word frequency, and the number of personal pronouns. In the SVM model, the critical predictors selected by the F-score include the number of complex words, the number of two-character words, the number of characters, and the number of intermediate-stroke characters. The accuracy rates of the stepwise regression and SVM models are 55.21% and 72.92%, respectively. Both models predict texts at the lower grade levels more accurately than texts at the higher grade levels.
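The reported accuracy rates can be read against the 96 validation texts. The sketch below works backward from each percentage to the number of correctly predicted texts it implies. These counts are inferred from the arithmetic only; the record does not report them, and the function names are illustrative.

```python
TEST_TEXTS = 96  # size of the validation set stated in the abstract


def accuracy_pct(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


def implied_correct(pct, total):
    """Nearest whole number of correct predictions for a reported percentage."""
    return round(pct / 100 * total)


reported = {"stepwise regression": 55.21, "SVM": 72.92}

# Inferred counts: a reading of the percentages, not figures from the study.
implied = {name: implied_correct(pct, TEST_TEXTS) for name, pct in reported.items()}

for name, count in implied.items():
    print(f"{name}: {count}/{TEST_TEXTS} = {accuracy_pct(count, TEST_TEXTS)}%")
```

Under this reading, the gap between the two models corresponds to 17 of the 96 texts.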

Subject classification: Social Sciences > Psychology
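Cited by
  1. Chia-Hui Chen (陳家慧); Pei-Yi Liu (劉佩怡); Pei-Yu Chen (陳沛妤) (2024). Managerial empire building and financial statement readability: The influence of supply-chain auditors. 中山管理評論, 32(3), 409-454.
  2. Huichen S. Hsiao (蕭惠貞); Shih-Wei Chan (詹士微); Ying-Yu Chen (陳瀅伃) (2022). Reflections on the pedagogical application of an artificial intelligence learning platform: The case of legal Chinese texts. 臺大華語文學習與科技, 2(1), 107-143.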