Title

Identifying Food-related Word Association and Topic Model Processing using LDA

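Parallel title

「食」類相關的詞彙聯想識別和主題模型處理:以LDA為例 (English translation: Identifying food-related word associations and topic model processing: The case of LDA)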

DOI

10.6182/jlis.201806_16(1).023

Authors

李郁錦 (Yu-Chin Li); 胡宗智 (Tsung-Chih Hu); 張國恩 (Kuo-En Chang)

Keywords

LDA (latent Dirichlet allocation); Mandarin Vocabulary Study; Semantic Priming; Time-limited Multiple Divergent Thinking Test of Word Associative Strategy (TLM-DTTWAS); Word Association

Journal

Journal of Library and Information Studies (圖書資訊學刊)

Volume/Issue (Publication date)

Vol. 16, No. 1 (2018-06-01)

Pages

23-43

Language

English

English abstract

This paper presents an interdisciplinary study that combines natural language processing and psycholinguistics. The latent Dirichlet allocation (LDA) topic model was used to compute semantic relatedness, with the aim of understanding the mechanisms and processes through which humans encode and retrieve lexical units. To test how closely the output of the topic model matches human word association, the Time-limited Multiple Divergent Thinking Test of Word Associative Strategy (TLM-DTTWAS) was used to collect data with three food-related stimulus words. A total of 101 subjects took the test, producing 4,251 distinct words. The empirical results were analyzed on two levels: (1) expert classification of the associations into the taxonomic and script categories proposed by Ross and Murphy (1999); and (2) sorting of the test results into the "steep" and "flat" hierarchies of Mednick's (1962) associative hierarchy theory. The analysis indicated that human word association shows not only randomness but also generalization and continuity. When the experimental text was processed by the LDA latent semantic model, the model's output showed a highly significant correlation with the human associations. This study is a novel attempt to use a data science model to infer and predict human conceptual association, with potential applications in teaching and in commerce.
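The abstract reports that LDA was used to compute semantic relatedness and that its output correlated significantly with human associations, but it does not state which relatedness measure or correlation statistic was used. The sketch below is a minimal illustration under stated assumptions: relatedness is taken as the cosine similarity of two words' topic profiles p(z | w), derived from a topic-word matrix with a uniform prior over topics, and agreement with human data is measured with Spearman's rank correlation. The matrix `PHI`, the English-gloss vocabulary, and the response counts are invented toy values, not data or results from the study.

```python
import math

# HYPOTHETICAL toy topic-word matrix: PHI[k][j] = p(word j | topic k).
# Invented for illustration only; this is not the study's trained model.
VOCAB = ["rice", "bowl", "chopsticks", "restaurant", "noodle", "car"]
PHI = [
    [0.30, 0.25, 0.20, 0.05, 0.15, 0.05],  # topic 0: utensil-like
    [0.20, 0.05, 0.05, 0.40, 0.25, 0.05],  # topic 1: dining-out-like
    [0.02, 0.03, 0.02, 0.10, 0.03, 0.80],  # topic 2: unrelated
]

def topic_profile(word):
    """p(z | w) via Bayes' rule, ASSUMING a uniform prior over topics."""
    j = VOCAB.index(word)
    weights = [phi_k[j] for phi_k in PHI]
    total = sum(weights)
    return [x / total for x in weights]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

stimulus = "rice"
responses = ["bowl", "chopsticks", "restaurant", "noodle", "car"]
human_counts = [40, 35, 10, 25, 1]  # HYPOTHETICAL: subjects producing each response
model_scores = [cosine(topic_profile(stimulus), topic_profile(w)) for w in responses]
rho = spearman(model_scores, human_counts)
print(f"Spearman rho = {rho:.2f}")  # prints: Spearman rho = 0.90
```

Cosine similarity of topic profiles is only one common choice; a divergence between the two distributions (for example Jensen-Shannon) would serve the same purpose, and a rank correlation is used here because association counts are typically skewed.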

Chinese abstract (English translation)

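This interdisciplinary study combines natural language processing and psycholinguistics. To understand the mechanisms and processes of human lexical cognition and acquisition, it uses LDA (latent Dirichlet allocation), a latent semantic model from the family of topic models, to compute the semantic relatedness of words. To test the similarity between the output of the latent semantic model and human word association, data were collected through a large-scale Time-limited Multiple Divergent Thinking Test of Word Associative Strategy using three stimulus words; 101 subjects took part, producing a total of 4,251 distinct words. The experimental results were analyzed on two levels. (1) Expert classification: two experts classified the results using, on the one hand, the word-association categories proposed by Ross and Murphy (1999) (taxonomic and script categories) and, on the other hand, Mednick's (1962) associative hierarchy theory, which divides the results into two types: steep and flat associations. The analysis indicated that human association is not only random but also general and extensible. (2) The experimental text was processed by the LDA latent semantic model, and cross-comparison of the two sets of results confirmed a highly significant correlation; the output is consistent with the mechanisms of human learning and association. This study is a novel attempt in which data science is used to infer and predict human lexical and conceptual associations. In the future, these results may support improvements and applications in teaching and commerce.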
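Mednick's (1962) steep/flat distinction can be made operational in several ways, and the abstract does not specify the rule by which the test results were assigned to the two hierarchies (the assignment was made by expert judges). As a hypothetical sketch only, the function below labels the pooled responses to one stimulus "steep" when a single dominant response takes at least a chosen share of all responses, and "flat" otherwise. The name `hierarchy_shape`, the 0.3 threshold, and the example responses are all assumptions for illustration.

```python
from collections import Counter

def hierarchy_shape(responses, top_share_threshold=0.3):
    """Classify a response distribution as 'steep' or 'flat' (HYPOTHETICAL rule).

    The share of the single most frequent response is used as a proxy for
    how steep the associative hierarchy is. The 0.3 default threshold is an
    illustrative assumption, not a value taken from the study.
    Returns a (label, share) tuple.
    """
    counts = Counter(responses)
    if not counts:
        raise ValueError("responses must not be empty")
    top_count = counts.most_common(1)[0][1]
    share = top_count / sum(counts.values())
    label = "steep" if share >= top_share_threshold else "flat"
    return label, share

# Invented English-gloss responses to a hypothetical food-related stimulus.
steep_example = ["rice"] * 6 + ["bowl", "noodle", "soup", "chopsticks"]
flat_example = ["rice", "bowl", "noodle", "soup", "chopsticks",
                "tea", "dumpling", "spoon", "wok", "table"]

print(hierarchy_shape(steep_example))  # ('steep', 0.6)
print(hierarchy_shape(flat_example))   # ('flat', 0.1)
```

A dominance share follows Mednick's picture of a steep hierarchy as one where a few stereotyped associates are much stronger than the rest; a normalized entropy of the response distribution would be an equally defensible alternative.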

Subject classification: Humanities > Library and Information Science
References
  1. Chen, M.-L., Wang, H.-C., Ko, H.-W. (2009). The construction and validation of Chinese semantic space by using latent semantic analysis. Chinese Journal of Psychology, 51(4), 415-435.
  2. Huang, P.-S., Chen, H.-C., Huang, H.-C., Liu, C.-H. (2009). The development of Divergent Thinking Test of Word Associative Strategy (DTTWAS). Psychological Testing, 56(2), 153-177.
  3. Huang, P.-S., Chen, H.-C., Liu, C.-H. (2012). The development of Chinese word remote associates test for college students. Psychological Testing, 59(4), 581-607.
  4. Altınel, B., Ganiz, M. C. (2016). A new hybrid semi-supervised algorithm for text classification with class-based semantics. Knowledge-Based Systems, 108, 50-64.
  5. Anderson, R. C. (1977). Schema-directed processes in language comprehension. Urbana, IL: University of Illinois at Urbana-Champaign.
  6. Baddeley, A. D. (1982). Domains of recollection. Psychological Review, 89(6), 708-729.
  7. Blei, D. M., Ng, A. Y., Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
  8. Budanitsky, A., Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13-47.
  9. Chen, T., Xie, Y.-Q. (2005). Literature review of feature dimension reduction in text categorization. Journal of the China Society for Scientific and Technical Information, 24(6), 690-695.
  10. Cuenca, M. J., Hilferty, J. (1999). Introducción a la lingüística cognitiva. Barcelona, Spain: Editorial Ariel.
  11. De Boom, C., Van Canneyt, S., Demeester, T., Dhoedt, B. (2016). Representation learning for very short texts using weighted word embedding aggregation. Pattern Recognition Letters, 80, 150-156.
  12. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
  13. Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38(1), 188-230.
  14. Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24(2), 143-188.
  15. Engle, R. W., Nations, J. K., Cantor, J. (1990). Is "working memory capacity" just another name for word knowledge? Journal of Educational Psychology, 82(4), 799-804.
  16. Gathercole, S. E., Baddeley, A. D. (2014). Working memory and language. New York, NY: Psychology Press.
  17. Gu, P. Y. (2003). Vocabulary learning in a second language: Person, task, context and strategies. The Electronic Journal for English as a Second Language, 7(2), 1-25.
  18. Guilford, J. P. (1967). The nature of human intelligence. New York, NY: McGraw-Hill.
  19. Hassabis, D., Kumaran, D., Summerfield, C., Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95(2), 245-258.
  20. Hu, Z.-W., Chen, Y.-Z., Chang, S.-H., Sung, Y.-C. (1996). Chinese polyseme free association norm. Chinese Journal of Psychology, 38(2), 67-168.
  21. Kosslyn, S. M., Smith, E. E. (2006). Cognitive psychology: Mind and brain. Upper Saddle River, NJ: Prentice-Hall.
  22. Landauer, T. K. (2002). On the computational basis of learning and cognition: Arguments from LSA. Psychology of Learning and Motivation, 41, 43-84.
  23. Langacker, R. W. (1987). Foundations of cognitive grammar. Volume I: Theoretical prerequisites. Stanford, CA: Stanford University Press.
  24. Lehrer, A., Kittay, E. (Eds.) (1992). Frames, fields, and contrasts: New essays in semantic and lexical organization. Hillsdale, NJ: Lawrence Erlbaum Associates.
  25. McKeown, M. G., Curtis, M. E. (Eds.) (1987). The nature of vocabulary acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
  26. Meara, P. (2009). Connected words: Word associations and second language vocabulary acquisition. Philadelphia, PA: John Benjamins.
  27. Mednick, M. T., Mednick, S. A., Jung, C. C. (1964). Continual association as a function of level of creativity and type of verbal stimulus. The Journal of Abnormal and Social Psychology, 69(5), 511-515.
  28. Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69(3), 220-232.
  29. Nation, I. S. P. (2013). Teaching and learning vocabulary. Boston, MA: Heinle Cengage Learning.
  30. Nguyen, S. P. (2007). Cross-classification and category representation in children's concepts. Developmental Psychology, 43(3), 719-731.
  31. Polguère, A. (2014). From writing dictionaries to weaving lexical networks. International Journal of Lexicography, 27(4), 396-418.
  32. Ramachandran, V. S., Hubbard, E. M. (2001). Synaesthesia: A window into perception, thought and language. Journal of Consciousness Studies, 8(12), 3-34.
  33. Roediger, H. L., Weldon, M. S., Stadler, M. L., Riegler, G. L. (1992). Direct comparison of two implicit memory tests: Word fragment and word stem completion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(6), 1251-1269.
  34. Rosario, B. (2000). Final paper, INFOSYS. Berkeley, CA: University of California, Berkeley.
  35. Rosch, E., Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), 573-605.
  36. Ross, B. H., Murphy, G. L. (1999). Food for thought: Cross-classification and category organization in a complex real-world domain. Cognitive Psychology, 38(4), 495-553.
  37. Schmitt, N., McCarthy, M. (Eds.) (1997). Vocabulary: Description, acquisition and pedagogy. Cambridge, England: Cambridge University Press.
  38. Schmitt, N., Meara, P. (1997). Researching vocabulary through a word knowledge framework. Studies in Second Language Acquisition, 19(1), 17-36.
  39. Squire, L. R., Kandel, E. R. (2000). Memory: From mind to molecules. New York, NY: Holt Paperbacks.
  40. Symons, C. S., Johnson, B. T. (1997). The self-reference effect in memory: A meta-analysis. Psychological Bulletin, 121(3), 371-394.
  41. Szymański, J., Rzeniewicz, J. (2016). Identification of category associations using a multilabel classifier. Expert Systems with Applications, 61, 327-342.
  42. Vapnik, V. N. (1998). Statistical learning theory. New York, NY: Wiley.
  43. Walter, S., Unger, C., Cimiano, P. (2014). ATOLL: A framework for the automatic induction of ontology lexica. Data and Knowledge Engineering, 94, 148-162.
  44. Zock, M., Tesfaye, D. (2012). Automatic index creation to support navigation in lexical graphs encoding part_of relations. Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon. Mumbai, India.