Title

深度詞庫:邁向知識導向的人工智慧基礎 (Deep Lexicon: Toward a Knowledge-Oriented Foundation for Artificial Intelligence)

Parallel Title

DeepLEX: Toward a Knowledge-yielding Approach and Resource for AI

DOI

10.6129/CJP.201909_61(3).0004

Authors

Shu-Kai Hsieh (謝舒凱); Yu-Hsiang Tseng (曾昱翔)

Keywords

artificial intelligence (AI); computational lexicon; dialogue system; semantic representation

Journal

Chinese Journal of Psychology (中華心理學刊)

Volume/Issue and Publication Date

Vol. 61, No. 3 (2019/09/01)

Pages

231-247

Language

Traditional Chinese; English

Chinese Abstract

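Against the backdrop of big data and high-performance computing, recent deep-learning neural networks have achieved major successes in speech processing and other recognition tasks. In particular, since the introduction of word embeddings, a distributional vector semantic representation, computers have gradually come to capture the lexical semantic relations of human language. However, the rich hierarchical relations in linguistic and conceptual knowledge remain difficult for current neural network architectures to represent and generalize. In computational linguistics, researchers working from different lexical theories have developed a variety of lexical resources that attempt to supply the paradigmatic knowledge computers struggle to learn from syntagmatic data, so that computers can move closer to the human abilities to reason in unfamiliar situations from small amounts of data and to understand, or even empathize with, human emotions. What these human abilities share is that they involve the interaction of individual, social, and cultural contexts; they are highly context-variable and hard for computers to learn from massive amounts of thin data. This study adopts the perspective of computational functional linguistics and treats the lexicon as an explicit repository of human linguistic knowledge. Built through human annotation and automatic extraction, the lexicon is one of the important foundations for autonomous learning in artificial general intelligence. The study further argues that, beyond pairings of form and meaning, the linguistic knowledge in a lexicon should also account for the fluidity of expressive forms in Chinese and for the interdependence between form and meaning. The aim of the study is to integrate and develop a "deep lexicon" (深度詞庫) that includes variables at multiple levels, such as linguistic, psychological, and Chinese-language-teaching variables, together with an annotation tool that lets users freely determine Chinese formulae, and to discuss possible future applications of this language resource.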

English Abstract

Deep learning and neural networks have made substantial progress in recent years. After the introduction of word embeddings, a form of distributional vector semantics, computers could better simulate the lexical semantic relationships between words. However, the hierarchical nature of human language and concepts is still difficult to model with current approaches. In computational linguistics, researchers have developed lexical resources from different theoretical perspectives. These language resources attempt to bridge the gap between syntagmatic relationships, which computers can readily model from data, and paradigmatic knowledge, which computers do not readily grasp. Such knowledge is essential for the capability to reason in an unfamiliar context with only a small amount of data, and is also vital for developing empathy with human emotions. What these capabilities have in common is high context variance, in which individual, social, and cultural contexts intertwine, and this poses a great challenge for computers that learn in a data-hungry way. The current study considers the lexicon, as one would argue in computational functional linguistics, to be an explicit knowledge base of human language. Human annotation aided by automatic extraction is an essential building block of strong artificial intelligence. Moreover, the knowledge stored in a lexicon should not only contain pairings between forms and meanings; it should also address the fluidity of formulae and the dynamics between form-meaning pairings. The goal of the current study is thus to integrate and develop a novel lexicon model called DeepLex that includes multilevel lexical properties, such as linguistic, psychological, and pedagogical ones. A web-based tool has also been developed to help users freely determine and annotate formulae in Chinese. Further applications of DeepLex are also discussed.

Subject Classification: Social Sciences > Psychology