A wordlist serves as a reference for second language teaching and helps second language learners judge which words they need to acquire. The TOCFL wordlist is one of the common learning materials for learners preparing for Chinese proficiency tests. However, the words in the TOCFL wordlist were largely selected from a written corpus, and only a limited number were extracted from a spoken corpus. Because written and spoken language presumably differ, a wordlist should include words from both registers, and teaching should emphasize the differences between them. To balance the proportions of written and spoken words in the TOCFL wordlist, this study first built a corpus of native spoken Mandarin by extracting subtitles from Mandarin movies and TV series, and then compiled a list of high-frequency spoken words as a supplement to the TOCFL wordlist. A comparison of this spoken wordlist with the TOCFL wordlist showed that 713 of the most frequently used words in the corpus were not covered by the TOCFL wordlist. We then proposed that the top 238 of these high-frequency words be added to the TOCFL wordlist. The 713 high-frequency spoken words were further classified into six groups based on their features. The key findings are as follows: (1) the majority of the items are word chunks, (2) the spoken words are typically multisyllabic, and (3) the list contains a large number of word combinations with the negators bu and mei. We hope that this list of commonly used spoken words can increase the proportion of spoken words in the TOCFL wordlist and thereby offer learners more authentic materials to meet their oral communication needs.
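The comparison step described above, in which corpus frequencies are ranked and words already in the reference wordlist are filtered out, can be sketched roughly as follows. This is a minimal illustration and not the study's actual pipeline: the function name `uncovered_high_frequency`, the toy subtitle lines, and the toy reference list are all hypothetical, and the sketch assumes the subtitles have already been segmented into words, which for Chinese is a separate step not shown here.

```python
from collections import Counter


def uncovered_high_frequency(tokenized_lines, reference_words, top_n):
    """Return the top_n most frequent tokens that are absent from reference_words.

    tokenized_lines: iterable of lists of tokens (segmentation assumed done upstream).
    reference_words: iterable of words already covered by the reference wordlist.
    """
    # Count every token across all subtitle lines.
    counts = Counter(tok for line in tokenized_lines for tok in line)
    reference = set(reference_words)
    # Walk tokens from most to least frequent, keeping only uncovered ones.
    missing = [(w, c) for w, c in counts.most_common() if w not in reference]
    return missing[:top_n]


# Toy, made-up data for illustration only (not taken from the study's corpus).
subtitle_lines = [
    ["我", "不知道"],
    ["你", "沒關係", "吧"],
    ["我", "不知道", "他", "在", "哪裡"],
    ["沒關係", "我", "知道"],
]
reference_list = {"我", "你", "他", "知道", "在", "哪裡", "吧"}

result = uncovered_high_frequency(subtitle_lines, reference_list, top_n=2)
print(result)
```

In this toy run the two uncovered items are chunks built on bu and mei, which loosely mirrors the kind of item the abstract reports. A real comparison would also need to decide how to treat segmentation variants and proper names before counting.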