Title

The NCCU Corpus of Spoken Chinese: Mandarin, Hakka, and Southern Min

DOI

10.6519/TJL.2008.6(2).5

Authors

Kawai Chui (徐嘉慧); Huei-Ling Lai (賴惠玲)

Keywords

Journal

Taiwan Journal of Linguistics

Volume and Issue / Publication Date

Vol. 6, No. 2 (2008/12/01)

Pages

119-144

Language

English

English Abstract

In Taiwan, most people speak Mandarin, Southern Min, or Hakka. Not only are the three Chinese dialects undergoing linguistic changes, but the population of Southern Min and Hakka is also diminishing. The NCCU Corpus of Spoken Chinese is thus a project of language documentation whereby open online access to Mandarin, Hakka, and Southern Min data is provided for non-profit-making research. As a language documentation project, the NCCU spoken corpus focuses on collecting and archiving spoken forms of various types. It consists of three sub-corpora, namely the Corpus of Spoken Mandarin, the Corpus of Spoken Hakka, and the Corpus of Spoken Southern Min. The three corpora share a common scheme for the collection of spoken data, mostly in the form of spontaneous face-to-face conversations. The infrastructure of the corpus is designed in a simple yet user-friendly way, so that data can be processed efficiently in the database, and users can browse the spoken data directly from the web. We hope that our work can encourage more people to engage in building up spoken corpora from different perspectives and for different purposes.

Subject Classification: Humanities > Linguistics
References
  1. Lyu, Ren-yuan, Min-siong Liang, Yuang-chin Chiang. (2004). Toward constructing a multilingual speech corpus for Taiwanese (Min-nan), Hakka, and Mandarin. Computational Linguistics and Chinese Language Processing, 9(2), 1-12.
  2. Academia Sinica Balanced Corpus of Modern Chinese
  3. British Academic Spoken English (BASE) corpus
  4. British National Corpus
  5. Brown University Corpus
  6. Cambridge International Corpus
  7. Chinese Pear Stories
  8. Collins Cobuild
  9. CORIS/CODIS Corpus
  10. Corpus of Spoken Bulgarian
  11. Corpus of Spoken Israeli Hebrew
  12. Corpus of Spoken Professional American-English
  13. Council for Hakka Affairs
  14. CRATER Spanish Corpus
  15. Formosan Language Archive
  16. Hakka Magazine
  17. Hakka News Magazine
  18. Hakka Taiwanese Special Magazine
  19. Hakka Television
  20. Helsinki Corpus of English Texts
  21. Hong Kong Cantonese Adult Language Corpus
  22. International Corpus of English
  23. Lancaster Corpus of Mandarin Chinese
  24. Lancaster Speech, Writing and Thought Presentation Spoken Corpus
  25. Lancaster/Oslo-Bergen Corpus
  26. Lancaster-Los Angeles Spoken Chinese Corpus
  27. Language Archives Project
  28. London-Lund Corpus of Spoken English
  29. Mandarin spoken corpora project
  30. Michigan Corpus of Academic Spoken English
  31. NEGRA Corpus
  32. Oslo Corpus of Bosnian Texts
  33. Santa Barbara Corpus of Spoken American English
  34. Southern Min Archives
  35. Spoken Dutch Corpus
  36. Spoken Language Corpus of Swedish
  37. Survey of California and Other Indian languages
  38. Taiwan Languages and Literature Society
  39. UCLA Corpus of Written Chinese
  40. Wenzhou Spoken Corpus
  41. York-Toronto-Helsinki Parsed Corpus of Old English Prose
  42. Aboudan, Rima, Geoffrey Beattie. (1996). Cross-cultural similarities in gestures: the deep relationship between gestures and speech which transcends language barriers. Semiotica, 111(3-4), 269-294.
  43. Chafe, Wallace. (1980). The Pear Stories: Cognitive, Cultural and Linguistic Aspects of Narrative Production. Norwood, NJ: Ablex.
  44. Chappell, Hilary., Hilary Chappell (ed.) (2001). Sinitic Grammar. Oxford: Oxford University Press.
  45. Crowdy, Steve. (1993). Spoken corpus design. Literary and Linguistic Computing, 8(4), 259-265.
  46. Hashimoto, Mantaro J. (1973). The Hakka Dialect: A Linguistic Study of its Phonology, Syntax, and Lexicon. Cambridge: Cambridge University Press.
  47. Lau, Chunfat. (1999). Criteria for the classification of Chinese dialects and the question of the status of Hakka. Paper presented at the Eighth International Conference on Chinese Languages and Linguistics, Melbourne.
  48. Leung, M.-T., S.-P. Law. (2001). HKCAC: The Hong Kong Cantonese adult language corpus. International Journal of Corpus Linguistics, 6, 305-325.
  49. Luo, Mei-zhen. (1998). The continuity and variation of Hakka language and culture in Taiwan. Proceedings of the Fourth International Conference on Hakkaology: Hakka and Modern World, Taipei.
  50. Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
  51. Wang, H. C., F. Seide, C. Y. Tseng, L. S. Lee. (2000). Mat-2000-design, collection, and validation of a Mandarin 2,000-speaker telephone speech database. Paper presented at the International Conference on Spoken Language Processing 2000, Beijing, China.
  52. Wu, Zhong-jie., Feng-fu Tsao (ed.), Mei-hui Tsai (1995). Hakka subdialects and Hakka teaching. Papers from the 1994 Conference on Language Teaching and Linguistics in Taiwan Vol. II: Hakka, Taipei.
  53. Xu, Zhao-quan. (2003). Hakka Dictionary of Taiwan.
Cited By
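  1. Chui, Kawai (2018). Directionality of Change: Grammatical Variation and Do-Constructions in Taiwan Mandarin. Concentric: Studies in Linguistics (同心圓:語言學研究), 44(1), 65-88.
  2. Lai, Huei-ling, Chung, Siaw-Fong (2018). COLOR POLYSEMY: BLACK AND WHITE IN TAIWANESE LANGUAGES. Taiwan Journal of Linguistics, 16(1), 95-130.
  3. Tsay, Jane S., Ruan, Jia-Cing, Myers, James, Hsu, Chiung-Wen (2012). Development and Testing of Transcription Software for a Southern Min Spoken Corpus. 中文計算語言學期刊 [International Journal of Computational Linguistics and Chinese Language Processing], 17(1), 1-26.
  4. 賴惠玲 (Huei-Ling Lai), 劉吉軒, 葉秋杏 (2021). 臺灣客語語料庫建置與客語詞彙使用初探 [Construction of the Taiwan Hakka Corpus and a preliminary study of Hakka lexical usage]. 數位典藏與數位人文 [Journal of Digital Archives and Digital Humanities], 8, 75-131.
  5. 劉晉廷 (2022). Revisiting Phonetically Incomplete Tone Three Sandhi in Mandarin Chinese: Insights from a Revised Wug Test. 臺中教育大學學報:人文藝術類 [Journal of National Taichung University of Education: Humanities and Arts], 36(1), 1-24.
  6. 謝承諭 (2020). MEANING IN REPAIR: THE ABSTRACT NOUN YISI 'MEANING/INTENTION' IN THE MANAGEMENT OF INTERSUBJECTIVITY IN MANDARIN CONVERSATION. Taiwan Journal of Linguistics, 18(2), 39-88.
  7. (2011). 「(一)整個」程度副詞新興用法的語法化初探 [A preliminary study of the grammaticalization of the emerging degree-adverb use of "(yi) zhengge"]. 朝陽人文社會學刊 [Chaoyang Journal of Humanities and Social Sciences], 9(1), 141-158.
  8. (2022). Degree adverbs in spoken Mandarin: A behavioral profile corpus-based approach to language alternatives. Concentric: Studies in Linguistics, 48(2), 285-322.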