English Abstract
In Taiwan, most people speak Mandarin, Southern Min, or Hakka. Not only are these three Chinese dialects undergoing linguistic change, but the number of Southern Min and Hakka speakers is also declining. The NCCU Corpus of Spoken Chinese is therefore a language documentation project that provides open online access to Mandarin, Hakka, and Southern Min data for non-profit research.
As a language documentation project, the NCCU spoken corpus focuses on collecting and archiving spoken forms of various types. It consists of three sub-corpora, namely the Corpus of Spoken Mandarin, the Corpus of Spoken Hakka, and the Corpus of Spoken Southern Min. The three corpora share a common scheme for the collection of spoken data, mostly in the form of spontaneous face-to-face conversations. The infrastructure of the corpus is designed in a simple yet user-friendly way, so that data can be processed efficiently in the database, and users can browse the spoken data directly from the web. We hope that our work can encourage more people to engage in building up spoken corpora from different perspectives and for different purposes.