English Abstract
|
Humanists often rely on texts in their research, and they may want to extract as many terms of a specific type as possible from those texts. Term extraction methods are computational algorithms that extract meaningful terms from a large corpus of digitized texts. The term-clips method is a semi-automatic term extraction approach that relies on human-computer interaction to extract terms from texts. In this paper, we discuss a new term extraction tool, called term extractor 2020, which improves on the clipper tool developed in 2015. We review the idea behind the term-clips method, describe the problems the old tool encountered in real cases, and discuss how term extractor 2020 solves these problems. We run an experiment to extract six classes of terms (village names, person names, ship names, date strings, person titles, and freight items) from vol. 3 of 熱蘭遮城日誌 (a Chinese translation of "De Dagregisters van het Kasteel Zeelandia"). The experiment shows that term extractor 2020 can help researchers extract terms, especially terms in Chinese, effectively and efficiently.
|