Title

Error Classification of Machine Translation: A Corpus-based Study on Chinese-English Patent Translation

Author

Jiuan-An Hsu

Keywords

Machine translation; manual error analysis of MT output; error classification of Chinese-English MT output

Journal Title

翻譯學研究集刊

Volume/Issue / Publication Date

Issue 18 (2014/12/01)

Pages

121-136

Language

English

English Abstract

While machine translation (MT) systems have been widely applied to translation tasks, the quality of their output often remains unsatisfactory. The demand for better output quality has prompted researchers to look for effective ways to evaluate MT output. One popular approach is human error analysis, the manual identification and classification of errors made by MT systems. Although many studies have examined common error types in MT, none has been found that targets the distant language pair of Chinese and English. This study examines errors in Chinese-English MT of patent abstracts, as such a distant language pair may result in very different error types. At the first level of the hierarchical classification scheme used in this study, errors are split into five major categories: orthographic, morphological, lexical, semantic, and syntactic. Each major category is further divided into several subcategories. Thirty-four MT outputs were manually corrected and annotated to identify the distribution of translation errors. The findings suggest that certain features of the Chinese language, such as the low occurrence of articles and relatively unclear sentence and phrase structures, severely affect the performance of the MT system studied. These findings have implications for MT system developers and post-editors.

Subject Classification: Humanities > Linguistics
References
  1. (1994).Reversible Grammar in Natural Language Processing.Springer.
  2. WIPO. (2014). International Patent Classification (IPC). Retrieved April 28, 2014, from http://www.wipo.int/classifications/ipc/en/
  3. Llitjós, A. F., & Carbonell, J. G. (2004). The translation correction tool: English-Spanish user studies.
  4. WIPO. (2014). International Patent Classification (IPC) Official Publication. 2014.1. Retrieved April 8, 2014, from http://web2.wipo.int/ipcpub/#refresh=page
  5. Alam, Y. S.(2013).Manual Evaluation and Error Analysis of Machine Translation Output between a Distant Language Pair Focusing on Effects of Sentence Length.Tenth Symposium on Natural Language Processing (SNLP-2013),Phuket, Thailand.
  6. Arnold, D.,Balkan, L.,Humphreys, R. L.,Meijer, S.,Sadler, L.(1994).Machine translation: an introductory guide.London:NCC Blackwell.
  7. Banerjee, S.,Lavie, A.(2005).METEOR: An automatic metric for MT evaluation with improved correlation with human judgments.Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization
  8. Bennett, P. A.(1981).The evolution of passive and disposal sentences.Journal of Chinese linguistics,9(1),61-90.
  9. Birch, A.,Osborne, M.,Blunsom, P.(2010).Metrics for MT evaluation: evaluating reordering.Machine Translation,24(1),15-26.
  10. Costa-Jussà, M. R.,Farrús, M.(2014).Statistical machine translation enhancements through linguistic levels: A survey.ACM Computing Surveys (CSUR),46(3),42.
  11. Elming, J.,Habash, N.(2009).Syntactic reordering for English-Arabic phrase-based machine translation.EACL 2009 Workshop on Computational Approaches to Semitic Languages
  12. Farrús Cabeceran, M.,Ruiz Costa-Jussà, M.,Mariño Acebal, J. B.,Rodríguez Fonollosa, J. A.(2010).Linguistic-based evaluation criteria to identify statistical machine translation errors.14th Annual Conference of the European Association for Machine Translation,Saint-Raphaël.
  13. Flanagan, M.(1994).Error classification for MT evaluation.Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas
  14. Font-Llitjós, A.,Carbonell, J. G.,Lavie, A.(2005).A framework for interactive and automatic refinement of transfer-based machine translation.European association of machine translation(EAMT) 10th annual conference
  15. Jurafsky, D.,Martin, J. H.,Horton, M.(Ed.)(2000).Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.New Jersey, USA:Pearson Higher Education.
  16. Koerner, E. F. K.(Ed.),Asher, R. E.(Ed.)(1995).Concise History of the Language Sciences: From the Sumerians to the Cognitivists.Oxford:Pergamon Press.
  17. Mestre, E. M. M.,Pastor, M. L. C.,de Vera, C.(2012).A pragmatic analysis of errors in University students' writings in English.English for Specific Purposes World,12(35)
  18. Mey, J. L.(1993).Pragmatics: An Introduction.Oxford:Blackwell.
  19. Papineni, K.,Roukos, S.,Ward, T.,Zhu, W.-J.(2002).BLEU: a method for automatic evaluation of machine translation.40th annual meeting on association for computational linguistics
  20. Popović, M.,Burchardt, A.(2011).From human to automatic error classification for machine translation output.EAMT 11: 15th International Conference of the European Association for Machine Translation,Leuven, Belgium.
  21. Popović, M.,Ney, H.(2011).Towards automatic error analysis of machine translation output.Computational Linguistics,37(4),657-688.
  22. Reithinger, N.,Engel, R.,Kipp, M.,Klesen, M.(1996).Predicting dialogue acts for a speech-to-speech translation system.ICSLP 96: Fourth International Conference on Spoken Language,Philadelphia, PA, USA.
  23. Snover, M. G.,Madnani, N.,Dorr, B.,Schwartz, R.(2009).TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate.Machine Translation,23(2-3),117-127.
  24. Snover, M.,Dorr, B.,Schwartz, R.,Micciulla, L.,Makhoul, J.(2006).A study of translation edit rate with targeted human annotation.Proceedings of association for machine translation in the Americas
  25. Stymne, S.(2011).Blast: A tool for error analysis of machine translation output.Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations,Portland, Oregon, USA.
  26. Sun, C.(1996).Word-order Change and Grammaticalization in the History of Chinese.California:Stanford University Press.
  27. Trujillo, A.(1999).Translation Engines: Techniques for Machine Translation.London:Springer.
  28. Vilar, D.,Xu, J.,d'Haro, L. F.,Ney, H.(2006).Error analysis of statistical machine translation output.LREC-2006: Fifth International Conference on Language Resources and Evaluation,Genoa, Italy.
  29. Zeman, D.,Fishel, M.,Berka, J.,Bojar, O.(2011).Addicter: What Is Wrong with My Translations?.The Prague Bulletin of Mathematical Linguistics,96(1),79-88.