Title

Testing an Automated Accuracy Assessment Method on Bibliographic Data

Parallel Title

書目資料準確性評估自動化之測試研究

DOI

10.6182/jlis.2014.12(2).019

Author

Marlies Olensky

Keywords

Data Accuracy Assessment ; Bibliographic Data ; Bibliometric Data Sources ; Web of Science ; Scopus

Journal Title

Journal of Library and Information Studies (圖書資訊學刊)

Volume and Issue / Publication Date

Vol. 12, No. 2 (2014/12/01)

Pages

19 - 38

Language

English

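Chinese Abstract (translated)

This study examines the automated data accuracy assessment method described in the data quality literature to determine its suitability for assessing bibliographic data. The test data are the publications of two Nobel laureates in Chemistry over a 10-year period, retrieved from Web of Science and Scopus. The accuracy of the bibliographic data was assessed with both an automated and a manual method, and the records were then compared against the original publications to determine which method performed better. The results show that the manual method yields higher accuracy scores, and that the automated method needs to incorporate more assessment rules reflecting the characteristics of bibliographic data before its accuracy can improve. In both data samples, accuracy assessed per individual field was higher than accuracy assessed per whole bibliographic record. This study contributes to the discussion of a standard method for assessing bibliographic data accuracy and illustrates the major impact of data accuracy on the citation matching process.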

English Abstract

This study investigates automated data accuracy assessment as described in data quality literature for its suitability to assess bibliographic data. The data samples comprise the publications of two Nobel Prize winners in the field of Chemistry for a 10-year-publication period retrieved from the two bibliometric data sources, Web of Science and Scopus. The bibliographic records are assessed against the original publication ("gold standard") and an automatic assessment method is compared to a manual one. The results show that the manual assessment method reflects truer accuracy scores. The automated assessment method would need to be extended by additional rules that reflect specific characteristics of bibliographic data. Both data sources had higher accuracy scores per field than accumulated per record. This study contributes to the research on finding a standardized assessment method of bibliographic data accuracy as well as defining the impact of data accuracy on the citation matching process.
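
The difference between per-field and per-record accuracy mentioned in the abstract can be illustrated with a minimal sketch. This is not the study's actual assessment method: it assumes simple exact-match comparison against the gold standard, and the function names, field names, and sample records are all invented for illustration.

```python
def field_accuracy(records, gold, fields):
    """Per field: share of records whose value equals the gold-standard value."""
    return {
        f: sum(r[f] == g[f] for r, g in zip(records, gold)) / len(records)
        for f in fields
    }


def record_accuracy(records, gold, fields):
    """Share of records in which every field equals the gold-standard value."""
    correct = sum(
        all(r[f] == g[f] for f in fields) for r, g in zip(records, gold)
    )
    return correct / len(records)


# Hypothetical field set and sample data (not taken from the study).
FIELDS = ("title", "year", "pages")

gold = [
    {"title": "A", "year": 2001, "pages": "1-9"},
    {"title": "B", "year": 2002, "pages": "10-19"},
    {"title": "C", "year": 2003, "pages": "20-29"},
    {"title": "D", "year": 2004, "pages": "30-39"},
]

database = [
    {"title": "A", "year": 2001, "pages": "1-9"},    # fully correct
    {"title": "B", "year": 2020, "pages": "10-19"},  # wrong year
    {"title": "C", "year": 2003, "pages": "20-92"},  # wrong pages
    {"title": "X", "year": 2004, "pages": "30-39"},  # wrong title
]

per_field = field_accuracy(database, gold, FIELDS)
per_record = record_accuracy(database, gold, FIELDS)
```

In this toy sample each field is correct in three of four records, but only one record is correct in every field, so the per-record score is lower than any per-field score. A single error anywhere in a record counts against the whole record, which is one plausible reading of why the accumulated per-record scores in the study were lower.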

Subject Classification

Humanities > Library and Information Science

References
  1. (2013). Oxford English Dictionary: Online Version.
  2. Harzing, A.-W. (2008). Google Scholar - A new data source for citation analysis. Retrieved from http://www.harzing.com/pop_gs.htm
  3. International Organization for Standardization (ISO). (2005). ISO 9000:2005: Quality management systems - Fundamentals and vocabulary. Geneva, Switzerland: International Organization for Standardization.
  4. Archambault, É., Campbell, D., Gingras, Y., Larivière, V. (2009). Comparing bibliometric statistics obtained from the Web of Science and Scopus. Journal of the American Society for Information Science and Technology, 60(7), 1320-1326.
  5. Batini, C., Cabitza, F., Cappiello, C., Francalanci, C. (2008). A comprehensive data quality methodology for web and structured data. International Journal of Innovative Computing and Applications, 1(3), 205-218.
  6. Batini, C., Cappiello, C., Francalanci, C., Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), 16:1-16:52.
  7. Batini, C., Scannapieco, M. (2006). Data quality: Concepts, methodologies and techniques. Berlin, Germany: Springer.
  8. Bovee, M., Srivastava, R. P., Mak, B. (2003). A conceptual framework and belief-function approach to assessing overall information quality. International Journal of Intelligent Systems, 18(1), 51-74.
  9. Buchanan, R. A. (2006). Accuracy of cited references: The role of citation databases. College & Research Libraries, 67(4), 292-303.
  10. Even, A., Shankaranarayanan, G. (2007). Utility-driven assessment of data quality. SIGMIS Database, 38(2), 75-93.
  11. García-Pérez, M. A. (2010). Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h-indices in psychology. Journal of the American Society for Information Science and Technology, 61(10), 2070-2085.
  12. Gingras, Y., Wallace, M. (2010). Why it has become more difficult to predict Nobel Prize winners: A bibliometric analysis of nominees and winners of the chemistry and physics prizes (1901-2007). Scientometrics, 82(2), 401-412.
  13. Hildebrandt, A. L., Larsen, B. (2008). Reference and citation errors: A study of three law journals. 13th Nordic Workshop on Bibliometrics and Research Policy, Tampere, Finland:
  14. Hood, W. W., Wilson, C. S. (2003). Informetric studies using databases: Opportunities and challenges. Scientometrics, 58(3), 587-608.
  15. Jacsó, P. (2008). The plausibility of computing the h-index of scholarly productivity and impact using reference-enhanced databases. Online Information Review, 32(2), 266-283.
  16. Jarke, M., Lenzerini, M., Vassiliou, Y., Vassiliadis, P. (2003). Fundamentals of data warehouses. Berlin, Germany: Springer.
  17. Larsen, B., Hytteballe Ibanez, K., Bolling, P. (2007). Error rates and error types for the Web of Science algorithm for automatic identification of citations. 12th Nordic Workshop on Bibliometrics and Research Policy, Copenhagen, Denmark:
  18. Lee, Y. W., Pipino, L. L., Funk, J. D., Wang, R. Y. (2006). Journey to data quality. Cambridge, MA: MIT Press.
  19. Lee, Y. W., Strong, D. M., Kahn, B. K., Wang, R. Y. (2002). AIMQ: A methodology for information quality assessment. Information & Management, 40(2), 133-146.
  20. Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10, 707-710.
  21. Loshin, D. (2001). Enterprise knowledge management: The data quality approach. San Diego, CA: Morgan Kaufmann.
  22. Maydanchik, A. (2007). Data quality assessment. Bradley Beach, NJ: Technics Publications.
  23. Meho, L. I., Yang, K. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science vs. Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105-2125.
  24. Moed, H. F. (2005). Citation analysis in research evaluation. Dordrecht, Netherlands: Springer.
  25. Moed, H. F., Vriens, M. (1989). Possible inaccuracies occurring in citation analysis. Journal of Information Science, 15(2), 95-107.
  26. Naumann, F. (2002). Quality-driven query answering for integrated information systems. Berlin, Germany: Springer.
  27. Neuhaus, C., Daniel, H.-D. (2008). Data sources for performing citation analysis: An overview. Journal of Documentation, 64(2), 193-210.
  28. Olensky, M. (2012). How is bibliographic data accuracy assessed? Proceedings of the 17th International Conference on Science and Technology Indicators, Montreal, Canada:
  29. Olensky, M. (2014). Germany, Berlin School of Library and Information Science, Humboldt-Universität zu Berlin.
  30. Pipino, L. L., Lee, Y. W., Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211-218.
  31. Redman, T. C. (1996). Data quality for the information age. Boston, MA: Artech House.
  32. Scannapieco, M., Virgillito, A., Marchetti, C., Mecella, M., Baldoni, R. (2004). The DaQuinCIS architecture: A platform for exchanging and improving data quality in cooperative information systems. Information Systems, 29(7), 551-582.
  33. Schmidt, M. (2012). Development and evaluation of a match key for linking references to cited articles. Proceedings of the 17th International Conference on Science and Technology Indicators, Montreal, Canada:
  34. Su, Y., Jin, Z. (2004). A methodology for information quality assessment in the designing and manufacturing processes of mechanical products. Proceedings of the Ninth International Conference on Information Quality (ICIQ-04), Cambridge, MA:
  35. Tunger, D., Haustein, S., Ruppert, L., Luca, G., Unterhalt, S. (2010). "The Delphic Oracle": An analysis of potential error sources in bibliographic databases. Proceedings of the 11th International Conference on Science and Technology Indicators, Leiden, Netherlands:
  36. Wallin, J. A. (2005). Bibliometric methods: Pitfalls and possibilities. Basic & Clinical Pharmacology & Toxicology, 97(5), 261-275.
  37. Wand, Y., Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86-95.
  38. Wang, R. Y. (1998). A product perspective on total data quality management. Communications of the ACM, 41(2), 58-65.
  39. Wang, R. Y., Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-33.
  40. Winkler, W. E. (1995). Matching and record linkage. Business Survey Methods, New York, NY: