Title

結構資料的再次使用:語意、連結與實作

Parallel Title

Reuse of Structured Data: Semantics, Linkage, and Realization

DOI

10.6245/JLIS.2017.431/722

Authors

黃韋菁(Andrea Wei-Ching Huang);李承錱(Cheng-Jen Lee);莊庭瑞(Tyng-Ruey Chuang)

Keywords

CKAN; Data Provenance; Data Quality; Knowledge Base; Linked Open Data (LOD); Ontology; Semantic Representation

Journal

圖書館學與資訊科學 (Journal of Library and Information Science)

Volume and Issue / Publication Date

Vol. 43, No. 1 (2017/04/01)

Pages

7 - 46

Language

Traditional Chinese; English

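Chinese Abstract (English translation)

Continually creating semantics and links for data, disseminating over the World Wide Web structured data that both people and machines can process and understand, and thereby increasing the "reuse value" of datasets is a topic that now receives wide attention. It is also the motivation and goal that moved this research from theoretical inquiry to system implementation. This paper briefly reviews international projects and technical developments related to Linked Open Data (LOD), introduces five cross-domain knowledge bases and seven domain-specific knowledge bases built with the LOD approach, and analyzes how data quality, metadata, and data provenance relate to one another. The research also implements the website data.odw.tw, which holds catalogue records of archived collection items, and designs an ontology (voc4odw) for converting semi-structured data into semantically rich Linked Data. On the one hand, we extend the CKAN (The Comprehensive Knowledge Archive Network) dataset management system to serve as a platform for storing and presenting Linked Data, emphasize the staged conversion steps from the original catalogue records to semantically linked data, and release the conversion programs for each step together with the CKAN software code as open source. On the other hand, because the source data are released under Creative Commons public licenses, the research results are released in the same way, so that, on an open basis, the data and code can be preserved and developed further and can be freely reused and disseminated.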

English Abstract

To increase the reuse value of existing datasets, it is becoming general practice to add semantic links among the records in a dataset and to link these records to external resources. The enriched datasets are published on the web for both humans and machines to consume and re-purpose. In this paper, we make use of publicly available structured records from a digital archive catalogue, and we demonstrate a principled approach to converting the records into semantically rich and interlinked resources for all to reuse. While exploring the various issues involved in reusing and re-purposing existing datasets, we review recent progress in the field of Linked Open Data (LOD) and examine twelve well-known knowledge bases built with a Linked Data approach. We also discuss the general issues of data quality, metadata vocabularies, and data provenance. The concrete outcomes of this research are the following: (1) a website, data.odw.tw, that hosts more than 840,000 semantically enriched catalogue records across multiple subject areas; (2) a lightweight ontology, voc4odw, for describing data reuse and provenance, among others; and (3) a set of open source software tools, available to all, for performing the kind of data conversion and enrichment done in this research. We have used and extended CKAN (The Comprehensive Knowledge Archive Network) as a platform to host and publish Linked Data. Our extensions to CKAN are open sourced as well. As the records we drew from the original catalogue are released under Creative Commons licenses, the semantically enriched resources we now re-publish on the Web are free for all to reuse as well.

Subject Classification: Humanities > Library and Information Science