Title

利用關聯探勘技術壓縮原生型XML資料庫

Parallel Title

Compressing the Native XML Database via Association Mining

DOI

10.29767/ECS.200803.0005

Authors

Chin-Feng Lee (李金鳳); Chi-Ming Tang (唐啟明)

Keywords

XML ; Data Mining ; Data Compression ; Native XML Database

Journal

Electronic Commerce Studies

Volume/Issue and Publication Date

Vol. 6, No. 1 (2008/03/31)

Pages

83 - 104

Language

Traditional Chinese

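Chinese Abstract (translated)

With the rapid growth of e-commerce, the eXtensible Markup Language (XML) has become the standard language for exchanging information between enterprises. When XML documents are stored in a relational database, mapping rules must be defined to spread each document across several relational tables; this not only destroys the hierarchical structure of the XML but also reduces system performance. A native XML database, by contrast, stores XML documents directly, so it is more efficient than the relational approach. This paper therefore first applies association mining techniques from data mining to find the frequent tag sets and frequent character data sets in a native XML database, and then uses them to build compression rules for compressing the database. In addition, the collection of XML documents changes as it grows over time, so the previously mined frequent tag sets and frequent character data sets change as well. The paper therefore incorporates the idea of dynamic mining algorithms to maintain the frequent tag sets, frequent character data sets, and compression rules incrementally, so that data changes do not require re-mining and re-compressing the entire native XML database. Experimental results show that the proposed method achieves an average compression ratio of 75% on XML documents, and that dynamic compression saves about 40 seconds of compression time compared with static compression; the proposed compression method is therefore very effective.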

English Abstract

XML has become a standard for enterprise data exchange, allowing transactional processes to operate smoothly. However, existing database systems such as relational databases provide inadequate facilities for managing the nested and ordered structures of XML documents. Two important issues therefore arise: the storage capacity needed for huge XML documents, and the complexity of mapping between relational databases and an XML repository. A native XML database is a solution that efficiently retrieves XML documents as basic units without requiring complicated transformation. Moreover, database compression relieves the demand on storage capacity. Hence, we use association mining techniques to compress a native XML database and address the above problems. Frequent character data sets and frequent tag sets are mined and used to establish a set of database compression rules. The proposed method also applies dynamic mining techniques to maintain the compression rules, so that database updates do not require periodically decompressing, re-mining, and re-compressing the whole database. The proposed approach contributes to native XML databases both in extracting hidden information and in lossless compression. The experimental results show that our compression method is effective: static compression reaches a ratio of 75%, and applying the proposed dynamic mining techniques saves 40 seconds of database compression time.
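The abstract does not give the paper's actual rule format or mining algorithm, so the following is only a minimal sketch of the general idea it describes: mine frequent tag sets with a level-wise (Apriori-style) search, then use them to build a lossless substitution table. The function names, the `t0`, `t1`, ... code scheme, the sample documents, and the 0.6 support threshold are all assumptions made for illustration, and the sketch further assumes that the short codes do not collide with tag names already in use. Frequent character data sets and dynamic maintenance are not covered.

```python
from itertools import combinations
import xml.etree.ElementTree as ET


def transactions(docs):
    """One transaction per document: the set of tag names it contains."""
    return [frozenset(el.tag for el in ET.fromstring(d).iter()) for d in docs]


def frequent_tag_sets(txns, min_support):
    """Level-wise search: candidate (k+1)-sets are built only from frequent k-sets."""
    n = len(txns)
    level = [frozenset([t]) for t in sorted({t for txn in txns for t in txn})]
    frequent = {}
    while level:
        # Support = fraction of transactions that contain the candidate set.
        support = {c: sum(1 for txn in txns if c <= txn) / n for c in level}
        kept = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(kept)
        keys = sorted(kept, key=sorted)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return frequent


def build_rules(frequent):
    """Hypothetical rule table: every tag in a frequent set gets a short code."""
    tags = sorted({t for s in frequent for t in s})
    return {tag: f"t{i}" for i, tag in enumerate(tags)}


def rename(doc, mapping):
    """Rewrite tag names through `mapping`; tags without a rule are left alone."""
    root = ET.fromstring(doc)
    for el in root.iter():
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, not taken from the paper.
docs = [
    "<order><id>1</id><customer>Ann</customer><item>pen</item></order>",
    "<order><id>2</id><customer>Bob</customer><item>ink</item></order>",
    "<order><id>3</id><customer>Cy</customer><note>rush</note></order>",
]
rules = build_rules(frequent_tag_sets(transactions(docs), min_support=0.6))
packed = rename(docs[2], rules)                                # compress
restored = rename(packed, {v: k for k, v in rules.items()})    # decompress
```

Because decompression is just the inverse mapping, the round trip is lossless at the level of the parsed XML tree; the infrequent `note` tag receives no rule and passes through unchanged.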

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Economics

References
  1. A. Cannane, H. E. Williams (2000). A Compression Scheme for Large Databases. Proceedings of the Australian Database Conference (ADC'2000), 22(2), 6-11.
  2. C. F. Lee, S. W. Changchien, W. T. Wang, J. J. Shen (2006). A Data Mining Approach to Database Compression. Information Systems Frontiers (ISF), 8(3), 147-161.
  3. C. L. Goh, K. Aisaka, M. Tsukamoto, K. Harumoto, S. Nishio (1998). Database Compression with Data Mining Methods. Proceedings of the 5th International Conference on Foundations of Data Organization (FODO'98).
  4. D. Florescu, D. Kossmann (1999). Storing and Querying XML Data Using an RDBMS. IEEE Data Engineering Bulletin, 22(3), 27-34.
  5. E. Bertino, B. Catania (2001). Integrating XML and Database. IEEE Internet Computing, 5(4), 84-88.
  6. Quest Synthetic Data Generation
  7. J. Fong, H. K. Wong, Z. Cheng (2003). Converting Relational Database into XML Documents with DOM. Information and Software Technology, 45, 335-355.
  8. J. Han, M. Kamber (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann.
  9. J. W. Lee, K. Lee, W. Kim (2001). Preparations for Semantics-Based XML Mining. Proceedings of the IEEE International Conference on Data Mining.
  10. M. Strobel (2002). An XML Schema Representation for the Communication Design of Electronic Negotiations. Computer Networks, 39, 661-680.
  11. R. Agrawal, R. Srikant (1994). Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94).
  12. R. Agrawal, T. Imielinski, A. Swami (1993). Mining Association Rules between Sets of Items in Large Databases. Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C.
  13. S. W. Changchien, T. C. Lu (2001). A New Efficient Association Rules Mining Method Using Class Inheritance Tree (CIT). Proceedings of the 12th International Conference on Information Management (ICIM 2001), Taiwan.
  14. Extensible Markup Language (XML) Version 1.1
  15. C. F. Lee, S. W. Changchien, W. T. Wang (2001). 2001 National Computer Symposium: Databases and Software Engineering. Taipei.