Title |
利用關聯探勘技術壓縮原生型XML資料庫 |
Parallel Title |
Compressing the Native XML Database via Association Mining |
DOI |
10.29767/ECS.200803.0005 |
Authors |
李金鳳(Chin-Feng Lee);唐啟明(Chi-Ming Tang) |
Keywords |
XML ; Data Mining ; Data Compression ; Native XML Database |
Journal |
Electronic Commerce Studies |
Volume/Issue (Publication Date) |
Vol. 6, No. 1 (2008/03/31) |
Pages |
83 - 104 |
Language |
Traditional Chinese |
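Chinese Abstract |
With the rapid growth of electronic commerce, the eXtensible Markup Language (XML) has become the standard language for exchanging information between enterprises. Storing XML documents in a relational database requires mapping rules that split each document across several relational tables; this not only breaks the hierarchical structure of XML but also lowers system performance. A native XML database, by contrast, stores XML documents directly and is therefore more efficient than the relational approach. This paper first applies association mining, a data mining technique, to find the frequent tag data sets and frequent character data sets in a native XML database, and then uses them to build compression rules for compressing the database. In addition, the number of XML documents changes over time, so the previously mined frequent tag data sets and frequent character data sets change as well. We therefore incorporate the concept of dynamic mining algorithms to maintain the frequent tag data sets, frequent character data sets, and compression rules dynamically, so that the whole native XML database need not be re-mined and recompressed whenever the data change. Experimental results show that the proposed method achieves an average compression ratio of 75% on XML documents, and that dynamic compression saves about 40 seconds of compression time compared with static compression. The proposed compression method is therefore highly effective. |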
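The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions: each document is treated as one transaction (its set of tag names, or its set of non-blank text values), an Apriori-style level-wise search finds itemsets whose document support reaches a hypothetical `min_support` threshold, and every tag or text value that appears in a frequent itemset is given a short integer code. The function names `frequent_itemsets`, `build_rules`, `compress`, and `decompress`, and the rule format (item to integer code), are illustrative assumptions, not the authors' method.

```python
import xml.etree.ElementTree as ET


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search: every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    n = len(transactions)
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result, k = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: s / n for c, s in counts.items() if s / n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        # (Apriori's subset-pruning step is omitted; support counting still filters.)
        keys = list(frequent)
        candidates = list({a | b for a in keys for b in keys if len(a | b) == k})
    return result


def build_rules(docs, min_support):
    """Assumed rule form: each tag / text value in a frequent itemset -> int code."""
    roots = [ET.fromstring(d) for d in docs]
    tag_tx = [{el.tag for el in r.iter()} for r in roots]
    text_tx = [{el.text for el in r.iter() if el.text and el.text.strip()} for r in roots]
    tags = sorted({i for s in frequent_itemsets(tag_tx, min_support) for i in s})
    texts = sorted({i for s in frequent_itemsets(text_tx, min_support) for i in s})
    return {t: i for i, t in enumerate(tags)}, {t: i for i, t in enumerate(texts)}


def compress(elem, tag_code, text_code):
    """Replace coded tags/texts by ints; everything else is kept verbatim."""
    return (
        tag_code.get(elem.tag, elem.tag),
        dict(elem.attrib),
        text_code.get(elem.text, elem.text),
        elem.tail,
        [compress(child, tag_code, text_code) for child in elem],
    )


def decompress(node, tag_of, text_of):
    """Inverse of compress: ints are looked up, strings are literal."""
    tag, attrib, text, tail, children = node
    elem = ET.Element(tag_of[tag] if isinstance(tag, int) else tag, attrib)
    elem.text = text_of[text] if isinstance(text, int) else text
    elem.tail = tail
    elem.extend([decompress(c, tag_of, text_of) for c in children])
    return elem


docs = [
    "<order><item>pen</item><qty>1</qty></order>",
    "<order><item>pen</item><qty>2</qty></order>",
    "<order><item>ink</item><note>gift</note></order>",
]
tag_code, text_code = build_rules(docs, min_support=0.6)
print(tag_code, text_code)  # {'item': 0, 'order': 1, 'qty': 2} {'pen': 0}
tag_of = {v: k for k, v in tag_code.items()}
text_of = {v: k for k, v in text_code.items()}
root = ET.fromstring(docs[0])
packed = compress(root, tag_code, text_code)
assert ET.tostring(decompress(packed, tag_of, text_of)) == ET.tostring(root)
```

Because codes are `int` and uncoded values stay `str`, decoding is unambiguous and the round trip is lossless. By the Apriori property every member of a frequent itemset is itself frequent, so this simple per-item rule form only uses the itemsets' members; how the paper turns whole frequent sets into compression rules is not stated in the abstract. Measuring an actual compression ratio would also require serializing the packed tuples, which this sketch leaves out.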
English Abstract |
XML has become a standard that enables transactional processes to operate smoothly in enterprise data exchange. However, existing database systems such as relational databases provide inadequate facilities for managing the nested and ordered structures of XML documents. Two important issues therefore arise: the storage capacity required for huge XML documents, and the complex mapping between relational databases and XML repositories. A native XML database addresses this by retrieving XML documents efficiently as basic units without complicated transformation, and database compression can further relieve storage requirements. We therefore use association mining techniques to compress a native XML database. Frequent character data sets and frequent tag sets are mined from the database and used to establish a set of database compression rules. The proposed method also applies dynamic mining techniques to maintain the compression rules when the database is updated, without periodically decompressing, re-mining, and recompressing the whole database. The proposed approach contributes to native XML databases in two respects: extracting hidden information and lossless compression. Experimental results show that our method compresses effectively, with static compression reaching a compression ratio of 75%. Applying the proposed dynamic mining techniques saves 40 seconds of database compression time. |
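As a similarly hedged sketch of the dynamic-maintenance idea (not the authors' algorithm), the hypothetical class `DynamicTagRules` below keeps a per-tag document count, so that inserted or deleted documents update the counts without rescanning the rest of the database. It tracks single tags only; a full dynamic itemset miner must also handle larger itemsets that become frequent only after an update, which this sketch does not attempt. The `min_support` ratio and the policy of never reassigning an existing code (so previously compressed documents stay decodable) are assumptions.

```python
from collections import Counter
import xml.etree.ElementTree as ET


class DynamicTagRules:
    """Hypothetical incremental maintainer of frequent-tag compression rules."""

    def __init__(self, min_support):
        self.min_support = min_support  # assumed minimum document-support ratio
        self.n_docs = 0
        self.doc_counts = Counter()  # tag -> number of documents containing it
        self.code = {}  # tag -> short integer code, never reassigned once given

    @staticmethod
    def _tags(doc):
        """Set of tag names used in one XML document string."""
        return {el.tag for el in ET.fromstring(doc).iter()}

    def frequent(self):
        """Tags whose document support currently meets min_support."""
        if self.n_docs == 0:
            return set()
        return {t for t, c in self.doc_counts.items() if c / self.n_docs >= self.min_support}

    def _refresh_rules(self):
        """Give codes to newly frequent tags; keep existing codes stable."""
        new = sorted(self.frequent() - set(self.code))
        for t in new:
            self.code[t] = len(self.code)
        return new

    def add_documents(self, docs):
        """Fold inserted documents into the counts without rescanning old ones."""
        for d in docs:
            self.n_docs += 1
            self.doc_counts.update(self._tags(d))
        return self._refresh_rules()

    def remove_documents(self, docs):
        """Subtract deleted documents (assumed to have been added earlier)."""
        for d in docs:
            self.n_docs -= 1
            self.doc_counts.subtract(self._tags(d))
        return self._refresh_rules()


rules = DynamicTagRules(min_support=0.6)
print(rules.add_documents(["<a><b/></a>", "<a><c/></a>"]))  # ['a']
print(rules.add_documents(["<a><b/></a>", "<b/>"]))  # ['b']
print(rules.code)  # {'a': 0, 'b': 1}
```

The update cost is proportional to the changed documents only, which is the kind of saving the abstract attributes to dynamic mining; the sketch makes no claim about reproducing the reported 40-second figure.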
Subject Classification |
Basic and Applied Sciences > Information Science; Social Sciences > Economics |