Title |
Stability and Structure of CART and SPAN Search Generated Data Partitions for the Analysis of Low Birth Weight |
DOI |
10.6339/JDS.2012.10(1).1014 |
Authors |
Roger J. Marshall;Panagiota Kitsantas |
Keywords |
Boolean expression ; CART ; classification ; stability ; SPAN |
Journal |
Journal of Data Science |
Volume and Issue / Publication Date |
Vol. 10, No. 1 (2012/01/01) |
Pages |
61 - 73 |
Language |
English |
English Abstract |
Searching for data structure and decision rules using classification and regression tree (CART) methodology is now well established. An alternative procedure, search partition analysis (SPAN), is less well known. Both provide classifiers based on Boolean structures; in CART these are generated by a hierarchical series of local sub-searches and in SPAN by a global search. One issue with CART is its perceived instability; another is the awkward nature of the Boolean structures generated by a hierarchical tree. Instability arises because the final tree structure is sensitive to early splits. SPAN, as a global search, seems more likely to render stable partitions. To examine these issues in the context of identifying mothers at risk of giving birth to low birth weight babies, we have taken a very large sample, divided it at random into ten non-overlapping sub-samples and performed SPAN and CART analyses on each sub-sample. The stability of the SPAN and CART models is described and, in addition, the structure of the Boolean representation of classifiers is examined. It is found that SPAN partitions have more intrinsic stability and are less prone to Boolean structural irregularities. |
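The sub-sampling design described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure and does not implement SPAN or a full CART tree. It only deals a data set at random into ten non-overlapping sub-samples and records which variable a greedy, Gini-based first split would choose in each one. Disagreement among those first splits is one simple symptom of the sensitivity to early splits that the abstract mentions. The synthetic data, the function names and the three binary risk factors are all hypothetical.

```python
import random
from collections import Counter


def split_into_subsamples(rows, k=10, seed=0):
    """Shuffle the rows and deal them into k non-overlapping sub-samples."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_first_split(rows, n_features):
    """Index of the binary feature giving the lowest weighted Gini impurity."""
    best_feature, best_score = None, float("inf")
    for j in range(n_features):
        left = [y for x, y in rows if x[j] == 0]
        right = [y for x, y in rows if x[j] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_feature, best_score = j, score
    return best_feature


def make_synthetic_rows(n, seed=1):
    """Hypothetical data: three binary risk factors and a rare binary outcome.

    The outcome risk depends on features 0 and 1 only; feature 2 is noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(3))
        risk = 0.05 + 0.10 * x[0] + 0.08 * x[1]
        rows.append((x, 1 if rng.random() < risk else 0))
    return rows


rows = make_synthetic_rows(20000)
subsamples = split_into_subsamples(rows, k=10)

# Count how often each feature is chosen as the first split across sub-samples.
root_choices = Counter(best_first_split(s, 3) for s in subsamples)
print(root_choices)
```

If every sub-sample picks the same first variable, the counter has a single key; several keys indicate that the first split, and hence any tree grown beneath it, varies from sub-sample to sub-sample.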
Subject Classification |
Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics |
Times Cited |