Title |
A New Variable Selection Approach Inspired by Supersaturated Designs Given a Large-Dimensional Dataset |
DOI |
10.6339/JDS.2014.12(1).1183 |
Authors |
Christina Parpoula;Krystallenia Drosou;Christos Koukouvinos;Kalliopi Mylona |
Keywords |
Generalized linear model ; penalized likelihood ; supersaturated design ; trauma ; variable selection |
Journal |
Journal of Data Science |
Volume and Issue / Publication Date |
Vol. 12, No. 1 (2014/01/01) |
Pages |
35 - 52 |
Language |
English |
Abstract |
The problem of variable selection is fundamental to statistical modelling in diverse fields of science. In this paper, we study in particular the problem of selecting important variables in regression problems in the case where observations and labels of a real-world dataset are available. First, we examine the performance of several existing statistical methods for analyzing a real, large trauma dataset consisting of 7000 observations and 70 factors, which include demographic, transport and intrahospital data. The statistical methods employed in this work are the nonconcave penalized likelihood methods (SCAD, LASSO, and Hard), generalized linear logistic regression, and best subset variable selection (with AIC and BIC), used to detect possible risk factors of death. Supersaturated designs (SSDs) are a large class of factorial designs which can be used for screening out the important factors from a large set of potentially active variables. This paper presents a new variable selection approach, inspired by supersaturated designs, for a given dataset of observations. The merits and effectiveness of this approach for identifying important variables in observational studies are evaluated by considering several two-level supersaturated designs and a variety of statistical models that differ in the combinations of factors and the number of observations. The derived results are encouraging, since the alternative approach using supersaturated designs provided specific information that is logical and consistent with medical experience, and which may also serve as guidelines for trauma management. |
Subject Classification |
Basic and Applied Sciences > Information Science; Basic and Applied Sciences > Statistics |