Title

Variable Screenings in Binary Response Regressions with Multivariate Normal Predictors

Parallel Title

自變數事前篩選方法應用於具備多維常態自變數之二元迴歸分析

Author

張升懋(Sheng-Mao Chang)

Keywords

link function (連結函數) ; logistic model (邏輯式迴歸) ; probit model (機率單位迴歸) ; sure independence screening (必然獨立篩選)

Journal

中國統計學報 (Journal of the Chinese Statistical Association)

Volume and Issue / Publication Date

Vol. 51, No. 4 (2013/12/01)

Pages

427 - 444

Language

English

Chinese Abstract

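In regression analysis, a good variable pre-screening method can reasonably reduce the dimension of the regression problem. Sure independence screening is a fast pre-screening method that uses the slope of a simple linear regression to measure the relationship between each predictor and the response. Predictors with large slopes are regarded as more likely to affect the response directly, so the final regression model contains only those predictors. However, if the true regression model contains two or more predictors, the simple linear regression model used for screening is a wrong model. This study therefore investigates the properties of sure independence screening when the response is binary and a linear combination of several predictors affects the response through a link function. Under the conditions that the predictors follow a multivariate normal distribution and the link function can be expressed as a mixture of normal distribution functions, we use maximum likelihood estimation and least squares to obtain different screening methods, and we examine their theoretical properties and their performance in practice.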

English Abstract

Screening before model building is a reasonable strategy for reducing the dimension of predictors in regression problems. Sure independence screening is an efficient approach for this purpose that uses the slope estimate of a simple linear regression as a surrogate measure of the association between the response and a predictor. The final model can then be built from the predictors with steep slopes. However, if the response is truly affected by a nontrivial linear combination of some predictors, then the simple linear regression model is misspecified. In this work, we investigate the performance of sure independence screening from the viewpoint of model misspecification for binary response regressions. Both maximum likelihood screening and least squares screening are studied under the assumption that the predictors follow a multivariate normal distribution and that both the true and working link functions belong to a class of scale mixtures of normal distributions.
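
The least squares variant of the screening idea described above can be sketched as follows. This is a minimal illustration, not the procedure from the paper: the function name `least_squares_screening`, the cutoff `d`, the coefficient vector `beta`, and the simulated data (independent standard normal predictors, a special case of the multivariate normal assumption, with a probit-type response) are all assumptions made for the example.

```python
import numpy as np

def least_squares_screening(X, y, d):
    """Rank predictors by the absolute slope of a simple linear
    regression of y on each predictor alone, and keep the top d.
    (Hypothetical helper written for illustration.)"""
    Xc = X - X.mean(axis=0)   # center each predictor
    yc = y - y.mean()         # center the response
    # Marginal least squares slope for every column at once.
    slopes = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    keep = np.argsort(-np.abs(slopes))[:d]
    return np.sort(keep), slopes

# Simulated data: only the first three predictors affect the response.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]
# Probit-type binary response: 1 when the noisy linear predictor is positive.
y = (X @ beta + rng.standard_normal(n) > 0).astype(float)

selected, slopes = least_squares_screening(X, y, d=10)
print(selected)
```

Each slope here comes from a deliberately misspecified one-predictor model, which is the setting the abstract analyzes. A maximum likelihood variant would replace the least squares slope with the slope from a one-predictor logistic or probit fit.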

Subject Classification: Basic and Applied Sciences > Statistics
References
  1. Albert, A.,Anderson, J.A.(1984).On the existence of maximum likelihood estimates in logistic regression models.Biometrika,71,1-10.
  2. Andrews, D.F.,Mallows, C.L.(1974).Scale mixtures of normal distributions.Journal of the Royal Statistical Society, Series B,36,99-102.
  3. Arnold, B.C.,Beaver, R.J.(2000).Hidden truncation models.Sankhyā: The Indian Journal of Statistics,62,23-35.
  4. Balakrishnan, N.(ed.)(1992).Handbook of the Logistic Distribution.New York:Marcel Dekker.
  5. Biswas, A.,Hwang, J.-S.(2002).A new bivariate binomial distribution.Statistics & Probability Letters,60,231-240.
  6. Box, G.E.P.,Tiao, G.C.(1973).Bayesian inference in statistical analysis.Addison-Wesley.
  7. Crouch, E.A.,Spiegelman, D.(1990).The evaluation of integrals of the form $\int_{-\infty}^{\infty} f(t)\exp(-t^{2})\,dt$: applications to logistic-normal models.Journal of the American Statistical Association,85,464-467.
  8. Dudoit, S.,Fridlyand, J.,Speed, T.P.(2002).Comparison of discrimination methods for the classification of tumors using gene expression data.Journal of the American Statistical Association,97,77-87.
  9. Fan, J.,Lv, J.(2008).Sure independence screening for ultrahigh dimensional feature space.Journal of the Royal Statistical Society, Series B,70,849-911.
  10. Fan, J.,Song, R.(2010).Sure independence screening in generalized linear models with NP-dimensionality.The Annals of Statistics,38,3567-3604.
  11. Huang, J.,Horowitz, J.,Ma, S.(2008).Asymptotic properties of bridge estimators in sparse high-dimensional regression models.The Annals of Statistics,36,587-613.
  12. Li, K.-C.,Duan, H.(1989).Regression analysis under link violation.The Annals of Statistics,17,1009-1052.
  13. McCullagh, P.,Nelder, J.A.(1989).Generalized Linear Models.Chapman & Hall/CRC.
  14. Stefanski, L.A.(1990).A normal scale mixture representation of the logistic distribution.Statistics & Probability Letters,11,69-70.
  15. West, M.(1987).On scale mixtures of normal distributions.Biometrika,74,664-668.