English Abstract
|
Technological advances in biomedicine, computing, and storage have led to an explosion of digital information and present new challenges in data acquisition, processing, management, transfer, and analysis. The value of big data lies in the analytical use of its information to generate knowledge and action. The goal of big data analytics is to extract knowledge from data in order to draw conclusions and make decisions. This article presents a view of the prospects for statistics in the context of big data analytics. Statistics is a long-established discipline for data analysis and inference using methods based on probability theory. Statistical and data mining techniques that are useful for big data analytics include significance testing, classification, regression/prediction, cluster analysis, association rule learning, anomaly detection, and visualization. Statistical analysis provides a scientific justification for moving from data to knowledge to action, and it is essential to big data analytics. In addition, big data analytics requires strong information-processing and programming skills, as well as expertise in the application domain. Statisticians can serve a leadership role in the big data movement.
|
References
|
-
Jordan, J. M.,Lin, D. K. J.(2014).Statistics for Big Data: Are Statisticians Ready for Big Data?.ICSA Bulletin,26,58-65.
-
Blei, D. M.,Ng, A. Y.,Jordan, M. I.(2003).Latent Dirichlet Allocation.Journal of Machine Learning Research,3,993-1022.
-
Breiman, L.(2001).Random forests.Machine Learning,45,5-32.
-
Breiman, L.,Friedman, J.,Olshen, R.,Stone, C.,Steinberg, D.(1995).CART: Classification and Regression Trees.Stanford, CA.:
-
Chen, C. H.(2002).Generalized Association Plots: Information Visualization via Iteratively Generated Correlation Matrices.Statistica Sinica,12,7-29.
-
Chen, Chun-houh,Härdle, Wolfgang,Unwin, Antony(2008).Handbook of Data Visualization.Berlin, Germany:Springer.
-
Cleveland, William S.(1994).The Elements of Graphing Data.Summit, NJ:Hobart Press.
-
Cox, D. R.,Oakes, D.(1984).Analysis of Survival Data.London, UK:Chapman and Hall.
-
Davidian, M.,Louis, T. A.(2012).Why statistics?.Science,336,12.
-
Ginsberg, J.,Mohebbi, M. H.,Patel, R. S.,Brammer, L.,Smolinski, M. S.,Brilliant, L.(2009).Detecting influenza epidemics using search engine query data.Nature,457,1012-1014.
-
Goodnight, G.(2011).Executive Edge: Statistics make the world work better.Analytics Magazine.
-
Guyon, I.,Weston, J.,Barnhill, S.,Vapnik, V.(2002).Gene selection for cancer classification using support vector machines.Machine Learning,46,389-422.
-
Hahn, G. J.,Doganaksoy, N.(2011).A Career in Statistics: Beyond the Numbers.John Wiley & Sons.
-
Hastie, T.,Tibshirani, R.,Friedman, J.(2001).The Elements of Statistical Learning: Data Mining, Inference, and Prediction.Springer.
-
Jacoby, William G.(1998).Statistical Graphics for Visualizing Multivariate Data.Thousand Oaks, CA:Sage.
-
Kotsiantis, S. B.(2007).Supervised machine learning: A review of classification techniques.Informatica,31,249-268.
-
Laney, D.(2001).,Gartner.
-
Lazer, D.,Kennedy, R.,King, G.,Vespignani, A.(2014).The Parable of Google Flu: Traps in Big Data Analysis.Science,343,1203-1205.
-
McCullagh, P.,Nelder, J. A.(1989).Generalized Linear Models.London:Chapman and Hall.
-
Tukey, John W.(1977).Exploratory Data Analysis.Reading, MA:Addison-Wesley Publishing Company.
-
Uesaka, H.(2009).Sample size allocation to regions in a multiregional trial.Journal of Biopharmaceutical Statistics,19,580-594.
-
Vapnik, V.(1998).Statistical learning theory.New York:Wiley.
|