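Title

迴歸模型偵測歧異點之統計方法於氣溫資料校驗的探討 (A Study of Statistical Methods for Detecting Outliers with Regression Models in Temperature Data Validation)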

Parallel Title

Evaluation of Outlier Detection Algorithms in Linear Regression for Temperature Validation

Authors

Mei-Hsien Lee (李美賢); Tsui-Ling Chen (陳翠玲); Yun-Lan Chen (陳雲蘭); Yu-Chung Wei (魏裕中)

Keywords

Meteorological validation (氣象校驗); Outlier detection (歧異值偵測); Regression model (迴歸模型); Bayesian statistics (貝氏統計)

Journal

Journal of the Chinese Statistical Association (中國統計學報)

Volume, Issue / Publication Date

Vol. 57, No. 4 (2019/12/01)

Pages

286 - 307

Language

Traditional Chinese

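Chinese Abstract (translated)

Outlier detection is an important part of data quality control, and the validation of meteorological data has a major impact on building accurate forecasting systems in the future and on applications in related industries. This paper addresses outlier detection within a regression framework, in which the response variable is the variable being checked for outliers and the explanatory variable is its reference value. It studies methods commonly used in frequentist and Bayesian statistics and examines how well each suits temperature validation: the residual method, the difference-in-fits method (DFFITS), the Cook's distance method, and, on the Bayesian side, the predictive-distribution discordancy test and a comparison of the probabilities for the random errors before and after fitting the regression model. Among the frequentist methods, studentized residuals and studentized deleted residuals effectively detect outlying data caused by systematic error. DFFITS and Cook's distance additionally take the explanatory variable's information into account, so they tend to flag data points produced by extreme weather; such points are merely rare and do not deviate from the regression line. The Bayesian detection methods can combine information from the data set currently under inspection and the historical data set, but the trends of the two data sets must be considered in order to select outliers more appropriately. This study gives statisticians an accessible view of how statistics is applied to meteorological validation and gives meteorologists a reference for choosing suitable statistical models for validating meteorological data.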
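For readers unfamiliar with the four frequentist diagnostics named above, the block below states their usual textbook definitions for a simple linear regression of the validated temperature on its reference value. The notation ($e_i$, $h_{ii}$, $r_i$, $t_i$, $D_i$) is an assumption made here for illustration and may differ from the notation used in the paper itself.

```latex
% Standard textbook definitions; the symbols are assumed, not taken from the paper.
% Model: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, with n observations and p = 2 coefficients.
\begin{align*}
e_i &= y_i - \hat{y}_i, \qquad
h_{ii} = \bigl[X (X^\top X)^{-1} X^\top\bigr]_{ii}, \qquad
\mathrm{SSE} = \sum_{j=1}^{n} e_j^2, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - p},\\
r_i &= \frac{e_i}{\sqrt{\mathrm{MSE}\,(1 - h_{ii})}}
  && \text{(studentized residual)},\\
t_i &= e_i \sqrt{\frac{n - p - 1}{\mathrm{SSE}\,(1 - h_{ii}) - e_i^2}}
  && \text{(studentized deleted residual)},\\
\mathrm{DFFITS}_i &= t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
  && \text{(difference in fits)},\\
D_i &= \frac{e_i^2}{p\,\mathrm{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
  && \text{(Cook's distance)}.
\end{align*}
```

In simple linear regression the leverage is $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, so it grows as the reference value $x_i$ moves away from its mean. DFFITS and Cook's distance both scale with $h_{ii}/(1 - h_{ii})$, which is the route by which the explanatory variable's information enters those two indices.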

English Abstract

Data verification is a critical process for ensuring that data reflect factual information. Meteorological data validation, especially the detection of erroneous data points, has a large impact on forecast accuracy and on applications in related industries. In practical temperature validation, a linear regression model is generally adopted that relates the observation being verified, as the response, to its reference value, as the explanatory variable. In this study, statistical methods for outlier detection via a regression model are evaluated using simulation and real data analysis, including four frequentist algorithms and two Bayesian approaches. Among the frequentist approaches, DFFITS and Cook's distance are less appropriate than studentized and studentized deleted residuals because they tend to flag data points that result from extreme weather rather than from erroneous observations. Moreover, the Bayesian predictive discordancy test and the random error probabilities comparison can synthesize information from the data set under inspection and from historical data sets, but the trends of the two data sets must be considered to identify outliers more appropriately. This study gives statisticians an accessible introduction to the application of statistics in meteorological verification and gives meteorologists a reference for selecting appropriate statistical models for validating meteorological data.
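As a rough illustration of the four frequentist diagnostics compared in the abstract, the following Python sketch computes them with NumPy for a simple linear regression of an observed temperature on a reference temperature. The function name `regression_diagnostics`, its return keys, and the use of the usual textbook formulas are assumptions made for this example; the sketch is not the authors' implementation and does not cover the two Bayesian approaches.

```python
import numpy as np


def regression_diagnostics(x, y):
    """Outlier diagnostics for the simple linear regression y ~ 1 + x.

    Hypothetical helper for illustration only; uses textbook formulas.
    x: reference temperatures, y: observed temperatures being validated.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Design matrix with an intercept column; p counts the coefficients.
    X = np.column_stack([np.ones(n), x])
    p = X.shape[1]

    # Ordinary least squares fit and raw residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Leverage h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    sse = resid @ resid
    mse = sse / (n - p)

    # Studentized residual (uses the MSE of the full fit).
    studentized = resid / np.sqrt(mse * (1.0 - h))

    # Studentized deleted residual (each point judged without itself).
    deleted = resid * np.sqrt((n - p - 1) / (sse * (1.0 - h) - resid**2))

    # DFFITS and Cook's distance both weight the residual by leverage.
    dffits = deleted * np.sqrt(h / (1.0 - h))
    cooks = resid**2 / (p * mse) * h / (1.0 - h) ** 2

    return {
        "leverage": h,
        "studentized": studentized,
        "studentized_deleted": deleted,
        "dffits": dffits,
        "cooks_distance": cooks,
    }
```

Calling `regression_diagnostics(reference, observed)` returns one value per observation for each diagnostic. The cutoffs used to declare a point an outlier are a separate choice and are deliberately left out of this sketch.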

Subject classification: Basic and Applied Sciences > Statistics