Title

探勘不平衡資料集中之突顯樣式-以國道事故資料為實證研究

Parallel title

Mining Emerging Patterns from Imbalance Dataset-A Case Study on Freeway Accident Database

Authors

Li-Chen Cheng (鄭麗珍); Li-Mei Lee (李麗美)

Keywords

關聯規則分類 ; 突顯樣式 ; 不平衡資料集 ; 高速公路事故 ; 權重支持度 ; Associative Classification ; Emerging Patterns ; Imbalance Dataset ; Freeway Accident ; Weight Support

Journal

資訊管理學報 (Journal of Information Management)

Volume and issue / publication date

Vol. 21, No. 2 (2014/04/01)

Pages

161 - 183

Language of content

Traditional Chinese

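Chinese abstract (English translation)

Most classification algorithms in data mining are designed to train models on data whose classes are evenly distributed. In practice, however, imbalanced class distributions are common, so designing classification methods for such datasets is an important research topic. In addition, the rules found by classification models are often trivial and complex; emerging pattern mining can filter them down to rules that capture the significant differences between two classes and uniquely identify each. No prior study, however, has mined emerging patterns on imbalanced datasets. This study proposes a new research framework that builds on associative classification and adjusts record weights when computing support, in order to mine emerging patterns from imbalanced datasets, and it adds mining of changes in emerging patterns across years. The empirical basis is a real freeway traffic accident dataset, which is severely imbalanced: fatal accidents make up less than one percent of all accident records. The competent authorities have nevertheless long sought to understand the causes of fatal accidents, hoping to improve driving safety and reduce fatal accidents through countermeasures. Using the proposed framework, this study finds associations among the contributing factors of general accidents and of rare fatal accidents, analyzes contributing factors across years, and identifies important patterns for the reference of traffic management agencies.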

English abstract

Traditional associative classification searches for frequent patterns in balanced datasets. However, most real-life datasets are imbalanced, so discovering rare but important patterns in imbalanced datasets is an important task. The freeway has become the main transportation route in Taiwan. Because of high speeds and heavy traffic, accidents on freeways cause more serious injuries than accidents on other roads. Serious-injury accidents account for only a very small part of the accident data, and the factors behind these special cases are the most important issue. This study proposes a framework to explore the most significant causes of serious accidents. The framework combines associative classification with emerging pattern mining to discover rare and serious incidents. The weight of each accident is adjusted according to its severity, so that rare items can be discovered using the proposed support calculation. Results of an experiment on real accident data demonstrate the efficacy of the proposed approach. Based on the analysis of these accidents, we offer some suggestions.
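
The abstract does not give the paper's support formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each record carries a weight chosen by severity class, defines weighted support as matched weight over total weight, and uses the standard emerging-pattern growth rate (ratio of a pattern's support in one class to its support in the other). The attribute names, class labels, and weight values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], str]  # (accident attributes, severity class)


def weighted_support(pattern: FrozenSet[str], records: List[Record],
                     weights: Dict[str, float]) -> float:
    """Weight of records containing `pattern` divided by total weight."""
    total = sum(weights[label] for _, label in records)
    matched = sum(weights[label] for items, label in records if pattern <= items)
    return matched / total if total else 0.0


def class_support(pattern: FrozenSet[str], records: List[Record],
                  target: str) -> float:
    """Unweighted support of `pattern` among records of class `target`."""
    in_class = [items for items, label in records if label == target]
    if not in_class:
        return 0.0
    return sum(pattern <= items for items in in_class) / len(in_class)


def growth_rate(pattern: FrozenSet[str], records: List[Record],
                target: str, background: str) -> float:
    """Support in `target` divided by support in `background`."""
    sup_t = class_support(pattern, records, target)
    sup_b = class_support(pattern, records, background)
    if sup_b == 0.0:
        return float("inf") if sup_t > 0.0 else 0.0
    return sup_t / sup_b


# Toy data: 2 fatal vs. 8 non-fatal records (hypothetical attributes).
records: List[Record] = (
    [(frozenset({"night", "speeding"}), "fatal")] * 2
    + [(frozenset({"day", "speeding"}), "minor")] * 4
    + [(frozenset({"night", "rain"}), "minor")] * 4
)
pattern = frozenset({"night", "speeding"})
uniform = {"fatal": 1.0, "minor": 1.0}
by_severity = {"fatal": 4.0, "minor": 1.0}  # assumed weights, not from the paper

print(weighted_support(pattern, records, uniform))      # 0.2
print(weighted_support(pattern, records, by_severity))  # 0.5
print(growth_rate(pattern, records, "fatal", "minor"))  # inf
```

With uniform weights the fatal-only pattern has support 0.2 and could fall below a typical minimum-support threshold; weighting fatal records more heavily raises it to 0.5, while the growth rate still measures how strongly the pattern separates the two classes.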

Subject classification: Basic and Applied Sciences > Information Science
Social Sciences > Management
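Cited by
  1. 楊亞澄、翁政雄、胡雅涵 (2016). Applying association rules and change mining techniques to firewall policy rule optimization. 資訊管理學報, 23(3), 277-304.
  2. (2024). Exploring factors of severe freeway accidents using geographically weighted logistic regression. 運輸學刊, 36(2), 187-216.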