Title |
以大數據分析球員技術面表現、對戰組合與中華職棒歷年票房之相關性 |
Parallel title |
Using big data to analyze the correlation between baseball players' performance, the matching teams, and fans attendance of Chinese professional baseball league |
DOI |
10.3966/10247297201712500S006 |
Authors |
許懷中(Hwai-Jung Hsu);黃致豪(Chih-Hao Huang) |
Keywords |
棒球 ; 數據棒球 ; 職業棒球票房 ; 運動大數據 ; baseball ; sabermetrics ; professional baseball attendance ; sport big data |
Journal |
體育學報 |
Volume/issue and publication date |
Vol. 50, Issue S (2017/12/02) |
Pages |
79 - 90 |
Language |
Traditional Chinese |
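Chinese abstract |
Introduction: In a baseball game, many elements on the field can attract spectators, including strikeouts, stolen bases, home runs, and pitcher-batter duels. The purpose of this study is to determine which of these on-field performance measures and team matchups actually affect game attendance. The research team examined the relationship between publicly available player performance data and attendance from the 1st through the 27th CPBL season, using a data-driven model selection approach that draws on the relationships among the data and on information-theoretic tools such as correlation analysis and the Akaike information criterion (AIC) to identify the most relevant performance parameters. Method: Without making any prior assumptions, this study used a web crawler to collect the complete statistics of every team from 1990 to 2016 from the official website of the Chinese Professional Baseball League (CPBL). From 6,870 games and 714,480 records, 29 offensive and defensive statistics were compiled for each team in each season. Correlation analysis and linear regression models were used for explanatory analysis of how team matchups and player performance data, including team winning percentage, hits, home runs, walks, strikeouts, and team identity, affect each team's annual attendance. These data, combined with team factors, were then related to each team's attendance over the 27 years from the league's founding to 2016 through automated big-data model building. Results: Home runs, stolen bases, batters' strikeout frequency, errors, pitchers' ability to avoid giving up home runs, and whether the team was the Brother Elephants or the Chinatrust Whales were the factors with the greatest influence on attendance. The selected model has a coefficient of determination (R²) of 0.76, indicating very good explanatory power, and all variables in the model have p values below .05. Conclusion: Clubs can use this study as one guideline for selecting new players, and players can adjust their training priorities according to the results. What fans at the ballpark most want to see is fighting spirit and focus, so errors have the largest negative effect on attendance. The excitement of home runs, putting the ball in play so that fans can watch exciting offense and defense, and pitchers' ability to suppress opponents contribute most positively to attendance. |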
English abstract |
Introduction: The Chinese Professional Baseball League (CPBL) has been in operation for 28 years. Among all aspects of player performance, including stolen bases, home runs, and strikeouts, we want to find out which factors most strongly affect fan attendance. Method: In this research, we use big data and an automatic model selection method to look for significant factors among all performance-related statistics, including winning percentage, hits, home runs, walks, strikeouts, and 23 other team statistics. We use a crawler program to collect data from the CPBL website and examine the teams' performance in more than six thousand games against fan attendance at those games from 1990 to 2016, 27 years in total. We use automatic model selection to find the best model and its coefficients, and thus the factors most relevant to fan attendance. Result: We found that errors, home runs, K/AB, steals, and pitchers' HR/AB are the most important factors affecting CPBL attendance. The R^2 of the selected model is 0.76, which indicates strong explanatory power. All the p values in the model are less than .05. Conclusion: Errors signal to fans that a player is not focused and thus drive fans away, whereas home runs, stolen bases, strikeouts, and pitchers' ability to prevent home runs are all important factors in attracting fans. Clubs can keep these findings in mind when they draft and trade players, and coaches can adjust the team's style of play. Fans would rather see the players they support play actively, stealing some bases, than wait passively for a walk. |
Subject classification |
Social Sciences > Physical Education |
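The abstracts describe selecting a linear regression model with the help of the Akaike information criterion (AIC), but do not spell out the exact procedure. The sketch below is one plausible reading, not the authors' implementation: greedy forward selection that keeps adding the predictor that lowers AIC the most. The function names, the feature names (`hr`, `errors`, `walks`), and every number in the toy data are hypothetical and chosen only for illustration; none of them come from the study.

```python
import math

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, k = len(X), len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(X[r][j] * beta[j] for j in range(k))) ** 2
               for r in range(n))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def forward_select(features, y):
    """Add the feature that lowers AIC most; stop when no addition helps."""
    n = len(y)

    def score(names):
        # Design matrix: intercept column plus the chosen feature columns.
        X = [[1.0] + [features[f][r] for f in names] for r in range(n)]
        return aic(ols_rss(X, y), n, len(names) + 1)

    chosen = []
    best = score(chosen)
    while True:
        candidates = [(score(chosen + [f]), f) for f in features if f not in chosen]
        if not candidates:
            break
        s, f = min(candidates)
        if s >= best:
            break
        best, chosen = s, chosen + [f]
    return chosen, best

# Hypothetical toy data (NOT from the study): attendance rises with home
# runs and falls with errors; "walks" is an unrelated extra candidate.
features = {
    "hr": [1, 2, 3, 4, 5, 6, 7, 8],
    "errors": [3, 1, 4, 1, 5, 9, 2, 6],
    "walks": [2, 7, 1, 8, 2, 8, 1, 8],
}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
y = [10 + 3 * h - 2 * e + d
     for h, e, d in zip(features["hr"], features["errors"], noise)]

chosen, best = forward_select(features, y)
print(chosen, round(best, 2))
```

A real analysis would more likely rely on an established statistics package and would also report coefficients, p values, and R², as the abstract does; the hand-rolled solver here only keeps the sketch dependency-free.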