


Applying Text Mining Techniques to Sexual Issues on PTT feminine_sex




Cai-Pei Yu (余采蓓); Chun-Ming Shih (施俊名); Kuo-Hsiung Kuo Hung (郭洪國雄)


Text Mining ; Sexual Issues ; PTT ; Web Crawler ; Topic Model




Vol. 9, No. 2 (2019/01/01)


63 - 89




As the world enters the era of data science, big data no longer consists only of structured data; text and other unstructured data are found throughout everyday life. Gathering information has become one of the main reasons people go online, so examining which sex-related issues people pay attention to is an important way to understand public attitudes toward sex and levels of sexual knowledge. This study used a web crawler written in R to automatically retrieve articles from the feminine_sex board of the PTT forum, collecting a total of 1,438 articles over one year. The large volume of text in such a corpus offers opportunities to develop a range of promising and interesting applications concerning sexual issues, which is the aim of the text mining work in this study. After word segmentation with natural language processing, the three most frequent words on the feminine_sex board were "doctor", "problem", and "boyfriend". The topic model was built with the K-Means clustering algorithm; once the resulting clusters were named, public discussion was found to center on three main topics: intimate relationships, contraceptive counseling, and health care. These results can help educational and medical authorities strengthen sex education and health education training.
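The two analysis steps named in the abstract, counting word frequencies after segmentation and grouping articles with K-Means, can be illustrated with a minimal sketch. The study itself used R and does not describe its implementation here, so the following is only an illustration in Python under stated assumptions: the toy posts are hypothetical and already segmented, raw term counts are used as features, and the three starting centroids are chosen by hand so the run is deterministic. None of these choices is taken from the paper.

```python
from collections import Counter

# Hypothetical, already-segmented toy posts (NOT data from the study).
docs = [
    ["boyfriend", "relationship", "boyfriend", "trust"],
    ["relationship", "boyfriend", "argument"],
    ["pill", "contraception", "doctor", "pill"],
    ["contraception", "pill", "condom"],
    ["doctor", "infection", "clinic", "doctor"],
    ["clinic", "infection", "doctor", "checkup"],
]


def term_frequencies(documents):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return counts


def vectorize(documents, vocabulary):
    """Represent each document as a vector of raw term counts (an assumption)."""
    return [[float(tokens.count(term)) for term in vocabulary]
            for tokens in documents]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iterations=100):
    """Plain K-Means: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


frequencies = term_frequencies(docs)
vocabulary = sorted(frequencies)
vectors = vectorize(docs, vocabulary)

# Hand-picked seeds (documents 0, 2 and 4) keep the example deterministic;
# real runs usually start from random or k-means++ initialisation.
labels, centroids = kmeans(vectors, [vectors[0], vectors[2], vectors[4]])

print(frequencies.most_common(3))
print(labels)
```

On this toy input the six posts fall into three clusters of two. As in the study, the clusters come out unlabeled, and naming them (for example as relationship, contraception, and medical topics) is a separate interpretive step.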

Subject classification: Social Sciences > Sociology