Title |
應用自動文字探勘於臺灣中文饒舌音樂歌詞之研究 |
Parallel Title |
A Study on Text Mining of Chinese Rap Music in Taiwan |
DOI |
10.6853/DADH.202110_(8).0001 |
Authors |
韓怡臻(Yi-Chen Han);柯皓仁(Hao-Ren Ke) |
Keywords |
饒舌 ; 文字探勘 ; 詞頻分析 ; 分群 ; 分類 ; rap ; text mining ; word frequency analysis ; clustering ; classification |
Journal |
數位典藏與數位人文 |
Issue / Publication Date |
Issue 8 (2021/10/01) |
Pages |
1 - 41 |
Language |
Traditional Chinese |
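Abstract (translated from Chinese) |
Since the turn of the millennium, rap songs have gradually entered the mainstream music market and become very popular with young listeners. Rappers often use lyrics they write themselves to express their feelings or to criticize society, so understanding the content of rap lyrics also helps in understanding contemporary culture and the social climate. This study uses text mining to explore the thematic types that may exist in the lyrics of Chinese rap music in Taiwan. It first conducts a word frequency analysis, observing how often each keyword appears both overall and by period, to understand the basic content of the lyric texts and their word frequency distribution. It then runs clustering experiments with the k-means clustering algorithm and affinity propagation, and uses the clustering results together with manual labels in classification experiments with support vector machines and the k-nearest neighbor algorithm. The study finds that music, love, and party have been the most common themes in Chinese rap lyrics in Taiwan over the past two decades. For clustering, affinity propagation performs slightly better than k-means. For classification, the k-nearest neighbor algorithm performs slightly better than the support vector machine, and using clustering results to assist labeling trains a better binary classifier for music-themed lyrics than purely manual labeling does. Music-themed lyrics do exist in Chinese rap lyrics in Taiwan, whereas it remains to be seen whether lyrics on other themes form categories of their own, because of data imbalance. Future research could broaden the range of lyric texts collected, try different dimensionality reduction methods, analyze word frequency from other perspectives, label lyrics together with experts or listeners, and use different clustering and classification methods. |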
English Abstract |
Since the turn of the millennium, rap songs have gradually entered the mainstream music market and become very popular among young people. Rappers often use their own lyrics to express their emotions or to criticize society, so understanding the content of rap lyrics also sheds light on contemporary culture and the social climate. The purpose of this study is to explore possible thematic types in Chinese rap music lyrics in Taiwan through text mining. The study first conducted a word frequency analysis, counting the total occurrences of keywords in the lyric texts and observing the frequency of each keyword to understand the basic content and word frequency distribution of the texts. It then ran unsupervised clustering experiments with k-means and affinity propagation. Finally, it used the clustering results and manual labels to run supervised binary classification experiments with the support vector machine and the k-nearest neighbor algorithm. The findings show that music, love, and party have been the most common themes of Chinese rap music lyrics in Taiwan over the past two decades. In terms of clustering effectiveness, affinity propagation performed slightly better than k-means. In terms of classification performance, the k-nearest neighbor algorithm slightly outperformed the support vector machine, and labels assisted by the clustering results trained a better binary classification model for music-themed lyrics than purely manual labels did. Lyrics with the theme of music do exist in Chinese rap music lyrics in Taiwan, but because of data imbalance it remains to be seen whether lyrics on other themes form categories of their own. Future research could broaden the coverage of lyric texts, try different dimensionality reduction methods, analyze word frequency from different perspectives, have experts or listeners label the lyric types, and use different clustering and classification methods. |
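The word frequency step described in the abstract can be illustrated with a minimal sketch. It assumes the lyrics have already been segmented into tokens (real Chinese lyrics would need word segmentation and stop-word removal first). The toy corpus, the English tokens, the `decade` helper, and the choice of decades as the period unit are all hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (release year, pre-segmented lyric tokens).
# These songs and tokens are invented for illustration only.
songs = [
    (2003, ["music", "beat", "party", "music"]),
    (2008, ["love", "music", "night"]),
    (2015, ["party", "love", "love", "city"]),
    (2019, ["music", "love", "party"]),
]


def decade(year):
    """Map a year to the first year of its decade (assumed period unit)."""
    return year // 10 * 10


overall = Counter()               # keyword counts over the whole corpus
by_decade = defaultdict(Counter)  # keyword counts per period

for year, tokens in songs:
    overall.update(tokens)
    by_decade[decade(year)].update(tokens)

# Most frequent keywords overall, then per period.
print(overall.most_common(3))
for d in sorted(by_decade):
    print(d, by_decade[d].most_common(2))
```

Comparing the per-period counters side by side is one simple way to see whether a keyword becomes more or less frequent over time.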
Subject Classification |
Humanities > General Humanities ; Basic and Applied Sciences > Information Science |