Title |
透過新聞文章預測股價漲跌趨勢-結合情緒分析、主題模型與模糊支持向量機 |
Parallel Title |
Sentiment and Topic Analysis on Financial News for Stock Movement Prediction by Using Fuzzy Support Vector Machine |
Authors |
郝沛毅(Pei-Yi Hao);歐仁彬(Jen-Bing Ou);黃天受(Tien-Shou Huang);林振穎(Zhen-Ying Lin);吳建生(Jian-Sheng Wu) |
Keywords |
股價預測 ; 情緒分析 ; 潛在狄利克雷分配 ; 文字探勘 ; 模糊理論 ; 支持向量機 ; stock trend prediction ; sentiment analysis ; latent Dirichlet allocation ; text mining ; fuzzy theory ; support vector machine |
Journal |
資訊管理學報 |
Volume and Issue / Publication Date |
Vol. 25, No. 4 (2018 / 10 / 31) |
Pages |
363 - 395 |
Language |
Traditional Chinese |
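Chinese Abstract |
Successfully predicting the direction of stock price movements has clear benefits. According to the Efficient Market Hypothesis, the value of a company's stock is determined by all currently available information. When analysts, investors, and institutional traders assess the current stock price, news plays an important role in the valuation process. Indeed, financial news carries information about a company's fundamentals as well as qualitative information that influences the expectations of market participants. In the Big Data era, the number of online news articles keeps growing. Faced with such a large volume of textual data, more and more institutions rely on the high processing speed of modern computers for text mining and machine learning in order to build more accurate models for predicting stock price trends. Using the unstructured data in articles is the most challenging research direction and is the focus of this work. In this paper, we extract latent topic models and sentiment information from news articles. In addition, we develop a fuzzy support vector machine that fuses the rich information contained in online news articles to predict the direction of stock price movements. We consider fuzzy theory well suited to this study because text is itself fuzzy (for example, high/low, big/small), and because there is an ambiguous, fuzzy boundary between the rise and fall categories (for example, a rise of 0.01% and a rise of 1% both belong to the rise category, but their degrees of membership clearly differ). In this study, the highest prediction accuracy was 87% for food stocks, 71% for semiconductor stocks, and 69% for computer peripheral stocks. By comparison, a traditional support vector machine that predicts stock price movements from keywords achieves an accuracy of only slightly above 50% (close to random guessing), so the proposed method clearly outperforms the traditional support vector machine prediction model. |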
English Abstract |
Purpose-In the Big Data era, the number of news articles has been increasing tremendously. Faced with such a large volume of textual data, more and more institutions rely on the high processing power of modern computers for text mining and machine learning to make more accurate predictions of the stock market. Discovering the fundamental data available in unstructured text is the most challenging research aspect and is therefore the goal of this work. Design/methodology/approach-In this study, we extracted hidden topic models and sentiment information from news articles. In addition, we developed a fuzzy support vector machine to merge the abundant information in online news, which can be used to forecast the trend of stock prices. Fuzzy set theory is well suited to this study because text is itself fuzzy (such as high/low and big/small), and there is an ambiguous boundary between the rise and fall categories. For example, rises of 10% and 1% both belong to the rise category but differ in degree. Findings-In this study, the highest forecast accuracy was 87% for food-related stocks, 71% for semiconductor-related stocks, and 69% for computer peripheral-related stocks. Compared with a traditional support vector machine, whose forecast accuracy for stock price trends was just over 50% (close to random guessing), the method proposed in this study is significantly better. Research limitations/implications-This study focused only on accurately classifying stock movement based on hidden topic and sentiment features. In future work, we plan to investigate more complex semantic features. Practical implications-Successful prediction of stock price movements has obvious advantages. According to the Efficient Market Hypothesis, the price of a stock is determined by all information available at the moment. Financial news carries information about a firm's fundamentals as well as qualitative information that influences the expectations of market participants. This study employs sentiment and topic analysis of financial news to predict stock movement, which can help analysts, investors, and institutional traders evaluate current stock prices effectively. Originality/value-This study is, to the best of our knowledge, the first attempt to apply a fuzzy support vector machine and hidden topic/semantic features to the prediction of stock movement in Taiwan. |
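The abstract's remark that rises of different size belong to the rise category to different degrees can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the linear ramp in `fuzzy_membership`, the hypothetical `saturation_pct` and `floor` parameters, and the choice to label a zero return as a fall. The last function shows the slack penalty as it appears in a common fuzzy support vector machine formulation, where each training sample's hinge loss is scaled by its membership degree.

```python
def fuzzy_membership(return_pct, saturation_pct=1.0, floor=0.05):
    """Map a return (in percent) to a membership degree in [floor, 1].

    Assumed shape: a linear ramp in |return| that saturates at
    saturation_pct. The paper's actual membership function may differ.
    """
    strength = min(abs(return_pct) / saturation_pct, 1.0)
    return floor + (1.0 - floor) * strength


def label_and_membership(return_pct):
    """Return (class label, membership); +1 means rise, -1 means fall.

    A zero return is labelled -1 here purely as a convention of this sketch.
    """
    label = 1 if return_pct > 0 else -1
    return label, fuzzy_membership(return_pct)


def weighted_hinge_loss(scores, labels, memberships, C=1.0):
    """Compute C * sum_i s_i * max(0, 1 - y_i * f(x_i)).

    Samples with low membership s_i contribute less to the penalty,
    so near-zero price moves influence the decision boundary less.
    """
    return C * sum(
        s * max(0.0, 1.0 - y * f)
        for f, y, s in zip(scores, labels, memberships)
    )


if __name__ == "__main__":
    for r in (0.01, 1.0, -2.0):
        print(r, label_and_membership(r))
```

Under these assumptions, a 0.01% rise and a 1% rise receive the same label but very different weights, which is the intuition the abstract describes.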
Subject Classification |
Basic and Applied Sciences > Information Science
Social Sciences > Management |