


Infer Individual Customer Preference for a New Product Based on Supermarket Transaction History






Frequency of Choice (FOC) ; New Product Preference Inference ; Ontology Model ; Personal Preference for Products (PPFP) ; Potential Customer Identification ; Product Feature Extraction ; Semantic Model ; Transaction Data












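New products keep appearing on the market. Consumers have ever more product choices and data analysis continues to advance, so predicting a consumer's personal preference for a new product is becoming increasingly important in precision marketing. This thesis takes supermarket marketing as its scope and extracts consumers' preferences by analyzing transaction data. Existing transaction data analysis methods predict a consumer's preference for products mainly from other people's records. Personalized preference inference based on the consumer's own transaction records is rarely considered. How can we predict a consumer's personal preference for a new product from his or her own transaction history? Answering this question involves the following challenges:
(i) Transaction records are usually presented as numbers in tables. How can personal preference information be extracted from them?
(ii) A new product differs from existing products, and no purchase record exists for it yet. How can we use the product preferences in a consumer's transaction records to infer his or her preference and purchase decision for the new product?
This thesis designs a semantic-based method, the Transaction data-based Personal preference Inference Engine (TPIE), to address these problems and challenges:
(1) Extracting personal preference information from transaction records. Samuelson's revealed preference theory (1972) holds that a consumer's preferences can be observed from past purchases and choices. A consumer who is satisfied with a product is very likely to buy it again. We therefore consider that a consumer prefers more strongly the products he or she buys more often. From the transaction records, we compute how frequently the consumer chooses each product and regard this as the consumer's Personal Preference for Products (PPFP). Because purchase frequencies differ across product categories, this thesis analyzes only one product category at a time.
(2) Ontology-based product feature extraction and product similarity measurement. To predict a consumer's purchase decision for a new product, we refer to his or her preferences for existing products that are similar to the new product. To define the similarity between two products, we adopt an ontology model and extract product features from the product profile table. Based on the information in that table, each feature is assigned product attributes and attribute values. The similarity between two products can then be measured by computing the differences between their attributes. Product features are not equally important, however, so the intervals between attribute values should carry different weights. To determine these intervals, we examine each consumer's three favorite products in the purchase history. The product features whose attributes remain unchanged across these three products are the features the consumer values most. We then sum, over all consumers, the number of times each feature is valued most and use these counts as the relative importance of the features. These proportions serve as the weights of the attribute-value intervals. Finally, we compute product similarity from the weighted sum of the differences between the products' feature attribute values.
(3) Predicting personal preference for new products. We assume that the more similar two products are, the closer a consumer's preferences for them will be. To infer a consumer's preference for the new product, we select, among the products the consumer has purchased, the one most similar to the new product. We then multiply the consumer's PPFP for that product by its similarity to the new product, and the result is the expected PPFP for the new product. Once the expected PPFP is obtained, a threshold is needed to decide whether the consumer should be regarded as a potential customer of the new product. One example is half of the PPFP of the consumer's favorite product among past purchases.
To evaluate how effectively TPIE predicts potential customers, we conducted experiments on real data from two product categories at Matsusei Supermarket. The first dataset is the milk category, which contains 65 milk products. We select one product as the new product and use the consumers' preferences for the other 64 products to judge their preferences for it. Each milk product takes a turn as the new product, and the results are averaged over the 65 products. On average, we correctly predict for about 75.01% of the consumers whether they are potential customers. This result is better than that of a prediction method based on group consumer preferences. The second dataset is the popsicle category, where the prediction accuracy is about 57.14%. Because of data sparsity, this is worse than the prediction based on group preferences.
The TPIE personal preference inference engine designed in this thesis is based on a product ontology approach. It contributes a way to predict a consumer's preference-based purchase decision for a new product from his or her own purchase records. The specific contributions are: (1) extracting personal preferences from transaction data using consumers' purchase frequencies; (2) identifying product features and attributes with an ontology model of products in order to measure the similarity between products; (3) regarding a consumer as a potential buyer of a new product when his or her preference for it reaches half of that for his or her favorite past product; and (4) correctly predicting for 75.01% of consumers whether they are potential customers of a new milk product.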
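Step (1) reduces to counting, per customer, how often each product in one category was bought. Below is a minimal Python sketch, assuming a hypothetical record layout of `(customer_id, product_id, category)` tuples. The function name `extract_ppfp` is illustrative and is not the thesis's definition. The same goes for normalizing counts by the customer's total purchases in the category.

```python
from collections import Counter, defaultdict

# Hypothetical record layout (an assumption): (customer_id, product_id, category).
Transaction = tuple[str, str, str]


def extract_ppfp(transactions: list[Transaction],
                 category: str) -> dict[str, dict[str, float]]:
    """Return {customer_id: {product_id: PPFP}} for a single category.

    Assumption: PPFP is the customer's relative frequency of choice, i.e.
    purchases of the product / all of the customer's purchases in the category.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for customer, product, cat in transactions:
        if cat == category:  # analyze one category at a time
            counts[customer][product] += 1

    ppfp: dict[str, dict[str, float]] = {}
    for customer, per_product in counts.items():
        total = sum(per_product.values())
        ppfp[customer] = {p: n / total for p, n in per_product.items()}
    return ppfp


if __name__ == "__main__":
    records = [
        ("c1", "milk_A", "milk"), ("c1", "milk_A", "milk"),
        ("c1", "milk_B", "milk"), ("c1", "pop_X", "popsicle"),
        ("c2", "milk_B", "milk"),
    ]
    print(extract_ppfp(records, "milk"))
```

The half-of-favorite threshold compares a customer's expected PPFP only with that same customer's PPFP values. Dividing by the customer's total therefore does not change any potential-customer decision, and raw purchase counts would work equally well.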


New products are introduced into markets at a fast pace. Customers are becoming increasingly selective in their product choices, and data analytics keeps advancing. As a result, personal preference prediction has grown in importance in the precision marketing of new products. In this thesis, we consider the problem domain of supermarket marketing and sales and use transaction data analysis to extract customers' preferences. Current methods of transaction data analysis predict a customer's preference for products mainly from other customers' records. Preference inference based on personal transaction data is rarely considered. How can we predict a customer's preference for a new product based only on his or her own transaction history? Answering this question poses two challenges:
(i) Transaction data usually consist of numbers in tabular form. How do we extract personal preference information from them?
(ii) A new product differs to some degree from existing products, and there is no purchase record for it yet. How can we infer a customer's preference and purchase decision for the new product by exploiting his or her own preferences for the products in the transaction database?
This thesis designs an innovative semantic-based methodology, the Transaction data-based Personal preference Inference Engine (TPIE), in response to these problems and challenges:
(1) Personal Preference Information Extraction from Transaction Data. According to the revealed preference theory of Paul Samuelson (1972), a customer's preferences can be revealed by his or her purchasing choices. The premise is that a customer who is satisfied with a product will buy it again. Thus, the more frequently a product is purchased, the more the customer likes it. From each customer's transactions, we calculate how frequently each product is purchased and take this as the customer's personal preference for products (PPFP).
However, because purchase frequencies differ across product categories, we analyze one category at a time.
(2) Ontology-based Feature Extraction and Similarity Measurement. To predict customers' purchase decisions for a new product, we draw on their preferences for existing products that are similar to the new product. To define the similarity between two products, we adopt an ontology model-based approach to derive product features from the product profile table. The attributes and attribute values of each feature are then assigned from the information in that table. The similarity between two products is calculated by measuring their attribute differences. Since individual product features differ in importance, the intervals of attribute values of different features should be weighted differently. To determine these weights, we examine each customer's three most favored products. The features whose attributes remain unchanged across these products are the features the customer cares about most. Counting, over all customers, how often each feature is cared about most gives the relative importance of each feature, which we use as the weight of its attribute-value intervals. We then calculate product similarity from the weighted sum of the feature attribute value differences between the new product and each existing product.
(3) Personal Preference Prediction of New Products. We reason that the more similar two products are, the closer a customer's preferences for them will be. To infer a customer's preference for the new product, we determine the similarity between the new product and each purchased product. The customer's expected PPFP for the new product is the sum, over the purchased products, of each product's PPFP multiplied by its similarity to the new product. Finally, we set a PPFP threshold for deciding whether a customer can be considered a potential buyer of the new product. One example is half of the PPFP of his or her favorite product in the purchase history.
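The feature weighting and similarity measurement of step (2) can be sketched in Python. The abstract does not give the ontology or the product profile table, so every concrete choice below is an assumption:

- `profiles` is a hypothetical mapping from each product to `{feature: attribute value}`.
- "Interval weights" are read as one weight per feature.
- Numeric attributes are compared by their absolute difference scaled by the feature's range, and other attributes by exact match.
- Similarity is taken as 1 minus the weighted sum of the per-feature differences.

The weighting itself follows the stated rule. A feature earns one count from a customer when its attribute value is identical across that customer's three most favored products. The counts summed over customers are then normalized into weights.

```python
from typing import Union

Value = Union[float, str]
Profile = dict[str, Value]  # hypothetical: feature -> attribute value


def feature_weights(ppfp_by_customer: dict[str, dict[str, float]],
                    profiles: dict[str, Profile], top_k: int = 3) -> dict[str, float]:
    """Weight each feature by how often it stays unchanged across customers' top-k products."""
    features = sorted({f for prof in profiles.values() for f in prof})
    counts = {f: 0 for f in features}
    for ppfp in ppfp_by_customer.values():
        favorites = sorted(ppfp, key=ppfp.get, reverse=True)[:top_k]
        if len(favorites) < top_k:
            continue  # assumption: skip customers with too few products
        for f in features:
            if len({profiles[p].get(f) for p in favorites}) == 1:
                counts[f] += 1  # attribute unchanged, so the customer cares about f
    total = sum(counts.values())
    if total == 0:
        return {f: 1 / len(features) for f in features}  # fallback: uniform weights
    return {f: n / total for f, n in counts.items()}


def numeric_ranges(profiles: dict[str, Profile]) -> dict[str, float]:
    """Max minus min of each numeric feature, used to scale differences into [0, 1]."""
    ranges: dict[str, float] = {}
    for f in {f for prof in profiles.values() for f in prof}:
        vals = [prof[f] for prof in profiles.values()
                if isinstance(prof.get(f), (int, float))]
        if vals:
            ranges[f] = max(vals) - min(vals)
    return ranges


def similarity(a: str, b: str, profiles: dict[str, Profile],
               weights: dict[str, float], ranges: dict[str, float]) -> float:
    """Assumed form: 1 minus the weighted sum of per-feature differences in [0, 1]."""
    diff = 0.0
    for f, w in weights.items():
        x, y = profiles[a].get(f), profiles[b].get(f)
        if isinstance(x, (int, float)) and isinstance(y, (int, float)) and ranges.get(f):
            d = abs(x - y) / ranges[f]  # numeric: range-normalized distance
        else:
            d = 0.0 if x == y else 1.0  # categorical or constant: exact match
        diff += w * d
    return 1.0 - diff
```

The weights sum to 1 and each per-feature difference lies in [0, 1], so the similarity also lies in [0, 1]. Consequently, PPFP multiplied by similarity never exceeds the PPFP itself.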
To evaluate how effectively TPIE predicts potential customers, we experiment with two real datasets from Matsusei Supermarket. The first dataset covers the milk category and consists of 65 products. In each run, one product is held out as the new product and the other 64 products serve as training data. Every milk product takes its turn as the new product, and the results are averaged over the 65 runs. For 75.01% of the customers, we correctly predict whether he or she is a potential customer of the new milk product, which outperforms the group preference approach. On the second dataset, the popsicle category, the accuracy is 57.14%. Because of data sparsity, this result is worse than that of the group preference approach.
The contribution of this thesis is the design of TPIE, a semantic, ontology-based methodology for predicting a customer's preference-based purchase decision for a new product from his or her own transaction data. Specifically, the contributions are: (1) personal preference extraction from transaction data based on each customer's frequency of choice; (2) representation of each product through ontology-based feature extraction and attribute assignment for similarity measurement; (3) a decision rule under which a customer is considered a potential customer of a new product when his or her expected PPFP for it reaches half of the PPFP of his or her most favored purchased product; and (4) identification of whether a customer is a potential customer of a new milk product with 75.01% accuracy.
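The prediction rule of step (3) and the leave-one-out evaluation can be sketched together in Python. The abstract describes the expected PPFP in two ways, and the hypothetical `aggregate` parameter selects between them:

- `"sum"`: a sum over all purchased products of PPFP times similarity.
- `"max"`: the PPFP of only the most similar purchased product, times its similarity.

`sim` is any product-similarity function supplied by the caller, and `ratio=0.5` encodes the half-of-favorite threshold. The abstract does not state the ground truth used for accuracy. This sketch assumes that a customer is an actual potential customer of the held-out product if he or she bought it, which is a labeled assumption. Customers with no other purchases in the category are skipped, and per-product accuracies are averaged as in the milk experiment.

```python
from typing import Callable

# sim(p, q) -> similarity of products p and q, assumed to lie in [0, 1].
Similarity = Callable[[str, str], float]


def expected_ppfp(history: dict[str, float], new_product: str,
                  sim: Similarity, aggregate: str = "sum") -> float:
    """Expected PPFP of new_product for one customer.

    "sum": sum of PPFP x similarity over all purchased products.
    "max": PPFP of the most similar purchased product x its similarity.
    """
    if aggregate == "sum":
        return sum(v * sim(p, new_product) for p, v in history.items())
    if aggregate == "max":
        best = max(history, key=lambda p: sim(p, new_product))
        return history[best] * sim(best, new_product)
    raise ValueError(f"unknown aggregate: {aggregate!r}")


def is_potential_customer(history: dict[str, float], new_product: str,
                          sim: Similarity, ratio: float = 0.5,
                          aggregate: str = "sum") -> bool:
    """Potential customer if expected PPFP >= ratio x PPFP of the favorite product."""
    threshold = ratio * max(history.values())
    return expected_ppfp(history, new_product, sim, aggregate) >= threshold


def leave_one_out_accuracy(ppfp_by_customer: dict[str, dict[str, float]],
                           products: list[str], sim: Similarity,
                           ratio: float = 0.5, aggregate: str = "sum") -> float:
    """Hold out each product in turn as the new product; average per-product accuracy.

    Assumed ground truth: the customer actually bought the held-out product.
    """
    per_product = []
    for new_product in products:
        correct = total = 0
        for ppfp in ppfp_by_customer.values():
            # Not renormalized: the threshold rule is unaffected by rescaling.
            history = {p: v for p, v in ppfp.items() if p != new_product}
            if not history:
                continue  # no other purchases in the category to infer from
            predicted = is_potential_customer(history, new_product, sim, ratio, aggregate)
            actual = new_product in ppfp
            correct += int(predicted == actual)
            total += 1
        if total:
            per_product.append(correct / total)
    return sum(per_product) / len(per_product) if per_product else 0.0
```

Lowering `ratio` labels more customers as potential buyers. Raising it makes the rule stricter.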

