Title

电子商务顾客评论的热点话题分析

Parallel Title

E-commerce Customer Reviews Hot Topic Analysis

DOI

10.6338/JDA.201606_11(3).0001

Authors

蔡越 (Yue Tsai); 郭鹏 (Peng Kuo); 方匡南 (Kuang-Nan Fang)

Keywords

Journal Title

Journal of Data Analysis

Volume and Issue / Publication Date

Vol. 11, No. 3 (2016/06/01)

Pages

1 - 15

Language of Content

Simplified Chinese

Chinese Abstract

买家评论文本数据是电子商务领域一种重要的数据形式,利用对评论文本数据的分析,电商卖家可以直接了解顾客对产品的态度与建议,提取顾客关注的热点问题,可以进行顾客分类、实现精准营销,改进和提高生产和服务等;买家可以提取所关注属性的相关评价,了解舆论情感倾向,提高购物决策效率。但是大数据环境下海量文本的出现给文本数据的有效利用带来了一定的困难,结构化处理后的文本数据的高维特性给电子商务文本聚类等分析带来了新的挑战。本文我们主要研究当词条数目(变量数)远远大于评论文本数(样本数)时如何归纳顾客评论以及提取热点话题,我们抓取了亚马逊中国站热门产品kindle 的评论文本,通过惩罚高斯混合模型聚类方法,同时进行文本聚类和有效词条筛选,实现了大规模评论文本的有效、快速、自动聚类,为后续更加精细的商业分析提供了良好的分析基础。

English Abstract

Buyer review text is an important form of data in e-commerce. By analyzing review text, sellers can directly learn customers' attitudes toward and suggestions about their products, extract the hot issues that customers care about, classify customers, carry out precision marketing, and improve production and service; buyers can extract evaluations of the product attributes they care about, gauge the sentiment of public opinion, and make purchase decisions more efficiently. However, the massive volume of text that arises in a big-data environment makes text data difficult to use effectively, and the high dimensionality of text data after it has been converted to structured form poses new challenges for analyses such as e-commerce text clustering. This paper studies how to summarize customer reviews and extract hot topics when the number of terms (variables) far exceeds the number of review texts (samples). We crawled review texts for the Kindle, a popular product on Amazon China, and applied a penalized Gaussian mixture model clustering method that performs text clustering and selects informative terms simultaneously. This achieves effective, fast, and automatic clustering of large-scale review text and provides a sound basis for subsequent, finer-grained business analysis.
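
The abstract names penalized Gaussian mixture model clustering but does not give its formulation. As an illustration only, the sketch below follows the general idea of L1-penalized model-based clustering (compare Pan and Shen, 2007, in the reference list): an EM loop in which each cluster mean is soft-thresholded, so that a term whose mean is shrunk to zero in every cluster no longer influences the clustering and can be treated as screened out. The function name `penalized_gmm`, the penalty parameter `lam`, the shared diagonal covariance, the column standardization, and the farthest-point initialization are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def penalized_gmm(X, n_clusters, lam, n_iter=100, seed=0):
    """EM for a Gaussian mixture with an L1 penalty on the cluster means.

    Assumptions of this sketch (not from the paper): one diagonal covariance
    shared by all clusters, standardized columns, a fixed iteration count.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Standardize so that a zero cluster mean means "same as the overall mean".
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

    # Farthest-point initialization: each new center is the row farthest
    # from the centers chosen so far.
    centers = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        d2 = ((X[:, None, :] - X[centers][None, :, :]) ** 2).sum(axis=2)
        centers.append(int(np.argmax(d2.min(axis=1))))
    mu = X[centers].copy()                       # (K, p) cluster means
    var = np.ones(p)                             # shared diagonal variances
    pi = np.full(n_clusters, 1.0 / n_clusters)   # mixing proportions

    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in the log domain.
        diff = X[:, None, :] - mu[None, :, :]
        log_dens = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=2)
        log_w = np.log(pi) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)

        # M-step: weighted means, then soft-thresholding from the L1 penalty.
        nk = tau.sum(axis=0) + 1e-12
        pi = nk / n
        mu_tilde = (tau.T @ X) / nk[:, None]
        shrink = lam * var[None, :] / nk[:, None]
        mu = np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - shrink, 0.0)
        diff = X[:, None, :] - mu[None, :, :]
        var = np.einsum("nk,nkp->p", tau, diff ** 2) / n + 1e-6

    labels = tau.argmax(axis=1)
    # A term is kept if at least one cluster mean for it is nonzero.
    selected = np.flatnonzero(np.any(mu != 0.0, axis=0))
    return labels, mu, selected


# Synthetic stand-in for a document-term matrix (not the Kindle data):
# only the first 10 of 30 columns differ between the two groups.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 30))
X_demo[:30, :10] += 6.0
labels, mu, selected = penalized_gmm(X_demo, n_clusters=2, lam=15.0)
print("columns kept:", selected)
```

A larger `lam` zeroes more cluster means and therefore screens out more terms. The sketch leaves open how `lam` and the number of clusters would be chosen, and it builds dense arrays of size samples × clusters × terms, which would need a sparse or chunked formulation for a real review corpus.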

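Subject Classification

Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management

References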
  1. Dave, K., Lawrence, S., Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th International Conference on World Wide Web.
  2. Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 1-38.
  3. Hatagami, Y., Matsuka, T. (2009). Text mining with an augmented version of the bisecting k-means algorithm. Neural Information Processing.
  4. Maugis-Rabusseau, C., Michel, B. (2013). Adaptive density estimation for clustering with Gaussian mixtures. ESAIM: Probability and Statistics, 17, 698-724.
  5. Pan, W., Shen, X. (2007). Penalized model-based clustering with application to variable selection. The Journal of Machine Learning Research, 8, 1145-1164.
  6. Willett, P. (1980). Document clustering using an inverted file approach. Journal of Information Science, 2(5), 223-231.
  7. Yao, M., Pi, D., Cong, X. (2012). Chinese text clustering algorithm based k-means. Physics Procedia, 33, 301-307.
  8. Zhao, P., Rocha, G., Yu, B. (2006). Department of Statistics, UC Berkeley.
  9. 王和勇, 蓝金炯 (2015). 面向海量高维数据的文本主题发现 [Text topic discovery for massive high-dimensional data]. 情报杂志 [Journal of Intelligence], 34(11), 162-167.
  10. 张亮, 李敏强 (2007). 一种有限混合模型对无监督文本聚类的广义方法 [A generalized finite mixture model method for unsupervised text clustering]. 模式识别与人工智能 [Pattern Recognition and Artificial Intelligence], 20(5), 698-703.