Title

Practicability of Ensemble Artificial Neural Network Models for a Classification Task: An Optimal Approach for Reproducing Classification Practices of Health Consumers Generated Text on Social Media

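Parallel Title

集成式人工神經網絡模型於分類實務之可行性:以社群媒體之健康消費者資訊分類為例 [The feasibility of ensemble artificial neural network models in classification practice: The case of classifying health consumer information on social media]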

DOI

10.6182/jlis.202206_20(1).001

Authors

Sukjin You; Min Sook Park; Soohyung Joo

Keywords

Automated Classification; Deep Learning; Artificial Neural Network; Ensemble Classification Model; Knowledge Organization

Journal

Journal of Library and Information Studies (圖書資訊學刊)

Volume/Issue (Publication Date)

Vol. 20, No. 1 (2022/06/01)

Pages

1-30

Language

English

English Abstract

This paper reports the classification accuracy of artificial neural network (ANN) models in reproducing health consumers' classification practices on social media. Social media have driven the growth of unstructured text data across domains, including health, which motivates researchers to reconsider the epistemological approach to automated classification. This study compared the performance of several types of individual ANN models with two kinds of ensemble models: those that combine the classification results of individual models and those that integrate multiple ANN structures. To train these models, two dictionaries were employed: health consumers' terms extracted from questions and answers in the health categories of Yahoo! Answers, and MeSH terms. All three types of individual classifiers achieved accuracies of around 90%. In particular, the two-layer fully connected ANN produced relatively higher classification performance than the convolutional neural network (CNN) and long short-term memory (LSTM) models. Ensemble models based on classification results outperformed both the ensemble models based on the integration of heterogeneous ANN structures and the individual deep-learning models. The combination of questions and best answers was found to be the most effective training dataset for building an accurate prediction model. The findings suggest that ANN models can be an effective assistive tool for classifying online health resources that health consumers generate in natural language.
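
The abstract distinguishes ensembles built from the classification results of individual models, but it does not state how those results are combined. The following is a minimal sketch assuming two common combination rules: majority (hard) voting over predicted labels, and averaging of predicted class probabilities (soft voting). The function names `hard_vote` and `soft_vote` and the toy outputs standing in for the DNN, CNN, and LSTM classifiers are hypothetical, not taken from the study.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence

import numpy as np


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Majority vote over the class labels predicted by each model.

    predictions[m][i] is model m's label for sample i. Ties are broken
    in favour of the smallest label so the result is deterministic.
    """
    n_samples = len(predictions[0])
    combined: list[int] = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        top = max(votes.values())
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


def soft_vote(probabilities: Sequence[np.ndarray]) -> np.ndarray:
    """Average per-model probability matrices and pick the most likely class.

    Each matrix has shape (n_samples, n_classes), e.g. softmax outputs.
    """
    stacked = np.stack(probabilities)  # (n_models, n_samples, n_classes)
    return stacked.mean(axis=0).argmax(axis=1)


# Hypothetical outputs of three individual classifiers on four posts.
dnn_labels = [0, 1, 2, 1]
cnn_labels = [0, 2, 2, 1]
lstm_labels = [1, 1, 2, 0]
print(hard_vote([dnn_labels, cnn_labels, lstm_labels]))  # [0, 1, 2, 1]

# Hypothetical class probabilities for two posts over three categories.
dnn_probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
cnn_probs = np.array([[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
lstm_probs = np.array([[0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
print(soft_vote([dnn_probs, cnn_probs, lstm_probs]).tolist())  # [0, 2]
```

Hard voting needs only the predicted labels, while soft voting also uses each model's confidence; the abstract does not confirm either rule as the authors' method.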

Chinese Abstract (English translation)

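This paper applies artificial neural network (ANN) models to examine how accurately they can reproduce health information classification practices on social media. Health information terms were extracted from questions and answers in the health categories of Yahoo! Answers and supplemented with Medical Subject Headings (MeSH terms) to train and compare the performance of several types of ANN models and ensemble models. The results show that the ANN models achieved classification accuracies of about 90%; among them, the deep neural network (DNN) performed better than the convolutional neural network (CNN) and the long short-term memory (LSTM) model. Ensemble models based on classification results outperformed not only ensemble models based on heterogeneous ANN structures but also single deep-learning models. The study also found that the combination of questions and best answers was the most effective training set and could be used to build an accurate prediction model. The findings indicate that ANN models can effectively assist in classifying online health information generated by health consumers in natural language.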
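
The abstracts state that the models were trained with two term dictionaries (consumer terms from Yahoo! Answers and MeSH terms) but not how text was encoded. One plausible reading, sketched here purely as an assumption, is that the merged dictionaries define the input vocabulary: each post becomes a fixed-length sequence of term indices, with 0 as padding, that an embedding layer in a CNN or LSTM could consume. Multi-word terms are matched greedily, longest first. The functions `tokenize`, `build_vocabulary`, and `encode` and the sample terms are illustrative, not the study's.

```python
from __future__ import annotations

import re
from typing import Iterable

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(*term_lists: Iterable[str]) -> dict[tuple[str, ...], int]:
    """Merge several term dictionaries into one index; 0 is reserved for padding."""
    vocab: dict[tuple[str, ...], int] = {}
    for terms in term_lists:
        for term in terms:
            key = tuple(tokenize(term))
            if key and key not in vocab:  # skip empty and duplicate terms
                vocab[key] = len(vocab) + 1
    return vocab


def encode(text: str, vocab: dict[tuple[str, ...], int], max_len: int) -> list[int]:
    """Map dictionary terms in `text` to indices, padded or truncated to max_len.

    Tokens that belong to no dictionary term are dropped.
    """
    tokens = tokenize(text)
    longest = max((len(key) for key in vocab), default=1)
    ids: list[int] = []
    i = 0
    while i < len(tokens) and len(ids) < max_len:
        # Try the longest possible term starting at position i first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in vocab:
                ids.append(vocab[key])
                i += n
                break
        else:
            i += 1  # no dictionary term starts here; skip this token
    return ids + [0] * (max_len - len(ids))


# Illustrative entries only; not the dictionaries used in the study.
consumer_terms = ["heartburn", "sore throat"]
mesh_terms = ["Gastroesophageal Reflux", "Pharyngitis", "Heartburn"]
vocab = build_vocabulary(consumer_terms, mesh_terms)
print(len(vocab))  # 4 ("Heartburn" appears in both lists)
print(encode("My sore throat and heartburn won't stop", vocab, max_len=5))
# [2, 1, 0, 0, 0]
```

Longest-first matching keeps a multi-word term such as "sore throat" from being split into unrelated tokens; a bag-of-words vector for a fully connected network could be derived from the same index.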

Subject Classification: Humanities > Library and Information Science
References
  1. Abbas, J.(2010).Structures for organizing knowledge: Exploring taxonomies, ontologies, and other schemas.Neal-Schuman.
  2. Agatonovic-Kustrin, S.,Beresford, R.(2000).Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research.Journal of Pharmaceutical & Biomedical Analysis,22(5),717-727.
  3. Andersen, N.,Sö,vist, T.(2012).,University of Copenhagen.
  4. Apté, C.,Damerau, F.,Weiss, S.(1994).Automated learning of decision rules for text categorization.ACM Transactions on Information Systems,12(3),233-251.
  5. Assefa, S.(2007).University of North Texas.
  6. Bates, M. J.(Ed.),Maack, M. N.(Ed.)(2003).Encyclopedia of Library & Information Science.CRC Press.
  7. Bian, S.,Wang, W.(2007).On diversity and accuracy of homogeneous and heterogeneous ensembles.International Journal of Hybrid Intelligent Systems,4(2),103-128.
  8. Breiman, L.(1996).Bagging predictors.Machine Learning,24(2),123-140.
  9. Brownlee, J. (2019). Ensemble learning methods for deep learning neural networks. Machine Learning Mastery. https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks
  10. Calefato, F.,Lanubile, F.,Novielli, N.(2016).Moving to stack overflow: Best-answer prediction in legacy developer forums.Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
  11. Cline, R. J. W.,Haynes, K. M.(2001).Consumer health information seeking on the Internet: The state of the art.Health Education Research,16(6),671-692.
  12. Dahlberg, I.(2006).Knowledge organization: A new science?.Knowledge Organization,33(1),11-19.
  13. Dervin, B. (1983, May). An overview of sensemaking research: Concepts, methods and results [Paper presentation]. Annual Meeting of the International Communication Association, Dallas, TX, United States. http://communication.sbs.ohio-state.edu/sense-making/art/artdervin83.html
  14. Dumais, S.,Platt, J.,Heckerman, D.,Sahami, M.(1998).Inductive learning algorithms and representations for text categorization.Proceedings of the Seventh International Conference on Information and Knowledge Management
  15. Efron, B.,Hastie, T.(2016).Computer age statistical inference: Algorithms, evidence, and data science.Cambridge University Press.
  16. Er, O.,Cetin, O.,Bascil, S.,Temurtas, F.(2016).A comparative study on Parkinson’s disease diagnosis using neural networks and artificial immune system.Journal of Medical Imaging & Health Informatics,6(1),264-268.
  17. Freund, Y.,Schapire, R. E.(1996).Experiments with a new boosting algorithm.Proceedings of the Thirteenth International Conference on International Conference on Machine Learning
  18. Gazan, R.(2011).Social Q&A.Journal of the American Society for Information Science & Technology,62(12),2301-2312.
  19. Golub, K.(2019).Automatic subject indexing of text.Knowledge Organization,46(2),104-121.
  20. Golub, K.,Soergel, D.,Buchanan, G.,Tudhope, D.,Lykke, M.,Hiom, D.(2016).A framework for evaluating automatic indexing or classification in the context of retrieval.Journal of the Association for Information Science & Technology,67(1),3-16.
  21. Gross, T.,Taylor, A. G.(2005).What have we got to lose? The effect of controlled vocabulary on keyword searching results.College & Research Libraries,66(3),212-230.
  22. Harper, F. M.,Moy, D.,Konstan, J. A.(2009).Facts or friends? Distinguishing informational and conversational questions in social Q&A sites.Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  23. Hartmann, J.,Huppertz, J.,Schamp, C.,Heitmann, M.(2019).Comparing automated text classification methods.International Journal of Research in Marketing,36(1),20-38.
  24. Hjørland, B.(2008).What is knowledge organization (KO)?.Knowledge Organization,35(2/3),86-101.
  25. Hjørland, B.(2014).Theories of knowledge organization—Theories of knowledge.Knowledge Organization,40(3),169-181.
  26. Hjørland, B.(2018).Indexing: Concepts and theory.Knowledge Organization,45(7),609-639.
  27. Hjørland, B.(2007).Semantics and knowledge organization.Annual Review of Information Science & Technology,41(1),367-405.
  28. Hughes, M.,Li, I.,Kotoulas, S.,Suzumura, T.(2017).Medical text classification using convolutional neural networks.Studies in Health Technology & Informatics,235,246-250.
  29. Ibekwe-Sanjuan, F.,Bowker, G.(2017).Implications of big data for knowledge organization.Knowledge Organization,44(3),187-198.
  30. Jacob, P.(Ed.)(2014).Text-based intelligent systems: Current research and practice in information extraction and retrieval.Psychology Press.
  31. Kalantari, A.,Kamsin, A.,Shamshirband, S.,Gani, A.,Alinejad-Rokny, H.,Chronopoulos, A. T.(2018).Computational intelligence approaches for classification of medical data: State-of-the-art, future challenges and research directions.Neurocomputing,276(7),2-22.
  32. Kamel Boulos, M. N.,Wheeler, S.(2007).The emerging Web 2.0 social software: An enabling suite of sociable technologies in health and health care education.Health Information & Libraries Journal,24(1),2-23.
  33. Khan, J.,Wei, J. S.,Ringnér, M.,Saal, L. H.,Ladanyi, M.,Westermann, F.,Berthold, F.,Schwab, M.,Antonescu, C. R.,Peterson, C.,Meltzer, P. S.(2001).Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.Nature Medicine,7(6),673-679.
  34. Kim, S.(2013).An exploratory study of user-centered indexing of published biomedical images.Journal of the Medical Library Association,101(1),73-76.
  35. Kim, S.,Oh, J. S.,Oh, S.(2008).Best-answer selection criteria in a social Q&A site from the user-oriented relevance perspective.Proceedings of the American Society for Information Science & Technology,44(1),1-15.
  36. Kim, T.-Y.,Cho, S.-B.(2018).Web traffic anomaly detection using C-LSTM neural networks.Expert Systems with Applications,106,66-76.
  37. Kim, Y.(2014).Unpublished.
  38. Lewis, D. D.,Ringuette, M.(1994).A comparison of two learning algorithms for text categorization.Third annual symposium on document analysis & information retrieval
  39. Li, Q.,Lu, S. C.(2008).Collaborative tagging applications and approaches.IEEE MultiMedia,15(3),14-21.
  40. Lin, H.,Jia, J.,Guo, Q.,Xue, Y.,Li, Q.,Huang, J.,Cai, L.,Feng, L.(2014).User-level psychological stress detection from social media using deep neural network.Proceedings of the 22nd ACM international conference on multimedia
  41. Liu, F.,Antieau, L. D.,Yu, H.(2011).Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain.Journal of Biomedical Informatics,44(6),1032-1038.
  42. McCallum, A.,Nigam, K.(1998).A comparison of event models for naive Bayes text classification.Learning for text categorization: Papers from the 1998 AAAI workshop (Technical Report WS-98-05)
  43. McKinsey Global Institute. (2017). Artificial intelligence: The next digital frontier? https://www.calpers.ca.gov/docs/boardagendas/201801/full/day1/06-technology-background.pdf
  44. McRoy, S.,Jones, S.,Kurmally, A.(2016).Toward automated classification of consumers’ cancer-related questions with a new taxonomy of expected answer types.Health Informatics Journal,22(3),523-535.
  45. Messai, R.,Simonet, M.,Bricon-Souf, N.,Mousseau, M.(2010).Characterizing consumer health terminology in the breast cancer field.Studies in Health Technology & Informatics,160(Pt. 2),991-994.
  46. Norton, M.(2010).Introductory concepts in information science.Information Today.
  47. Oh, S.,Worrall, A.(2013).Health answer quality evaluation by librarians, nurses, and users in social Q&A.Library & Information Science Research,35(4),288-298.
  48. Oh, S.,Worrall, A.,Yi, Y. J.(2011).Quality evaluation of health answers in Yahoo! Answers: A comparison between experts and users.Proceedings of the American Society for Information Science & Technology,48(1),1-3.
  49. Oh, S.,Zhang, Y.,Park, M. S.(2016).Cancer information seeking in social question and answer services: Identifying health-related topics in cancer questions on Yahoo! Answers.Information Research,21(3)
  50. Peters, I.(2009).Folksonomies. Indexing and retrieval in Web 2.0.K. G. Saur.
  51. Pierre, J. M.(2001).Unpublished.
  52. Poikonen, T.,Vakkari, P.(2009).Laypersons’ and professionals’ nutrition related vocabularies and their matching to a general and a specific thesaurus.Journal of Information Science,35(2),232-243.
  53. O’Reilly, T.(2007).What is Web 2.0: Design patterns and business models for the next generation of software.Communications & Strategies,1(1),17.
  54. Sarasohn-Kahn, J. (2008). The wisdom of patients: Health care meets online social media. California Health Care Foundation. https://www.chcf.org/wp-content/uploads/2017/12/PDF-HealthCareSocialMedia.pdf
  55. Sarker, A.,Gonzalez, G.(2015).Portable automatic text classification for adverse drug reaction detection via multi-corpus training.Journal of Biomedical Informatics,53,196-207.
  56. Sebastiani, F.(2002).Machine learning in automated text categorization.ACM Computing Surveys,34(1),1-47.
  57. Seedorff, M.,Peterson, K. J.,Nelsen, L. A.,Cocos, C.,McCormick, J. B.,Chute, C. G.,Pathak, J.(2013).Incorporating expert terminology and disease risk factors into consumer health vocabularies.Pacific Symposium on Biocomputing,421-432.
  58. Shah, C.,Oh, S.,Oh, J. S.(2009).Research agenda for social Q&A.Library & Information Science Research,31(4),205-209.
  59. Shiri, A.(2013).Linked data meets big data: A knowledge organization systems perspective.Advances in Classification Research Online,24(1),16-20.
  60. SimilarWeb. (2019). Answers.yahoo.com traffic overview. https://www.similarweb.com/website/answers.yahoo.com#overview
  61. Smiraglia, R. P.(2015).Domain analysis for knowledge organization: Tools for ontology extraction.Chandos.
  62. Smiraglia, R. P.(Ed.),Lee, H.-L.(Ed.)(2012).Cultural frames of knowledge.Ergon-Verlag.
  63. Smiraglia, R. P.,Cai, X.(2017).Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization.Knowledge Organization,44(3),215-233.
  64. Smith, C. A.,Wicks, P. J.(2008).PatientsLikeMe: Consumer health vocabulary as a folksonomy.AMIA Annual Symposium proceedings
  65. Svenonius, E.(2000).The intellectual foundation of information organization.MIT press.
  66. Tennis, J. T.(2008).Epistemology, theory, and methodology in knowledge organization: Toward a classification, metatheory, and research framework.Knowledge Organization,35(2/3),102-112.
  67. U.S. National Library of Medicine. (2018). Medical subject headings. https://www.nlm.nih.gov/mesh/filelist.html
  68. Weller, K.(2010).Knowledge representation in the social semantic web.
  69. Wolpert, D. H.(1992).Stacked generalization.Neural Networks,5(2),241-259.
  70. Xu, W., & Rudnicky, A. (2000). Can artificial neural networks learn language models? In G. Dinghua (Chair), Sixth International Conference on Spoken Language Processing (pp. 202-205). International Speech Communication Association. https://www.isca-speech.org/archive/archive_papers/icslp_2000/i00_1202.pdf
  71. Zeng, Q. T.,Tse, T.(2006).Exploring and developing consumer health vocabularies.Journal of the American Medical Informatics Association,13(1),24-29.
  72. Zhang, G. P.(2000).Neural networks for classification: A survey.IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews),30(4),451-462.
  73. Zhang, X., Wu, J., He, Z., Liu, X., & Su, Y. (2018). Medical exam question answering with large-scale reading comprehension. arXiv. https://arxiv.org/abs/1802.10279
  74. Zhao, Y.,Zhang, J.(2017).Consumer health information seeking in social media: A literature review.Health Information & Libraries Journal,34(4),268-283.