This article discusses candidate best practices for developing databases for institutional research and analysis. It draws on knowledge from Educational Science, Library Science, and Information Engineering, on years of experience in developing educational databases, and on a recent survey of related technologies and products. Several development options are compared to show their advantages and disadvantages under different conditions, and three representative analysis tasks are reported to verify and illustrate how these ideas and experience work together. In particular, this article proposes a sustainable five-stage workflow: (1) data collection and aggregation, (2) cataloguing, (3) regulation, (4) archiving, and (5) usage, and describes the caveats that practitioners must know for each stage. The situations in which data normalization and de-normalization apply are described. The capability of domestic vendors of related products is briefly discussed on the basis of proof-of-concept testing. Finally, real-world institutional analyses are conducted to share our experience. Overall, the first four stages of the workflow are the most time-consuming and costly. Once the data have been well prepared, recent data visualization tools allow users to discover meaningful patterns easily, to form hypotheses, and to explore the database for evidence supporting their hypotheses and decisions. In the future, we expect that event evolution simulation techniques, which allow users to foresee outcomes under various input scenarios, could play an important role in educational data analysis alongside the maturing data visualization tools.