Title |
Bringing Search to the Economic Census - The NAPCS Classification Tool |
DOI |
10.6339/24-JDS1147 |
Author |
Clayton Knappenberger |
Keywords |
natural language processing ; neural networks ; search ; survey collection |
Journal |
Journal of Data Science |
Volume and Issue / Publication Date |
Vol. 22, No. 3 (2024 / 07 / 01) |
Pages |
409 - 422 |
Language |
English |
Abstract |
The North American Product Classification System (NAPCS) was first introduced in the 2017 Economic Census and provides greater detail on the range of products and services offered by businesses than was previously available with just an industry code. In the 2022 Economic Census, NAPCS consisted of 7,234 codes, and respondents often found that they were unable to identify correct NAPCS codes for their business, leaving instead written descriptions of their products and services. Over one million of these needed to be reviewed by Census analysts in the 2017 Economic Census. The Smart Instrument NAPCS Classification Tool (SINCT) offers respondents a low-latency search engine to find appropriate NAPCS codes based on a written description of their products and services. SINCT uses a neural network document embedding model (doc2vec) to embed respondent searches in a numerical space and then identifies NAPCS codes that are close to the search text. This paper shows one way in which machine learning can improve the survey respondent experience and reduce the amount of expensive manual processing that is necessary after data collection. We also show how relatively simple tools can achieve an estimated 72% top-ten accuracy with thousands of possible classes, limited training data, and strict latency requirements. |
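The search step described in the abstract (embed the respondent's text, then return the codes whose descriptions lie closest) and the top-ten accuracy metric can be illustrated with a minimal sketch. This is not the SINCT implementation: a bag-of-words term-count vector stands in for the doc2vec embedding so the example runs with the standard library only, and the codes, descriptions, and function names (`CATALOG`, `embed`, `search`, `top_k_accuracy`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical code/description pairs for illustration only.
# These are NOT real NAPCS codes.
CATALOG = {
    "X001": "retail sales of fresh bread and bakery products",
    "X002": "automobile repair and maintenance services",
    "X003": "software development and programming services",
}


def embed(text):
    """Stand-in embedding: term counts per token (the paper uses doc2vec)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, catalog, k=10):
    """Return the k codes whose descriptions are closest to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), code) for code, desc in catalog.items()]
    # Highest similarity first; ties broken by code for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:k]]


def top_k_accuracy(labelled_queries, catalog, k=10):
    """Share of (query, true_code) pairs whose true code is in the top k."""
    hits = sum(
        1 for query, code in labelled_queries if code in search(query, catalog, k)
    )
    return hits / len(labelled_queries)


if __name__ == "__main__":
    print(search("car repair services", CATALOG, k=2))
```

In a production setting the description vectors would be computed once and cached, since re-embedding every description per query works against a strict latency budget.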
Subject Classification |
Basic and Applied Sciences > Information Science |
Basic and Applied Sciences > Statistics |