Classification of Village Development Index at Regency/Municipality Level Using Bayesian Network Approach with K-Means Discretization
DOI:
https://doi.org/10.34123/jurnalasks.v14i1.390
Abstract
Village development has been one of the most important targets of government policy in Indonesia, with the aim of fully realizing village potential. Under Law No. 6 of 2014 on Villages, local governments from the regency/municipality level down to the village level are required to understand the potential of the villages in their regions in order to develop it. In this paper, we build and analyze Bayesian network models to classify the village development index at the regency/municipality level and to gain a better understanding of the causal relationships between the independent variables of village potential status. Using web scraping, data are collected from the website of the Ministry of Village, Development of Disadvantaged Regions, and Transmigration (Kemendesa) and from the Village Development Evaluation (Indeks Pembangunan Desa, IPD) 2018 data published by Statistics Indonesia (BPS). We further apply discretization with the K-Means clustering method to handle the continuous nature of the retrieved data. An extensive comparison of structure learning approaches for the Bayesian network is performed, covering Naive Bayes, Maximum Spanning Tree with weighted Spearman correlation coefficient, Hill Climbing search, and Tabu Search. For a fair evaluation, all models are built using 80% of the data as a training set and the remaining 20% as a testing set. The results show that the Bayesian network approach can be applied to classifying village development index status, and that the model constructed using the maximum spanning tree with K-Means data discretization achieves the best performance, with 90.69% accuracy.
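As an illustration of two steps named in the abstract, the sketch below discretizes a continuous variable with one-dimensional K-Means and builds a maximum spanning tree over pairwise Spearman correlations. It is a minimal sketch under stated assumptions, not the authors' implementation: the variable names (`roads`, `schools`, `clinics`), the toy values, the quantile-based initialization, and the use of the absolute Spearman coefficient as the edge weight are all hypothetical choices, since the abstract does not specify them.

```python
import math
from itertools import combinations


def kmeans_1d(values, k, iters=100):
    """Discretize one continuous variable with 1-D K-Means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Assumption: deterministic start with centers at evenly spaced quantiles.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
    return labels, centers


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of ranks (no constant columns)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def maximum_spanning_tree(columns):
    """Kruskal's algorithm, taking edges in order of decreasing |rho|."""
    names = list(columns)
    edges = sorted(
        ((abs(spearman(columns[a], columns[b])), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical toy data, for illustration only.
labels, centers = kmeans_1d([1.0, 1.2, 0.9, 5.0, 5.2, 4.8, 9.0, 9.5, 9.1], k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

indicators = {
    "roads": [1, 2, 3, 4, 5, 6],
    "schools": [1, 2, 3, 4, 6, 5],
    "clinics": [2, 1, 4, 3, 6, 5],
}
for a, b, weight in maximum_spanning_tree(indicators):
    print(a, b, round(weight, 4))
```

The resulting tree is undirected. Turning it into a Bayesian network structure would additionally require choosing a root and orienting the edges, a step this sketch leaves out.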