Optimizing Prediction of Customer Deposit Bank Marketing by Logistic Regression
Abstract
This study examines how well the Logistic Regression method classifies bank customers by deposit subscription, using the public UCI Bank Marketing dataset, which contains customer-level information from a bank's deposit telemarketing activities. The label is binary: 'yes' for subscribers and 'no' for non-subscribers. Preprocessing begins with downsampling to balance the two classes, followed by data selection and data transformation to keep the values consistent, and attribute selection to retain the attributes that have the most significant influence on the outcome. Classification is performed with the Logistic Regression algorithm. The data is split into 90% training data and 10% testing data, with the aim of optimizing the training process. The model achieves an accuracy of 88.53% and a classification error of 11.4%, which is low and indicates that the model produces few errors. The kappa value of 0.68 is reasonably close to 1 and is categorized as good, the low RMSE of 0.3 indicates an accurate model, and the high AUC of 93.4% indicates that Logistic Regression is a suitable algorithm for this dataset.
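The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the function names, the toy labels, and the toy probabilities are all hypothetical, and computing RMSE on the predicted probability of 'yes' is an assumption about how the reported value was obtained.

```python
import math
import random


def downsample(rows, label_of, seed=0):
    """Randomly drop rows until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def split_rows(rows, train_fraction=0.9, seed=0):
    """Shuffle, then cut into a training part and a testing part."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of correct predictions; classification error is 1 - accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)


def rmse(y_true, p_yes):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, p_yes)]
    return math.sqrt(sum(squared) / len(squared))


def auc(y_true, p_yes):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for t, p in zip(y_true, p_yes) if t == 1]
    neg = [p for t, p in zip(y_true, p_yes) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 90 'no' rows and 10 'yes' rows.
rows = [("customer", "no")] * 90 + [("customer", "yes")] * 10
balanced = downsample(rows, label_of=lambda r: r[1])
train, test = split_rows(balanced)
print(len(balanced), len(train), len(test))

# Hypothetical model output for eight test customers (1 = 'yes', 0 = 'no').
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_yes = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in p_yes]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
print(rmse(y_true, p_yes), auc(y_true, p_yes))
```

Accuracy and kappa are computed from the thresholded predictions, while RMSE and AUC use the probabilities directly, which is why a model can score well on AUC even when its accuracy at a fixed threshold is modest.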
Copyright (c) 2024 Atika Mutiarachim
This work is licensed under a Creative Commons Attribution 4.0 International License.