Comparison of Density-Based Spatial Clustering of Applications with Noise (DBSCAN), K-Means and X-Means Algorithms on Shopping Trends Data

Authors

  • Vina Wulandari, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
  • Yulia Syarif, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
  • Zhevin Alfian, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
  • Muhammad Adil Althof, Sivas Cumhuriyet University, Turkey
  • Maylina Mufidah, Sakarya University, Turkey

DOI:

https://doi.org/10.57152/ijatis.v1i1.1135

Keywords:

Davies-Bouldin Index, DBSCAN, K-Means, Shopping Data, X-Means

Abstract

This study compares three clustering algorithms, DBSCAN, K-Means, and X-Means, on shopping trend data, using the Davies-Bouldin Index (DBI) to assess cluster validity. The dataset, sourced from Kaggle.com, covers a range of customer attributes. DBSCAN shows the best cluster validity of the three: with an Eps value of 0.3 and a MinPts value of 3, it achieves a DBI of 0.1973, where a lower DBI indicates more compact and better-separated clusters. K-Means follows with a DBI of 2.2958, and X-Means attains its best value, 2.5663, at k=3. The results highlight the role of clustering algorithms in understanding shopping trends and customer preferences, and show how the three algorithms compare on this dataset.
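As an illustration of the validity measure used above, the following is a minimal sketch of the standard Davies-Bouldin Index, written in plain Python. It is not the authors' implementation. The function name `davies_bouldin_index`, the toy points, and the choice to skip points labeled `-1` (a common convention for DBSCAN noise) are assumptions made for this example; the abstract does not say how noise points were treated.

```python
from collections import defaultdict
from math import dist


def davies_bouldin_index(points, labels, noise_label=-1):
    """Standard DBI: mean over clusters of the worst (S_i + S_j) / M_ij ratio.

    S_i is the mean distance of a cluster's points to its centroid and
    M_ij is the distance between two centroids. Lower values are better.
    Points carrying `noise_label` are ignored (an assumption of this sketch).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from its points to its centroid.
    scatter = {
        label: sum(dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight, well-separated groups of four points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # mixes the two groups

good_dbi = davies_bouldin_index(points, good_labels)
bad_dbi = davies_bouldin_index(points, bad_labels)
print(f"good labeling DBI: {good_dbi:.4f}")
print(f"bad labeling DBI:  {bad_dbi:.4f}")
```

On this toy data the labeling that matches the two groups gives a much smaller DBI than the mixed labeling, which is the sense in which the reported 0.1973 for DBSCAN indicates better cluster validity than the values above 2 reported for K-Means and X-Means.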

Published

2024-01-10