Comparative Analysis of Classification of K-Nearest Neighbor (KNN) Algorithm and Decision Tree in Breast Cancer Using Rapidminer

Ema Rosida; Andri Firmansyah; Suherman

doi:10.59890/ijarss.v2i12.48

Authors

Ema Rosida Universitas Pelita Bangsa, Bekasi
Andri Firmansyah Universitas Pelita Bangsa, Bekasi
Suherman Universitas Pelita Bangsa, Bekasi

Keywords:

Breast Cancer, K-Nearest Neighbor (KNN), Decision Tree, ROC Curve, Confusion Matrix

Abstract

Breast cancer has the highest number of cancer cases in Indonesia and is also one of the leading causes of cancer death. It is one of the most common cancers among Indonesians, especially women. Breast tumors are usually divided into two types: benign and malignant. Benign tumors are usually characterized by small, rounded, soft lumps. Cancer prevalence is rising rapidly in the developing world because lifestyle factors that can trigger the disease, such as smoking, physical inactivity, and a “westernized” diet, are becoming more common. The aim of this study is to identify an accurate algorithm for a breast cancer dataset, to provide patterns or models for the early detection of breast cancer, and to compare two classification algorithms, Decision Tree C4.5 and K-Nearest Neighbor (KNN), for breast cancer classification. The research follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), which consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment. Data mining is a widely used analytical tool in medicine, finance, marketing, and the social sciences. The research question is which algorithm achieves the higher accuracy on the breast cancer dataset and can provide patterns or models for early detection. The results obtained with CRISP-DM show that KNN achieves the higher accuracy, 97.14%, with an AUC of 0.976. This AUC indicates excellent classification, since AUC values between 0.90 and 1.00 are interpreted as excellent.
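The comparison described above rests on a few simple computations. The Python sketch below is illustrative only and is not the study's implementation (the study used RapidMiner operators). It shows a from-scratch KNN score (majority vote among the k nearest neighbours by Euclidean distance), the gain-ratio criterion that C4.5 uses to choose splits (the criterion only, not a full tree), accuracy from a confusion matrix, and AUC computed as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one. The toy data, k = 3, the function names, and the AUC bands below 0.90 are assumptions; only the 0.90 to 1.00 "excellent" band comes from the abstract.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(train_X, train_y, query, k=3):
    """Fraction of the k nearest training samples labelled 1 (malignant).

    Thresholding this score at 0.5 gives a plain majority vote; with an odd k
    and two classes the vote cannot tie. The score itself feeds the AUC.
    """
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    return sum(label for _, label in nearest) / k


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for one categorical attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating 1 (malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true, scores):
    """Area under the ROC curve: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_grade(value):
    """Assumed rule-of-thumb bands; only 0.90-1.00 = excellent is stated in the abstract."""
    for threshold, label in [(0.90, "excellent"), (0.80, "good"), (0.70, "fair"), (0.60, "poor")]:
        if value >= threshold:
            return label
    return "failure"


if __name__ == "__main__":
    # Hypothetical toy data: two scaled features per sample; 1 = malignant, 0 = benign.
    X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 5), (5, 6)]
    y_train = [0, 0, 0, 1, 1, 1]
    X_test = [(2, 2), (5, 5), (4, 4), (3, 3)]
    y_test = [0, 1, 1, 1]  # (3, 3) is a malignant case that sits near the benign cluster

    scores = [knn_score(X_train, y_train, x, k=3) for x in X_test]
    preds = [1 if s > 0.5 else 0 for s in scores]
    tp, fp, tn, fn = confusion_matrix(y_test, preds)
    print("TP FP TN FN:", tp, fp, tn, fn)        # TP FP TN FN: 2 0 1 1
    print("accuracy:", accuracy(tp, fp, tn, fn))  # accuracy: 0.75
    a = auc(y_test, scores)
    print(f"AUC: {a:.3f} ({auc_grade(a)})")       # AUC: 0.833 (good)
    # A hypothetical categorical attribute that separates the two classes perfectly.
    print("gain ratio:", gain_ratio(["small", "small", "large", "large"], [0, 0, 1, 1]))  # gain ratio: 1.0
```

One design consequence is visible in the sketch: KNN keeps every training sample and compares each query against all of them, so its prediction cost grows with the training set, whereas a trained decision tree classifies a case with a short sequence of attribute tests.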
