Sentiment Analysis of Coretax: A Comparison of Manual, Transformers-Based, and Lexicon-Based Data Labeling on IndoBERT Performance
Keywords:
Sentiment Analysis, Coretax, IndoBERT, Data Labeling, Transformer
Abstract
Sentiment analysis of public opinion on social media is challenging because of the complexity of informal language and the large volume of data. This study evaluates how five data labeling approaches (manual, IndoBERT, IndoBERTweet, RoBERTa, and InSet Lexicon) affect the performance of the Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) model in classifying sentiment on the Coretax issue. A total of 8,035 tweets were collected, preprocessed, and labeled with each approach. Each labeled dataset was then used to retrain the IndoBERT model, which was evaluated using accuracy, F1-score, the confusion matrix, and the Receiver Operating Characteristic-Area Under the Curve (ROC-AUC). The results show that automatic labeling with Indonesian Bidirectional Encoder Representations from Transformers for Tweet (IndoBERTweet) yielded the highest F1-score (0.9802) but was dominated by the neutral class, which indicates overfitting. Manual labeling produced a more even class distribution, albeit with a lower F1-score (0.8684), while the Robustly Optimized BERT Pretraining Approach (RoBERTa) showed the best balance between metric performance and label distribution. InSet Lexicon and IndoBERT tended to be biased toward particular classes. The study concludes that labeling effectiveness is determined not only by metric scores but also by a balanced class distribution, which is needed to produce a model that is fair and generalizes well.
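The abstract's central point, that a high score can coexist with a skewed label distribution, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy labels are invented, the helper names `class_distribution`, `accuracy`, and `macro_f1` are hypothetical, and macro averaging is chosen only because the abstract does not state which F1 averaging the study used. It is not the authors' evaluation code.

```python
from collections import Counter

# Assumed three-class scheme; the abstract only names the neutral class explicitly.
LABELS = ("negative", "neutral", "positive")


def class_distribution(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in LABELS}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over LABELS."""
    scores = []
    for c in LABELS:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: labels dominated by the neutral class,
# and a classifier that always predicts "neutral".
y_true = ["neutral"] * 8 + ["negative", "positive"]
y_pred = ["neutral"] * 10

print("distribution:", class_distribution(y_true))
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

In this toy case the always-neutral classifier is right on 8 of 10 items, yet its macro F1 is far lower because the two minority classes score zero. Checking the label distribution alongside the headline metric, as the study does, is what exposes this kind of imbalance.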
Copyright (c) 2025 Agnia Suci Rizkia, Wufron Wufron, Fikri Fahru Roji

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright © by Author; Published by Institut Riset dan Publikasi Indonesia (IRPI)
This Indonesian Journal of Machine Learning and Computer Science is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.