BERT-Based Grammatical Error Analysis in Indonesia Senior High School Essays

Authors

  • Syarifuddin Tundreng, FKIP Universitas Sembilanbelas November Kolaka, Indonesia
  • Heri Alfian, Universitas Sembilanbelas November Kolaka, Indonesia
  • Parsya Kartika, Universitas Sembilanbelas November Kolaka, Indonesia
  • Azka Airin Nisa, Universitas Sembilanbelas November Kolaka, Indonesia

DOI:

https://doi.org/10.33394/jollt.v14i2.18551

Keywords:

BERT-based learning model, Grammar error analysis, Automated writing evaluation, Writing skills, Natural language processing

Abstract

Automated grammatical error detection has advanced rapidly for high-resource languages, but comparable tools for Bahasa Indonesia remain scarce, especially in secondary school settings. Indonesian senior high school students commonly struggle with spelling, morphology, syntax, and diction, yet AI-assisted feedback systems designed specifically for Indonesian writing are still in their infancy. This study examines the use of IndoBERT-base for grammatical error analysis in 82 senior high school student essays totaling 10,911 words. Manual annotation by two expert raters identified 1,872 grammatical errors across four categories. Before analysis with a fine-tuned IndoBERT-base model, the essays were pre-processed through tokenization, normalization, and alignment with the gold-standard annotations. Model performance was assessed with accuracy, precision, recall, and F1-score, computed by comparing predicted labels with teacher-validated error tags. The model showed good agreement (80%) with human raters and correctly identified 1,594 of the 1,872 annotated errors, a detection rate of 85.1%. Spelling and morphology errors were detected especially well, whereas syntax and diction errors, which are more contextually and semantically complex, were detected less accurately. These results suggest that transformer-based models can effectively support automated grammatical analysis of Indonesian student writing. Nonetheless, limitations in handling discourse-level dependencies underscore the continuing importance of human assessment. The study supports the integration of hybrid human–AI feedback systems to improve classroom writing instruction and contributes to the development of AI-assisted grammar tools for Indonesian education.
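
The arithmetic behind the reported detection rate, and the way precision, recall, and F1-score are usually derived from aligned labels, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names, the tag names (`SPELL`, `MORPH`, `SYNTAX`, `DICTION`, `O`), and the toy tag sequences are assumptions introduced here. Only the counts 1,872 and 1,594 come from the abstract.

```python
from collections import Counter

# Figures reported in the abstract.
ANNOTATED_ERRORS = 1872  # errors marked by the two expert raters
DETECTED_ERRORS = 1594   # annotated errors the model also identified


def detection_rate(detected: int, annotated: int) -> float:
    """Share of annotated errors that the model found, in percent."""
    return 100.0 * detected / annotated


def per_label_scores(gold, predicted, ignore="O"):
    """Precision, recall and F1 per error label from aligned token tags.

    `ignore` is the (hypothetical) tag for tokens without an error.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g != ignore:
                tp[g] += 1      # model and raters agree on an error label
            continue
        if p != ignore:
            fp[p] += 1          # model flagged an error label the raters did not
        if g != ignore:
            fn[g] += 1          # raters marked an error label the model missed

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        flagged = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / flagged if flagged else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy example: one tag per token, already aligned.
gold = ["O", "SPELL", "O", "MORPH", "SYNTAX", "O", "DICTION", "SPELL"]
pred = ["O", "SPELL", "O", "MORPH", "O", "O", "SYNTAX", "SPELL"]

if __name__ == "__main__":
    print(f"Detection rate: {detection_rate(DETECTED_ERRORS, ANNOTATED_ERRORS):.1f}%")
    for label, (p, r, f1) in per_label_scores(gold, pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Read this way, the 85.1% figure is a recall-style measure (detected errors divided by annotated errors), so it says nothing on its own about how many of the model's flags were false alarms. That is why precision and F1-score are reported alongside it.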

Author Biographies

Syarifuddin Tundreng, FKIP Universitas Sembilanbelas November Kolaka

Indonesian Language Education, Faculty of Teacher Training and Education, Universitas Sembilanbelas November Kolaka, Indonesia

Heri Alfian, Universitas Sembilanbelas November Kolaka

Indonesian Language Education, Faculty of Teacher Training and Education, Universitas Sembilanbelas November Kolaka, Indonesia

Parsya Kartika, Universitas Sembilanbelas November Kolaka

Indonesian Language Education, Faculty of Teacher Training and Education, Universitas Sembilanbelas November Kolaka, Indonesia

Azka Airin Nisa, Universitas Sembilanbelas November Kolaka

English Language Education, Faculty of Teacher Training and Education, Universitas Sembilanbelas November Kolaka, Indonesia

Published

2026-04-17

How to Cite

Tundreng, S., Alfian, H., Kartika, P., & Nisa, A. A. (2026). BERT-Based Grammatical Error Analysis in Indonesia Senior High School Essays. JOLLT Journal of Languages and Language Teaching, 14(2), 666–679. https://doi.org/10.33394/jollt.v14i2.18551

Section

Articles