Predicting Student Academic Performance with Machine Learning: A Systematic Literature Review

by I. L. Ismail, Jamal. N, M. S. Asrulsani, M. Z. A. Chek, Rinda Nariswari, Z. H. Zulkifli

Published: March 24, 2026 • DOI: 10.47772/IJRISS.2026.100300005

Abstract

Predicting student academic performance has become an essential research focus in higher education as institutions seek to improve retention rates, academic success, and educational quality. The increasing availability of educational datasets through student information systems and learning management systems provides opportunities for applying machine learning techniques to predict academic outcomes and identify at-risk students.
This study presents a systematic literature review (SLR) of machine learning approaches used for predicting student performance in higher education. The review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework to ensure transparency and replicability.
Peer-reviewed studies published between 2015 and 2025 were collected from major academic databases, including Scopus, Web of Science, IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and Google Scholar. After screening, the final selection comprised studies that examine predictive models in educational data mining and learning analytics.
The results indicate that Random Forest, Support Vector Machines (SVM), Decision Trees, Logistic Regression, and Artificial Neural Networks are the most frequently used algorithms for student performance prediction. Several studies report predictive accuracy ranging from 70% to 95%, suggesting that machine learning models can be effective at identifying students at risk of academic failure.
The most influential predictive features include previous academic performance, attendance records, LMS engagement, assignment submissions, and demographic characteristics. The review also identifies several research gaps, including limited use of explainable artificial intelligence, insufficient cross-institution datasets, ethical concerns related to student data, and underutilization of deep learning methods.
The findings highlight the importance of integrating predictive analytics into educational decision-making systems and developing interpretable models that support early intervention strategies in higher education.
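
As a purely illustrative aside, the kind of interpretable model the abstract refers to can be sketched with a one-level decision tree (a "decision stump"), the simplest member of the Decision Tree family named above. The sketch below is not taken from any of the reviewed studies: the records, the feature names (prior_gpa, attendance_rate, lms_logins_per_week), and the helper fit_stump are all hypothetical, and the accuracy it reports is measured on the same toy data it was fitted to, so it says nothing about the 70% to 95% range reported in the literature.

```python
from dataclasses import dataclass

# Hypothetical toy records, invented for illustration only:
# (prior_gpa, attendance_rate, lms_logins_per_week, passed)
RECORDS = [
    (3.6, 0.95, 12, 1),
    (3.1, 0.88, 9, 1),
    (2.9, 0.80, 7, 1),
    (2.4, 0.72, 5, 1),
    (2.2, 0.60, 3, 0),
    (1.9, 0.55, 6, 0),
    (2.6, 0.50, 2, 0),
    (1.5, 0.40, 1, 0),
]
FEATURES = ["prior_gpa", "attendance_rate", "lms_logins_per_week"]


@dataclass
class Stump:
    feature: int      # index into FEATURES
    threshold: float  # predict "pass" when the feature value is >= threshold
    accuracy: float   # fraction of training records classified correctly

    def predict(self, row):
        return 1 if row[self.feature] >= self.threshold else 0


def fit_stump(records):
    """Exhaustively search every feature and midpoint threshold for the
    single split with the highest training accuracy."""
    best = None
    for f in range(len(FEATURES)):
        values = sorted({r[f] for r in records})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            correct = sum((1 if r[f] >= t else 0) == r[-1] for r in records)
            acc = correct / len(records)
            # Strict comparison keeps the first split found among ties.
            if best is None or acc > best.accuracy:
                best = Stump(f, t, acc)
    return best


if __name__ == "__main__":
    stump = fit_stump(RECORDS)
    print(f"Split on {FEATURES[stump.feature]} >= {stump.threshold:.2f} "
          f"(training accuracy {stump.accuracy:.0%})")
```

On this invented data the chosen rule is a single readable threshold on attendance, which is the sense in which such a model is interpretable: an advisor can see exactly why a student was flagged. A real study would evaluate on held-out data and would typically use an established library rather than a hand-written search.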