Predicting Student Retention and Dropout Rates in Cronasia Foundation College Inc. Using Educational Data Mining and Machine Learning Regression Techniques

by Leomil Jay Duran

Published: December 18, 2025 • DOI: 10.47772/IJRISS.2025.91100456

Abstract

This study investigated the potential of machine learning (ML) and educational data mining (EDM) for predicting retention and dropout rates at Cronasia Foundation College Inc. (CFCI). Predicting student persistence is a foundational element of improving retention efforts in higher education institutions, yet understanding what contributes to retention and dropout remains difficult. This study therefore built predictive models from historical academic records, engagement levels, socio-economic status, and psychological factors, using a dataset of 9,100 student records (75% training and 25% testing). Five machine learning classifiers (Decision Trees, Random Forest, Support Vector Machines, Neural Networks, and Logistic Regression) were evaluated using accuracy, precision, recall, and F1-score. The Neural Network, reported as the best-performing model, achieved an accuracy of 80.42%, with a precision of 0.840, recall of 0.895, and F1-score of 0.867 for retention (Class 0), and a precision of 0.692, recall of 0.582, and F1-score of 0.632 for dropout (Class 1). Random Forest and Decision Tree performed comparably: Random Forest reached an accuracy of 79.90% with a dropout F1-score of 0.622, and Decision Tree reached an accuracy of 80.58%. Logistic Regression had the lowest accuracy, 73.98%, and poor recall for the dropout class. These findings can prompt academic leaders to intervene and take responsibility for supporting students before their exit point or after their first semester. The study found that retention was most strongly related to intrinsic academic performance, attendance, and scholarship status.
The results of the study can aid in data-based decision-making in higher education by helping institutions develop focused programs to increase retention and decrease dropout.
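
As a reading aid, the reported F1-scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The sketch below is not the study's code; it only recomputes values from the figures quoted in the abstract. The helper names `f1_score` and `split_sizes` are hypothetical, and the split arithmetic assumes the 75%/25% partition was taken over all 9,100 records (the abstract does not say how the split was drawn).

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(n_records: int, train_fraction: float) -> tuple[int, int]:
    """Return (train, test) counts for a simple fractional split (assumed scheme)."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train


# Precision, recall, and F1 as reported in the abstract for the Neural Network.
reported = {
    "retention (Class 0)": {"precision": 0.840, "recall": 0.895, "f1": 0.867},
    "dropout (Class 1)": {"precision": 0.692, "recall": 0.582, "f1": 0.632},
}

# Recompute F1 from precision and recall, rounded to three decimals
# to match the precision of the reported figures.
recomputed = {
    label: round(f1_score(m["precision"], m["recall"]), 3)
    for label, m in reported.items()
}

for label, m in reported.items():
    print(f"{label}: reported F1 = {m['f1']}, recomputed F1 = {recomputed[label]}")

n_train, n_test = split_sizes(9100, 0.75)
print(f"training records: {n_train}, testing records: {n_test}")
```

Under these assumptions the recomputed F1-scores agree with the reported ones to three decimal places, and the split corresponds to 6,825 training and 2,275 testing records.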