PSO-Weighted Ensemble of Bags-of-Word and N-Gram Classifiers for YouTube Spam Detection

by Mohd Fairuz Iskandar Othman, Mohd Najwan Md Khambari, Mohd Zaki Mas’ud, Nor Azman Mat Ariff, Taqwan Thamrin

Published: December 23, 2025 • DOI: 10.47772/IJRISS.2025.91100545

Abstract

YouTube spam comments degrade user experience and increase security and monetization risks, highlighting the need for resilient automated detection systems. YouTube spam detection has progressed from relying on single classifiers to incorporating ensemble-based systems. However, current YouTube spam ensembles typically train all base classifiers on homogeneous feature representations and rely on equal or fixed weighting schemes, which limits error diversity and prevents the ensemble from adapting to the varying strengths of individual models. This study proposed a Particle Swarm Optimization (PSO) weighted ensemble that combined multiple character n‑gram and bag-of-words (BoW) classifiers to build spam detection models. Six single classifiers, five using 1‑gram to 5‑gram character features and one using BoW features, were combined into ensemble configurations with equal weighting and with PSO‑optimized weighting, then evaluated on five YouTube spam datasets: Eminem, Katy Perry, LMFAO, Psy, and Shakira. Results demonstrated that PSO‑weighted ensembles consistently outperformed the best single classifier on every dataset, with improvements ranging from 1.0 to 1.5 percentage points and accuracies from 91.65% to 96.79%. The ensemble combining all n‑gram classifiers and the BoW classifier with PSO‑optimized weights delivered robust performance across all datasets, with gains over equal weighting of 0.2 to 0.7 percentage points. These findings confirmed that combining character n‑gram and BoW features captured complementary spam patterns, and that PSO‑based weighting provided an adaptive mechanism for classifier integration. The proposed approach offered an effective, generalizable solution for automated spam detection across diverse YouTube comments and social media platforms without extensive manual tuning.
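The PSO weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fitness function, voting rule, or PSO hyperparameters, so everything below is an assumption made for illustration: soft voting over per-classifier spam probabilities, accuracy as the fitness, weights bounded to [0.01, 1], and standard inertia and acceleration constants. The six "classifiers" are synthetic noisy scores rather than trained n‑gram or BoW models, and the names `weighted_vote`, `accuracy`, and `pso_weights` are hypothetical.

```python
import random

def weighted_vote(probs, weights):
    """Soft voting: probs[k][i] is classifier k's spam probability for comment i."""
    total = sum(weights)
    n = len(probs[0])
    return [
        1 if sum(w * p[i] for w, p in zip(weights, probs)) / total >= 0.5 else 0
        for i in range(n)
    ]

def accuracy(pred, labels):
    return sum(int(a == b) for a, b in zip(pred, labels)) / len(labels)

def pso_weights(probs, labels, n_particles=20, n_iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for ensemble weights that maximize accuracy (assumed fitness)."""
    rng = random.Random(seed)
    dim = len(probs)

    def fitness(w):
        return accuracy(weighted_vote(probs, w), labels)

    # Particle 0 starts at equal weights, so the result is never worse
    # than the equal-weighting baseline on the data used for the search.
    pos = [[1.0] * dim] + [
        [rng.uniform(0.01, 1.0) for _ in range(dim)]
        for _ in range(n_particles - 1)
    ]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so weights stay positive and bounded.
                pos[i][d] = min(1.0, max(0.01, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Synthetic stand-in for six base classifiers (assumption: not real model output).
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(200)]
noise_levels = [0.6, 0.5, 0.4, 0.45, 0.55, 0.35]
probs = [
    [min(1.0, max(0.0, y + rng.gauss(0, s))) for y in labels]
    for s in noise_levels
]

equal_acc = accuracy(weighted_vote(probs, [1.0] * len(probs)), labels)
weights, pso_acc = pso_weights(probs, labels)
print("equal-weight accuracy:", equal_acc)
print("PSO-weight accuracy:  ", pso_acc)
print("weights:", [round(w, 3) for w in weights])
```

In practice the weights would be searched on a held-out validation split and the reported accuracy measured on a separate test split; optimizing and evaluating on the same comments, as this toy example does, overstates the gain.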