Evaluating Visual Accuracy in AI-Generated Images of Malaysian-themed Icons
by Azahar Harun, Mohd Zaki Mohd Fadil, Tengku Shahril Norzaimi Tengku Hariffadzillah
Published: January 28, 2026 • DOI: 10.47772/IJRISS.2026.10100164
Abstract
Generative AI models are becoming widely accessible, enabling users from diverse backgrounds to create novel artwork. However, this accessibility raises questions about how accurately these models portray real-world subjects. This paper therefore examines three prominent generative AI models (Midjourney, DALL-E, and Stable Diffusion) to evaluate how well they generate images of specific Malaysian-themed icons. Images were generated from simple text prompts; the process was documented, and the outputs were evaluated by an expert panel using the Visual Appeal Rating Scale (VARS), which comprises eight criteria: Reliability, Consistency, Credibility, Professionalism, Aesthetics, Artistry, Harmony, and Balance. The results indicate notable differences in model performance depending on subject complexity. Midjourney ranked first (Overall Mean: 3.25), performing especially well in culinary portrayal and attaining "Near Perfect" expert agreement on the aesthetics of the Nasi Lemak images. Stable Diffusion was a close second (Overall Mean: 3.23), demonstrating proficiency in handling intricate structural geometry (Landmark) and portraiture; however, its higher scores frequently coincided with only "Slight" agreement, indicating considerable subjectivity in how experts judged its technical performance. DALL-E ranked third as a generalist model, yielding balanced but frequently contested outcomes among the experts. A significant "Cultural Accuracy Gap" was identified across all models: representing particular cultural icons (Politician) and intricate architecture (Landmark) was considerably more difficult than representing broad subjects (Food). DALL-E was largely unable to depict the Malaysian politician, owing to ethical concerns.
The study indicates that existing generative AI models are specialized rather than universal; achieving high visual fidelity requires deliberately selecting the model best suited to the specific aesthetic or structural requirements of the task.