Automated Financial Data Extraction Using Large Language Models: An Application of OpenAI Apis

by Jasrul Nizam Ghazali, Mohamad Norzamani Sahroni, Mohd Azry Abdul Malik, Mohd Muhaimin Chuweni, Sharifalillah Nordin

Published: February 26, 2026 • DOI: 10.47772/IJRISS.2026.10200120

Abstract

Financial data extraction, traditionally a manual and labour-intensive process, is being revolutionized by artificial intelligence (AI) and machine learning (ML). However, understanding financial documents remains a significant challenge for individuals without specialized financial knowledge due to complex terminology and concepts. This study addresses this gap by designing, developing, and evaluating an AI-powered financial data extraction system tailored for non-financial individuals. The system integrates Optical Character Recognition (OCR) for text extraction from document images statements, invoices, receipts) and leverages the OpenAI platform's advanced Natural Language Processing (NLP) capabilities to organize, interpret, and explain financial information in a user-friendly manner. A Waterfall development methodology was employed, encompassing requirements gathering via questionnaires with target users, system architecture design, implementation using Python libraries and OpenAI API, and rigorous testing, including functionality tests and user evaluations. Results from functionality testing confirmed the system's ability to accurately process various document types. User evaluation, involving finance staff assessing the system's potential for non-expert users, yielded overwhelmingly positive feedback, with high ratings for accuracy, usability, efficiency, and the significant impact of AI/ML integration in enhancing the depth and speed of analysis. The findings demonstrate the system's potential to improve financial literacy and empower individuals in managing personal finances by making complex financial data more accessible and understandable.